2014-04-24 - Originally posted at https://tech.taskrabbit.com/blog/2014/04/24/active-record-mysql-and-emoji/
More and more, people are adopting emoji in their online communications. At TaskRabbit, we noticed that our users are starting to use emoji all over the place, from task descriptions to reviews.
There are some problems when supporting the emoji character set with our stack, which includes Rails 4.0 and MySQL. The main problem is that MySQL’s utf8 encoding stores at most three bytes per character, while emoji need four-byte UTF-8 sequences. MySQL 5.5 introduced the utf8mb4 encoding, which allows up to four bytes per character… and therefore emoji would work! The MySQL gem introduced support for utf8mb4 about a year ago, but only recently did active_record (and Rails) add support for it, in Rails 4.1.
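To make the byte-width issue concrete, here is a small sketch in plain Ruby (no database involved). The helper names `utf8_width` and `fits_mysql_utf8?` are made up for illustration; the only assumption is that MySQL's legacy utf8 charset tops out at three bytes per character.

```ruby
# Sample characters of increasing UTF-8 width, written as escapes so the
# file stays ASCII-only.
SAMPLES = {
  "a"         => "ASCII letter",
  "\u00E9"    => "accented Latin letter",
  "\u2764"    => "heavy black heart",
  "\u{1F600}" => "grinning face emoji"
}

# Number of bytes a single character occupies when encoded as UTF-8.
def utf8_width(char)
  char.encode("UTF-8").bytesize
end

# Hypothetical check: true when every character fits in three bytes,
# which is the most MySQL's legacy utf8 charset can store.
def fits_mysql_utf8?(string)
  string.each_char.all? { |c| utf8_width(c) <= 3 }
end

SAMPLES.each do |char, label|
  puts format("%-24s %d byte(s)", label, utf8_width(char))
end
```

Note that the heart fits in three bytes, so it survives in a plain utf8 column; it is the four-byte characters such as the grinning face that do not.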
Initially, we decided to ignore all emoji characters, literally stripping them out of strings with our demoji gem (Thanks Pablo!). However, with our new product launch in the UK, we thought it was time to actually address the problem. Here is what we learned:
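For a sense of what the stripping approach looks like, here is a minimal sketch. It is not the gem's actual implementation or API; `strip_four_byte_chars` is a hypothetical helper, and it assumes the input string is UTF-8 encoded.

```ruby
# Hypothetical helper (not the gem's API): drop every character whose
# UTF-8 encoding needs four bytes, i.e. code points above U+FFFF.
def strip_four_byte_chars(text)
  text.each_char.reject { |char| char.bytesize > 3 }.join
end

puts strip_four_byte_chars("Fix my sink \u{1F6B0} please")
```

Characters of three bytes or fewer pass through untouched, so accented letters and most symbols are preserved.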
The good news is that the upgrade path from utf8 to utf8mb4 is easy. As we are adding bytes, the migration is really just a definition change at the table-level. Nothing has to change with your existing data. This is a non-blocking and non-downtime migration. If you are using normal rails migrations, all of your column types for VARCHAR columns will be based on the table’s encoding. Changing the table will change the column type. The bad news is that any text-type (or blob-type) columns will need to be explicitly changed.
Check out the migration steps:
```ruby
class Utf8mb4 < ActiveRecord::Migration

  UTF8_PAIRS = {
    'users'    => 'notes',
    'comments' => 'message'
    # ...
  }

  def self.up
    execute "ALTER DATABASE `#{ActiveRecord::Base.connection.current_database}` CHARACTER SET utf8mb4;"

    ActiveRecord::Base.connection.tables.each do |table|
      execute "ALTER TABLE `#{table}` CHARACTER SET = utf8mb4;"
    end

    UTF8_PAIRS.each do |table, col|
      execute "ALTER TABLE `#{table}` CHANGE `#{col}` `#{col}` TEXT CHARACTER SET utf8mb4 NULL;"
    end

  end

  def self.down
    execute "ALTER DATABASE `#{ActiveRecord::Base.connection.current_database}` CHARACTER SET utf8;"

    ActiveRecord::Base.connection.tables.each do |table|
      execute "ALTER TABLE `#{table}` CHARACTER SET = utf8;"
    end

    UTF8_PAIRS.each do |table, col|
      execute "ALTER TABLE `#{table}` CHANGE `#{col}` `#{col}` TEXT CHARACTER SET utf8 NULL;"
    end
  end
end
```
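Because `UTF8_PAIRS` is a hash keyed by table name, it can hold only one text column per table. If a table has several, one possible variation is to map each table to a list of columns and generate the statements from that. The sketch below only builds the SQL strings so it can run without a database; `TEXT_COLUMNS`, `text_column_alters`, and the `bio` column are hypothetical names used for illustration.

```ruby
# Hypothetical variant of UTF8_PAIRS: each table maps to a list of TEXT columns.
TEXT_COLUMNS = {
  "users"    => ["notes", "bio"], # "bio" is a made-up column for illustration
  "comments" => ["message"]
}

# Build one ALTER TABLE statement per table/column pair, using the same
# statement shape as the migration above.
def text_column_alters(columns_by_table, charset)
  columns_by_table.flat_map do |table, columns|
    columns.map do |col|
      "ALTER TABLE `#{table}` CHANGE `#{col}` `#{col}` TEXT CHARACTER SET #{charset} NULL;"
    end
  end
end

text_column_alters(TEXT_COLUMNS, "utf8mb4").each { |sql| puts sql }
```

Inside a migration, each generated string would be passed to `execute` instead of `puts`, with `"utf8"` as the charset on the way down.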
The only change needed in database.yml is the encoding:

```yaml
development:
  adapter: mysql2
  encoding: utf8mb4 # <--- HERE
  database: my_db_name
  username: root
  password: my_password
  host: 127.0.0.1
  port: 3306
```
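The last step is to worry about index lengths: a utf8mb4 character can take up to four bytes, so an indexed string column can exceed the maximum index key length that MySQL allows. If you are on Rails 4.1, you have nothing to worry about! The rest of us have a few options: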
We chose #2 due to the simplicity of the solution. Check the links above for a detailed discussion of the problem.
```ruby
module ActiveRecord
  module ConnectionAdapters
    class AbstractMysqlAdapter
      NATIVE_DATABASE_TYPES[:string] = { :name => "varchar", :limit => 191 }
    end
  end
end
```
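The limit of 191 in the patch above is not arbitrary. As a rough sketch of the arithmetic, assuming InnoDB's default index key limit of 767 bytes (the default in MySQL 5.5 and 5.6), the longest string column that can be fully indexed is the limit divided by the maximum bytes per character. The constant and method names below are made up for illustration.

```ruby
# Assumption: default InnoDB index key limit in MySQL 5.5/5.6.
INNODB_INDEX_KEY_LIMIT_BYTES = 767

# Maximum bytes a single character can occupy in each charset.
BYTES_PER_CHAR = { "utf8" => 3, "utf8mb4" => 4 }

# Longest VARCHAR length whose worst-case size still fits in one index key.
def max_indexable_chars(charset)
  INNODB_INDEX_KEY_LIMIT_BYTES / BYTES_PER_CHAR.fetch(charset)
end

# Whether an index over a VARCHAR(varchar_length) column fits the limit.
def index_fits?(varchar_length, charset)
  varchar_length * BYTES_PER_CHAR.fetch(charset) <= INNODB_INDEX_KEY_LIMIT_BYTES
end

puts max_indexable_chars("utf8")    # 255
puts max_indexable_chars("utf8mb4") # 191
puts index_fits?(255, "utf8mb4")    # false
```

This is why Rails' default `varchar(255)` works with indexes under utf8 but not under utf8mb4, and why lowering the default string limit to 191 sidesteps the problem.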
And now you can emoji to your ❤’s content!