Monday, September 05, 2016

Emoji support with ColdFusion and mySQL

So for a while, I have been trying to get Emoji support on CrossTrackr and no matter what I did I would still get an error until I finally threw my hands in the air and gave up. So what I ended up doing is using some code from this post on Drupal's blog, https://www.drupal.org/node/2043439, which allowed me to clean up the string before submitting to the database. When I found this post I was extremely happy but when I tried to execute the regex examples using reReplace() I would get the following ColdFusion error:

Malformed regular expression "/[\x{1F600}-\x{1F64F}]/".

This led to more frustration and then moving to my next solution which was to make a call to a PHP page using CFHTTP.

An example of how I did that can be seen in the following 2 gists.

ColdFusion Function making http call to local PHP File The local PHP File

Today I decided to attack this issue again as I got tired of getting errors, especially when I introduced new fields which I would forget to sanitize and users would try to use an emoticon 😫 . I found this great post (How to support full Unicode in MySQL databases) from Mathias Bynens @mathias which got me the answer I was looking for. See, I had already understood that I needed to set my fields to utf8mb4, but even though I did I would get the error, reason I went with just cleaning the string. In this post there is a section on modifying connection, client, and server character sets which was the piece I was missing. I just needed to set a couple of settings on mySQL server cnf file in order for data from ColdFusion to be able to be saved. Which are as follows:

The settings that actually did the trick where character-set-server which sets the default character set to use, character-set-client-handshake which ignores character set information sent from client and uses the server character set instead and collation-server which sets the default collation. The settings under the [mysql] and [client] parts did not really affect it working but I left them in regardless. I was previously using utf8 as the default and since utf8mb4 is fully backwards compatible with utf8, there was no data loss when I updated the columns. He does go into modifying the database, table and columns but when I tested, all I did was work with the columns that I needed to support and modifying the database or table itself was not necessary, thus not creating any issues with my index keys as he explains. An example of how I update the character set of a specific column is as follows:

ALTER TABLE `table_name` CHANGE `column_name` `column_name` TEXT CHARACTER SET utf8mb4;

Again, in his example he does a little more and sets the collation to utf8mb4_unicode_ci but after testing, the default collation for utf8mb4 also worked. Once I added the settings to mySQL and restarted and removed my cleanEmoji() wrappers in my ORM files, my saves no longer threw an error. One thing I did have to remove is surrounding my output with encodeForHTML() as that would replace the emojis with question marks.

Now in doing this today and finally getting it to work I did figure out something else, and that is that the call to PHP is not really necessary. We can use those regex patterns using Java's internal replaceAll() string function. So now I can wrap saves to fields where I do not support emoji's without incurring an http call from the server to itself and PHP handling the work for me. Below is the updated cleanEmoji(), which I now call cleanEmojiWithColdFusion().

I hope this helps you as it helped me get Emoji's (High Ascii Values) working on my App. I will try to see what is required if working with MS SQL and either update this post or create a new one.

No comments:

Post a Comment