“the j stands for Japanese”
about me | research blog | wordpress plugins | jQuery plugins

18 September, 2012

A MySQL Unicode collation for Maltese

If you’ve tried to store and retrieve Maltese text in a MySQL database before, you may have noticed that there is no way to sort it correctly according to the Maltese alphabet.

The  utf8_unicode_ci collation treats g and ġ etc. as interchangeable but that is of course not right. You can try utf8_bin, but since this sorts according to Unicode codepoints, ċ, ġ, ħ, ż get sorted after the letter z — which is even worse (although it does at least mean you can search for ħ without getting h back too).

What you really need is a custom collation for the Maltese alphabet. There isn’t one built-in, but luckily MySQL makes adding custom collations relatively painless. So I went ahead and implemented such a collation for Maltese, and called it utf8_maltese_ci. You can find the code, along with detailed installation and usage instructions at the GitHub repository for the Maltese MySQL collation.