MySQL Unicode character sets mainly have the following collations:
- xxx_bin: compares all characters using their code points as weights.
- xxx_general_ci: compares almost all characters using their code points as weights.
- xxx_unicode_ci: compares all characters using their collating weights, as defined by the Unicode Collation Algorithm (UCA).
Reference: http://dev.mysql.com/doc/refman/5.6/en/charset-unicode-sets.html
When xxx is utf8, only BMP characters can be handled.
When xxx is utf8mb4, SMP characters can be handled as well.
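To make the BMP/SMP distinction concrete, here is a minimal Python sketch (not MySQL code). It relies on the fact that BMP characters need at most 3 bytes in UTF-8, which is all that MySQL's utf8 stores, while characters above U+FFFF need 4 bytes. The helper name `fits_in_mysql_utf8` is hypothetical and exists only for this illustration.

```python
BMP_MAX = 0xFFFF  # last code point of the Basic Multilingual Plane


def utf8_length(ch: str) -> int:
    """Number of bytes the character occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


def fits_in_mysql_utf8(text: str) -> bool:
    """Hypothetical helper: True if every character is in the BMP,
    i.e. it needs at most 3 bytes and fits MySQL's 3-byte utf8."""
    return all(ord(ch) <= BMP_MAX for ch in text)


if __name__ == "__main__":
    samples = ["sushi", "\u5bff\u53f8", "\U0001F363", "\U0001F37A"]
    for s in samples:
        sizes = [utf8_length(ch) for ch in s]
        print(repr(s), "bytes per char:", sizes,
              "fits utf8:", fits_in_mysql_utf8(s))
```

A string for which `fits_in_mysql_utf8` returns False needs utf8mb4.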
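Currently, the collating behavior of utf8mb4_general_ci and utf8mb4_unicode_ci for SMP characters is known as the Sushi-Beer issue: both collations compare any two SMP characters as equal, so for example 🍣 and 🍺 match each other.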
- http://bugs.mysql.com/bug.php?id=76553
- http://blog.kamipo.net/entry/2015/03/23/093052 (Japanese entry)
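The effect can be imitated outside MySQL with a toy model in Python. This is only a sketch under stated assumptions: `toy_general_ci_key` is a hypothetical function, not MySQL's implementation. It assumes that every character above U+FFFF gets the single weight 0xFFFD (as the MySQL manual describes for these collations) and approximates case-insensitivity for BMP characters by upper-casing, ignoring accent folding and all other details.

```python
REPLACEMENT_WEIGHT = 0xFFFD  # weight assumed for every supplementary character


def bin_key(text: str) -> list:
    """Model of xxx_bin: the sort key is simply the code points."""
    return [ord(ch) for ch in text]


def toy_general_ci_key(text: str) -> list:
    """Toy model of utf8mb4_general_ci (an assumption, not MySQL's code)."""
    key = []
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF:
            # All supplementary characters collapse to the same weight.
            key.append(REPLACEMENT_WEIGHT)
        else:
            upper = ch.upper()
            # Only use the upper-case form when it is a single character.
            key.append(ord(upper) if len(upper) == 1 else cp)
    return key


if __name__ == "__main__":
    sushi, beer = "\U0001F363", "\U0001F37A"
    print("general_ci equal:", toy_general_ci_key(sushi) == toy_general_ci_key(beer))
    print("bin equal:       ", bin_key(sushi) == bin_key(beer))
```

In this model the two emoji are equal under the case-insensitive key and differ only under the binary key, which is why a _bin collation is needed to tell them apart.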
In my opinion, a good rule of thumb is as follows:
- When you want to handle SMP characters, use utf8mb4_general_ci (the default collation of utf8mb4).
  - If you need to compare SMP characters with each other, use utf8mb4_bin.
- When you want to handle only BMP characters, use utf8_general_ci (the default collation of utf8).
  - If you need case-sensitive comparison, use utf8_bin.
- When you want to handle only ASCII characters, use ascii_general_ci (the default collation of ascii).
  - If you need case-sensitive comparison, use ascii_bin.
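The rule of thumb above can be written down as a small decision function. This is a sketch in Python, not part of any MySQL API: `pick_collation` and its parameters are hypothetical names, and the function simply inspects the highest code point in a sample of the data you expect to store.

```python
def pick_collation(sample_text: str, exact_comparison: bool = False) -> str:
    """Hypothetical helper that applies the rule of thumb above.

    exact_comparison=True means case-sensitive comparison is wanted
    (and, for utf8mb4, that SMP characters must be distinguished).
    """
    max_cp = max((ord(ch) for ch in sample_text), default=0)
    if max_cp > 0xFFFF:
        charset = "utf8mb4"  # characters beyond the BMP are present
    elif max_cp > 0x7F:
        charset = "utf8"     # BMP characters beyond ASCII are present
    else:
        charset = "ascii"    # ASCII only
    suffix = "bin" if exact_comparison else "general_ci"
    return charset + "_" + suffix


if __name__ == "__main__":
    print(pick_collation("hello"))
    print(pick_collation("\u5bff\u53f8", exact_comparison=True))
    print(pick_collation("\U0001F363\U0001F37A", exact_comparison=True))
```

A sample only reflects data seen so far, so when future input may contain SMP characters it is safer to choose utf8mb4 up front.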