MySQL Unicode Collation Behavior

MySQL's Unicode character sets mainly offer the following collations:

  • xxx_bin: compares all characters using their code points as weights.
  • xxx_general_ci: compares most characters using a single weight derived from the code point, ignoring case.
  • xxx_unicode_ci: compares all characters using their collation weights from the Unicode Collation Algorithm (UCA).

ref. http://dev.mysql.com/doc/refman/5.6/en/charset-unicode-sets.html
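The collation families differ in how each character is turned into a weight before comparison. Below is a minimal Python sketch of the first two, under simplifying assumptions: `weight_bin`, `weight_general_ci` and `compare` are hypothetical names, `str.upper()` stands in for MySQL's real case-folding table, and only BMP characters are considered. `xxx_unicode_ci` is left out because real UCA weights require the full Unicode collation tables.

```python
from typing import Callable

# Toy model only (assumption): MySQL's real collations use internal weight
# tables. This sketch illustrates what each family compares, for BMP
# characters, and ignores trailing-space padding and accent folding.

def weight_bin(ch: str) -> int:
    """xxx_bin: the weight is the code point itself."""
    return ord(ch)

def weight_general_ci(ch: str) -> int:
    """xxx_general_ci (approximation): one case-folded weight per character.

    str.upper() stands in for MySQL's folding table; characters whose
    uppercase form is longer than one character keep their own code point.
    """
    up = ch.upper()
    return ord(up) if len(up) == 1 else ord(ch)

def compare(a: str, b: str, weight: Callable[[str], int]) -> int:
    """Return -1, 0 or 1 by comparing the per-character weight sequences."""
    wa = [weight(c) for c in a]
    wb = [weight(c) for c in b]
    return (wa > wb) - (wa < wb)

print(compare("MySQL", "mysql", weight_bin))         # -1 ('M' < 'm' by code point)
print(compare("MySQL", "mysql", weight_general_ci))  # 0 (equal ignoring case)
```

The only thing that changes between the two calls is the weight function, which is the essential difference between a _bin and a _general_ci collation.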

When xxx is utf8, only BMP characters (U+0000 to U+FFFF) can be handled.

When xxx is utf8mb4, SMP characters (U+10000 to U+1FFFF, e.g. emoji) can be handled as well.
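The BMP/SMP split maps directly onto UTF-8 byte lengths: BMP characters take 1 to 3 bytes and supplementary characters take 4, while MySQL's utf8 stores at most 3 bytes per character. The hypothetical helper `fits_in_mysql_utf8` below checks that boundary by code point; it is a sketch for illustration, not part of any MySQL client library.

```python
SUSHI = "\U0001F363"  # 🍣 U+1F363 SUSHI, an SMP character

def fits_in_mysql_utf8(text: str) -> bool:
    """True if every character is in the BMP (U+0000 to U+FFFF).

    Hypothetical helper: MySQL's utf8 stores at most 3 bytes per character,
    which covers exactly the BMP; anything above needs utf8mb4.
    """
    return all(ord(c) <= 0xFFFF for c in text)

for s in ["A", "é", "寿司", SUSHI]:
    # Longest UTF-8 encoding of any single character in s, plus the verdict.
    longest = max(len(c.encode("utf-8")) for c in s)
    print(repr(s), longest, fits_in_mysql_utf8(s))
```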

Currently, utf8mb4_general_ci and utf8mb4_unicode_ci both treat all SMP characters as equal to each other, a behavior known as the Sushi-Beer issue: under these collations, '🍣' (U+1F363) and '🍺' (U+1F37A) compare as equal.
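The issue can be reproduced with a simplified weight model. The sketch assumes that utf8mb4_general_ci gives every supplementary character the same weight, that of U+FFFD (REPLACEMENT CHARACTER), while utf8mb4_bin keeps the code point; the function names are hypothetical and the model is not MySQL's implementation. Only general_ci is modeled here, although the text above notes that unicode_ci shows the same behavior.

```python
from typing import Callable

SUSHI = "\U0001F363"  # 🍣 U+1F363 SUSHI
BEER = "\U0001F37A"   # 🍺 U+1F37A BEER MUG

def weight_bin(ch: str) -> int:
    # utf8mb4_bin: the code point is the weight, so distinct characters differ.
    return ord(ch)

def weight_general_ci_mb4(ch: str) -> int:
    # Simplified utf8mb4_general_ci model (assumption): every supplementary
    # character collapses to the weight of U+FFFD; BMP characters are case
    # folded with str.upper() as a stand-in for MySQL's folding table.
    if ord(ch) > 0xFFFF:
        return 0xFFFD
    up = ch.upper()
    return ord(up) if len(up) == 1 else ord(ch)

def equal(a: str, b: str, weight: Callable[[str], int]) -> bool:
    # Two strings are equal when their weight sequences are identical.
    return [weight(c) for c in a] == [weight(c) for c in b]

print(equal(SUSHI, BEER, weight_bin))             # False
print(equal(SUSHI, BEER, weight_general_ci_mb4))  # True: the Sushi-Beer issue
```

Under this model, any equality-based operation, such as a lookup on a column or a unique key, would treat the two emoji as the same value.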

In my opinion, a good rule of thumb is as follows:

  • When you need to handle SMP characters, use utf8mb4_general_ci (the default collation of utf8mb4).
    • If you need to distinguish SMP characters from each other, use utf8mb4_bin.
  • When you only need to handle BMP characters, use utf8_general_ci (the default collation of utf8).
    • If you need case-sensitive comparison, use utf8_bin.
  • When you only need to handle ASCII characters, use ascii_general_ci (the default collation of ascii).
    • If you need case-sensitive comparison, use ascii_bin.
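The rule of thumb above can be written down as a small decision function. `suggest_collation` is a hypothetical helper that inspects a sample string only for illustration; in practice the choice should be driven by the full range of data a column is expected to hold, not by one value.

```python
def suggest_collation(text_sample: str, exact_match: bool = False) -> str:
    """Hypothetical helper encoding the rule of thumb above.

    exact_match=True means case-sensitive comparison (ascii/utf8) or telling
    individual SMP characters apart (utf8mb4), so a _bin collation is chosen;
    otherwise the charset's default _general_ci collation is returned.
    """
    top = max((ord(c) for c in text_sample), default=0)
    if top < 0x80:
        charset = "ascii"      # ASCII only
    elif top <= 0xFFFF:
        charset = "utf8"       # BMP only
    else:
        charset = "utf8mb4"    # needs supplementary (e.g. SMP) characters
    return f"{charset}_bin" if exact_match else f"{charset}_general_ci"

print(suggest_collation("hello"))
print(suggest_collation("寿司", exact_match=True))
print(suggest_collation("寿司\U0001F363"))
```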