- How to support full Unicode in MySQL databases · Mathias Bynens
- MySQL Unicode Encoding: utf8 vs utf8mb4
- How to convert a big MySQL Database from utf8 to utf8mb4?
MySQL’s utf8
charset only implements some parts of the whole UTF-8 encoding and it only uses 1 to 3 bytes for encoding.
So when you try to use some characters occupying 4 bytes, those characters cannot be stored and there are some warnings.
In MySQL, utf8
is an alias for utf8mb3
.
If you want to use more UTF-8 encoding characters, you could use MySQL’s utf8mb4
.
For the Basic Multilingual Plane (BMP) characters, utf8mb4
and utf8mb3
have identical storage characteristics: same code values, same encoding, same length.
For a supplementary character, utf8mb4
using 4 bytes to store it could store more when utf8mb3
cannot at all.
Run this query:
SELECT
s.SCHEMA_NAME,
s.DEFAULT_CHARACTER_SET_NAME,
s.DEFAULT_COLLATION_NAME,
t.TABLE_NAME,
t.TABLE_COLLATION,
ccsa.CHARACTER_SET_NAME,
c.COLUMN_NAME,
c.CHARACTER_SET_NAME,
c.COLLATION_NAME
FROM
information_schema.SCHEMATA s
LEFT JOIN information_schema.TABLES t ON
s.SCHEMA_NAME = t.TABLE_SCHEMA
LEFT JOIN information_schema.COLLATION_CHARACTER_SET_APPLICABILITY ccsa ON
t.TABLE_COLLATION = ccsa.COLLATION_NAME
LEFT JOIN information_schema.COLUMNS c ON
s.SCHEMA_NAME = c.TABLE_SCHEMA
AND t.TABLE_NAME = c.TABLE_NAME
WHERE
s.schema_name = 'database_name';
Run these queries after creating a backup of the database and at a time when the traffic is minimal. Depending on the table size, the convert query takes a while to run.
-- database
ALTER DATABASE database_name CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci;
-- for each table (also converts columns within table)
-- USE
ALTER TABLE table_name CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
-- for each table (does NOT convert existing columns within table)
-- do NOT use
ALTER TABLE table_name DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
Add the following in the appropriate sections in my.cnf:
[client]
default-character-set = utf8mb4
[mysql]
default-character-set = utf8mb4
[mysqld]
character-set-client-handshake = FALSE
character-set-server = utf8mb4
collation-server = utf8mb4_unicode_ci
This will ensure all future databases and tables created will the utfmb4
character set and utfmb4_unicode_ci
collation.