180 links
  • BigBrainBag
  • Home
  • Login
  • RSS Feed
  • Tag cloud
  • Picture wall
  • Daily
Links per page: 20 50 100
page 1 / 1
1 results tagged character x
  • How to support full Unicode in MySQL databases · Mathias Bynens
    MySQL’s utf8mb4
    Luckily, MySQL 5.5.3 (released in early 2010) introduced a new encoding called utf8mb4 which maps to proper UTF-8 and thus fully supports Unicode, including astral symbols.

    Switching from MySQL’s utf8 to utf8mb4
    Step 1: Create a backup

    Create a backup of all the databases on the server you want to upgrade. Safety first!

    Step 2: Upgrade the MySQL server

    Upgrade the MySQL server to v5.5.3+, or ask your server administrator to do it for you.

    Step 3: Modify databases, tables, and columns

    Change the character set and collation properties of the databases, tables, and columns to use utf8mb4 instead of utf8.

    # For each database:
    ALTER DATABASE database_name CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci;
    # For each table:
    ALTER TABLE table_name CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
    # For each column:
    ALTER TABLE table_name CHANGE column_name column_name VARCHAR(191) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
    # (Don’t blindly copy-paste this! The exact statement depends on the column type, maximum length, and other properties. The above line is just an example for a `VARCHAR` column.)
    Since utf8mb4 is fully backwards compatible with utf8, no mojibake or other forms of data loss should occur. (But you have a backup, right?)

    Step 4: Check the maximum length of columns and index keys

    This is probably the most tedious part of the whole upgrading process.

    When converting from utf8 to utf8mb4, the maximum length of a column or index key is unchanged in terms of bytes. Therefore, it is smaller in terms of characters, because the maximum length of a character is now four bytes instead of three.

    For example, a TINYTEXT column can hold up to 255 bytes, which correlates to 85 three-byte or 63 four-byte characters. Let’s say you have a TINYTEXT column that uses utf8 but must be able to contain more than 63 characters. Given this requirement, you can’t convert this column to utf8mb4 unless you also change the data type to a longer type such as TEXT — because if you’d try to fill it with four-byte characters, you’d only be able to enter 63 characters, but not more.

    The same goes for index keys. The InnoDB storage engine has a maximum index length of 767 bytes, so for utf8 or utf8mb4 columns, you can index a maximum of 255 or 191 characters, respectively. If you currently have utf8 columns with indexes longer than 191 characters, you will need to index a smaller number of characters when using utf8mb4. (Because of this, I had to change some indexed VARCHAR(255) columns to VARCHAR(191).)

    Section 10.1.11 of the MySQL 5.5 Reference Manual has some more information on this.

    Step 5: Modify connection, client, and server character sets

    In your application code, set the connection character set to utf8mb4. This can be done by simply replacing any variants of SET NAMES utf8 with SET NAMES utf8mb4. If your old SET NAMES statement specified the collation, make sure to change that as well, e.g. SET NAMES utf8 COLLATE utf8_unicode_ci becomes SET NAMES utf8mb4 COLLATE utf8mb4_unicode_ci.

    Make sure to set the client and server character set as well. I have the following in my MySQL configuration file (/etc/my.cnf):

    [client]
    default-character-set = utf8mb4

    [mysql]
    default-character-set = utf8mb4

    [mysqld]
    character-set-client-handshake = FALSE
    character-set-server = utf8mb4
    collation-server = utf8mb4_unicode_ci
    You can easily confirm these settings work correctly:

    mysql> SHOW VARIABLES WHERE Variable_name LIKE 'character\_set\_%' OR Variable_name LIKE 'collation%';
    +--------------------------+--------------------+
    | Variable_name            | Value              |
    +--------------------------+--------------------+
    | character_set_client     | utf8mb4            |
    | character_set_connection | utf8mb4            |
    | character_set_database   | utf8mb4            |
    | character_set_filesystem | binary             |
    | character_set_results    | utf8mb4            |
    | character_set_server     | utf8mb4            |
    | character_set_system     | utf8               |
    | collation_connection     | utf8mb4_unicode_ci |
    | collation_database       | utf8mb4_unicode_ci |
    | collation_server         | utf8mb4_unicode_ci |
    +--------------------------+--------------------+
    10 rows in set (0.00 sec)
    As you can see, all the relevant options are set to utf8mb4, except for character_set_filesystem which should be binary unless you’re on a file system that supports multi-byte UTF-8-encoded characters in file names, and character_set_system which is always utf8 and can’t be overridden.

    Note: The default character set and collation can be configured at some other levels as well.

    Step 6: Repair and optimize all tables

    After upgrading the MySQL server and making the necessary changes explained above, make sure to repair and optimize all databases and tables. I didn’t do this right away after upgrading (I didn’t think it was necessary, as everything seemed to work fine at first glance), and ran into some weird bugs where UPDATE statements didn’t have any effect, even though no errors were thrown.

    You could run the following MySQL queries for each table you want to repair and optimize:

    # For each table
    REPAIR TABLE table_name;
    OPTIMIZE TABLE table_name;
    Luckily, this can easily be done in one go using the command-line mysqlcheck utility:

    $ mysqlcheck -u root -p --auto-repair --optimize --all-databases
    This will prompt for the root user’s password, after which all tables in all databases will be repaired and optimized.
    Tue Mar 28 09:33:48 2017 - permalink -
    QRCode
    - archive.org - https://mathiasbynens.be/notes/mysql-utf8mb4
    character collation database db emoji mysql php unicode utf8 utf8mb4
Links per page: 20 50 100
page 1 / 1
Shaarli - The personal, minimalist, super-fast, database free, bookmarking service by the Shaarli community - Help/documentation