How to store emojis in a MySQL table? Tried everything

461 Views Asked by At

I have a CSV file containing tweets with emojis (eg. "Cool! ") and I need to import them into a MySQL table in such a way they will be saved/displayed correctly... What do I have to set up and how for a correct import (I mean collation, etc.)?

More details:

  • In the CSV file, the emoji are visible
  • The encoding of the CSV file is UTF-8
  • I am on Windows 11

I already tried:

  • To set the character set to utf8mb4 and collation to utf8mb4_unicode_ci in the table
  • To add " SET NAMES 'utf8mb4';" (also tried with Latin1) before the LOAD query
1

There are 1 best solutions below

0
Bill Karwin On

The table must encode text in character set utf8mb4 to store emojis.

Demo:

mysql> create table no ( t text ) character set=utf8;

mysql> load data local infile 'm.csv' into table no;

mysql> select * from no;
+---------+
| t       |
+---------+
| Cool! ? |
+---------+

So utf8 does not support emojis.

mysql> create table yes ( t text ) character set=utf8mb4;

mysql> load data local infile 'm.csv' into table yes;

mysql> select * from yes;
+------------+
| t          |
+------------+
| Cool!      |
+------------+

But utf8mb4 does support emojis. The difference is that utf8mb4 supports 4-byte encodings, but utf8 doesn't. This is an unfortunate part of MySQL's history, that they didn't implement utf8 originally to support the Supplemental Multilingual Plane of the UTF-8 standard.

Let's see if altering the first table helps.

mysql> alter table no character set utf8mb4;

mysql> load data local infile 'm.csv' into table no;

mysql> select * from no;
+---------+
| t       |
+---------+
| Cool! ? |
| Cool! ? |
+---------+

Why didn't this work? Because alter table ... character set does not convert existing columns. It only changes the table's default character set, which will not be used until the next time we add a column to that table.

We can see that the existing column is still using the old character set:

mysql> show create table no\G
*************************** 1. row ***************************
       Table: no
Create Table: CREATE TABLE `no` (
  `t` text CHARACTER SET utf8mb3  
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci

utf8mb3 is the character set that utf8 is an alias for in MySQL 8.0.

To convert existing columns, use:

mysql> alter table no convert to character set utf8mb4;

mysql> show create table no\G
*************************** 1. row ***************************
       Table: no
Create Table: CREATE TABLE `no` (
  `t` mediumtext
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci

No try the load again:

mysql> load data local infile 'm.csv' into table no;

mysql> select * from no;
+------------+
| t          |
+------------+
| Cool! ?    |
| Cool! ?    |
| Cool!      |
+------------+

Note that someday, MySQL may change the 'utf8' alias to mean utf8mb4. This is shown in warnings on many of the above usages of 'utf8':

'utf8' is currently an alias for the character set UTF8MB3, but will be an alias for UTF8MB4 in a future release. Please consider using UTF8MB4 in order to be unambiguous.