I have a CSV file containing tweets with emojis (eg. "Cool! ") and I need to import them into a MySQL table in such a way they will be saved/displayed correctly... What do I have to set up and how for a correct import (I mean collation, etc.)?
More details:
- In the CSV file, the emoji are visible
- The encoding of the CSV file is UTF-8
- I am on Windows 11
I already tried:
- To set the character set to utf8mb4 and collation to utf8mb4_unicode_ci in the table
- To add " SET NAMES 'utf8mb4';" (also tried with Latin1) before the LOAD query
The table must encode text in character set utf8mb4 to store emojis.
Demo:
So utf8 does not support emojis.
But utf8mb4 does support emojis. The difference is that utf8mb4 supports 4-byte encodings, but utf8 doesn't. This is an unfortunate part of MySQL's history, that they didn't implement utf8 originally to support the Supplemental Multilingual Plane of the UTF-8 standard.
Let's see if altering the first table helps.
Why didn't this work? Because
alter table ... character setdoes not convert existing columns. It only changes the table's default character set, which will not be used until the next time we add a column to that table.We can see that the existing column is still using the old character set:
utf8mb3 is the character set that
utf8is an alias for in MySQL 8.0.To convert existing columns, use:
No try the load again:
Note that someday, MySQL may change the 'utf8' alias to mean utf8mb4. This is shown in warnings on many of the above usages of 'utf8':