I come back with utf8mb4 character issues. I'm using mariadb Server version: 10.3.37-MariaDB-0ubuntu0.20.04.1 Ubuntu 20.04 on a local machine with Linux Mint 20.3. Here is a test table:
CREATE TABLE chartable (
k1 VARCHAR(5)
) ENGINE=MyISAM DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci;
INSERT INTO chartable (k1) values
('食C' ),
('食B'),
('亰BC');
Now the issues:
MariaDB [test]> select * from chartable where k1 like '%%';
+----------+
| k1 |
+----------+
| 食C |
| 食B |
+----------+
2 rows in set (0,008 sec)
MariaDB [test]> select hex('');
+----------+
| hex('?') |
+----------+
| F0A38580 |
+----------+
1 row in set (0,001 sec)
I notice the first query doesn't return what I expected, which is 食C Second query displays a ? instead of I ensured that character set and collations are consistent, database and table have the same character set and collation, utf8mb4_general_ci as I was told in this thread https://stackoverflow.com/questions/74975518/false-duplicate-with-mariadb-and-chinese-characters/74975829#74975829
MariaDB [nihongo]> show variables like 'character%';
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8mb4 |
| character_set_connection | utf8mb4 |
| character_set_database | utf8mb4 |
| character_set_filesystem | binary |
| character_set_results | utf8mb4 |
| character_set_server | utf8mb4 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
and:
MariaDB [nihongo]> show variables like 'collation%';
+----------------------+--------------------+
| Variable_name | Value |
+----------------------+--------------------+
| collation_connection | utf8mb4_general_ci |
| collation_database | utf8mb4_general_ci |
| collation_server | utf8mb4_general_ci |
+----------------------+--------------------+
I've tested with others tables and got the same type of issues. Could you explain me what I'm doing wrong and what to do to use correctly LIKE with mariadb and chinese characters? Thank you in advance
utf8mb4_general_ci is a simplified faster sort/comparison implementation . It does not follow the Unicode rules and might result in undesirable sorting or comparison.
Instead of you should use utf8mb4_unicode_ci, which is based on the official unicode rules for ,sorting and comparison.
hex('%%')is displayed ashex('?')since the use of unicode characters in identifier names is limited to characters < +U10000 (see also https://mariadb.com/kb/en/identifier-names/).