Optimization: Storing ISO 639-3 language in a MySQL column (int language_id or varchar language)

340 Views Asked by Edward At 31 December 2020 at 13:05

I have a MySQL DB that needs to be fast at scale.

Option 1: Tables can store the ISO 639-3 language code as a column: varchar(3) language

Option 2: Tables can store the ID for the language as a column: int(2?) language_id, and there can be a languages table with the ISO 639-3 code.

Question: What makes sense for speed at scale? Option 1 is easier to read in the DB. I'd prefer it if the speed is the same or the difference is negligible even at scale.

Thanks!


There is 1 best solution below

Rick James On 31 December 2020 at 20:12

I recommend:

CREATE TABLE ...
    ISO_639_3 CHAR(3) CHARACTER SET ascii

That will be 3 bytes, which is smaller than INT (4 bytes) and not much bigger than SMALLINT UNSIGNED (2 bytes).

(Am I correct in saying that the codes are always 3 ASCII letters? If so, there is no need for VARCHAR, which takes an extra byte or two for the length.)

CHAR(3) is readily indexable. There is no significant advantage in 'normalizing' even to smallint. This still applies even at the scale of a billion rows.
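To make that concrete, here is a minimal sketch of Option 1. The table name, column names, and index name are hypothetical and only for illustration; they are not from the question.

```sql
-- Hypothetical table and column names, for illustration only.
CREATE TABLE documents (
    id            BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    -- ISO 639-3 code such as 'eng' or 'fra'; 3 bytes per row with the ascii charset
    language_code CHAR(3) CHARACTER SET ascii NOT NULL,
    PRIMARY KEY (id),
    INDEX idx_language_code (language_code)
) ENGINE=InnoDB;

-- The code is readable directly in the row, and filtering needs no JOIN
-- to a separate languages lookup table.
SELECT COUNT(*) FROM documents WHERE language_code = 'eng';
```

With Option 2, the same query would either need a JOIN to the languages table or require the application to know the numeric ID in advance.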

And, as you point out, "easier to read" is worth something.

If you are also storing text, I assume that all such text can be mapped to UTF-8? If so, use

     my_text TEXT CHARACTER SET utf8mb4

In MySQL, there is no problem having different columns in a single table using different charsets (or collations).
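As a rough sketch of mixing charsets in one table (again with hypothetical table and column names, apart from my_text), the language code can stay ascii while the text column uses utf8mb4:

```sql
-- Hypothetical table; only the per-column CHARACTER SET clauses matter here.
CREATE TABLE passages (
    id            BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    language_code CHAR(3) CHARACTER SET ascii NOT NULL,  -- always 3 bytes
    my_text       TEXT CHARACTER SET utf8mb4,            -- up to 4 bytes per character
    PRIMARY KEY (id)
) ENGINE=InnoDB;

-- Check which charset and collation each column ended up with.
SELECT COLUMN_NAME, CHARACTER_SET_NAME, COLLATION_NAME
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'passages';
```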

Perhaps worth noting: the script, and hence many languages, can often be recognized from the leading bytes of the UTF-8 encoding, shown here in hex:

⚈  Cxyy -- More Western Europe: Latin (C3-CA), Combining Diacritical Marks (CC-CD), Greek (CE-CF)
⚈  Dxyy -- Cyrillic (D0-D4), Hebrew (D6-D7), Arabic/Persian/Farsi (D8-DB), etc
⚈  E0yyyy -- various Indian character sets, southern Asia, etc.
⚈  E1yyyy -- Cherokee, Balinese, Khmer, Mongolian, Vietnamese, etc.
(etc)

-- http://mysql.rjweb.org/doc.php/charcoll#diagnosing_charset_issues
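One way to look at those leading bytes is with HEX(). The sketch below assumes the connection charset is utf8mb4, and the passages table and its my_text column are hypothetical. It only inspects the first character of each row, so treat it as a diagnostic aid rather than real language detection.

```sql
-- Assuming a utf8mb4 connection, these return C3A9 and D0B4 respectively,
-- which fall in the Latin and Cyrillic ranges listed above.
SELECT HEX(CONVERT('é' USING utf8mb4)) AS latin_e_acute,
       HEX(CONVERT('д' USING utf8mb4)) AS cyrillic_de;

-- Bucket rows of a hypothetical table by the first byte of the text.
-- HEX() returns uppercase digits, so plain string comparison orders them correctly.
SELECT CASE
         WHEN LEFT(HEX(my_text), 2) BETWEEN 'C3' AND 'CA' THEN 'Latin'
         WHEN LEFT(HEX(my_text), 2) BETWEEN 'CE' AND 'CF' THEN 'Greek'
         WHEN LEFT(HEX(my_text), 2) BETWEEN 'D0' AND 'D4' THEN 'Cyrillic'
         WHEN LEFT(HEX(my_text), 2) BETWEEN 'D6' AND 'D7' THEN 'Hebrew'
         WHEN LEFT(HEX(my_text), 2) BETWEEN 'D8' AND 'DB' THEN 'Arabic/Persian'
         ELSE 'other'
       END AS script_guess,
       COUNT(*) AS row_count
FROM passages
GROUP BY script_guess;
```

Text that starts with a plain ASCII character (a digit, a quote, an unaccented Latin letter) lands in 'other', so a stored language_code column is still the reliable source.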
