I have a MySQL DB that needs to be fast at scale.
Option 1 Tables can store the language ISO 639-3 code as a column: varchar(3) language
Option 2 Tables can store the ID for the language as a column: int(2?) language_id, and there can be a languages table with the ISO 639-3 code.
Question What makes sense for speed at scale? Option 1 is easier to read in the DB. I'd prefer it if speed is the same or completely negligible even at scale.
Thanks!
I recommend:
That will be 3 bytes, which is smaller than
INT(4 bytes)and not much bigger thanSMALLINT UNSIGNED` (2 bytes).(Am I correct in saying that the codes are always 3 ascii letters? Hence no need for
VAR, which takes an extra byte or two.)CHAR(3)is readily indexable. There is no significant advantage in 'normalizing' even to smallint. This still applies even at the scale of a billion rows.And, as you point out, "easier to read" is worth something.
If you are also storing text, I assume that all such text can be mapped to UTF-8? If so, use
In MySQL, there is no problem having different columns in a single table using different charsets (or collations).
Perhaps worth noting... Many languages can be discovered from the hex utf-8 encoding:
-- http://mysql.rjweb.org/doc.php/charcoll#diagnosing_charset_issues