How to interpret MeCab UniDic CSV columns


Here are some sample entries from MeCab with UniDic:

ネコ - 名詞,普通名詞,一般,,,,ネコ,猫,ネコ,ネコ,ネコ,ネコ,和,,,,,,,体,ネコ,ネコ,ネコ,ネコ,1,C4,,7918141644612096,28806

が - 助詞,格助詞,,,,,ガ,が,が,ガ,が,ガ,和,,,,,,,格助,ガ,ガ,ガ,ガ,,動詞%F2@0,名詞%F1,,2168520431510016,7889

蚊 - 名詞,普通名詞,一般,,,,カ,蚊,蚊,カ,蚊,カ,和,,,,,,,体,カ,カ,カ,カ,0,C4,,1536851034907136,5591

を - 助詞,格助詞,,,,,ヲ,を,を,オ,を,オ,和,,,,,,,格助,ヲ,ヲ,ヲ,ヲ,,動詞%F2@0,名詞%F1,形容詞%F2@-1,,11381878116459008,41407

As you can see, there are 30 CSV columns in those UniDic entries. What do they all represent?

Best answer, by polm23

You can see a list of the Japanese names of all columns at the UniDic FAQ. Most of the columns are pretty obvious once you see the name.

The UniDic Manual explains all the fields in more detail, though some of them, mainly the *ConType and *ModType fields, are pretty complicated. These fields are mostly related to the pronunciation of compound words.
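To make the column names concrete, here is a minimal Python sketch that pairs each value in a feature string with a field name. The name list is an assumption: it is the 29-field layout I believe recent UniDic releases (such as 2.3.0) use, and other versions have a different number of columns, so check it against the FAQ or your dictionary's dicrc. `UNIDIC_FIELDS` and `parse_features` are hypothetical names made up for this example, not part of MeCab.

```python
import csv

# ASSUMPTION: 29-field layout of recent UniDic releases; verify against
# the UniDic FAQ / your dicrc, since other versions differ.
UNIDIC_FIELDS = [
    "pos1", "pos2", "pos3", "pos4",   # part of speech, four levels
    "cType", "cForm",                 # conjugation type and form
    "lForm", "lemma",                 # lemma reading and lemma
    "orth", "pron",                   # written form and pronunciation
    "orthBase", "pronBase",           # base (dictionary) forms of the above
    "goshu",                          # word origin, e.g. 和 for native Japanese
    "iType", "iForm", "fType", "fForm",
    "iConType", "fConType",
    "type", "kana", "kanaBase", "form", "formBase",
    "aType", "aConType", "aModType",  # accent-related fields
    "lid", "lemma_id",
]


def parse_features(feature_csv):
    """Map one feature string to a {field name: value} dict."""
    # csv.reader copes with quoted fields that contain commas themselves,
    # which a plain str.split(",") would break apart.
    values = next(csv.reader([feature_csv]))
    if len(values) != len(UNIDIC_FIELDS):
        raise ValueError(
            f"expected {len(UNIDIC_FIELDS)} fields, got {len(values)}"
        )
    return dict(zip(UNIDIC_FIELDS, values))


# The ネコ entry from the question (everything after the surface form).
neko = ("名詞,普通名詞,一般,,,,ネコ,猫,ネコ,ネコ,ネコ,ネコ,和,,,,,,,体,"
        "ネコ,ネコ,ネコ,ネコ,1,C4,,7918141644612096,28806")

fields = parse_features(neko)
for name in ("pos1", "lemma", "goshu", "aType", "aConType"):
    print(name, "=", fields[name])
```

Note that a value such as 動詞%F2@0,名詞%F1 in aConType contains a comma itself. As far as I know MeCab quotes such fields in its raw output, so parsing with a CSV reader keeps them in one piece. The quotes seem to be missing from the samples in the question, which is likely why a naive comma split gives more than 29 pieces for が and を.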