Can't figure out output character encoding for MeCab

485 Views Asked by e-e At 07 July 2020 at 06:38

I'm trying to parse some Japanese text, and I can't seem to figure out the output encoding.

This is the output I'm getting:

これは ̾��,����,*,*,*,*,*
本   ̾��,����,*,*,*,*,*
です  ̾��,����,*,*,*,*,*
。   ̾��,������³,*,*,*,*,*
EOS

Steps I took:

git clone https://github.com/taku910/mecab
cd mecab/mecab
./configure --enable-utf8-only --with-charset=utf8
make
sudo make install
mecab -o ~/Desktop/output.txt ~/Desktop/input.txt

Here input.txt contains "これは本です。"

I'm using macOS 10.15.3.
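
One way to narrow this down might be to check, field by field, which encodings the raw bytes of the output actually decode under. In the output above the surface forms (これは, 本, です) display correctly while the feature columns do not, which suggests the two parts of each line may be in different encodings. The following is only a rough sketch: the helper names `decodable_as` and `report` are made up for illustration, the candidate list is an assumption (UTF-8 and EUC-JP only), and the sample line is built by hand rather than read from the real output.txt.

```python
# Sketch: report which candidate encodings each part of a MeCab output
# line decodes under. Helper names and candidate list are illustrative.

CANDIDATES = ("utf-8", "euc_jp")  # assumption: only these two are checked


def decodable_as(raw, candidates=CANDIDATES):
    """Return the candidate encodings under which `raw` decodes without error."""
    ok = []
    for enc in candidates:
        try:
            raw.decode(enc)  # strict decoding raises on invalid byte sequences
        except UnicodeDecodeError:
            continue
        ok.append(enc)
    return ok


def report(line):
    """Split one output line at the first tab and check each side separately."""
    surface, _, features = line.rstrip(b"\n").partition(b"\t")
    return {
        "surface": decodable_as(surface),
        "features": decodable_as(features),
    }


if __name__ == "__main__":
    # Hand-built sample (not real MeCab output): a UTF-8 surface form
    # followed by feature text encoded as EUC-JP.
    sample = "本".encode("utf-8") + b"\t" + "名詞,一般,*,*,*,*,*".encode("euc_jp")
    print(report(sample))
```

To apply this to the real file, each line of output.txt would be read in binary mode (`open(path, "rb")`) and passed to `report`. If the feature fields turn out to decode only as EUC-JP, that would point to the installed dictionary's charset rather than to the mecab build options; running `mecab -D` should print the dictionary information, including its charset, which could confirm or rule this out.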
