Python 2.7 - how do you write MeCab parsed information to a text file?


I've written a GUI that accepts Japanese input; when you choose File > Parse, it writes the input to a text file. That file is then run through MeCab, which puts spaces between the words. The result is then supposed to be written back to the same text file so it can be displayed in another GUI window.

The problem is that the parsed data won't write to the text file. Writing the original input the first time works fine, and the parsed output also prints to IDLE without any problem. Here is the parser and the error:

```python
#!/usr/bin/python
# -*- coding: utf-8 -*-
import sys

import MeCab
import codecs

read_from = open("pholder.txt").read()
mecab = MeCab.Tagger("-Owakati")
output = mecab.parse(read_from)
print output


text = output
write_to = codecs.open("pholder.txt", "w", "utf-8")
write_to.write(text)
write_to.close()
```

```
Traceback (most recent call last):
  File "C:\...\mecabSpaces.py", line 16, in <module>
    write_to.write(text)
  File "C:\...\codecs.py", line 691, in write
    return self.writer.write(data)
  File "C:\...\codecs.py", line 351, in write
    data, consumed = self.encode(object, self.errors)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 0: ordinal not in range(128)
```

Best answer, by agf

The parsed data isn't unicode; it's a byte string (a Python 2 str).

So when you write it through the codecs writer, Python first implicitly decodes it to unicode using the default codec, and only then encodes it to UTF-8. Your default codec is ASCII, but the data is actually UTF-8, so the decode fails on the first byte with a value of 128 or above (here 0xe4, at position 0).
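To see the failure in isolation, without MeCab or the codecs layer, here is a minimal sketch assuming MeCab hands back UTF-8 bytes as described above. The sample sentence is hypothetical, and ascii_decode_error is an illustrative helper, not part of any library. It decodes the bytes as ASCII, which is the step Python 2 performs implicitly when a byte string reaches a writer that expects unicode:

```python
# -*- coding: utf-8 -*-
from __future__ import print_function


def ascii_decode_error(data):
    """Return the error an ASCII decode of `data` raises, or None if it succeeds."""
    try:
        data.decode("ascii")
    except UnicodeDecodeError as exc:
        return exc
    return None


# Hypothetical stand-in for mecab.parse() output: UTF-8 encoded bytes.
# U+4ECA (今) encodes as E4 BB 8A, so the very first byte is 0xe4.
parsed = u"今日 は 晴れ\n".encode("utf-8")

err = ascii_decode_error(parsed)
print(err)
```

It prints the same message as the traceback: 'ascii' codec can't decode byte 0xe4 in position 0: ordinal not in range(128).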

You should call .decode('utf-8') on the returned data, or else use a MeCab method that returns unicode data.
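A minimal sketch of the first option, assuming parse() returns UTF-8 bytes as above. write_parsed is a hypothetical helper name, and the sample sentence stands in for real MeCab output. Decoding once, explicitly, and writing through io.open (available in Python 2.6+ and in Python 3) keeps the data as unicode all the way to the file, so no implicit ASCII decode ever happens:

```python
# -*- coding: utf-8 -*-
from __future__ import print_function
import io


def write_parsed(parsed, path):
    """Decode UTF-8 bytes explicitly, then write them to `path` as UTF-8 text."""
    text = parsed.decode("utf-8")  # bytes -> unicode, done once and on purpose
    with io.open(path, "w", encoding="utf-8") as f:
        f.write(text)  # io.open's text mode accepts unicode and encodes it
    return text


# Hypothetical stand-in for mecab.parse(read_from) under Python 2.
parsed = u"今日 は 晴れ\n".encode("utf-8")
write_parsed(parsed, "pholder.txt")
```

In the original script the same idea is a one-line change: output = mecab.parse(read_from).decode('utf-8') before handing the text to the codecs writer.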

Answer by jeffberhow

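Here's working code: plain open() reads and writes byte strings, so MeCab's UTF-8 output goes back into the file without any decode or encode step. Thanks to agf for helping me pull my head out of my butt.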

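```python
#!/usr/bin/python
# -*- coding: utf-8 -*-
import MeCab

# In Python 2, open() in text mode returns a byte string (str)
with open("pholder.txt", "r") as f:
    read_from = f.read()

mecab = MeCab.Tagger("-Owakati")
output = mecab.parse(read_from)  # also a UTF-8 byte string
print output

# Writing bytes through plain open() involves no implicit ASCII decode
with open("pholder.txt", "w") as f:
    f.write(output)
```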
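The version above works because the data never stops being bytes: open() reads a byte string, MeCab returns one, and open() writes it back untouched. A sketch that makes this explicit with binary mode, assuming the tokenizer takes and returns UTF-8 bytes. respace_file and fake_wakati are hypothetical names; fake_wakati only adds spaces around は, so it is a stand-in for MeCab.Tagger("-Owakati").parse, not a real tokenizer:

```python
# -*- coding: utf-8 -*-
from __future__ import print_function


def respace_file(path, tokenize):
    """Read raw bytes from `path`, run `tokenize` on them, write the result back.

    `tokenize` stands in for MeCab.Tagger("-Owakati").parse and is assumed
    to take and return UTF-8 encoded bytes.
    """
    with open(path, "rb") as f:  # binary mode: bytes in both Python 2 and 3
        data = f.read()
    result = tokenize(data)
    with open(path, "wb") as f:  # no encoding step, no newline translation
        f.write(result)
    return result


def fake_wakati(data):
    """Hypothetical tokenizer: only puts spaces around the particle は."""
    return data.decode("utf-8").replace(u"は", u" は ").encode("utf-8")


# Seed the file with unspaced input, as the GUI would.
with open("pholder.txt", "wb") as f:
    f.write(u"今日は晴れ".encode("utf-8"))

result = respace_file("pholder.txt", fake_wakati)
```

Binary mode also skips Windows newline translation, so the file receives exactly the bytes the tokenizer returned.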