In one of my Python modules, a name string contains non-ASCII characters. When I log the object, Python raises UnicodeDecodeError. For example:
# coding: UTF-8
import logging

root_logger = logging.getLogger()
root_logger.setLevel(logging.DEBUG) # or whatever
handler = logging.FileHandler('example.log', 'w', 'utf-8') # or whatever
formatter = logging.Formatter('%(name)s %(message)s') # or whatever
handler.setFormatter(formatter) # Pass handler as a parameter, not assign
root_logger.addHandler(handler)

class C(object):
    def __init__(self, name):
        self._name = name
    def __str__(self):
        print("__str__ start")
        return self.to_unicode().encode("utf-8")
    def __repr__(self):
        print("__repr__ start")
        return self.to_unicode().encode("utf-8")
    def to_unicode(self):
        print("to_unicode start")
        return u"name:{}".format(self._name)

obj = C(name="vm_nearsync_한국")
logging.debug(u"obj:{}".format(obj))
It produces the following output and error:
__str__ start
to_unicode start
Traceback (most recent call last):
  File "test.py", line 31, in <module>
    logging.debug(u"obj:{}".format(obj))
  File "test.py", line 19, in __str__
    return self.to_unicode().encode("utf-8")
  File "test.py", line 27, in to_unicode
    return u"name:{}".format(self._name)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xed in position 12: ordinal not in range(128)
Python is implicitly decoding the byte string name to unicode with the default ascii codec, whereas I expected it to use utf-8.
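The implicit step can be reproduced in isolation. The sketch below is only an illustration, not part of the module: it decodes the UTF-8 bytes of the name with the ascii codec explicitly, which is what Python 2 does behind the scenes when a byte string is formatted into a unicode string. The variable names `raw_name` and `error` are made up for the example.

```python
# -*- coding: utf-8 -*-

# UTF-8 encoded bytes of "vm_nearsync_한국" (what a plain str literal holds
# in a UTF-8 encoded Python 2 source file).
raw_name = b"vm_nearsync_\xed\x95\x9c\xea\xb5\xad"

error = None
try:
    # Explicit version of the implicit conversion that fails in to_unicode().
    raw_name.decode("ascii")
except UnicodeDecodeError as exc:
    error = exc

# The first non-ASCII byte sits right after the 12 ASCII characters.
print(error.encoding, error.start)
```

The failing position matches the one reported in the traceback: the first byte of the Korean text, immediately after `vm_nearsync_`.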
It only works when I change the system default encoding as below:
import sys
reload(sys)
sys.setdefaultencoding("utf-8")
My Python version is 2.7.5.
Is there any way to work around this without changing the system default encoding? The object can hold many such strings, and the software logs data in many places.
What @deceze said in the comments is correct: unicode and byte strings should not be mixed. If mixing really is unavoidable, specify the encoding explicitly instead of relying on the system default encoding.
The approaches below work.
Approach 1 (use unicode everywhere):
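A minimal sketch of this approach, assuming the name is passed in as a unicode literal and the class gains a `__unicode__` method (an addition that is not in the question's code). The `PY2` flag is a helper introduced here only so the same snippet also runs on Python 3:

```python
# -*- coding: utf-8 -*-
import sys

PY2 = sys.version_info[0] == 2  # helper flag, not part of the original code


class C(object):
    def __init__(self, name):
        # name is expected to be a unicode string, never a byte string.
        self._name = name

    def to_unicode(self):
        # unicode template + unicode argument: no implicit decoding happens.
        return u"name:{}".format(self._name)

    def __unicode__(self):
        # Used on Python 2 by unicode(obj) and by u"{}".format(obj).
        return self.to_unicode()

    def __str__(self):
        text = self.to_unicode()
        # Python 2 expects bytes from __str__; Python 3 expects text.
        return text.encode("utf-8") if PY2 else text


obj = C(name=u"vm_nearsync_한국")   # note the u prefix
message = u"obj:{}".format(obj)     # stays unicode end to end
```

`message` can then be handed to `logging.debug` as a unicode string, and the `FileHandler` opened with `'utf-8'` takes care of encoding it on the way out.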
Approach 2 (explicitly decode the byte string to unicode with utf-8):
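One way this could look, as a sketch rather than a definitive fix: a small helper, here called `to_text` (a hypothetical name, not from the original code), decodes byte strings with an explicit encoding before they are formatted into unicode. It assumes the incoming bytes really are UTF-8.

```python
# -*- coding: utf-8 -*-


def to_text(value, encoding="utf-8"):
    """Return value as unicode, decoding byte strings explicitly."""
    if isinstance(value, bytes):  # bytes is an alias of str on Python 2
        return value.decode(encoding)
    return value


# UTF-8 bytes of "vm_nearsync_한국", as a plain Python 2 str would hold them.
raw_name = b"vm_nearsync_\xed\x95\x9c\xea\xb5\xad"

# Decode once, explicitly, then format unicode into unicode.
message = u"name:{}".format(to_text(raw_name))
```

The same call could be placed in `__init__` (so `self._name` is always unicode) or inside `to_unicode()`. Decoding once at the boundary usually keeps the rest of the code free of mixed types.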