UnicodeEncodeError 'charmap' codec can't encode characters in position 1-12
I get this error when I try to paste a string in the Myanmar language into a Jinja2 template and save the template. I installed all the needed fonts in the OS and tried using the codecs module. The process: a Python script parses a CSV file with data and builds a dictionary, and this dictionary is then used to fill the variables in the Jinja2 template with values. The error is raised at the moment of writing to the file. I am using Python 3.4. There is a package called python-myanmar, but it is for 2.7 and I do not want to downgrade my own code.
I have already read all of this: http://www.unicode.org/notes/tn11/, http://chimera.labs.oreilly.com/books/1230000000393/ch02.html#_discussion_31, and the https://code.google.com/p/python-myanmar/ package, and I installed the system fonts. I can encode the string with .encode('utf-8'), but then I can't .decode() it without the error! The question is: how can I write the data to the file without downgrading the code? Installing something additional would be acceptable, but ideally I would use only the built-in functions of Python 3.4.
C:\Users\...\autocrm.py in create_templates(csvfile_location, csv_delimiter, template_location, countries_to_update, push_onthefly, csv_gspreadsheet, **kwargs)
270 ### use different parsers for ventures due to possible difference in website design
271 ### checks if there is a link in CSV/TSV
--> 272 if variables['promo_link'] != '':
273 article_values = soup_the_newsletter_article(variables['promo_link'])
274 if variables['item1_link'] != '':
C:\Users\...\autocrm.py in push_to_ums(countries_to_update, html_template, **kwargs)
471 ### save to import.xml
472 with open(xml_path_upload, 'w') as writefile:
--> 473 writefile.write(template.render(**values))
474 print('saved the import.xml')
475
C:\Python34\lib\encodings\cp1252.py in encode(self, input, final)
17 class IncrementalEncoder(codecs.IncrementalEncoder):
18 def encode(self, input, final=False):
---> 19 return codecs.charmap_encode(input,self.errors,encoding_table)[0]
20
21 class IncrementalDecoder(codecs.IncrementalDecoder):
UnicodeEncodeError: 'charmap' codec can't encode characters in position 6761-6772: character maps to <undefined>
BTW, why is it pointing to cp1252.py if my sys.getdefaultencoding() output is UTF8??
with open(template_location, 'r') as raw_html:
    template = Template(raw_html.read())
print('writing to template: ' + variables['country_id'])
# import ipdb;ipdb.set_trace()
with open('rendered_templates_L\\NL_' +
          variables['country_id'] + ".html", 'w', encoding='utf-8') as writefile:
    rendered_template = template.render(**alldata)
    writefile.write(rendered_template)
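You opened the output file without specifying an encoding, so the default system encoding is used; here CP1252. In text mode, open() takes that default from locale.getpreferredencoding(), not from sys.getdefaultencoding(), which is why the traceback points at cp1252.py even though sys.getdefaultencoding() reports UTF-8.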
The Jinja template result produces a Unicode string, which needs to be encoded, but the default system encoding doesn't support the codepoints produced.
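To see which codec open() would fall back to on a given machine, a quick check along these lines may help. This is only a diagnostic sketch; the variable names are made up for illustration, and the values printed depend on the OS and locale settings.

```python
import locale
import sys

# Always 'utf-8' on Python 3; it governs implicit str/bytes conversions,
# not the encoding of files opened in text mode.
str_default = sys.getdefaultencoding()

# What open() falls back to in text mode when no encoding is passed
# (the behaviour on the Python 3.4 setup described in the question).
file_default = locale.getpreferredencoding(False)

print('sys.getdefaultencoding():', str_default)
print('locale.getpreferredencoding(False):', file_default)
```

On a Western European Windows install the second value is typically cp1252, which has no mapping for Myanmar characters.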
The solution is to pick an explicit codec. If you are producing XML, UTF-8 is the default encoding and can handle all of Unicode.
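A minimal sketch of that fix, assuming the rendered template is an ordinary str: the `rendered` string below is a stand-in for `template.render(**values)`, and `out_path` is a hypothetical temporary path used in place of `xml_path_upload` so the example runs on its own.

```python
import os
import tempfile

# Stand-in for template.render(**values): a str containing Myanmar text.
rendered = ('<?xml version="1.0" encoding="UTF-8"?>\n'
            '<title>\u1019\u103c\u1014\u103a\u1019\u102c</title>\n')

# cp1252 cannot represent these code points, which is the failure
# shown in the traceback.
try:
    rendered.encode('cp1252')
    cp1252_failed = False
except UnicodeEncodeError:
    cp1252_failed = True

# Hypothetical output location; the original code writes to xml_path_upload.
out_path = os.path.join(tempfile.mkdtemp(), 'import.xml')

# Passing encoding explicitly overrides the locale default.
with open(out_path, 'w', encoding='utf-8') as writefile:
    writefile.write(rendered)

# Read back with the same codec to confirm the text survives the round trip.
with open(out_path, 'r', encoding='utf-8') as readfile:
    round_tripped = readfile.read()
```

In the code from the question, the equivalent change is adding encoding='utf-8' to the open(xml_path_upload, 'w') call in push_to_ums, the same way it is already done for the .html output.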