Incorrect encoding of Cyrillic symbols in Pydantic json() method (Python)


A simple example of the problem:

from pydantic import BaseModel

class City(BaseModel):
    name: str

city = City(name="Город")
print(city)  # name='Город'
print(city.json())  # {"name": "\u0413\u043e\u0440\u043e\u0434"}

My system info:

  • Windows 11
  • Python 3.11.3
  • Pydantic 1.10.7
  • The .py file encoding is UTF-8

The problem remains with every chcp option (console code page) I tried: 866, 1251 and 65001. If I write the json() output to a .txt file, the file contains the same \u0413\u043e\u0440\u043e\u0434. I would really appreciate help fixing the root problem: I want this code to output plain JSON with readable Cyrillic characters.

I've tried:

  • Changing the chcp option
  • Changing the Windows language settings
  • Changing the .py file encoding
  • Reinstalling Python
Accepted answer by Nick ODell

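By default, Python's json module escapes every non-ASCII character (json.dumps has ensure_ascii=True), so the output stays within ASCII, which contains no Cyrillic characters. The \u0413\u043e\u0440\u043e\u0434 sequence is therefore not a broken encoding: it is a valid JSON escape, and any JSON parser decodes it back to "Город".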
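Pydantic v1's .json() serializes through its config's json_dumps, which defaults to the standard json.dumps, so the behavior can be reproduced with the standard library alone. A minimal sketch (the dict stands in for the model's data):

```python
import json

data = {"name": "Город"}

# json.dumps defaults to ensure_ascii=True, so every non-ASCII
# character is written as a \uXXXX escape sequence
escaped = json.dumps(data)
print(escaped)  # {"name": "\u0413\u043e\u0440\u043e\u0434"}

# The result contains only ASCII characters...
assert escaped.isascii()

# ...but it is still valid JSON and decodes back to the original text
assert json.loads(escaped)["name"] == "Город"
```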

You can turn off this setting with ensure_ascii=False:

print(city.json(ensure_ascii=False))

Output:

{"name": "Город"}

Note that consumers that don't expect UTF-8 might not read this output correctly, whereas the escaped ASCII form is safe for all of them.
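The question also mentions writing the output to a text file. A minimal sketch using the standard library, assuming UTF-8 is the desired file encoding; the city.json name and the temporary directory are illustrative only. The encoding is passed explicitly because on Windows, Python 3.11 otherwise opens text files with the locale's code page (e.g. cp1251) unless UTF-8 mode is enabled:

```python
import json
import tempfile
from pathlib import Path

data = {"name": "Город"}
text = json.dumps(data, ensure_ascii=False)

# Hypothetical output location; a temp dir keeps the sketch self-contained
path = Path(tempfile.mkdtemp()) / "city.json"

# Pass the encoding explicitly instead of relying on the platform default
path.write_text(text, encoding="utf-8")

# The file holds the raw UTF-8 bytes of the Cyrillic letters, not \u escapes
print(path.read_bytes())

# Reading back with the same encoding restores the original data
assert json.loads(path.read_text(encoding="utf-8")) == data
```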

If you need the output in code page 866 instead of UTF-8, encode the string from Python's str type into a bytes type:

city.json(ensure_ascii=False).encode('cp866')

Note that cp866 stands for Code Page 866.
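A short sketch of that round trip using the standard library's cp866 codec, with json.dumps standing in for city.json(). It also shows that characters outside code page 866, such as the euro sign chosen here for illustration, cannot be encoded with the default strict error handling:

```python
import json

text = json.dumps({"name": "Город"}, ensure_ascii=False)

# str -> bytes in code page 866 (one byte per Cyrillic letter)
encoded = text.encode("cp866")
print(encoded)

# Decoding with the same code page restores the original text
assert encoded.decode("cp866") == text

# Characters that cp866 has no slot for raise UnicodeEncodeError
try:
    "€".encode("cp866")
except UnicodeEncodeError as exc:
    print("not representable in cp866:", exc.reason)
```

Unlike UTF-8, a single-byte code page can only represent its own fixed set of characters, so UTF-8 remains the safer choice when the data may contain arbitrary text.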