Given the following Python script, test.py:
# -*- coding: utf-8 -*-
print("Er, Süleyman")
Here is the result of running this script in PowerShell:
python test.py # success
Er, Süleyman
python test.py > tmp.txt
print("Er, Süleyman")
UnicodeEncodeError: 'gbk' codec can't encode character '\u0308' in position 6: illegal multibyte sequence
python test.py | Out-File -Encoding utf8 tmp.txt
print("Er, Süleyman")
UnicodeEncodeError: 'gbk' codec can't encode character '\u0308' in position 6: illegal multibyte sequence
I have no idea how to work around this.
The default language of my laptop is Chinese. Running the following code prints cp936:
import locale
print(locale.getpreferredencoding())
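To see why the plain console run succeeds while the redirected run fails, it may help to print what Python thinks stdout is in each case. The following is a minimal sketch; the helper name `describe_stdout` is made up for illustration. As far as I understand, on Windows with Python 3.6+ a console stdout is handled as UTF-8, while a redirected or piped stdout falls back to the locale encoding (cp936 here), which would explain the 'gbk' in the traceback.

```python
import locale
import sys


def describe_stdout():
    """Collect the encoding-related facts about the current stdout.

    Hypothetical helper, not part of any library.
    """
    return {
        # Encoding Python will use when print() writes to stdout.
        "stdout_encoding": getattr(sys.stdout, "encoding", None),
        # True when attached to a console, False when redirected or piped.
        "is_console": sys.stdout.isatty(),
        # The locale default, which redirected stdout usually inherits.
        "locale_encoding": locale.getpreferredencoding(False),
    }


if __name__ == "__main__":
    # Write to stderr so the report stays visible even when stdout is redirected.
    for key, value in describe_stdout().items():
        print(f"{key}: {value}", file=sys.stderr)
```

Running it once as `python check.py` and once as `python check.py > tmp.txt` (the file name is just an example) should show whether `stdout_encoding` changes between the two cases.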
I also tried writing to the buffer directly. The error is gone, but the content is incorrect.
# -*- coding: utf-8 -*-
import sys
sys.stdout.buffer.write("Er, Süleyman".encode('utf-8'))
python test.py > test.txt
cat test.txt
Er, Su虉leyman
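The garbled result looks like the UTF-8 bytes were decoded again with the console code page somewhere between Python and the file. This is my assumption about what PowerShell does with the output of an external program, not something I have confirmed. The sketch below reproduces that kind of mis-decoding in pure Python; `misdecode` is a hypothetical helper name.

```python
def misdecode(text, wrong_encoding="gbk"):
    """Encode text as UTF-8, then decode the bytes with the wrong codec.

    This imitates a reader that assumes the legacy code page.
    """
    return text.encode("utf-8").decode(wrong_encoding)


# 'u' followed by U+0308 (combining diaeresis), as in the script above.
original = "Er, Su\u0308leyman"
garbled = misdecode(original)

# ascii() keeps the output printable regardless of the stdout encoding.
print(ascii(original))
print(ascii(garbled))
```

The two bytes of U+0308 in UTF-8 get read as a single double-byte GBK character, which can be compared with the `cat` output above.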
Update
I found a way to change the default encoding of PowerShell in this answer: Using UTF-8 Encoding (CHCP 65001) in Command Prompt / Windows Powershell (Windows 10)
In PSv5.1 or higher, where > and >> are effectively aliases of Out-File, you can set the default encoding for > / >> / Out-File via the $PSDefaultParameterValues preference variable:
$PSDefaultParameterValues['Out-File:Encoding'] = 'utf8'
Although this is not a Python problem, without knowing how the default encodings of Python's stdout, Windows, and PowerShell differ, it is hard to find the right solution. For example, "Setting the correct encoding when piping stdout in Python" tries to fix the issue from the Python side, but that introduces a new problem: forcing stdout to use the gbk encoding means that some unsupported characters are displayed incorrectly. So I think this question still has value for those who are not familiar with such a complicated situation.
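The point about forcing a legacy encoding can be shown with a small sketch. It only uses `str.encode` with the gbk codec; the variable names are mine. In strict mode the combining diaeresis (U+0308) raises the same error as in the traceback above, and with `errors="replace"` the error goes away but the character is lost.

```python
text = "Er, Su\u0308leyman"  # 'u' + combining diaeresis, as in test.py

# Strict mode: gbk has no mapping for U+0308, so encoding fails.
try:
    text.encode("gbk")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# Replace mode: no error, but the unsupported character becomes '?'.
lossy = text.encode("gbk", errors="replace")

print(strict_failed)
print(lossy)  # b'Er, Su?leyman'
```

So suppressing the error on the Python side hides the problem instead of solving it, while UTF-8 can represent the whole string.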