Python, Unicode, and the Windows console
--
Chapters
00:00 Question
01:09 Accepted answer (Score 38)
02:07 Answer 2 (Score 91)
04:30 Answer 3 (Score 31)
06:04 Answer 4 (Score 11)
06:45 Thank you
--
Full question
https://stackoverflow.com/questions/5419...
Question links:
[@JFSebastian answer]: https://stackoverflow.com/a/32176732/610...
Accepted answer links:
[PrintFails - Python Wiki]: http://wiki.python.org/moin/PrintFails
Answer 2 links:
[Python 3.6]: https://docs.python.org/3.6/whatsnew/3.6...
[PEP 528: Change Windows console encoding to UTF-8]: https://www.python.org/dev/peps/pep-0528/
[the ]: https://github.com/Drekin/win-unicode-co...
[@Daira Hopwood's answer]: https://stackoverflow.com/a/4637795/4279
[win-unicode-console]: https://github.com/Drekin/win-unicode-co...
[What's the deal with Python 3.4, Unicode, different languages and Windows?]: https://stackoverflow.com/a/30551552/427...
[PYTHONIOENCODING]: https://docs.python.org/3/using/cmdline....
Answer 3 links:
[since December 2021]: https://endoflife.date/python
https://github.com/Drekin/win-unicode-co...
[previously linked here]: https://stackoverflow.com/a/3259271/3931...
[did not work prior to Python 3.8]: https://github.com/python/cpython/issues...
[not a good idea]: https://stackoverflow.com/questions/3578...
--
Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...
--
Tags
#python #unicode
ACCEPTED ANSWER
Score 38
Note: This answer is somewhat outdated (it dates from 2008). Use the solution below with care.
The PrintFails page on the Python Wiki (http://wiki.python.org/moin/PrintFails) details the problem and a solution (search the page for the text "Wrapping sys.stdout into an instance").
Here's a code excerpt from that page (Python 2 syntax):
$ python -c 'import sys, codecs, locale; print sys.stdout.encoding; \
sys.stdout = codecs.getwriter(locale.getpreferredencoding())(sys.stdout); \
line = u"\u0411\n"; print type(line), len(line); \
sys.stdout.write(line); print line'
UTF-8
<type 'unicode'> 2
Б
Б
$ python -c 'import sys, codecs, locale; print sys.stdout.encoding; \
sys.stdout = codecs.getwriter(locale.getpreferredencoding())(sys.stdout); \
line = u"\u0411\n"; print type(line), len(line); \
sys.stdout.write(line); print line' | cat
None
<type 'unicode'> 2
Б
Б
There's some more information on that page, well worth a read.
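For readers on Python 3, where the excerpt above no longer runs as written, the same wrapping idea can be sketched against an in-memory byte stream. This is a minimal sketch, not the wiki's code: wrap_bytes_stream is a hypothetical helper name, and io.BytesIO stands in for a real byte stream such as sys.stdout.buffer.

```python
import codecs
import io
import locale


def wrap_bytes_stream(raw, encoding=None):
    """Return a text writer that encodes str data before writing to `raw`.

    `raw` must accept bytes (for example sys.stdout.buffer).
    Falls back to the locale's preferred encoding, as the wiki excerpt does.
    """
    if encoding is None:
        encoding = locale.getpreferredencoding()
    return codecs.getwriter(encoding)(raw)


# Demonstration on an in-memory buffer so the encoded bytes can be inspected.
buffer = io.BytesIO()
writer = wrap_bytes_stream(buffer, encoding="utf-8")
line = "\u0411\n"  # CYRILLIC CAPITAL LETTER BE plus a newline
writer.write(line)
print(len(line), buffer.getvalue())  # 2 b'\xd0\x91\n'
```

The writer encodes each string as it is written, so the choice of encoding is explicit at the point where text leaves the program.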
ANSWER 2
Score 31
Update: On Python 3.6 or later, printing Unicode strings to the console on Windows just works.
So upgrade to a recent Python and you're done. At this point I recommend using 2to3 to update your code to Python 3.x if needed, and dropping support for Python 2.x. Note that no version of Python before 3.7 (including Python 2.7) has had security support since December 2021.
If you really still need to support earlier versions of Python (including Python 2.7), you can use https://github.com/Drekin/win-unicode-console, which is based on, and uses the same APIs as, the code in the answer that was previously linked here. (That link does include some information on Windows font configuration, but I doubt it still applies to Windows 8 or later.)
Note: despite other plausible-sounding answers that suggest changing the code page to 65001, that did not work prior to Python 3.8. (It does kind of work since then, but as pointed out above, you don't need to do so for Python 3.6+ anyway.) Also, changing the default encoding using sys.setdefaultencoding is (still) not a good idea.
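The advice above could be applied in a script roughly as follows. This is a sketch under assumptions: enable_legacy_console_unicode is a hypothetical helper name, the win_unicode_console package is assumed to be installed from PyPI on the legacy interpreter, and enable() is the entry point described in that project's README.

```python
from __future__ import print_function

import sys


def enable_legacy_console_unicode():
    """Enable win-unicode-console only where it is needed.

    Returns True if the package was enabled, False otherwise.
    """
    # Python 3.6+ already talks to the Windows console in Unicode (PEP 528),
    # and other platforms do not need the package at all.
    if sys.platform != "win32" or sys.version_info >= (3, 6):
        return False
    try:
        import win_unicode_console  # third-party; assumed to be installed
    except ImportError:
        return False
    win_unicode_console.enable()
    return True


enabled = enable_legacy_console_unicode()
print("win-unicode-console enabled:", enabled)
```

On Python 3.6 or later the function does nothing and returns False, which matches the point that no workaround is needed there.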
ANSWER 3
Score 11
If you're not interested in getting a reliable representation of the bad character(s), you might use something like this (works with Python 2.6 and later, including 3.x):
from __future__ import print_function
import sys

def safeprint(s):
    try:
        print(s)
    except UnicodeEncodeError:
        if sys.version_info >= (3,):
            print(s.encode('utf8').decode(sys.stdout.encoding))
        else:
            print(s.encode('utf8'))

safeprint(u"\N{EM DASH}")
The bad character(s) in the string will be converted into a representation that the Windows console can print (in effect, their UTF-8 bytes shown in the console's own encoding).
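If a readable fallback is preferred over reinterpreted bytes, one alternative (not part of this answer) is the standard backslashreplace error handler. A minimal sketch for Python 3, where to_printable is a hypothetical helper name:

```python
import sys


def to_printable(s, encoding=None):
    """Replace characters that `encoding` cannot represent with backslash escapes."""
    if encoding is None:
        # Fall back to ASCII if the stream does not report an encoding.
        encoding = getattr(sys.stdout, "encoding", None) or "ascii"
    return s.encode(encoding, errors="backslashreplace").decode(encoding)


print(to_printable("dash: \N{EM DASH}", encoding="ascii"))  # dash: \u2014
```

Characters the target encoding can represent pass through unchanged, so the helper can be applied to every string before printing.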
ANSWER 4
Score 10
The code below makes Python write UTF-8 to the console, even on Windows.
The console displays the characters correctly on Windows 7 but not on Windows XP. Even there the script still runs and, most importantly, produces consistent output on all platforms, so you can redirect the output to a file.
The code below was tested with Python 2.6 on Windows.
#!/usr/bin/python
# -*- coding: UTF-8 -*-
import codecs, sys

reload(sys)
sys.setdefaultencoding('utf-8')
print sys.getdefaultencoding()

if sys.platform == 'win32':
    try:
        import win32console
    except ImportError:
        print "Python Win32 Extensions module is required.\n You can download it from https://sourceforge.net/projects/pywin32/ (x86 and x64 builds are available)\n"
        sys.exit(-1)
    # win32console implementation of SetConsoleCP does not return a value
    # CP_UTF8 = 65001
    win32console.SetConsoleCP(65001)
    if (win32console.GetConsoleCP() != 65001):
        raise Exception ("Cannot set console codepage to 65001 (UTF-8)")
    win32console.SetConsoleOutputCP(65001)
    if (win32console.GetConsoleOutputCP() != 65001):
        raise Exception ("Cannot set console output codepage to 65001 (UTF-8)")

# Wrap the standard streams so that everything written is encoded as UTF-8
sys.stdout = codecs.getwriter('utf8')(sys.stdout)
sys.stderr = codecs.getwriter('utf8')(sys.stderr)

print "This is an Е乂αmp١ȅ testing Unicode support using Arabic, Latin, Cyrillic, Greek, Hebrew and CJK code points.\n"
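On Python 3.7 or later, a similar effect (UTF-8 output regardless of platform) can be sketched without pywin32 or sys.setdefaultencoding. This is not the answer's method: force_utf8 is a hypothetical helper that relies on the reconfigure() method of text streams, and it does not change the console code page.

```python
import io
import sys


def force_utf8(stream):
    """Switch a text stream to UTF-8 if it supports reconfigure() (Python 3.7+)."""
    reconfigure = getattr(stream, "reconfigure", None)
    if reconfigure is not None:
        reconfigure(encoding="utf-8")
    return stream


# Demonstration on an in-memory stream that starts out as ASCII-only.
raw = io.BytesIO()
text = force_utf8(io.TextIOWrapper(raw, encoding="ascii"))
text.write("\u0415\u4e42\u03b1")  # Cyrillic, CJK and Greek code points
text.flush()
print(raw.getvalue())

# In a script, the same call would be applied to the real streams:
# force_utf8(sys.stdout); force_utf8(sys.stderr)
```

Streams without a reconfigure() method are returned unchanged, so the helper is safe to call on replaced or wrapped streams.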