The Python Oracle

Python, Unicode, and the Windows console

--


Chapters
00:00 Question
01:09 Accepted answer (Score 38)
02:07 Answer 2 (Score 91)
04:30 Answer 3 (Score 31)
06:04 Answer 4 (Score 11)
06:45 Thank you

--

Full question
https://stackoverflow.com/questions/5419...

Question links:
[@JFSebastian answer]: https://stackoverflow.com/a/32176732/610...

Accepted answer links:
[PrintFails - Python Wiki]: http://wiki.python.org/moin/PrintFails

Answer 2 links:
[Python 3.6]: https://docs.python.org/3.6/whatsnew/3.6...
[PEP 528: Change Windows console encoding to UTF-8]: https://www.python.org/dev/peps/pep-0528/
[the ]: https://github.com/Drekin/win-unicode-co...
[@Daira Hopwood's answer]: https://stackoverflow.com/a/4637795/4279
[win-unicode-console]: https://github.com/Drekin/win-unicode-co...
[What's the deal with Python 3.4, Unicode, different languages and Windows?]: https://stackoverflow.com/a/30551552/427...
[PYTHONIOENCODING]: https://docs.python.org/3/using/cmdline....

Answer 3 links:
[since December 2021]: https://endoflife.date/python
https://github.com/Drekin/win-unicode-co...
[previously linked here]: https://stackoverflow.com/a/3259271/3931...
[did not work prior to Python 3.8]: https://github.com/python/cpython/issues...
[not a good idea]: https://stackoverflow.com/questions/3578...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #unicode




ACCEPTED ANSWER

Score 38


Note: This answer dates from 2008 and is outdated. Use the solution below with care.


Here is a page that details the problem and a solution (search the page for the text "Wrapping sys.stdout into an instance"):

PrintFails - Python Wiki

Here's a code excerpt from that page:

  $ python -c 'import sys, codecs, locale; print sys.stdout.encoding; \
    sys.stdout = codecs.getwriter(locale.getpreferredencoding())(sys.stdout); \
    line = u"\u0411\n"; print type(line), len(line); \
    sys.stdout.write(line); print line'
  UTF-8
  <type 'unicode'> 2
  Б
  Б

  $ python -c 'import sys, codecs, locale; print sys.stdout.encoding; \
    sys.stdout = codecs.getwriter(locale.getpreferredencoding())(sys.stdout); \
    line = u"\u0411\n"; print type(line), len(line); \
    sys.stdout.write(line); print line' | cat
  None
  <type 'unicode'> 2
  Б
  Б

There's some more information on that page, well worth a read.
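
The excerpt above is Python 2. As a rough Python 3 sketch of the same wrapping idea, the snippet below wraps a binary stream with codecs.getwriter so that text written to it is encoded explicitly. The helper name make_encoding_writer and the in-memory buffer are assumptions made for illustration; they are not taken from the wiki page.

```python
import codecs
import io


def make_encoding_writer(byte_stream, encoding="utf-8", errors="strict"):
    """Wrap a binary stream so that writing str encodes it with `encoding`."""
    return codecs.getwriter(encoding)(byte_stream, errors=errors)


# Demonstration against an in-memory buffer instead of the real console.
buffer = io.BytesIO()
writer = make_encoding_writer(buffer)
writer.write("\u0411\n")  # CYRILLIC CAPITAL LETTER BE, as in the excerpt
encoded = buffer.getvalue()
print(encoded)
```

On Python 3 the binary stream behind standard output is sys.stdout.buffer, so that would be the object to wrap in a real script. Whether wrapping is needed at all depends on the Python version and platform.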




ANSWER 2

Score 31


Update: On Python 3.6 or later, printing Unicode strings to the console on Windows just works.

So, upgrade to a recent Python and you're done. At this point I recommend using 2to3 to update your code to Python 3.x if needed, and just dropping support for Python 2.x. Note that no version of Python before 3.7 (including Python 2.7) has had security support since December 2021.

If you really still need to support earlier versions of Python (including Python 2.7), you can use https://github.com/Drekin/win-unicode-console, which is based on, and uses the same APIs as, the code in the answer that was previously linked here. (That link does include some information on Windows font configuration, but I doubt it still applies to Windows 8 or later.)

Note: despite other plausible-sounding answers that suggest changing the code page to 65001, that did not work prior to Python 3.8. (It does kind-of work since then, but as pointed out above, you don't need to do so for Python 3.6+ anyway.) Also, changing the default encoding using sys.setdefaultencoding is (still) not a good idea.
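
The version split described in this answer can be expressed as a small check. This is only a sketch: the function name needs_console_workaround is hypothetical, and what to do when it returns True (for example, enabling win-unicode-console) is left to the caller.

```python
import sys


def needs_console_workaround(version_info=sys.version_info, platform=sys.platform):
    """Return True on Windows when the interpreter is older than Python 3.6.

    Per the answer above, printing Unicode to the Windows console works
    out of the box from Python 3.6 onward, so only older versions on
    Windows are flagged.
    """
    return platform == "win32" and tuple(version_info[:2]) < (3, 6)


if needs_console_workaround():
    print("Old Python on Windows: a console workaround may be needed.")
```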




ANSWER 3

Score 11


If you're not interested in getting a reliable representation of the bad character(s), you might use something like this (works with Python >= 2.6, including 3.x):

from __future__ import print_function
import sys

def safeprint(s):
    try:
        print(s)
    except UnicodeEncodeError:
        if sys.version_info >= (3,):
            print(s.encode('utf8').decode(sys.stdout.encoding))
        else:
            print(s.encode('utf8'))

safeprint(u"\N{EM DASH}")

The bad character(s) in the string will be converted into a representation that the Windows console can print.
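
A different way to get a printable fallback, sketched here for Python 3 only, is to let the codec's error handler substitute the characters the console encoding cannot represent. This is a swapped-in technique, not the code from the answer above; the helper name to_printable and the "ascii" fallback used when a stream reports no encoding are assumptions.

```python
import sys


def to_printable(text, encoding, errors="backslashreplace"):
    """Return `text` with characters that `encoding` cannot represent replaced."""
    return text.encode(encoding, errors=errors).decode(encoding)


def safeprint(text, stream=None):
    """Print `text`, escaping anything the stream's encoding cannot handle."""
    stream = sys.stdout if stream is None else stream
    # Assumption: fall back to ASCII when the stream reports no encoding.
    encoding = getattr(stream, "encoding", None) or "ascii"
    print(to_printable(text, encoding), file=stream)


safeprint("\N{EM DASH}")
```

With errors="backslashreplace" an unencodable character shows up as an escape such as \u2014; passing errors="replace" yields a question mark instead.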




ANSWER 4

Score 10


The code below makes Python write UTF-8 to the console, even on Windows.

The console displays the characters correctly on Windows 7 but not on Windows XP. Even there the script still runs and, most importantly, produces the same output on all platforms, so you can redirect the output to a file.

The code below was tested with Python 2.6 on Windows.


#!/usr/bin/python
# -*- coding: UTF-8 -*-

import codecs, sys

reload(sys)
sys.setdefaultencoding('utf-8')

print sys.getdefaultencoding()

if sys.platform == 'win32':
    try:
        import win32console
    except ImportError:
        print "Python Win32 Extensions module is required.\n You can download it from https://sourceforge.net/projects/pywin32/ (x86 and x64 builds are available)\n"
        sys.exit(-1)
    # win32console's SetConsoleCP does not return a value, so read the
    # code page back to confirm the change. CP_UTF8 = 65001
    win32console.SetConsoleCP(65001)
    if win32console.GetConsoleCP() != 65001:
        raise Exception("Cannot set console codepage to 65001 (UTF-8)")
    win32console.SetConsoleOutputCP(65001)
    if win32console.GetConsoleOutputCP() != 65001:
        raise Exception("Cannot set console output codepage to 65001 (UTF-8)")

# Wrap stdout and stderr so that everything written is encoded as UTF-8
sys.stdout = codecs.getwriter('utf8')(sys.stdout)
sys.stderr = codecs.getwriter('utf8')(sys.stderr)

print "This is an Е乂αmp١ȅ testing Unicode support using Arabic, Latin, Cyrillic, Greek, Hebrew and CJK code points.\n"
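
The script above is Python 2 only. For comparison, here is a minimal Python 3 sketch of the "always emit UTF-8" idea, assuming Python 3.7 or later, where io.TextIOWrapper has a reconfigure() method. The helper name force_utf8 is hypothetical, and the demonstration uses an in-memory stream rather than the real console.

```python
import io


def force_utf8(stream):
    """Switch a text stream to UTF-8 if it supports reconfigure().

    Returns True when the encoding was changed, False when the stream
    has no reconfigure() method (for example on Python older than 3.7).
    """
    reconfigure = getattr(stream, "reconfigure", None)
    if reconfigure is None:
        return False
    reconfigure(encoding="utf-8")
    return True


# Demonstration on an in-memory text stream that starts out as ASCII.
# newline="\n" keeps line endings identical on every platform.
raw = io.BytesIO()
demo = io.TextIOWrapper(raw, encoding="ascii", newline="\n")
changed = force_utf8(demo)
demo.write("\u0411\n")
demo.flush()
print(changed, raw.getvalue())
```

In a script, the same helper could be applied to sys.stdout and sys.stderr before any output is written, which keeps redirected output consistent across platforms without touching the default encoding.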