The Python Oracle

UnicodeEncodeError: 'charmap' codec can't encode characters

Full question
https://stackoverflow.com/questions/2709...

Answer 3 links:
[PYTHONLEGACYWINDOWSSTDIO]: https://docs.python.org/3/using/cmdline....

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #beautifulsoup #urllib




ANSWER 1

Score 807


I was getting the same UnicodeEncodeError when saving scraped web content to a file. To fix it I replaced this code:

with open(fname, "w") as f:
    f.write(html)

with this:

with open(fname, "w", encoding="utf-8") as f:
    f.write(html)

If you need to support Python 2, then use this:

import io
with io.open(fname, "w", encoding="utf-8") as f:
    f.write(html)

If you want to use an encoding other than UTF-8, pass its name as the encoding argument instead.
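As a minimal sketch of the fix above, the following writes a string containing non-ASCII characters with an explicit encoding and reads it back. The sample text, the temporary directory, and the file names are assumptions made for illustration; they are not from the original answer.

```python
import os
import tempfile

# Sample text with characters that many legacy code pages cannot represent
# (illustrative only, not from the original answer).
html = "<p>Price: 5 \u20ac \u2192 \u4e2d\u6587</p>"

with tempfile.TemporaryDirectory() as tmp:
    fname = os.path.join(tmp, "page.html")

    # Explicit encoding: the result no longer depends on the platform default.
    with open(fname, "w", encoding="utf-8") as f:
        f.write(html)

    # Read back with the same encoding to confirm a lossless round trip.
    with open(fname, "r", encoding="utf-8") as f:
        round_trip = f.read()

    # Any other encoding that can represent the text is passed the same way.
    fname16 = os.path.join(tmp, "page16.html")
    with open(fname16, "w", encoding="utf-16") as f:
        f.write(html)
    with open(fname16, "r", encoding="utf-16") as f:
        round_trip_16 = f.read()

print(round_trip == html, round_trip_16 == html)
```

Whichever encoding is used for writing, the same name has to be passed when the file is read again.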




ACCEPTED ANSWER

Score 281


I fixed it by calling .encode("utf-8") on soup.

That means that print(soup) becomes print(soup.encode("utf-8")).
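One thing to be aware of: on Python 3, encoding returns a bytes object, so print() shows its b'...' representation rather than the readable text. If the goal is to get the actual characters onto the console, one possible approach is to write the encoded bytes to the underlying binary stream. The sketch below uses a plain string in place of soup, and write_utf8 is a hypothetical helper name, not part of any library.

```python
import io
import sys


def write_utf8(text, stream=None):
    """Write text plus a newline as UTF-8 bytes to a binary stream."""
    if stream is None:
        # sys.stdout.buffer is the byte stream underneath print().
        stream = sys.stdout.buffer
    stream.write(text.encode("utf-8") + b"\n")
    stream.flush()


# A plain str stands in for the soup object here (assumption for illustration).
text = "caf\u00e9 \u2192 \u4e2d\u6587"

as_bytes = text.encode("utf-8")
shown = repr(as_bytes)  # this is what print(as_bytes) displays on Python 3

# An in-memory buffer is used so the sketch runs anywhere;
# calling write_utf8(text) with no stream would target the real console.
buf = io.BytesIO()
write_utf8(text, buf)
```

On Python 3.7 and later, calling sys.stdout.reconfigure(encoding="utf-8") once at startup is another option that keeps print() itself usable, provided the terminal can display UTF-8.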




ANSWER 3

Score 101


On Python 3.7 running on Windows 10, this worked (I am not sure whether it works on other platforms or other versions of Python). Replace this line:

with open('filename', 'w') as f:

with this:

with open('filename', 'w', encoding='utf-8') as f:

This works because the file is now written as UTF-8, which can encode every Unicode character. Without the encoding argument, open() falls back to the platform's default encoding (on Windows this is often a legacy code page, which is where the 'charmap' in the error message comes from) and raises an error as soon as it meets a character that encoding does not support.
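The explanation above can be checked without touching any file. This sketch prints the encoding that open() would use by default on the current machine, then encodes one string two ways. cp1252 is used only as an assumed example of a legacy Windows code page; the default on a given system may differ.

```python
import locale

# What open() falls back to when no encoding is given (platform dependent).
default_encoding = locale.getpreferredencoding(False)
print("default text encoding:", default_encoding)

text = "arrow: \u2192"

# cp1252 is an assumed example of a legacy code page; it has no slot for U+2192.
try:
    text.encode("cp1252")
    cp1252_ok = True
except UnicodeEncodeError as exc:
    cp1252_ok = False
    print("cp1252 failed:", exc)

# UTF-8 can represent every Unicode code point, so this always succeeds.
utf8_bytes = text.encode("utf-8")
print("utf-8 ok:", utf8_bytes)
```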




ANSWER 4

Score 22


When saving the response of a GET request, the same error was thrown on Python 3.7 on Windows 10. The response from the URL was encoded as UTF-8, so it is a good idea to check the response's encoding and pass the same value to open(). This avoids a trivial issue that can waste a lot of time in production.

import requests
resp = requests.get('https://en.wikipedia.org/wiki/NIFTY_50')
print(resp.encoding)
with open('NiftyList.txt', 'w') as f:
    f.write(resp.text)

When I added encoding="utf-8" to the open() call, the file was saved with the correct response:

with open('NiftyList.txt', 'w', encoding="utf-8") as f:
    f.write(resp.text)
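The advice to check the response's encoding and reuse it could be wrapped in a small helper, sketched here. save_response_text is a hypothetical name, and a SimpleNamespace stands in for a real requests response so the example runs without network access; only the text and encoding attributes are used. The UTF-8 fallback for a missing encoding is an assumption, not something the answer specifies.

```python
import os
import tempfile
from types import SimpleNamespace


def save_response_text(resp, path, fallback="utf-8"):
    """Write resp.text to path using resp.encoding, or fallback if it is unset."""
    encoding = resp.encoding or fallback
    with open(path, "w", encoding=encoding) as f:
        f.write(resp.text)
    return encoding


# Stand-in for a requests response (assumption for illustration).
fake_resp = SimpleNamespace(text="NIFTY 50 \u20b9 \u2192 index", encoding="utf-8")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "NiftyList.txt")
    used = save_response_text(fake_resp, path)

    # Read back with the encoding that was actually used for writing.
    with open(path, "r", encoding=used) as f:
        saved = f.read()

print(used, saved == fake_resp.text)
```

With a real response, the call would look like save_response_text(requests.get(url), 'NiftyList.txt').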