Decode HTML entities in Python string?
Music by Eric Matyas
https://www.soundimage.org
Track title: Puddle Jumping Looping
--
Chapters
00:00 Decode Html Entities In Python String?
00:24 Accepted Answer Score 730
01:20 Answer 2 Score 72
01:53 Answer 3 Score 8
02:25 Answer 4 Score 16
02:38 Thank you
--
Full question
https://stackoverflow.com/questions/2087...
--
Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...
--
Tags
#python #html #htmlentities
#avk47
ACCEPTED ANSWER
Score 730
Python 3.4+
Use html.unescape():
import html
print(html.unescape('&pound;682m'))  # prints £682m
FYI: html.parser.HTMLParser.unescape is deprecated. It was supposed to be removed in 3.5 but was left in by mistake, and it was finally removed in Python 3.9, so use html.unescape() instead.
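As a small sketch of what html.unescape() covers, the snippet below decodes the same character (U+00A3, the pound sign) written as a named, a decimal, and a hexadecimal reference. It also shows that unescape() makes a single pass, so double-escaped input keeps one level of escaping. The sample strings are illustrative and are not taken from the question.

```python
import html

# Named, decimal and hexadecimal references for the same character (U+00A3).
samples = ["&pound;682m", "&#163;682m", "&#xA3;682m"]
decoded = [html.unescape(s) for s in samples]
print(decoded)  # ['£682m', '£682m', '£682m']

# unescape() makes a single pass, so a double-escaped string keeps one level.
once = html.unescape("&amp;pound;682m")
print(once)  # &pound;682m
twice = html.unescape(once)
print(twice)  # £682m
```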
Python 2.6-3.3
You can use HTMLParser.unescape() from the standard library:
- For Python 2.6-2.7 it's in HTMLParser
- For Python 3 it's in html.parser
>>> try:
...     # Python 2.6-2.7 
...     from HTMLParser import HTMLParser
... except ImportError:
...     # Python 3
...     from html.parser import HTMLParser
... 
>>> h = HTMLParser()
>>> print(h.unescape('&pound;682m'))
£682m
You can also use the six compatibility library to simplify the import:
>>> from six.moves.html_parser import HTMLParser
>>> h = HTMLParser()
>>> print(h.unescape('&pound;682m'))
£682m
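If the same code has to run on both old and new interpreters without six, one option is a minimal import fallback. The sketch below prefers html.unescape (3.4+) and only falls back to HTMLParser().unescape on 2.6-3.3. The fallback order is my assumption of a sensible default, not something the answer prescribes.

```python
# Prefer html.unescape (Python 3.4+). Fall back to HTMLParser().unescape
# only on older versions where html.unescape does not exist.
try:
    from html import unescape  # Python 3.4+
except ImportError:
    try:
        from html.parser import HTMLParser  # Python 3.0-3.3
    except ImportError:
        from HTMLParser import HTMLParser  # Python 2.6-2.7
    unescape = HTMLParser().unescape

decoded = unescape("&pound;682m")
print(decoded)  # £682m
```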
ANSWER 2
Score 72
Beautiful Soup handles entity conversion. In Beautiful Soup 3, you'll need to specify the convertEntities argument to the BeautifulSoup constructor (see the 'Entity Conversion' section of the archived docs). In Beautiful Soup 4, entities get decoded automatically.
Beautiful Soup 3
>>> from BeautifulSoup import BeautifulSoup
>>> BeautifulSoup("<p>&pound;682m</p>", 
...               convertEntities=BeautifulSoup.HTML_ENTITIES)
<p>£682m</p>
Beautiful Soup 4
>>> from bs4 import BeautifulSoup
>>> BeautifulSoup("<p>&pound;682m</p>")
<html><body><p>£682m</p></body></html>
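Beautiful Soup is a third-party package. If you only need the text with entities decoded, the standard library's html.parser.HTMLParser can do comparable decoding while it parses. This is a swapped-in standard-library technique, not Beautiful Soup's API. TextCollector is a hypothetical name used only for illustration.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Hypothetical helper: collect text content from an HTML fragment."""

    def __init__(self):
        # With convert_charrefs=True (the default since Python 3.5), references
        # such as &pound; in text are decoded before handle_data is called.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


collector = TextCollector()
collector.feed("<p>&pound;682m</p>")
collector.close()  # flush any buffered text
text = "".join(collector.parts)
print(text)  # £682m
```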
ANSWER 3
Score 16
You can use replace_entities from the w3lib.html module (the session below is Python 2):
In [202]: from w3lib.html import replace_entities
In [203]: replace_entities("&pound;682m")
Out[203]: u'\xa3682m'
In [204]: print replace_entities("&pound;682m")
£682m
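If you want this kind of regex-driven replacement without installing w3lib, and want to leave a few entities escaped (for example &lt; and &gt;), here is a rough standard-library sketch. decode_entities_keeping and its keep parameter are hypothetical names. The function is not w3lib's implementation. Each matched reference is decoded with html.unescape unless its name is listed in keep.

```python
import html
import re

# Matches named (&pound;), decimal (&#163;) and hex (&#xA3;) references.
ENTITY_RE = re.compile(r"&(#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")


def decode_entities_keeping(text, keep=()):
    """Hypothetical helper: decode entities except the names listed in `keep`."""

    def _sub(match):
        if match.group(1) in keep:
            return match.group(0)  # leave this reference escaped
        # Unknown names such as &bogus; come back unchanged from unescape().
        return html.unescape(match.group(0))

    return ENTITY_RE.sub(_sub, text)


result = decode_entities_keeping("&pound;682m &lt;b&gt;", keep=("lt", "gt"))
print(result)  # £682m &lt;b&gt;
```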
ANSWER 4
Score 8
Beautiful Soup 4 lets you set a formatter for your output. If you pass in formatter=None, Beautiful Soup will not modify strings at all on output. This is the fastest option, but it may lead to Beautiful Soup generating invalid HTML/XML, as in these examples:
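from bs4 import BeautifulSoup

# Assumes lxml is installed; the "lxml" parser wraps fragments in <html><body>.
french = "<p>Il a dit &lt;&lt;Sacr&eacute; bleu!&gt;&gt;</p>"
soup = BeautifulSoup(french, "lxml")
print(soup.prettify(formatter=None))
# <html>
#  <body>
#   <p>
#    Il a dit <<Sacré bleu!>>
#   </p>
#  </body>
# </html>
link_soup = BeautifulSoup('<a href="http://example.com/?foo=val1&bar=val2">A link</a>', "lxml")
# decode() returns str; encode() would return bytes and print as b'...' on Python 3.
print(link_soup.a.decode(formatter=None))
# <a href="http://example.com/?foo=val1&bar=val2">A link</a>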
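The output above shows why fully decoded text can be invalid markup: a literal < or & is no longer escaped. This standard-library sketch shows the round trip. html.unescape decodes the text, and html.escape re-escapes only &, < and > (plus quotes when quote=True), which makes the string safe to embed in HTML again. This uses the standard library, not a Beautiful Soup formatter.

```python
import html

text = html.unescape("Il a dit &lt;&lt;Sacr&eacute; bleu!&gt;&gt;")
print(text)  # Il a dit <<Sacré bleu!>>

# Re-escape only &, < and >; non-ASCII characters such as é stay as-is.
safe_text = html.escape(text, quote=False)
print(safe_text)  # Il a dit &lt;&lt;Sacré bleu!&gt;&gt;

# For attribute values, keep quote=True (the default) so " and ' are escaped too.
url = "http://example.com/?foo=val1&bar=val2"
safe_attr = html.escape(url)
print(safe_attr)  # http://example.com/?foo=val1&amp;bar=val2
```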