The Python Oracle

Extract part of a regex match

--

Full question
https://stackoverflow.com/questions/1327...

Accepted answer links:
[group(1)]: https://docs.python.org/3.8/library/re.h...
[re.search]: https://docs.python.org/3.8/library/re.h...

Answer 2 links:
[assignment expressions (PEP 572)]: https://www.python.org/dev/peps/pep-0572/
[Krzysztof Krasoń's solution]: https://stackoverflow.com/a/1327389/9297...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #html #regex #htmlcontentextraction



ACCEPTED ANSWER

Score 385


Use ( ) in the regular expression to define a capturing group, and group(1) in Python to retrieve the captured string. re.search returns None when the pattern does not match, so check the result before calling group():

import re

title_search = re.search(r'<title>(.*)</title>', html, re.IGNORECASE)

if title_search:
    title = title_search.group(1)
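
The check-then-group pattern can be wrapped in a small function. This is only a sketch: the name extract_title and the sample strings are hypothetical, and the non-greedy `.*?` and the re.DOTALL flag are additions that are not in the answer. They are assumed here so that a title spanning several lines, or a page containing a second closing tag, still yields the first title.

```python
import re

def extract_title(html, default=None):
    """Return the text inside the first <title> element, or `default` if absent."""
    # (.*?) is non-greedy, so the match stops at the first </title>.
    # re.DOTALL lets '.' match newlines inside the title.
    match = re.search(r'<title>(.*?)</title>', html, re.IGNORECASE | re.DOTALL)
    if match is None:
        # No match: re.search returned None, so there is nothing to call group() on.
        return default
    return match.group(1).strip()

print(extract_title('<html><head><TITLE>My Page</TITLE></head></html>'))
print(extract_title('<p>no title here</p>'))
```

The pattern still assumes a bare <title> tag with no attributes; markup that deviates from that would need a real HTML parser.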



ANSWER 2

Score 12


Try using capturing groups:

title = re.search('<title>(.*)</title>', html, re.IGNORECASE).group(1)
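
Chaining .group(1) directly onto re.search works only when the pattern matches; otherwise re.search returns None and the call raises AttributeError. The sketch below shows that failure mode and, assuming Python 3.8 or later, the assignment-expression form (PEP 572) that stays compact while handling the no-match case. PATTERN, title_or_none, and the sample strings are made up for illustration.

```python
import re

PATTERN = r'<title>(.*)</title>'

html = '<p>no title here</p>'
try:
    title = re.search(PATTERN, html, re.IGNORECASE).group(1)
except AttributeError:
    # re.search returned None, and None has no .group method.
    title = None

def title_or_none(html):
    # Python 3.8+: bind the match object and test it in one expression.
    if (m := re.search(PATTERN, html, re.IGNORECASE)):
        return m.group(1)
    return None

print(title)
print(title_or_none('<title>Hi</title>'))
```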



ANSWER 3

Score 10


May I recommend Beautiful Soup? It is a very good library for parsing an entire HTML document.

from bs4 import BeautifulSoup

soup = BeautifulSoup(html_doc, 'html.parser')
titleName = soup.title.string  # .name would return the tag name, 'title'
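
Where installing a third-party package is not an option, the same parse-instead-of-regex idea can be sketched with the standard library's html.parser module. This is a swapped-in alternative and not Beautiful Soup's API; TitleParser and parse_title are hypothetical names chosen for this example.

```python
from html.parser import HTMLParser

class TitleParser(HTMLParser):
    """Collect the text of the first <title> element in a document."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = None  # stays None if no <title> is found

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lowercase.
        if tag == 'title' and self.title is None:
            self._in_title = True
            self.title = ''

    def handle_endtag(self, tag):
        if tag == 'title':
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

def parse_title(html_doc):
    parser = TitleParser()
    parser.feed(html_doc)
    parser.close()
    return parser.title

print(parse_title('<html><head><title id="t">Hello</title></head></html>'))
```

Unlike the literal '<title>' regex, this handles a title tag that carries attributes.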



ANSWER 4

Score 7


Try:

title = re.search('<title>(.*)</title>', html, re.IGNORECASE).group(1)