Extract part of a regex match
--
Music by Eric Matyas
https://www.soundimage.org
Track title: The World Wide Mind
--
Chapters
00:00 Question
00:29 Accepted answer (Score 347)
00:52 Answer 2 (Score 60)
01:25 Answer 3 (Score 12)
01:38 Answer 4 (Score 9)
01:56 Thank you
--
Full question
https://stackoverflow.com/questions/1327...
Accepted answer links:
[group(1)]: https://docs.python.org/3.8/library/re.h...
[re.search]: https://docs.python.org/3.8/library/re.h...
Answer 2 links:
[assignment expressions (PEP 572)]: https://www.python.org/dev/peps/pep-0572/
[Krzysztof Krasoń's solution]: https://stackoverflow.com/a/1327389/9297...
--
Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...
--
Tags
#python #html #regex #htmlcontentextraction
#avk47
ACCEPTED ANSWER
Score 385
Wrap the part you want in parentheses ( ) to make it a capturing group, then call group(1) on the match object to retrieve the captured string. re.search returns None when the pattern is not found, so check the result before calling group():
    import re

    title_search = re.search(r'<title>(.*)</title>', html, re.IGNORECASE)
    if title_search:
        title = title_search.group(1)
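The check-before-group pattern above can be wrapped in a small helper. This is a minimal sketch rather than part of the original answer; the function name extract_title and the sample strings are hypothetical, chosen only for illustration.

```python
import re

def extract_title(html):
    """Return the text between <title> tags, or None if there is no match."""
    # The parentheses form capturing group 1; re.IGNORECASE also matches <TITLE>.
    match = re.search(r'<title>(.*)</title>', html, re.IGNORECASE)
    # re.search returns None when nothing matches, so guard before .group(1).
    return match.group(1) if match else None

sample = '<html><head><TITLE>Example page</TITLE></head></html>'
print(extract_title(sample))
print(extract_title('<p>no title here</p>'))
```

Returning None for a missing title leaves the decision about a fallback value to the caller.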
ANSWER 2
Score 12
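Try using capturing groups. Note that this one-liner raises AttributeError if the pattern does not match, because re.search then returns None:
    title = re.search(r'<title>(.*)</title>', html, re.IGNORECASE).group(1)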
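The links listed for this answer mention assignment expressions (PEP 572), available from Python 3.8. As a hedged sketch of how that could look here (the variable names and sample input are assumptions, not taken from the answer), the match can be bound and tested in one step:

```python
import re

html = '<head><title>My page</title></head>'  # hypothetical sample input

# The walrus operator binds the match object and tests it in one expression,
# so .group(1) is only reached when re.search actually found something.
if (match := re.search(r'<title>(.*)</title>', html, re.IGNORECASE)):
    title = match.group(1)
else:
    title = None  # no <title> element found

print(title)
```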
ANSWER 3
Score 10
May I recommend Beautiful Soup? It is a very good library for parsing an entire HTML document.
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(html_doc, 'html.parser')
    titleName = soup.title.string  # .name would return the tag name, 'title'
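If installing Beautiful Soup is not an option, the same parser-based idea can be approximated with the standard library's html.parser module. Note that this is a swapped-in technique, not Beautiful Soup itself; the class name TitleParser and the sample document are hypothetical.

```python
from html.parser import HTMLParser

class TitleParser(HTMLParser):
    """Collect the text inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = None

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased, so <TITLE> is handled too.
        if tag == 'title' and self.title is None:
            self.in_title = True
            self.title = ''

    def handle_endtag(self, tag):
        if tag == 'title':
            self.in_title = False

    def handle_data(self, data):
        # Only accumulate text while inside the first <title> element.
        if self.in_title:
            self.title += data

html_doc = '<html><head><title>Example page</title></head><body></body></html>'
parser = TitleParser()
parser.feed(html_doc)
parser.close()
print(parser.title)
```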
ANSWER 4
Score 7
Try:
    title = re.search(r'<title>(.*)</title>', html, re.IGNORECASE).group(1)
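One caveat with the pattern used in these answers: (.*) is greedy, and . does not match newlines by default. The sketch below, using made-up sample strings, compares it with a non-greedy (.*?) group and the re.DOTALL flag; whether the difference matters depends on the input.

```python
import re

# Hypothetical input with two <title> elements on one line.
html = '<title>First</title><svg><title>Second</title></svg>'

greedy = re.search(r'<title>(.*)</title>', html, re.IGNORECASE).group(1)
lazy = re.search(r'<title>(.*?)</title>', html, re.IGNORECASE).group(1)

print(greedy)  # runs on to the last </title>
print(lazy)    # stops at the first </title>

# A title split across lines needs re.DOTALL so that . also matches '\n'.
multiline = '<title>Split\nacross lines</title>'
flags = re.IGNORECASE | re.DOTALL
multiline_title = re.search(r'<title>(.*?)</title>', multiline, flags).group(1)
print(multiline_title)
```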