Extract part of a regex match
--------------------------------------------------
Hire the world's top talent on demand or become one of them at Toptal: https://topt.al/25cXVn
and get $2,000 discount on your first invoice
--------------------------------------------------
Music by Eric Matyas
https://www.soundimage.org
Track title: Over a Mysterious Island
--
Chapters
00:00 Extract Part Of A Regex Match
00:23 Answer 1 Score 7
00:31 Accepted Answer Score 385
00:48 Answer 3 Score 12
00:59 Answer 4 Score 10
01:11 Thank you
--
Full question
https://stackoverflow.com/questions/1327...
--
Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...
--
Tags
#python #html #regex #htmlcontentextraction
#avk47
ACCEPTED ANSWER
Score 385
Wrap the part you want in parentheses in the regex, then call group(1) on the match object in Python to retrieve the captured string. re.search returns None when it finds no match, so check the result before calling group():
import re
title_search = re.search('<title>(.*)</title>', html, re.IGNORECASE)
if title_search:
    title = title_search.group(1)
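For illustration, here is a minimal sketch of the same check wrapped in a function. The helper name extract_title and the sample strings are assumptions, not part of the original answer. It also swaps in a non-greedy group (.*?) and re.DOTALL, a small variation on the pattern above, so the capture stops at the first closing tag and a title spanning several lines still matches.

```python
import re


def extract_title(html):
    """Return the text inside the first <title> element, or None if absent.

    Hypothetical helper for illustration only.
    """
    # (.*?) is non-greedy, so the capture stops at the first </title>.
    # re.DOTALL lets '.' match newlines in case the title spans lines.
    match = re.search(r'<title>(.*?)</title>', html, re.IGNORECASE | re.DOTALL)
    if match is None:
        # No match: return None instead of calling group() on None.
        return None
    return match.group(1).strip()


if __name__ == '__main__':
    print(extract_title('<html><head><TITLE>My Page</TITLE></head></html>'))
    print(extract_title('<p>no title here</p>'))
```

Returning None for the no-match case keeps the caller from hitting an AttributeError, which is what happens when group(1) is chained directly onto a failed re.search.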
ANSWER 2
Score 12
Try using capturing groups:
title = re.search('<title>(.*)</title>', html, re.IGNORECASE).group(1)
ANSWER 3
Score 10
May I recommend Beautiful Soup? It is a very good library for parsing an entire HTML document.
from bs4 import BeautifulSoup
soup = BeautifulSoup(html_doc, 'html.parser')
titleName = soup.title.string  # .name would only give the tag name, 'title'
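If installing a third-party package is not an option, the same parser-based idea can be sketched with the standard library's html.parser module. This is a stand-in for Beautiful Soup, not its API. The names TitleParser and parse_title are hypothetical, and the sample input is made up.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text of the first <title> element (illustrative sketch)."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <TITLE> arrives as 'title'.
        if tag == 'title' and self.title is None:
            self._in_title = True
            self.title = ''

    def handle_endtag(self, tag):
        if tag == 'title':
            self._in_title = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so accumulate it.
        if self._in_title:
            self.title += data


def parse_title(html_doc):
    """Return the first title's text, or None if there is no <title>."""
    parser = TitleParser()
    parser.feed(html_doc)
    parser.close()
    return parser.title


if __name__ == '__main__':
    print(parse_title('<html><head><title id="t">My Page</title></head></html>'))
```

Unlike the literal '<title>' pattern in the regex answers, a parser also copes with a title tag that carries attributes.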
ANSWER 4
Score 7
Try:
title = re.search('<title>(.*)</title>', html, re.IGNORECASE).group(1)