The Python Oracle

Extract part of a regex match

Music by Eric Matyas
https://www.soundimage.org
Track title: Over a Mysterious Island

--

Chapters
00:00 Extract Part Of A Regex Match
00:23 Answer 1 Score 7
00:31 Accepted Answer Score 385
00:48 Answer 3 Score 12
00:59 Answer 4 Score 10
01:11 Thank you

--

Full question
https://stackoverflow.com/questions/1327...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #html #regex #htmlcontentextraction

#avk47



ACCEPTED ANSWER

Score 385


Use parentheses ( ) in the regex to define a capturing group, then call group(1) in Python to retrieve the captured string. Note that re.search returns None if it finds no match, so check the result before calling group() on it:

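import re

# The parentheses create capture group 1 around the title text
title_search = re.search('<title>(.*)</title>', html, re.IGNORECASE)

if title_search:  # None when there is no match
    title = title_search.group(1)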
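A self-contained version of the accepted answer, wrapped in a helper, shows the difference between group(0) (the whole match) and group(1) (only the captured text). The function name extract_title and the sample HTML strings are hypothetical, for illustration only:

```python
import re

def extract_title(html):
    """Return the text inside <title>...</title>, or None if there is no match."""
    match = re.search('<title>(.*)</title>', html, re.IGNORECASE)
    if match is None:
        return None
    return match.group(1)  # group(1) is the first parenthesised group

sample = "<html><head><TITLE>Hello, world</TITLE></head></html>"
m = re.search('<title>(.*)</title>', sample, re.IGNORECASE)
print(m.group(0))                              # → <TITLE>Hello, world</TITLE>
print(extract_title(sample))                   # → Hello, world
print(extract_title("<p>no title here</p>"))   # → None
```

Returning None from the helper keeps the "no match" case explicit for the caller instead of letting an AttributeError escape.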



ANSWER 2

Score 12


Try using capturing groups:

import re
title = re.search('<title>(.*)</title>', html, re.IGNORECASE).group(1)  # raises AttributeError if nothing matches
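Chaining .group(1) directly onto re.search is concise, but it fails when nothing matches: re.search returns None, and None has no group method. A minimal sketch of that failure mode, using hypothetical sample strings:

```python
import re

pattern = '<title>(.*)</title>'

# With a match, the one-liner works
title = re.search(pattern, '<title>Example</title>', re.IGNORECASE).group(1)
print(title)  # → Example

# Without a match, re.search returns None and .group raises AttributeError
failed = False
try:
    re.search(pattern, '<p>no title</p>', re.IGNORECASE).group(1)
except AttributeError as exc:
    failed = True
    print('no match:', exc)
```

The one-liner is fine when the input is known to contain a title; otherwise the explicit None check from the accepted answer is safer.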



ANSWER 3

Score 10


May I recommend Beautiful Soup? It is a very good library for parsing HTML documents.

from bs4 import BeautifulSoup
soup = BeautifulSoup(html_doc, 'html.parser')
titleName = soup.title.string  # .string is the title text; .name would just return 'title'
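If installing Beautiful Soup is not an option, the same parse-instead-of-regex idea can be sketched with the standard library's html.parser module. This is a swapped-in technique, not Beautiful Soup's API, and the class name TitleParser is hypothetical:

```python
from html.parser import HTMLParser

class TitleParser(HTMLParser):
    """Collect the text inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lowercase, so <TITLE> also matches
        if tag == 'title' and self.title is None:
            self._in_title = True
            self.title = ''

    def handle_endtag(self, tag):
        if tag == 'title':
            self._in_title = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so append rather than assign
        if self._in_title:
            self.title += data

parser = TitleParser()
parser.feed('<html><head><TITLE>Hello, world</TITLE></head></html>')
parser.close()
print(parser.title)  # → Hello, world
```

A real parser handles case, attributes on the tag, and nesting without any pattern tweaks, which is the main argument for this answer over the regex ones.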



ANSWER 4

Score 7


Try:

import re
title = re.search('<title>(.*)</title>', html, re.IGNORECASE).group(1)  # raises AttributeError if nothing matches
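One caveat shared by all the regex answers: . does not match a newline by default, so a title split across lines is missed. A sketch, assuming the title may span lines (the sample HTML is hypothetical), that adds re.DOTALL and a non-greedy .*? so the match stops at the first closing tag:

```python
import re

html = """<html><head><title>
  A title split
  across lines
</title></head></html>"""

# Default flags: '.' stops at newlines, so there is no match
print(re.search('<title>(.*)</title>', html, re.IGNORECASE))  # → None

# re.DOTALL lets '.' match newlines; '.*?' stops at the first </title>
match = re.search('<title>(.*?)</title>', html, re.IGNORECASE | re.DOTALL)
title = match.group(1).strip() if match else None
print(title)
```

The strip() call removes the surrounding newlines and indentation but keeps the line break inside the title text.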