Match text against multiple regexes in Python
Full question
https://stackoverflow.com/questions/1303...
--
Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...
--
Tags
#python #regex #multiplematches
ACCEPTED ANSWER
Score 7
The approach below applies when you want the matched text. If you need to know which regular expression in the list triggered a match, you are out of luck and will probably need to loop.
Based on the link you have provided:
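import re

# Each pattern is wrapped in a non-capturing group before being joined with '|'.
regexes = 'quick', 'brown', 'fox'
combinedRegex = re.compile('|'.join('(?:{0})'.format(x) for x in regexes))

lines = 'The quick brown fox jumps over the lazy dog', 'Lorem ipsum dolor sit amet', 'The lazy dog jumps over the fox'

for line in lines:
    # One pass per line with the combined pattern instead of one pass per regex.
    print(combinedRegex.findall(line))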
outputs:
['quick', 'brown', 'fox']
[]
['fox']
The point here is that you do not loop over the regexes but combine them into one.
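One caveat, offered as a sketch rather than as part of the original answer: if any of the individual patterns contains its own capturing group, re.findall returns the group contents instead of the whole match. Iterating with re.finditer and reading group(0) sidesteps that. The pattern '(qu)ick' below is a made-up example chosen only to show the effect.

```python
import re

# Hypothetical patterns: the first one contains a capturing group.
regexes = r'(qu)ick', r'fox'
combined = re.compile('|'.join('(?:{0})'.format(x) for x in regexes))

line = 'The quick brown fox jumps over the lazy dog'

# With a capturing group present, findall reports the group, not the full match.
group_results = combined.findall(line)

# finditer yields match objects; group(0) is always the full matched text.
full_matches = [m.group(0) for m in combined.finditer(line)]

print(group_results)
print(full_matches)
```

If the patterns come from elsewhere and may contain groups, the finditer form is probably the safer default.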
The difference from the looping approach is that re.findall does not find overlapping matches. For instance, if your regexes were regexes = 'bro', 'own', the output of the code above would be:
['bro']
[]
[]
whereas the looping approach would result in:
['bro', 'own']
[]
[]
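For comparison, the looping approach mentioned above might look like the following. This is a minimal sketch, not code from the original answer; the helper name find_with_source is made up for illustration. Because each regex is tried on its own, overlapping matches are found, and each hit can be paired with the pattern that produced it.

```python
import re

regexes = 'bro', 'own'
compiled = [re.compile(r) for r in regexes]

lines = ('The quick brown fox jumps over the lazy dog',
         'Lorem ipsum dolor sit amet',
         'The lazy dog jumps over the fox')

def find_with_source(line):
    """Return (pattern, matched_text) pairs, trying each regex separately."""
    pairs = []
    for rx in compiled:
        for m in rx.finditer(line):
            pairs.append((rx.pattern, m.group(0)))
    return pairs

results = [find_with_source(line) for line in lines]
for pairs in results:
    # Print only the matched text, to mirror the output format used above.
    print([text for _, text in pairs])
```

Note that the results are grouped by pattern rather than ordered by position in the line, and that the line is scanned once per regex.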
ANSWER 2
Score 1
If you're just trying to match literal strings, it's probably easier to just do:
strings = 'foo','bar','baz','qux'
regex = re.compile('|'.join(re.escape(x) for x in strings))
and then you can test the whole thing at once:
match = regex.match(line)
Of course, you can get the string that matched from the resulting match object:
if match:
    matching_string = match.group(0)
In action:
import re

strings = 'foo', 'bar', 'baz', 'qux'
regex = re.compile('|'.join(re.escape(x) for x in strings))

lines = 'foo is a word I know', 'baz is a word I know', 'buz is unfamiliar to me'

for line in lines:
    match = regex.match(line)  # only matches at the start of the line
    if match:
        print(match.group(0))
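To see why re.escape matters when the inputs are literal strings, here is a small sketch. The literal 'a.b' is a hypothetical value picked because '.' is a regex metacharacter.

```python
import re

literal = 'a.b'  # hypothetical literal containing a regex metacharacter

unescaped = re.compile(literal)            # '.' matches any character
escaped = re.compile(re.escape(literal))   # '.' matches only a literal dot

unescaped_hits_axb = unescaped.search('axb') is not None
escaped_hits_axb = escaped.search('axb') is not None
escaped_hits_literal = escaped.search('a.b') is not None

print(unescaped_hits_axb, escaped_hits_axb, escaped_hits_literal)
```

Without escaping, the pattern also accepts strings such as 'axb', which is usually not what is wanted when matching fixed words.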
It appears that you really want to search the string for your regexes. In that case, you need to use re.search (or some variant), not re.match, no matter what you do. As long as none of your regular expressions overlap, you can use the solution above with re.findall:
matches = regex.findall(line)
for word in matches:
    print("found {word} in line".format(word=word))
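The difference between re.match and re.search can be checked with a short sketch. The sample line is made up for illustration; the combined pattern is built the same way as in this answer.

```python
import re

strings = 'foo', 'bar', 'baz', 'qux'
regex = re.compile('|'.join(re.escape(x) for x in strings))

line = 'I know the word baz'  # hypothetical input: the word is not at the start

at_start = regex.match(line)    # match only tries position 0
anywhere = regex.search(line)   # search scans the whole string

print(at_start)
print(anywhere.group(0))
```

Here match finds nothing because the line does not begin with one of the words, while search locates 'baz' later in the string.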