match text against multiple regex in python
Full question
https://stackoverflow.com/questions/1303...
Question links:
[Match a line with multiple regex using Python]: https://stackoverflow.com/questions/8888...
Accepted answer links:
[the link you have provided]: https://stackoverflow.com/questions/8888...
--
Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...
ACCEPTED ANSWER
Score 7
The approach below works if you only want the matched text. If you need to know which regular expression in the list triggered a match, you are out of luck and will probably need to loop.
Based on the link you have provided:
import re

regexes = 'quick', 'brown', 'fox'
# Wrap each pattern in a non-capturing group, then join them into one alternation.
combinedRegex = re.compile('|'.join('(?:{0})'.format(x) for x in regexes))

lines = 'The quick brown fox jumps over the lazy dog', 'Lorem ipsum dolor sit amet', 'The lazy dog jumps over the fox'

for line in lines:
    print(combinedRegex.findall(line))
outputs:
['quick', 'brown', 'fox']
[]
['fox']
The point here is that you do not loop over the regexes but combine them into a single pattern.
The difference from the looping approach is that re.findall will not find overlapping matches. For instance, if your regexes were regexes = 'bro', 'own', the output of the code above would be:
['bro']
[]
[]
whereas the looping approach would result in:
['bro', 'own']
[]
[]
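For comparison, the looping approach could look roughly like the sketch below. The helper name find_per_pattern is hypothetical and not part of the original answer; it tests each pattern separately, so it also reports which pattern produced which match.

```python
import re

# Hypothetical helper (not from the original answer): test each pattern on its own.
def find_per_pattern(patterns, line):
    """Return (pattern, matches) pairs for every pattern that matches the line."""
    results = []
    for pattern in patterns:
        matches = re.findall(pattern, line)
        if matches:
            results.append((pattern, matches))
    return results

regexes = 'bro', 'own'
line = 'The quick brown fox jumps over the lazy dog'

# Looping: each pattern scans the whole line independently.
looped = find_per_pattern(regexes, line)

# Combined: one scan, so 'own' is skipped because 'bro' already consumed the 'o'.
combined = re.findall('|'.join('(?:{0})'.format(x) for x in regexes), line)

print(looped)    # [('bro', ['bro']), ('own', ['own'])]
print(combined)  # ['bro']
```

The trade-off is one pass over the line per pattern instead of a single pass for the combined pattern.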
ANSWER 2
Score 1
If you're just trying to match literal strings, it's probably easier to just do:
strings = 'foo','bar','baz','qux'
regex = re.compile('|'.join(re.escape(x) for x in strings))
and then you can test the whole thing at once:
match = regex.match(line)
Of course, you can get the string which matched from the resulting MatchObject:
if match:
    matching_string = match.group(0)
In action:
import re

strings = 'foo', 'bar', 'baz', 'qux'
regex = re.compile('|'.join(re.escape(x) for x in strings))

lines = 'foo is a word I know', 'baz is a word I know', 'buz is unfamiliar to me'

for line in lines:
    # re.match only matches at the start of the line.
    match = regex.match(line)
    if match:
        print(match.group(0))
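To see why re.escape matters when the strings are meant literally, here is a small sketch. The sample strings 'a.b' and 'c+d' and the test text are made up for illustration; they contain characters that have a special meaning in a regex.

```python
import re

# Hypothetical literals containing regex metacharacters ('.' and '+').
strings = 'a.b', 'c+d'
text = 'axb a.b c+d ccd'

# Escaped: '.' and '+' are matched as ordinary characters.
escaped = re.compile('|'.join(re.escape(x) for x in strings))

# Unescaped: '.' matches any character and '+' means "one or more".
unescaped = re.compile('|'.join(strings))

print(escaped.findall(text))    # ['a.b', 'c+d']
print(unescaped.findall(text))  # ['axb', 'a.b', 'ccd']
```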
It appears that you actually want to search the string for your regex. In that case you need re.search (or some variant), not re.match, no matter what you do. As long as none of your regular expressions overlap, you can use the solution above with re.findall:
matches = regex.findall(line)
for word in matches:
    print("found {word} in line".format(word=word))
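The difference between the three calls can be seen side by side in a short sketch. The sample line is made up for illustration; the pattern is built the same way as above.

```python
import re

strings = 'foo', 'bar', 'baz', 'qux'
regex = re.compile('|'.join(re.escape(x) for x in strings))

# Hypothetical sample line: the known words are not at the start.
line = 'I know the words baz and foo'

at_start = regex.match(line)      # only tries to match at position 0
first_hit = regex.search(line)    # scans forward and returns the first match
all_hits = regex.findall(line)    # returns every non-overlapping match

print(at_start)            # None
print(first_hit.group(0))  # baz
print(all_hits)            # ['baz', 'foo']
```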