Python Regex slower than expected
--
Music by Eric Matyas
https://www.soundimage.org
Track title: Hypnotic Puzzle2
--
Chapters
00:00 Question
01:27 Accepted answer (Score 2)
02:05 Answer 2 (Score 2)
03:07 Thank you
--
Full question
https://stackoverflow.com/questions/3116...
Question links:
[article]: https://www.loggly.com/blog/regexes-the-.../
Accepted answer links:
http://www.regular-expressions.info/cata...
[Greedy vs. Reluctant vs. Possessive Quantifiers]: https://stackoverflow.com/questions/5319...
Answer 2 links:
[re.compile]: https://docs.python.org/2/library/re.htm...
--
Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...
--
Tags
#python #regex #performance
#avk47
ACCEPTED ANSWER
Score 2
The more a regular expression has to backtrack, the slower it is.
This might not hold for very small input data. However, who would care about the performance on small data? :D
This topic is well covered in this article:
Also there are interesting contributions in this question:
ANSWER 2
Score 2
You also are not using the re.compile feature of Python regexes, meaning your search time also includes time for the re module to compile the regex at each iteration.
>>> print(timeit(setup="import re", stmt='''r = re.search(r'(\d{1,3}\.\d{1,3}.\d{1,3}.\d{1,3})', '192.168.1.1 999.999.999.999')''', number=1000000))
0.73820400238
>>> print(timeit(setup="import re; regex = re.compile(r'(\d{1,3}\.\d{1,3}.\d{1,3}.\d{1,3})')", stmt='''r = regex.search('192.168.1.1 999.999.999.999')''', number=1000000))
0.271140813828
>>> print(timeit(setup="import re; regex = re.compile(r'((?:\d{1,3}\.){3}\d{1,3})')", stmt='''r = regex.search('192.168.1.1 999.999.999.999')''', number=1000000))
0.31952214241
>>> print(timeit(setup="import re; regex = re.compile(r'(\d{1,2}/\w{3}/[2][0]\d{2}:\d{2}:\d{2}:\d{2}\s[+][0]{4})')", stmt='''r = regex.search("[23/Jun/2015:11:10:57 +0000]")''', number=1000000))
0.371844053268
>>> 
The difference between the greedy and non-greedy regex here is actually much closer to expected when you pre-compile. The rest of the explanation goes to backtracking.
We can see that your tests speed up almost by a factor of 3 if you pre-compile your regexes for a large number of iterations.
This answer is meant to complement @mescalinum's answer, but for a large number of regexes, you should really be compiling the regexes ahead of time for a fair comparison.