Comparing the speed of startswith() vs. in()

Full question
https://stackoverflow.com/questions/4452...

Accepted answer links:
[cmp_outcome]: https://github.com/python/cpython/blob/m...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #performance #python3x #time

ACCEPTED ANSWER

Score 13

startswith is slower here because the method has to be looked up and then invoked. The in operator is specialized and compiles directly to COMPARE_OP (which calls cmp_outcome, which in turn calls PySequence_Contains), while str.startswith goes through slower bytecode:

2 LOAD_ATTR                0 (startswith)
4 LOAD_FAST                1 (word)
6 CALL_FUNCTION            1              # the slow part
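
To check this on your own interpreter, you can disassemble both variants. The sketch below is an illustration rather than part of the original answer: the two function bodies are minimal stand-ins, and the opnames helper is a hypothetical name. Opcode names differ between CPython versions (newer releases emit CONTAINS_OP rather than COMPARE_OP for in, and the call opcode has been renamed more than once), so the exact listing will likely not match the one above.

```python
import dis


def in_test(sent, word):
    # Operator form: compiles to a single containment/compare opcode.
    return word in sent


def startswith_test(sent, word):
    # Method form: needs an attribute/method load plus a call opcode.
    return sent.startswith(word)


def opnames(func):
    """Return the opcode names of func, in order (hypothetical helper)."""
    return [ins.opname for ins in dis.get_instructions(func)]


in_ops = opnames(in_test)
startswith_ops = opnames(startswith_test)

print("in:        ", in_ops)
print("startswith:", startswith_ops)
```

Whatever the version, the startswith variant should show some form of call instruction that the in variant lacks.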

Replacing in with an explicit __contains__ call, which forces a method call in that case too, pretty much negates the speed difference:

setup1='''
def in_test(sent, word):
    if sent.__contains__(word):
        return True
    else:
        return False
'''

And, the timings:

import timeit

# setup2 (defined elsewhere, not shown) must define startswith_test
print(timeit.timeit('in_test("this is a standard sentence", "this")', setup=setup1))
print(timeit.timeit('startswith_test("this is a standard sentence", "this")', setup=setup2))
# 0.43849368393421173
# 0.4993997460696846

The in operator wins in the original benchmark for two reasons: it skips the whole function call setup, and it is given a favorable case, since the match sits at the very start of the string.
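
The comparison above can be reproduced in one self-contained script. This is a rough sketch under a few assumptions: the three functions are simplified stand-ins (contains_test is a hypothetical name for the __contains__ variant), the run count is lowered from timeit's default of 1,000,000 to keep it quick, and absolute numbers will vary by machine and Python version.

```python
import timeit


def in_test(sent, word):
    return word in sent


def contains_test(sent, word):
    # Same check as in_test, but forced through a method lookup and call.
    return sent.__contains__(word)


def startswith_test(sent, word):
    return sent.startswith(word)


sentence = "this is a standard sentence"
number = 100_000  # fewer runs than timeit's default, to keep this quick

timings = {}
for func in (in_test, contains_test, startswith_test):
    # Pass the function and data in explicitly via the globals argument.
    timings[func.__name__] = timeit.timeit(
        'f(s, "this")', globals={"f": func, "s": sentence}, number=number
    )

for name, seconds in timings.items():
    print(f"{name:16s} {seconds:.4f}s")
```

If the explanation above holds on your interpreter, contains_test and startswith_test should land close to each other, with in_test somewhat ahead.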

ANSWER 2

Score 6

You're comparing an operator on strings with an attribute lookup followed by a function call. The latter has a higher fixed overhead, even though the former can take a long time on a lot of data.

Additionally, you're looking for the first word, so when it matches, in looks at just as much data as startswith(). To see the difference, you should look at a pessimistic case (no match found, or a match at the end of the string):

setup1='''
data = "xxxx"*1000
def ....

print(timeit.timeit('in_test(data, "this")', setup=setup1))
0.932795189000899
print(timeit.timeit('startswith_test(data, "this")', setup=setup2))
0.22242475600069156
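
A complete version of this pessimistic case might look like the sketch below. The function bodies are assumptions (the originals are elided above), the namespace dict is an illustrative name, and the run count is reduced so the script finishes quickly. Timings are machine-dependent, so no expected output is shown.

```python
import timeit

data = "xxxx" * 1000  # 4000 characters with no occurrence of "this"


def in_test(sent, word):
    # Has to scan the whole string before it can report "not found".
    return word in sent


def startswith_test(sent, word):
    # Only needs to inspect the first len(word) characters.
    return sent.startswith(word)


namespace = {"in_test": in_test, "startswith_test": startswith_test, "data": data}
number = 100_000  # fewer runs than timeit's default, to keep this quick

in_time = timeit.timeit('in_test(data, "this")', globals=namespace, number=number)
startswith_time = timeit.timeit(
    'startswith_test(data, "this")', globals=namespace, number=number
)

print(f"in:         {in_time:.4f}s")
print(f"startswith: {startswith_time:.4f}s")
```

Moving the match to the end of the string (for example data + "this") is the other pessimistic case mentioned above: in still has to scan almost everything, while startswith stops after the first few characters.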

ANSWER 3

Score 0

If you look at bytecode produced by your functions:

>>> dis.dis(in_test)
  2           0 LOAD_FAST                1 (word)
              3 LOAD_FAST                0 (sent)
              6 COMPARE_OP               6 (in)
              9 POP_JUMP_IF_FALSE       16

  3          12 LOAD_CONST               1 (True)
             15 RETURN_VALUE

  5     >>   16 LOAD_CONST               2 (False)
             19 RETURN_VALUE
             20 LOAD_CONST               0 (None)
             23 RETURN_VALUE

you'll notice there is a lot of overhead not directly related to string matching (the conditional jump and the separate return paths). Running the test on a simpler function:

def in_test(sent, word):
    return word in sent

will be more reliable.
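
One way to see the difference is to count the bytecode instructions of both versions. In this sketch, in_test_verbose is an assumed reconstruction of the if/else form that the disassembly above appears to come from, and both function names are hypothetical. Exact counts depend on the CPython version.

```python
import dis


def in_test_verbose(sent, word):
    # Assumed reconstruction of the if/else version disassembled above.
    if word in sent:
        return True
    else:
        return False


def in_test_simple(sent, word):
    # The simplified version: return the result of the test directly.
    return word in sent


verbose_count = len(list(dis.get_instructions(in_test_verbose)))
simple_count = len(list(dis.get_instructions(in_test_simple)))

print("verbose:", verbose_count, "instructions")
print("simple: ", simple_count, "instructions")
```

Both functions return the same values for string arguments, so the simpler one can stand in for the original when benchmarking.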