The Python Oracle

Comparing the speed of startswith() vs. in

Full question
https://stackoverflow.com/questions/4452...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #performance #python3x #time



ACCEPTED ANSWER

Score 13


The difference comes from having to look up and invoke a method. The in operator is specialized and compiles directly to COMPARE_OP (which calls cmp_outcome which, in turn, calls PySequence_Contains), while str.startswith goes through slower bytecode:

2 LOAD_ATTR                0 (startswith)
4 LOAD_FAST                1 (word)
6 CALL_FUNCTION            1              # the slow part
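
To check this on your own interpreter, here is a minimal sketch that lists the instruction names for both variants. The helper opnames is a hypothetical name introduced for illustration, and the exact opcode names depend on the CPython version (the listing above shows COMPARE_OP and CALL_FUNCTION; newer releases use names such as CONTAINS_OP and CALL), so treat the printed names as version-specific.

```python
import dis


def in_test(sent, word):
    # Operator form: compiles to a single containment opcode.
    return word in sent


def startswith_test(sent, word):
    # Method form: needs an attribute lookup plus a call instruction.
    return sent.startswith(word)


def opnames(func):
    """Return the bytecode instruction names of func (hypothetical helper)."""
    return [ins.opname for ins in dis.get_instructions(func)]


for func in (in_test, startswith_test):
    print(func.__name__, opnames(func))
```

Only the startswith variant should contain a call-style instruction, whatever it happens to be named in your version.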

Replacing in with __contains__, forcing a function call for that case too, pretty much negates the speed difference:

setup1='''
def in_test(sent, word):
    if sent.__contains__(word):
        return True
    else:
        return False
'''

And, the timings:

print(timeit.timeit('in_test("this is a standard sentence", "this")', setup=setup1))
print(timeit.timeit('startswith_test("this is a standard sentence", "this")', setup=setup2))
0.43849368393421173
0.4993997460696846

in wins here because it skips the whole function call setup and because the test case favors it: the word being searched for is at the very start of the string.
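
For anyone who wants to reproduce the comparison in one script, the following is a rough sketch rather than the exact code from the question. contains_test and bench are hypothetical names, the functions are simplified to a single return, and the lambda wrapper adds the same small overhead to every variant. Absolute numbers will differ by machine and Python version.

```python
import timeit

SENTENCE = "this is a standard sentence"


def in_test(sent, word):
    return word in sent


def contains_test(sent, word):
    # Same check as in_test, but forced through a method lookup and call.
    return sent.__contains__(word)


def startswith_test(sent, word):
    return sent.startswith(word)


def bench(func, sent, word, number=200_000):
    """Total seconds for `number` calls of func(sent, word)."""
    return timeit.timeit(lambda: func(sent, word), number=number)


for func in (in_test, contains_test, startswith_test):
    print(f"{func.__name__:16s} {bench(func, SENTENCE, 'this'):.4f}")
```

If the explanation above holds, contains_test should land much closer to startswith_test than in_test does.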




ANSWER 2

Score 6


You're comparing an operator on strings with an attribute lookup plus a function call. The second one will have a higher fixed overhead, even though the first one can take a long time on a lot of data.

Additionally, you're looking for the first word, so when it does match, in looks at just as much data as startswith(). To see the difference, you should test a pessimistic case (no match found, or a match at the end of the string):

setup1='''
data = "xxxx"*1000
def ....

print(timeit.timeit('in_test(data, "this")', setup=setup1))
0.932795189000899
print(timeit.timeit('startswith_test(data, "this")', setup=setup2))
0.22242475600069156
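
Since the setup above is abbreviated, here is a self-contained sketch of the same pessimistic case. The data string is the one from the answer; time_worst_case is a hypothetical helper, and the test functions are simplified stand-ins for the ones in the question.

```python
import timeit

# Long input with no match: "in" has to scan the whole string,
# while startswith() only needs to inspect the first few characters.
data = "xxxx" * 1000


def in_test(sent, word):
    return word in sent


def startswith_test(sent, word):
    return sent.startswith(word)


def time_worst_case(number=100_000):
    """Time both checks on the no-match input; returns {name: seconds}."""
    return {
        func.__name__: timeit.timeit(lambda: func(data, "this"), number=number)
        for func in (in_test, startswith_test)
    }


for name, seconds in time_worst_case().items():
    print(f"{name:16s} {seconds:.4f}")
```

On this input the ordering would be expected to flip in favor of startswith(), as in the timings above, though the exact ratio depends on the machine and interpreter.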



ANSWER 3

Score 0


If you look at the bytecode produced by your functions:

>>> dis.dis(in_test)
  2           0 LOAD_FAST                1 (word)
              3 LOAD_FAST                0 (sent)
              6 COMPARE_OP               6 (in)
              9 POP_JUMP_IF_FALSE       16

  3          12 LOAD_CONST               1 (True)
             15 RETURN_VALUE

  5     >>   16 LOAD_CONST               2 (False)
             19 RETURN_VALUE
             20 LOAD_CONST               0 (None)
             23 RETURN_VALUE

you'll notice that much of the overhead is not directly related to string matching. Running the test on a simpler function:

def in_test(sent, word):
    return word in sent

will be more reliable.
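
One way to see how much the if/else wrapper adds is to count bytecode instructions for both shapes. This is only a sketch: in_test_verbose, in_test_simple and instruction_count are hypothetical names, and the exact counts vary between CPython versions.

```python
import dis


def in_test_verbose(sent, word):
    # Shape used in the question: explicit branch returning True/False.
    if word in sent:
        return True
    else:
        return False


def in_test_simple(sent, word):
    # Simplified shape: return the result of the operator directly.
    return word in sent


def instruction_count(func):
    """Number of bytecode instructions in func (hypothetical helper)."""
    return sum(1 for _ in dis.get_instructions(func))


print("verbose:", instruction_count(in_test_verbose))
print("simple: ", instruction_count(in_test_simple))
```

The simple version executes fewer instructions per call, so less of the measured time is spent on things other than the string check itself.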