The Python Oracle

Why is taking the mod of a number in python faster with exponents?

Become part of the top 3% of the developers by applying to Toptal https://topt.al/25cXVn

--

Music by Eric Matyas
https://www.soundimage.org
Track title: Puzzle Island

--

Chapters
00:00 Question
02:48 Accepted answer (Score 19)
04:36 Answer 2 (Score 6)
06:08 Answer 3 (Score 4)
06:45 Thank you

--

Full question
https://stackoverflow.com/questions/1055...

Answer 1 links:
[dis]: http://docs.python.org/library/dis.html

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #optimization #profiling

#avk47



ACCEPTED ANSWER

Score 20


There is no difference in the generated bytecode, because the compiler does its job well and optimizes away the constant arithmetic expression. That means your test results are just a coincidence (try timing the functions in a different order!).

>>> import dis
>>> dis.dis(test1)
  2           0 SETUP_LOOP              30 (to 33)
              3 LOAD_GLOBAL              0 (xrange)
              6 LOAD_GLOBAL              1 (AMOUNT)
              9 CALL_FUNCTION            1
             12 GET_ITER            
        >>   13 FOR_ITER                16 (to 32)
             16 STORE_FAST               0 (i)

  3          19 LOAD_FAST                0 (i)
             22 LOAD_CONST               1 (65536)
             25 BINARY_MODULO       
             26 STORE_FAST               1 (value)
             29 JUMP_ABSOLUTE           13
        >>   32 POP_BLOCK           

  4     >>   33 LOAD_CONST               0 (None)
             36 RETURN_VALUE        
>>> dis.dis(test5)
  2           0 SETUP_LOOP              30 (to 33)
              3 LOAD_GLOBAL              0 (xrange)
              6 LOAD_GLOBAL              1 (AMOUNT)
              9 CALL_FUNCTION            1
             12 GET_ITER            
        >>   13 FOR_ITER                16 (to 32)
             16 STORE_FAST               0 (i)

  3          19 LOAD_FAST                0 (i)
             22 LOAD_CONST               3 (65536)
             25 BINARY_MODULO       
             26 STORE_FAST               1 (value)
             29 JUMP_ABSOLUTE           13
        >>   32 POP_BLOCK           

  4     >>   33 LOAD_CONST               0 (None)
             36 RETURN_VALUE        

(well actually there is a difference: The number is stored at different offsets in the constant table. I can't imagine this causing any difference, though).

For completeness, here's a proper test that uses the timeit module:

import timeit

setup = "i = 1337"

best1 = best2 = float("inf")
for _ in range(5000):
  best1 = min(best1, timeit.timeit("i % 65536", setup=setup, number=10000))
for _ in range(5000):
  best2 = min(best2, timeit.timeit("i % (2**16)", setup=setup, number=10000))
print best1
print best2

Note that I am measuring the minimum time needed, rather than the average. If it takes longer for some reason, this just means that it was interrupted more often (because the code doesn't depend on anything but the power of your CPU).




ANSWER 2

Score 6


Hmmm, using dis to show the python bytes codes shows that the functions are identical. Python has optimised the constant (as expected). So I suspect the time differences are caching effects. The timings on my laptop bear this out (using Python 2.7.3 64 bit on linux)

Fri May 11 23:37:49 2012    results

     8 function calls in 38.825 seconds

Ordered by: call count, name/file/line

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
     1    0.000    0.000   38.825   38.825 <string>:1(<module>)
     1    0.000    0.000   38.825   38.825 z.py:31(run_tests)
     1    7.880    7.880    7.880    7.880 z.py:6(test1)
     1    7.658    7.658    7.658    7.658 z.py:11(test2)
     1    7.806    7.806    7.806    7.806 z.py:16(test3)
     1    7.784    7.784    7.784    7.784 z.py:21(test4)
     1    7.697    7.697    7.697    7.697 z.py:26(test5)

All pretty much identical

>>> from dis import dis
>>> def test1():
...     for i in xrange(AMOUNT):
...         value = i % 65536
...     return
... 
>>> def test5():
...     for i in xrange(AMOUNT):
...         value = i % (2**16)
...     return
... 
>>> dis(test1)
  2           0 SETUP_LOOP              30 (to 33)
              3 LOAD_GLOBAL              0 (xrange)
              6 LOAD_GLOBAL              1 (AMOUNT)
              9 CALL_FUNCTION            1
             12 GET_ITER            
        >>   13 FOR_ITER                16 (to 32)
             16 STORE_FAST               0 (i)

  3          19 LOAD_FAST                0 (i)
             22 LOAD_CONST               1 (65536)
             25 BINARY_MODULO       
             26 STORE_FAST               1 (value)
             29 JUMP_ABSOLUTE           13
        >>   32 POP_BLOCK           

  4     >>   33 LOAD_CONST               0 (None)
             36 RETURN_VALUE        
>>> dis(test5)
  2           0 SETUP_LOOP              30 (to 33)
              3 LOAD_GLOBAL              0 (xrange)
              6 LOAD_GLOBAL              1 (AMOUNT)
              9 CALL_FUNCTION            1
             12 GET_ITER            
        >>   13 FOR_ITER                16 (to 32)
             16 STORE_FAST               0 (i)

  3          19 LOAD_FAST                0 (i)
             22 LOAD_CONST               3 (65536)
             25 BINARY_MODULO       
             26 STORE_FAST               1 (value)
             29 JUMP_ABSOLUTE           13
        >>   32 POP_BLOCK           

  4     >>   33 LOAD_CONST               0 (None)
             36 RETURN_VALUE        
>>> 



ANSWER 3

Score 5


You have run every test only once. The speed of your CPU isn't the same all the time, at the start of the test it was most probably sleeping and that's why the first test was slower. For benchmarking small parts of code (like mod) use timeit module:

>>> timeit.timeit('for i in range(10000): i % 65536', number=1000)
0.8686108589172363
>>> timeit.timeit('for i in range(10000): i % 256**2', number=1000)
0.862062931060791
>>> timeit.timeit('for i in range(10000): i % 4**8', number=1000)
0.8644928932189941
>>> timeit.timeit('for i in range(10000): i % 2**16', number=1000)
0.8643178939819336
>>> timeit.timeit('for i in range(10000): i % 65536', number=1000)
0.8640358448028564

You can see that the average is always around 0.864.