The Python Oracle

Why is "if not (a and b)" faster than "if not a or not b"?

This video explains
Why is "if not (a and b)" faster than "if not a or not b"?

--

Become part of the top 3% of the developers by applying to Toptal
https://topt.al/25cXVn

--

Music by Eric Matyas
https://www.soundimage.org
Track title: Ancient Construction

--

Chapters
00:00 Question
02:24 Accepted answer (Score 29)
04:38 Answer 2 (Score 3)
10:29 Thank you

--

Full question
https://stackoverflow.com/questions/2955...

Accepted answer links:
[dis module]: https://docs.python.org/2/library/dis.ht...

Answer 2 links:
[Is it appropriate to share the results of my research toward solving my own minor questions?]: https://meta.stackoverflow.com/q/293686/...
[premature optimization is the root of all evil]: http://en.wikiquote.org/wiki/Donald_Knut...
[Mephy's comment]: https://stackoverflow.com/questions/2955...
[timeit docs]: https://docs.python.org/2/library/timeit...
[factorial]: https://stackoverflow.com/a/28477818/401...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #python27 #ifstatement #logicaloperators #microoptimization

#avk47



ACCEPTED ANSWER

Score 29


TL;DR

The not_or_chk function requires two unary operations in addition to two jumps (in the worst case), while the and_chk function only has the two jumps (again, in the worst case).

Details

The dis module to the rescue! The dis module lets you take a look at the Python bytecode disassembly of your code. For example:

import dis

"""Test method returns True if either argument is falsey, else False."""

def and_chk((a, b)):
    if not (a and b):
        return True
    return False

def not_or_chk((a, b)):
    if not a or not b:
        return True
    return False

print("And Check:\n")
print(dis.dis(and_chk))

print("Or Check:\n")
print(dis.dis(not_or_chk))

Produces this output:

And Check:

  5           0 LOAD_FAST                0 (.0)
              3 UNPACK_SEQUENCE          2
              6 STORE_FAST               1 (a)
              9 STORE_FAST               2 (b)

  6          12 LOAD_FAST                1 (a)    * This block is the *
             15 JUMP_IF_FALSE_OR_POP    21        * disassembly of    *
             18 LOAD_FAST                2 (b)    * the "and_chk"     *
        >>   21 POP_JUMP_IF_TRUE        28        * function          *

  7          24 LOAD_GLOBAL              0 (True)
             27 RETURN_VALUE

  8     >>   28 LOAD_GLOBAL              1 (False)
             31 RETURN_VALUE
None
Or Check:

 10           0 LOAD_FAST                0 (.0)
              3 UNPACK_SEQUENCE          2
              6 STORE_FAST               1 (a)
              9 STORE_FAST               2 (b)

 11          12 LOAD_FAST                1 (a)    * This block is the *
             15 UNARY_NOT                         * disassembly of    *
             16 POP_JUMP_IF_TRUE        26        * the "not_or_chk"  *
             19 LOAD_FAST                2 (b)    * function          *
             22 UNARY_NOT
             23 POP_JUMP_IF_FALSE       30

 12     >>   26 LOAD_GLOBAL              0 (True)
             29 RETURN_VALUE

 13     >>   30 LOAD_GLOBAL              1 (False)
             33 RETURN_VALUE
None

Take a look at the two blocks of Python bytecode that I've marked with the asterisks. Those blocks are your two disassembled functions. Note that and_chk only has two jumps, and the calculations in the function are made while deciding whether or not to take the jump.

On the other hand, the not_or_chkfunction requires the not operation to be carried out twice in the worst case, in addition to the interpreter deciding whether or not to take the jump.




ANSWER 2

Score 3


I just noticed this question via your Meta SO question: Is it appropriate to share the results of my research toward solving my own minor questions?. As you mention in that question (and one of the tags on this question indicates), this sort of thing falls into the category of micro-optimization. Ideally, one shouldn't need to worry about such minor performance differences, and as Knuth says, premature optimization is the root of all evil. But I guess it is fun and instructive to investigate such matters, as it can give you a better feel for how Python works "under the hood".

Mephy's comment prompted me to see what the timing differences were for if-less versions of your functions. The results are interesting, IMHO. I also took the opportunity to streamline your testing procedure.

#!/usr/bin/env python

''' Do timeit tests on various implementations of NAND

    NAND returns True if either argument is falsey, else False.
    From https://stackoverflow.com/q/29551438/4014959
    Written by PM 2Ring 2015.04.09
'''

from timeit import Timer
import dis

def and_chk(a, b):
    return not (a and b)

def and_chk_if(a, b):
    if not (a and b):
        return True
    else:
        return False

def not_or_chk(a, b):
    return not a or not b

def not_or_chk_if(a, b):
    if not a or not b:
        return True
    else:
        return False


#All the functions
funcs = (
    and_chk,
    and_chk_if,
    not_or_chk,
    not_or_chk_if,
)

#Argument tuples to test the functions with
bools = (0, 1)
bool_tups = [(u, v) for u in bools for v in bools]


def show_dis():
    ''' Show the disassembly for each function '''
    print 'Disassembly'
    for func in funcs:
        fname = func.func_name
        print '\n%s' % fname
        dis.dis(func)
    print


def verify():
    ''' Verify that the functions actually perform as intended '''
    print 'Verifying...'
    for func in funcs:
        fname = func.func_name
        print '\n%s' % fname
        for args in bool_tups:
            print args, func(*args)
    print


def time_test(loops, reps):
    ''' Print timing stats for all the functions '''
    print 'Timing tests\nLoops = %d, Repetitions = %d' % (loops, reps)

    for func in funcs:
        fname = func.func_name
        print '\n%s' % fname
        setup = 'from __main__ import %s' % fname
        for args in bool_tups:
            t = Timer('%s%s' % (fname, args), setup)
            r = t.repeat(reps, loops)
            r.sort()
            print args, r


show_dis()
verify()
time_test(loops=520000, reps=3)

output

Disassembly

and_chk
 13           0 LOAD_FAST                0 (a)
              3 JUMP_IF_FALSE            4 (to 10)
              6 POP_TOP             
              7 LOAD_FAST                1 (b)
        >>   10 UNARY_NOT           
             11 RETURN_VALUE        

and_chk_if
 16           0 LOAD_FAST                0 (a)
              3 JUMP_IF_FALSE            4 (to 10)
              6 POP_TOP             
              7 LOAD_FAST                1 (b)
        >>   10 JUMP_IF_TRUE             5 (to 18)
             13 POP_TOP             

 17          14 LOAD_GLOBAL              0 (True)
             17 RETURN_VALUE        
        >>   18 POP_TOP             

 19          19 LOAD_GLOBAL              1 (False)
             22 RETURN_VALUE        
             23 LOAD_CONST               0 (None)
             26 RETURN_VALUE        

not_or_chk
 22           0 LOAD_FAST                0 (a)
              3 UNARY_NOT           
              4 JUMP_IF_TRUE             5 (to 12)
              7 POP_TOP             
              8 LOAD_FAST                1 (b)
             11 UNARY_NOT           
        >>   12 RETURN_VALUE        

not_or_chk_if
 25           0 LOAD_FAST                0 (a)
              3 UNARY_NOT           
              4 JUMP_IF_TRUE             8 (to 15)
              7 POP_TOP             
              8 LOAD_FAST                1 (b)
             11 UNARY_NOT           
             12 JUMP_IF_FALSE            5 (to 20)
        >>   15 POP_TOP             

 26          16 LOAD_GLOBAL              0 (True)
             19 RETURN_VALUE        
        >>   20 POP_TOP             

 28          21 LOAD_GLOBAL              1 (False)
             24 RETURN_VALUE        
             25 LOAD_CONST               0 (None)
             28 RETURN_VALUE        

Verifying...

and_chk
(0, 0) True
(0, 1) True
(1, 0) True
(1, 1) False

and_chk_if
(0, 0) True
(0, 1) True
(1, 0) True
(1, 1) False

not_or_chk
(0, 0) True
(0, 1) True
(1, 0) True
(1, 1) False

not_or_chk_if
(0, 0) True
(0, 1) True
(1, 0) True
(1, 1) False

Timing tests
Loops = 520000, Repetitions = 3

and_chk
(0, 0) [0.36773014068603516, 0.37793493270874023, 0.38387489318847656]
(0, 1) [0.36292791366577148, 0.37119913101196289, 0.37146902084350586]
(1, 0) [0.38673520088195801, 0.39340090751647949, 0.39670205116271973]
(1, 1) [0.38938498497009277, 0.39505791664123535, 0.40034103393554688]

and_chk_if
(0, 0) [0.4021449089050293, 0.40345501899719238, 0.41558098793029785]
(0, 1) [0.40686416625976562, 0.41213202476501465, 0.44800615310668945]
(1, 0) [0.4281308650970459, 0.42916202545166016, 0.43681907653808594]
(1, 1) [0.46246123313903809, 0.46759700775146484, 0.47544980049133301]

not_or_chk
(0, 0) [0.35435700416564941, 0.36368083953857422, 0.36867713928222656]
(0, 1) [0.35602092742919922, 0.35732793807983398, 0.36237406730651855]
(1, 0) [0.39550518989562988, 0.40660715103149414, 0.40977287292480469]
(1, 1) [0.4060060977935791, 0.4112389087677002, 0.41296815872192383]

not_or_chk_if
(0, 0) [0.4308779239654541, 0.44109201431274414, 0.45827698707580566]
(0, 1) [0.43600606918334961, 0.4370419979095459, 0.47623395919799805]
(1, 0) [0.48452401161193848, 0.48769593238830566, 0.49147915840148926]
(1, 1) [0.53107500076293945, 0.54048299789428711, 0.55434417724609375]

These timings were performed using Python 2.6.6 on a 2GHz Pentium 4 (single-core 32 bit) running Mepis 11 (a Debian family Linux distro).

You'll note that I've avoided using your next(tups) strategy to get the arguments for each function call and instead I'm passing the args directly, as constants. The time taken to perform next(tups) should be fairly constant, but it's best to eliminate such overheads, when practical, so that the reported time measurements more accurately reflect the performance of the code we're really interested in.

Also, it's usual to perform several repetitions of the timing loop and take the minimum value; FWIW, I generally do 3 to 5 reps. From the timeit docs:

Note

It’s tempting to calculate mean and standard deviation from the result vector and report these. However, this is not very useful. In a typical case, the lowest value gives a lower bound for how fast your machine can run the given code snippet; higher values in the result vector are typically not caused by variability in Python’s speed, but by other processes interfering with your timing accuracy. So the min() of the result is probably the only number you should be interested in. After that, you should look at the entire vector and apply common sense rather than statistics.

Your Meta post says that you want to perform & report on other micro-optimization experiments, so you might be interested in taking a look at some code I posted a few months ago that does time tests on various implementations of the factorial function.