The Python Oracle

Is the defaultdict in Python's collections module really faster than using setdefault?

This video explains
Is the defaultdict in Python's collections module really faster than using setdefault?

--

Become part of the top 3% of the developers by applying to Toptal
https://topt.al/25cXVn

--

Music by Eric Matyas
https://www.soundimage.org
Track title: Puzzle Game 2

--

Chapters
00:00 Question
01:13 Accepted answer (Score 25)
02:07 Thank you

--

Full question
https://stackoverflow.com/questions/1255...

Question links:
[claim that using defaultdict is faster]: http://docs.python.org/library/collectio...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #collections #defaultdict #setdefault #pythoncollections

#avk47



ACCEPTED ANSWER

Score 25


Yes, there is something "wrong":

You have put the creation of the (default)dict into the statement instead of the setup. Constructing a new defaultdict is more expensive than a normal dict, and usually that's not the bottleneck you should be profiling in a program - after all, you build your data structures once but you use them many times.

If you do your tests like below, you see that defaultdict operations are indeed faster:

>>> import timeit
>>> setup1 = """from collections import defaultdict
... s = [('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red', 1)]
... d = defaultdict(list)"""
>>> stmt1 = """for k, v in s:
...     d[k].append(v)"""
>>> setup2 = """s = [('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red', 1)]
... d = {}"""
>>> stmt2 = """for k, v in s:
...     d.setdefault(k, []).append(v)"""
>>> timeit.timeit(setup=setup1, stmt=stmt1)
1.0283400125194078
>>> timeit.timeit(setup=setup2, stmt=stmt2)
1.7767367580925395

Python 2.7.3 on Win7 x64.




ANSWER 2

Score 0


Adding to Tim's answer, when you make multiple defaultdicts, adding elements becomes costlier. Though initialization adds time, the main difference comes from the time taken to add elements to the multiple defaultdicts.

Here's some code to show the observation

import time
from collections import defaultdict

def time_taken_mult_dd(sz):
    t1 = t2 = 0
    for i1 in range(1000000):
        start1 = time.time()
        d = defaultdict(int)
        end1 = time.time()
        
        start2 = time.time()
        for i2 in range(sz):
            d[i2] += 1
        end2 = time.time()
        
        t1 += end1 - start1
        t2 += end2 - start2
    
    print(f"Time taken for initialization {t1:5.2f}e-06")
    print(f"Time taken for addition       {t2:5.2f}e-06")

def time_taken_one_dd(sz):
    t = 0
    d = defaultdict(int)
    for i1 in range(1000000):
        
        start = time.time()
        for i2 in range(sz):
            d[i2] += 1
        end = time.time()
        
        t += end - start
    
    print(f"Time taken for addition       {t:5.2f}e-06")

And here's the output. Observe how initialization time remains same while adding elements become more time consuming

>>> time_taken_one_dd(10)
Time taken for addition        1.59e-06
>>> time_taken_mult_dd(10)
Time taken for initialization  0.41e-06
Time taken for addition        2.64e-06
>>> 
>>> time_taken_one_dd(50)
Time taken for addition        6.46e-06
>>> time_taken_mult_dd(50)
Time taken for initialization  0.59e-06
Time taken for addition       11.80e-06
>>> 
>>> time_taken_one_dd(100)
Time taken for addition       12.61e-06
>>> time_taken_mult_dd(100)
Time taken for initialization  0.71e-06
Time taken for addition       23.05e-06