Is the defaultdict in Python's collections module really faster than using setdefault?
Is the defaultdict in Python's collections module really faster than using setdefault?
--
Become part of the top 3% of the developers by applying to Toptal
https://topt.al/25cXVn
--
Music by Eric Matyas
https://www.soundimage.org
Track title: Puzzle Game 2
--
Chapters
00:00 Question
01:13 Accepted answer (Score 25)
02:07 Thank you
--
Full question
https://stackoverflow.com/questions/1255...
Question links:
[claim that using defaultdict is faster]: http://docs.python.org/library/collectio...
--
Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...
--
Tags
#python #collections #defaultdict #setdefault #pythoncollections
#avk47
ACCEPTED ANSWER
Score 25
Yes, there is something "wrong":
You have put the creation of the (default)dict into the statement instead of the setup. Constructing a new defaultdict is more expensive than a normal dict, and usually that's not the bottleneck you should be profiling in a program - after all, you build your data structures once but you use them many times.
If you do your tests like below, you see that defaultdict operations are indeed faster:
>>> import timeit
>>> setup1 = """from collections import defaultdict
... s = [('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red', 1)]
... d = defaultdict(list)"""
>>> stmt1 = """for k, v in s:
... d[k].append(v)"""
>>> setup2 = """s = [('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red', 1)]
... d = {}"""
>>> stmt2 = """for k, v in s:
... d.setdefault(k, []).append(v)"""
>>> timeit.timeit(setup=setup1, stmt=stmt1)
1.0283400125194078
>>> timeit.timeit(setup=setup2, stmt=stmt2)
1.7767367580925395
Python 2.7.3 on Win7 x64.
ANSWER 2
Score 0
Adding to Tim's answer, when you make multiple defaultdicts, adding elements becomes costlier. Though initialization adds time, the main difference comes from the time taken to add elements to the multiple defaultdicts.
Here's some code to show the observation
import time
from collections import defaultdict
def time_taken_mult_dd(sz):
t1 = t2 = 0
for i1 in range(1000000):
start1 = time.time()
d = defaultdict(int)
end1 = time.time()
start2 = time.time()
for i2 in range(sz):
d[i2] += 1
end2 = time.time()
t1 += end1 - start1
t2 += end2 - start2
print(f"Time taken for initialization {t1:5.2f}e-06")
print(f"Time taken for addition {t2:5.2f}e-06")
def time_taken_one_dd(sz):
t = 0
d = defaultdict(int)
for i1 in range(1000000):
start = time.time()
for i2 in range(sz):
d[i2] += 1
end = time.time()
t += end - start
print(f"Time taken for addition {t:5.2f}e-06")
And here's the output. Observe how initialization time remains same while adding elements become more time consuming
>>> time_taken_one_dd(10)
Time taken for addition 1.59e-06
>>> time_taken_mult_dd(10)
Time taken for initialization 0.41e-06
Time taken for addition 2.64e-06
>>>
>>> time_taken_one_dd(50)
Time taken for addition 6.46e-06
>>> time_taken_mult_dd(50)
Time taken for initialization 0.59e-06
Time taken for addition 11.80e-06
>>>
>>> time_taken_one_dd(100)
Time taken for addition 12.61e-06
>>> time_taken_mult_dd(100)
Time taken for initialization 0.71e-06
Time taken for addition 23.05e-06