How do I use itertools.groupby()?
Rise to the top 3% as a developer or hire one of them at Toptal: https://topt.al/25cXVn
--------------------------------------------------
Music by Eric Matyas
https://www.soundimage.org
Track title: Hypnotic Orient Looping
--
Chapters
00:00 How Do I Use Itertools.Groupby()?
00:41 Answer 1 Score 75
01:11 Accepted Answer Score 877
02:56 Answer 3 Score 52
03:24 Answer 4 Score 35
04:09 Thank you
--
Full question
https://stackoverflow.com/questions/773/...
--
Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...
--
Tags
#python #itertools
#avk47
ACCEPTED ANSWER
Score 877
IMPORTANT NOTE: You may have to sort your data first.
The part I didn't get is that in the example construction
groups = []
uniquekeys = []
for k, g in groupby(data, keyfunc):
groups.append(list(g)) # Store group iterator as a list
uniquekeys.append(k)
k is the current grouping key, and g is an iterator that you can use to iterate over the group defined by that grouping key. In other words, the groupby iterator itself returns iterators.
Here's an example of that, using clearer variable names:
from itertools import groupby
things = [("animal", "bear"), ("animal", "duck"), ("plant", "cactus"), ("vehicle", "speed boat"), ("vehicle", "school bus")]
for key, group in groupby(things, lambda x: x[0]):
for thing in group:
print("A %s is a %s." % (thing[1], key))
print("")
This will give you the output:
A bear is a animal.
A duck is a animal.A cactus is a plant.
A speed boat is a vehicle.
A school bus is a vehicle.
In this example, things is a list of tuples where the first item in each tuple is the group the second item belongs to.
The groupby() function takes two arguments: (1) the data to group and (2) the function to group it with.
Here, lambda x: x[0] tells groupby() to use the first item in each tuple as the grouping key.
In the above for statement, groupby returns three (key, group iterator) pairs - once for each unique key. You can use the returned iterator to iterate over each individual item in that group.
Here's a slightly different example with the same data, using a list comprehension:
for key, group in groupby(things, lambda x: x[0]):
listOfThings = " and ".join([thing[1] for thing in group])
print(key + "s: " + listOfThings + ".")
This will give you the output:
animals: bear and duck.
plants: cactus.
vehicles: speed boat and school bus.
ANSWER 2
Score 75
The example on the Python docs is quite straightforward:
groups = []
uniquekeys = []
for k, g in groupby(data, keyfunc):
groups.append(list(g)) # Store group iterator as a list
uniquekeys.append(k)
So in your case, data is a list of nodes, keyfunc is where the logic of your criteria function goes and then groupby() groups the data.
You must be careful to sort the data by the criteria before you call groupby or it won't work. groupby method actually just iterates through a list and whenever the key changes it creates a new group.
ANSWER 3
Score 52
A neato trick with groupby is to run length encoding in one line:
[(c,len(list(cgen))) for c,cgen in groupby(some_string)]
will give you a list of 2-tuples where the first element is the char and the 2nd is the number of repetitions.
Edit: Note that this is what separates itertools.groupby from the SQL GROUP BY semantics: itertools doesn't (and in general can't) sort the iterator in advance, so groups with the same "key" aren't merged.
ANSWER 4
Score 35
Another example:
for key, igroup in itertools.groupby(xrange(12), lambda x: x // 5):
print key, list(igroup)
results in
0 [0, 1, 2, 3, 4]
1 [5, 6, 7, 8, 9]
2 [10, 11]
Note that igroup is an iterator (a sub-iterator as the documentation calls it).
This is useful for chunking a generator:
def chunker(items, chunk_size):
'''Group items in chunks of chunk_size'''
for _key, group in itertools.groupby(enumerate(items), lambda x: x[0] // chunk_size):
yield (g[1] for g in group)
with open('file.txt') as fobj:
for chunk in chunker(fobj):
process(chunk)
Another example of groupby - when the keys are not sorted. In the following example, items in xx are grouped by values in yy. In this case, one set of zeros is output first, followed by a set of ones, followed again by a set of zeros.
xx = range(10)
yy = [0, 0, 0, 1, 1, 1, 0, 0, 0, 0]
for group in itertools.groupby(iter(xx), lambda x: yy[x]):
print group[0], list(group[1])
Produces:
0 [0, 1, 2]
1 [3, 4, 5]
0 [6, 7, 8, 9]