Best way to find the intersection of multiple sets?

Become part of the top 3% of the developers by applying to Toptal https://topt.al/25cXVn

--

Music by Eric Matyas
https://www.soundimage.org
Track title: Luau

--

Chapters
00:00 Question
00:30 Accepted answer (Score 607)
01:09 Answer 2 (Score 91)
01:28 Answer 3 (Score 30)
01:55 Answer 4 (Score 12)
03:27 Thank you

--

Full question
https://stackoverflow.com/questions/2541...

Accepted answer links:
[set.intersection()]: https://docs.python.org/library/stdtypes...
[list expansion]: https://stackoverflow.com/questions/3690...

Answer 4 links:
[including Guido himself]: http://lambda-the-ultimate.org/node/587

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #set #setintersection

#avk47

ACCEPTED ANSWER

Score 673

From Python version 2.6 on you can use multiple arguments to set.intersection(), like

u = set.intersection(s1, s2, s3)

If the sets are in a list, this translates to:

u = set.intersection(*setlist)

where *a_list is list expansion

Note that set.intersection is not a static method, but this uses the functional notation to apply intersection of the first set with the rest of the list. So if the argument list is empty this will fail.

ANSWER 2

Score 104

As of 2.6, set.intersection takes arbitrarily many iterables.

>>> s1 = set([1, 2, 3])
>>> s2 = set([2, 3, 4])
>>> s3 = set([2, 4, 6])
>>> s1 & s2 & s3
set([2])
>>> s1.intersection(s2, s3)
set([2])
>>> sets = [s1, s2, s3]
>>> set.intersection(*sets)
set([2])

ANSWER 3

Score 35

Clearly set.intersection is what you want here, but in case you ever need a generalisation of "take the sum of all these", "take the product of all these", "take the xor of all these", what you are looking for is the reduce function:

from operator import and_
from functools import reduce
print(reduce(and_, [{1,2,3},{2,3,4},{3,4,5}])) # = {3}

print(reduce((lambda x,y: x&y), [{1,2,3},{2,3,4},{3,4,5}])) # = {3}

ANSWER 4

Score 14

If you don't have Python 2.6 or higher, the alternative is to write an explicit for loop:

def set_list_intersection(set_list):
  if not set_list:
    return set()
  result = set_list[0]
  for s in set_list[1:]:
    result &= s
  return result

set_list = [set([1, 2]), set([1, 3]), set([1, 4])]
print set_list_intersection(set_list)
# Output: set([1])

You can also use reduce:

set_list = [set([1, 2]), set([1, 3]), set([1, 4])]
print reduce(lambda s1, s2: s1 & s2, set_list)
# Output: set([1])

However, many Python programmers dislike it, including Guido himself:

About 12 years ago, Python aquired lambda, reduce(), filter() and map(), courtesy of (I believe) a Lisp hacker who missed them and submitted working patches. But, despite of the PR value, I think these features should be cut from Python 3000.

So now reduce(). This is actually the one I've always hated most, because, apart from a few examples involving + or *, almost every time I see a reduce() call with a non-trivial function argument, I need to grab pen and paper to diagram what's actually being fed into that function before I understand what the reduce() is supposed to do. So in my mind, the applicability of reduce() is pretty much limited to associative operators, and in all other cases it's better to write out the accumulation loop explicitly.