The Python Oracle

How do I check if there are duplicates in a flat list?


Music by Eric Matyas
https://www.soundimage.org
Track title: Hypnotic Puzzle2

--

Chapters
00:00 How Do I Check If There Are Duplicates In A Flat List?
00:18 Answer 1 Score 69
01:09 Accepted Answer Score 521
01:21 Answer 3 Score 9
02:14 Answer 4 Score 14
02:30 Thank you

--

Full question
https://stackoverflow.com/questions/1541...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #string #list #duplicates

#avk47



ACCEPTED ANSWER

Score 521


Use set() to remove duplicates if all values are hashable, then compare lengths: the list contains duplicates exactly when the set is shorter.

>>> your_list = ['one', 'two', 'one']
>>> len(your_list) != len(set(your_list))
True
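
If you need this check in more than one place, the same idiom wraps into a tiny helper (the name has_duplicates is just illustrative, not from the original answer):

def has_duplicates(items):
    """True if any value occurs more than once; requires hashable items."""
    return len(items) != len(set(items))

print(has_duplicates(['one', 'two', 'one']))    # True
print(has_duplicates(['one', 'two', 'three']))  # False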



ANSWER 2

Score 69


Recommended for short lists only:

any(thelist.count(x) > 1 for x in thelist)

Do not use on a long list -- it can take time proportional to the square of the number of items in the list!

For longer lists with hashable items (strings, numbers, etc.):

def anydup(thelist):
    seen = set()
    for x in thelist:
        if x in seen:
            return True   # a value we have already seen -- duplicate found
        seen.add(x)
    return False
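
For instance, with the sample list from the accepted answer:

print(anydup(['one', 'two', 'one']))    # True
print(anydup(['one', 'two', 'three']))  # False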

If your items are not hashable (sublists, dicts, etc.) it gets hairier, though you can still get O(N log N) if they are at least comparable. But you need to know or test the characteristics of your items (hashable or not, comparable or not) to get the best performance you can: O(N) for hashables, O(N log N) for non-hashable comparables, and otherwise it is down to O(N squared), with nothing one can do about it :-(.
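
A minimal sketch of that sort-based O(N log N) route for items that are comparable but not hashable (the name anydup_sorted is illustrative, not from the original answer):

def anydup_sorted(thelist):
    # Sorting puts equal items next to each other, so one linear pass
    # over neighbouring pairs is enough to spot any duplicate.
    ordered = sorted(thelist)
    return any(a == b for a, b in zip(ordered, ordered[1:]))

print(anydup_sorted([[1, 2], [3], [1, 2]]))  # True -- lists are unhashable but comparable
print(anydup_sorted([[1], [2], [3]]))        # False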




ANSWER 3

Score 14


This is old, but the answers here led me to a slightly different solution. If you are up for abusing comprehensions, you can get short-circuiting this way.

xs = [1, 2, 1]

s = set()
any(x in s or s.add(x) for x in xs)  # True -- short-circuits at the second 1

# You can use a similar approach to actually retrieve the duplicates.
s = set()
duplicates = set(x for x in xs if x in s or s.add(x))  # {1}
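
The trick relies on set.add returning None: for an unseen x the whole expression x in s or s.add(x) is falsy, yet x still gets recorded. A quick demonstration of just that mechanic:

s = set()
print(s.add(1))            # None -- add() mutates the set but returns nothing
print(1 in s)              # True
print(2 in s or s.add(2))  # None (falsy): 2 was unseen and is now recorded
print(2 in s)              # True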



ANSWER 4

Score 9


If you are fond of the functional programming style, here is a useful, self-documented function, tested with doctest.

from functools import reduce

def decompose(a_list):
    """Turns a list into a set of all elements and a set of duplicated elements.

    Returns a pair of sets. The first one contains elements
    that are found at least once in the list. The second one
    contains elements that appear more than once.

    >>> decompose([1,2,3,5,3,2,6])
    ({1, 2, 3, 5, 6}, {2, 3})
    """
    return reduce(
        lambda acc, o: (acc[0].union([o]), acc[1].union(acc[0].intersection([o]))),
        a_list,
        (set(), set()))

if __name__ == "__main__":
    import doctest
    doctest.testmod()

From there you can test uniqueness by checking whether the second set of the returned pair is empty:

def is_set(l):
    """Test if there is no duplicate element in l.

    >>> is_set([1,2,3])
    True
    >>> is_set([1,2,1])
    False
    >>> is_set([])
    True
    """
    return not decompose(l)[1]

Note that this is not efficient, since it explicitly constructs the full decomposition. But staying with reduce, you can come up with something equivalent to (though slightly less efficient than) the seen-set approach in Answer 2:

def is_set(l):
    def func(s, o):
        if o in s:
            raise ValueError("duplicate found")  # stop reduce early
        return s.union([o])
    try:
        reduce(func, l, set())
        return True
    except ValueError:
        return False
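
Used the same way as the doctested version above:

print(is_set([1, 2, 3]))  # True
print(is_set([1, 2, 1]))  # False -- func raises on the repeated element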