The Python Oracle

How to split a list based on a condition?

Become part of the top 3% of the developers by applying to Toptal https://topt.al/25cXVn

--

Music by Eric Matyas
https://www.soundimage.org
Track title: Puzzle Game 5

--

Chapters
00:00 Question
00:41 Accepted answer (Score 147)
01:39 Answer 2 (Score 305)
01:56 Answer 3 (Score 121)
03:37 Answer 4 (Score 28)
04:05 Thank you

--

Full question
https://stackoverflow.com/questions/9490...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #list

#avk47



ACCEPTED ANSWER

Score 156


good = [x for x in mylist if x in goodvals]
bad  = [x for x in mylist if x not in goodvals]

is there a more elegant way to do this?

That code is perfectly readable, and extremely clear!

# files looks like: [ ('file1.jpg', 33L, '.jpg'), ('file2.avi', 999L, '.avi'), ... ]
IMAGE_TYPES = ('.jpg','.jpeg','.gif','.bmp','.png')
images = [f for f in files if f[2].lower() in IMAGE_TYPES]
anims  = [f for f in files if f[2].lower() not in IMAGE_TYPES]

Again, this is fine!

There might be slight performance improvements using sets, but it's a trivial difference, and I find the list comprehension far easier to read, and you don't have to worry about the order being messed up, duplicates being removed as so on.

In fact, I may go another step "backward", and just use a simple for loop:

images, anims = [], []

for f in files:
    if f.lower() in IMAGE_TYPES:
        images.append(f)
    else:
        anims.append(f)

The a list-comprehension or using set() is fine until you need to add some other check or another bit of logic - say you want to remove all 0-byte jpeg's, you just add something like..

if f[1] == 0:
    continue



ANSWER 2

Score 125


Here's the lazy iterator approach:

from itertools import tee

def split_on_condition(seq, condition):
    l1, l2 = tee((condition(item), item) for item in seq)
    return (i for p, i in l1 if p), (i for p, i in l2 if not p)

It evaluates the condition once per item and returns two generators, first yielding values from the sequence where the condition is true, the other where it's false.

Because it's lazy you can use it on any iterator, even an infinite one:

from itertools import count, islice

def is_prime(n):
    return n > 1 and all(n % i for i in xrange(2, n))

primes, not_primes = split_on_condition(count(), is_prime)
print("First 10 primes", list(islice(primes, 10)))
print("First 10 non-primes", list(islice(not_primes, 10)))

Usually though the non-lazy list returning approach is better:

def split_on_condition(seq, condition):
    a, b = [], []
    for item in seq:
        (a if condition(item) else b).append(item)
    return a, b

Edit: For your more specific usecase of splitting items into different lists by some key, heres a generic function that does that:

DROP_VALUE = lambda _:_
def split_by_key(seq, resultmapping, keyfunc, default=DROP_VALUE):
    """Split a sequence into lists based on a key function.

        seq - input sequence
        resultmapping - a dictionary that maps from target lists to keys that go to that list
        keyfunc - function to calculate the key of an input value
        default - the target where items that don't have a corresponding key go, by default they are dropped
    """
    result_lists = dict((key, []) for key in resultmapping)
    appenders = dict((key, result_lists[target].append) for target, keys in resultmapping.items() for key in keys)

    if default is not DROP_VALUE:
        result_lists.setdefault(default, [])
        default_action = result_lists[default].append
    else:
        default_action = DROP_VALUE

    for item in seq:
        appenders.get(keyfunc(item), default_action)(item)

    return result_lists

Usage:

def file_extension(f):
    return f[2].lower()

split_files = split_by_key(files, {'images': IMAGE_TYPES}, keyfunc=file_extension, default='anims')
print split_files['images']
print split_files['anims']



ANSWER 3

Score 29


Problem with all proposed solutions is that it will scan and apply the filtering function twice. I'd make a simple small function like this:

def split_into_two_lists(lst, f):
    a = []
    b = []
    for elem in lst:
        if f(elem):
            a.append(elem)
        else:
            b.append(elem)
    return a, b

That way you are not processing anything twice and also are not repeating code.




ANSWER 4

Score 18


I basically like Anders' approach as it is very general. Here's a version that puts the categorizer first (to match filter syntax) and uses a defaultdict (assumed imported).

def categorize(func, seq):
    """Return mapping from categories to lists
    of categorized items.
    """
    d = defaultdict(list)
    for item in seq:
        d[func(item)].append(item)
    return d