The Python Oracle

Add ID found in list to new column in pandas dataframe

Become part of the top 3% of the developers by applying to Toptal https://topt.al/25cXVn

--

Music by Eric Matyas
https://www.soundimage.org
Track title: Isolated

--

Chapters
00:00 Question
02:58 Accepted answer (Score 9)
03:26 Answer 2 (Score 3)
04:43 Answer 3 (Score 1)
05:20 Answer 4 (Score 1)
05:52 Thank you

--

Full question
https://stackoverflow.com/questions/6098...

Accepted answer links:
[np.intersect1d]: https://docs.scipy.org/doc/numpy/referen...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #python3x #pandas #dataframe

#avk47



ACCEPTED ANSWER

Score 9


Using np.intersect1d to get the intersect of the two lists:

df['bad_id'] = df['Found_IDs'].apply(lambda x: np.intersect1d(x, bad_ids))

      ID                   Found_IDs    bad_id
0  12345        [15443, 15533, 3433]   [15533]
1  15533  [2234, 16608, 12002, 7654]        []
2   6789      [43322, 876544, 36789]  [876544]

Or with just vanilla python using intersect of sets:

bad_ids_set = set(bad_ids)
df['Found_IDs'].apply(lambda x: list(set(x) & bad_ids_set))



ANSWER 2

Score 3


If want test all values of lists in Found_IDs column by all values of bad_ids use:

bad_ids = [15533, 876544]

df['bad_id'] = [any(c in l for c in bad_ids) for l  in df['Found_IDs']]
print (df)
      ID                   Found_IDs  bad_id
0  12345        [15443, 15533, 3433]    True
1  15533  [2234, 16608, 12002, 7654]   False
2   6789      [43322, 876544, 36789]    True

If want all match:

df['bad_id'] = [[c for c in bad_ids if c in l] for l  in df['Found_IDs']]
print (df)
      ID                   Found_IDs    bad_id
0  12345        [15443, 15533, 3433]   [15533]
1  15533  [2234, 16608, 12002, 7654]        []
2   6789      [43322, 876544, 36789]  [876544]

And for first match, if empty list is set False, possible solution, but not recommended mixing boolean and numbers:

df['bad_id'] = [next(iter([c for c in bad_ids if c in l]), False) for l  in df['Found_IDs']]
print (df)
      ID                   Found_IDs  bad_id
0  12345        [15443, 15533, 3433]   15533
1  15533  [2234, 16608, 12002, 7654]   False
2   6789      [43322, 876544, 36789]  876544

Solution with sets:

df['bad_id'] = df['Found_IDs'].map(set(bad_ids).intersection)
print (df)

      ID                   Found_IDs    bad_id
0  12345        [15443, 15533, 3433]   {15533}
1  15533  [2234, 16608, 12002, 7654]        {}
2   6789      [43322, 876544, 36789]  {876544}

And also similar with list comprehension:

df['bad_id'] = [list(set(bad_ids).intersection(l)) for l  in df['Found_IDs']]
print (df)
      ID                   Found_IDs    bad_id
0  12345        [15443, 15533, 3433]   [15533]
1  15533  [2234, 16608, 12002, 7654]        []
2   6789      [43322, 876544, 36789]  [876544]



ANSWER 3

Score 1


You can apply and use np.any:

df['bad_id'] = df['Found_IDs'].apply(lambda x: np.any([c in x for c in bad_ids]))

This return the bool if exist a bad_id in Found_IDs, if you want to retrieve this bad_ids:

df['bad_id'] = df['Found_IDs'].apply(lambda x: [*filter(lambda x: c in x, bad_ids)])

This will return a list of the bad_ids at found_ids, if there is 0 it returns []




ANSWER 4

Score 0


Use explode and groupby aggregate

s = df['Found_IDs'].explode()
df['bad_ids'] = s.isin(bad_ids).groupby(s.index).any()

For bad_ids = [15533, 876544]

>>> df
      ID                   Found_IDs  bad_ids
0  12345        [15443, 15533, 3433]     True
1  15533  [2234, 16608, 12002, 7654]    False
2   6789      [43322, 876544, 36789]     True

OR

For getting values matching

s = df['Found_IDs'].explode()
s.where(s.isin(bad_ids)).groupby(s.index).agg(lambda x: list(x.dropna()))

For bad_ids = [15533, 876544]

      ID                   Found_IDs   bad_ids
0  12345        [15443, 15533, 3433]   [15533]
1  15533  [2234, 16608, 12002, 7654]        []
2   6789      [43322, 876544, 36789]  [876544]