Add ID found in list to new column in pandas dataframe
--
Music by Eric Matyas
https://www.soundimage.org
Track title: Isolated
--
Chapters
00:00 Question
02:58 Accepted answer (Score 9)
03:26 Answer 2 (Score 3)
04:43 Answer 3 (Score 1)
05:20 Answer 4 (Score 1)
05:52 Thank you
--
Full question
https://stackoverflow.com/questions/6098...
Accepted answer links:
[np.intersect1d]: https://docs.scipy.org/doc/numpy/referen...
--
Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...
--
Tags
#python #python3x #pandas #dataframe
#avk47
--
ACCEPTED ANSWER
Score 9
Use np.intersect1d to get the intersection of the two lists:
df['bad_id'] = df['Found_IDs'].apply(lambda x: np.intersect1d(x, bad_ids))
      ID                   Found_IDs    bad_id
0  12345        [15443, 15533, 3433]   [15533]
1  15533  [2234, 16608, 12002, 7654]        []
2   6789      [43322, 876544, 36789]  [876544]
Or with plain Python, using set intersection:
bad_ids_set = set(bad_ids)
df['bad_id'] = df['Found_IDs'].apply(lambda x: list(set(x) & bad_ids_set))
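To try both variants end to end, here is a minimal, self-contained sketch. The DataFrame is rebuilt from the output shown above, and the column names bad_id_np and bad_id_set are placeholders chosen for this illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the output shown in the answer
df = pd.DataFrame({
    "ID": [12345, 15533, 6789],
    "Found_IDs": [
        [15443, 15533, 3433],
        [2234, 16608, 12002, 7654],
        [43322, 876544, 36789],
    ],
})
bad_ids = [15533, 876544]

# NumPy variant: each cell becomes a sorted array of the unique shared values
df["bad_id_np"] = df["Found_IDs"].apply(lambda x: np.intersect1d(x, bad_ids))

# Set variant: build the set once, then intersect it with each row's list
bad_ids_set = set(bad_ids)
df["bad_id_set"] = df["Found_IDs"].apply(lambda x: list(set(x) & bad_ids_set))

print(df)
```

The NumPy variant stores NumPy arrays in each cell, while the set variant stores plain lists whose order is not guaranteed when more than one ID matches.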
ANSWER 2
Score 3
To test whether any value of bad_ids appears in each list of the Found_IDs column, use:
bad_ids = [15533, 876544]
df['bad_id'] = [any(c in l for c in bad_ids) for l in df['Found_IDs']]
print (df)
      ID                   Found_IDs  bad_id
0  12345        [15443, 15533, 3433]    True
1  15533  [2234, 16608, 12002, 7654]   False
2   6789      [43322, 876544, 36789]    True
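The same boolean flag can also be computed with a set, which avoids scanning each list once per bad ID. This is a sketch of an alternative, not the answer's own code; it assumes the IDs are hashable, and the sample DataFrame is rebuilt from the output above.

```python
import pandas as pd

# Sample data reconstructed from the output shown in the answer
df = pd.DataFrame({
    "ID": [12345, 15533, 6789],
    "Found_IDs": [
        [15443, 15533, 3433],
        [2234, 16608, 12002, 7654],
        [43322, 876544, 36789],
    ],
})
bad_ids = [15533, 876544]

# Build the lookup set once, outside the loop
bad_ids_set = frozenset(bad_ids)

# isdisjoint is True when nothing is shared, so negate it to flag a match
df["bad_id"] = [not bad_ids_set.isdisjoint(ids) for ids in df["Found_IDs"]]

print(df)
```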
To get all matches:
df['bad_id'] = [[c for c in bad_ids if c in l] for l in df['Found_IDs']]
print (df)
      ID                   Found_IDs    bad_id
0  12345        [15443, 15533, 3433]   [15533]
1  15533  [2234, 16608, 12002, 7654]        []
2   6789      [43322, 876544, 36789]  [876544]
To get only the first match, with False where nothing matches (a possible solution, but mixing booleans and numbers in one column is not recommended):
df['bad_id'] = [next(iter([c for c in bad_ids if c in l]), False) for l in df['Found_IDs']]
print (df)
      ID                   Found_IDs  bad_id
0  12345        [15443, 15533, 3433]   15533
1  15533  [2234, 16608, 12002, 7654]   False
2   6789      [43322, 876544, 36789]  876544
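One way to sidestep the mixed boolean and integer column is to use a missing-value marker instead of False. The sketch below is an assumption about what a caller might prefer, not part of the original answer: it passes a generator to next with None as the default and stores the result in pandas' nullable Int64 dtype. The column name first_bad_id is hypothetical.

```python
import pandas as pd

# Sample data reconstructed from the output shown in the answer
df = pd.DataFrame({
    "ID": [12345, 15533, 6789],
    "Found_IDs": [
        [15443, 15533, 3433],
        [2234, 16608, 12002, 7654],
        [43322, 876544, 36789],
    ],
})
bad_ids = [15533, 876544]

# The generator stops at the first hit; None marks rows with no match
first_matches = [
    next((c for c in bad_ids if c in ids), None)
    for ids in df["Found_IDs"]
]

# Nullable Int64 keeps the IDs as integers and shows <NA> for missing rows
df["first_bad_id"] = pd.array(first_matches, dtype="Int64")

print(df)
```

Rows without a match can then be selected with df["first_bad_id"].isna().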
Solution with sets:
df['bad_id'] = df['Found_IDs'].map(set(bad_ids).intersection)
print (df)
      ID                   Found_IDs    bad_id
0  12345        [15443, 15533, 3433]   {15533}
1  15533  [2234, 16608, 12002, 7654]        {}
2   6789      [43322, 876544, 36789]  {876544}
A similar approach with a list comprehension:
df['bad_id'] = [list(set(bad_ids).intersection(l)) for l in df['Found_IDs']]
print (df)
      ID                   Found_IDs    bad_id
0  12345        [15443, 15533, 3433]   [15533]
1  15533  [2234, 16608, 12002, 7654]        []
2   6789      [43322, 876544, 36789]  [876544]
ANSWER 3
Score 1
You can use apply with np.any:
df['bad_id'] = df['Found_IDs'].apply(lambda x: np.any([c in x for c in bad_ids]))
This returns a boolean indicating whether any bad_id exists in Found_IDs. If you want to retrieve the matching bad_ids instead:
df['bad_id'] = df['Found_IDs'].apply(lambda x: [*filter(lambda c: c in x, bad_ids)])
This returns a list of the bad_ids found in Found_IDs; if there are none, it returns [].
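A self-contained sketch of both apply calls follows, with the inner lambda taking the candidate ID c as its parameter and the row's list x captured from the outer lambda. The sample data and the column names has_bad_id and bad_id are assumptions made for this illustration.

```python
import numpy as np
import pandas as pd

# Sample data matching the example used in the other answers
df = pd.DataFrame({
    "ID": [12345, 15533, 6789],
    "Found_IDs": [
        [15443, 15533, 3433],
        [2234, 16608, 12002, 7654],
        [43322, 876544, 36789],
    ],
})
bad_ids = [15533, 876544]

# Boolean flag: True if at least one bad ID occurs in the row's list
df["has_bad_id"] = df["Found_IDs"].apply(
    lambda x: np.any([c in x for c in bad_ids])
)

# Matching values: the inner lambda tests each candidate c against the row x
df["bad_id"] = df["Found_IDs"].apply(
    lambda x: [*filter(lambda c: c in x, bad_ids)]
)

print(df)
```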
ANSWER 4
Score 0
Use explode and a groupby aggregation:
s = df['Found_IDs'].explode()
df['bad_ids'] = s.isin(bad_ids).groupby(s.index).any()
For bad_ids = [15533, 876544]
>>> df
      ID                   Found_IDs  bad_ids
0  12345        [15443, 15533, 3433]     True
1  15533  [2234, 16608, 12002, 7654]    False
2   6789      [43322, 876544, 36789]     True
Or, to get the matching values:
s = df['Found_IDs'].explode()
df['bad_ids'] = s.where(s.isin(bad_ids)).groupby(s.index).agg(lambda x: list(x.dropna()))
For bad_ids = [15533, 876544]
      ID                   Found_IDs   bad_ids
0  12345        [15443, 15533, 3433]   [15533]
1  15533  [2234, 16608, 12002, 7654]        []
2   6789      [43322, 876544, 36789]  [876544]
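The explode approach can be run end to end as in the sketch below. It assumes pandas 0.25 or later (where Series.explode is available), groups on the index level rather than on s.index, and uses the illustrative column names has_bad_id and bad_ids.

```python
import pandas as pd

# Sample data matching the example used in the other answers
df = pd.DataFrame({
    "ID": [12345, 15533, 6789],
    "Found_IDs": [
        [15443, 15533, 3433],
        [2234, 16608, 12002, 7654],
        [43322, 876544, 36789],
    ],
})
bad_ids = [15533, 876544]

# One row per list element; the original row label is repeated in the index
s = df["Found_IDs"].explode()
is_bad = s.isin(bad_ids)

# Collapse back to one value per original row
df["has_bad_id"] = is_bad.groupby(level=0).any()

# Non-matching elements become NaN and are dropped before collecting the list
df["bad_ids"] = (
    s.where(is_bad)
    .groupby(level=0)
    .agg(lambda x: list(x.dropna()))
)

print(df)
```

Because the grouped results share the original index labels, assigning them back to df aligns each value with its source row.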