Assigning True/False if a token is present in a data-frame
--
Music by Eric Matyas
https://www.soundimage.org
Track title: Lost Meadow
--
Chapters
00:00 Question
01:31 Accepted answer (Score 4)
01:44 Answer 2 (Score 2)
01:59 Answer 3 (Score 2)
03:02 Answer 4 (Score 0)
03:31 Thank you
--
Full question
https://stackoverflow.com/questions/7060...
Answer 1 links:
[vectorized]: https://pandas.pydata.org/docs/reference...
--
Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...
--
Tags
#python #pandas #dataframe #text #nlp
#avk47
ACCEPTED ANSWER
Score 4
Try:
df["trumpMention"] = df["keywords"].apply(lambda x: "Trump, Donald J" in x)
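A minimal, self-contained sketch of the line above, assuming each cell of keywords holds a Python list of strings. The sample frame is illustrative: it reuses the article IDs from the output in Answer 2 and the keyword lists from Answer 4.

```python
import pandas as pd

# Illustrative sample data; each cell of "keywords" is assumed to be a list of strings
df = pd.DataFrame({
    "articleID": ["58b61d1d", "58b6393b", "58b6556e", "58b657fa"],
    "keywords": [
        ["Second Avenue (Manhattan, NY)"],
        ["Crossword Puzzles"],
        ["Workplace Hazards and Violations", "Trump, Donald J"],
        ["Trump, Donald J", "Speeches and Statements"],
    ],
})

# `in` on a list tests exact membership, so each row yields a plain True/False
df["trumpMention"] = df["keywords"].apply(lambda x: "Trump, Donald J" in x)

print(df["trumpMention"].tolist())  # [False, False, True, True]
```

Because the test is list membership rather than substring search, a keyword must equal the token exactly to count as a mention.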
ANSWER 2
Score 2
How about applying a function that checks set membership?
df['trumpMention'] = df['keywords'].apply(lambda x: 'Trump, Donald J' in set(x))
Output:
   articleID  keywords                                           trumpMention
0  58b61d1d   [Second Avenue (Manhattan, NY)]                    False
1  58b6393b   [Crossword Puzzles]                                False
2  58b6556e   [Workplace Hazards and Violations, Trump, Dona...  True
3  58b657fa   [Trump, Donald J, Speeches and Statements]         True
As to your attempts:
np.where(any(df['keywords']) == 'Trump, Donald J', True, False)
wouldn't work because any(df['keywords']) reduces the whole column to a single True, which isn't equal to 'Trump, Donald J', so the expression always returns array(False).
df['keywords'].apply(lambda x: any(token == 'Trump, Donald J') for token in x)
doesn't work because it raises TypeError: the for clause sits outside the any(...) call, so any() is not wrapped around a comprehension.
df['keywords'].apply(lambda x: ([ True for token in x if any(token in lst)]))
doesn't work because token in lst is a boolean value, so
any(token in lst)
is nonsensical: any() expects an iterable, not a single boolean.
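The points above can be checked with a short sketch. The two-row frame is a made-up sample, and the last statement is one possible reading of what the second attempt was meant to do (the generator expression moved inside any()), not necessarily what the asker intended.

```python
import numpy as np
import pandas as pd

# Made-up sample: one row without the token, one row with it
df = pd.DataFrame({
    "keywords": [
        ["Crossword Puzzles"],
        ["Trump, Donald J", "Speeches and Statements"],
    ],
})

# First attempt: any() collapses the column to True, and True != the token,
# so np.where receives a scalar False and returns a 0-d array
first_attempt = np.where(any(df["keywords"]) == "Trump, Donald J", True, False)

# Third attempt's core problem: any() applied to a single boolean
error_message = None
try:
    any("Trump, Donald J" in ["Trump, Donald J"])
except TypeError as exc:
    error_message = str(exc)

# Assumed intent of the second attempt: put the generator expression inside any()
fixed = df["keywords"].apply(
    lambda x: any(token == "Trump, Donald J" for token in x)
)

print(first_attempt, error_message, fixed.tolist())
```

The corrected form gives the same result as the plain membership test, which is shorter and easier to read.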
ANSWER 3
Score 2
Use a vectorized approach; it will typically be faster than using apply.
df.keywords.astype(str).str.contains("Trump, Donald J")
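A small sketch of this approach on made-up data. Note that astype(str) turns each list into its text representation, so this is a substring search rather than exact membership; the 'Trump, Donald Jr' keyword below is hypothetical and is included only to show the difference. Passing regex=False is an optional choice here that treats the token as a literal string.

```python
import pandas as pd

# Made-up sample; the third row holds a hypothetical keyword that merely
# starts with the token
df = pd.DataFrame({
    "keywords": [
        ["Crossword Puzzles"],
        ["Trump, Donald J", "Speeches and Statements"],
        ["Trump, Donald Jr"],
    ],
})

# Each list becomes a string such as "['Crossword Puzzles']"
as_text = df["keywords"].astype(str)

# Substring search over the stringified lists
substring_match = as_text.str.contains("Trump, Donald J", regex=False)

# Exact list membership, for comparison
exact_match = df["keywords"].apply(lambda x: "Trump, Donald J" in x)

print(substring_match.tolist())  # [False, True, True]
print(exact_match.tolist())      # [False, True, False]
```

If keywords that merely contain the token should not count, the membership test from the other answers is the safer choice.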
ANSWER 4
Score 0
Try it this way: I build the list first, then add it to the dataframe.
import pandas as pd

def mentioned_Trump(s, lst):
    # True if the token s is an element of the list lst
    return s in lst

rows = [[1, ['Second Avenue (Manhattan, NY)']], [2, ['Crossword Puzzles']],
        [3, ['Workplace Hazards and Violations', 'Trump, Donald J']],
        [4, ['Trump, Donald J', 'Speeches and Statements']]]

df = pd.DataFrame(rows, columns=['ID', 'keywords'])

# Build the list of flags first, then attach it as a new column
flags = [mentioned_Trump('Trump, Donald J', x) for x in df['keywords']]
df['trumpMention'] = flags
print(df)