The Python Oracle

Assigning True/False if a token is present in a data-frame

Full question
https://stackoverflow.com/questions/7060...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #pandas #dataframe #text #nlp



ACCEPTED ANSWER

Score 4


Try:

df["trumpMention"] = df["keywords"].apply(lambda x: "Trump, Donald J" in x)
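For anyone who wants to run this end to end, here is a minimal sketch. The sample rows are an assumption: they are modeled on the keywords shown elsewhere in this thread, not on the asker's full data.

```python
import pandas as pd

# Sample rows modeled on the keywords shown in this thread (assumed data).
df = pd.DataFrame({
    "articleID": ["58b61d1d", "58b6393b", "58b6556e", "58b657fa"],
    "keywords": [
        ["Second Avenue (Manhattan, NY)"],
        ["Crossword Puzzles"],
        ["Workplace Hazards and Violations", "Trump, Donald J"],
        ["Trump, Donald J", "Speeches and Statements"],
    ],
})

# `in` on a list tests for an exact element match, one row at a time.
df["trumpMention"] = df["keywords"].apply(lambda x: "Trump, Donald J" in x)
print(df["trumpMention"].tolist())
```

Because each cell holds a list, the membership test matches whole keywords only, not substrings of a keyword.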



ANSWER 2

Score 2


How about applying a function that checks set membership?

df['trumpMention'] = df['keywords'].apply(lambda x: 'Trump, Donald J' in set(x))

Output:

  articleID                                           keywords  trumpMention
0  58b61d1d                    [Second Avenue (Manhattan, NY)]         False
1  58b6393b                                [Crossword Puzzles]         False
2  58b6556e  [Workplace Hazards and Violations, Trump, Dona...          True
3  58b657fa         [Trump, Donald J, Speeches and Statements]          True

As to your attempts:

np.where(any(df['keywords']) == 'Trump, Donald J', True, False) 

wouldn't work because any(df['keywords']) collapses the whole column to a single boolean. That value is True whenever at least one row holds a non-empty list, and True is not equal to 'Trump, Donald J', so the expression returns array(False).
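A small sketch of that first attempt, using a two-row stand-in Series (the name `keywords` and its contents are assumptions for illustration):

```python
import numpy as np
import pandas as pd

# Two-row stand-in for df['keywords'] (assumed data).
keywords = pd.Series([
    ["Crossword Puzzles"],
    ["Trump, Donald J", "Speeches and Statements"],
])

# The built-in any() reduces the whole column to one bool: it only asks
# whether at least one row is truthy, i.e. a non-empty list.
collapsed = any(keywords)

# Comparing that bool with a string gives False, so np.where receives a
# scalar condition and returns a zero-dimensional array.
result = np.where(collapsed == "Trump, Donald J", True, False)
print(collapsed, result, result.ndim)
```

The result carries no per-row information, which is why it cannot be used as a new column.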

df['keywords'].apply(lambda x: any(token == 'Trump, Donald J') for token in x) 

doesn't work: it raises a TypeError because the for clause sits outside the parentheses of any(), so any() is never given a comprehension to iterate over.
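One possible repair, sketched on assumed sample data, is to move the for clause inside the call so that any() receives a generator expression:

```python
import pandas as pd

# Two-row stand-in for df['keywords'] (assumed data).
keywords = pd.Series([
    ["Crossword Puzzles"],
    ["Trump, Donald J", "Speeches and Statements"],
])

# The generator expression now lives inside any(), so any() iterates
# over one comparison per token in the row's list.
mask = keywords.apply(
    lambda x: any(token == "Trump, Donald J" for token in x)
)
print(mask.tolist())
```

This gives the same per-row answer as a plain membership test with `in`; it is just the explicit form of it.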

df['keywords'].apply(lambda x: ([ True for token in x if any(token in lst)]))  

doesn't work because token in lst is a boolean value, so

any(token in lst)

is nonsensical: any() expects an iterable, not a single boolean.
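To see the failure in isolation, here is a short sketch. The list `tokens` is a hypothetical stand-in for one row's keywords.

```python
# Hypothetical stand-in for the keyword list of a single row.
tokens = ["Trump, Donald J", "Speeches and Statements"]

error = None
try:
    # The argument is evaluated first and is a single bool, not an iterable.
    any("Trump, Donald J" in tokens)
except TypeError as exc:
    error = str(exc)
    print(error)

# Dropping any() is enough: `in` already answers the question.
found = "Trump, Donald J" in tokens
print(found)
```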




ANSWER 3

Score 2


Use a vectorized approach; it should be faster than using apply.

df.keywords.astype(str).str.contains("Trump, Donald J")
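One caveat worth checking before relying on this: converting each list to a string turns the test into a substring search, so it can match longer keywords that merely contain the target. The sketch below compares both approaches. The keyword 'Trump, Donald Jr' is hypothetical and not from the original data, and regex=False is added here only to make the pattern match literally.

```python
import pandas as pd

keywords = pd.Series([
    ["Trump, Donald J", "Speeches and Statements"],
    ["Trump, Donald Jr"],  # hypothetical keyword, not from the original data
    ["Crossword Puzzles"],
])

# Substring search on the string form of each list.
substring_mask = keywords.astype(str).str.contains("Trump, Donald J", regex=False)

# Exact list membership, for comparison.
exact_mask = keywords.apply(lambda x: "Trump, Donald J" in x)

print(substring_mask.tolist())
print(exact_mask.tolist())
```

If the keyword vocabulary has no such overlapping entries, the two masks agree and the string method is a reasonable choice.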



ANSWER 4

Score 0


Try this approach: I build the list of results first, then add it to the DataFrame.

import pandas as pd

def mentioned_Trump(s, lst):
    # True if the string s is an element of the list lst
    return s in lst

# Sample data: [ID, keywords] pairs
s = [[1, ['Second Avenue (Manhattan, NY)']], [2, ['Crossword Puzzles']],
     [3, ['Workplace Hazards and Violations', 'Trump, Donald J']],
     [4, ['Trump, Donald J', 'Speeches and Statements']]]

df = pd.DataFrame(s)
df.columns = ['ID', 'keywords']

# Build the list of results first, then attach it as a new column
s = list(df['keywords'])
s1 = [mentioned_Trump('Trump, Donald J', x) for x in s]

df['trumpMention'] = s1
print(df)