pandas matching database with string keeping index of database
--------------------------------------------------
Rise to the top 3% as a developer or hire one of them at Toptal: https://topt.al/25cXVn
--------------------------------------------------
Music by Eric Matyas
https://www.soundimage.org
Track title: Over Ancient Waters Looping
--
Chapters
00:00 Pandas Matching Database With String Keeping Index Of Database
00:51 Answer 1 Score 0
01:20 Accepted Answer Score 3
02:15 Thank you
--
Full question
https://stackoverflow.com/questions/6283...
--
Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...
--
Tags
#python #pandas
#avk47
Rise to the top 3% as a developer or hire one of them at Toptal: https://topt.al/25cXVn
--------------------------------------------------
Music by Eric Matyas
https://www.soundimage.org
Track title: Over Ancient Waters Looping
--
Chapters
00:00 Pandas Matching Database With String Keeping Index Of Database
00:51 Answer 1 Score 0
01:20 Accepted Answer Score 3
02:15 Thank you
--
Full question
https://stackoverflow.com/questions/6283...
--
Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...
--
Tags
#python #pandas
#avk47
ACCEPTED ANSWER
Score 3
Because is filtered df0 DataFrame then is index values not changed if use Series.isin by df1['string_line_1', only order of columns is like in original df0:
out = df0[df0['string_line_0'].isin(df1['string_line_1'])]
print (out)
name_id_code string_line_0
idx
0 0.010000 A
3 29.800000 D
5 88.100001 F
6 66.400001 G
9 551.000000 J
Or if use DataFrame.merge then for avoid lost df0.index is necessary add DataFrame.reset_index:
out = (df1.rename(columns={'string_line_1':'string_line_0'})
.merge(df0.reset_index(), on='string_line_0'))
print (out)
string_line_0 idx name_id_code
0 A 0 0.010000
1 F 5 88.100001
2 J 9 551.000000
3 G 6 66.400001
4 D 3 29.800000
Similar solution, only same values in string_line_0 and string_line_1 columns:
out = (df1.merge(df0.reset_index(), left_on='string_line_1', right_on='string_line_0'))
print (out)
string_line_1 idx name_id_code string_line_0
0 A 0 0.010000 A
1 F 5 88.100001 F
2 J 9 551.000000 J
3 G 6 66.400001 G
4 D 3 29.800000 D
ANSWER 2
Score 0
You can do:
out = df0.loc[(df0["string_line_0"].isin(df1["string_line_1"]))].copy()
out["string_line_0"] = pd.Categorical(out["string_line_0"], categories=df1["string_line_1"].unique())
out.sort_values(by=["string_line_0"], inplace=True)
The first line filters df0 to just the rows where string_line_0 is in the string_line_1 column of df1.
The second line converts string_line_0 in the output df to a Categorical feature, which is then custom sorted by the order of the values in df1