pandas matching database with string keeping index of database
Become part of the top 3% of the developers by applying to Toptal https://topt.al/25cXVn
--
Music by Eric Matyas
https://www.soundimage.org
Track title: Breezy Bay
--
Chapters
00:00 Question
01:16 Accepted answer (Score 3)
02:29 Answer 2 (Score 0)
03:05 Thank you
--
Full question
https://stackoverflow.com/questions/6283...
Accepted answer links:
[Series.isin]: http://pandas.pydata.org/pandas-docs/sta...
[DataFrame.merge]: http://pandas.pydata.org/pandas-docs/sta...
[DataFrame.reset_index]: http://pandas.pydata.org/pandas-docs/sta...
--
Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...
--
Tags
#python #pandas
#avk47
--
Music by Eric Matyas
https://www.soundimage.org
Track title: Breezy Bay
--
Chapters
00:00 Question
01:16 Accepted answer (Score 3)
02:29 Answer 2 (Score 0)
03:05 Thank you
--
Full question
https://stackoverflow.com/questions/6283...
Accepted answer links:
[Series.isin]: http://pandas.pydata.org/pandas-docs/sta...
[DataFrame.merge]: http://pandas.pydata.org/pandas-docs/sta...
[DataFrame.reset_index]: http://pandas.pydata.org/pandas-docs/sta...
--
Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...
--
Tags
#python #pandas
#avk47
ACCEPTED ANSWER
Score 3
Because is filtered df0 DataFrame then is index values not changed if use Series.isin by df1['string_line_1', only order of columns is like in original df0:
out = df0[df0['string_line_0'].isin(df1['string_line_1'])]
print (out)
name_id_code string_line_0
idx
0 0.010000 A
3 29.800000 D
5 88.100001 F
6 66.400001 G
9 551.000000 J
Or if use DataFrame.merge then for avoid lost df0.index is necessary add DataFrame.reset_index:
out = (df1.rename(columns={'string_line_1':'string_line_0'})
.merge(df0.reset_index(), on='string_line_0'))
print (out)
string_line_0 idx name_id_code
0 A 0 0.010000
1 F 5 88.100001
2 J 9 551.000000
3 G 6 66.400001
4 D 3 29.800000
Similar solution, only same values in string_line_0 and string_line_1 columns:
out = (df1.merge(df0.reset_index(), left_on='string_line_1', right_on='string_line_0'))
print (out)
string_line_1 idx name_id_code string_line_0
0 A 0 0.010000 A
1 F 5 88.100001 F
2 J 9 551.000000 J
3 G 6 66.400001 G
4 D 3 29.800000 D
ANSWER 2
Score 0
You can do:
out = df0.loc[(df0["string_line_0"].isin(df1["string_line_1"]))].copy()
out["string_line_0"] = pd.Categorical(out["string_line_0"], categories=df1["string_line_1"].unique())
out.sort_values(by=["string_line_0"], inplace=True)
The first line filters df0 to just the rows where string_line_0 is in the string_line_1 column of df1.
The second line converts string_line_0 in the output df to a Categorical feature, which is then custom sorted by the order of the values in df1