Python Pandas: replace NaN in one column with the value from another column of the same row (the result has to be a list column)
--
Music by Eric Matyas
https://www.soundimage.org
Track title: Droplet of life
--
Chapters
00:00 Question
01:27 Accepted answer (Score 2)
02:08 Answer 2 (Score 2)
02:36 Answer 3 (Score 1)
03:07 Answer 4 (Score 1)
03:47 Thank you
--
Full question
https://stackoverflow.com/questions/5939...
Question links:
[python-pandas-replace-nan-in-one-column]: https://stackoverflow.com/questions/2917...
Answer 1 links:
[explode()]: https://pandas.pydata.org/pandas-docs/st...
[groupby()]: https://pandas.pydata.org/pandas-docs/st...
--
Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...
--
Tags
#python #pandas #dataframe
#avk47
--
Music by Eric Matyas
https://www.soundimage.org
Track title: Droplet of life
--
Chapters
00:00 Question
01:27 Accepted answer (Score 2)
02:08 Answer 2 (Score 2)
02:36 Answer 3 (Score 1)
03:07 Answer 4 (Score 1)
03:47 Thank you
--
Full question
https://stackoverflow.com/questions/5939...
Question links:
[python-pandas-replace-nan-in-one-column]: https://stackoverflow.com/questions/2917...
Answer 1 links:
[explode()]: https://pandas.pydata.org/pandas-docs/st...
[groupby()]: https://pandas.pydata.org/pandas-docs/st...
--
Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...
--
Tags
#python #pandas #dataframe
#avk47
ANSWER 1
Score 2
You can use explode() and groupby():
(df.explode('r_id').ffill(axis=1).reset_index().groupby(['index','id'],sort=False).agg(list)
.reset_index(1))
id r_id
index
0 70 [70, 34, 44, 23, 11, 71]
1 70 [70, 53, 33, 73, 41]
2 1148 [1148]
3 557 [557]
4 557 [557]
5 104 [104]
6 581 [581]
7 69 [69, 68, 7]
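A self-contained sketch of the explode()/groupby() approach, run on a small hypothetical frame (the values are made up for illustration and are not the question's data). It selects r_id before agg(list) so that only that column is collected; otherwise it follows the same chain as above.

```python
import numpy as np
import pandas as pd

# Hypothetical sample frame: r_id holds lists, with NaN where a row has none.
df = pd.DataFrame({
    "id": [70, 1148, 69],
    "r_id": [[70, 34, 44], np.nan, [69, 68, 7]],
})

# explode() puts each list element on its own row (a NaN stays one NaN row);
# ffill(axis=1) then copies id into any r_id cell that is still NaN.
exploded = df.explode("r_id").ffill(axis=1)

# Regroup by the original row label (reset_index names it "index") and
# collect the r_id values back into one list per original row.
result = (
    exploded.reset_index()
    .groupby(["index", "id"], sort=False)["r_id"]
    .agg(list)
    .reset_index("id")
)
print(result)
```

sort=False keeps the groups in their original row order, so the result lines up with the input frame.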
ACCEPTED ANSWER
Score 2
We can use a list comprehension + Series.fillna.
First, we create a temporary column in which every id value is wrapped in a list.
Then we replace the NaN values in r_id with those lists:
df['temp'] = [[x] for x in df['id']]
df['r_id'] = df['r_id'].fillna(df['temp'])
df = df.drop(columns='temp')
Or, in one line, using apply (thanks r.ook):
df['r_id'] = df['r_id'].fillna(df['id'].apply(lambda x: [x]))
id r_id
0 70 [70, 34, 44, 23, 11, 71]
1 70 [70, 53, 33, 73, 41]
2 1148 [1148]
3 557 [557]
4 557 [557]
5 104 [104]
6 581 [581]
7 69 [69, 68, 7]
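Why fillna() needs a Series here rather than a plain list: a minimal sketch on a hypothetical three-row frame. Passing a bare list raises TypeError, while a Series of one-element lists is aligned on the index, so each NaN receives the list built from the id in the same row.

```python
import numpy as np
import pandas as pd

# Hypothetical sample frame (not the question's data).
df = pd.DataFrame({
    "id": [70, 1148, 69],
    "r_id": [[70, 34, 44], np.nan, [69, 68, 7]],
})

# fillna rejects a bare list as the fill value ...
try:
    df["r_id"].fillna([1148])
except TypeError as exc:
    print("TypeError:", exc)

# ... but accepts a Series, which it aligns on the index, so each NaN
# receives the one-element list built from the id in the same row.
df["r_id"] = df["r_id"].fillna(df["id"].apply(lambda x: [x]))
print(df)
```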
ANSWER 3
Score 1
You can convert the id column to a NumPy array, add a dimension, turn the result into a list of one-element lists, and pass it to fillna as a Series:
df['r_id'] = df['r_id'].fillna(pd.Series(df.id.to_numpy()[:,None].tolist(), index=df.index))
print(df)
id r_id
0 70 [70, 34, 44, 23, 11, 71]
1 70 [70, 53, 33, 73, 41]
2 1148 [1148]
3 557 [557]
4 557 [557]
5 104 [104]
6 581 [581]
7 69 [69, 68, 7]
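What the `to_numpy()[:,None].tolist()` step produces, shown on a few hypothetical id values: adding an axis makes a column vector, and tolist() then yields one single-element list per row, which is exactly the shape each filled r_id cell needs.

```python
import numpy as np

ids = np.array([70, 1148, 69])  # hypothetical id values

# [:, None] inserts a new axis, turning shape (3,) into (3, 1) ...
column = ids[:, None]
print(column.shape)     # (3, 1)

# ... so tolist() returns one single-element list per row.
print(column.tolist())  # [[70], [1148], [69]]
```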
Or, if you don't have many NaN values, it may be worth selecting only those rows before doing anything:
mask_na = df.r_id.isna()
df.loc[mask_na, 'r_id'] = pd.Series(df.loc[mask_na, 'id'].to_numpy()[:, None].tolist(),
                                    index=df[mask_na].index)
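A self-contained sketch of the masked variant on a hypothetical four-row frame with two missing r_id values. It builds the index with df.index[mask_na], which gives the same labels as df[mask_na].index; those labels are what let .loc write each list back to its own row.

```python
import numpy as np
import pandas as pd

# Hypothetical sample frame with two missing r_id values.
df = pd.DataFrame({
    "id": [70, 1148, 557, 69],
    "r_id": [[70, 34, 44], np.nan, np.nan, [69, 68, 7]],
})

mask_na = df["r_id"].isna()

# Build one-element lists only for the NaN rows, keeping their index labels
# so that the .loc assignment aligns each list with the correct row.
fill = pd.Series(df.loc[mask_na, "id"].to_numpy()[:, None].tolist(),
                 index=df.index[mask_na])
df.loc[mask_na, "r_id"] = fill
print(df)
```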
ANSWER 4
Score 1
I think anky_91's answer will be faster, but you could also try this:
import numpy as np

df['r_id'] = np.where(df['r_id'].isnull(),
                      df['id'].apply(lambda x: [x]),
                      df['r_id'])
Output:
id r_id
0 70 [70, 34, 44, 23, 11, 71]
1 70 [70, 53, 33, 73, 41]
2 1148 [1148]
3 557 [557]
4 557 [557]
5 104 [104]
6 581 [581]
7 69 [69, 68, 7]
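The speed remark above is a guess rather than a measurement. A rough way to check it is to time the approaches on your own data; this sketch compares the np.where version with the fillna version from the accepted answer on a hypothetical 10,000-row frame in which every other r_id is missing. Timings depend on the machine and the pandas version, so no figures are given here.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical benchmark data: every even row has a missing r_id.
n = 10_000
df = pd.DataFrame({
    "id": np.arange(n),
    "r_id": [[i, i + 1] if i % 2 else np.nan for i in range(n)],
})


def with_where(frame):
    # np.where picks the wrapped id where r_id is missing, else the existing list.
    return np.where(frame["r_id"].isna(), frame["id"].apply(lambda x: [x]), frame["r_id"])


def with_fillna(frame):
    # fillna fills only the NaN cells from an index-aligned Series of lists.
    return frame["r_id"].fillna(frame["id"].apply(lambda x: [x]))


# Both approaches should produce the same lists before we compare speed.
same = list(with_where(df)) == with_fillna(df).tolist()
print("same result:", same)

for fn in (with_where, with_fillna):
    seconds = timeit.timeit(lambda: fn(df), number=5)
    print(f"{fn.__name__}: {seconds:.3f} s for 5 runs")
```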