The Python Oracle

Python Pandas: replace NaN in one column with the value from another column of the same row, wrapped in a list

Full question
https://stackoverflow.com/questions/5939...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #pandas #dataframe




ANSWER 1

Score 2


You can use explode() and groupby():

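(df.explode('r_id')                       # one row per list element; NaN rows stay as one row
   .ffill(axis=1)                         # fill a missing r_id from id within the same row
   .reset_index()                         # expose the original row label as column 'index'
   .groupby(['index', 'id'], sort=False)
   .agg(list)                             # collect the elements back into one list per row
   .reset_index(1))                       # move 'id' back from the index to a column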

         id                      r_id
index                                
0        70  [70, 34, 44, 23, 11, 71]
1        70      [70, 53, 33, 73, 41]
2      1148                    [1148]
3       557                     [557]
4       557                     [557]
5       104                     [104]
6       581                     [581]
7        69               [69, 68, 7]
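
To see why the chain above needs the groupby step, it can help to look at explode() on its own. The following is a minimal sketch on a hypothetical two-row frame (the name demo and its values are made up for illustration, not taken from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical two-row frame: one list cell and one missing cell.
demo = pd.DataFrame({'id': [70, 1148],
                     'r_id': [[70, 34, 44], np.nan]})

exploded = demo.explode('r_id')

# The original row label is repeated once per list element,
# while the NaN row is kept as a single row.
exploded_index = exploded.index.tolist()
missing_flags = exploded['r_id'].isna().tolist()

print(exploded)
```

Because the row labels are repeated, grouping on them afterwards is what lets the elements be gathered back into one list per original row.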



ACCEPTED ANSWER

Score 2


We can use a list comprehension together with Series.fillna.

First we create a temporary column in which each id value is wrapped in a single-element list. Then we fill the NaN values in r_id with the lists from that column:

df['temp'] = [[x] for x in df['id']]
df['r_id'] = df['r_id'].fillna(df['temp'])
df = df.drop(columns='temp')

Or in one line using apply (thanks to r.ook):

df['r_id'] = df['r_id'].fillna(df['id'].apply(lambda x: [x]))

Output:

     id                      r_id
0    70  [70, 34, 44, 23, 11, 71]
1    70      [70, 53, 33, 73, 41]
2  1148                    [1148]
3   557                     [557]
4   557                     [557]
5   104                     [104]
6   581                     [581]
7    69               [69, 68, 7]
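
The input frame is not shown here, so the following self-contained sketch rebuilds one from the printed result. It assumes that rows 2 to 6 held NaN in r_id, and the name wrapped is introduced only for illustration:

```python
import numpy as np
import pandas as pd

# Assumed input, reconstructed from the printed result:
# rows 2-6 are taken to be NaN in r_id.
df = pd.DataFrame({
    'id': [70, 70, 1148, 557, 557, 104, 581, 69],
    'r_id': [[70, 34, 44, 23, 11, 71], [70, 53, 33, 73, 41],
             np.nan, np.nan, np.nan, np.nan, np.nan, [69, 68, 7]],
})

# Wrap every id in a one-element list; fillna aligns on the index
# and uses these lists only where r_id is missing.
wrapped = df['id'].apply(lambda x: [x])
df['r_id'] = df['r_id'].fillna(wrapped)

print(df)
```

Passing a Series rather than a plain list matters here: fillna rejects a list as the fill value, whereas a Series is aligned row by row.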



ANSWER 3

Score 1


You can convert the id column to a NumPy array, add a second dimension, turn the result into a nested list, and pass it to fillna as a Series:

df['r_id'] = df['r_id'].fillna(pd.Series(df.id.to_numpy()[:,None].tolist(), index=df.index))
print(df)
     id                      r_id
0    70  [70, 34, 44, 23, 11, 71]
1    70      [70, 53, 33, 73, 41]
2  1148                    [1148]
3   557                     [557]
4   557                     [557]
5   104                     [104]
6   581                     [581]
7    69               [69, 68, 7]
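
The reshaping step is easier to follow in isolation. In this small sketch the array ids is a stand-in for df.id.to_numpy(), with values chosen for illustration:

```python
import numpy as np

ids = np.array([70, 1148, 557])   # stand-in for df.id.to_numpy()

column = ids[:, None]             # shape (3,) becomes shape (3, 1)
nested = column.tolist()          # one single-element list per row

print(column.shape)  # (3, 1)
print(nested)        # [[70], [1148], [557]]
```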

Or, if only a few values are NaN, it may be worth selecting just those rows before doing anything else:

mask_na = df.r_id.isna()
df.loc[mask_na, 'r_id'] = pd.Series(df.loc[mask_na,'id'].to_numpy()[:,None].tolist(), 
                                    index=df[mask_na].index)
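
As a runnable sketch of the masked variant, the snippet below applies it to a hypothetical four-row frame (the values and the name replacement are assumptions for illustration) so that the untouched rows can be checked:

```python
import numpy as np
import pandas as pd

# Hypothetical small frame: rows 1 and 2 are missing.
df = pd.DataFrame({'id': [70, 1148, 557, 69],
                   'r_id': [[70, 34], np.nan, np.nan, [69, 68, 7]]})

mask_na = df['r_id'].isna()

# Build replacement lists for the missing rows only,
# keeping their index labels so the assignment aligns.
replacement = pd.Series(df.loc[mask_na, 'id'].to_numpy()[:, None].tolist(),
                        index=df.index[mask_na])
df.loc[mask_na, 'r_id'] = replacement

print(df)
```

Only the rows selected by mask_na are written, so existing lists are never rebuilt.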



ANSWER 4

Score 1


I think anky_91's answer will be faster, but you could also try this:

df['r_id'] = np.where(df['r_id'].isnull(),
                      df['id'].apply(lambda x: [x]),
                      df['r_id'])

Output:

     id                      r_id
0    70  [70, 34, 44, 23, 11, 71]
1    70      [70, 53, 33, 73, 41]
2  1148                    [1148]
3   557                     [557]
4   557                     [557]
5   104                     [104]
6   581                     [581]
7    69               [69, 68, 7]
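
A quick way to check that the np.where variant and the fillna variant agree is to run both on the same data. This sketch uses a hypothetical helper, make_frame, and made-up values; it says nothing about relative speed:

```python
import numpy as np
import pandas as pd

def make_frame():
    # Hypothetical helper: returns a fresh small frame with two missing cells
    return pd.DataFrame({'id': [70, 1148, 557, 69],
                         'r_id': [[70, 34], np.nan, np.nan, [69, 68, 7]]})

a = make_frame()
a['r_id'] = np.where(a['r_id'].isnull(),
                     a['id'].apply(lambda x: [x]),
                     a['r_id'])

b = make_frame()
b['r_id'] = b['r_id'].fillna(b['id'].apply(lambda x: [x]))

# Compare the two results element by element
same = a['r_id'].tolist() == b['r_id'].tolist()
print(same)
```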