The Python Oracle

Rebuild pandas Dataframe

This video explains how to rebuild a pandas DataFrame.

--

Full question
https://stackoverflow.com/questions/6469...

Question links:
[image]: https://i.stack.imgur.com/QXhAm.png
[image]: https://i.stack.imgur.com/RcOH5.png

Answer 1 links:
[Series.str.extract]: http://pandas.pydata.org/pandas-docs/sta...

Answer 2 links:
[melt]: https://pandas.pydata.org/pandas-docs/st...
[numpy select]: https://numpy.org/doc/stable/reference/g...
[pivot]: https://pandas.pydata.org/pandas-docs/st...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Chapters
00:00 Question
01:15 Accepted answer
01:55 Answer 2
02:35 Answer 3
03:01 Answer 4
03:19 Thank you

--

Tags
#python #pandas #dataframe



ACCEPTED ANSWER

Score 3


You can use sorted() with a custom key function:

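import pandas as pd

def key_fn(x):
    # Rank each cell by the keyword it contains: id < test < Number < anything else.
    if 'id' in x:
        return 0
    if 'test' in x:
        return 1
    if 'Number' in x:
        return 2
    return 3

# Sort the cells of every row by that rank, then rename the columns to col1, col2, ...
df = df.apply(lambda x: pd.Series(sorted(x, key=key_fn)), axis=1)
df = df.rename(columns=lambda x: 'col{}'.format(x+1))
print(df)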

Prints:

   col1    col2          col3
0  id 1  test 1  Number 12344
1  id 2  test 2  Number 21612
2  id 3  test 3   Number 6135
3  id 4  test 4   Number 1131

Another version, from the comments:

df = pd.DataFrame([sorted(l, key=key_fn) for l in df.values], columns=df.columns)
print(df)
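
As a self-contained sketch of the row-sorting idea: the question's input is only available as images, so the sample df below is an assumption made up to be consistent with the printed output, and ORDER is a hypothetical helper name. This variant matches prefixes with str.startswith, which is slightly stricter than the substring test used above.

```python
import pandas as pd

# Assumed sample input: each row holds one "id", one "test" and one "Number"
# cell, but in a different column order per row.
df = pd.DataFrame({
    "col1": ["id 1", "Number 21612", "test 3", "id 4"],
    "col2": ["test 1", "id 2", "Number 6135", "Number 1131"],
    "col3": ["Number 12344", "test 2", "id 3", "test 4"],
})

# Hypothetical helper: the desired left-to-right order of the keywords.
ORDER = ["id", "test", "Number"]

def key_fn(cell):
    """Return the position of the cell's leading keyword in ORDER."""
    for rank, prefix in enumerate(ORDER):
        if cell.startswith(prefix):
            return rank
    return len(ORDER)  # unknown cells sort last

# Sort the cells within each row and keep the original labels.
result = pd.DataFrame(
    [sorted(row, key=key_fn) for row in df.to_numpy()],
    columns=df.columns,
    index=df.index,
)
print(result)
```

Because sorted() is stable, cells that share a rank keep their original left-to-right order.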



ANSWER 2

Score 3


If possible, simplify the solution by splitting each value at the first blank and using the first token as the new column name:

df = (df.reset_index()
        .melt('index')
        .assign(new = lambda x: x['value'].str.split().str[0])
        .pivot(index='index', columns='new', values='value'))
print(df)
new          Number    id    test
index                            
0      Number 12344  id 1  test 1
1      Number 21612  id 2  test 2
2       Number 6135  id 3  test 3
3       Number 1131  id 4  test 4

Otherwise, you can use Series.str.extract to pull the keyword out of each value:

L = ['id','test','Number']

df = (df.reset_index()
        .melt('index')
        .assign(new = lambda x: x['value'].str.extract(f'({"|".join(L)})', expand=False))
        .pivot(index='index', columns='new', values='value'))
print(df)
new          Number    id    test
index                            
0      Number 12344  id 1  test 1
1      Number 21612  id 2  test 2
2       Number 6135  id 3  test 3
3       Number 1131  id 4  test 4
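
The melt-then-pivot idea can be sketched end to end as follows. The sample df is an assumption (the original input is only shown in images), and the names long, wide and kind are hypothetical. pivot returns its columns in sorted order, so the sketch reorders them explicitly at the end.

```python
import pandas as pd

# Assumed sample input with the cells of each row in mixed order.
df = pd.DataFrame({
    "col1": ["id 1", "Number 21612", "test 3", "id 4"],
    "col2": ["test 1", "id 2", "Number 6135", "Number 1131"],
    "col3": ["Number 12344", "test 2", "id 3", "test 4"],
})

# Long format: one line per cell, remembering the row it came from.
long = df.reset_index().melt(id_vars="index")

# The first whitespace-separated token of each value names its target column.
long["kind"] = long["value"].str.split().str[0]

# Back to wide format: one column per keyword, one row per original row.
wide = long.pivot(index="index", columns="kind", values="value")

# Put the columns in the wanted order and drop the axis names.
wide = wide[["id", "test", "Number"]].rename_axis(index=None, columns=None)
print(wide)
```

pivot raises an error if a row contains the same keyword twice, so this sketch relies on every row holding exactly one value per keyword.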



ANSWER 3

Score 0


Try this:

s = df.melt()['value']
df_final = pd.DataFrame({x: s[s.str.startswith(x)].values
                         for x in s.str.split().str[0].unique()})

Out[27]:
     id    test        Number
0  id 1  test 3   Number 6135
1  id 2  test 4  Number 12344
2  id 4  test 1  Number 21612
3  id 3  test 2   Number 1131
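
In the output above, id 1 ends up next to test 3, whereas the other answers pair it with test 1, so the values are collected column by column rather than row by row. A possible variant that keeps each row together is sketched below; the sample df is an assumption, and prefixes and aligned are hypothetical names.

```python
import pandas as pd

# Assumed sample input with the cells of each row in mixed order.
df = pd.DataFrame({
    "col1": ["id 1", "Number 21612", "test 3", "id 4"],
    "col2": ["test 1", "id 2", "Number 6135", "Number 1131"],
    "col3": ["Number 12344", "test 2", "id 3", "test 4"],
})

# Keep the original row labels while melting, then order the cells by row.
s = df.melt(ignore_index=False)["value"].sort_index(kind="stable")

# For each keyword, pick its cells; they now come out in row order.
prefixes = ["id", "test", "Number"]
aligned = pd.DataFrame(
    {p: s[s.str.startswith(p)].to_numpy() for p in prefixes},
    index=df.index,
)
print(aligned)
```

This assumes every row contains exactly one value per keyword; otherwise the per-keyword arrays would differ in length and the constructor would fail.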



ANSWER 4

Score 0


You could first melt the dataframe, then use numpy select to assign the new column names, and finally pivot:

(df.melt(ignore_index=False)
   .assign(variable=lambda x: np.select([x.value.str.startswith("id"),
                                         x.value.str.startswith("test"),
                                         x.value.str.startswith("Number")],
                                        ["col1", "col2", "col3"],
                                        default="other"))  # explicit string default for unmatched values
   .reset_index()
   .pivot(index="index", columns="variable", values="value")
   .rename_axis(columns=None, index=None))


   col1    col2          col3
0  id 1  test 1  Number 12344
1  id 2  test 2  Number 21612
2  id 3  test 3   Number 6135
3  id 4  test 4   Number 1131
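
A step-by-step sketch of the same melt, select and pivot sequence, runnable on its own. The sample df is an assumption, and long, conditions and wide are hypothetical names. np.select fills values that match no condition with its default, which is the integer 0 unless set; the string default "other" here is an assumption chosen so that the default has the same type as the choices.

```python
import numpy as np
import pandas as pd

# Assumed sample input with the cells of each row in mixed order.
df = pd.DataFrame({
    "col1": ["id 1", "Number 21612", "test 3", "id 4"],
    "col2": ["test 1", "id 2", "Number 6135", "Number 1131"],
    "col3": ["Number 12344", "test 2", "id 3", "test 4"],
})

# Long format that keeps the original row labels in the index.
long = df.melt(ignore_index=False)

# One boolean mask per keyword, in the wanted column order.
conditions = [
    long["value"].str.startswith(prefix).to_numpy(dtype=bool)
    for prefix in ("id", "test", "Number")
]

# Map each cell to its target column name; unmatched cells get "other".
long["variable"] = np.select(conditions, ["col1", "col2", "col3"], default="other")

# Back to wide format, without axis names.
wide = (long.reset_index()
            .pivot(index="index", columns="variable", values="value")
            .rename_axis(index=None, columns=None))
print(wide)
```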