The Python Oracle

Trying to split a dataframe column based on the values of another column line per line

--------------------------------------------------
Rise to the top 3% as a developer or hire one of them at Toptal: https://topt.al/25cXVn
--------------------------------------------------

Music by Eric Matyas
https://www.soundimage.org
Track title: Techno Intrigue Looping

--

Chapters
00:00 Trying To Split A Dataframe Column Based On The Values Of Another Column Line Per Line
01:59 Accepted Answer Score 1
02:52 Thank you

--

Full question
https://stackoverflow.com/questions/5613...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #string #pandas #split

#avk47



ACCEPTED ANSWER

Score 1


For highly specific manipulations, I recommend for loops given their flexibility and readability (though I will stress that this isn't automatically the most optimized way to do this sort of thing).

First, initialize your dataframe:

import pandas as pd

s = {'Test Type':'GRE',
     'Test Score':'GRE Verbal 156.0/170.0 GRE Analytical Writing 4.5/6.0 GRE Quantitative 157.0/170.0',
    }

df = pd.DataFrame([s])

print(df.head())
#
#                                          Test Score Test Type
# 0  GRE Verbal 156.0/170.0 GRE Analytical Writing ...       GRE

Next, iterate over your df and perform necessary string manipulations:

new_values = []

for idx, row in df.iterrows():
  scores = row['Test Score'].split(row['Test Type'])
  for s in scores:
    # You don't want the blank items
    if s!='':
      s = s.strip().split()
      # get the section and the score for each
      section, score_actual = ' '.join(s[:-1]),s[-1]
      new_values.append({
          'Test': row['Test Type'],
          'Section':section,
          'Score': score_actual})

df_new = pd.DataFrame(new_values)
print(df_new.head())
#
#          Score             Section Test
# 0  156.0/170.0              Verbal  GRE
# 1      4.5/6.0  Analytical Writing  GRE
# 2  157.0/170.0        Quantitative  GRE

You could go a step further and begin manipulating each row down to its percent score, or create a new table with maximum score for each section per exam but I'll leave that to you.