Trying to split a dataframe column based on the values of another column line per line
--
Music by Eric Matyas
https://www.soundimage.org
Track title: Puddle Jumping Looping
--
Chapters
00:00 Question
02:42 Accepted answer (Score 1)
03:59 Thank you
--
Full question
https://stackoverflow.com/questions/5613...
--
Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...
--
Tags
#python #string #pandas #split
#avk47
--
ACCEPTED ANSWER
Score 1
For highly specific manipulations like this one, I recommend for loops for their flexibility and readability (though I will stress that this isn't necessarily the fastest way to do this sort of thing).
First, initialize your dataframe:
import pandas as pd

s = {
    'Test Type': 'GRE',
    'Test Score': 'GRE Verbal 156.0/170.0 GRE Analytical Writing 4.5/6.0 GRE Quantitative 157.0/170.0',
}
df = pd.DataFrame([s])
print(df.head())
# (column order may differ depending on your pandas version)
#                                           Test Score Test Type
# 0  GRE Verbal 156.0/170.0 GRE Analytical Writing ...       GRE
Next, iterate over your df and perform necessary string manipulations:
new_values = []
for idx, row in df.iterrows():
    # Split the score string on the test name (e.g. 'GRE')
    scores = row['Test Score'].split(row['Test Type'])
    for chunk in scores:
        # You don't want the blank items left over from the split
        if chunk.strip() != '':
            tokens = chunk.strip().split()
            # The last token is the score; everything before it is the section
            section, score_actual = ' '.join(tokens[:-1]), tokens[-1]
            new_values.append({
                'Test': row['Test Type'],
                'Section': section,
                'Score': score_actual})
df_new = pd.DataFrame(new_values)
print(df_new.head())
# (column order may differ depending on your pandas version)
#          Score             Section Test
# 0  156.0/170.0              Verbal  GRE
# 1      4.5/6.0  Analytical Writing  GRE
# 2  157.0/170.0        Quantitative  GRE
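Since the loop above isn't necessarily the fastest option, here is a rough sketch of the same split without iterrows, zipping the two columns instead. The helper name split_scores and the result name df_alt are made up for illustration, and the sketch assumes every chunk ends in a single space-separated score token:

```python
import pandas as pd

df = pd.DataFrame([{
    'Test Type': 'GRE',
    'Test Score': 'GRE Verbal 156.0/170.0 GRE Analytical Writing 4.5/6.0 GRE Quantitative 157.0/170.0',
}])

def split_scores(test_type, test_score):
    """Hypothetical helper: return (section, score) pairs for one row."""
    pairs = []
    for chunk in test_score.split(test_type):
        chunk = chunk.strip()
        if chunk:  # skip the blank items left by the split
            # Assumption: the score is the last space-separated token
            section, score = chunk.rsplit(' ', 1)
            pairs.append((section, score))
    return pairs

# zip over the two columns instead of calling df.iterrows()
records = [
    {'Test': test_type, 'Section': section, 'Score': score}
    for test_type, test_score in zip(df['Test Type'], df['Test Score'])
    for section, score in split_scores(test_type, test_score)
]
df_alt = pd.DataFrame(records)
print(df_alt)
```

This should produce the same three rows as the loop version; whether it is meaningfully faster depends on the size of your data, so measure before relying on it.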
You could go a step further and reduce each row to its percent score, or create a new table with the maximum score for each section per exam, but I'll leave that to you.
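As one possible starting point for that further step, here is a minimal sketch. It rebuilds the three result rows by hand so it runs on its own, and the column names Actual, Maximum, and Percent, as well as the table name max_scores, are assumptions chosen for illustration:

```python
import pandas as pd

# The rows produced by the loop, rebuilt here so the example is self-contained
df_new = pd.DataFrame({
    'Test': ['GRE', 'GRE', 'GRE'],
    'Section': ['Verbal', 'Analytical Writing', 'Quantitative'],
    'Score': ['156.0/170.0', '4.5/6.0', '157.0/170.0'],
})

# Split 'actual/maximum' into two numeric columns
parts = df_new['Score'].str.split('/', expand=True).astype(float)
df_new['Actual'] = parts[0]
df_new['Maximum'] = parts[1]

# Percent score per row, rounded to one decimal place
df_new['Percent'] = (df_new['Actual'] / df_new['Maximum'] * 100).round(1)

# Separate table with the maximum score for each section per exam
max_scores = df_new.groupby(['Test', 'Section'], as_index=False)['Maximum'].max()

print(df_new)
print(max_scores)
```

This assumes every Score value has the form actual/maximum; rows in any other format would make the float conversion fail and would need handling first.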