Trying to split a dataframe column based on the values of another column line per line
--------------------------------------------------
Rise to the top 3% as a developer or hire one of them at Toptal: https://topt.al/25cXVn
--------------------------------------------------
Music by Eric Matyas
https://www.soundimage.org
Track title: Techno Intrigue Looping
--
Chapters
00:00 Trying To Split A Dataframe Column Based On The Values Of Another Column Line Per Line
01:59 Accepted Answer Score 1
02:52 Thank you
--
Full question
https://stackoverflow.com/questions/5613...
--
Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...
--
Tags
#python #string #pandas #split
#avk47
ACCEPTED ANSWER
Score 1
For highly specific manipulations like this one, I recommend a for loop for its flexibility and readability (though I will stress that it is not necessarily the fastest way to do this sort of thing).
First, initialize your dataframe:
import pandas as pd
s = {
    'Test Type': 'GRE',
    'Test Score': 'GRE Verbal 156.0/170.0 GRE Analytical Writing 4.5/6.0 GRE Quantitative 157.0/170.0',
}
df = pd.DataFrame([s])
print(df.head())
#
# Test Score Test Type
# 0 GRE Verbal 156.0/170.0 GRE Analytical Writing ... GRE
Next, iterate over your df and perform necessary string manipulations:
new_values = []
for idx, row in df.iterrows():
    # Splitting on the test name leaves one chunk per section
    scores = row['Test Score'].split(row['Test Type'])
    for chunk in scores:
        chunk = chunk.strip()
        # You don't want the blank items
        if chunk != '':
            parts = chunk.split()
            # get the section and the score for each
            section, score_actual = ' '.join(parts[:-1]), parts[-1]
            new_values.append({
                'Test': row['Test Type'],
                'Section': section,
                'Score': score_actual})
df_new = pd.DataFrame(new_values)
print(df_new.head())
#
# Score Section Test
# 0 156.0/170.0 Verbal GRE
# 1 4.5/6.0 Analytical Writing GRE
# 2 157.0/170.0 Quantitative GRE
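Since the loop is not necessarily the fastest option, here is a rough sketch of one loop-light alternative using Series.str.extractall. This is not part of the original answer. It assumes every score looks like "actual/maximum" and that each section label starts with that row's own 'Test Type' value; the names pattern, Label and df_alt are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame([{
    'Test Type': 'GRE',
    'Test Score': 'GRE Verbal 156.0/170.0 GRE Analytical Writing 4.5/6.0 GRE Quantitative 157.0/170.0',
}])

# Each match is "<label> <actual>/<maximum>"; the label still carries the test name.
# Assumption: labels contain only letters and spaces.
pattern = r'(?P<Label>[A-Za-z ]+?)\s+(?P<Score>[\d.]+/[\d.]+)'
matches = df['Test Score'].str.extractall(pattern)

# extractall indexes by (original row, match number). Dropping the match level
# lets the join bring each row's 'Test Type' back alongside its matches.
matches = matches.reset_index(level='match', drop=True).join(df['Test Type'])

# Assumption: every label begins with the row's test name, so slice it off.
matches['Section'] = [
    label.strip()[len(test):].strip()
    for label, test in zip(matches['Label'], matches['Test Type'])
]

df_alt = (
    matches.rename(columns={'Test Type': 'Test'})[['Test', 'Section', 'Score']]
    .reset_index(drop=True)
)
print(df_alt)
```

Whether this beats the loop depends on the data; the loop version is easier to adapt when the score format varies from row to row.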
You could go a step further and reduce each row to its percent score, or create a new table with the maximum score for each section per exam, but I'll leave that to you.
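As a starting point for the percent score, here is a minimal sketch. It rebuilds a frame shaped like df_new so it runs on its own, and it assumes every 'Score' value has the form "actual/maximum". The column names Actual, Maximum and Percent are my own choices, not something from the question.

```python
import pandas as pd

# Same shape as df_new above, rebuilt here so the sketch is self-contained.
df_new = pd.DataFrame([
    {'Test': 'GRE', 'Section': 'Verbal', 'Score': '156.0/170.0'},
    {'Test': 'GRE', 'Section': 'Analytical Writing', 'Score': '4.5/6.0'},
    {'Test': 'GRE', 'Section': 'Quantitative', 'Score': '157.0/170.0'},
])

# Split "actual/maximum" into two numeric columns.
parts = df_new['Score'].str.split('/', expand=True).astype(float)
df_new['Actual'] = parts[0]
df_new['Maximum'] = parts[1]

# Percent of the maximum, rounded to one decimal place.
df_new['Percent'] = (df_new['Actual'] / df_new['Maximum'] * 100).round(1)
print(df_new[['Test', 'Section', 'Percent']])
```

The Maximum column also gives you the per-section maximum table mentioned above, for example via df_new[['Test', 'Section', 'Maximum']].drop_duplicates().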