The Python Oracle

How to conditionally copy a substring into a new column of a pandas dataframe?

Full question
https://stackoverflow.com/questions/4585...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #string #pandas #dataframe #substring



ACCEPTED ANSWER

Score 3


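You can use pd.Series.str.extract on the rows where A is 'VALID'. Assigning the result to a new column aligns on the index, so the rows that were filtered out get NaN: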

In [737]: df
Out[737]: 
         A                                   B
0    VALID       asdfafX'XextractthisY'Yeaaadf
1  INVALID         secondrowX'XsubtextY'Yelakj
2    VALID  secondrowX'XextractthistooY'Yelakj

In [745]: df['C'] = df[df.A == 'VALID'].B.str.extract("(?<=X'X)(.*?)(?=Y'Y)", expand=False)

In [746]: df
Out[746]: 
         A                                   B               C
0    VALID       asdfafX'XextractthisY'Yeaaadf     extractthis
1  INVALID         secondrowX'XsubtextY'Yelakj             NaN
2    VALID  secondrowX'XextractthistooY'Yelakj  extractthistoo
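
The session above can be reproduced as a standalone script. This is a minimal sketch that rebuilds the example frame by hand from the printed output; the variable names `pattern` and `valid` are illustrative and not part of the original answer.

```python
import pandas as pd

# Rebuild the example frame shown in the session above.
df = pd.DataFrame({
    "A": ["VALID", "INVALID", "VALID"],
    "B": [
        "asdfafX'XextractthisY'Yeaaadf",
        "secondrowX'XsubtextY'Yelakj",
        "secondrowX'XextractthistooY'Yelakj",
    ],
})

# Raw string so the regex is passed through unchanged.
pattern = r"(?<=X'X)(.*?)(?=Y'Y)"

# Boolean mask selecting only the rows to extract from.
valid = df["A"] == "VALID"

# expand=False returns a Series; assignment aligns on the index,
# so rows excluded by the mask are filled with NaN.
df["C"] = df.loc[valid, "B"].str.extract(pattern, expand=False)

print(df)
```

Using `df.loc[valid, "B"]` instead of `df[df.A == 'VALID'].B` selects the same rows; it is only a stylistic choice here.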

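The regex pattern is:

(?<=X'X)(.*?)(?=Y'Y)
  • (?<=X'X) is a lookbehind: the match must be immediately preceded by X'X, which is not itself captured
  • (.*?) is a lazy capture group: it captures as few characters as possible between the two delimiters, and this group is what str.extract returns
  • (?=Y'Y) is a lookahead: the match must be immediately followed by Y'Y, which is not itself captured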
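
To see how the three parts of the pattern behave outside pandas, here is a small sketch using Python's standard `re` module. The string `two` is a made-up input with two delimited sections, added only to show why the lazy `.*?` matters; it does not come from the original question.

```python
import re

pattern = re.compile(r"(?<=X'X)(.*?)(?=Y'Y)")

# First row of the example frame.
text = "asdfafX'XextractthisY'Yeaaadf"
captured = pattern.search(text).group(1)
print(captured)

# Hypothetical input with two delimited sections.
two = "X'XfirstY'Y and X'XsecondY'Y"

# Lazy quantifier: stops at the first Y'Y it can reach.
lazy = re.search(r"(?<=X'X)(.*?)(?=Y'Y)", two).group(1)

# Greedy quantifier: runs on to the last Y'Y in the string.
greedy = re.search(r"(?<=X'X)(.*)(?=Y'Y)", two).group(1)

print(lazy)
print(greedy)
```

With a single delimited section per string, as in the example frame, the lazy and greedy forms give the same result; the difference only shows up when the closing delimiter appears more than once.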