The Python Oracle

How to replace text in a string column of a Pandas dataframe?



Full question
https://stackoverflow.com/questions/2898...

Accepted answer links:
[str]: http://pandas.pydata.org/pandas-docs/sta...
[replace]: http://pandas.pydata.org/pandas-docs/sta...
[docs]: http://pandas.pydata.org/pandas-docs/sta...

Answer 2 links:
[Docs]: https://pandas.pydata.org/pandas-docs/st...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #replace #pandas #dataframe




ACCEPTED ANSWER

Score 494


Use the vectorised str method replace:

df['range'] = df['range'].str.replace(',','-')

df
      range
0    (2-30)
1  (50-290)

EDIT: Here is what you tried and why it didn't work:

df['range'].replace(',','-',inplace=True)

The docs describe the to_replace argument as follows:

str or regex: str: string exactly matching to_replace will be replaced with value

Series.replace therefore compares each whole cell value with ',', and because no value in the column is exactly ',', no replacement occurs. Compare with the following:

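import pandas as pd

df = pd.DataFrame({'range': ['(2,30)', ',']})
# Assign the result back: inplace=True on a selected column may not update df in recent pandas versions
df['range'] = df['range'].replace(',', '-')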

df['range']

0    (2,30)
1         -
Name: range, dtype: object

Here the second row is an exact match, so the replacement occurs there, while the first row is left unchanged.
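
To make the difference concrete, here is a small side-by-side sketch. The variable names (whole_value, substring, via_regex) are only illustrative, and regex=False is passed explicitly so the comma is treated as a literal regardless of the pandas version's default:

```python
import pandas as pd

df = pd.DataFrame({"range": ["(2,30)", ","]})

# Series.replace compares whole cell values by default,
# so only the cell that is exactly "," changes.
whole_value = df["range"].replace(",", "-")

# Series.str.replace substitutes substrings inside each cell.
substring = df["range"].str.replace(",", "-", regex=False)

# Series.replace with regex=True also substitutes inside each string.
via_regex = df["range"].replace(",", "-", regex=True)

print(whole_value.tolist())  # ['(2,30)', '-']
print(substring.tolist())    # ['(2-30)', '-']
print(via_regex.tolist())    # ['(2-30)', '-']
```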




ANSWER 2

Score 129


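For anyone arriving here from a Google search on how to do a string replacement across all columns (for example, if there are several columns like the OP's 'range' column): pandas has a built-in replace method on the DataFrame object. With regex=True it substitutes inside string values rather than matching whole cells, and it returns a new DataFrame, so assign the result if you want to keep it.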

df.replace(',', '-', regex=True)

Source: Docs
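
A minimal sketch of how this behaves on a frame with several columns. The column names 'other' and 'count' are made up for illustration; only 'range' comes from the question:

```python
import pandas as pd

df = pd.DataFrame({
    "range": ["(2,30)", "(50,290)"],
    "other": ["a,b", "c"],
    "count": [1, 2],
})

# regex=True substitutes inside every string cell;
# the numeric column is left as it is.
result = df.replace(",", "-", regex=True)

print(result["range"].tolist())  # ['(2-30)', '(50-290)']
print(result["other"].tolist())  # ['a-b', 'c']
print(result["count"].tolist())  # [1, 2]

# The original frame is unchanged because the result was not assigned back.
print(df["range"].tolist())      # ['(2,30)', '(50,290)']
```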




ANSWER 3

Score 10


Replace all spaces with underscores in the column names (to replace commas instead, pass ',' as the first argument):

data.columns= data.columns.str.replace(' ','_',regex=True)
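
As a quick illustration, assuming a frame whose column names contain spaces (the names here are invented), the same call can be written with regex=False, since a single space needs no regular expression:

```python
import pandas as pd

data = pd.DataFrame({"first name": [1], "last name": [2]})

# Index.str.replace returns a new Index, so assign it back to data.columns.
data.columns = data.columns.str.replace(" ", "_", regex=False)

print(list(data.columns))  # ['first_name', 'last_name']
```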



ANSWER 4

Score 8


In addition, for those looking to replace more than one character in a column, you can do it using regular expressions:

import re
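chars_to_remove = ['.', '-', '(', ')']
# Build a character class; re.escape protects the characters that are special in a regex
regular_expression = '[' + re.escape(''.join(chars_to_remove)) + ']'

# str.replace returns a new Series, so assign the result back to keep it
df['string_col'] = df['string_col'].str.replace(regular_expression, '', regex=True)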
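
For reference, a self-contained sketch of the same idea on sample data. The values in 'string_col' are invented, and the names pattern and cleaned are only illustrative:

```python
import re

import pandas as pd

df = pd.DataFrame({"string_col": ["(2.5-3)", "a.b", "plain"]})

chars_to_remove = [".", "-", "(", ")"]

# Join the characters and escape them so they are safe inside a character class.
pattern = "[" + re.escape("".join(chars_to_remove)) + "]"

# Every character matched by the class is replaced with an empty string.
cleaned = df["string_col"].str.replace(pattern, "", regex=True)

print(cleaned.tolist())  # ['253', 'ab', 'plain']
```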