The Python Oracle

Removing all non-numeric characters from string in Python

Become part of the top 3% of the developers by applying to Toptal https://topt.al/25cXVn

--

Music by Eric Matyas
https://www.soundimage.org
Track title: Puzzle Meditation

--

Chapters
00:00 Question
00:20 Accepted answer (Score 386)
00:31 Answer 2 (Score 132)
01:01 Answer 3 (Score 25)
01:23 Answer 4 (Score 19)
01:42 Thank you

--

Full question
https://stackoverflow.com/questions/1249...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #numbers

#avk47



ACCEPTED ANSWER

Score 419


>>> import re
>>> re.sub("[^0-9]", "", "sdkjh987978asd098as0980a98sd")
'987978098098098'
>>> # or
>>> re.sub(r"\D", "", "sdkjh987978asd098as0980a98sd")
'987978098098098'



ANSWER 2

Score 149


Not sure if this is the most efficient way, but:

>>> ''.join(c for c in "abc123def456" if c.isdigit())
'123456'

The ''.join part means to combine all the resulting characters together without any characters in between. Then the rest of it is a generator expression, where (as you can probably guess) we only take the parts of the string that match the condition isdigit.




ANSWER 3

Score 27


This should work for both strings and unicode objects in Python2, and both strings and bytes in Python3:

# python <3.0
def only_numerics(seq):
    return filter(type(seq).isdigit, seq)

# python ≥3.0
def only_numerics(seq):
    seq_type= type(seq)
    return seq_type().join(filter(seq_type.isdigit, seq))



ANSWER 4

Score 11


Just to add another option to the mix, there are several useful constants within the string module. While more useful in other cases, they can be used here.

>>> from string import digits
>>> ''.join(c for c in "abc123def456" if c in digits)
'123456'

There are several constants in the module, including:

  • ascii_letters (abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ)
  • hexdigits (0123456789abcdefABCDEF)

If you are using these constants heavily, it can be worthwhile to covert them to a frozenset. That enables O(1) lookups, rather than O(n), where n is the length of the constant for the original strings.

>>> digits = frozenset(digits)
>>> ''.join(c for c in "abc123def456" if c in digits)
'123456'