The Python Oracle

Removing all non-numeric characters from string in Python

--------------------------------------------------
Rise to the top 3% as a developer or hire one of them at Toptal: https://topt.al/25cXVn
--------------------------------------------------

Track title: CC G Dvoks String Quartet No 12 Ame 2

--

Chapters
00:00 Removing All Non-Numeric Characters From String In Python
00:14 Accepted Answer Score 410
00:27 Answer 2 Score 144
00:56 Answer 3 Score 27
01:17 Answer 4 Score 27
01:37 Answer 5 Score 11
02:29 Thank you

--

Full question
https://stackoverflow.com/questions/1249...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #numbers

#avk47



ACCEPTED ANSWER

Score 419


>>> import re
>>> re.sub("[^0-9]", "", "sdkjh987978asd098as0980a98sd")
'987978098098098'
>>> # or
>>> re.sub(r"\D", "", "sdkjh987978asd098as0980a98sd")
'987978098098098'



ANSWER 2

Score 149


Not sure if this is the most efficient way, but:

>>> ''.join(c for c in "abc123def456" if c.isdigit())
'123456'

The ''.join part means to combine all the resulting characters together without any characters in between. Then the rest of it is a generator expression, where (as you can probably guess) we only take the parts of the string that match the condition isdigit.




ANSWER 3

Score 27


This should work for both strings and unicode objects in Python2, and both strings and bytes in Python3:

# python <3.0
def only_numerics(seq):
    return filter(type(seq).isdigit, seq)

# python ≥3.0
def only_numerics(seq):
    seq_type= type(seq)
    return seq_type().join(filter(seq_type.isdigit, seq))



ANSWER 4

Score 11


Just to add another option to the mix, there are several useful constants within the string module. While more useful in other cases, they can be used here.

>>> from string import digits
>>> ''.join(c for c in "abc123def456" if c in digits)
'123456'

There are several constants in the module, including:

  • ascii_letters (abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ)
  • hexdigits (0123456789abcdefABCDEF)

If you are using these constants heavily, it can be worthwhile to covert them to a frozenset. That enables O(1) lookups, rather than O(n), where n is the length of the constant for the original strings.

>>> digits = frozenset(digits)
>>> ''.join(c for c in "abc123def456" if c in digits)
'123456'