The Python Oracle

How to correctly sort a string with a number inside?

Become part of the top 3% of the developers by applying to Toptal https://topt.al/25cXVn

--

Music by Eric Matyas
https://www.soundimage.org
Track title: Ocean Floor

--

Chapters
00:00 Question
00:34 Accepted answer (Score 340)
02:23 Thank you

--

Full question
https://stackoverflow.com/questions/5967...

Accepted answer links:
[human sorting]: http://nedbatchelder.com/blog/200712/hum...
[natural sorting]: http://www.codinghorror.com/blog/2007/12...
[here]: http://nedbatchelder.com/blog/200712/hum...
[a regex that matches floats]: https://stackoverflow.com/a/12643073/190...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #regex #sorting #string

#avk47



ACCEPTED ANSWER

Score 359


Perhaps you are looking for human sorting (also known as natural sorting):

import re

def atoi(text):
    return int(text) if text.isdigit() else text

def natural_keys(text):
    '''
    alist.sort(key=natural_keys) sorts in human order
    http://nedbatchelder.com/blog/200712/human_sorting.html
    (See Toothy's implementation in the comments)
    '''
    return [ atoi(c) for c in re.split(r'(\d+)', text) ]

alist=[
    "something1",
    "something12",
    "something17",
    "something2",
    "something25",
    "something29"]

alist.sort(key=natural_keys)
print(alist)

yields

['something1', 'something2', 'something12', 'something17', 'something25', 'something29']

PS. I've changed my answer to use Toothy's implementation of natural sorting (posted in the comments here) since it is significantly faster than my original answer.


If you wish to sort text with floats, then you'll need to change the regex from one that matches ints (i.e. (\d+)) to a regex that matches floats:

import re

def atof(text):
    try:
        retval = float(text)
    except ValueError:
        retval = text
    return retval

def natural_keys(text):
    '''
    alist.sort(key=natural_keys) sorts in human order
    http://nedbatchelder.com/blog/200712/human_sorting.html
    (See Toothy's implementation in the comments)
    float regex comes from https://stackoverflow.com/a/12643073/190597
    '''
    return [ atof(c) for c in re.split(r'[+-]?([0-9]+(?:[.][0-9]*)?|[.][0-9]+)', text) ]

alist=[
    "something1",
    "something2",
    "something1.0",
    "something1.25",
    "something1.105"]

alist.sort(key=natural_keys)
print(alist)

yields

['something1', 'something1.0', 'something1.105', 'something1.25', 'something2']