The Python Oracle

How to correctly sort a string with a number inside?

--------------------------------------------------
Rise to the top 3% as a developer or hire one of them at Toptal: https://topt.al/25cXVn
--------------------------------------------------

Music by Eric Matyas
https://www.soundimage.org
Track title: Music Box Puzzles

--

Chapters
00:00 How To Correctly Sort A String With A Number Inside?
00:24 Accepted Answer Score 359
01:31 Thank you

--

Full question
https://stackoverflow.com/questions/5967...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #regex #sorting #string

#avk47



ACCEPTED ANSWER

Score 359


Perhaps you are looking for human sorting (also known as natural sorting):

import re

def atoi(text):
    return int(text) if text.isdigit() else text

def natural_keys(text):
    '''
    alist.sort(key=natural_keys) sorts in human order
    http://nedbatchelder.com/blog/200712/human_sorting.html
    (See Toothy's implementation in the comments)
    '''
    return [ atoi(c) for c in re.split(r'(\d+)', text) ]

alist=[
    "something1",
    "something12",
    "something17",
    "something2",
    "something25",
    "something29"]

alist.sort(key=natural_keys)
print(alist)

yields

['something1', 'something2', 'something12', 'something17', 'something25', 'something29']

PS. I've changed my answer to use Toothy's implementation of natural sorting (posted in the comments here) since it is significantly faster than my original answer.


If you wish to sort text with floats, then you'll need to change the regex from one that matches ints (i.e. (\d+)) to a regex that matches floats:

import re

def atof(text):
    try:
        retval = float(text)
    except ValueError:
        retval = text
    return retval

def natural_keys(text):
    '''
    alist.sort(key=natural_keys) sorts in human order
    http://nedbatchelder.com/blog/200712/human_sorting.html
    (See Toothy's implementation in the comments)
    float regex comes from https://stackoverflow.com/a/12643073/190597
    '''
    return [ atof(c) for c in re.split(r'[+-]?([0-9]+(?:[.][0-9]*)?|[.][0-9]+)', text) ]

alist=[
    "something1",
    "something2",
    "something1.0",
    "something1.25",
    "something1.105"]

alist.sort(key=natural_keys)
print(alist)

yields

['something1', 'something1.0', 'something1.105', 'something1.25', 'something2']