The Python Oracle

How can I check if a string has the same characters? Python

Become part of the top 3% of the developers by applying to Toptal https://topt.al/25cXVn

--

Music by Eric Matyas
https://www.soundimage.org
Track title: RPG Blues Looping

--

Chapters
00:00 Question
00:40 Accepted answer (Score 20)
01:00 Answer 2 (Score 5)
01:20 Answer 3 (Score 1)
01:43 Answer 4 (Score 0)
02:10 Thank you

--

Full question
https://stackoverflow.com/questions/1832...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #string #python27

#avk47



ACCEPTED ANSWER

Score 21


Sort the two strings and then compare them:

sorted(str1) == sorted(str2)

If the strings might not be the same length, you might want to make sure of that first to save time:

len(str1) == len(str2) and sorted(str1) == sorted(str2)



ANSWER 2

Score 5


This is the O(n) solution

from collections import Counter
Counter(str1) == Counter(str2)

But the O(n * log n) solution using sorted is likely faster for sensible values of n




ANSWER 3

Score 1


Here's a variation on @Joowani's solution that only uses one dictionary and runs even faster (at least on my machine) :

def cmp4(str1, str2):
    if len(str1) != len(str2):
        return False
    d = collections.defaultdict(int)
    for c in str1:
        d[c] += 1
    for c in str2:
        d[c] -= 1
    return all(v == 0 for v in d.itervalues())



ANSWER 4

Score 0


Here is another O(n) solution, longer but slightly faster than others:

def cmp(str1, str2):
    if len(str1) != len(str2):
        return False

    d, d2 = {}, {}
    for char in str1:
        if char not in d:
            d[char] = 1
        else:
            d[char] += 1
    for char in str2:
        if char not in d:
            return False
        if char not in d2:
            d2[char] = 1
        else:
            d2[char] += 1

    return d == d2

It basically does the same thing as gnibber's solution (but for some strange reasons the Counter() from collections library seems quite slow). Here are some timeit results:

setup = '''
import collections
from collections import Counter

s1 = "abcdefghijklmnopqrstuvwxyz" * 10000
s2 = s1[::-1]

def cmp1(str1, str2):
    if len(str1) != len(str2):
        return False

    d, d2 = {}, {}
    for char in str1:
        if char not in d:
            d[char] = 1
        else:
            d[char] += 1
    for char in str2:
        if char not in d:
            return False
        if char not in d2:
            d2[char] = 1
        else:
            d2[char] += 1
    return d == d2

def cmp2(str1, str2):
    return len(str1) == len(str2) and sorted(str1) == sorted(str2)

def cmp3(str1, str2):    
    return Counter(str1) == Counter(str2)

def cmp4(str1, str2):
    if len(str1) != len(str2):
        return False
    d = collections.defaultdict(int)
    for c in str1:
        d[c] += 1
    for c in str2:
        d[c] -= 1
    return all(v == 0 for v in d.itervalues())
'''

    timeit.timeit("cmp1(s1, s2)", setup=setup, number = 100)
    8.027034027221656
    timeit.timeit("cmp2(s1, s2)", setup=setup, number = 100)
    8.175071701324946
    timeit.timeit("cmp3(s1, s2)", setup=setup, number = 100)
    14.243422195893174
    timeit.timeit("cmp4(s1, s2)", setup=setup, number = 100)
    5.0937542822775015

Also, David's solution comes out on top when the string sizes are small and they actually have same characters.

EDIT: updated the test results