Python 3: Perfect Alphabetical Order

This video explains
Python 3: Perfect Alphabetical Order

--

Become part of the top 3% of the developers by applying to Toptal
https://topt.al/25cXVn

--

Music by Eric Matyas
https://www.soundimage.org
Track title: Puzzle Game Looping

--

Chapters
00:00 Question
02:00 Accepted answer (Score 3)
04:04 Answer 2 (Score 11)
05:48 Answer 3 (Score 8)
08:02 Answer 4 (Score 8)
08:54 Thank you

--

Full question
https://stackoverflow.com/questions/4447...

Answer 1 links:
[this post]: https://stackoverflow.com/a/18717762/456...

Answer 2 links:
[https://github.com/kasramvd/SuffixTree]: https://github.com/kasramvd/SuffixTree

Answer 3 links:
[zip(strg, strg[1:])]: https://docs.python.org/3/library/functi...
[ord]: https://docs.python.org/3/library/functi...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #string #python27 #forloop #python35

#avk47

ANSWER 1

Score 11

TL;DR: the last code after the edit solves the problem

This is a variant of the Longest Common Substring Problem.

def longestSubstring(string1, string2):
    answer = ""
    len1, len2 = len(string1), len(string2)
    for i in range(len1):
        match = ""
        for j in range(len2):
            if (i + j < len1 and string1[i + j] == string2[j]):
                match += string2[j]
            else:
                if (len(match) > len(answer)): answer = match
                match = ""
    return answer

alphabets = "abcdefghijklmnopqrstuvwxyz"
s = 'jamabcdaskl'

print('longest substring:', longestSubstring(s,alphabets))

Credits to this post for the subroutine.

Edit:

It seems like the above code doesn't work for all cases, so I had to redesign the function.

def longestAlphaSubstring(str2):
    str1 = "abcdefghijklmnopqrstuvwxyz"
    longest = ""
    for i in range(len(str1)+1):
        if str1[:i] in str2 and len(str1[:i])>len(longest):
            longest = str1[:i]
    return longest

print(longestAlphaSubstring('jamabcdaskl'))
print(longestAlphaSubstring('asdklfjalkdfjabcdefghijklmnopqrstuvwxyzaaabcasdkfjl;kasdf'))

Output:

abcd
abcdefghijklmnopqrstuvwxyz

This works on the assumption that the substring should always begin with a. This iterates through every possible substring from 'a', 'ab', 'abc', ... , up to the full string of alphabets and then stores the longest substring encountered in the check.

For the sake of completeness, here is the code that would work for any longest common substring:

def longestSubstring(str1, str2):
    longest = ""
    for j in range(len(str1)):
        for i in range(len(str1)+1):
            if str1[j:i] in str2 and len(str1[j:i])>len(longest):
                longest = str1[j:i]
    return longest

where one string contains the alphabets in order and the other contains the test string. Be aware that this is O(n^2) in complexity (not that it matters for small cases).

ANSWER 2

Score 8

a different version looping over zip(strg, strg[1:]) in order to compare the previous and the current character in the same loop iteration:

def longest_substring(strg):

    substring = strg[0]
    max_substring = ''

    for cur, nxt in zip(strg, strg[1:]):

        if ord(nxt) >= ord(cur):
            substring += nxt
            if len(substring) > len(max_substring):
                max_substring = substring
        else:
            substring = nxt

    return max_substring

comparing the characters with ord this way has the disadvantage that these characters !"#$%&'()*+,-./0123456789:;=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_``abcdefghijklmnopqrstuvwxyz{|}~ will be considered as 'in alphabetical order'. you may need to tweak that to your needs...

ANSWER 3

Score 8

From an algorithmic perspective, the most optimized approach is using a Suffix Tree in order to find the longest sub string in a given string. (I've implemented an optimized version of the suffix tree in python a while ago. In case you're interested you can check https://github.com/kasramvd/SuffixTree

As another hacky way you can utilize the Numpy in order to find the largest sub-string using the diff(), where() and split() functions:

In [52]: s = 'pqsrjamabcdaskl'

In [54]: ''.join(max(np.split(list(s), np.where((np.diff([ord(i) for i in s]) == 1) == False)[0] + 1), key=len))
Out[54]: 'abcd'

Explanation:

The logic behind this code is to find the indices of the characters that the difference of their ascii value is 1.

In numpy we can simply do that by np.diff function. But since it needs an array of items we can use a list comprehension to pass the list of intended values to the function. Then by comparing the result with 1 we can get a list of bool array as follows:

In [55]: np.diff([ord(i) for i in s]) == 1           
Out[55]: 
array([ True, False, False, False, False, False, False,  True,  True,
        True, False, False, False,  True], dtype=bool)

Now we can get the indices of the False items using np.where to pass them to split function:

In [57]: np.split(list(s), np.where((np.diff([ord(i) for i in s]) == 1) == False)[0] + 1)
Out[57]: 
[array(['p', 'q'], 
       dtype='<U1'), array(['s'], 
       dtype='<U1'), array(['r'], 
       dtype='<U1'), array(['j'], 
       dtype='<U1'), array(['a'], 
       dtype='<U1'), array(['m'], 
       dtype='<U1'), array(['a', 'b', 'c', 'd'], 
       dtype='<U1'), array(['a'], 
       dtype='<U1'), array(['s'], 
       dtype='<U1'), array(['k', 'l'], 
       dtype='<U1')]

The +1 is actually because np.split splits from 0 to our first index and then from first index to the next and etc.

And at the end we can get the longest array using max function by passing the len as the key function.

Also note that this approach will give you the longest consecutive sequence, if you just care about the order you can replace the == 1 with > 0. Here is an example:

In [13]: s = 'pqsrjamabcdasklptvxz'

In [14]: ''.join(max(np.split(list(s), np.where((np.diff([ord(i) for i in s]) > 0) == False)[0] + 1), key=len))
Out[14]: 'klptvxz'

ANSWER 4

Score 3

Here's a simple, efficient and (IMO) quite readable solution, which dodges the "what does alphabetical order actually mean" issue by taking a custom test function as a parameter:

def longest_matching_substring(s, match):
    current_run_length = 1
    longest_run_length = 1
    longest_run_end = 0

    for i in range(1, len(s)):
        if match(s[i-1], s[i]):
            current_run_length += 1
        else:
            current_run_length = 1

        if current_run_length > longest_run_length:
            longest_run_length = current_run_length
            longest_run_end = i

    longest_run_start = longest_run_end - longest_run_length + 1
    return s[longest_run_start:longest_run_end+1]

You can use it e.g. like this:

print(longest_matching_substring('jamabcdaskl', lambda a,b: a < b))

which will print "abcd". If you'd like to use a case-insensitive comparison, and/or to ignore non-alphabetic characters entirely, you can do that by changing the test function, e.g. like this:

def is_alphabetized(a, b):
    return a.lower() < b.lower()

# this prints "eSt"
print(longest_matching_substring('TeStInG', is_alphabetized))

Of course, by defining a suitable test function, you can also use the same longest_matching_substring function to find the longest consecutive substring where the adjacent pairs of characters satisfy any arbitrary criterion (like, say, "does not contain a consonant followed by a vowel"). And you can even use the same function for finding longest matching consecutive subsequences in other sequence types like lists and tuples, not just in strings.

(This implementation does not, however, work for arbitrary iterable types; to handle those, we'd have to memorize the current and the longest matching substring as we iterate over the input. While doable, that would somewhat complicate the code, and also make it less efficient for ordinary strings and other sequence types.)