Algorithm to match 2 lists with wildcards

--------------------------------------------------
Rise to the top 3% as a developer or hire one of them at Toptal: https://topt.al/25cXVn
--------------------------------------------------

Music by Eric Matyas
https://www.soundimage.org
Track title: Flying Over Ancient Lands

--

Chapters
00:00 Algorithm To Match 2 Lists With Wildcards
00:38 Answer 1 Score 0
01:06 Answer 2 Score 1
01:39 Answer 3 Score 1
02:47 Answer 4 Score 0
03:55 Thank you

--

Full question
https://stackoverflow.com/questions/8847...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #string #algorithm #patternmatching #stringmatching

#avk47

ANSWER 1

Score 1

How about the following:

import re

def match(pat, lst):
  regex = ''.join(term if term != '*' else '.*' for term in pat) + '$'
  s = ''.join(lst)
  return re.match(regex, s) is not None

print match( ['A', 'B', '*', 'D'], ['A', 'B', 'C', 'C', 'C', 'D'] )

It uses regular expressions. Wildcards (*) are changed to .* and all other search terms are kept as-is.

One caveat is that if your search terms could contain things that have special meaning in the regex language, those would need to be properly escaped. It's pretty easy to handle this in the match function, I just wasn't sure if this was something you required.

ANSWER 2

Score 1

I'd recommend converting ['A', 'B', '*', 'D'] to '^AB.*D$', ['A', 'B', 'C', 'C', 'C', 'D'] to 'ABCCCD', and then using the re module (regular expressions) to do the match.

This will be valid if the elements of your lists are only one character each, and if they're strings.

something like:

import(re)
def myMatch( patternList, stringList ):
    # convert pattern to flat string with wildcards
    # convert AB*D to valid regex ^AB.*D$
    pattern = ''.join(patternList) 
    regexPattern = '^' + pattern.replace('*','.*') + '$' 
    # perform matching
    against = ''.join(stringList) # convert ['A','B','C','C','D'] to ABCCCD
    # return whether there is a match
    return (re.match(regexPattern,against) is not None)

If the lists contain numbers, or words, choose a character that you wouldn't expect to be in either, for example #. Then ['Aa','Bs','Ce','Cc','CC','Dd'] can be converted to Aa#Bs#Ce#Cc#CC#Dd, the wildcard pattern ['Aa','Bs','*','Dd'] could be converted to ^Aa#Bs#.*#Dd$, and the match performed.

Practically speaking this just means all the ''.join(...) becomes '#'.join(...) in myMatch.

ANSWER 3

Score 0

I agree with the comment regarding this could be done with regular expressions. For example:

import re

lst = ['A', 'B', 'C', 'C', 'C', 'D']
pattern = ['A', 'B', 'C+', 'D']

print re.match(''.join(pattern), ''.join(lst)) # Will successfully match

Edit: As pointed out by a comment, it might be known in advance just that some character has to be matched, but not which one. In that case, regular expressions are useful still:

import re

lst = ['A', 'B', 'C', 'C', 'C', 'D']
pattern = r'AB(\w)\1*D'

print re.match(pattern, ''.join(lst)).groups()

ANSWER 4

Score 0

I agree, regular expressions are usually the way to go with this sort of thing. This algorithm works, but it just looks convoluted to me. It was fun to write though.

def match(listx, listy):
    listx, listy = map(iter, (listx, listy))
    while 1:
        try:
            x = next(listx)
        except StopIteration:
            # This means there are values left in listx that are not in listy.
            try:
                y = next(listy)
            except StopIteration:
                # This means there are no more values to be compared in either
                # listx or listy; since no exception was raied elsewhere, the
                # lists match.
                return True
            else:
                # This means that there are values in listy that are not in
                # listx.
                return False
        else:
            try:
                y = next(listy)
            except StopIteration:
                # Similarly, there are values in listy that aren't in listx.
                return False
        if x == y:
            pass
        elif x == '*':
            try:
                # Get the value in listx after '*'.
                x = next(listx)
            except StopIteration:
                # This means that listx terminates with '*'. If there are any
                # remaining values of listy, they will, by definition, match.
                return True
            while 1:
                if x == y:
                    # I didn't shift to the next value in listy because I
                    # assume that a '*' matches the empty string and well as
                    # any other.
                    break
                else:
                    try:
                        y = next(listy)
                    except StopIteration:
                        # This means there is at least one remaining value in
                        # listx that is not in listy, because listy has no
                        # more values.
                        return False
                    else:
                        pass
        # Same algorithm as above, given there is a '*' in listy.
        elif y == '*':
            try:
                y = next(listy)
            except StopIteration:
                return True
            while 1:
                if x == y:
                    break
                else:
                    try:
                        x = next(listx)
                    except StopIteration:
                        return False
                    else:
                        pass