The Python Oracle

How to not match whole word "king" to "king?"?



Full question
https://stackoverflow.com/questions/4293...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #regex #nlp




ANSWER 1

Score 5


return answer in context.split()

>>> answer in context.split()
False

You don't need a regex for this.
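
A minimal sketch of that membership test. The values of answer and context below are hypothetical samples, since the question's actual strings are not shown here:

```python
# Hypothetical sample inputs; the question's real data is not shown here.
answer = "king"
context = "was king? tut a nice dude?"

# str.split() with no arguments splits on runs of whitespace,
# so "king?" stays a single token and is not equal to "king".
tokens = context.split()
found = answer in tokens

print(tokens)
print(found)
```

Because the comparison is between whole tokens, trailing punctuation is enough to prevent a match.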

If you're looking for keywords:

all([ans in context.split() for ans in answer.split()])

will work with "king tut", but whether it is what you want depends on whether you also want to match strings like:

"we tut with the king"

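The keyword check ignores word order and adjacency, which a short sketch makes visible. The helper name has_all_keywords is hypothetical and only wraps the all(...) expression:

```python
def has_all_keywords(answer, context):
    # Hypothetical helper: True when every term of `answer`
    # appears somewhere among the whitespace-separated words of `context`.
    words = context.split()
    return all(term in words for term in answer.split())


in_order = has_all_keywords("king tut", "was king tut a nice dude?")
scattered = has_all_keywords("king tut", "we tut with the king")
punctuated = has_all_keywords("king tut", "was king? tut a nice dude?")

print(in_order, scattered, punctuated)
```

Here both the adjacent and the scattered sentence pass, while the punctuated one fails because "king?" is a different token.
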
If you don't, you still don't need a regex (although you should probably use one), given that you want to consider only whole terms (which are properly split, by default, via .split()):

def ngram_in(match, string):
    matches = match.split()
    if len(matches) == 1:
        # Single term: a membership test on the split words is enough.
        return matches[0] in string.split()
    words = string.split()
    words_len = len(words)
    matches_len = len(matches)
    for index, word in enumerate(words):
        # Too few words left to hold the whole phrase.
        if index + matches_len > words_len:
            return False
        if word == matches[0]:
            # Compare the phrase term by term from this position.
            potential_match = True
            for match_index, term in enumerate(matches):
                if words[index + match_index] != term:
                    potential_match = False
                    break
            if potential_match:
                return True
    return False

which is O(n*m) on a worst-case string (n words in the string, m terms in the match) and about half as fast as a regex on normal strings.

>>> ngram_in("king", "was king tut a nice dude?")
True
>>> ngram_in("king", "was king? tut a nice dude?")
False
>>> ngram_in("king tut a", "was king tut a nice dude?")
True
>>> ngram_in("king tut a", "was king tut? a nice dude?")
False
>>> ngram_in("king tut a", "was king tut an nice dude?")
False
>>> ngram_in("king tut", "was king tut an nice dude?")
True
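
The same sliding-window idea can be written more compactly by comparing list slices. This is only a sketch, not the answer's code: ngram_in_sliced is a hypothetical name, returning False for an empty search phrase is an assumption, and its speed relative to ngram_in or a regex has not been measured here.

```python
def ngram_in_sliced(match, string):
    terms = match.split()
    words = string.split()
    if not terms:
        # Assumption: an empty search phrase matches nothing.
        return False
    width = len(terms)
    # Compare every window of `width` consecutive words against the phrase.
    # If the phrase is longer than the string, the range is empty.
    return any(
        words[i:i + width] == terms
        for i in range(len(words) - width + 1)
    )


print(ngram_in_sliced("king tut a", "was king tut a nice dude?"))
print(ngram_in_sliced("king tut a", "was king tut? a nice dude?"))
```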




ACCEPTED ANSWER

Score 4


Use a regular expression like this:

reg_answer = re.compile(r"(?<!\S)" + re.escape(answer) + r"(?!\S)")

See the Python demo

Details:

  • (?<!\S) - a negative lookbehind that ensures the match is preceded by whitespace or the start of the string
  • re.escape(answer) - a preprocessing step that makes all special chars inside the search word be treated as literal chars
  • (?!\S) - a negative lookahead that ensures the match is followed by whitespace or the end of the string.
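
A small sketch of the pattern in use. The wrapper name whole_term_search and the sample strings are hypothetical; only the compiled expression comes from the answer:

```python
import re


def whole_term_search(answer, context):
    # Hypothetical wrapper around the pattern from the answer.
    # The lookarounds require whitespace (or a string boundary) on both
    # sides, and re.escape() keeps characters such as "?" literal.
    reg_answer = re.compile(r"(?<!\S)" + re.escape(answer) + r"(?!\S)")
    return reg_answer.search(context) is not None


print(whole_term_search("king", "was king tut a nice dude?"))
print(whole_term_search("king", "was king? tut a nice dude?"))
print(whole_term_search("king?", "was king? tut a nice dude?"))
```

Unlike \b word boundaries, these lookarounds treat punctuation attached to the word as part of it, so "king" does not match inside "king?".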



ANSWER 3

Score 0


Why not check:

if answer in context:
    pass  # do stuff

Check this post for more details
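
For comparison, a brief sketch of how the in operator behaves on a plain string versus a list of words. The sample values are hypothetical:

```python
# Hypothetical sample inputs.
answer = "king"
context = "was king? tut a nice dude?"

# On a str, `in` is a substring test, so "king" is found inside "king?".
substring_hit = answer in context

# On a list of words, `in` compares whole elements.
whole_word_hit = answer in context.split()

print(substring_hit, whole_word_hit)
```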