How to not match whole word "king" to "king?"?
--
Music by Eric Matyas
https://www.soundimage.org
Track title: Riding Sky Waves v001
--
Chapters
00:00 Question
01:04 Accepted answer (Score 4)
01:53 Answer 2 (Score 5)
03:22 Answer 3 (Score 0)
03:36 Thank you
--
Full question
https://stackoverflow.com/questions/4293...
Accepted answer links:
[Python demo]: http://ideone.com/accowW
Answer 2 links:
[(although you should probably use one)]: https://repl.it/G8zH/0
[image]: https://xkcd.com/1171/
Answer 3 links:
[this post]: https://stackoverflow.com/questions/4901...
--
Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...
--
Tags
#python #regex #nlp
#avk47
ANSWER 1
Score 5
Use return answer in context.split():
>>> answer in context.split()
False
You don't need a regex for this.
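As a minimal sketch of why this works (the function name whole_word_in and the sample strings are illustrative assumptions, not taken from the question): str.split() with no arguments splits on runs of whitespace, so punctuation stays attached to its word and "king?" is a different token from "king".

```python
def whole_word_in(answer, context):
    # split() with no arguments splits on any run of whitespace,
    # so "king?" stays a single token and does not equal "king".
    return answer in context.split()


print(whole_word_in("king", "was king tut a nice dude?"))   # True
print(whole_word_in("king", "was king? tut a nice dude?"))  # False
```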
If you're looking for keywords:
all(ans in context.split() for ans in answer.split())
works with "king tut", but it ignores word order, so it depends on whether you also want to match strings like:
"we tut with the king"
If you don't, you still don't need a regex (although you should probably use one), because you only want to match whole terms, and .split() already separates those on whitespace by default:
def ngram_in(match, string):
    # Compare whitespace-separated tokens, so "king" never matches "king?".
    matches = match.split()
    words = string.split()
    if not matches:
        return False
    if len(matches) == 1:
        return matches[0] in words
    words_len = len(words)
    matches_len = len(matches)
    for index, word in enumerate(words):
        # Stop once too few words remain to hold the whole phrase.
        if index + matches_len > words_len:
            return False
        if word == matches[0]:
            potential_match = True
            for match_index, term in enumerate(matches):
                if words[index + match_index] != term:
                    potential_match = False
                    break
            if potential_match:
                return True
    return False
which is O(n*m) on a worst-case string (n words in the string, m terms in the match) and about half as fast as a regex on normal strings.
>>> ngram_in("king", "was king tut a nice dude?")
True
>>> ngram_in("king", "was king? tut a nice dude?")
False
>>> ngram_in("king tut a", "was king tut a nice dude?")
True
>>> ngram_in("king tut a", "was king tut? a nice dude?")
False
>>> ngram_in("king tut a", "was king tut an nice dude?")
False
>>> ngram_in("king tut", "was king tut an nice dude?")
True
ACCEPTED ANSWER
Score 4
Use a regular expression like this:
import re
reg_answer = re.compile(r"(?<!\S)" + re.escape(answer) + r"(?!\S)")
See the Python demo
Details:
- (?<!\S) - a negative lookbehind to ensure the match is preceded by whitespace or the start of the string
- re.escape(answer) - a preprocessing step so that all special characters inside the search word are treated as literal characters
- (?!\S) - a negative lookahead to ensure the match is followed by whitespace or the end of the string
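A short sketch of the pattern in use may help. The wrapper name find_whole and the sample strings are assumptions for illustration; only the pattern itself comes from the answer.

```python
import re


def find_whole(answer, context):
    # Pattern from the answer above: the match must not be preceded
    # or followed by a non-whitespace character.
    reg_answer = re.compile(r"(?<!\S)" + re.escape(answer) + r"(?!\S)")
    return reg_answer.search(context) is not None


print(find_whole("king", "was king tut a nice dude?"))      # True
print(find_whole("king", "was king? tut a nice dude?"))     # False
print(find_whole("king tut", "was king tut a nice dude?"))  # True
print(find_whole("king?", "was king? tut a nice dude?"))    # True
```

Because of re.escape, the "?" in the last call is matched literally instead of being read as a quantifier.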
