The Python Oracle

Detect allusions (e.g. very fuzzy matches) in the language of inaugural addresses

--



Full question
https://stackoverflow.com/questions/1449...

Question links:
[available on Cloud9IDE]: https://c9.io/wilson428/inaugurals

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #text #nlp #nltk




ACCEPTED ANSWER

Score 2


If you want to use bigrams, you could loosen the definition a little: build your "bigrams" while allowing gaps of one, two, or even three words between the two members of each pair. This should be feasible because the number of pairs grows only linearly with the largest gap you allow (gaps of up to n words give fewer than n + 1 times as many "bigrams" as the strict definition), and your corpus is pretty small. With this, for example, a "bigram" from your first paragraph could be (similar, inaugurals).
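
To make the idea concrete, here is a minimal sketch of gapped bigrams in plain Python. The function name `gapped_bigrams`, the `max_gap` parameter, and the sample sentence are assumptions chosen for illustration; they do not come from the original question or answer, and tokenization is reduced to a whitespace split.

```python
from typing import Iterator, List, Tuple


def gapped_bigrams(tokens: List[str], max_gap: int = 3) -> Iterator[Tuple[str, str]]:
    """Yield ordered word pairs with at most `max_gap` words between them.

    A gap of 0 is an ordinary bigram. `gapped_bigrams` and `max_gap`
    are hypothetical names used only for this sketch.
    """
    for i, first in enumerate(tokens):
        # The partner word may sit 1 .. max_gap + 1 positions to the right.
        last = min(i + max_gap + 2, len(tokens))
        for j in range(i + 1, last):
            yield first, tokens[j]


# Illustrative input: naive whitespace tokenization of a short phrase.
tokens = "ask not what your country can do for you".split()
pairs = set(gapped_bigrams(tokens, max_gap=2))
```

Two passages could then be compared by intersecting their sets of pairs. A larger `max_gap` makes the matching fuzzier at the cost of more spurious overlaps, so the value probably needs tuning against the corpus.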