How to check if a word is an English word with Python?
--
Full question
https://stackoverflow.com/questions/3788...
Accepted answer links:
[PyEnchant]: https://pypi.org/project/pyenchant/
[tutorial]: https://pyenchant.github.io/pyenchant/tu...
[OpenOffice ones]: http://wiki.services.openoffice.org/wiki...
[inflect]: http://pypi.python.org/pypi/inflect
Answer 3 links:
[this article]: http://www.velvetcache.org/2010/03/01/lo...
Answer 4 links:
http://www.sil.org/linguistics/wordlists...
--
Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...
--
Tags
#python #nltk #wordnet
ACCEPTED ANSWER
Score 284
For (much) more power and flexibility, use a dedicated spellchecking library like PyEnchant. There's a tutorial, or you could just dive straight in:
>>> import enchant
>>> d = enchant.Dict("en_US")
>>> d.check("Hello")
True
>>> d.check("Helo")
False
>>> d.suggest("Helo")
['He lo', 'He-lo', 'Hello', 'Helot', 'Help', 'Halo', 'Hell', 'Held', 'Helm', 'Hero', "He'll"]
>>>
PyEnchant comes with a few dictionaries (en_GB, en_US, de_DE, fr_FR), but can use any of the OpenOffice ones if you want more languages.
There appears to be a pluralisation library called inflect, but I've no idea whether it's any good.
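The check shown above can be wrapped in a small helper that degrades gracefully when PyEnchant or the requested dictionary is missing. This is only a sketch: `make_english_checker` and `is_word` are hypothetical names, and it assumes the `pyenchant` package plus an `en_US` dictionary are installed for the real check to run.

```python
def make_english_checker(lang="en_US"):
    """Return a word -> bool function, or None if PyEnchant is unusable.

    make_english_checker is a hypothetical helper, not part of PyEnchant.
    """
    try:
        import enchant  # third-party: pip install pyenchant
    except ImportError:
        # Either the Python package or the underlying C library is missing.
        return None

    # dict_exists reports whether a dictionary for this language tag is installed.
    if not enchant.dict_exists(lang):
        return None

    dictionary = enchant.Dict(lang)

    def is_word(word):
        # Empty strings are treated as non-words here
        # (PyEnchant raises on them, to my knowledge).
        return bool(word) and dictionary.check(word)

    return is_word


checker = make_english_checker()
if checker is None:
    print("PyEnchant or the en_US dictionary is not available")
else:
    print(checker("Hello"), checker("Helo"))
```

Creating the `Dict` once and reusing it avoids reloading the dictionary on every lookup.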
ANSWER 2
Score 78
It won't work well with WordNet, because WordNet does not contain all English words. Another possibility, based on NLTK and without enchant, is NLTK's words corpus:
>>> from nltk.corpus import words
>>> "would" in words.words()
True
>>> "could" in words.words()
True
>>> "should" in words.words()
True
>>> "I" in words.words()
True
>>> "you" in words.words()
True
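Each `in words.words()` test above searches the whole word list again, which gets slow if you check many words. A possible refinement is to build a set once and reuse it. In the sketch below, `build_vocabulary` and `is_known_word` are hypothetical helpers, and `sample_words` is a tiny stand-in so the example runs without NLTK; with NLTK installed you would pass `words.words()` instead.

```python
def build_vocabulary(word_list):
    """Lowercase every word once and store it in a set for fast lookups."""
    return {word.lower() for word in word_list}


def is_known_word(word, vocabulary):
    """Case-insensitive membership test against a prebuilt vocabulary set."""
    return word.lower() in vocabulary


# Stand-in data (an assumption for illustration only).
# With NLTK: vocabulary = build_vocabulary(words.words())
sample_words = ["would", "could", "should", "I", "you"]
vocabulary = build_vocabulary(sample_words)

print(is_known_word("Would", vocabulary))  # True
print(is_known_word("wuld", vocabulary))   # False
```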
ANSWER 3
Score 53
Using NLTK:
from nltk.corpus import wordnet

# word_to_test is the string you want to check
if not wordnet.synsets(word_to_test):
    print("Not an English word")
else:
    print("English word")
You should refer to this article if you have trouble installing wordnet or want to try other approaches.
ANSWER 4
Score 42
Use a set to store the word list, because lookups in a set are faster than in a list:
with open("english_words.txt") as word_file:
    english_words = set(word.strip().lower() for word in word_file)

def is_english_word(word):
    return word.lower() in english_words

print(is_english_word("ham"))  # should be True if you have a good english_words.txt
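If you want to try the set-based lookup without downloading a word list first, a variant along these lines may help. It is a sketch under stated assumptions: `load_word_set` is a hypothetical helper, the file is assumed to hold one word per line, and the three-word temporary file stands in for a real `english_words.txt`.

```python
import os
import tempfile


def load_word_set(path):
    """Read one word per line into a lowercase set, skipping blank lines."""
    with open(path, encoding="utf-8") as word_file:
        return {line.strip().lower() for line in word_file if line.strip()}


def is_english_word(word, english_words):
    """Case-insensitive lookup in the loaded word set."""
    return word.lower() in english_words


# Write a tiny demo word list (a stand-in for a real english_words.txt).
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("ham\nHams\n\neggs\n")
    demo_path = tmp.name

demo_words = load_word_set(demo_path)
os.remove(demo_path)  # clean up the temporary file

print(is_english_word("Ham", demo_words))   # True
print(is_english_word("hamz", demo_words))  # False
```

Passing the set in as an argument, rather than reading a module-level global, makes it easy to swap in a British or American list later.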
To answer the second part of the question, the plurals would already be in a good word list, but if you wanted to specifically exclude those from the list for some reason, you could indeed write a function to handle it. But English pluralization rules are tricky enough that I'd just include the plurals in the word list to begin with.
As to where to find English word lists, I found several just by Googling "English word list". Here is one: http://www.sil.org/linguistics/wordlists/english/wordlist/wordsEn.txt You could Google for British or American English if you want specifically one of those dialects.