Spell checking and spelling correction are basic requirements in most text processing and analysis tasks. The Python package pyspellchecker
finds words that may be misspelled and suggests possible corrections. It supports multiple languages, including English, Spanish, German,
French, and Portuguese, and it runs on Python 3 and Python 2.7. pyspellchecker also lets you set the maximum Levenshtein distance it searches
when looking for corrections. The default is 2; for longer words, a distance of 1 is strongly recommended, because the number of candidate edits grows quickly with word length.
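To see why the distance setting matters for long words, here is a rough, standard-library-only sketch of edit generation in the style of Peter Norvig's well-known spelling corrector. The helper names `edits1` and `edits2` are illustrative and are not pyspellchecker's internal API; the sketch only counts how many strings lie within one or two edits of a word.

```python
import string


def edits1(word, letters=string.ascii_lowercase):
    """All strings one edit (delete, transpose, replace, insert) away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in letters]
    inserts = [left + c + right for left, right in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


def edits2(word):
    """All strings reachable by applying edits1 twice."""
    return {e2 for e1 in edits1(word) for e2 in edits1(e1)}


# Compare the size of the search space for a short and a longer word.
for word in ("cat", "spelling"):
    print(word, len(edits1(word)), len(edits2(word)))
```

The second count is far larger than the first and climbs with word length, which is the cost that a distance of 1 avoids.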
pip install pyspellchecker
from spellchecker import SpellChecker
Output:
group
You can load additional text or words into the word frequency list so that the suggestions better fit your use case.
from spellchecker import SpellChecker
If the words you wish to check are long, it is recommended to reduce the distance to 1. You can do this either when initializing the SpellChecker class or afterwards.
from spellchecker import SpellChecker
candidates(word)
: Returns a set of possible candidates for the misspelled word
word_probability(word)
: The frequency of the given word out of all words in the frequency list
correction(word)
: Returns the most probable result for the misspelled word
known([words])
: Returns those words that are in the word frequency list
unknown([words])
: Returns those words that are not in the frequency list
Natural-Language-Processing, Spelling-Correction — Aug 16, 2020