Stephen's Blog

Spelling Correction with Python Spellchecker

 

Stephen Cheng

Intro

Spell checking, or spelling correction, is a basic requirement in most text processing and analysis work. The Python package pyspellchecker finds words that may be misspelled and suggests possible corrections. It supports multiple languages, including English, Spanish, German, French, and Portuguese, and it runs on Python 3 and Python 2.7. You can also set the maximum Levenshtein distance (the number of single-character edits) that pyspellchecker searches when looking for corrections. The default is 2; for longer words, a distance of 1 is highly recommended.
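To make the distance setting more concrete, here is a minimal sketch of the classic Levenshtein distance between two words. The function name `levenshtein` is made up for this illustration; it is not part of pyspellchecker, and the library may well compute its candidates differently internally.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn `a` into `b`."""
    # previous[j] = distance between the prefix of `a` handled so far and b[:j]
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # turning a[:i] into an empty string takes i deletions
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep a matching character
            ))
        previous = current
    return previous[-1]


print(levenshtein("groun", "ground"))
print(levenshtein("wlak", "walk"))
```

Under this definition, 'groun' is one edit away from 'ground', while 'wlak' is two edits away from 'walk', because swapping two adjacent letters counts as two substitutions.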

How to install?

pip install pyspellchecker

How to use?

  • With the default Word Frequency list
from spellchecker import SpellChecker

spell = SpellChecker()

# Find the words that may be misspelled
misspelled = spell.unknown(['let', 'us', 'wlak', 'on', 'the', 'groun'])

for word in misspelled:
    # Get the single most likely correction
    print(spell.correction(word))

    # Get the set of likely candidates
    print(spell.candidates(word))

Output (the results come from sets, so the order may vary between runs):

group
{'group', 'ground', 'groan', 'grout', 'grown', 'groin'}
walk
{'flak', 'weak', 'walk'}
  • With a customized Word Frequency list

You can load additional text to build a word frequency list that is better suited to your use case.

from spellchecker import SpellChecker

spell = SpellChecker() # loads default word frequency list
spell.word_frequency.load_text_file('./word_frequency.txt')

# if you just want to make sure some words are not flagged as misspelled
spell.word_frequency.load_words(['microsoft', 'apple', 'google'])
spell.known(['microsoft', 'google']) # will return both now!
  • Set the distance parameter

If the words you want to check are long, it is recommended to reduce the distance to 1. You can set it either when initializing the SpellChecker class or afterwards.

from spellchecker import SpellChecker

spell = SpellChecker(distance=1) # set at initialization

# do some work on longer words

spell.distance = 2 # set the distance parameter back to the default
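Why does a smaller distance help with long words? The rough sketch below assumes that candidates are produced by enumerating every string within one or two edits of the input, as in Peter Norvig's well-known spelling corrector that pyspellchecker is based on. The helpers `edits1` and `edits2` are written here for illustration only, and the library's own implementation may differ in detail.

```python
import string

LETTERS = string.ascii_lowercase  # assumption: a plain 26-letter alphabet


def edits1(word):
    """All strings one edit (delete, transpose, replace, insert) away from `word`."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in LETTERS]
    inserts = [left + c + right for left, right in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def edits2(word):
    """All strings reachable by applying two single edits in a row."""
    return {second for first in edits1(word) for second in edits1(first)}


# Compare how many strings have to be examined for a short and a longer word
for word in ("cat", "spelling"):
    print(word, len(edits1(word)), len(edits2(word)))
```

The distance-2 set is far larger than the distance-1 set, and both grow with the length of the word, so dropping to a distance of 1 noticeably reduces the work needed for long words.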

Additional Methods

candidates(word): Returns a set of possible candidates for the misspelled word

word_probability(word): Returns the frequency of the given word relative to all words in the frequency list

correction(word): Returns the most probable correction for the misspelled word

known([words]): Returns those words that are in the word frequency list

unknown([words]): Returns those words that are not in the word frequency list
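To show how these methods relate to one another, here is a toy sketch built on a tiny word frequency list. `ToySpellChecker` and `_edits1` are hypothetical names invented for this example; this is a simplified stand-in (it only looks one edit away, and counts words from a made-up sentence), not pyspellchecker's actual code.

```python
import re
import string
from collections import Counter


def _edits1(word):
    """All strings one edit away from `word` (illustrative helper)."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in letters]
    inserts = [l + c + r for l, r in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


class ToySpellChecker:
    """Simplified stand-in for illustration; not the real SpellChecker class."""

    def __init__(self, text):
        # Word frequency list: lowercase word -> number of occurrences
        self.freq = Counter(re.findall(r"[a-z]+", text.lower()))
        self.total = sum(self.freq.values())

    def known(self, words):
        return {w for w in words if w in self.freq}

    def unknown(self, words):
        return {w for w in words if w not in self.freq}

    def word_probability(self, word):
        # Share of all counted words that are `word` (0 for unseen words)
        return self.freq[word] / self.total

    def candidates(self, word):
        # The word itself if known, else known words one edit away, else the word
        return self.known([word]) or self.known(_edits1(word)) or {word}

    def correction(self, word):
        # The candidate with the highest frequency wins
        return max(self.candidates(word), key=self.word_probability)


checker = ToySpellChecker(
    "the ground was wet so the group sat on the ground and did not walk"
)
for word in sorted(checker.unknown(["the", "wlak", "on", "groun"])):
    print(word, checker.candidates(word), checker.correction(word))
```

In this toy frequency list 'ground' occurs more often than 'group', so it wins as the correction for 'groun'. With a different frequency list the ranking can change, which is why loading text from your own domain can improve the suggestions.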

Aug 16, 2020
