Spelling Correction with Soft-Masked BERT

 

Stephen Cheng

Intro

Soft-Masked BERT is a novel neural architecture for spelling error correction. It consists of a network for error detection and a network for error correction based on BERT, with the former connected to the latter by what the authors call the soft-masking technique. The method is general and may be employed in other language detection-correction problems, not just the CSC (Chinese Spelling Correction) domain for which it was proposed in the original paper.

The Architecture of Soft-Masked BERT

Soft-Masked BERT is composed of a detection network based on a Bi-GRU and a correction network based on BERT. The detection network predicts the probability of an error at each position and the correction network predicts the probabilities of error corrections, while the former passes its predictions to the latter using soft masking.

The model first creates an embedding for each character in the input sentence, referred to as the input embedding. Next, the detection network takes the sequence of embeddings as input and outputs the probability of error for each character. The model then computes, for each position, the weighted sum of the input embedding and the [MASK] embedding, weighted by the error probability; these soft-masked embeddings mask the likely errors in the sequence in a soft way. Finally, the correction network, a BERT model whose final layer consists of a softmax over all characters, takes the sequence of soft-masked embeddings as input and outputs the probabilities of error corrections. There is also a residual connection between the input embeddings and the embeddings at the final layer.
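The soft-masking step admits a compact formulation: with error probability p_i for character i, the soft-masked embedding is e_i' = p_i * e_mask + (1 - p_i) * e_i. Below is a minimal PyTorch sketch of the detection network and this connection; module and variable names are illustrative, not the paper's or any repository's actual code.

import torch
import torch.nn as nn

class SoftMaskingDetector(nn.Module):
    """Bi-GRU detection network plus the soft-masking connection (illustrative sketch)."""

    def __init__(self, embed_dim: int, hidden_dim: int):
        super().__init__()
        self.gru = nn.GRU(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * hidden_dim, 1)

    def forward(self, input_embeddings: torch.Tensor, mask_embedding: torch.Tensor):
        # input_embeddings: (batch, seq_len, embed_dim); mask_embedding: (embed_dim,)
        hidden, _ = self.gru(input_embeddings)
        p = torch.sigmoid(self.classifier(hidden))  # (batch, seq_len, 1) error probabilities
        # Soft masking: e_i' = p_i * e_mask + (1 - p_i) * e_i
        soft_masked = p * mask_embedding + (1.0 - p) * input_embeddings
        return soft_masked, p

The soft-masked embeddings are then fed to the BERT-based correction network, and the residual connection adds the input embeddings back to BERT's final-layer output before the softmax.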

Demo

Unlike the original Soft-Masked BERT paper, which runs the model on a Chinese dataset, here we modify the code a bit and apply it to an English dataset.

  • Dataset

The data for this project consists of 20 popular books from Project Gutenberg.
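For illustration, a book's plain text can be fetched from Project Gutenberg by its ebook ID. This is a hypothetical sketch, not necessarily how data_prepare.py obtains the texts:

import requests

def fetch_gutenberg_book(book_id: int) -> str:
    """Download the plain-text edition of a Project Gutenberg ebook (illustrative helper)."""
    url = f"https://www.gutenberg.org/files/{book_id}/{book_id}-0.txt"
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    return response.text

# Ebook 1342 is "Pride and Prejudice", a perennially popular Gutenberg title.
text = fetch_gutenberg_book(1342)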

  • Prerequisite packages

pip install -r requirements.txt
  • Parameters

The length of each sentence in the corpus is between 4 and 200, so we bound the sentence lengths used for training (a length-filtering sketch follows this list):

max_len = 32
min_len = 2
  • Code

You can find the code on GitHub.
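As noted above, here is a minimal sketch of how such length bounds are typically applied when filtering sentences; the helper name is hypothetical and need not match the repository's data_process.py:

def filter_by_length(sentences, min_len=2, max_len=32):
    """Keep only sentences whose token count lies within [min_len, max_len]."""
    return [s for s in sentences
            if min_len <= len(s.split()) <= max_len]

# Example: one-token and over-long sentences are dropped before training.
kept = filter_by_length(["the quick brown fox", "hi", "a short sentence"])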

How to run?

  • Prepare Data (one possible error-injection scheme is sketched after this list):

python data_prepare.py
  • Process Data:

python data_process.py

  • Train Models:

python train.py

  • Test Models:

python test.py
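For intuition about the data preparation step, spelling-correction training pairs are commonly built by injecting artificial typos into clean sentences. The sketch below shows one simple character-substitution scheme; it is hypothetical and not necessarily what data_prepare.py implements:

import random
import string

def corrupt(sentence: str, error_rate: float = 0.1) -> str:
    """Randomly replace letters to simulate spelling errors (illustrative helper)."""
    chars = list(sentence)
    for i, c in enumerate(chars):
        if c.isalpha() and random.random() < error_rate:
            chars[i] = random.choice(string.ascii_lowercase)
    return "".join(chars)

# A (noisy, clean) pair serves as (input, target) for detection and correction.
pair = (corrupt("the quick brown fox"), "the quick brown fox")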

Sep 6, 2020
