BERT (Bidirectional Encoder Representations from Transformers) was published by researchers at Google AI Language. It caused a stir in the Machine Learning community by presenting state-of-the-art results on a wide variety of NLP tasks, including Question Answering, Natural Language Inference, and others. BERT's key technical innovation is applying the bidirectional training of the Transformer, a popular attention model, to language modelling. This is in contrast to previous efforts, which looked at a text sequence either from left to right or as a combination of left-to-right and right-to-left training. The results show that a bidirectionally trained language model can develop a deeper sense of language context and flow than single-direction language models.
BERT makes use of Transformer, an attention mechanism that learns contextual relations between words (or sub-words) in a text. Transformer includes two separate mechanisms — an encoder that reads the text input and a decoder that produces a prediction for the task. As opposed to directional models, which read the text input sequentially (left-to-right or right-to-left), the Transformer encoder reads the entire sequence of words at once. Therefore it is considered bidirectional, though it would be more accurate to say that it’s non-directional. This characteristic allows the model to learn the context of a word based on all of its surroundings (left and right of the word).
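What follows puts that bidirectional context to work on spelling correction. Text is pulled from a scanned image with OCR, each word the OCR engine garbled is replaced with BERT's [MASK] token, and BERT's masked-language-model head fills the slots back in from the words on both sides. A minimal setup sketch, assuming the Hugging Face transformers implementation of BERT and the bert-base-uncased checkpoint (the library and checkpoint name are assumptions, not given in the original):

```python
import torch
from transformers import BertForMaskedLM, BertTokenizer

# Assumed setup: a pretrained uncased BERT with its masked-LM head
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForMaskedLM.from_pretrained('bert-base-uncased')
model.eval()  # inference only, no gradients needed
```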
First, load the scanned snippet to be corrected:

```python
import re

imagename = '1.png'
```
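The OCR step is sketched below, assuming pytesseract and Pillow handle the image-to-text conversion (both dependencies are assumptions here):

```python
from PIL import Image
import pytesseract

# OCR the scanned snippet; errors such as 'con@gmer' below come from this step
text = pytesseract.image_to_string(Image.open(imagename))
print(text)
```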
Output:
```
national economy gained momentum in recent weeks as con@gmer spending
```
Next comes text cleanup: every word the OCR engine visibly damaged is replaced with [MASK], as sketched below.
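A minimal sketch of the cleanup, assuming a word counts as an OCR error when it contains characters outside plain letters and punctuation (both the detection rule and the punctuation handling are assumptions):

```python
# text cleanup: separate punctuation so word-level masking works cleanly
text = text.replace('\n', ' ').replace(',', ' ,').replace('.', ' .')

incorrect_words = []   # the garbled originals, kept for the matching step later
words = text.split()
for i, w in enumerate(words):
    # assumption: non-alphabetic characters inside a word (e.g. '@') mark an OCR error
    if re.search(r'[^a-zA-Z,.]', w):
        incorrect_words.append(w)
        words[i] = '[MASK]'
text = ' '.join(words)
print(text)
```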
Output:
```
national economy gained momentum in recent weeks as [MASK] spending Strengthened , manufacturing activity [MASK] to rise , and producers scheduled more investment in plant and equipment .
```
With the damaged words masked, tokenize the text and run it through BERT, as sketched below.
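A sketch of the tokenize-and-predict step, reusing the tokenizer and model loaded earlier; wrapping the text in [CLS]/[SEP] and recording the mask positions follows the standard masked-LM recipe:

```python
# Tokenize text and record where the [MASK] tokens ended up
tokenized_text = tokenizer.tokenize(f'[CLS] {text} [SEP]')
MASKIDS = [i for i, tok in enumerate(tokenized_text) if tok == '[MASK]']

ids = torch.tensor([tokenizer.convert_tokens_to_ids(tokenized_text)])
with torch.no_grad():
    predictions = model(ids)[0]   # logits, shape (1, seq_len, vocab_size)
```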
The last step maps BERT's predictions back onto the masked slots and rebuilds the sentence; the `predict_word` helper and its call are sketched below.
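A reconstruction of `predict_word`, keeping the signature and call shown in the post; the body, the top-50 cutoff, and the difflib-based "closest spelling to the garbled original" heuristic are assumptions about how candidates are matched back:

```python
import difflib

def predict_word(text_original, predictions, MASKIDS):
    # For each masked slot, take BERT's top candidate tokens and keep the one
    # whose spelling is closest to the word OCR originally produced there
    words = text_original.split()
    slots = [i for i, w in enumerate(words) if w == '[MASK]']
    for n, (slot, mask_id) in enumerate(zip(slots, MASKIDS)):
        top_ids = torch.topk(predictions[0, mask_id], 50).indices.tolist()
        candidates = tokenizer.convert_ids_to_tokens(top_ids)
        # spelling similarity breaks ties between equally fluent candidates
        words[slot] = max(candidates,
                          key=lambda c: difflib.SequenceMatcher(
                              None, incorrect_words[n], c).ratio())
    return ' '.join(words)

text_refined = predict_word(text, predictions, MASKIDS)
print(text_refined)
```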
Output:
```
national economy gained momentum in recent weeks as consumer spending Strengthened , manufacturing activity continued to rise , and producers scheduled more investment in plant and equipment .
```
BERT, Spelling-Correction — Jul 18, 2020