Stephen Cheng

Levenshtein Distance

 

Intro

Levenshtein distance is a string metric for measuring the difference between two sequences. Informally, the Levenshtein distance between two words is the minimum number of single-character edits (insertions, deletions or substitutions) required to change one word into the other. It is named after Vladimir Levenshtein, who considered this distance in 1965.
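To make the informal definition concrete, here is a deliberately naive sketch. It is not from the original post, and `edit_distance_bfs` and `_neighbours` are hypothetical names. It searches breadth-first through every string reachable by single-character edits until it first meets the target. The level at which the target first appears is, by construction, the minimum number of edits. The search space grows exponentially, so this is only a way to check the definition on very short strings. The type hints assume Python 3.9+.

```python
from collections import deque
from typing import Iterator


def _neighbours(s: str, alphabet: list[str]) -> Iterator[str]:
    """Yield every string exactly one single-character edit away from s."""
    for i in range(len(s) + 1):
        for c in alphabet:
            yield s[:i] + c + s[i:]                # insert c before position i
        if i < len(s):
            yield s[:i] + s[i + 1:]                # delete s[i]
            for c in alphabet:
                if c != s[i]:
                    yield s[:i] + c + s[i + 1:]    # substitute c for s[i]


def edit_distance_bfs(a: str, b: str) -> int:
    """Brute-force Levenshtein distance: breadth-first search from a, where
    each step is one insertion, deletion or substitution."""
    if a == b:
        return 0
    # Only characters of b ever need to be inserted or substituted in.
    alphabet = sorted(set(b))
    seen = {a}
    frontier = deque([(a, 0)])
    while frontier:
        s, dist = frontier.popleft()
        for t in _neighbours(s, alphabet):
            if t == b:
                return dist + 1          # first time b is reached = fewest edits
            if t not in seen:
                seen.add(t)
                frontier.append((t, dist + 1))
    raise RuntimeError("unreachable: b can always be built from a")


print(edit_distance_bfs("kitten", "sitting"))
```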

Levenshtein distance may also be referred to as edit distance, although that term may also denote a larger family of distance metrics.

Definition

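Mathematically, the Levenshtein distance between two strings $a$ and $b$ (of length $|a|$ and $|b|$ respectively) is given by $\operatorname{lev}_{a,b}(|a|,|b|)$, where

$$
\operatorname{lev}_{a,b}(i,j) =
\begin{cases}
\max(i,j) & \text{if } \min(i,j) = 0, \\
\min\begin{cases}
\operatorname{lev}_{a,b}(i-1,j) + 1 \\
\operatorname{lev}_{a,b}(i,j-1) + 1 \\
\operatorname{lev}_{a,b}(i-1,j-1) + 1_{(a_i \neq b_j)}
\end{cases} & \text{otherwise.}
\end{cases}
$$

Here $1_{(a_i \neq b_j)}$ is the indicator function, equal to 0 when $a_i = b_j$ and 1 otherwise, and $\operatorname{lev}_{a,b}(i,j)$ is the distance between the first $i$ characters of $a$ and the first $j$ characters of $b$. The three terms in the minimum correspond to a deletion, an insertion, and a match or substitution, respectively.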
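Read literally, the recursive definition says that lev(i, j) is max(i, j) once either prefix is empty. Otherwise, it is the minimum of three values: lev(i-1, j) + 1, lev(i, j-1) + 1, and lev(i-1, j-1) plus 1 if the i-th character of a differs from the j-th character of b (plus 0 if they match). The sketch below translates that recurrence directly into Python. The names `lev` and `go` are illustrative. Memoizing with `functools.lru_cache` is our choice, not the post's. It makes sure each (i, j) subproblem is solved only once, so the running time is O(|a|·|b|) rather than exponential.

```python
from functools import lru_cache


def lev(a: str, b: str) -> int:
    """Levenshtein distance computed straight from the recursive definition."""

    @lru_cache(maxsize=None)
    def go(i: int, j: int) -> int:
        # go(i, j) = distance between the first i chars of a and the first j of b
        if min(i, j) == 0:
            return max(i, j)
        # Indicator 1_(a_i != b_j); the math is 1-indexed, Python strings 0-indexed
        cost = 0 if a[i - 1] == b[j - 1] else 1
        return min(go(i - 1, j) + 1,          # delete a_i
                   go(i, j - 1) + 1,          # insert b_j
                   go(i - 1, j - 1) + cost)   # match or substitute

    return go(len(a), len(b))


print(lev("kitten", "sitting"))
```

This version recurses up to |a| + |b| levels deep, so it will hit Python's default recursion limit (about 1000) on long strings. The iterative table in the Code section avoids that problem.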

Example

The Levenshtein distance between “kitten” and “sitting” is 3, since the following three edits change one into the other, and there is no way to do it with fewer than three edits:

1) kitten → sitten (substitution of “s” for “k”)
2) sitten → sittin (substitution of “i” for “e”)
3) sittin → sitting (insertion of “g” at the end)
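The dynamic-programming table behind the definition can also recover the edits themselves, not just their count. The sketch below fills the table and then walks back from the bottom-right cell. The function `edit_script` and its message format are illustrative assumptions, and the type hints assume Python 3.9+. On "kitten" and "sitting", it reproduces the three edits listed above.

```python
def edit_script(a: str, b: str) -> list[str]:
    """Return one minimal list of edits that turns a into b."""
    m, n = len(a), len(b)
    # dp[i][j] = distance between the first i chars of a and the first j of b
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1, dp[i][j - 1] + 1,
                           dp[i - 1][j - 1] + cost)

    # Backtrace: at each cell, follow a move that explains its value,
    # preferring the diagonal (match/substitution), then deletion, then insertion.
    ops = []
    i, j = m, n
    while i > 0 or j > 0:
        cost = 1 if i > 0 and j > 0 and a[i - 1] != b[j - 1] else 0
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + cost:
            if cost:
                ops.append(f"substitute {b[j - 1]!r} for {a[i - 1]!r}")
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            ops.append(f"delete {a[i - 1]!r}")
            i -= 1
        else:
            ops.append(f"insert {b[j - 1]!r}")
            j -= 1
    ops.reverse()  # edits were collected from the end of the strings backwards
    return ops


for op in edit_script("kitten", "sitting"):
    print(op)
```

When several minimal edit sequences exist, the one returned depends on the tie-breaking order in the backtrace. The number of edits always equals the distance.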

Code

Python

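# -*- coding: utf-8 -*-
"""
Levenshtein distance for measuring string difference
Created on Apr. 6th, 2017
@author: Stephen
"""
import numpy as np


class levenshtein_distance:
    def le_dis(self, input_x, input_y):
        """Return the Levenshtein distance between input_x and input_y."""
        xlen = len(input_x) + 1
        ylen = len(input_y) + 1
        # dp[i][j] = distance between the first i chars of input_x
        # and the first j chars of input_y
        dp = np.zeros(shape=(xlen, ylen), dtype=int)
        # Turning a prefix into the empty string takes one deletion per char
        for i in range(0, xlen):
            dp[i][0] = i
        # Building a prefix from the empty string takes one insertion per char
        for j in range(0, ylen):
            dp[0][j] = j
        for i in range(1, xlen):
            for j in range(1, ylen):
                if input_x[i - 1] == input_y[j - 1]:
                    # Characters match: no extra edit needed
                    dp[i][j] = dp[i - 1][j - 1]
                else:
                    # One more than the cheapest of deletion, insertion, substitution
                    dp[i][j] = 1 + min(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1])
        # Return a plain Python int rather than a NumPy integer
        return int(dp[xlen - 1][ylen - 1])


if __name__ == '__main__':
    ld = levenshtein_distance()
    print(ld.le_dis('abcd', 'abd'))  # prints 1
    print(ld.le_dis('ace', 'abcd'))  # prints 2
    print(ld.le_dis('hello world', 'hey word'))  # prints 4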
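The NumPy version above stores the full (len(x) + 1) × (len(y) + 1) table, but each row depends only on the row directly above it. The sketch below shows the common two-row variant in plain Python. The name `levenshtein_two_rows` is hypothetical, and this variant also drops the NumPy dependency. It applies the same recurrence as `le_dis` but keeps only O(min(len(x), len(y))) memory.

```python
def levenshtein_two_rows(x: str, y: str) -> int:
    """Levenshtein distance keeping only the previous and current DP rows."""
    if len(x) < len(y):
        x, y = y, x                  # distance is symmetric; keep rows short
    prev = list(range(len(y) + 1))   # row for the empty prefix of x
    for i, cx in enumerate(x, start=1):
        curr = [i] + [0] * len(y)    # first column: i deletions
        for j, cy in enumerate(y, start=1):
            if cx == cy:
                curr[j] = prev[j - 1]
            else:
                curr[j] = 1 + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[-1]


print(levenshtein_two_rows('hello world', 'hey word'))
```

The trade-off is that, without the full table, you can no longer trace back which edits were made. Keep the full table when you need the edit sequence rather than just the distance.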

Apr 18, 2018
