Algorithm

Dynamic Programming

Levenshtein Distance / Edit Distance


[Up] Japanese English

Sometimes you want to determine how similar a string s is to another string t. This is represented by the concept of Edit Distance, also called Levenshtein Distance. The smaller the Edit Distance, the more "closer" or "similar".

The Edit Distance is the cost to change the string s to the string t by adding the following 3 operations. Costs are ususally set for each of the 3 types of operations.

To calculate the Edit Distance, we use Dynamic Programming. In order to find the Edit Distance between strings of length $n$ and length $m$, each element of the 2-dimensional matrix $(n+1) \times (m+1)$ is calculated, so the computational complexty is $O(mn)$.


[Points to Understand]

Carefully check the state of the string in each cell in "Table 1: State of the string being edited". If you understand that the state of the string consists of three elements:

you can easily understand the algorithm.


Example of Edit Distance Calculation

Consider an edit operation to minimize the cost of generating the string "ara" from the string "saka". We want to reuse the reusable characters in the string "saka" while consuming other characters to generate the string "ara" as efficiently as possible.

To solve with Dynamic Programming, we use a table with the original (to be consumed) string in the vertical direction and the target (to be generated) string in the horizontal direction. Leave a blank space at the beginning of each string.


The state of the string being edited corresponding to each cell in this table is as follows.

Table 1: State of the string being edited.

ara
|sakaa|sakaar|sakaara|saka
s|akaa|akaar|akaara|aka
a|kaa|kaar|kaara|ka
k|aa|aar|aara|a
a|a|ar|ara|

If you just want to find the minimum cost, one table is fine. If you need to record the steps of your editing operations, prepare another table.

In this example, we assume the cost of each operation as follows. Of course you can set these values however you like.

Table 2: Costs of each Edit Operation

OperationSimbolConstExplanation
INSERT$I$$1$Insert a character to the left of the cursor.
DELETE$D$$1$Delete a character to the right of the cursor.
SUBSTITUTE$S$$1$Substitute a character to the right of the cursor, and move the cursor position one character to the right.
MATCH$M$$0$Move the cursor position one character to the right.

In the following, the $L$ row and $R$ column elements of the table are expressed as $(L,R)$.

Of the two tables, the left table records the minimum costs and the right table records the last operation performed.
Enter cost $0$ at ($0$, $0$) in the left table, and enter '$-$' character to the ($0$, $0$) cell in the right table, because no operation has been performed yet.
celloperationString being edited
(0,0)the initial state|saka
Fill in the leftmost column ($L$, $0$) of the left table while adding the DELETE cost. Since the operation is "DELETE", enter '$D$' in ($L$, $0$) cells of the right table.
celloperationString being edited
(0,0)initial state|saka
(1,0)DELETE 's'|aka
(2,0)DELETE 'a'|ka
(3,0)DELETE 'k'|a
(4,0)DELETE 'a'|
Enter the cost of INSERT in the column ($0$, $R$) of the top row of the left table while adding the INSERT cost. Since the operation is INSERT, enter '$I$' in ($0$, $R$) in the right table.
celloperationstring being edited
(0,0)initial state|saka
(0,1)INSERT 'a'a|saka
(0,2)INSERT 'r'ar|saka
(0,3)INSERT 'a'ara|saka
$L=1$, $R=1$.
The cost to be enterd in ($L$, $R$) in the left table is one of the following.
  • SUBSTITUTE from the upper left cell ($L-1$, $R-1$)
  • MATCH (= Cursor Move) from the upper left cell ($L-1$,$R-1$) $\quad$ Only when the characters are equal.
  • DELETE from the upper cell ($L-1$,$R$)
  • INSERT from the left cell ($L$, $R-1$)
Calculate the minimum cost and enter it in the ($L$, $R$) cell of left table, and enter the selected operation in the ($L$, $R$) cell of the right table.
Let's consider the minimum cost to reach the state ($1$, $1$) in the left table. Since the character 's' at the cursor position is not equal to the character 'a' to be generated, "MATCH (= Cursor Move)" can not be used. Therefore, we consider 3 kinds of operations to arrive at ($1$, $1$):
  • SUBSTITUTE 's' with 'a' from state ($0$, $0$), the cost is $0+1=1$.
  • DELETE 's' from state ($0$, $1$) where 'a' was previously INSERTed, the cost is $1+1=2$.
  • INSERT 'a' from state ($1$, $0$) where 's' was previously DELETEd, the cost is $1+1=2$.
Among them, "REPLACE 's' with 'a'" has the lowest cost of 1, so 1 in the left table ($1$, $1$) and '$S$' in the right table ($1$, $1$).
celloperationString being edited
(0,0)initial state|saka
(1,1)SUBSTITUTE 's' with 'a'a|aka

Regardless of the operation order, the state of ($1$, $1$) is the state where "the leading 's' is disappeared from the string to be consumed" and "the leading 'a' has been generated from the string to be generated". The minimum cost to reach that state is $1$.

$L=2$, $R=1$.
The cost to be enterd in ($L$, $R$) in the left table is one of the following.
  • SUBSTITUTE from the upper left cell ($L-1$, $R-1$)
  • MATCH (= Cursor Move) from the upper left cell ($L-1$,$R-1$) $\quad$ Only when the characters are equal.
  • DELETE from the upper cell ($L-1$,$R$)
  • INSERT from the left cell ($L$, $R-1$)
Calculate the minimum cost and enter it in the ($L$, $R$) cell of left table, and enter the selected operation in the ($L$, $R$) cell of the right table.
In ($2$, $1$) of the left table, the character to "be consumed" and the character to "be generated" are both 'a', so MATCH (= Cursor Move) can be used. There are other ways to "DELETE 'a'" from ($1$, $1$) and "INSERT 'a'" from ($2$, $0$), but "MATCH (= Cursor Move)" from ($1$, $0$) is the minimum cost. Enter $1+0=1$ in the left table ($2$, $1$) and '$M$' in the right table ($2$, $1$).
celloperationString being edited
(1,0)DELETE 's'|aka
(2,1)MATCH (= Cursor Move) 'a'a|ka

The state of ($2$, $1$) is "First, DELETE the leading 's' from the original string, then MOVE CURSOR through 'a' for the character of the orignal and the target is the same. The minimum cost of the operations is $1 \mbox{(DELETE cost)} + 0 \mbox{(MATCH cost)} = 1$.

In ($3$, $1$) of the left table, the character 'k' to "be consumed" is different from the character 'a' to "be generated", so MATCH (= Cursor Move) cannot be used. The cost of "SUBSTITUTE from ($2$, $0$)" is 3, the cost of "INSERT from ($3$, $0$)" is 4, and the cost of "DELETE from ($2$, $1$)" is 2.
celloperationString being edited
(1,0)DELETE 's'|aka
(2,1)MATCH (= Cursor Move) 'a'a|ka
(3,1)DELETE 'k'a|a

All the cell is filled.

If multiple symbols are written together in the right table, it means that either operation has the lowest cost. For example, the symbol "SI" means "either SUBSTITUTE or INSERT is OK".

You can see that the minimum cost is $2$ by looking at the bottom right cell of the left table ($4$, $3$).

You can see how to get to the minimum cost by going backwards starting from the bottom right cell of the right table.
  • In case of '$S$' or '$M$', go to the upper left cell.
  • In case of '$I$', go to the left cell.
  • In case of '$D$', goto the upper cell.

The operation sequence with the lowest cost is as follows.

cell/th>operationString being edited
(0,0)initial state|saka
(1,0)DELETE 's'|aka
(2,1)MATCH 'a'a|ka
(3,2)SUBSTITUTE 'k' with'r'ar|a
(4,3)'MATCH a'ara|

Program of Edit Distance (python, Cost only)

EditDistance.py
import sys

COST_DEL = 1
COST_INS = 1
COST_SUBST = 1
COST_MATCH = 0

COST = []

def solve(s,t,flag = False):
  global COST
  sl = len(s)
  tl = len(t)
  COST = [[0 for i in range(tl+1)] for j in range(sl+1)]
  COST[0][0] = 0
  for i in range(1,sl+1):
    COST[i][0] = COST[i-1][0] + COST_DEL
  for i in range(1,tl+1):
    COST[0][i] = COST[0][i-1] + COST_INS
  for i in range(1,sl+1):
    for j in range(1,tl+1):
      c_d = COST[i][j-1] + COST_DEL
      c_i = COST[i-1][j] + COST_INS
      c_s = COST[i-1][j-1] + COST_SUBST
      if s[i-1]==t[j-1]: c_m = COST[i-1][j-1] + COST_MATCH
      else: c_m = sys.maxsize
      COST[i][j] = min(c_d, c_i, c_s, c_m)
  return COST[sl][tl]
Log of EditDistance.py
$ python  
Python 3.9.13 (main, Aug 25 2022, 23:26:10) 
[GCC 11.2.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import EditDistance  
>>> EditDistance.solve('saka','ara')  
2
>>> print(EditDistance.COST)  
[[0, 1, 2, 3], [1, 1, 2, 3], [2, 1, 2, 2], [3, 2, 2, 3], [4, 3, 3, 2]]
>>> exit()  

Program of Edit Distance (python, Cost and Operations)

EditDistancePath.py
import sys

COST_DEL = 1
COST_INS = 1
COST_SUBST = 1
COST_MATCH = 0

COST = []
OP = []

def solve(s,t):
  global COST, OP
  sl = len(s)
  tl = len(t)
  COST = [[0 for i in range(tl+1)] for j in range(sl+1)]
  OP = [['' for i in range(tl+1)] for j in range(sl+1)]
  COST[0][0] = 0
  for i in range(1,sl+1):
    COST[i][0] = COST[i-1][0] + COST_DEL
    OP[i][0] += 'D'
  for i in range(1,tl+1):
    COST[0][i] = COST[0][i-1] + COST_INS
    OP[0][i] += 'I'
  for i in range(1,sl+1):
    for j in range(1,tl+1):
      c_d = COST[i][j-1] + COST_DEL
      c_i = COST[i-1][j] + COST_INS
      c_s = COST[i-1][j-1] + COST_SUBST
      if s[i-1]==t[j-1]: c_m = COST[i-1][j-1] + COST_MATCH
      else: c_m = sys.maxsize
      COST[i][j] = min(c_d, c_i, c_s, c_m)
      if (c_d == COST[i][j]): OP[i][j] += 'D'
      if (c_i == COST[i][j]): OP[i][j] += 'I'
      if (c_s == COST[i][j]): OP[i][j] += 'S'
      if (c_m == COST[i][j]): OP[i][j] += 'M'
  return COST[sl][tl], getPath(s,t)

def getPath(s,t):
  global OP
  r = []
  row = len(s)
  col = len(t)
  while row > 0 or col > 0:
    s_ch = s[row-1]
    t_ch = t[col-1]
    if  'M' in OP[row][col]:
      r.append('MATCH '+s_ch)
      row -= 1
      col -= 1
    elif 'S' in OP[row][col]:
      r.append('SUBSTITUTE '+s_ch+' with '+t_ch)
      row -= 1
      col -= 1
    elif 'I' in OP[row][col]:
      r.append('INSERT '+t_ch)
      col -= 1
    elif 'D' in OP[row][col]:
      r.append('DELETE '+s_ch)
      row -= 1
  r.reverse()
  return r
Log of EditDistancePath.py
$ python  
Python 3.9.13 (main, Aug 25 2022, 23:26:10) 
[GCC 11.2.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import EditDistancePath  
>>> EditDistancePath.solve('saka','ara')  
(2, ['DELETE s', 'MATCH a', 'SUBSTITUTE k with r', 'MATCH a'])
>>> print(EditDistancePath.COST)  
[[0, 1, 2, 3], [1, 1, 2, 3], [2, 1, 2, 2], [3, 2, 2, 3], [4, 3, 3, 2]]
>>> print(EditDistancePath.OP)  
[['', 'I', 'I', 'I'], ['D', 'S', 'DS', 'DS'], ['D', 'M', 'DS', 'M'], ['D', 'I', 'S', 'DIS'], ['D', 'IM', 'IS', 'M']]
>>> exit()

Program of Edit Distance (java)

Consider a program that calculates the edit distance between two strings.

In the following program, the array cost is "the minimum cost to reach each state" and the array parent records "the most recent operation to reach that state with the minimum cost". Since each type of operation is represented by a different bit, even multiple operations can be recorded at the same time.

For example, if the operation is "$SI$" (= SUBSTITUTE or INSERT), the operations are recorded as $2+4=6$.

For simplicity, only one operation sequence of the minimal cost is shown, even if there are multiple.

EditDistance.java
import java.util.*;
public class EditDistance {
    int COST_M = 0, COST_S = 1, COST_D = 1, COST_I = 1;
    public static final int OP_M = 1, OP_S = 2, OP_I = 4, OP_D = 8;
    String s,t;
    int[][] cost, op;
    public EditDistance(String s,String t) { this.s = s; this.t = t; }
    public void setSubstituteCost(int COST_S) { this.COST_S = COST_S; }
    public void setInsertCost(int COST_I) { this.COST_I = COST_I; }
    public void setDeleteCost(int COST_D) { this.COST_D = COST_D; }
    public int solve() {
	cost = new int[s.length()+1][t.length()+1];
	cost[0][0] = 0;
	op = new int[s.length()+1][t.length()+1];
	op[0][0] = 0;
	for (int i=1; i<cost.length; i++) {
	    cost[i][0] = cost[i-1][0] + COST_D;
	    op[i][0] = OP_D;
	}
	for (int i=1; i<cost[0].length; i++) {
	    cost[0][i] = cost[0][i-1] + COST_I;
	    op[0][i] = OP_I;
	}
	for (int si=1; si<cost.length; si++) {
	    for (int ti=1; ti<cost[0].length; ti++) {
		int c_m = (s.charAt(si-1) == t.charAt(ti-1)) ? cost[si-1][ti-1] + COST_M : Integer.MAX_VALUE;
		int c_s = cost[si-1][ti-1] + COST_S;
		int c_i = cost[si][ti-1] + COST_I;
		int c_d = cost[si-1][ti] + COST_D;
		cost[si][ti] = Math.min(c_m,Math.min(c_s,Math.min(c_i,c_d)));
		op[si][ti] = 0;
		if (cost[si][ti] == c_m) op[si][ti] |= OP_M;
		if (cost[si][ti] == c_s) op[si][ti] |= OP_S;
		if (cost[si][ti] == c_i) op[si][ti] |= OP_I;
		if (cost[si][ti] == c_d) op[si][ti] |= OP_D;
	    }
	}
	return cost[s.length()][t.length()];
    }
    public String[] getPath() {
	ArrayList<String> v = new ArrayList<String>();
	int si=cost.length-1;
	int ti=cost[0].length-1;
	while (si > 0 || ti > 0) {
	    if ((op[si][ti] & OP_M) != 0) {
		v.add("matches '"+s.charAt(si-1)+"' and '"+t.charAt(ti-1)+"'");
		si--; ti--;
	    } else if ((op[si][ti] & OP_S) != 0) {
		v.add("substitute '"+s.charAt(si-1)+"' by '"+t.charAt(ti-1)+"'");
		si--; ti--;
	    } else if ((op[si][ti] & OP_I) != 0) {
		v.add("insert '"+t.charAt(ti-1)+"'");
		ti--;
	    } else if ((op[si][ti] & OP_D) != 0) {
		v.add("delete '"+s.charAt(si-1)+"'");
		si--;
	    } else throw new RuntimeException("bad op");
	}
	Collections.reverse(v);
	return v.toArray(new String[0]);
    }
    public String showCostTable() { return toString(cost); }
    public String showOpTable() { return toString(op); }
    public String toString(int[][] x) {
	StringBuilder sb = new StringBuilder(" |   ");
	for (int i=0; i<t.length(); i++) sb.append("  "+t.charAt(i));
	int w = sb.toString().length();
	sb.append("\n");
	for (int i=0; i<w; i++) sb.append((i==1) ? "+" : "-");
	sb.append("\n");
	for (int line=0; line < x.length; line++) {
	    if (line == 0) sb.append(" | ");
	    else sb.append(s.charAt(line-1)+"| ");
	    for (int col=0; col < x[line].length; col++) {
		int v=x[line][col];
		if (v < 10 && v>=0) sb.append(" ");
		sb.append(v + " ");
	    }
	    sb.append("\n");
	}
	return sb.toString();
    }
}
RunEditDistance.java
/*
 * [Usage]
 * $ java RunEditDistance saka ara
 */
import java.util.*;
public class RunEditDistance {
    public static void main(String[] args) {
	Scanner sc = new Scanner(System.in);
	String s,t;
	if (args.length > 0) s=args[0];
	else { System.out.print("s="); s = sc.next();}
	if (args.length > 1) t=args[1];
	else { System.out.print("t="); t = sc.next();}
	
	EditDistance ed = new EditDistance(s,t);
	System.out.println(ed.solve());
	System.out.println(ed.showCostTable());
	System.out.println(ed.showOpTable());
	String[] a = ed.getPath();
	for (int i=0; i<a.length; i++)
	    System.out.println(a[i]);
    }
}
RunEditDistance.javaの実行例
$ javac RunEditDistance.java EditDistance.java
$ java RunEditDistance "saka" "ara"
2
 |     a  r  a
-+------------
 |  0  1  2  3 
s|  1  1  2  3 
a|  2  1  2  2 
k|  3  2  2  3 
a|  4  3  3  2 

 |     a  r  a
-+------------
 |  0  4  4  4 
s|  8  2  6  6 
a|  8  1  6  1 
k|  8  8  2 14 
a|  8  9 10  1 

delete 's'
matches 'a' and 'a'
substitute 'k' by 'r'
matches 'a' and 'a'

$ java RunEditDistance "apple is good" "applet is god"
2
 |     a  p  p  l  e  t     i  s     g  o  d
-+------------------------------------------
 |  0  1  2  3  4  5  6  7  8  9 10 11 12 13 
a|  1  0  1  2  3  4  5  6  7  8  9 10 11 12 
p|  2  1  0  1  2  3  4  5  6  7  8  9 10 11 
p|  3  2  1  0  1  2  3  4  5  6  7  8  9 10 
l|  4  3  2  1  0  1  2  3  4  5  6  7  8  9 
e|  5  4  3  2  1  0  1  2  3  4  5  6  7  8 
 |  6  5  4  3  2  1  1  1  2  3  4  5  6  7 
i|  7  6  5  4  3  2  2  2  1  2  3  4  5  6 
s|  8  7  6  5  4  3  3  3  2  1  2  3  4  5 
 |  9  8  7  6  5  4  4  3  3  2  1  2  3  4 
g| 10  9  8  7  6  5  5  4  4  3  2  1  2  3 
o| 11 10  9  8  7  6  6  5  5  4  3  2  1  2 
o| 12 11 10  9  8  7  7  6  6  5  4  3  2  2 
d| 13 12 11 10  9  8  8  7  7  6  5  4  3  2 

 |     a  p  p  l  e  t     i  s     g  o  d
-+------------------------------------------
 |  0  4  4  4  4  4  4  4  4  4  4  4  4  4 
a|  8  1  4  4  4  4  4  4  4  4  4  4  4  4 
p|  8  8  1  5  4  4  4  4  4  4  4  4  4  4 
p|  8  8  9  1  4  4  4  4  4  4  4  4  4  4 
l|  8  8  8  8  1  4  4  4  4  4  4  4  4  4 
e|  8  8  8  8  8  1  4  4  4  4  4  4  4  4 
 |  8  8  8  8  8  8  2  1  4  4  5  4  4  4 
i|  8  8  8  8  8  8 10 10  1  4  4  4  4  4 
s|  8  8  8  8  8  8 10 10  8  1  4  4  4  4 
 |  8  8  8  8  8  8 10  1  8  8  1  4  4  4 
g|  8  8  8  8  8  8 10  8 10  8  8  1  4  4 
o|  8  8  8  8  8  8 10  8 10  8  8  8  1  4 
o|  8  8  8  8  8  8 10  8 10  8  8  8  9  2 
d|  8  8  8  8  8  8 10  8 10  8  8  8  8  1 

matches 'a' and 'a'
matches 'p' and 'p'
matches 'p' and 'p'
matches 'l' and 'l'
matches 'e' and 'e'
insert 't'
matches ' ' and ' '
matches 'i' and 'i'
matches 's' and 's'
matches ' ' and ' '
matches 'g' and 'g'
delete 'o'
matches 'o' and 'o'
matches 'd' and 'd'

$ java RunEditDistance "thou shalt not" "you should not"
5
 |     y  o  u     s  h  o  u  l  d     n  o  t
-+---------------------------------------------
 |  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 
t|  1  1  2  3  4  5  6  7  8  9 10 11 12 13 13 
h|  2  2  2  3  4  5  5  6  7  8  9 10 11 12 13 
o|  3  3  2  3  4  5  6  5  6  7  8  9 10 11 12 
u|  4  4  3  2  3  4  5  6  5  6  7  8  9 10 11 
 |  5  5  4  3  2  3  4  5  6  6  7  7  8  9 10 
s|  6  6  5  4  3  2  3  4  5  6  7  8  8  9 10 
h|  7  7  6  5  4  3  2  3  4  5  6  7  8  9 10 
a|  8  8  7  6  5  4  3  3  4  5  6  7  8  9 10 
l|  9  9  8  7  6  5  4  4  4  4  5  6  7  8  9 
t| 10 10  9  8  7  6  5  5  5  5  5  6  7  8  8 
 | 11 11 10  9  8  7  6  6  6  6  6  5  6  7  8 
n| 12 12 11 10  9  8  7  7  7  7  7  6  5  6  7 
o| 13 13 12 11 10  9  8  7  8  8  8  7  6  5  6 
t| 14 14 13 12 11 10  9  8  8  9  9  8  7  6  5 

 |     y  o  u     s  h  o  u  l  d     n  o  t
-+---------------------------------------------
 |  0  4  4  4  4  4  4  4  4  4  4  4  4  4  4 
t|  8  2  6  6  6  6  6  6  6  6  6  6  6  6  1 
h|  8 10  2  6  6  6  1  4  4  4  4  4  4  4  4 
o|  8 10  1  6  6  6 14  1  4  4  4  4  4  5  4 
u|  8 10  8  1  4  4  4 12  1  4  4  4  4  4  4 
 |  8 10  8  8  1  4  4  4 12  2  6  1  4  4  4 
s|  8 10  8  8  8  1  4  4  4  4  6 14  2  6  6 
h|  8 10  8  8  8  8  1  4  4  4  4  4  4  6  6 
a|  8 10  8  8  8  8  8  2  6  6  6  6  6  6  6 
l|  8 10  8  8  8  8  8 10  2  1  4  4  4  4  4 
t|  8 10  8  8  8  8  8 10 10 10  2  6  6  6  1 
 |  8 10  8  8  9  8  8 10 10 10 10  1  4  4  4 
n|  8 10  8  8  8  8  8 10 10 10 10  8  1  4  4 
o|  8 10  9  8  8  8  8  1 14 10 10  8  8  1  4 
t|  8 10  8  8  8  8  8  8  2 14 10  8  8  8  1 

delete 't'
substitute 'h' by 'y'
matches 'o' and 'o'
matches 'u' and 'u'
matches ' ' and ' '
matches 's' and 's'
matches 'h' and 'h'
insert 'o'
substitute 'a' by 'u'
matches 'l' and 'l'
substitute 't' by 'd'
matches ' ' and ' '
matches 'n' and 'n'
matches 'o' and 'o'
matches 't' and 't'


Exercise


Exercise a

Submission FileEditDistance.java
Comment: Minimum cost of editing the original string "sushi and wine belong to food" to the target string "sun shines and window blows, good"
Submit to:

Assume that the cost of SUBSTITUTE, INSERT, DELETE and MATCH operations are $1$, $1$, $1$, $0$, respectively. Find the minimum cost to edit from the original string "sushi and wine belong to food" to the target string "sun shines and window blows, good".


Exercise b

NearWord.java
Comment: Note that the input dataset is different for each student. The input dataset is assigned by the result of modulo 3 of the last digit of the student number. Paste the output of the program for each input data. (Paste the 7 execution result).
Group 0Group 1Group 2
abcense
explasion
foorin
irerebant
nowlege
rigitemite
newclear
akcept
exagelat
forine
embaras
misaile
gaard
okacionally
colligion
forhedd
riblaly
haight
hipoksicy
laiing
brouchere
Submit to:

Write a program that points out typos in English and suggests correct words. However, it must meet the following specifications.

The contents of "mwords_74550common.txt" are take from the list of 74550 words including in several dictionaries which is available from Grady Ward's Moby Project http://icon.shef.ac.uk/Moby/mwords.html .

Part of "mwords_74550common.txt"
...(omitted)...
alow
alp
alpaca
alpenglow
alpenhorn
alpenstock
alpestrine
alpha
alpha and omega
alpha decay
...(omitted)...
NearWord.java
import java.util.*;
import java.io.*;

public class NearWord {
    private Vector<String> dic;
    private int minCost;
    public NearWord(String fname) {
	try {
	    Scanner sc = new Scanner(new File(fname));
	    dic = new Vector<String>();
	    while (sc.hasNext())
		dic.add(sc.nextLine());
	} catch (Exception e) {
	    e.printStackTrace();
	    System.exit(-1);
	}
    }
    public int cost() { return minCost; }
    public Vector<String> find(String s) {
	minCost = Integer.MAX_VALUE;

	// [Exercise] Write the appropriate code hear.

    }
    public static void main(String[] args) {
	NearWord nw = new NearWord("mwords_74550common.txt");
	Scanner sc = new Scanner(System.in);
	String s = sc.nextLine();
	Vector<String> words = nw.find(s);
	System.out.println(nw.cost());
	for (String w: words)
	    System.out.println(w);
    }
}
Execution log of NearWord.java
$ javac NearWord.java 
$ java NearWord 
Amanda's applet 
5
Adam's apple
$ java NearWord 
womin 
1
woman
women
$ java NearWord 
thiatar 
2
theater
$ java NearWord 
the Internet 
6
eterne
herb bennet
interne
internee
phenanthrene
tenter
theater
theatre

Yoshihisa Nitta

http://nw.tsuda.ac.jp/