This is my first post on the **Levenshtein algorithm**: a way to calculate the edit distance between two strings, i.e., the cost to transmute a *string1* into a *string2*. For example, converting "kitten" into "sitting" costs 3 edits (two substitutions and one insertion).

The comments in the codebase are useful for comprehension on their own. The first implementation recurses and builds a *matrix* where the answer (the cost to transmute or convert *str1* into *str2*) lies at *index [m][n]*. In the second implementation, the cost lies at *index [0][0]*.

This is the gateway to elucidating DP through the implementation of this algorithm. We will implement it in ways that classify as DP (both memoization and tabulation) as well as with plain and simple recursion or iteration that has no DP components. We start with vanilla recursion in the following codebase. Please review the comments in the codebase to understand why this is not DP. A future post will elucidate DP.

```java
public class PlainLevenshtein {

    private final char[] str1;
    private final char[] str2;

    public PlainLevenshtein(String s1, String s2) {
        str1 = s1.toCharArray();
        str2 = s2.toCharArray();
    }

    /**
     * my code 1: A top-down approach that can benefit from memoization. It
     * recurses and creates a matrix where the answer lies at [m][n].
     * <p>
     * The goal is to transmute "str1" into "str2".
     * <p>
     * The initial call is {@code cost1(m, n)} where m = size of "str1" <br>
     * and n = size of "str2".
     *
     * @param i the index into "str1" (ranges from m down to 0)
     * @param j the index into "str2" (ranges from n down to 0)
     * @return the final cost
     */
    public int cost1(int i, int j) {
        if (i == 0) {
            return j;
        }
        if (j == 0) {
            return i;
        }
        assert i >= 0;
        assert j >= 0;
        int c = 1;
        if (str1[i - 1] == str2[j - 1]) {
            c = 0;
        }
        int ins = cost1(i, j - 1) + 1;
        int del = cost1(i - 1, j) + 1;
        int sub = cost1(i - 1, j - 1) + c;
        return Math.min(ins, Math.min(del, sub));
    }

    /**
     * my code 2: The difference is that this recurses and builds a matrix and
     * the answer lies at [0][0]. This could also benefit from memoization.
     * <p>
     * However, in order for it to qualify as DP, the key is to solve the
     * smaller sub-problems first and then move on to solve bigger sub-problems
     * with the results of the smaller sub-problems that are already computed
     * and available at that point. Therefore this is not DP either.
     * <p>
     * The initial call is {@code cost2(0, 0)}.
     *
     * @param i the index (the 0th based index from 0 to m - 1)
     * @param j the index (the 0th based index from 0 to n - 1)
     * @return the final cost
     */
    public int cost2(int i, int j) {
        if (i == str1.length) {
            return str2.length - j;
        }
        if (j == str2.length) {
            return str1.length - i;
        }
        assert i >= 0;
        assert j >= 0;
        int c = 1;
        if (str1[i] == str2[j]) {
            c = 0;
        }
        int ins = cost2(i, j + 1) + 1;
        int del = cost2(i + 1, j) + 1;
        int sub = cost2(i + 1, j + 1) + c;
        return Math.min(ins, Math.min(del, sub));
    }
}
```
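To see concretely why this plain recursion is not DP, a small hypothetical instrumentation (the `CallCounter` class and its counting map are my illustration, not part of the post's code) counts how often each (i, j) sub-problem is entered. The same sub-problems are solved over and over, which is exactly the redundancy that memoization or tabulation would remove:

```java
import java.util.HashMap;
import java.util.Map;

public class CallCounter {
    // Counts how many times each (i, j) sub-problem is entered.
    static Map<String, Integer> calls = new HashMap<>();

    // Same recursion as cost1, with a counter added on entry.
    static int cost(char[] s1, char[] s2, int i, int j) {
        calls.merge(i + "," + j, 1, Integer::sum);
        if (i == 0) return j;
        if (j == 0) return i;
        int c = s1[i - 1] == s2[j - 1] ? 0 : 1;
        return Math.min(cost(s1, s2, i, j - 1) + 1,
               Math.min(cost(s1, s2, i - 1, j) + 1,
                        cost(s1, s2, i - 1, j - 1) + c));
    }

    public static void main(String[] args) {
        char[] a = "abc".toCharArray();
        char[] b = "xya".toCharArray();
        cost(a, b, a.length, b.length);
        // Any count greater than 1 is a sub-problem computed more than once;
        // (1, 1) is reached along many distinct recursion paths.
        System.out.println("calls to (1,1): " + calls.get("1,1"));
    }
}
```

Even for these tiny 3-character inputs, sub-problems near the base cases are recomputed many times; the recomputation grows exponentially with string length.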

A snippet of the test code:

```java
@Test
public void testCost1_A() {
    String str1 = "adcklllkl";
    String str2 = "abcjfkerjfgggg";
    int cost1 = new PlainLevenshtein(str1, str2).cost1(str1.length(), str2.length());
    System.out.println("cost 1: " + cost1);
    assertEquals(11, cost1); // expected value first, per the JUnit convention
}

@Test
public void testCost1_B() {
    String str1 = "abc";
    String str2 = "xya";
    int cost1 = new PlainLevenshtein(str1, str2).cost1(str1.length(), str2.length());
    System.out.println("cost 1: " + cost1);
    assertEquals(3, cost1);
}
```
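For a quick sanity check outside a test framework, here is a compact standalone copy of the `cost1` recursion (the `LevenshteinDemo` class name is my own, for illustration) run on the classic "kitten"/"sitting" pair, where the answer for the full strings is `cost(m, n)` as described above:

```java
public class LevenshteinDemo {
    // Standalone copy of the cost1 recursion; the full-string answer
    // is cost(m, n), matching the initial call in the post.
    static int cost(char[] s1, char[] s2, int i, int j) {
        if (i == 0) return j;  // only insertions of s2's first j chars remain
        if (j == 0) return i;  // only deletions of s1's first i chars remain
        int c = s1[i - 1] == s2[j - 1] ? 0 : 1;
        return Math.min(cost(s1, s2, i, j - 1) + 1,        // insert
               Math.min(cost(s1, s2, i - 1, j) + 1,        // delete
                        cost(s1, s2, i - 1, j - 1) + c));  // substitute
    }

    public static void main(String[] args) {
        char[] a = "kitten".toCharArray();
        char[] b = "sitting".toCharArray();
        // k->s, e->i, append g: 3 edits
        System.out.println(cost(a, b, a.length, b.length)); // prints 3
    }
}
```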