Problem Statement
Given two strings, calculate the minimum number of edits required to transform the first string into the second. The allowed operations are:
- Insertion of a character
- Deletion of a character
- Substitution of one character for another
Key Concepts
- Dynamic Programming: This method breaks down the problem into smaller subproblems and builds up solutions using previously computed results.
- State Representation: We can use a 2D array
dp
wheredp[i][j]
represents the minimum edit distance between the firsti
characters of stringX
and the firstj
characters of stringY
.
Dynamic Programming Approach
- Step 1: Initialize a 2D array
dp
of size(m+1) x (n+1)
, wherem
is the length of stringX
andn
is the length of stringY
. - Step 2: Fill the first row and the first column to represent the cost of converting between an empty string and a substring of the other string.
- Step 3: Iterate through each character of both strings to fill the
dp
table based on the minimum operations required.
C Program for Minimum Edit Distance
Here’s a complete C program to compute the Minimum Edit Distance:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
// Function to calculate the minimum edit distance
int minEditDistance(char* X, char* Y) {
int m = strlen(X);
int n = strlen(Y);
// Create a DP table
int** dp = (int**)malloc((m + 1) * sizeof(int*));
for (int i = 0; i <= m; i++) {
dp[i] = (int*)malloc((n + 1) * sizeof(int));
}
// Initialize the first row and column
for (int i = 0; i <= m; i++) {
dp[i][0] = i; // Deletion cost
}
for (int j = 0; j <= n; j++) {
dp[0][j] = j; // Insertion cost
}
// Fill the DP table
for (int i = 1; i <= m; i++) {
for (int j = 1; j <= n; j++) {
if (X[i - 1] == Y[j - 1]) {
dp[i][j] = dp[i - 1][j - 1]; // No operation needed
} else {
dp[i][j] = 1 + (dp[i - 1][j - 1] < dp[i][j - 1] ?
(dp[i - 1][j - 1] < dp[i][j - 1] ?
dp[i - 1][j - 1] : dp[i][j - 1]) :
(dp[i][j - 1] < dp[i - 1][j] ?
dp[i][j - 1] : dp[i - 1][j]));
}
}
}
// The minimum edit distance is in dp[m][n]
int distance = dp[m][n];
// Free the allocated memory
for (int i = 0; i <= m; i++) {
free(dp[i]);
}
free(dp);
return distance;
}
// Main function
int main() {
char X[100], Y[100];
// Input the two strings
printf("Enter first string: ");
fgets(X, sizeof(X), stdin);
X[strcspn(X, "\n")] = '\0'; // Remove newline character
printf("Enter second string: ");
fgets(Y, sizeof(Y), stdin);
Y[strcspn(Y, "\n")] = '\0'; // Remove newline character
// Calculate the minimum edit distance
int distance = minEditDistance(X, Y);
printf("Minimum Edit Distance (Levenshtein Distance): %d\n", distance);
return 0;
}
Explanation of the Code
- Dynamic Programming Table:
- A 2D array
dp
is created, wheredp[i][j]
holds the minimum edit distance between the firsti
characters of stringX
and the firstj
characters of stringY
.
- A 2D array
- Initialization:
- The first row represents the cost of converting from an empty string to substrings of
Y
, which is equal to the length of those substrings (insertion cost). - The first column represents the cost of converting from substrings of
X
to an empty string, which is equal to the length of those substrings (deletion cost).
- The first row represents the cost of converting from an empty string to substrings of
- Filling the DP Table:
- We iterate through both strings, checking if the characters match. If they do, no additional cost is incurred. If they don’t, we compute the minimum cost of the three possible operations (insertion, deletion, substitution).
- Memory Management:
- The program dynamically allocates memory for the
dp
array and frees it after use to prevent memory leaks.
- The program dynamically allocates memory for the
Input and Output Example
Input
cCopy codeEnter first string:kitten
Enter second string:sitting
Output
javaCopy codeMinimum Edit Distance (Levenshtein Distance): 3
Explanation of the Output
In this case, the minimum edit distance of 3
can be achieved by:
- Substituting ‘k’ with ‘s’
- Substituting ‘e’ with ‘i’
- Inserting ‘g’ at the end.
Conclusion
This C program efficiently computes the Minimum Edit Distance (Levenshtein Distance) between two strings using dynamic programming. It allows user input for flexibility and outputs the minimum number of edits required to convert one string into another. You can test various pairs of strings to see how the edit distance changes, making this approach useful in various applications, including spell checking, DNA sequence analysis, and more.