Key Concepts of Longest Common Subsequence
- Subsequence: A subsequence is a sequence derived from another sequence by deleting some elements without changing the order of the remaining elements.
- Dynamic Programming: The LCS problem can be efficiently solved using a dynamic programming approach. We create a 2D array where each cell represents the length of the LCS of substrings of the two given strings.
Dynamic Programming Approach
- Step 1: Create a 2D array
dp
wheredp[i][j]
will hold the length of LCS of the firsti
characters of stringX
and the firstj
characters of stringY
. - Step 2: Fill this array based on the following rules:
- If the characters match (
X[i-1] == Y[j-1]
), then: dp[i][j]=dp[i−1][j−1]+1dp[i][j] = dp[i-1][j-1] + 1dp[i][j]=dp[i−1][j−1]+1 - If they don’t match, then: dp[i][j]=max(dp[i−1][j],dp[i][j−1])dp[i][j] = \max(dp[i-1][j], dp[i][j-1])dp[i][j]=max(dp[i−1][j],dp[i][j−1])
- If the characters match (
- Step 3: The value in
dp[m][n]
(wherem
andn
are the lengths of the strings) will be the length of the LCS.
C Program to Find Longest Common Subsequence
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
// Function to find the length of Longest Common Subsequence
int longestCommonSubsequence(char* X, char* Y) {
int m = strlen(X);
int n = strlen(Y);
// Create a 2D array to store lengths of longest common subsequence
int** dp = (int**)malloc((m + 1) * sizeof(int*));
for (int i = 0; i <= m; i++) {
dp[i] = (int*)malloc((n + 1) * sizeof(int));
}
// Build the dp array
for (int i = 0; i <= m; i++) {
for (int j = 0; j <= n; j++) {
if (i == 0 || j == 0) {
dp[i][j] = 0; // Base case
} else if (X[i - 1] == Y[j - 1]) {
dp[i][j] = dp[i - 1][j - 1] + 1; // Characters match
} else {
dp[i][j] = (dp[i - 1][j] > dp[i][j - 1]) ? dp[i - 1][j] : dp[i][j - 1]; // Characters don't match
}
}
}
// Length of LCS is in dp[m][n]
int lcsLength = dp[m][n];
// Free the allocated memory for dp
for (int i = 0; i <= m; i++) {
free(dp[i]);
}
free(dp);
return lcsLength;
}
// Function to print the LCS itself
void printLCS(char* X, char* Y) {
int m = strlen(X);
int n = strlen(Y);
// Create a 2D array to store lengths of longest common subsequence
int** dp = (int**)malloc((m + 1) * sizeof(int*));
for (int i = 0; i <= m; i++) {
dp[i] = (int*)malloc((n + 1) * sizeof(int));
}
// Build the dp array
for (int i = 0; i <= m; i++) {
for (int j = 0; j <= n; j++) {
if (i == 0 || j == 0) {
dp[i][j] = 0; // Base case
} else if (X[i - 1] == Y[j - 1]) {
dp[i][j] = dp[i - 1][j - 1] + 1; // Characters match
} else {
dp[i][j] = (dp[i - 1][j] > dp[i][j - 1]) ? dp[i - 1][j] : dp[i][j - 1]; // Characters don't match
}
}
}
// Now, reconstruct the LCS from the dp array
int index = dp[m][n];
char* lcs = (char*)malloc((index + 1) * sizeof(char));
lcs[index] = '\0'; // Null-terminate the string
int i = m, j = n;
while (i > 0 && j > 0) {
if (X[i - 1] == Y[j - 1]) {
lcs[index - 1] = X[i - 1]; // Store this character in lcs
i--;
j--;
index--;
} else if (dp[i - 1][j] > dp[i][j - 1]) {
i--; // Move up in the dp array
} else {
j--; // Move left in the dp array
}
}
printf("Longest Common Subsequence: %s\n", lcs);
// Free the allocated memory
for (int i = 0; i <= m; i++) {
free(dp[i]);
}
free(dp);
free(lcs);
}
// Main function
int main() {
char X[100], Y[100];
// Input the two strings
printf("Enter first string: ");
fgets(X, sizeof(X), stdin);
X[strcspn(X, "\n")] = '\0'; // Remove newline character
printf("Enter second string: ");
fgets(Y, sizeof(Y), stdin);
Y[strcspn(Y, "\n")] = '\0'; // Remove newline character
// Find the length of LCS
int lcsLength = longestCommonSubsequence(X, Y);
printf("Length of Longest Common Subsequence: %d\n", lcsLength);
// Print the LCS itself
printLCS(X, Y);
return 0;
}
Explanation of the Code
- Dynamic Programming Table:
- The program uses a 2D array
dp
wheredp[i][j]
stores the length of the LCS of the firsti
characters of stringX
and the firstj
characters of stringY
.
- The program uses a 2D array
- Building the DP Array:
- The nested loops populate the
dp
array based on whether characters match or not. The base case is when one of the strings is empty.
- The nested loops populate the
- Reconstructing the LCS:
- After filling the
dp
table, the program reconstructs the LCS by tracing back through thedp
array.
- After filling the
- Memory Management:
- The program frees the allocated memory for the
dp
array and the LCS string to prevent memory leaks.
- The program frees the allocated memory for the
Input and Output Example
Input
cCopy codeEnter first string: AGGTAB
Enter second string: GXTXAYB
Output
mathematicaCopy codeLength of Longest Common Subsequence: 4
Longest Common Subsequence: GTAB
Conclusion
This C program efficiently computes the Longest Common Subsequence of two strings using dynamic programming. It allows for user input and displays both the length and the actual LCS. You can test different pairs of strings to see how the LCS changes. This approach is suitable for various applications requiring sequence analysis.