[Dynamic Programming] Longest Common Subsequence - Medium

The Longest Common Subsequence (LCS) is a well-known problem in computer science. It is part of dynamic programming. The goal is to find the longest subsequence that two sequences share. This subsequence does not have to be next to each other. We can solve the Longest Common Subsequence problem efficiently by using different dynamic programming methods. These methods include recursive ways, memoization, and tabulation.

In this article, we will explore the dynamic programming method to solve the Longest Common Subsequence problem. We will look at the recursive method, the memoization technique, the tabulation method, and a solution that uses less space. We will also show how to implement the LCS algorithm in Java, Python, and C++. We will check the time complexity of this algorithm and answer some common questions about it.

Dynamic Programming Longest Common Subsequence Problem Overview
Dynamic Programming Longest Common Subsequence Recursive Approach
Dynamic Programming Longest Common Subsequence Memoization Technique
Dynamic Programming Longest Common Subsequence Tabulation Method
Dynamic Programming Longest Common Subsequence Space Optimized Solution
Dynamic Programming Longest Common Subsequence Java Implementation
Dynamic Programming Longest Common Subsequence Python Implementation
Dynamic Programming Longest Common Subsequence C++ Implementation
Dynamic Programming Longest Common Subsequence Time Complexity Analysis
Frequently Asked Questions

If you want to learn more about dynamic programming, we can suggest some related articles. You might find Dynamic Programming Fibonacci Number and Dynamic Programming Unique Paths in a Grid helpful.

Dynamic Programming Longest Common Subsequence Recursive Approach

The Longest Common Subsequence (LCS) problem wants to find the longest subsequence in two sequences. A subsequence is a sequence that shows up in the same order but not always next to each other. We can solve the LCS problem with a recursive approach based on these ideas:

If the last characters of both sequences are the same, the LCS length is 1 plus the LCS length of the rest of the sequences.
If the last characters are different, the LCS length is the bigger value from ignoring the last character of one sequence or the other.

Recursive Function

Here is a simple way to write the recursive function for the LCS problem:

def lcs_recursive(X, Y, m, n):
    if m == 0 or n == 0:
        return 0
    if X[m - 1] == Y[n - 1]:
        return 1 + lcs_recursive(X, Y, m - 1, n - 1)
    else:
        return max(lcs_recursive(X, Y, m, n - 1), lcs_recursive(X, Y, m - 1, n))

Example Usage

X = "AGGTAB"
Y = "GXTXAYB"
m = len(X)
n = len(Y)
print("Length of LCS is:", lcs_recursive(X, Y, m, n))  # Output: Length of LCS is: 4

Properties

The recursive solution has a time complexity of (O(2^{(m, n)})). This makes it slow for larger sequences because the number of recursive calls grows a lot.
It is easy to implement but can repeat calculations for the same problems.

This recursive method helps us understand how to approach LCS. Later, we can make it better using memoization or dynamic programming tabulation. For more information on dynamic programming methods, we can read articles like Dynamic Programming Fibonacci Number and Dynamic Programming Coin Change.

Dynamic Programming Longest Common Subsequence Memoization Technique

We use the memoization technique to make the recursive solution for the Longest Common Subsequence (LCS) problem faster. This method stores the results of expensive function calls. Then we can use these results again when we see the same inputs.

Problem Definition

We have two sequences. Our job is to find the length of the longest subsequence that is in both sequences. A subsequence is made by deleting some elements from another sequence. We do not change the order of what is left.

Recursive Approach with Memoization

Recursive Function: We need to define a recursive function. This function takes the indices of both sequences and compares their characters.
Memoization Table: We use a 2D array to save results of LCS lengths for specific indices. This way, we avoid doing the same calculations again.

Implementation

Here is a Python code to solve the LCS problem using the memoization technique:

def lcs_memoization(X, Y):
    m = len(X)
    n = len(Y)
    memo = [[-1 for _ in range(n + 1)] for _ in range(m + 1)]

    def lcs_helper(i, j):
        if i == 0 or j == 0:
            return 0
        if memo[i][j] != -1:
            return memo[i][j]

        if X[i - 1] == Y[j - 1]:
            memo[i][j] = 1 + lcs_helper(i - 1, j - 1)
        else:
            memo[i][j] = max(lcs_helper(i - 1, j), lcs_helper(i, j - 1))

        return memo[i][j]

    return lcs_helper(m, n)

# Example usage
X = "AGGTAB"
Y = "GXTXAYB"
print("Length of LCS is", lcs_memoization(X, Y))

Explanation of the Code

The function lcs_memoization starts by making a memoization table called memo. This table has size (m+1) x (n+1) where m and n are the lengths of the input strings X and Y.
The helper function lcs_helper does the recursive work. It checks if we have already found the result for indices (i, j).
If the characters match, we add 1 to the result from the previous indices. If they do not match, we take the maximum from two choices: excluding the current character from either string.

Using memoization helps us lower the time needed to solve the problem. It goes from an exponential time to polynomial time, specifically O(m * n). This makes it easier to handle bigger inputs.

For more about dynamic programming, we can read the Dynamic Programming Longest Increasing Subsequence article.

Dynamic Programming Longest Common Subsequence Tabulation Method

We can solve the Longest Common Subsequence (LCS) problem using the tabulation method. This method is an iterative way. It uses a 2D array to keep the lengths of the longest common subsequences for parts of the two sequences. We build the solution from the bottom up. We fill the table based on values we calculated before.

Algorithm

First, we create a 2D array dp of size (m+1) x (n+1). Here, m and n are the lengths of the two strings.
We set the first row and column of dp to zero. The LCS of any string with an empty string is 0.
Next, we go through each character of both strings. If the characters are the same, we set dp[i][j] = dp[i-1][j-1] + 1. If not, we set dp[i][j] = max(dp[i-1][j], dp[i][j-1]).
Finally, the value in dp[m][n] gives us the length of the longest common subsequence.

Code Implementation

Here is how we can write the tabulation method for LCS in Python:

def longest_common_subsequence(str1, str2):
    m, n = len(str1), len(str2)
    dp = [[0] * (n + 1) for _ in range(m + 1)]

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if str1[i - 1] == str2[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    
    return dp[m][n]

# Example usage
str1 = "AGGTAB"
str2 = "GXTXAYB"
print("Length of Longest Common Subsequence is", longest_common_subsequence(str1, str2))

Complexity Analysis

Time Complexity: O(m * n). Here, m and n are the lengths of the two strings.
Space Complexity: O(m * n) for the dp table.

The tabulation method works well to find the LCS. It uses results we got earlier. This makes it a good choice for dynamic programming problems. For more about dynamic programming, we can read about Dynamic Programming Longest Increasing Subsequence or Dynamic Programming Coin Change.

Dynamic Programming Longest Common Subsequence Space Optimized Solution

We can solve the Longest Common Subsequence (LCS) problem using a smart approach that saves space. Instead of using a full 2D array for dynamic programming, we can lower the space needed to O(min(m, n)). Here, m and n are the lengths of the two sequences.

In this method, we use two 1D arrays. These arrays will hold the current and previous rows of the DP table. We will go through the characters of both strings and update the arrays as needed.

Here is a simple code for the space-optimized solution in Python:

def longest_common_subsequence(X, Y):
    m = len(X)
    n = len(Y)
    
    # Create two arrays for storing results of current and previous row
    previous = [0] * (n + 1)
    current = [0] * (n + 1)

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if X[i - 1] == Y[j - 1]:
                current[j] = previous[j - 1] + 1
            else:
                current[j] = max(previous[j], current[j - 1])
        
        # Update previous to current for the next iteration
        previous, current = current, previous

    return previous[n]

# Example usage
X = "AGGTAB"
Y = "GXTXAYB"
print("Length of LCS is", longest_common_subsequence(X, Y))

Key Points:

Space Complexity: We reduce it to O(min(m, n)).
Time Complexity: It stays O(m * n).
Array Usage: We use two 1D arrays to keep results of the last and current steps.

This space-optimized way works well to find the length of the longest common subsequence. It also uses less memory, so it is good for bigger inputs. If we want to learn more about dynamic programming, we can check out topics like the Longest Increasing Subsequence.

Dynamic Programming Longest Common Subsequence Java Implementation

The Longest Common Subsequence (LCS) problem is a well-known problem in dynamic programming. In this section, we will show how to implement the LCS algorithm using Java. We will look at the recursive approach and the memoization technique.

Recursive Approach

The recursive approach checks if the last characters of both sequences are the same. If they are, we add that character to the LCS and check the rest of the characters. If they are not the same, we find the LCS by ignoring one character from either sequence.

public class LCS {
    public static int lcsRecursive(String X, String Y, int m, int n) {
        if (m == 0 || n == 0) {
            return 0;
        }
        if (X.charAt(m - 1) == Y.charAt(n - 1)) {
            return 1 + lcsRecursive(X, Y, m - 1, n - 1);
        } else {
            return Math.max(lcsRecursive(X, Y, m, n - 1), lcsRecursive(X, Y, m - 1, n));
        }
    }
}

Memoization Technique

To make the recursive approach better, we can use memoization. This means we store the results of smaller problems. This helps us avoid doing the same calculations again and makes it much faster.

public class LCS {
    public static int lcsMemoization(String X, String Y, int m, int n, int[][] memo) {
        if (m == 0 || n == 0) {
            return 0;
        }
        if (memo[m][n] != -1) {
            return memo[m][n];
        }
        if (X.charAt(m - 1) == Y.charAt(n - 1)) {
            memo[m][n] = 1 + lcsMemoization(X, Y, m - 1, n - 1, memo);
        } else {
            memo[m][n] = Math.max(lcsMemoization(X, Y, m, n - 1, memo), lcsMemoization(X, Y, m - 1, n, memo));
        }
        return memo[m][n];
    }
}

Tabulation Method

The tabulation method creates a 2D array to store the lengths of LCS for different parts of the sequences. We do this step by step.

public class LCS {
    public static int lcsTabulation(String X, String Y) {
        int m = X.length();
        int n = Y.length();
        int[][] dp = new int[m + 1][n + 1];

        for (int i = 1; i <= m; i++) {
            for (int j = 1; j <= n; j++) {
                if (X.charAt(i - 1) == Y.charAt(j - 1)) {
                    dp[i][j] = dp[i - 1][j - 1] + 1;
                } else {
                    dp[i][j] = Math.max(dp[i - 1][j], dp[i][j - 1]);
                }
            }
        }
        return dp[m][n];
    }
}

Space Optimized Solution

To save space, we can use just one array. This array will hold the current and previous row of the DP table.

public class LCS {
    public static int lcsSpaceOptimized(String X, String Y) {
        int m = X.length();
        int n = Y.length();
        int[] prev = new int[n + 1];
        int[] curr = new int[n + 1];

        for (int i = 1; i <= m; i++) {
            for (int j = 1; j <= n; j++) {
                if (X.charAt(i - 1) == Y.charAt(j - 1)) {
                    curr[j] = prev[j - 1] + 1;
                } else {
                    curr[j] = Math.max(prev[j], curr[j - 1]);
                }
            }
            int[] temp = prev;
            prev = curr;
            curr = temp;
        }
        return prev[n];
    }
}

The methods we showed give good ways to solve the Longest Common Subsequence problem in Java using dynamic programming. For more reading on similar topics, you can check the Dynamic Programming Fibonacci Number or the Dynamic Programming Longest Increasing Subsequence.

Dynamic Programming Longest Common Subsequence Python Implementation

We can solve the Longest Common Subsequence (LCS) problem using dynamic programming in Python. The goal is to find the longest sequence that shows up in the same order in both strings. They do not need to be next to each other.

Python Code Implementation

Here is a simple Python code that shows how to use dynamic programming to solve the LCS problem. It uses a 2D table to save results.

def longest_common_subsequence(s1: str, s2: str) -> int:
    m, n = len(s1), len(s2)
    dp = [[0] * (n + 1) for _ in range(m + 1)]

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if s1[i - 1] == s2[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])

    return dp[m][n]

# Example usage
s1 = "AGGTAB"
s2 = "GXTXAYB"
length = longest_common_subsequence(s1, s2)
print(f"The length of the Longest Common Subsequence is: {length}")

Explanation of the Code

Initialization: We create a 2D list called dp. It has size (m + 1) x (n + 1) and starts with zeros.
Filling the DP Table: We look at both strings and fill the table like this:
- If the characters match (s1[i-1] == s2[j-1]), we take the value from the diagonal cell and add one.
- If they do not match, we take the bigger value from the cell on the left or the cell above.
Final Result: The bottom-right cell of the table gives us the length of the LCS.

This code runs with a time cost of (O(m n)) and space cost of (O(m n)). If we want to make it better, we can keep only two rows.

For more details about dynamic programming methods, we can check Dynamic Programming: Longest Increasing Subsequence.

Dynamic Programming Longest Common Subsequence C++ Implementation

We can solve the Longest Common Subsequence (LCS) problem using dynamic programming. Here is a simple C++ code for the LCS algorithm with a tabulation method.

#include <iostream>
#include <vector>
#include <string>
using namespace std;

int longestCommonSubsequence(string text1, string text2) {
    int m = text1.size();
    int n = text2.size();
    vector<vector<int>> dp(m + 1, vector<int>(n + 1, 0));

    for (int i = 1; i <= m; i++) {
        for (int j = 1; j <= n; j++) {
            if (text1[i - 1] == text2[j - 1]) {
                dp[i][j] = dp[i - 1][j - 1] + 1;
            } else {
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1]);
            }
        }
    }
    return dp[m][n];
}

int main() {
    string text1 = "abcde";
    string text2 = "ace";
    int lcs_length = longestCommonSubsequence(text1, text2);
    cout << "Length of Longest Common Subsequence: " << lcs_length << endl;
    return 0;
}

Explanation of the Code

We have a function called longestCommonSubsequence that takes two strings. It makes a 2D vector called dp. This vector saves the lengths of LCS at different points.

We use a nested loop to go through each character of both strings. If the characters match, we add one to the length of LCS we found so far. If they do not match, we take the bigger value from the nearby cells.

At the end, we find the length of LCS in dp[m][n], and we return this value.

This code runs in O(m*n) time. Here m and n are the lengths of the two strings. It works well for moderate input sizes. To learn more about dynamic programming, you can check the Dynamic Programming Longest Increasing Subsequence article.

Dynamic Programming Longest Common Subsequence Time Complexity Analysis

The time complexity of the Longest Common Subsequence (LCS) problem changes based on the method we use. Here is a simple overview of the different methods and their complexities:

Recursive Approach

Time Complexity: (O(2^{(m, n)}))
The recursive method looks at all possible pairs of characters. This leads to a huge number of calls.

Memoization Technique

Time Complexity: (O(m n))
This method improves the recursive approach. We save results in a 2D array. Here, (m) is the length of the first sequence and (n) is the length of the second sequence. We calculate each state only once.

Tabulation Method

Time Complexity: (O(m n))
The tabulation method is like memoization. It builds a table step by step. The space complexity is also (O(m n)) because of the table storage.

Space Optimized Solution

Time Complexity: (O(m n))
In this method, we use only two arrays to keep current and previous results. This cuts down the space to (O((m, n))). But the time complexity stays the same. We still look at all pairs of subsequences.

Summary of Time Complexity

Recursive: (O(2^{(m, n)}))
Memoization: (O(m n))
Tabulation: (O(m n))
Space Optimized: (O(m n)) time, (O((m, n))) space

We can solve the Longest Common Subsequence problem well using dynamic programming methods. These methods greatly lower the time complexity compared to simple recursive methods. If we want to learn more about dynamic programming, we can check out related topics like Dynamic Programming - Fibonacci Number or Dynamic Programming - Longest Increasing Subsequence.

Frequently Asked Questions

1. What is the Longest Common Subsequence (LCS) problem in dynamic programming?

The Longest Common Subsequence (LCS) problem is a well-known problem in dynamic programming. It is about finding the longest subsequence that is in two sequences. The subsequence keeps the order of characters but does not have to be next to each other. We can use this problem in many ways like comparing files and analyzing DNA sequences. Knowing the LCS problem is important for learning dynamic programming methods.

2. How does the recursive approach work for solving the LCS problem?

The recursive approach for the Longest Common Subsequence (LCS) problem means we make a function that checks characters from both sequences. If the characters are the same, we add to the count and call the function again with the next characters from both sequences. If they are not the same, we call the function again but skip a character from one of the sequences. This way is easy to understand but can be slow because of repeating problems. This makes the time complexity grow fast.

3. What is memoization in the context of LCS?

Memoization is a way to make the recursive solution of the Longest Common Subsequence (LCS) problem faster. We save the results of costly function calls and give back the saved result when we see the same inputs again. This really cuts down the number of times we call the function again and changes the time complexity from exponential to polynomial. This makes the algorithm much quicker.

4. How does the tabulation method improve the LCS solution?

The tabulation method is a bottom-up way to use dynamic programming. It helps to solve the Longest Common Subsequence (LCS) problem by using a table with two dimensions. This table keeps the lengths of LCS for different subsequences. We fill the table step by step by using values we got before. This way, we solve each small problem only one time. Because of this, it gives us a clear and organized way to find the LCS with a time complexity of O(m * n), where m and n are the lengths of the input sequences.

5. What are some applications of the Longest Common Subsequence algorithm?

The Longest Common Subsequence (LCS) algorithm has many uses in computer science and bioinformatics. We often use it in tools that compare files to see what changed in text files. It is also used in version control systems and in analyzing DNA sequences to find similarities in genetic sequences. Also, LCS can help in machine learning for choosing features and in natural language processing tasks. This makes it a useful tool in many areas.

For more about dynamic programming ideas, we can look at other articles on problems like the Fibonacci number, minimum path sum, and longest increasing subsequence. Each of these topics helps us understand dynamic programming methods and how to use them.