Longest common sub­se­quences (LCS) are a typical problem in IT . Ap­proach­es to solve the LCS problem often appear in in­ter­views for pro­gram­mers and algorithm courses.

What is the longest common sub­se­quence?

The aim is to find the longest common sub­se­quence in two sequences. This is where a sub­se­quence is derived from a sequence. The sub­se­quence has the same elemental order, in some instances when elements are removed. Let’s take a look at some examples to get a better idea of the principle:

Sequence X Sequence Y LCS(X, Y)
FATHER MATTER ATER
MOTHER HERBAL HER
DAVID DANIEL DAI
Note

There is an important dif­fer­ence between the longest common sub­se­quence and the longest common substring being that a substring may not have any gaps. In the example of “DAVID” and “DANIEL” the longest common substring would be “DA” since “I” is separated by “V” and “N”.

Are there any practical examples of the LCS problem?

The longest common sub­se­quence problem is an important issue in all areas in which sequences which stem from each other are used. There are certain ways to find LCS to see whether there are sim­i­lar­i­ties or dif­fer­ences and, in turn, discover any pla­gia­rism. The well-known “diff” utility which checks for changes to source text files is based on the LCS problem.

In bioin­for­mat­ics the longest common sub­se­quence problem is often used when analyzing DNA sequences. DNA bases change in certain positions over time due to mutations. The presence of a long common sub­se­quence in two sequences would suggest a genetic re­la­tion­ship. This can allow us to retrace evolution between species thanks to genetic de­vel­op­ment.

Linguists can also use the longest common sub­se­quence problem to research lin­guis­tic de­vel­op­ment over centuries. If you have two words from different languages which have the same meaning and a long common sub­se­quence, it suggests that they have the same root and etymology. This would be con­sid­ered, then, a similar his­tor­i­cal de­vel­op­ment.

What are the solutions to the longest common sub­se­quence problem?

First of all, you can solve the LCS problem with a naïve approach. This is a simple approach without using any op­ti­miza­tions or special methods. To do so you simply compute all sub­se­quences from two sequences and find the longest common sub­se­quence. Un­for­tu­nate­ly, this approach isn’t very efficient and is, therefore, only suitable for short sequences.

Below you will find three efficient solutions to the LCS problem:

  1. Recursive approach
  2. Op­ti­miza­tion through mem­o­iza­tion
  3. Dynamic pro­gram­ming

What all ap­proach­es have in common is that in regard to the two sequences, they differ in three cases:

  • The last letter is the same.
  • The last letter is not the same.
  • One of the sequences’ length is zero.

The ap­proach­es each differ in their time com­plex­i­ty (as­ymp­tot­ic run time) and space com­plex­i­ty (memory use):

Approach Run time Memory
Naive approach O(n * n²) O(n)
Recursive approach O(n²) O(1)
Op­ti­miza­tion via mem­o­iza­tion O(n *m) O(n* m)
Dynamic pro­gram­ming O(n *m) O(n* m)
Note

The al­go­rithms set out below all compute the length of the longest common sub­se­quence. There are, if ap­plic­a­ble, multiple set sub­se­quences of this length which can be de­ter­mined using other steps.

Recursive de­ter­mi­na­tion of the longest common sub­se­quence

If you look at the LCS problem, you can see that it has an “optimal sub­struc­ture”. This means that the problem can be reduced to sub­prob­lems. As a solution you can use a recursive approach. Below you can find some examples of al­go­rithms to implement this in three of the most common pro­gram­ming languages.

Python

def lcs(X, Y, m, n):
    # Base case
    if m == 0 or n == 0:
        return 0
    # Last elements are equal
    elif X[m-1] == Y[n-1]:
        return 1 + lcs(X, Y, m-1, n-1)
    # Last elements differ
    else:
        return max(lcs(X, Y, m, n-1), lcs(X, Y, m-1, n))
# Let's test
X, Y = "DANIEL", "DAVID"
lcs_len = lcs(X, Y, len(X), len(Y))
print(f"Length of LCS is: {lcs_len}")
python

Java

import java.io.*;
class LCS {
    public static int lcs(String A, String B, int m, int n)
    {
        // Base case
        if (m == 0 || n == 0)
            return 0;
        // Last elements are equal
        if (A.charAt(m - 1) == B.charAt(n - 1))
            return 1 + lcs(A, B, m - 1, n - 1);
        // Last elements differ
        else
            return Math.max(lcs(A, B, m, n - 1),
             lcs(A, B, m - 1, n));
    }
    
    // Let's test
    public static void main(String[] args)
        {
            String X = "DAVID";
            String Y = "DANIEL";
            int lcsLength = LCS.lcs(X, Y, X.length(), Y.length());
            System.out.println("Length of LCS is: " + lcsLength);
        }
}
java
$1 Domain Names – Register yours today!
  • Simple reg­is­tra­tion
  • Premium TLDs at great prices
  • 24/7 personal con­sul­tant included
  • Free privacy pro­tec­tion for eligible domains

C++

#include <iostream>
using namespace std;
int lcs(string X, string Y, int m, int n)
{
    // Base case
    if (m == 0 || n == 0) {
        return 0;
    }
    // Last elements are equal
    if (X[m - 1] == Y[n - 1]) {
        return 1 + lcs(X, Y, m - 1, n - 1);
    }
    // Last elements differ
    else {
        return max(lcs(X, Y, m, n - 1), lcs(X, Y, m - 1, n));
    }
}
// Let's test
int main()
{
    // Initialize variables
    string X = "DAVID";
    string Y = "DANIEL";
    // Compute and output length of LCS
    cout << "Length of LCS is " << lcs(X, Y, X.size(), Y.size());
    return 0;
}
c++

Op­ti­miz­ing the recursive approach using mem­o­iza­tion

When looking again at the recursive approach, you can see that the over­lap­ping parts are computed. These char­ac­ter­is­tics, called “over­lap­ping sub­prob­lems” are known from Fibonacci sequences. In this case as well recursive sequences are con­stant­ly computed for a solution. To make the process more efficient, it’s worth­while using mem­o­iza­tion. In other words, you can cache the sub­prob­lems that have already been computed in a two-di­men­sion­al matrix.

Python

def lcs(X, Y, m, n, table):
    
    # Base case
    if (m == 0 or n == 0):
        return 0
    # Already computed value at given position
    if (table[m][n] != -1):
        return table[m][n]
    # Last elements are equal
    if X[m - 1] == Y[n - 1]:
        table[m][n] = 1 + lcs(X, Y, m - 1, n - 1, table)
    # Last elements differ
    else:
        table[m][n] = max(lcs(X, Y, m, n - 1, table), lcs(X, Y, m - 1, n, table))
    return table[m][n]
# Let's test
X = "DAVID"
Y = "DANIEL"
m, n = len(X), len(Y)
# Initialize table fields to `-1`
table = [[-1 for i in range(n + 1)] for j in range(m + 1)]
// Compute and output length of LCS
print(f"Length of LCS is: {lcs(X, Y, m, n, table)}")
python

Java

import java.io.*;
class LCS {
    public static int lcs(String X, String Y, int m, int n, int[][] table) {
        // Base case
        if (m == 0 || n == 0) {
            return 0;
        }
        // Already computed value at given position
        if (table[m][n] != -1) {
            return table[m][n];
        }
        // Last elements are equal
        if(X.charAt(m - 1) == Y.charAt(n - 1)) {
            table[m][n] = 1 + lcs(X, Y, m - 1, n - 1, table);
            return table[m][n];
        }
        // Last elements differ
        else {
            table[m][n] = Math.max(lcs(X, Y, m, n - 1, table),lcs(X, Y, m - 1, n, table));
            return table[m][n];
        }
    }
    // Let's test
    public static void main(String args[]){
        // Initialize variables
        String X = "DAVID";
        String Y = "DANIEL";
        int m = X.length();
        int n = Y.length();
        int[][] table = new int[m + 1][n + 1];
        
        // Initialize table fields to `-1`
        for(int i=0; i < m + 1; i++) {
            for(int j=0; j < n + 1; j++) {
                table[i][j] = -1;
            }
        }
        // Compute and output length of LCS
        int lcsLength = lcs(X, Y, m, n, table);
        System.out.println("Length of LCS is: " + lcsLength);
    }
}
java

C++

#include <bits/stdc++.h>
using namespace std;
int lcs(char *X, char* Y, int m, int n, vector<vector<int>>& table)
{
    // Base case
    if (m == 0 || n == 0)
        return 0;
    // Already computed value at given position
    if (table[m][n] != -1) {
        return table[m][n];
    }
    // Last elements are equal
    if (X[m - 1] == Y[n - 1]) {
        table[m][n] = 1 + lcs(X, Y, m - 1, n - 1, table);
        return table[m][n];
    }
    // Last elements differ
    else { 
        table[m][n] = max(lcs(X, Y, m, n - 1, table), lcs(X, Y, m - 1, n, table));
        return table;
    }
}
// Let's test
int main()
{
    // Initialize variables
    char X[] = "DAVID";
    char Y[] = "DANIEL";
    int m = strlen(X);
    int n = strlen(Y);
    // Initialize table with `-1`
    vector<vector<int>> table(m + 1, vector<int>(n + 1, -1));
    
    // Compute and output length of LCS
    cout << "Length of LCS is: " << lcs(X, Y, m, n, table);
    return 0;
}
c++

Dynamic pro­gram­ming for the longest common sub­se­quence

Dynamic pro­gram­ming is a non-recursive way to solve op­ti­miza­tion problems by dividing them into smaller sub­prob­lems and then solving them “bottom up” Among other things, dynamic pro­gram­ming is applied as a solution for pathfind­ing al­go­rithms. The longest common sub­se­quence problem can also be solved using dynamic pro­gram­ming when using a two-di­men­sion­al matrix.

Python

def lcs(X , Y, m, n): 
    
    # Initialize dynamic programming table fields to `None`
    table = [[None] * (n + 1) for _ in range(m + 1)]
    
    # Compute table values
    for i in range(m + 1):
        for j in range(n + 1):
            # Base case
            if i == 0 or j == 0 :
                table[i][j] = 0
            # Last elements are equal
            elif X[i - 1] == Y[j - 1]:
                table[i][j] = table[i - 1][j - 1]+ 1
            # Last elements differ
            else:
                table[i][j] = max(table[i - 1][j] , table[i][j - 1])
    
    return table[m][n]
# Let's test
X = "DAVID"
Y = "DANIEL"
# Compute and output length of LCS
lcs_len = lcs(X, Y, len(X), len(Y))
print(f"Length of LCS is: {lcs_len}")
python

Java

import java.io.*;
class LCS {
    public static int lcs(String X, String Y, int m, int n)
    {
        // Initialize dynamic programming table fields
        int table[][] = new int[m + 1][n + 1];
        for (int i = 0; i <= m; i++) {
            for (int j = 0; j <= n; j++) {
                // Base case
                if (i == 0 || j == 0)
                    table[i][j] = 0;
                // Last elements are equal
                else if (X.charAt(i - 1) == Y.charAt(j - 1))
                    table[i][j] = table[i - 1][j - 1] + 1;
                // Last elements differ
                else
                    table[i][j] = Math.max(table[i - 1][j], table[i][j - 1]);
            }
        }
        return table[m][n];
    }
    // Let's test
    public static void main(String args[]){
        // Initialize variables
        String X = "DAVID";
        String Y = "DANIEL";
        int m = X.length();
        int n = Y.length();
        // Compute and output length of LCS
        int lcsLength = lcs(X, Y, m, n);
        System.out.println("Length of LCS is: " + lcsLength);
    }
}
java

C++

#include <bits/stdc++.h>
using namespace std;
int lcs(string X, string Y, int m, int n) {
	// Initialize dynamic programming table
	int table[m + 1][n + 1];
	// Compute table values
	for (int i = 0; i <= m; i++) {
		for (int j = 0; j <= n; j++) {
			// Base case
			if (i == 0 || j == 0)
				table[i][j] = 0;
			// Last elements are equal
			else if (X[i - 1] == Y[j - 1])
				table[i][j] = table[i - 1][j - 1] + 1;
			// Last elements differ
			else
				table[i][j] = max(table[i - 1][j], table[i][j - 1]);
		}
	}
	return table[m][n];
}
// Let's test
int main() {
  // Initialize variables
  string X = "DAVID";
  string Y = "DANIEL";
  int m = X.size();
  int n = Y.size();
  // Compute and output length of LCS
  cout << "Length of LCS is " << lcs(X, Y, m, n);
  return 0;
}
c++
Go to Main Menu