Longest Common Substring

Algorithm

Given 2 strings abcdxyz and xyzabcd lets say, we want to find the longest common substring between the 2. There are 2 here: abcd and xyz. So we're looking for the former.

The most efficient algorithm given 2 strings n and m has characteristics:

Space: O(n+m)O(n+m)

Complexity: O(nm)O(nm)

And proceeds like so: We keep an in memory matrix where rows, indexed by i2, correspond to letters in s2 :xyzabcd and columns, indexed by i1 correspond to letters in s1: abcdxyz. We begin with i1 = 0, i2 = 0and iterate through both strings in a double for loop style. If s2[i2] == s1[i1] we updated the matrix: m[i2][i1] = m[i2-1][i1-1] + 1 . The resulting m looks like so:

a

b

c

d

x

y

z

x

0

0

0

0

1

0

0

y

0

0

0

0

0

2

0

z

0

0

0

0

0

0

3

a

1

0

0

0

0

0

0

b

0

2

0

0

0

0

0

c

0

0

3

0

0

0

0

d

0

0

0

4

0

0

0

Implementation

def lcs(s1: String, s2: String): String = {
  val memo = Vector.fill(s2.length + 1)(Vector.fill(s1.length + 1)(0))
  val ((maxL, maxI1), memoRes) =
    1.to(s1.length).foldLeft(0 -> 0 -> memo){case (((maxL, maxI1), memo), i1) =>
      1.to(s2.length).foldLeft(maxL -> maxI1 -> memo){case (((maxL, maxI1), memo), i2) =>
        if (s1(i1 - 1) == s2(i2 - 1)) {
          val next = memo(i2 - 1)(i1 - 1) + 1
          val updated = memo.updated(i2, memo(i2).updated(i1, next))
          if (next > maxL)
            (next -> i1 -> updated)
          else
            (maxL -> maxI1 -> updated)
        }
        else
          (maxL -> maxI1 -> memo)
      }
    }
  s1.substring(maxI1 - maxL, maxI1)
}

println(lcs("abcdxyz", "xyzabcd"))

Last updated

Was this helpful?