Longest Common Substring
Algorithm
Given 2 strings abcdxyz
and xyzabcd
lets say, we want to find the longest common substring between the 2. There are 2 here: abcd
and xyz
. So we're looking for the former.
The most efficient algorithm given 2 strings n
and m
has characteristics:
Space:
Complexity:
And proceeds like so: We keep an in memory matrix where rows, indexed by i2,
correspond to letters in s2
:xyzabcd
and columns, indexed by i1
correspond to letters in s1
: abcdxyz
. We begin with i1 = 0
, i2 = 0
and iterate through both strings in a double for loop style. If s2[i2] == s1[i1]
we updated the matrix: m[i2][i1] = m[i2-1][i1-1] + 1
. The resulting m
looks like so:
a
b
c
d
x
y
z
x
0
0
0
0
1
0
0
y
0
0
0
0
0
2
0
z
0
0
0
0
0
0
3
a
1
0
0
0
0
0
0
b
0
2
0
0
0
0
0
c
0
0
3
0
0
0
0
d
0
0
0
4
0
0
0
Implementation
def lcs(s1: String, s2: String): String = {
val memo = Vector.fill(s2.length + 1)(Vector.fill(s1.length + 1)(0))
val ((maxL, maxI1), memoRes) =
1.to(s1.length).foldLeft(0 -> 0 -> memo){case (((maxL, maxI1), memo), i1) =>
1.to(s2.length).foldLeft(maxL -> maxI1 -> memo){case (((maxL, maxI1), memo), i2) =>
if (s1(i1 - 1) == s2(i2 - 1)) {
val next = memo(i2 - 1)(i1 - 1) + 1
val updated = memo.updated(i2, memo(i2).updated(i1, next))
if (next > maxL)
(next -> i1 -> updated)
else
(maxL -> maxI1 -> updated)
}
else
(maxL -> maxI1 -> memo)
}
}
s1.substring(maxI1 - maxL, maxI1)
}
println(lcs("abcdxyz", "xyzabcd"))