# Rash thoughts about .NET, C#, F# and Dynamics NAV.

## Saturday, 1. November 2008

### Damerau-Levenshtein-Distance in F# – part III – O(m+n) space and functional style

Filed under: BioInformatik,F#,Informatik,Mathematik — Steffen Forkmann at 16:31 Uhr

In the first part of this series I showed a naïve algorithm for the Damerau-Levenshtein distance which needs O(m*n) space. In the last post I improved the algorithm to use only O(m+n) space. This time I will show a more functional implementation which uses only immutable F#-Lists and works still in O(m+n) space. This version doesn’t need any mutable data.

```/// Calcs the damerau levenshtein distance.
let calcDL (a:'a array) (b: 'a array) =
let n = a.Length + 1
let m = b.Length + 1

let processCell i j act l1 l2 ll1 =
let cost =
if a.[i-1] = b.[j-1] then 0 else 1
let deletion = l2 + 1
let insertion = act + 1
let substitution = l1 + cost
let min1 =
deletion
|> min insertion
|> min substitution

if i > 1 && j > 1 &&
a.[i-1] = b.[j-2] && a.[i-2] = b.[j-1] then
min min1 <| ll1 + cost
else
min1

let processLine i lastL lastLastL =
let processNext (actL,lastL,lastLastL) j =
match actL with
| act::actRest ->
match lastL with
| l1::l2::lastRest ->
if i > 1 && j > 1 then
match lastLastL with
| ll1::lastLastRest ->
(processCell i j act l1 l2 ll1 :: actL,
l2::lastRest,
lastLastRest)
| _ -> failwith "can't be"
else
(processCell i j act l1 l2 0 :: actL,
l2::lastRest,
lastLastL)
| _ -> failwith "can't be"
| [] -> failwith "can't be"

let (act,last,lastLast) =
[1..b.Length]
|> List.fold_left processNext ([i],lastL,lastLastL)
act |> List.rev

let (lastLine,lastLastLine) =
[1..a.Length]
|> List.fold_left
(fun (lastL,lastLastL) i ->
(processLine i lastL lastLastL,lastL))
([0..m-1],[])

lastLine.[b.Length]

let damerauLevenshtein(a:'a array) (b:'a array) =
if a.Length > b.Length then
calcDL a b
else
calcDL b a```

I admit the code is still a little messy but it works fine. Maybe I will find the time to cleanup a bit and post a final version.

Tags: , , , , , , , ,

### Damerau-Levenshtein-Distance in F# – part II – O(m+n) space

Filed under: BioInformatik,F#,Informatik,Mathematik — Steffen Forkmann at 14:40 Uhr

Last time I showed a naïve implementation of the Damerau-Levenshtein-Distance in F# that needs O(m*n) space. This is really bad if we want to compute the edit distance of large sequences (e.g. DNA sequences). If we look at the algorithm we can easily see that only the last two lines of the (n*m)-matrix are used. This observation leads to a improvement where we compute the distance with only 3 additional arrays of size min(n,m).

```/// Calcs the damerau levenshtein distance.
let calcDL (a:'a array) (b: 'a array) =
let n = a.Length + 1
let m = b.Length + 1
let lastLine = ref (Array.init m (fun i -> i))
let lastLastLine = ref (Array.create m 0)
let actLine = ref (Array.create m 0)

for i in [1..a.Length] do
(!actLine).[0] <- i
for j in [1..b.Length] do
let cost =
if a.[i-1] = b.[j-1] then 0 else 1
let deletion = (!lastLine).[j] + 1
let insertion = (!actLine).[j-1] + 1
let substitution = (!lastLine).[j-1] + cost
(!actLine).[j] <-
deletion
|> min insertion
|> min substitution

if i > 1 && j > 1 then
if a.[i-1] = b.[j-2] && a.[i-2] = b.[j-1] then
let transposition = (!lastLastLine).[j-2] + cost
(!actLine).[j] <- min (!actLine).[j] transposition

// swap lines
let temp = !lastLastLine
lastLastLine := !lastLine
lastLine := !actLine
actLine := temp

(!lastLine).[b.Length]

let damerauLevenshtein(a:'a array) (b:'a array) =
if a.Length > b.Length then
calcDL a b
else
calcDL b a```

This version of the algorithm needs only O(n+m) space but is not really "functional" style. I will show a more "F#-stylish" version in part III.

Tags: , , , , , , , ,