Rash thoughts about .NET, C#, F# and Dynamics NAV.


"Every solution will only lead to new problems."

Friday, 31. October 2008


Damerau-Levenshtein-Distance in F# – part I

Filed under: BioInformatik,F#,Informatik — Steffen Forkmann at 17:12 Uhr

Today I am publishing an algorithm for calculating the Damerau-Levenshtein distance in F#. The Levenshtein distance is a metric that allows to measure the amount of difference between two sequences and shows how many edit operations (insert, delete, substitution) are needed to transform one sequence into the other. The Damerau-Levenshtein distance allows the transposition of two characters as an operation. It is often used for spelling corrections or to measure the variation (“edit distance”) between DNA sequences.

let damerauLevenshtein(a:'a array) (b:'a array) =       
  let init i j =
    if j = 0 then i
    elif i = 0 then j else 0
  let n = a.Length + 1
  let m = b.Length + 1
 
  let d = Array2.init n m init
 
  for i in [1..a.Length] do
    for j in [1..b.Length] do          
      let cost = 
        if a.[i-1] = b.[j-1] then 0 else 1
      let deletion = d.[i-1, j] + 1
      let insertion = d.[i,j-1] + 1
      let substitution = d.[i-1,j-1] + cost
      d.[i, j] <- 
        deletion 
        |> min insertion 
        |> min substitution
 
      if i > 1 && j > 1 && a.[i-1] = b.[j-2] && 
           a.[i-2] = b.[j-1] then
        let transposition = d.[i-2,j-2] + cost  
        d.[i, j] <- min d.[i,j] transposition  
 
  d.[a.Length, b.Length]  

This naïve implementation needs quadratic space (O(m*n)). Since the algorithm is used to calculate the edit distance of large DNA sequences this is extremly bad. Next time I will show how we can get linear space (O(m+n)) for the algorithm.

Tags: , , , , , , ,

No Comments »

No comments yet.

RSS feed for comments on this post. | TrackBack URI

Leave a comment

XHTML ( You can use these tags): <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong> .