Rash thoughts about .NET, C#, F# and Dynamics NAV.

"Every solution will only lead to new problems."

Saturday, 1. November 2008

Damerau-Levenshtein-Distance in F# – part II – O(m+n) space

Filed under: BioInformatik,F#,Informatik,Mathematik — Steffen Forkmann at 14:40 Uhr

Last time I showed a naïve implementation of the Damerau-Levenshtein-Distance in F# that needs O(m*n) space. This is really bad if we want to compute the edit distance of large sequences (e.g. DNA sequences). If we look at the algorithm we can easily see that only the last two lines of the (n*m)-matrix are used. This observation leads to a improvement where we compute the distance with only 3 additional arrays of size min(n,m).

/// Calcs the damerau levenshtein distance.    
let calcDL (a:'a array) (b: 'a array) =       
  let n = a.Length + 1
  let m = b.Length + 1
  let lastLine = ref (Array.init m (fun i -> i))
  let lastLastLine = ref (Array.create m 0)
  let actLine = ref (Array.create m 0)
  for i in [1..a.Length] do
    (!actLine).[0] <- i      
    for j in [1..b.Length] do          
      let cost = 
        if a.[i-1] = b.[j-1] then 0 else 1
      let deletion = (!lastLine).[j] + 1
      let insertion = (!actLine).[j-1] + 1
      let substitution = (!lastLine).[j-1] + cost
      (!actLine).[j] <- 
        |> min insertion 
        |> min substitution

      if i > 1 && j > 1 then
        if a.[i-1] = b.[j-2] && a.[i-2] = b.[j-1] then
          let transposition = (!lastLastLine).[j-2] + cost  
          (!actLine).[j] <- min (!actLine).[j] transposition
    // swap lines
    let temp = !lastLastLine
    lastLastLine := !lastLine
    lastLine := !actLine
    actLine := temp

let damerauLevenshtein(a:'a array) (b:'a array) =
  if a.Length > b.Length then
    calcDL a b
    calcDL b a

This version of the algorithm needs only O(n+m) space but is not really "functional" style. I will show a more "F#-stylish" version in part III.

Tags: , , , , , , , ,

Friday, 31. October 2008

Damerau-Levenshtein-Distance in F# – part I

Filed under: BioInformatik,F#,Informatik — Steffen Forkmann at 17:12 Uhr

Today I am publishing an algorithm for calculating the Damerau-Levenshtein distance in F#. The Levenshtein distance is a metric that allows to measure the amount of difference between two sequences and shows how many edit operations (insert, delete, substitution) are needed to transform one sequence into the other. The Damerau-Levenshtein distance allows the transposition of two characters as an operation. It is often used for spelling corrections or to measure the variation (“edit distance”) between DNA sequences.

let damerauLevenshtein(a:'a array) (b:'a array) =       
  let init i j =
    if j = 0 then i
    elif i = 0 then j else 0
  let n = a.Length + 1
  let m = b.Length + 1
  let d = Array2.init n m init
  for i in [1..a.Length] do
    for j in [1..b.Length] do          
      let cost = 
        if a.[i-1] = b.[j-1] then 0 else 1
      let deletion = d.[i-1, j] + 1
      let insertion = d.[i,j-1] + 1
      let substitution = d.[i-1,j-1] + cost
      d.[i, j] <- 
        |> min insertion 
        |> min substitution
      if i > 1 && j > 1 && a.[i-1] = b.[j-2] && 
           a.[i-2] = b.[j-1] then
        let transposition = d.[i-2,j-2] + cost  
        d.[i, j] <- min d.[i,j] transposition  
  d.[a.Length, b.Length]  

This naïve implementation needs quadratic space (O(m*n)). Since the algorithm is used to calculate the edit distance of large DNA sequences this is extremly bad. Next time I will show how we can get linear space (O(m+n)) for the algorithm.

Tags: , , , , , , ,

Friday, 24. October 2008

Using PLINQ in F# – Parallel Map and Reduce (Fold) functions – part 2

Filed under: .NET 3.0,English posts,F#,Informatik,PLINQ — Steffen Forkmann at 18:00 Uhr

Last time I showed how it is possible to use parallel map and fold functions to compute the sum of all factorials between 1 and 3000. The result was a nearly perfect load balancing for this task on a two processor machine. This time I will derive a generic function that computes partial results in parallel and folds them to a final result.

Let’s consider our F# example:

let add a b = a + b  
let fac (x:bigint) = 
  [1I..x] |> List.fold_left (*) 1I
let sequential() =
   |> List.map fac
   |> List.fold_left add 0I

This is the same as:

let calcFactorialSum min max =
   |> List.map fac
   |> List.fold_left add 0I  
let f1() = calcFactorialSum    1I 2000I
let f2() = calcFactorialSum 2001I 2200I
let f3() = calcFactorialSum 2201I 2400I
let f4() = calcFactorialSum 2401I 2600I
let f5() = calcFactorialSum 2601I 2800I
let f6() = calcFactorialSum 2801I 3000I
let sequential2() =
  f1() + f2() + f3() + f4() + f5() + f6()

We spitted the summation into 6 independent tasks and computed the sum of the partial results. This has nearly no bearing on the runtime.

But with the help of PLINQ we can compute each task in parallel:

let asParallel (list: 'a list) = 

let runParallel functions = 
      asParallel functions, (fun f ->  f() ) )
let pFold foldF seed (data:IParallelEnumerable<'a>)=
    data, seed, new Func<'b,'a,'b>(foldF))

let calcFactorialsParallel() =
  [f1; f2; f3; f4; f5; f6]
    |> runParallel
    |> pFold add 0I

This time we build a list of functions (f1, f2, f3, f4, f5, f6) and run them in parallel. "runParallel” gives us back a list of the partial results, which we can fold with the function “add” to get the final result.

On my Core 2 Duo E6550 with 2.33 GHz and 3.5 GB RAM I get the following results:

Time Normal: 26.576s

Time Sequential2: 26.205s (Ratio: 0.99)

Time “Parallel Functions”: 18.426s (Ratio: 0.69)

Time PLINQ: 14.990s (Ratio: 0.56) (Last post)

Same Results: true

We can see that the parallel computation of the functions f1 – f6 is much faster than the sequential.

But why is the PLINQ-version (see last post) still faster? We can easily see that each partial function needs a different runtime (e.g. it’s much harder to calculate the factorials between 2800 and 3000 than between 2000 and 2200). On my machine I get:

Time F1: 8.738s

Time F2: 2.663s

Time F3: 3.119s

Time F4: 3.492s

Time F5: 3.889s

Time F6: 4.442s

The problem is that the Parallel Framework can only guess each runtime amount in advance. So the load balancing for 2 processors will not be optimal in every case. In the original PLINQ-version there are only small tasks, and the difference between each runtime is smaller. So it is easier to compute the load balancing.

But of course we can do better if we split f1 into two functions f7 and f8:

let f7() = calcFactorialSum    1I 1500I
let f8() = calcFactorialSum 1501I 2000I

So we can get a better load balancing:

Time F1: 8.721s

Time F7: 4.753s

Time F8: 4.829s

Time Normal: 26.137s

Time “Parallel Functions”: 16.138s (Ratio: 0.62)

Same Results: true

Tags: , , , , , ,

Thursday, 23. October 2008

Using PLINQ in F# – Parallel Map and Reduce (Fold) functions – part 1

Filed under: .NET 3.0,C#,F# — Steffen Forkmann at 18:25 Uhr

If your wondering how Google computes query results in such a short time you have to read the famous “MapReduce”-Paper by Jeffrey Dean and Sanjay Ghemawat (2004). It shows how one can split large tasks into a mapping and a reduce step which could then be processed in parallel.

With PLINQ (part of the Parallel Extensions to the .NET Framework) you can easily use “MapReduce”-pattern in .NET and especially F#. PLINQ will take care of all the MultiThreading and load balancing stuff. You only have to give PLINQ a map and a reduce (or fold) function.

Lets consider a small example. Someone wants to compute the sum of the factorials of all integers from 1 to 3000. With List.map and List.fold_left this is a very easy task in F#:

open System

let add a b = a + b
let fac (x:bigint) = [1I..x] |> List.fold_left (*) 1I

let sum =
    |> List.map fac
    |> List.fold_left add 0I

printfn "Sum of Factorials: %A" sum

Of course you could do much much better if you don’t compute every factorial on its own (I will show this in one of the next parts) – but for this time I need an easy function that is time consuming.

This simple Task needs 27 sec. on my Core 2 Duo E6550 with 2.33 GHz and 3.5 GB RAM.

But we can do better if we use parallel map and fold functions with help of PLINQ:

let pMap (mapF:'a -> 'b) (data:IParallelEnumerable<'a>) =
  ParallelEnumerable.Select(data, mapF)

let pFold foldF seed (data:IParallelEnumerable<'a>)=
    data, seed, new Func<'b,'a,'b>(foldF))

Now we can easily transform our calculation to a parallel version:

let sum =
    |> pMap fac 
    |> pFold add 0I

Putting all together we can write a small test application:

open System
open System.Linq
open System.Diagnostics

let testRuntime f =
  let watch = new Stopwatch()

let add a b = a + b
let fac (x:bigint) = [1I..x] |> List.fold_left (*) 1I

let list = [1I..3000I]

let pMap (mapF:'a -> 'b) (data:IParallelEnumerable<'a>)=
  ParallelEnumerable.Select(data, mapF)

let pFold foldF seed (data:IParallelEnumerable<'a>)=
    data, seed, new Func<'b,'a,'b>(foldF))

let PLINQ() =
    |> pMap fac
    |> pFold add 0I

let sequential() =
   |> List.map fac
   |> List.fold_left add 0I

let (sumSequential,timeSequential) =
  testRuntime sequential
printfn "Time Normal: %.3fs" timeSequential.TotalSeconds

let (sumPLINQ,timePLINQ) =
  testRuntime PLINQ
printfn "Time PLINQ: %.3fs" timePLINQ.TotalSeconds

timePLINQ.TotalSeconds / timeSequential.TotalSeconds
  |> printfn "Ratio: %.2f"

sumSequential = sumPLINQ
  |> printfn "Same Results: %A"

On my machine I get the following results:

Time Normal: 27.955s

Time PLINQ: 15.505s

Ratio: 0.55

Same Results: true

This means I get nearly a perfect load balancing on my two processors for this task.

In part II I describe how one can compute a series of functions in parallel.

Tags: , , , , , , ,

Thursday, 16. October 2008

Debugging in Dynamics NAV 2009

Filed under: .NET 3.0,C#,Dynamics NAV 2009,msu solutions GmbH,Visual Studio — Steffen Forkmann at 13:41 Uhr

Claus Lundstrøm zeigt in einem schönen Blogpost wie man in NAV2009 den Code auf Seite der ServiceTier (also auch remote) debuggen kann – und zwar über Visual Studio 2008 direkt im generierten C#-Code. Mit dieser Variante ist man nicht mehr gezwungen das Debugging über den Classic-Client zu tun, sondern kann direkt aus dem Dynamics NAV RoleTailored-Client debuggen.

Dummerweise ist der generierte C#-Code, wie das bei generiertem Code eigentlich immer der Fall ist, nicht gerade “optisch schöner” C#-Style und hat auch nur noch wenig mit dem Original-C/AL-Code zu tun – ist aber immerhin lesbar.

Das ist ein wirklich interessanter Ansatz und erlaubt mit etwas Geschick auch UnitTesting für NAV 2009. Dafür werde ich demnächst mal versuchen ein kleines Beispiel zu bloggen.

Tags: , , , , ,

Tuesday, 14. October 2008

Technologie Highlights von Heute und Morgen! – Umfrage auf der BASTA! 2008

Filed under: C#,F#,Veranstaltungen — Steffen Forkmann at 9:02 Uhr

Florian Mätschke hat soeben seine auf der BASTA! 2008 in Mainz angekündigte Umfrage auf seinem Blog veröffentlicht. Dabei wurden die BASTA!-Speaker zu der Technologie befragt, die sie im Moment am meisten fasziniert. “Gewinner” ist übrigens Silverlight 2 geworden, dicht gefolgt von funktionaler Programmierung (in F# bzw. LINQ) – wofür ich mich übrigens auch entschieden habe.

Insgesamt ist das Umfrageergebnis, wie für die BASTA! zu erwarten war, sehr .NET-lastig. Obwohl auf der Abendveranstaltung noch Technologien wie Waschmaschine und Auto als faszinierend erachtet wurden, haben sich die meisten Speaker schlussendlich für ihr Vortragsthema im weitesten Sinne entschieden.

Ich muss sagen, dass ich das Konzept der Umfrage sehr interessant finde. Das Problem ist nur, dass man z.B. auf einer Java-Konferenz natürlich vollkommen konträre Ergebnisse erzielt. Um die wirklichen “Technologie Highlights“ zu ermitteln müsste man die Umfrage selbstverständlich viel größer und anonymisiert anlegen.

Tags: , , , , ,

Sunday, 12. October 2008

F# option types und generische Listen in C# verwenden

Filed under: C#,F# — Steffen Forkmann at 14:08 Uhr

Luis Fallas beschreibt in seinem Blog (“Exploring Beautiful Languages”) an einem sehr schönen Beispiel, wie man die F# option types mit Hilfe von Extension Methods in C# verwenden kann.

Hier ist eine generische Variante zu seiner Exists()-Methode:

open System.Runtime.CompilerServices

module Extensions =
  let Exists(opt : 'a option) =
    match opt with
      | Some _ -> true
      | None –> false

Auf ähnlichem Wege kann man übrigens auch die generischen F#-Listen in System.Collections.Generic.List<T> umwandeln:

let ToCSharpList(list : 'a list) =
  let csharpList = 
    new System.Collections.Generic.List<'a>()
  list |> List.iter (fun item -> csharpList.Add item)

Der umgekehrte Weg (von C# nach F#) ist fast analog, allerdings muss man die Liste drehen:

static class Extensions
  /// <summary>
  /// Converts a System.Collections.Generic.List<T> 
  /// in the corresponding F# list.
  /// </summary>
  public static Microsoft.FSharp.Collections.List<T> 

     ToFSharpList<T>(this List<T> list)
      var fSharpList = 
      for (int i = list.Count - 1; i >= 0; i--)
       fSharpList = 
      return fSharpList;
Tags: , , , , , ,

ParallelFX wird Kernkomponente vom .NET Framework 4.0

Filed under: .NET 3.0,F# — Steffen Forkmann at 11:32 Uhr

Wie das ParallelFX-Team bekannt gegeben hat, werden die “Parallel Extensions für das .NET Framework” nun zur Kernkomponente vom .NET-Framework 4.0 befördert.

“Parallel Extensions will indeed be a part of the .NET Framework 4.0.  Not only will it be a part of it, it will be a core part of it.”

Das bedeutet, dass dann vermutlich eine ganze Reihe an Basisfunktionen schon von Hause aus parallel verarbeitet werden kann. Anwenden kann man die Bibliotheken auch jetzt schon ganz einfach – allerdings immer mit zusätzlichem Installationsaufwand. Aber es lohnt sich wirklich.

Ein weiterer interessanter Aspekt ist übrigens, dass das F#-Team bereits angekündigt hat demnächst seine asynchronen Workflows auf ParallelFX umzustellen.

Weitere Informationen zu ParallelFX:

Tags: , , , ,

Wednesday, 8. October 2008

Folien vom ERP Launch und dem Technical Airlift verfügbar

Filed under: Dynamics NAV 2009,Veranstaltungen — Steffen Forkmann at 9:55 Uhr

Die Folien vieler Vorträge von der diesjährigen Microsoft-ERP-Launchveranstaltung sowie vom Technical Airlift 2008 sind nun zum Download verfügbar. In insgesamt 29 Präsentation und 2 Videos werden die Neuheiten von Dynamics AX 2009 und Dynamics NAV 2009 vorgestellt.

Tags: , , , ,

Sunday, 5. October 2008

Delphi Prism (Delphi.NET) Beta-Tests gestartet

Filed under: Tools — Steffen Forkmann at 10:02 Uhr

Chris Pattinson (CodeGear QA Manager for RAD Studio products) weist in seinem Blog darauf hin, dass eine geschlossene Beta-Phase für Delphi Prism gestartet wurde. Wer also mal die neue Delphi-Umgebung für .NET testen möchte, kann sich aber bei Embarcadero dafür bewerben.


Weitere Details gibt es z.B. bei Robert Wachtel.

Tags: , , ,