Rash thoughts about .NET, C#, F# and Dynamics NAV.


"Every solution will only lead to new problems."

Thursday, 23. October 2008


Using PLINQ in F# – Parallel Map and Reduce (Fold) functions – part 1

Filed under: .NET 3.0,C#,F# — Steffen Forkmann at 18:25 Uhr

If your wondering how Google computes query results in such a short time you have to read the famous “MapReduce”-Paper by Jeffrey Dean and Sanjay Ghemawat (2004). It shows how one can split large tasks into a mapping and a reduce step which could then be processed in parallel.

With PLINQ (part of the Parallel Extensions to the .NET Framework) you can easily use “MapReduce”-pattern in .NET and especially F#. PLINQ will take care of all the MultiThreading and load balancing stuff. You only have to give PLINQ a map and a reduce (or fold) function.

Lets consider a small example. Someone wants to compute the sum of the factorials of all integers from 1 to 3000. With List.map and List.fold_left this is a very easy task in F#:

#light
open System

let add a b = a + b
let fac (x:bigint) = [1I..x] |> List.fold_left (*) 1I

let sum =
  [1I..3000I]
    |> List.map fac
    |> List.fold_left add 0I

printfn "Sum of Factorials: %A" sum

Of course you could do much much better if you don’t compute every factorial on its own (I will show this in one of the next parts) – but for this time I need an easy function that is time consuming.

This simple Task needs 27 sec. on my Core 2 Duo E6550 with 2.33 GHz and 3.5 GB RAM.

But we can do better if we use parallel map and fold functions with help of PLINQ:

let pMap (mapF:'a -> 'b) (data:IParallelEnumerable<'a>) =
  ParallelEnumerable.Select(data, mapF)

let pFold foldF seed (data:IParallelEnumerable<'a>)=
  ParallelEnumerable.Aggregate<'a,'b>(
    data, seed, new Func<'b,'a,'b>(foldF))

Now we can easily transform our calculation to a parallel version:

let sum =
  [1I..3000I].AsParallel<bigint>()
    |> pMap fac 
    |> pFold add 0I

Putting all together we can write a small test application:

#light 
open System
open System.Linq
open System.Diagnostics

let testRuntime f =
  let watch = new Stopwatch()
  watch.Start()
  (f(),watch.Elapsed)

let add a b = a + b
let fac (x:bigint) = [1I..x] |> List.fold_left (*) 1I

let list = [1I..3000I]

let pMap (mapF:'a -> 'b) (data:IParallelEnumerable<'a>)=
  ParallelEnumerable.Select(data, mapF)

let pFold foldF seed (data:IParallelEnumerable<'a>)=
  ParallelEnumerable.Aggregate<'a,'b>(
    data, seed, new Func<'b,'a,'b>(foldF))

let PLINQ() =
  list.AsParallel<bigint>()
    |> pMap fac
    |> pFold add 0I

let sequential() =
  list
   |> List.map fac
   |> List.fold_left add 0I

let (sumSequential,timeSequential) =
  testRuntime sequential
printfn "Time Normal: %.3fs" timeSequential.TotalSeconds

let (sumPLINQ,timePLINQ) =
  testRuntime PLINQ
printfn "Time PLINQ: %.3fs" timePLINQ.TotalSeconds

timePLINQ.TotalSeconds / timeSequential.TotalSeconds
  |> printfn "Ratio: %.2f"

sumSequential = sumPLINQ
  |> printfn "Same Results: %A"

On my machine I get the following results:

Time Normal: 27.955s

Time PLINQ: 15.505s

Ratio: 0.55

Same Results: true

This means I get nearly a perfect load balancing on my two processors for this task.

In part II I describe how one can compute a series of functions in parallel.

Tags: , , , , , , ,

No Comments »

No comments yet.

RSS feed for comments on this post. | TrackBack URI

Leave a comment

XHTML ( You can use these tags): <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong> .