If your wondering how Google computes query results in such a short time you have to read the famous “MapReduce”-Paper by Jeffrey Dean and Sanjay Ghemawat (2004). It shows how one can split large tasks into a mapping and a reduce step which could then be processed in parallel.
With PLINQ (part of the Parallel Extensions to the .NET Framework) you can easily use “MapReduce”-pattern in .NET and especially F#. PLINQ will take care of all the MultiThreading and load balancing stuff. You only have to give PLINQ a map and a reduce (or fold) function.
Lets consider a small example. Someone wants to compute the sum of the factorials of all integers from 1 to 3000. With List.map and List.fold_left this is a very easy task in F#:
#light
open System
let add a b = a + b
let fac (x:bigint) = [1I..x] |> List.fold_left (*) 1I
let sum =
[1I..3000I]
|> List.map fac
|> List.fold_left add 0I
printfn "Sum of Factorials: %A" sum
Of course you could do much much better if you don’t compute every factorial on its own (I will show this in one of the next parts) – but for this time I need an easy function that is time consuming.
This simple Task needs 27 sec. on my Core 2 Duo E6550 with 2.33 GHz and 3.5 GB RAM.
But we can do better if we use parallel map and fold functions with help of PLINQ:
let pMap (mapF:'a -> 'b) (data:IParallelEnumerable<'a>) =
ParallelEnumerable.Select(data, mapF)
let pFold foldF seed (data:IParallelEnumerable<'a>)=
ParallelEnumerable.Aggregate<'a,'b>(
data, seed, new Func<'b,'a,'b>(foldF))
Now we can easily transform our calculation to a parallel version:
let sum =
[1I..3000I].AsParallel<bigint>()
|> pMap fac
|> pFold add 0I
Putting all together we can write a small test application:
#light
open System
open System.Linq
open System.Diagnostics
let testRuntime f =
let watch = new Stopwatch()
watch.Start()
(f(),watch.Elapsed)
let add a b = a + b
let fac (x:bigint) = [1I..x] |> List.fold_left (*) 1I
let list = [1I..3000I]
let pMap (mapF:'a -> 'b) (data:IParallelEnumerable<'a>)=
ParallelEnumerable.Select(data, mapF)
let pFold foldF seed (data:IParallelEnumerable<'a>)=
ParallelEnumerable.Aggregate<'a,'b>(
data, seed, new Func<'b,'a,'b>(foldF))
let PLINQ() =
list.AsParallel<bigint>()
|> pMap fac
|> pFold add 0I
let sequential() =
list
|> List.map fac
|> List.fold_left add 0I
let (sumSequential,timeSequential) =
testRuntime sequential
printfn "Time Normal: %.3fs" timeSequential.TotalSeconds
let (sumPLINQ,timePLINQ) =
testRuntime PLINQ
printfn "Time PLINQ: %.3fs" timePLINQ.TotalSeconds
timePLINQ.TotalSeconds / timeSequential.TotalSeconds
|> printfn "Ratio: %.2f"
sumSequential = sumPLINQ
|> printfn "Same Results: %A"
On my machine I get the following results:
Time Normal: 27.955s
Time PLINQ: 15.505s
Ratio: 0.55
Same Results: true
This means I get nearly a perfect load balancing on my two processors for this task.
In part II I describe how one can compute a series of functions in parallel.
Tags: F#, F#, Google, Map and fold, MapReduce, Parallel Computing, Parallel Extensions, PLINQ