Last time I showed how it is possible to use parallel map and fold functions to compute the sum of all factorials between 1 and 3000. The result was a nearly perfect load balancing for this task on a two processor machine. This time I will derive a generic function that computes partial results in parallel and folds them to a final result.
Let’s consider our F# example:
let add a b = a + b
let fac (x:bigint) =
[1I..x] |> List.fold_left (*) 1I
let sequential() =
[1I..3000I]
|> List.map fac
|> List.fold_left add 0I
This is the same as:
let calcFactorialSum min max =
[min..max]
|> List.map fac
|> List.fold_left add 0I
let f1() = calcFactorialSum 1I 2000I
let f2() = calcFactorialSum 2001I 2200I
let f3() = calcFactorialSum 2201I 2400I
let f4() = calcFactorialSum 2401I 2600I
let f5() = calcFactorialSum 2601I 2800I
let f6() = calcFactorialSum 2801I 3000I
let sequential2() =
f1() + f2() + f3() + f4() + f5() + f6()
We spitted the summation into 6 independent tasks and computed the sum of the partial results. This has nearly no bearing on the runtime.
But with the help of PLINQ we can compute each task in parallel:
let asParallel (list: 'a list) =
list.AsParallel<'a>()
let runParallel functions =
ParallelEnumerable.Select(
asParallel functions, (fun f -> f() ) )
let pFold foldF seed (data:IParallelEnumerable<'a>)=
ParallelEnumerable.Aggregate<'a,'b>(
data, seed, new Func<'b,'a,'b>(foldF))
let calcFactorialsParallel() =
[f1; f2; f3; f4; f5; f6]
|> runParallel
|> pFold add 0I
This time we build a list of functions (f1, f2, f3, f4, f5, f6) and run them in parallel. "runParallel” gives us back a list of the partial results, which we can fold with the function “add” to get the final result.
On my Core 2 Duo E6550 with 2.33 GHz and 3.5 GB RAM I get the following results:
Time Normal: 26.576s
Time Sequential2: 26.205s (Ratio: 0.99)
Time “Parallel Functions”: 18.426s (Ratio: 0.69)
Time PLINQ: 14.990s (Ratio: 0.56) (Last post)
Same Results: true
We can see that the parallel computation of the functions f1 – f6 is much faster than the sequential.
But why is the PLINQ-version (see last post) still faster? We can easily see that each partial function needs a different runtime (e.g. it’s much harder to calculate the factorials between 2800 and 3000 than between 2000 and 2200). On my machine I get:
Time F1: 8.738s
Time F2: 2.663s
Time F3: 3.119s
Time F4: 3.492s
Time F5: 3.889s
Time F6: 4.442s
The problem is that the Parallel Framework can only guess each runtime amount in advance. So the load balancing for 2 processors will not be optimal in every case. In the original PLINQ-version there are only small tasks, and the difference between each runtime is smaller. So it is easier to compute the load balancing.
But of course we can do better if we split f1 into two functions f7 and f8:
let f7() = calcFactorialSum 1I 1500I
let f8() = calcFactorialSum 1501I 2000I
So we can get a better load balancing:
Tags: F#, Google, Map and fold, MapReduce, Parallel Computing, Parallel Extensions, PLINQTime F1: 8.721s
Time F7: 4.753s
Time F8: 4.829s
Time Normal: 26.137s
Time “Parallel Functions”: 16.138s (Ratio: 0.62)
Same Results: true