Rash thoughts about .NET, C#, F# and Dynamics NAV.

"Every solution will only lead to new problems."

Category PLINQ

Monday, 30. November 2015

F# advent calendar: Using Async.Choice in Paket

Filed under: .NET,C#,Diverses,F#,PLINQ — Steffen Forkmann at 0:50 Uhr

[This post is part of the F# advent calendar 2015 series.]

Prologue: What is Paket?

Paket is a dependency manager for .NET with support for NuGet packages and GitHub repositories. It enables precise and predictable control over what packages the projects within your application reference. If you want to learn how to use Paket then read the “Getting started” tutorial and take a look at the FAQs.

Paket file structure

Async computations in F#

Asynchronous Workflows are a very old F# feature (they already shipped in 2007) and they are used in many places in Paket. In this article I want to highlight one of the nice applications that F#’s async model allows you.

Many readers are probably familiar with C#’s async and await keywords (IIRC released with C# in 2012), but F#’s async feature works a little different. You can read more about some of “Asynchronous gotchas in C#” in Tomas Petricek’s excellent blog post. Most of these gotchas come from the fact that C# is starting all async tasks automatically while F# wraps the logic in data and allows you to run it explicitly. For a really good introduction to asynchronous programming with F# I can recommend Scott Wlaschin’s blog post.

The issue: retrieving version numbers from NuGet feeds

Paket’s package resolution algorithm (if you are interested in algorithms then read more) needs to know which versions are avalaible for a given package. Usually users specify more than one NuGet feed and different NuGet feeds support different protocols. The following code shows how older Paket versions retrieved all version numbers for a given package across all configured NuGet feeds:

As you can see getAllVersions tries 4 different NuGet protocols per source and returns the first result that is not None. Every getVersionsViaProtocol call performs async web request. In GetAllVersions we run this function for every NuGet source in parallel and combine the results. This code is very similar to the “async web downloader” sample in Scott Wlaschin’s async blog post and basically the “hello world” of asynchronous programming. 

Code cleanup

The getAllVersions is deeply nested and it’s hard to understand what’s going on. With the help of List.tryPick we can rewrite the code as:

List.tryPick returns the first result that is not None. So instead of nesting multiple match and let! expressions we use a higher-order-function to encapsulate the same pattern.

Introducing Async.Choice

After using this code for a while in Paket we noticed that different NuGet server implementations each implement a different subset of the 4 protocols and differ very much in the response times. So there was no order of the protocols that would work good for all server implementations. But what if we could query all 4 protocols in parallel and just take the fastest response?

The change is relatively easy and we can rewrite the GetAllVersions as:Instead of running the web requests synchronously and in order we run the four web request per NuGet feed in parallel and take the first response that is not None.

Implementation of Async.Choice

Unfortunately Async.Choice is not part of the standard FSharp.Core library and it turns out it is not easy to implement. There are many different implementations floating around on the internet. The one we use in Paket is by Eirik Tsarpalis and taken from fssnip. One of the advantages of this version is that the automatic cancellation of tasks works nicely. Since we are running lots and lots of web requests in parallel and always take the first response we want to cancel all the other pending web requests. Otherwise we would basically DDOS the feeds with useless requests (and yes that happened to us ;-)). Since Async.Choice is very useful we sent a pull request to the Visual F# project and hope that one day it will be in the box. 


Treating computations as data has even more advantages. The m-brace project created a new version of computation workflows called “cloud”:

cloud is a computation workflow builder and allows you to run your computations in the cloud. In contrast to async we don’t only control when something gets executed but also where. They even have a Cloud.Choice, which allows you to run tasks asynchronous in the cloud and take the first succeeding result.

Btw: async and cloud aren’t language keywords like C#’s async and await, but implemented as normal F# code in libraries. If you want to define your own computation expression builders, then the MSDN docs are a good starting point.

Tags: , , ,

Friday, 24. October 2008

Using PLINQ in F# – Parallel Map and Reduce (Fold) functions – part 2

Filed under: .NET 3.0,English posts,F#,Informatik,PLINQ — Steffen Forkmann at 18:00 Uhr

Last time I showed how it is possible to use parallel map and fold functions to compute the sum of all factorials between 1 and 3000. The result was a nearly perfect load balancing for this task on a two processor machine. This time I will derive a generic function that computes partial results in parallel and folds them to a final result.

Let’s consider our F# example:

let add a b = a + b  
let fac (x:bigint) = 
  [1I..x] |> List.fold_left (*) 1I
let sequential() =
   |> List.map fac
   |> List.fold_left add 0I

This is the same as:

let calcFactorialSum min max =
   |> List.map fac
   |> List.fold_left add 0I  
let f1() = calcFactorialSum    1I 2000I
let f2() = calcFactorialSum 2001I 2200I
let f3() = calcFactorialSum 2201I 2400I
let f4() = calcFactorialSum 2401I 2600I
let f5() = calcFactorialSum 2601I 2800I
let f6() = calcFactorialSum 2801I 3000I
let sequential2() =
  f1() + f2() + f3() + f4() + f5() + f6()

We spitted the summation into 6 independent tasks and computed the sum of the partial results. This has nearly no bearing on the runtime.

But with the help of PLINQ we can compute each task in parallel:

let asParallel (list: 'a list) = 

let runParallel functions = 
      asParallel functions, (fun f ->  f() ) )
let pFold foldF seed (data:IParallelEnumerable<'a>)=
    data, seed, new Func<'b,'a,'b>(foldF))

let calcFactorialsParallel() =
  [f1; f2; f3; f4; f5; f6]
    |> runParallel
    |> pFold add 0I

This time we build a list of functions (f1, f2, f3, f4, f5, f6) and run them in parallel. "runParallel” gives us back a list of the partial results, which we can fold with the function “add” to get the final result.

On my Core 2 Duo E6550 with 2.33 GHz and 3.5 GB RAM I get the following results:

Time Normal: 26.576s

Time Sequential2: 26.205s (Ratio: 0.99)

Time “Parallel Functions”: 18.426s (Ratio: 0.69)

Time PLINQ: 14.990s (Ratio: 0.56) (Last post)

Same Results: true

We can see that the parallel computation of the functions f1 – f6 is much faster than the sequential.

But why is the PLINQ-version (see last post) still faster? We can easily see that each partial function needs a different runtime (e.g. it’s much harder to calculate the factorials between 2800 and 3000 than between 2000 and 2200). On my machine I get:

Time F1: 8.738s

Time F2: 2.663s

Time F3: 3.119s

Time F4: 3.492s

Time F5: 3.889s

Time F6: 4.442s

The problem is that the Parallel Framework can only guess each runtime amount in advance. So the load balancing for 2 processors will not be optimal in every case. In the original PLINQ-version there are only small tasks, and the difference between each runtime is smaller. So it is easier to compute the load balancing.

But of course we can do better if we split f1 into two functions f7 and f8:

let f7() = calcFactorialSum    1I 1500I
let f8() = calcFactorialSum 1501I 2000I

So we can get a better load balancing:

Time F1: 8.721s

Time F7: 4.753s

Time F8: 4.829s

Time Normal: 26.137s

Time “Parallel Functions”: 16.138s (Ratio: 0.62)

Same Results: true

Tags: , , , , , ,