Rash thoughts about .NET, C#, F# and Dynamics NAV.


"Every solution will only lead to new problems."

Monday, 21. March 2016


Property based testing in the real world – or how I made my package manager suck less

Filed under: C#,F#,Informatik — Steffen Forkmann at 16:57 Uhr

A couple of days ago I tried to fix a bug in the .NET/mono dependency manager “Paket“. The bug was a really strange edge case in Paket’s resolver algorithm that resulted in a false positive conflict (an overview about the algorithm can be found here).

In other words: Paket reported a version conflict where it should have reported a valid package resolution.

After some investigation I found that the bug was in some optimization code that tries to cut parts of the search tree. When I disabled that part, the resolver found a valid resolution, but was significantly slower. So after some further testing I came up with a fix that was fast and solved the issue.

But why didn’t we find that edge case before? And most importantly: do we have other bugs in the resolver?

Isaac Abraham suggested to investigate the resolver optimizations further and proposed to use property based testing for this (see Scott Wlaschin’s blog for amazing introduction material).

Problem formulation

In this post I want to evaluate if Paket’s package resolver works correct. Let’s start by clarifying some terms.

  • A “package” consists of a name, a version and a (possibly empty) list of dependencies. In this post we are only talking about meta data. The package contents are irrelevant.
  • A “dependency” is always referencing another package name and defines a version requirement (e.g. >= 2.0 or < 4.1).
  • The “package graph” is a list of packages and acts as a stub for a possible configuration of a package feed like nuget.org. It’s basically a collection of the meta data of all published packages.
  • A “resolver puzzle” consists of a package graph and a list of dependencies. In other words it defines a possible configuration of the package feed and a specific list of package requirements (basically what you would define in paket.dependencies).
  • A “resolution” is a list of packages that “solves” the resolver puzzle. This means all package requirements are satisfied and we don’t have more than one package for a every package name.

Our goal is to show that Paket’s resolver is returning a version conflict if and only if we can not find a package resolution via a brute-force algorithm. We do not compare resolutions, we just want to know if the optimization is missing some valid resolutions.

Generating test data

Property based testing is a stochastical test approach. The test framework (in this case we use FsCheck) generates lots of test cases with random data and tries to automatically falsify a given assumption about the algorithm. 

Usually FsCheck will just generate random, uniformly distributed data for whatever data type it sees. This is a very good default, but in our case this would result in 99% package graphs that have no resolution.

In order to create more sensible data and increasing the likelihood of finding real bugs we need to write our own generator functions.

Let’s consider a very small example and think about package names. If we would use the standard FsCheck data generator then it would generate completely random names including empty string, null and names that contain very special characters. While this is great for testing the validation part of Paket, we don’t really need this here. We just generate some valid package names:

As you can see we just generate non-negative integers and create the package name by prefix a “P”. Similarly we can create random but valid version numbers like that:

Here we just map a triplet of non-negative integers to a version of form major.minor.patch.

Now it’s easy to generate a list of packages with corresponding versions. Actually FsCheck can automatically generate a list of PackageName and Version pairs, but since it’s completely random it would generate a list where most packages have only one or two versions. The following custom generator creates a list of different versions for every package and flattens the result. This has a much bigger chance of creating interesting cases:

With a small helper function that creates random version requirements we can create a full package graph:

The last step is to create some package requirements to that graph that we want to resolve. This is basically what you would write into your paket.dependencies file.

Testing against brute-force algorithm

The following defines a FsCheck property that calls Paket’s resolver with a random puzzle. If the resolver finds a resolution, then we verify that this resolution is correct. If it can’t find a resolution we use a brute-force algorithm to verify that the puzzle indeed has no valid resolution:

This is testing the real resolver against a very naïve brute-force implementation. The brute-force version was relatively easy to write and has no smart optimizations:

And this point I reverted the fix in the optimization code and started the test to see what happens. FsCheck needed about 3s to come up with a random example where a valid resolution exists but Paket’s resolver didn’t find it. That was an amazing moment. Unfortunately the example was HUGE and very complicated to understand.

Shrinking

Most property based testing frameworks have a second important feature called “shrinkers”. While generators create random test data, shrinkers are used to simplify counter-examples and have type ‘a -> seq<‘a>. Given a value (our counter-example), it produces a sequence of new examples that are “smaller” than the given value. These new values are tested if they still falsify the given property. If they do a new shrinking process will be started that tries to reduce the new example even further.

There are two easy ways to create a new “smaller” graph. The first optios is to select a package from the graph and remove one of it’s dependencies and second option is to remove a package completely. The following code shows this:

In a similar way we can shrink a puzzle:

After running the test again FsCheck reported:

The ouput shows that FsCheck generated a random puzzle with 104 packages and reduced it to a sample with only 3 packages and 2 package requirements. This made it pretty easy to analyze what was going on in the optimization part of the resolver. The sample was even smaller than the one that was reported from the wild.

With the fix reapplied I rerun the tests and FsCheck reported another error:

So now FsCheck discovered a new, previously unknown bug in Paket’s resolver. The sample shows 2 packages and one of the packages is requiring a non-existing package.
This is trivial case and the brute-force algorihm is instantly finding a solution. So what is wrong in Paket’s resolver?
Again the answer is hidden in some performance optimization and the fix can be found here.

But this was not the end – the property based test spotted yet another counter-example so I fixed that as well.

Conclusion

It took me a while to build up property based testing for this complex scenario. Most of the time was needed to build custom generators and shrinkers, but I was able to reproduce a bug from the wild and also found two new bugs. I think this is a big success.

No Comments »

No comments yet.

RSS feed for comments on this post. | TrackBack URI

Leave a comment

XHTML ( You can use these tags): <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong> .