F# microbenchmark study

Recently, there was some discussion about a study called Clash of the Lambdas, which compared simple stream/sequence microbenchmarks across Java 8 Streams, Scala, C# LINQ and F#. I am learning F#, and as a learning exercise I decided to re-implement one of the benchmarks (Sum of Squares Even) myself in F# without referring to the code provided by the authors.

The source of my implementation can be found on Bitbucket, and binaries are also provided. My interest was in comparing various F# implementations, not in cross-language comparison. I originally implemented it in four different ways, and later added a fifth:

  • Imperative sequential for-loop
  • Imperative parallel version using Parallel.For from Task Parallel Library
  • Functional sequential version using F# sequences
  • Functional parallel version using F# PSeq from FSharp.ParallelSeq
  • UPDATE: I added a functional version using the Nessos Streams package as suggested by Nick Palladinos on twitter
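
As a rough illustration (this is not the code from the Bitbucket repository), the two sequential variants might look like the minimal sketch below. The function names and the sample input are assumptions made for this example; the only detail taken from the discussion is that the elements are 64-bit integers.

```fsharp
// Hypothetical sample input; the benchmark's real data generator is not shown here.
let values : int64[] = Array.init 1000 int64

// Imperative sequential for-loop: accumulate the squares of the even elements.
let sumSqEvenLoop (xs: int64[]) =
    let mutable acc = 0L
    for i in 0 .. xs.Length - 1 do
        let x = xs.[i]
        if x % 2L = 0L then acc <- acc + x * x
    acc

// Functional sequential version: a filter/sum pipeline over an F# sequence.
let sumSqEvenSeq (xs: seq<int64>) =
    xs
    |> Seq.filter (fun x -> x % 2L = 0L)
    |> Seq.sumBy (fun x -> x * x)
```

Both functions should return the same value for the same input, e.g. `sumSqEvenLoop values = sumSqEvenSeq values`.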

I compiled using VS 2013 Express and F# 3.1 with “Release” settings, Any CPU (32-bit not preferred), and ran it on my machine on three different runtimes: MS CLR from .NET SDK 4.5.2 running on Windows 8.1, MS CLR with RyuJIT CTP4, and finally Mono 3.4 (sgen GC, no LLVM) on OpenSUSE 13.1.

The results are as follows:

                  Imperative              Functional
                  Sequential  Parallel    Sequential  Parallel  Streams
MS CLR            17          8           172         81        45
MS RyuJIT CTP4    18          7           168         76        44
Mono              88          23          240         797       97

Some observations for this microbenchmark:

  • The imperative version is far faster than the functional version, but the functional version was shorter and clearer to me. I wonder if there is some opportunity for optimization in the F# compiler for the functional version, such as inlining sequence operations or fusing a pipeline of operations where possible.
  • MS RyuJIT CTP4, which is the beta version of the next-gen MS CLR JIT, performs similarly to the current MS CLR. This is good to see.
  • Mono is much slower than the MS CLR. Also, it absolutely hates F# parallel sequences for some reason: on Mono the PSeq version is slower than the sequential one. I guess I will have to try installing Mono with LLVM enabled and then check the performance again.
  • The Streams package from Nessos is faster than F# sequences in this microbenchmark. It is currently sequential only, but it is much faster than even PSeq.
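
To make the idea of fusing a pipeline concrete, here is a minimal sketch of what a hand-fused version of the filter/sum pipeline could look like: a single fold that tests and accumulates in one pass, with no intermediate sequence. The name `sumSqEvenFold` is made up for this example, and this is not something the F# compiler is claimed to produce.

```fsharp
// Hand-fused equivalent of "keep even elements, then sum their squares".
// One pass over the array; sumSqEvenFold is a hypothetical name.
let sumSqEvenFold (xs: int64[]) =
    xs |> Array.fold (fun acc x -> if x % 2L = 0L then acc + x * x else acc) 0L
```

Whether this closes the gap to the imperative loop would have to be measured; I have not benchmarked this sketch.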

These observations only apply to this microbenchmark, and probably should not be considered general results. Overall, it was a fun learning experience, especially as a newcomer to both F# and the .NET ecosystem. F# looks like a really elegant and powerful language and is a joy to write. There is still a LOT more to learn about both. For example, I am not quite clear on the best way to distribute .NET projects as open source. Should I distribute VS solution files? I am more used to distributing build files for CMake, Make, scons, ant etc., and am looking more into FAKE. NuGet also looks useful, if not very powerful (e.g. it can’t remove packages), and merits further investigation.


3 thoughts on “F# microbenchmark study”

  1. I ran VS 2013’s profiler on your code, and it was thrown off by the genMyArr function. I would suggest memoizing the result to speed up benchmarks.

    Also, I put together a LINQ equivalent:

    myseq.Where(fun x -> x%2L = 0L).Sum(fun x -> x*x)

    It was about 2x the speed of seq. My best guess is LINQ iterator magic. F# needs some of the LINQ implementation secret sauce, I think.

  2. Your functional benchmark functions look more imperative to me. You might be writing out things like filter / sum, but you are not writing them well or for performance. Also, you are comparing two completely different data structures. You realize that there is also Array.filter and such? Why are you using an array for the imperative version, yet a sequence for the functional one? There are filter and sum and (shhh!) Array.fold functions for arrays. I’m getting completely different results with a realistic functional approach, using fold and ‘match’, with functional being about 5%(!!) slower, not 1000%! So practically there is no real difference… This test good
