Paradigmatic's Random ramblings

27Feb/129

Benchmarking scala list traversal idioms

Disclaimer: This article is about micro-benchmarking. I know it's not an exact science, but I tried to keep the comparisons as far as possible. Code and results are available on github. If you disagree about the methodology or if you want to test alternative implementations, Scala or JVM versions, you will find instruction for running the benchmark at the end of this post.

The purpose

I've recently read a nice post from javacodegeeks.com about "Variations for computing results from sequences in Scala". The author showed several Scala idioms for computing the given task:

" Given a List of words, compute two lists: one has the lengths of each word and the second indicates whether a word starts with a capital letter or not "

After reading it, I've done a series of micro-benchmark to evaluate the cost of using a  given style. Most code presented here comes from that article, but was slightly tuned to provide the best performances. Again, if you don't agree with my implementation, feel free to try alternatives...

The methodology

Micro-benchmarking code running on JVM can be very hard. I decided to cicumvent the difficulty by using Google Caliper framework. It was accessed from a nice  Scala wrapper available on github.

The problem input (a list of strings) was created from a list of random words of 8 character (alpha chars, lower and upper case). Inputs of several size were tried for each implementation: 10, 100, 1000, 10000. For more information, see the code.

The benchmark were run with the following setup:

  • CPU: Intel i5-2500 @ 3.3 GHz
  • OS: Ubuntu 11.10 (Linux 3.0.0-16)
  • Java: OpenJDK 1.6.0_23
  • Scala:  2.9.1

The code

Here are the different programming styles that I want to compare:

Old School style

First, I propose an old-school procedural implementation. It's very close to the code I would produce if I were programming in Java. Since it's the fastest implementation (sorry for the spoiler), I will normalize the duration of every benchmark by the result of this implementation for each size.

val n = words.length
val wLength = Array.ofDim[Int](n)
val wCaps = Array.ofDim[Boolean](n)
var i = 0
val it = words.iterator
while( it.hasNext ) {
  val w = it.next
  wLength(i) = w.length
  wCaps(i) = isCapitalized(w)
  i += 1
}
( wLength, wCaps )

Since I know the size of the output, I can use fixed size arrays. I use an iterator and a while loop to traverse the input as fast as possible. Finally I must update a counter manually  to insert the results at the correct position. The resulting code is fast, but lengthy and full of possible bugs (such as forgetting to increment the index or modifying the array by mistake).

Old-School safe style

The example above returned  a pair of mutable arrays. In Scala, we tend to favor immutable data structures. Here is a version which converts the resulting arrays to Lists:

val n = words.length
val wLength = Array.ofDim[Int](n)
val wCaps = Array.ofDim[Boolean](n)
var i = 0
val it = words.iterator
while( it.hasNext ) {
  val w = it.next
  wLength(i) = w.length
  wCaps(i) = isCapitalized(w)
  i += 1
}
( wLength.toList, wCaps.toList )

Of course the resulting code will be slower as two arrays must be traversed to build two lists.

Functional style

By symmetry, here is the code that most Scala programmers will consider "idiomatic":

val wLength = words.map( _.length )
val wCaps = words.map( isCapitalized )
(wLength, wCaps)

The code here is very short and readable. In contrast with the original post, I used two map operation rather than map/unzip combination, because it's faster.

Recursive style

Tail-recursive functions are supposed to be converted to fast while loops by the Scala compiler:

@tailrec
def lengthAndCaps( ws: List[String], ls: List[Int], cs: List[Boolean] ): (List[Int],List[Boolean]) =
  if( ws.isEmpty )
    (ls.reverse, cs.reverse)
  else {
    val w = ws.head
    lengthAndCaps( ws.tail, w.length::ls, isCapitalized(w)::cs )
  }
val (wLength,wCaps) = lengthAndCaps( words, Nil, Nil )
( wLength, wCaps )

The solution looks like an exercise in a Functional Programming 101 tutorial. Everything is immutable and all looping is replaced by recursive calls. I've avoided pattern matching to reduce the duration.

Reassignment style

Programmers with procedural background often find recursive code less readable than explicit loops. Here is a version using mutable variables of immutable lists:

var wLength = List[Int]()
var wCaps = List[Boolean]()
for( word <- words ) {
  wLength ::= word.length
  wCaps ::= isCapitalized(word)
}
(wLength, wCaps)

This code will certainly make the "functional purists" cringe. But this usage is compatible with a string functional discipline, if you encapsulate it inside a function, it will be referentially transparent.

Buffer style

Some programmers have the feeling that mutable collections are always faster. To test this hypothesis, we can use a mutable ArrayBuffer to accumulate the result:

import collection.mutable._
val wLength = new ArrayBuffer[Int]()
val wCaps = new ArrayBuffer[Boolean]()
for( word <- words ) {
  wLength.append(word.length)
  wCaps.append(isCapitalized(word))
}
( wLength, wCaps )

In contrast with the original post, I used an ArrayBuffer rather than a ListBuffer, because it's faster.

Buffer safe style

Finally, we can convert the mutable buffers to immutable lists as we did for the old-school implementation:

import collection.mutable._
val wLength = new ListBuffer[Int]()
val wCaps = new ListBuffer[Boolean]()
for( word <- words ) {
  wLength.append(word.length)
  wCaps.append(isCapitalized(word))
}
( wLength.toList, wCaps.toList )

I use here a ListBuffer, because it can be converted to a List faster.

The results

Here are the benchmark results. As I said above, every column is normalized to the old-school implementation. Thus the number here are the number of folds a given implementation is slower that the old-school implementation. For instance, with an input of size 1000, the functional style is 3.5x slower.

10100100010000100000
OldSchool1.01.01.01.01.0
OldSchoolSafe2.82.73.63.52.8
Functional3.02.13.53.12.9
Recurs1.61.82.12.52.3
Reassign1.61.92.22.11.8
Buffer4.35.15.35.03.5
BufferSafe5.45.87.76.94.7

Here is a chart illustrating the results:

Barplot of benchmark results

The conclusions

From the results, we can highlight some points:

  • Using procedural style (old-school) is faster, but it will force you to use that style all along you code, as soon as you convert big arrays to immutable collections, you will lose the performance gains.
  • The functional style is short, readable and safe. It is only 3x slower than the procedural style. For most codes, specially when the operations are not in a tight loop, this is definitely the best option.
  • Both recursive and reassignment style are rougly 2x slower than the procedural style, while maintaining immutability. It's a very good choice for performance sensitive areas of you code.
  • The poor results of both buffer implementations show that well designed immutable collections can be faster than mutable ones. I believe it should be avoided. It will be interesting to try with Java ArrayLists to check is Scala implementation of ArrayBuffer is the one to blame.

Running the benchmarks

To run the benchmarks, you need git and sbt 0.11 installed. Then you just need to follow a few steps:

  1. git clone git://github.com/paradigmatic/seqBench.git
  2. Modify the file: src/main/scala/TraversalVariations.scala
  3. Run the benchmark with: sbt run
  4. Write about your results ;-)

Print This Post Print This Post
Comments (9) Trackbacks (1)
  1. Fold! I found:

    def timeFoldLeft(reps: Int) = repeat(reps) {
    words.foldLeft(List.empty[Int],List.empty[Boolean])((p,q)=>(q.length::p._1,isCapitalized(q)::p._2))
    }

    to be idiomatic Scala, purely functional, concise and as performant as all but the OldSchool implementation

  2. The recursive style should be as fast as (or minimally fasterthan) the reassignment style. The calls to “.reverse” might be the cause. Note that the two functions produce different output:

    val words = List(“some”, “WORDS”, “in”, “a”, “List”)

    recursive produces:
    (List(4, 5, 2, 1, 4),List(false, true, false, false, true))

    reassignment produces:
    (List(4, 1, 2, 5, 4),List(true, false, false, true, false))

    To be comparable with the other solutions, reassignment should probably reverse, too. Reversing the input might be faster than reversing the 2 output lists, too.

  3. @Jim, @Andreas: Thanks for your remarks, I will update the post soon.

  4. Please update Buffer style to set an initial size. E.g.:

    import collection.mutable._
    val wLength = new ArrayBuffer[Int](words.size)
    val wCaps = new ArrayBuffer[Boolean](words.size)
    for( word <- words ) {
    wLength.append(word.length)
    wCaps.append(isCapitalized(word))
    }
    ( wLength, wCaps )

  5. @Erik: I tried, and setting the initial size makes the computation 2% slower than the posted version ! I must confess it’s weird and unexpected. I will try to investigate it.

  6. 2% slower, for a micro benchmark that is virtually the same value. Interesting nonetheless!

    I think I have some rewriting to do….

  7. @Eric: I agree, but the trend is repeatable (at least on my laptop) so even if it doesn’t make any difference in real life, it tells something about the array-buffer implementation. I will check with java.util.ArrayList when I have more time.

  8. Thanks for the post and the code!!

    I added tests for Vector and List.newBuilder as well as ensuring that the Server VM is used and seeded Random so runs are always the same. I sent you a pull request and posted my results here: http://codedependents.com/2012/04/30/benchmarking-more-seq-traversal-idioms-in-scala/

  9. Nice benchmark. I am a bit surprised you did not consider this version:

    words.view.map(w => (w.length,isCapitalized(w))).unzip

    At a size of 100000 it is only slower by factor 2 compared to the old school implementation. It doesn’t fair that well at smaller sizes, but it almost always beats your functional implementation.

    [info] length benchmark ns linear runtime
    [info] 10 OldSchool 130 =
    [info] 10 Functional 411 =
    [info] 10 FunctionalZipped 279 =
    [info] 100 OldSchool 1046 =
    [info] 100 Functional 2180 =
    [info] 100 FunctionalZipped 2575 =
    [info] 1000 OldSchool 10524 =
    [info] 1000 Functional 35255 =
    [info] 1000 FunctionalZipped 29772 =
    [info] 10000 OldSchool 109789 =
    [info] 10000 Functional 383540 ==
    [info] 10000 FunctionalZipped 300771 =
    [info] 100000 OldSchool 1684707 ==========
    [info] 100000 Functional 4662245 ==============================
    [info] 100000 FunctionalZipped 3352162 =====================
    [info]


Leave a comment

(required)