Paradigmatic's Random ramblings

27Feb/129

Benchmarking scala list traversal idioms

Disclaimer: This article is about micro-benchmarking. I know it's not an exact science, but I tried to keep the comparisons as far as possible. Code and results are available on github. If you disagree about the methodology or if you want to test alternative implementations, Scala or JVM versions, you will find instruction for running the benchmark at the end of this post.

The purpose

I've recently read a nice post from javacodegeeks.com about "Variations for computing results from sequences in Scala". The author showed several Scala idioms for computing the given task:

" Given a List of words, compute two lists: one has the lengths of each word and the second indicates whether a word starts with a capital letter or not "

After reading it, I've done a series of micro-benchmark to evaluate the cost of using a  given style. Most code presented here comes from that article, but was slightly tuned to provide the best performances. Again, if you don't agree with my implementation, feel free to try alternatives...

The methodology

Micro-benchmarking code running on JVM can be very hard. I decided to cicumvent the difficulty by using Google Caliper framework. It was accessed from a nice  Scala wrapper available on github.

The problem input (a list of strings) was created from a list of random words of 8 character (alpha chars, lower and upper case). Inputs of several size were tried for each implementation: 10, 100, 1000, 10000. For more information, see the code.

The benchmark were run with the following setup:

  • CPU: Intel i5-2500 @ 3.3 GHz
  • OS: Ubuntu 11.10 (Linux 3.0.0-16)
  • Java: OpenJDK 1.6.0_23
  • Scala:  2.9.1

The code

Here are the different programming styles that I want to compare:

Old School style

First, I propose an old-school procedural implementation. It's very close to the code I would produce if I were programming in Java. Since it's the fastest implementation (sorry for the spoiler), I will normalize the duration of every benchmark by the result of this implementation for each size.

val n = words.length
val wLength = Array.ofDim[Int](n)
val wCaps = Array.ofDim[Boolean](n)
var i = 0
val it = words.iterator
while( it.hasNext ) {
  val w = it.next
  wLength(i) = w.length
  wCaps(i) = isCapitalized(w)
  i += 1
}
( wLength, wCaps )

Since I know the size of the output, I can use fixed size arrays. I use an iterator and a while loop to traverse the input as fast as possible. Finally I must update a counter manually  to insert the results at the correct position. The resulting code is fast, but lengthy and full of possible bugs (such as forgetting to increment the index or modifying the array by mistake).

Old-School safe style

The example above returned  a pair of mutable arrays. In Scala, we tend to favor immutable data structures. Here is a version which converts the resulting arrays to Lists:

val n = words.length
val wLength = Array.ofDim[Int](n)
val wCaps = Array.ofDim[Boolean](n)
var i = 0
val it = words.iterator
while( it.hasNext ) {
  val w = it.next
  wLength(i) = w.length
  wCaps(i) = isCapitalized(w)
  i += 1
}
( wLength.toList, wCaps.toList )

Of course the resulting code will be slower as two arrays must be traversed to build two lists.

Functional style

By symmetry, here is the code that most Scala programmers will consider "idiomatic":

val wLength = words.map( _.length )
val wCaps = words.map( isCapitalized )
(wLength, wCaps)

The code here is very short and readable. In contrast with the original post, I used two map operation rather than map/unzip combination, because it's faster.

Recursive style

Tail-recursive functions are supposed to be converted to fast while loops by the Scala compiler:

@tailrec
def lengthAndCaps( ws: List[String], ls: List[Int], cs: List[Boolean] ): (List[Int],List[Boolean]) =
  if( ws.isEmpty )
    (ls.reverse, cs.reverse)
  else {
    val w = ws.head
    lengthAndCaps( ws.tail, w.length::ls, isCapitalized(w)::cs )
  }
val (wLength,wCaps) = lengthAndCaps( words, Nil, Nil )
( wLength, wCaps )

The solution looks like an exercise in a Functional Programming 101 tutorial. Everything is immutable and all looping is replaced by recursive calls. I've avoided pattern matching to reduce the duration.

Reassignment style

Programmers with procedural background often find recursive code less readable than explicit loops. Here is a version using mutable variables of immutable lists:

var wLength = List[Int]()
var wCaps = List[Boolean]()
for( word <- words ) {
  wLength ::= word.length
  wCaps ::= isCapitalized(word)
}
(wLength, wCaps)

This code will certainly make the "functional purists" cringe. But this usage is compatible with a string functional discipline, if you encapsulate it inside a function, it will be referentially transparent.

Buffer style

Some programmers have the feeling that mutable collections are always faster. To test this hypothesis, we can use a mutable ArrayBuffer to accumulate the result:

import collection.mutable._
val wLength = new ArrayBuffer[Int]()
val wCaps = new ArrayBuffer[Boolean]()
for( word <- words ) {
  wLength.append(word.length)
  wCaps.append(isCapitalized(word))
}
( wLength, wCaps )

In contrast with the original post, I used an ArrayBuffer rather than a ListBuffer, because it's faster.

Buffer safe style

Finally, we can convert the mutable buffers to immutable lists as we did for the old-school implementation:

import collection.mutable._
val wLength = new ListBuffer[Int]()
val wCaps = new ListBuffer[Boolean]()
for( word <- words ) {
  wLength.append(word.length)
  wCaps.append(isCapitalized(word))
}
( wLength.toList, wCaps.toList )

I use here a ListBuffer, because it can be converted to a List faster.

The results

Here are the benchmark results. As I said above, every column is normalized to the old-school implementation. Thus the number here are the number of folds a given implementation is slower that the old-school implementation. For instance, with an input of size 1000, the functional style is 3.5x slower.

10100100010000100000
OldSchool1.01.01.01.01.0
OldSchoolSafe2.82.73.63.52.8
Functional3.02.13.53.12.9
Recurs1.61.82.12.52.3
Reassign1.61.92.22.11.8
Buffer4.35.15.35.03.5
BufferSafe5.45.87.76.94.7

Here is a chart illustrating the results:

Barplot of benchmark results

The conclusions

From the results, we can highlight some points:

  • Using procedural style (old-school) is faster, but it will force you to use that style all along you code, as soon as you convert big arrays to immutable collections, you will lose the performance gains.
  • The functional style is short, readable and safe. It is only 3x slower than the procedural style. For most codes, specially when the operations are not in a tight loop, this is definitely the best option.
  • Both recursive and reassignment style are rougly 2x slower than the procedural style, while maintaining immutability. It's a very good choice for performance sensitive areas of you code.
  • The poor results of both buffer implementations show that well designed immutable collections can be faster than mutable ones. I believe it should be avoided. It will be interesting to try with Java ArrayLists to check is Scala implementation of ArrayBuffer is the one to blame.

Running the benchmarks

To run the benchmarks, you need git and sbt 0.11 installed. Then you just need to follow a few steps:

  1. git clone git://github.com/paradigmatic/seqBench.git
  2. Modify the file: src/main/scala/TraversalVariations.scala
  3. Run the benchmark with: sbt run
  4. Write about your results ;-)

Print This Post Print This Post
16Feb/120

Sonatype/Implicitly Release Checklist

Configrity 0.10.0 is finally out and brings YAML format import/export. The full release process on Sonatype was full of small boring steps that must be taken in a given order.

If I were less lazy, I would have written a nice script to automatize everything. In the mean time, I have a small checklist (org-mode format), which helps me to mind every step in correct order. Perhaps you will find it useful:

* Release check-list

** Before publication [0%]

  - [ ] Clean >> All tests green.
  - [ ] Check version number in =project/Build.scala=
  - [ ] Update changelog
  - [ ] Update README file
  - [ ] Write the new note for implicitly
  - [ ] Commit changes
  - [ ] Tag release

** Publication [0%]

  - [ ] SBT publish
  - [ ] Close the stagging repository on http://oss.sonatype.org
  - [ ] Release the repository 

** After publication [0%]

  - [ ] Wait at least two hours
  - [ ] Check it appears in MVN central
  - [ ] Update README maven section
  - [ ] Push and push --tags
  - [ ] Generate the scaladoc
  - [ ] Push the scaladoc
  - [ ] Update the wiki
  - [ ] submit the release to implicitly
  - [ ] close remaining issues

** When everything is OK

  - [ ] Reset this checklist

 

If you have a full automated solution let me know ;-)

Print This Post Print This Post
6Mar/119

Lazy parallel evaluation

In this post, I present a small experiment aimed to provide a toy DSL for demand-driven parallel computations in scala. The current solution is a functional approach (lazy monads), which works as expected. However, I do not claim to have found the perfect solution. In fact, I wrote this post because I was not able to achieve my initial syntactic goals and I hope to receive some suggestions.

Print This Post Print This Post
23Feb/110

Anonymous adapter static factory

The adapter design pattern allows to 'translate' an interface into one other. Usually it is done by creating an wrapper class. In this short post, I show how to use static factory method instead, and I analyse some of the potential benefits of this approach. This post is mostly about java, but the ideas here can be exported to any object-oriented language with anonymous class support. I don't claim that the ideas here are new, but I think they deserve some illustration.

Print This Post Print This Post
20Feb/118

Does it scale ?

In this short tutorial we explain how to measure properly the parallel scalability of a piece of software. This writing is not for experts  but for beginners and code dabblers who are looking for a better analysis of their creation. In what follows the terms CPUs, processors or cores are used without distinction.

Print This Post Print This Post
20Feb/110

Best datacenter feature ever

Last week I visited the Leibniz Supercomputing center. It's a really nice site hosting not only IT services for bavarian universities (more than 100,000 customers) and über-awesome computing clusters and supercomputers (they claim to have the NUMA machine with the biggest RAM in the world). But the most impressive feature can be found in a nearby building:

Print This Post Print This Post
Tagged as: , Continue reading
16Feb/110

Parallel collections coming to scala

Parallel collections will be part of scala 2.9. Here are some primers:

Other expected changes: Changes between Scala 2.8 and Scala 2.9

Print This Post Print This Post
16Feb/110

Powerful free software for scientists

As a researcher in computational science, I use currently some nice free and open source software, which make my life a lot easier. All the following tools work on Linux, most of them on MacOSX and some on Windows (check the links for more info).

Print This Post Print This Post
15Feb/112

Generating html meta data with Jekyll

This post was part of an old Jekyll blog. Since it receives lots of visits, I decided to repost here (even if I quit using Jekyll).

Purpose

(X)html allow web authors to supply meta-data about pages in the form of meta elements inside the <head> section. Arbitrary content can be specified in this fashion which is generally ignored by browsers. However, most major search engines (like Google, Yahoo, Bing, etc.) make use of this data to index the site and present search results. In this short article, we will see how useful data can be generated by Jekyll.

The method presented here works with all standard versions of Jekyll since 0.5.2.

Print This Post Print This Post