Wednesday, April 6, 2016

Feeling lazy -- memoization

Scala has a neat feature in the lazy val, which is somewhere between a val and a def. A val is evaluated exactly once, while a def is evaluated any number of times (0..n). A lazy val is evaluated at most once. In that sense, it's like a single-valued cache: no evaluation takes place if it's never needed but it will be evaluated only once if it's needed several times. One particularly useful use-case is when repeatedly referencing the value of a call-by-name method parameter.

But there's no multi-valued (collection) equivalent. You can create a "view" of a collection which is implemented as an IterableView. This is particularly useful when you want your collection to undergo several transformations (for example, by invoking map). Applying map to such a view essentially composes the existing function-to-be-applied with the new function. When the view is materialized (by calling force or apply) the (composed) function is applied to the member(s) that you need. However, if you apply force (or apply) a second or third time, the composed function will be invoked again. You can, if you like, "memoize" the result of the force or apply method, but you can't memoize them one element at a time.

So I recently asked a question about it on StackOverflow just in case there was something I was missing.

Let's have a brief interlude here on the meaning of memoization. A frequent issue when optimizing a program for performance is when a function is called with the same parameter (or parameters) as before. The opportunity to optimize this only occurs, obviously, if the function is referentially transparent (i.e. its context is immutable). Otherwise, we have no choice but to simply go through the evaluation again. But in functional programming in particular we will often have the situation where we can consider remembering the parameters and their result in a lookup table and, if those parameters ever arise again, we pass back the result instead of reevaluating the function. This is called "memoization", an ugly word and one which my spell-checker absolutely refuses to allow me to type without putting up a big fight. Why not just call it a cache? Well, according to Wikipedia, the reason is that it's a special case of caching (although I'm not quite buying the argument). For this reason (and because of spell-check) I'm going to use the word cache going forward.

You'd think that something that obvious would be part of the basic Scala language. But it isn't. What we want really is just a sub-class of Function1 where we provide a means of fulfilling (evaluating) the value.

Here's the simplest way to define such a cache:

class Cache[K,V](evaluate: K=>V) extends mutable.HashMap[K,V] {
  override def apply(k: K): V = getOrElseUpdate(k,evaluate(k))
}


This works exactly as you would expect. But the class (or its equivalent) doesn't appear in the Scala collections library. And all of the other methods of HashMap are exposed which doesn't really conform to the contract of a simple memoization class.

The other drawback to this class is that it doesn't deal well with exceptions that might arise in the evaluate function. Basically, they simply bubble up to the top. And, even if they are caught somewhere, they will be thrown again the next time the wrong key is requested.

A somewhat more robust version of Cache, however, can easily be built. This is what I'm using in an (Akka) actor-based project that I'm working on:

class Cache[K,V](evaluate: K=>Try[V])(implicit carper: String=>Unit) extends (K=>Option[V]) {
  def apply(k: K): Option[V] = m.getOrElseUpdate(k,fulfill(k))
  private val m = mutable.HashMap[K,Option[V]]()
  private def fulfill(k: K): Option[V] = evaluate(k).recoverWith{case x => carper(s"Cache: evaluation exception for key $k: $x"); Failure(x)}.toOption
}


Note that the return type of evaluate has changed: it's now a Try[V] instead of just V. And the apply method now returns an Option[V] instead of simply V. This makes it much more like invoking get on a collection. As a side-effect, when evaluate yields a failure, the cause of the failure is passed to carper, along with an explanatory comment. The failure then results in None being memoized which means that future calls to apply with the same parameter will return a None without causing the exception to recur.