Monday, December 28, 2015

MapReduce with Akka actors

I was creating a lecture on actors, in particular Akka, for the Scala class that I teach. I wanted to do something simple and general. I showed the students an application I had previously written which used actors to manage a hedge fund. But I wanted something even more generic so I thought about implementing map/reduce. After all, actors (especially remote actors) are the preferred means of implementing concurrent processing in Scala, especially where the code to be run in parallel involves representing mutable state.

So, I did the obvious -- I looked on the internet for some code that someone had already written. I didn't want to make a big deal out of it, or spend a lot of time on it. Unfortunately, I couldn't find anything. Well, I couldn't find anything good. There were several half-hearted attempts at doing it but nothing that I would want to use. So, I set out to do it myself. The good news was that I got something (a word count application) working in about an hour. But after sharing it with the students, I realized that it, too, wasn't nearly as good as it could be.

So, I have refined it to a point (after many hours of work) that I am now proud to share it. What, to me, is particularly elegant is the functional API that I created to set up map-reduce stages that can be chained (or piped, rather) together to form a (potentially) multi-stage parallel process. Using this interface actually hides all details of actors (except for creating the actor system and a couple of implicit values) from the application programmer.

If you're interested in taking a look, contributing, or whatever, please refer to Majabigwaduce on Github. The README will explain, hopefully, everything you need to know about using the framework. And, of course, the source code is all there too. As I searched for an appropriate name for something that uses Akka to implement Map/Reduce, I couldn't help the name Majabigwaduce from  popping into my mind. See the README for an explanation.

I have used my own naming convention everywhere in the code. If you're wondering about it, please see my previous blog.

Tuesday, December 8, 2015

Naming of identifiers

One of the conventions recommended by the designers of Scala is to use short identifier names. It's fine to make method names long (especially when they are private methods whose meaning may not be entirely clear otherwise) but context usually gives a pretty good clue to identifiers.

You've probably noticed code like this (taken from the Artima web site: Working with Lists):

def isort(xs: List[Int]): List[Int] =
    if (xs.isEmpty) Nil
    else insert(xs.head, isort(xs.tail))

where xs is used for a sequence of things and pronounced x's (the plural of x).

Here, I want to present a recommendation for generalizing this notion. In the same way that a sequence (Seq) of x is written xs, then a Future of x should be written xf. A Try of x might be written xt but I'm going to recommend something slightly different (xy) because t is useful for Tuple. So, the general scheme is:

val ac = C[A]

where C is some container type and A is some element type. Obviously, these can be composed such that if A happens to be a sequence of X (for example) then we end up with:

val xsc = C[Seq[X]]
Note that the order of the container/type definitions is mirrored for the name of the identifier. The advantage of all this is that when a reader (or even yourself) comes to read that code later, it is clear what the type of the identifier is, even if it isn't shown explicitly. If you're not reading the code through an IDE, then having an indication of type will be quite helpful. It also makes debugging a lot easier.

In a for-comprehension, patterns/generators look very simple: for (x <- xs) yield x, for example.

Here are my recommendations for the abbreviations (but of course you should choose your own):


ContainerAbbreviation
Seq/List, etc.s
Arraya
Mapm
Futuref
Tryy
Promisep
Tuplet
Optiono
Iterator/iterablei
Eitherh
RDDr

TypeAbbreviation
My own typex
Generic type A, B, T, etc.a, b, t, etc.
String (line/document)g
String (word)s or w
URLu
Doubled
Intn
Functionf
ActorRefa
Exceptione

Sometimes you might have more than one identifier of the same type in which case you can follow the single letter with 1, 2, etc. Or, alternatively, you might use a single uppercase character to add context, for example gD for a document string and gL for a line string. You might also decide to use t2 for a Tuple2, etc. In the situation where we have an Either of form Either[String,X] or Either[Throwable,X] then we simply use the identifier xe, ignoring the left-side types.

Some confusion might arise between a as an Array (i.e. a container) and a as a generic type A (or ActorRef). This should be clear in that types are found at the start of the identifier string. Similarly, f as a Future and f as a Function. And s if you use it both for a String and a Seq.

Here are some examples of code which uses this idea:

  def flatten[X](xsfs: Seq[Future[Seq[X]]])(implicit c: ExecutionContext): Future[Seq[X]] = Future.sequence(xsfs) map {_ flatten}

  def sequence[X](xy: Try[X]): Either[Throwable,X] =
    xy match {
    case Success(s) => Right(s)
    case Failure(e) => Left(e)
  }

  def sequence[X](xos : Seq[Option[X]]): Option[Seq[X]] = (Option(Seq[X]()) /: xos) {
    (xso, xo) => for (xs <- xso; x <- xo ) yield xs :+ x
  }

  val ws = List("http://www.htmldog.com/examples/","http://www.google.com")

  val uys = for ( w <- ws ) yield Try(new URL(w))