Monday, December 28, 2015

MapReduce with Akka actors

I was creating a lecture on actors, in particular Akka, for the Scala class that I teach. I wanted to do something simple and general. I showed the students an application I had previously written which used actors to manage a hedge fund, but I wanted something even more generic, so I thought about implementing map/reduce. After all, actors (especially remote actors) are the preferred means of implementing concurrent processing in Scala, particularly where the code to be run in parallel involves mutable state.

So, I did the obvious -- I looked on the internet for some code that someone had already written. I didn't want to make a big deal out of it, or spend a lot of time on it. Unfortunately, I couldn't find anything. Well, I couldn't find anything good. There were several half-hearted attempts at doing it but nothing that I would want to use. So, I set out to do it myself. The good news was that I got something (a word count application) working in about an hour. But after sharing it with the students, I realized that it, too, wasn't nearly as good as it could be.

So, after many hours of work, I have refined it to the point where I am now proud to share it. What I find particularly elegant is the functional API I created for setting up map-reduce stages that can be chained (or rather, piped) together to form a potentially multi-stage parallel process. This interface hides all the details of actors from the application programmer, apart from creating the actor system and a couple of implicit values.
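To give a feel for what "piping" map-reduce stages means, here is a minimal sketch of the idea. It is not the Majabigwaduce API: the names `Stage`, `stage` and `|` are hypothetical, and plain Scala `Future`s stand in for the Akka actors that the real framework uses. Each stage is built from a mapper and an associative reducer, and `|` feeds one stage's keyed output into the next.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object MapReduceSketch {
  // Hypothetical stage type: a batch of inputs in, a Future of keyed results out.
  type Stage[A, K, V] = Seq[A] => Future[Map[K, V]]

  // Build a stage from a mapper and an associative reducer.
  // Each input is mapped in its own Future, standing in for a mapper actor.
  def stage[A, K, V](mapper: A => Seq[(K, V)])(reducer: (V, V) => V)
                    (implicit ec: ExecutionContext): Stage[A, K, V] =
    inputs => {
      val mapped: Seq[Future[Seq[(K, V)]]] = inputs.map(a => Future(mapper(a)))
      // Reduce phase: gather every (key, value) pair and fold the values per key.
      Future.sequence(mapped).map(chunks => chunks.flatten.groupMapReduce(_._1)(_._2)(reducer))
    }

  // Pipe one stage's output (a Map, viewed as a Seq of pairs) into the next stage.
  implicit class StageOps[A, K, V](first: Stage[A, K, V]) {
    def |[K2, V2](next: Stage[(K, V), K2, V2])(implicit ec: ExecutionContext): Stage[A, K2, V2] =
      inputs => first(inputs).flatMap(result => next(result.toSeq))
  }

  def main(args: Array[String]): Unit = {
    import ExecutionContext.Implicits.global

    // Stage 1: classic word count.
    val wordCount: Stage[String, String, Int] =
      stage((line: String) => line.split("\\s+").toSeq.filter(_.nonEmpty).map(w => w.toLowerCase -> 1))(_ + _)

    // Stage 2: how many distinct words occur with each frequency.
    val byFrequency: Stage[(String, Int), Int, Int] =
      stage((p: (String, Int)) => Seq(p._2 -> 1))(_ + _)

    // The two stages piped together form a single two-stage process.
    val pipeline = wordCount | byFrequency
    val result = Await.result(pipeline(Seq("the cat sat", "the dog sat", "the end")), 5.seconds)
    println(result)
  }
}
```

The point of the design is that application code only ever sees mappers, reducers and the pipe operator; how the work is distributed (here Futures, in the real framework actors) stays out of sight.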

If you're interested in taking a look, contributing, or whatever, please refer to Majabigwaduce on GitHub. The README should explain everything you need to know about using the framework. And, of course, the source code is all there too. As I searched for an appropriate name for something that uses Akka to implement Map/Reduce, I couldn't stop the name Majabigwaduce from popping into my mind. See the README for an explanation.

I have used my own naming convention everywhere in the code. If you're wondering about it, please see my previous blog post.
