Thursday, December 29, 2016

Get considered harmful

In Scala, there are several monadic containers for which the get method can throw an exception. The two most obvious such containers are Option and Try. If you have an instance of Option[T] and you invoke get, you will either get a T or throw a java.util.NoSuchElementException; if you have an instance of Try[T] and you invoke get, you will either get a T or throw the exception wrapped by the Failure. Future[T] doesn't have a get method, but it has value, which yields an Option[Try[T]] and thus offers two potential exception-throwing get methods.

So, we learn that using get is bad: a code smell. Idiomatic Scala recommends using pattern matching or monadic operations like map, flatMap, etc. for working with values from such containers.
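To make this concrete, here is a small illustration (standard library only; the values are made up for the example) of the dangerous accessor alongside its idiomatic replacements:

  import scala.util.Try

  val xo: Option[Int] = None
  // xo.get would throw java.util.NoSuchElementException

  // Idiomatic alternatives: pattern matching, or map/getOrElse
  val a = xo match {
    case Some(x) => x
    case None => 0
  }
  val b = xo.map(_ + 1).getOrElse(0)
  val c = Try("abc".toInt).getOrElse(-1) // the NumberFormatException never escapes; c is -1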

So, if get is so bad, then why is Map[K,V] designed the way it is? Map[K,V] extends GenMapLike[K,V], which actually defines the following two methods:

  • abstract def apply(key: K): V
  • abstract def get(key: K): Option[V]
Thus, effectively, GenMapLike[K,V] extends Function1[K,V]. This trait was originally written by Martin Odersky, although not until Scala 2.9. I'm assuming that it was written this way to be compatible with earlier versions. In Scala 2.8, MapLike[K,V] extends Function1[K,V] via PartialFunction[K,V]. Again, Odersky was the author.

If the key is not present in the map, apply invokes the so-called default method. By default, default simply throws an exception (as you'd expect). In other words, apply is the dangerous method (default notwithstanding) and get is the safe method.
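A quick illustration of that asymmetry (plain standard-library code; the map contents are made up):

  val m = Map("a" -> 1, "b" -> 2)
  m.get("c") // None: the "safe" accessor
  // m("c") would invoke default and throw java.util.NoSuchElementException
  val n = m.withDefaultValue(0)
  n("c") // 0: a customized default rescues apply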

So, is it just me, or are these definitions backwards? This is how I feel that the methods in GenMapLike[K,V] should be defined:

  • abstract def get(key: K): V
  • abstract def apply(key: K): Option[V]

Thus, GenMapLike[K,V] would effectively extend Function1[K,Option[V]]. What would be wrong with that? It would be so much more consistent. By all means have this version of get invoke the default mechanism. But it would still essentially correspond to the dangerous method.

Obviously, nobody is going to change it now. And, they're just names, right? But it does seem a shame that it got this way in the first place.

Tuesday, December 13, 2016

Scala's pesky "contravariant position" problem

I'm sure this has happened to you. You design a data structure, for example a tree, and, because you are thinking essentially of using this tree to look things up (perhaps it's an AST, for example), you decide to make the tree covariant in its underlying type:

trait Tree[+A] extends Node[A] {
   def children: Seq[Node[A]]
}

Now, it's time to start building this tree up from nodes and branches so you add a method like the following:

  def +:(a: A): Tree[A] = :+(a)

Aargh! It's the dreaded "Covariant type A appears in contravariant position in type A of value a." Does this mean that we have to rewrite our Tree trait so that it is invariant (as we would have to do if Tree were a mutable data structure like Array)? Not at all.

There is a simple and pretty much automatic way of refactoring the +: method: use the fact that you can declare a type parameter at the method level and constrain it to be a super-type of A. Thus:

def +:[B >: A](a: B): Tree[B] = :+(a)

Since B is now a super-type of A (or, of course, A itself), we declare a (which we could now rename b, of course) as a B and we return a Tree[B]. Problem solved.
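Here is a minimal, self-contained sketch of the technique (Leaf and Branch are illustrative stand-ins, not the classes from my actual tree code), showing that the lower-bounded version compiles and widens the tree's type as needed:

trait Node[+A]
case class Leaf[+A](a: A) extends Node[A]

trait Tree[+A] extends Node[A] {
  def children: Seq[Node[A]]
  def +:[B >: A](b: B): Tree[B]
}

case class Branch[+A](children: Seq[Node[A]]) extends Tree[A] {
  // prepend a new leaf; B is inferred as the least super-type of A and the argument's type
  def +:[B >: A](b: B): Tree[B] = Branch(Leaf(b) +: children)
}

val t: Tree[Int] = Branch(Seq(Leaf(1)))
val t2: Tree[Any] = "hello" +: t // compiles: the result widens to Tree[Any]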

Note also that if the type of a needs to have one or more context bounds you can declare these too, just as you would expect:


  def +:[B >: A : TreeBuilder](b: B): Tree[B] = :+(b)

I should perhaps point out that this problem is not unique to Scala. It will occur (or should occur) in any object-oriented language that allows generic typing (or anywhere the Liskov substitution principle applies). It's only because Java, for example, glosses over the whole concept of variance that we don't find the problem there too.

Next time I will talk about having more than one context bound for a type.

Tuesday, December 6, 2016

Spying -- or a functional way to log in Scala

All of the logging utilities I've found for Scala suffer from the same problem: they are not written in a functional-programming style. Here's the problem: you want to log the result of a function which, to choose an example more or less at random, looks like this:

  def map2[T, U](to1: Option[T], to2: => Option[T])(f: (T, T) => U): Option[U] = for {t1 <- to1; t2 <- to2} yield f(t1, t2)

Now, we decide that we'd like to log the result of this method (or maybe we just want to print it to the console). To do this the "traditional" way, we would need to write something like this:

  def map2[T, U](to1: Option[T], to2: => Option[T])(f: (T, T) => U): Option[U] = {
    val logger = LoggerFactory.getLogger(getClass)
    val r = for {t1 <- to1; t2 <- to2} yield f(t1, t2)
    logger.debug(s"map2: ${r.toString}")
    r
  }

We have interrupted the flow of the expression, we've had to create a new variable (r), we've had to wrap it all in braces. Not cool!

So, I looked around for something that could do this the functional way. Unfortunately, I couldn't find anything. There is something in Scalaz but it looked complicated and maybe a little too general. So I decided (as I often do) to write my own and call it Spy.

One of the design decisions that I had to make was this: I didn't want the Spy to have to be instantiated for every user invocation, or even for every class in which an invocation appears. Yet I wanted each invocation/class to be able to customize the spying behavior somewhat. That's where implicits come (again) to the rescue. The spy-function (the one that actually does something with a String formed from the expression's value) needs to be found implicitly. Its natural type is String=>Unit, but it turned out that the implicits mechanism couldn't deal with something so ordinary, so I changed it to String=>Spy, where the Spy class is essentially just a wrapper with no real significance. Then the implicit value for the spy-function (if any) can be found and used.

One example of a need to customize the implicit value is to forget about logging and simply write to the console. I tried to make this as easy as possible. See the specification in the repo (linked at the bottom) for an example of this.
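For what it's worth, here is a greatly simplified sketch of the idea (the real Spy in LaScala is more capable and, by default, logs via slf4j); it shows how an implicit spy-function lets the call site remain a single expression:

case class Spy()

object Spy {
  // default spy-function: just write to the console
  implicit val defaultSpyFunc: String => Spy = { s => println(s"spy: $s"); Spy() }

  // evaluate x, hand a formatted message to the spy-function, return x unchanged
  def spy[X](message: => String, x: X)(implicit spyFunc: String => Spy): X = {
    spyFunc(s"$message: $x")
    x
  }
}

val sum = Spy.spy("adding", 1 + 2) // prints "spy: adding: 3" and yields 3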

Here, using the default slf4j logging mechanism, is the map2 function with logging. Note that, when you use the default mechanism, you must provide an implicit value of a logger in scope. A convenience method has been provided for this as shown below.

  implicit val logger = Spy.getLogger(getClass)
  def map2[T, U](to1: Option[T], to2: => Option[T])(f: (T, T) => U): Option[U] = Spy.spy(s"map2($to1,$to2)",for {t1 <- to1; t2 <- to2} yield f(t1, t2))

We test it using the following specification for map2 (unchanged):

  "map2(Option)" should "succeed" in {
    val one = Some(1)
    val two = Some(2)
    def sum(x: Int, y: Int) = x + y
    map2(one, two)(sum) should matchPattern { case Some(3) => }
    map2(one, None)(sum) should matchPattern { case None => }
  }

And the resulting entries in the log file are:

2016-12-06 22:35:04,123 DEBUG com.phasmid.laScala.fp.FP$  - spy: map2(Some(1),Some(2)): Some(3)
2016-12-06 22:35:04,128 DEBUG com.phasmid.laScala.fp.FP$  - spy: map2(Some(1),None): None

All we had to do, for the default logging behavior, was to ensure there was an implicit value of logger in scope and wrap the expression inside an invocation of Spy.spy. Everything is still purely functional. You can even leave the spy invocation in place if you really want to, and either switch it off explicitly by adding false as a third parameter or turn off debugging in the logger.

If you're interested in using this you can find the Spy class, together with its SpySpec, in my LaScala project.

Friday, November 4, 2016

A generic tail-recursive method for tree traversal

In an earlier article, Transforming iteration into tail recursion, I talked about the general functional-programming technique of replacing iteration by recursion. That technique is quite simple and fairly obvious for linear collections.

But this week, I was working on the general problem of tree traversal. In a binary tree, for instance, as you recurse through the tree, you need to keep track of the results of two recursive tree traversals at each node. They can't both be tail-recursive! But, fortunately, there is a well-known, more general technique available. The method basically consists, as usual, of creating an "inner" method with an accumulator (r for result) and a list of unprocessed elements (ts). The only extra complication is that when we visit a tree node, we need to add all the children (two in the case of a binary tree) to the unprocessed list at the same time as we process the node itself (if there is indeed anything at the node).

Because trees can take so many forms, I created a very general mechanism to implement this "inner" method, which I call traverse in the companion object of a type-class called Parent.

import scala.annotation.tailrec

/**
  * A trait which expresses parenthood.
  * This defines a type class.
  *
  * @tparam T the underlying type of the Parent
  */
trait Parent[T] {
  /**
    * Get the children of a T.
    *
    * @param t the parent whose children we want
    * @return the children as a Seq of T
    */
  def children(t: T): Seq[T]
}

object Parent {
  /**
    * Tail-recursive method to traverse a tree-like object (defined as something that implements Parent).
    *
    * @param f  the map function which takes a T and produces an R for just that parent object (not its children).
    * @param g  the reduce function which accumulates the result (the accumulator, i.e. r, is passed as its first parameter).
    * @param q  the quick return function which, if q(r) yields true, an immediate return of r is made.
    * @param z  the function which builds the T list (ts) from the existing T list and the given T (parent node).
    * @param ts a list of Ts to be worked on.
    * @param r  the current value of the result, i.e. the "accumulator".
    * @tparam T a type which extends Parent, and thus has children of type T -- this "context bound" is implemented via a compiler-generated implicit parameter of type Parent[T].
    * @tparam R the result type.
    * @return a value of R.
    */
  @tailrec
  final def traverse[T: Parent, R](f: T => R, g: (R, R) => R, q: R => Boolean, z: (List[T], T) => List[T])(ts: List[T], r: R): R =
    if (q(r))
      r
    else
      ts match {
        case Nil => r
        case h :: t => traverse(f, g, q, z)(z(t, h), g(r, f(h)))
      }
}
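As a quick sanity check, here is a tiny, self-contained use of traverse (BTree is just an ad hoc illustration, not part of the real code) which counts the nodes of a small tree:

case class BTree(value: Int, children: List[BTree] = Nil)

implicit val btreeParent: Parent[BTree] = new Parent[BTree] {
  def children(t: BTree): Seq[BTree] = t.children
}

val tree = BTree(1, List(BTree(2), BTree(3, List(BTree(4)))))

val size = Parent.traverse[BTree, Int](
  _ => 1, // f: every node contributes 1
  _ + _, // g: accumulate by addition
  _ => false, // q: never return early
  (ts, t) => ts ++ t.children // z: add the node's children to the work list
)(List(tree), 0)
// size is 4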

In order to use this for something like size, we need a tree structure made up of nodes, of type T, which implement Parent[T]. In the following code, Node[A] is a node in a tree.

/**
  * This trait expresses the notion of a Node from a tree.
  *
  */
sealed trait Node[+A] {
  /**
    * @return If this node contains an actual value x, return Some(x) else None
    */
  def get: Option[A]

  /**
    * Get the children of this Node, if any.
    * @return the children as a Seq of Nodes
    */
  def children: Seq[Node[A]]

  /**
    * @return the size of the subtree
    */
  def size: Int

  /**
    * @return true if node contains value or children; otherwise false
    */
  def isEmpty: Boolean

  /**
    * @param b the value to be searched
    * @tparam B the type of b
    * @return true if value b is found in the sub-tree defined by this
    */
  def includes[B >: A](b: B): Boolean
}

But how can we make Node[A] work as a Parent[Node[A]]? How about using the magic of type classes (see my earlier blog, Type classes, on the subject)? The catch is that a simple type class uses an implicit object, which obviously can't itself have a type parameter.

The solution lies in an implicit method. Normally, implicit conversion methods convert from one type (T1) to another (T2) when the context calls for a T2 and only a T1 is available. However, in this case, the required type is Parent[Node[A]] and there is no T1 at all! But the implicit mechanism also works for a parameterless method: an implicit def with its own type parameter can manufacture the required evidence for any A on demand. That's really cool!

So, here's the code that takes care of the implicit "evidence" for a node to be a parent:

object Node {
  /**
    * Implicit definition of a Parent for the (type-parameterized, i.e. generic) type Node[A]
    *
    * @tparam A the underlying node type
    * @return a Parent of type Node[A]
    */
  implicit def nodeParent[A]: Parent[Node[A]] = new Parent[Node[A]] {
    /**
      * @return the children as a Seq of Nodes
      */
    def children(t: Node[A]): Seq[Node[A]] = t.children
  }
}

And, finally, here is the code for the implementation of size and includes in a trait TreeLike[+A] which extends Node[A]:

  def size: Int = Parent.traverse[Node[A], Int]({
    case Empty => 0
    case _ => 1
  }, _ + _, { _ => false }, _ ++ _.children)(List(this), 0)

  def includes[B >: A](b: B): Boolean = Parent.traverse[Node[B], Boolean]({ n =>
    n.get match {
      case Some(`b`) => true
      case _ => false
    }
  }, _ | _, { x => x }, _ ++ _.children)(List(this), false)

Type classes come to the rescue again and we are able to define a wholly generic tail-recursive method for tree traversal.


Thursday, September 29, 2016

Installation of PredictionIO on MacOS

I'm kicking the tires of PredictionIO, an open-source machine learning server. I'm hoping that this might be a great platform for my BigData/Scala class. It looks really promising.

However, kicking the tires is a lot easier at the car dealership. In order to get something like this running, there are many hurdles to jump. It would no doubt be easier if I were running on Windows, or if I wanted to install to VirtualBox or via Docker or something like that. The manual installation process for the Mac took me several hours to crack, as the error messages do not make it clear what the problem is. It's not unusual for an early-stage (incubating) project to have documentation/installation issues. On the whole, they've done an outstanding job. But, in case anyone else out there is having similar problems, here are my fixes to the installation docs. For the Mac (or Linux), the installation procedure I followed is at Installing Apache PredictionIO (incubating) on Linux / Mac OS X.

PredictionIO is based on Spark, as well as some combination of storage solutions. You can use Postgres, MySQL or a combination of HBase and ElasticSearch. I chose the latter as it appears to be the recommended solution.

After following the instructions, including editing the conf/pio-env.sh file to point SPARK_HOME at my own existing Spark installation:

  SPARK_HOME=/Applications/spark-1.6.1

I started the PIO suite using $PIO_HOME/bin/pio-start-all, which apparently went OK.

However, the first problem I ran into resulted in the following cryptic error message (when checking the status via pio status):

/Applications/spark-1.6.1 is probably an Apache Spark development tree. Please make sure you are using at least 1.3.0.

It turns out that you need a version of Spark pre-built with Hadoop (I had built this one from source). Before realizing that was the problem, I simply downloaded their recommended version into the vendors directory (rather than trying to put it in my own area) and edited the pio-env.sh file:

  SPARK_HOME=$PIO_HOME/vendors/spark-1.5.1-bin-hadoop2.6

Now, I tried it again. Again, getting the status showed a problem:

[ERROR] [Console$] Unable to connect to all storage backends successfully. The following shows the error message from the storage backend.
[ERROR] [Console$] Connection to localhost:5432 refused. Check that the hostname and port are correct and that the postmaster is accepting TCP/IP connections. (org.postgresql.util.PSQLException)

It isn't enough to set up the HBase and ElasticSearch configuration as instructed by the installation guide. You also have to comment out the Postgres and MySQL variables and set up the PIO_STORAGE_REPOSITORIES_xxx variables as follows:

PIO_STORAGE_REPOSITORIES_METADATA_NAME=predictionio_metadata
PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=ELASTICSEARCH

PIO_STORAGE_REPOSITORIES_EVENTDATA_NAME=predictionio_eventdata
PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE=HBASE

PIO_STORAGE_REPOSITORIES_MODELDATA_NAME=pio_
PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=LOCALFS

Then don't forget to configure local storage...

PIO_STORAGE_SOURCES_LOCALFS_TYPE=localfs
PIO_STORAGE_SOURCES_LOCALFS_PATH=$PIO_FS_BASEDIR/models

You also need to configure the PIO_STORAGE_SOURCES_xxx variables but the installation instructions already tell you to do that (I think).

Restarting and checking the status generated this error:

[ERROR] [RecoverableZooKeeper] ZooKeeper exists failed after 1 attempts

This took me a while to figure out. I had actually installed ZooKeeper, although it turns out that I didn't need to -- and in fact it's incorrect to have a separate ZooKeeper running if you're simply running HBase locally. The problem turned out to be that I hadn't properly configured HBase in the vendors/hbase-1.0.0/conf/hbase-site.xml configuration file. Look carefully at the values you find there -- they will not match your setup!

Once I had fixed that problem, and rechecked the status, I saw the following wonderful message:

[INFO] [Console$] Your system is all ready to go.

You can also check that all is well by running jps. You should see something like the following:

41109 Console
41014 Elasticsearch
41079 HMaster
41127 Jps
29769 

If you don't see HMaster included, then HBase is not running correctly.

Now, it's time to clone a template. I'm interested in recommender systems, so I navigated to the relevant set of instructions, which state in section 2 (Create a new Engine from an Engine Template) that you should run the following command:

pio template get PredictionIO/template-scala-parallel-recommendation MyRecommendation

However, that is (currently at least) incorrect. Instead, use the following:

pio template get PredictionIO/template-scala-parallel-universal-recommendation MyUniversalRecommendation

Once I figured out the correct mumbo-jumbo to clone the template, I was able to create my own recommendation engine. Now, I shall (hopefully) enjoy actually doing some implementation!

Stay tuned in case I have more tips.

Update 9/30/2016: there's one more thing you have to do, at least for the universal recommendation engine, and which is not spelled out in the installation instructions. Before you can import the sample data, you must install the Python predictionio package. Do this by entering the following:

pip install predictionio

If that doesn't work for you, then consult this github project: https://github.com/apache/incubator-predictionio-sdk-python

One more problem which arose during the training phase:

[INFO] [Engine$] Data sanity check is on.
Exception in thread "main" java.lang.NumberFormatException: For input string: "558:feed::1"

at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)

The solution this time is something that I never would have imagined in a million years, let alone solved! Here's the link to the issue. But the bottom line is, if you're running on a Mac, change the TCP/IP wifi network settings so that Configure IPv6 is manual, not automatic.

And another problem after deploying and trying to make a query:

[ERROR] [OneForOneStrategy] Cannot support TLS_RSA_WITH_AES_256_CBC_SHA with currently installed providers

Here's a discussion of the issue. The solution is to upgrade your Java encryption policy to unlimited. Follow the instructions in the discussion, and ensure that you put the jars into the lib/security directory under the JRE home (typically $JAVA_HOME/jre).

It turned out, however, that this particular solution was necessary but not sufficient. In the same discussion there's a mention of the SSL problem that I ran into next:

[ERROR] [HttpServerConnection] Aborting encrypted connection to /0:0:0:0:0:0:0:1:52194 due to [SSLException:Unrecognized SSL message, plaintext connection?]

You need to edit the shell script examples/single-query-handmade.sh, which sends queries to the PredictionIO server, as follows (add "k" to the curl options and "s" to the http protocol value):

curl -kH "Content-Type: application/json" -d '
{
    "user": "u1"

}' https://localhost:8000/queries.json

Finally, after all this work, I was able to see the blessedly wonderful response from my recommendation engine:

Recommendations for user: u1


{"itemScores":[{"item":"Galaxy","score":0.8880454897880554},{"item":"Nexus","score":0.24007925391197205},{"item":"Surface","score":0.043848853558301926}]}

Monday, September 19, 2016

Functions vs. methods in Scala

Here's a question that often comes up while teaching Scala: what's the difference between functions and methods? If you look this up on the web, you'll find many different answers, some of which are mathematically grounded but hardly helpful, others of which are just plain confusing.

Here's how I like to explain it: a function is an object, just like any other expression: it has a type (which will be a subtype of FunctionN, where N is the number of parameters the function requires, 0, 1, 2, ..., and the last type parameter is the result type T). But note that the type is not T itself but a function type, which must be invoked to yield a value of T. In Scala, this means that to get an actual T, you need to follow the function with "(...)", where the number of arguments between the parentheses is N.

In Scala (or any other functional language for that matter), we can label expressions and functions by writing a statement like the following:

val a = 42
val f = { x: Int => x*x }

In the first case, our expression is quite simple: the constant 42 ("The answer to life, the universe and everything"). In the second case, it's a bit more complex (it's a function literal or "lambda"). But there's essentially no difference between the intent of the two statements. In each case, we have asserted that, from here on, whenever we write the variable "a", we mean 42, and wherever we write "f", we mean { x: Int => x*x }.

So much for functions. What about methods?

Methods aren't objects like functions. They are properties of a class or object. In an object-oriented language, if they are properties of a class, that implies that they are "instance methods"; if of an object, then they are just plain "methods". A method is really just like a variable that is a property of a class/object, but it has a special syntax for its declaration which is more human-friendly than the syntax used above for the variable "f".

If we wanted to use the same function as a method in an object, then we could write it like this:

def m(x: Int) = x*x

This has exactly the same effect as with "f" above, except this time its label is "m" and it belongs to an object--it doesn't exist beyond the scope of that object. You can think of this form as syntactic sugar for "val m = { x: Int => x*x }".
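Here's a small sketch (the object and method names are mine, purely for illustration) showing both forms side by side, and how a method can be turned into a function value when one is needed:

object FunctionsVsMethods {
  val f = { x: Int => x * x } // a Function1[Int, Int] object
  def m(x: Int): Int = x * x // a method of this object

  val g: Int => Int = m _ // explicit eta-expansion: the method becomes a function value
  val squares = List(1, 2, 3).map(m) // here the compiler eta-expands m for us

  def main(args: Array[String]): Unit = {
    println(f(4)) // 16
    println(g(4)) // 16
    println(squares) // List(1, 4, 9)
  }
}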

So, what about instance methods? Instance methods are special to object-oriented programming because there is an implicit parameter which is generally written as this. It refers to the instance that is the "receiver" (class, owner, etc.) of the method.

Therefore we might define the following class:

class MyClass(x: Int) {
  def sqr = x*x
}

This time, we don't need to provide our own parameter to satisfy the need for "x" in the function (remember that this is essentially syntactic sugar for something that really looks like the definition of f above). The compiler supplies the parameter "this" and so "x" is now syntactic sugar for "this.x".

I do hope that I haven't made things even more confusing! And, I further hope that I haven't said anything that is clearly incorrect. But I don't believe so. However, if you consider yourself a Scala expert or are closely related to Martin Odersky, then feel free to add a clarifying or disputational comment.

Wednesday, September 7, 2016

Working with mutable state

The second of the questions I asked my students (see my earlier article) is essentially this:

In functional programming, how do we work with mutable state? This is a complex question in general as we may want to deal with I/O (i.e. external mutable state) or we may want to wrap the whole problem inside an actor and make it "go away." So, let's confine ourselves to dealing with internal mutable state in a referentially transparent way. We have an object whose internal state is changing in an unpredictable manner and we have another object which operates on that other mutable object. The first question that arises is how to test our object?

Let me propose a concrete example. An evolutionary computation framework uses a random number generator (RNG) to provide the entropy needed to get coverage of a solution space. If you simply create a scala.util.Random and take numbers from it, you will, sooner or later, run into the problem that your unit tests will fail. Not because the logic is incorrect but because, even if the order of running test cases is fixed (it usually is not guaranteed), you will introduce new logic that pulls one more, or one fewer, number from the RNG, and your expected results will be incorrect. Here's the kind of thing I have in mind (greatly simplified):

val random = new scala.util.Random(0L)
val someInt: Int = ???
val x1 = random.nextInt(100)
val x2 = random.nextInt(100)
// ...
val y = random.nextInt(100)
assert(y==someInt)

There's a simple solution for this. Just like a snail, which constructs and carries its home around with it, we can carry our state along with us by transforming the code as follows:

class RNG[+A](f: Long=>A)(seed: Long) {
  private val random = new scala.util.Random(seed)
  private lazy val state = random.nextLong
  def next = new RNG(f)(state)
  def value = f(state)
}
val r = new RNG[Int](x => (x.toInt + 100) % 100)(0L)
val someInt: Int = ???
val r1 = r.next
val r2 = r1.next
// ...
val rN = r2.next
val y = rN.value
assert(y==someInt)

There are many variations on this general theme, of course. But the important point is that we carry around our current RNG from one step to the next, rather than simply taking its value.

Here's another example which helps us strip arguments (from a variable number of args) with the appropriate types. The get method returns a Try of a tuple of a T and the remaining Args:

  import scala.util.{Failure, Success, Try}

  case class Args(args: List[Any]) extends (() => Try[(Any, Args)]) {
    def apply(): Try[(Any, Args)] = args match {
      case Nil => Failure(new Exception("Args is empty: this typically means you didn't provide sufficient arguments"))
      case h :: t => Success((h, Args(t)))
    }
    def get[T](clazz: Class[T]): Try[(T, Args)] = {
      apply() match {
        case Success((r, a)) =>
          if (clazz.isInstance(r) || clazz.isAssignableFrom(r.getClass))
            Success((r.asInstanceOf[T], a))
          else throw new Exception(s"args head is not of type: $clazz but is of type ${r.getClass}")
        case f @ Failure(_) => f.asInstanceOf[Try[(T, Args)]]
      }
    }
    def isEmpty: Boolean = args.isEmpty
  }

And here is how you would extract the arguments (in a FlatSpec with Matchers):

  def mainMethod(args: List[Any]): Unit = {
    val a1 = Args(args)
    val (x: String, a2) = a1.get(classOf[String]).get
    val (y, a3) = a2.get(classOf[java.lang.Boolean]).get
    a3.isEmpty shouldBe true
    x shouldBe "hello"
    y shouldBe true
  }
  mainMethod(List("hello", true))

Note how we should end up with an empty list after stripping off all the arguments. And note also that if we want to add a type annotation (for example for x, as shown above) then we can do that and be quite sure that the get method is working properly. In this example, we didn't try to avoid the exceptions that may be thrown by the get methods on Try objects. We could of course do a nicer job with nested match statements.

On a completely different subject, I asked and answered my own question on StackOverflow today, regarding an issue with serializing/deserializing case class instances via Json.

Sunday, July 17, 2016

Darwin for Scala

Many years ago, I began work on an evolutionary computation (genetic algorithms) framework for Java. It's called Darwin and can be found on SourceForge here. Over the next few years, I completed it, at least as far as basic functionality is concerned. There is a genetics package, an evolution package, and a user-interface, written in Swing.

Yet, there were aspects of it that were not ideal. In particular, I tried to make the types as generic as possible, but Java simply isn't able to cope with that. You end up basically wild-carding everything which means that you're no longer getting any benefit from type-checking.

Now that I've been writing Scala for a few years, I've been thinking about resuscitating Darwin. I began work on the new version of Darwin a couple of months ago. It is somewhat loosely based on the original, but with a better overall architecture.

I need help, so I am asking for contributors. If you think you might be interested in helping, let me know (through GitHub would be the best way). I especially need help on the GitHub end of things: project pages, gh-pages, branching, all that sort of thing. I also need people to write unit tests (i.e. specifications) for me. Or, if you're really brave, you could help with the core library. Thanks.

Wednesday, June 22, 2016

Type classes in Scala

Type classes in Scala are one of those scary concepts to be used only by the cognoscenti, right? Well, no, actually. In reality, they are quite simple.

As usual, Daniel Westheide has some good things to say in his Neophyte's Guide to Scala. Because his explanation is so good, I won't go into as much detail here as I might otherwise. The somewhat confusingly-named Type Class is really just a different sort of trait. If we have a Type Class "C" and a type "T" which is a member of "C", then "T" is required to implement the operations that "C" requires. But unlike the relationship between a trait and its implementations which is based on the polymorphism inherent in object-oriented programming, the relationship between a type class and its members is more ad hoc and, in practice, relies on the "implicits" mechanism of Scala.

The two most commonly encountered type classes in Scala are Numeric and Ordering. Since Numeric extends Ordering, I'll just show a typical usage of Ordering. The SeqLike trait (extended by all the familiar sequence-type collections) defines a sorted method which takes an implicit parameter which is an Ordering. If that parameter is available, it will be used to define the ordering for the sort. If you write the following Scala code, the result will be a properly sorted list:

val a = List(10, 5, 8, 1, 7).sorted

You don't have to provide the ordering parameter: it is defined for you, although you're free to pass an explicit Ordering, of course. This is all part of the magic of implicits, which I don't have space to go into here.
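For instance, a one-liner (the list is arbitrary) that supplies its own Ordering explicitly:

  val descending = List(10, 5, 8, 1, 7).sorted(Ordering[Int].reverse) // List(10, 8, 7, 5, 1)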

So, let's look at a practical example of a type class. I was working on a rule parser. This reads rules (as Strings) in the form of, for instance,

feels cold: t < 0 or t < 5 and w > 10

where, in this case, t is temperature (in Celsius) and w is wind speed in km/h. But you will notice that I haven't yet specified whether the calculations should be performed with Int or Double (or whatever). We might also have rules that are purely textual, perhaps based on dictionary order or something like that. So, it would be inappropriate to parse these rules as, for instance, being based on Double. I therefore set out to parse the rules in String form and then (and only then) convert them into numeric rules.

Then I ran into a snag. The Numeric type class, which extends Ordering, isn't quite powerful enough to do everything I needed. In particular, I needed to be able to take a String and convert it into my generic type. I tried everything I could think of including, obviously, extending Numeric. Finally, I turned to StackOverflow. The answer was so simple: instead of trying to extend the Numeric type class, I should simply create my own. Until that moment, I hadn't really grasped the amazing flexibility of type classes. Unlike in Java, where the various numeric types all extend the abstract class Number (a totally inadequate way of doing things, by the way), in Scala the numeric types such as Int do not have to extend anything. Instead, a programmer can simply create a type class, adding members that implement the operations for any given type, for example, Int or Double. This is the origin of the name "type class".

So, to cut a long story short, I created my own type class Valuable, which you can see in the answer to the StackOverflow question. Actually, I ended up adding a few more operations (why not?) and was able to complete my rule parser.
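To give a flavor of what such a type class looks like, here is a much-reduced sketch (the names and operations are illustrative; the real Valuable in LaScala has rather more to it):

import scala.util.Try

trait Valuable[X] {
  def fromString(s: String): Try[X]
  def plus(x: X, y: X): X
  def zero: X
}

object Valuable {
  implicit object ValuableInt extends Valuable[Int] {
    def fromString(s: String): Try[Int] = Try(s.toInt)
    def plus(x: Int, y: Int): Int = x + y
    def zero: Int = 0
  }
  implicit object ValuableDouble extends Valuable[Double] {
    def fromString(s: String): Try[Double] = Try(s.toDouble)
    def plus(x: Double, y: Double): Double = x + y
    def zero: Double = 0.0
  }
}

// any type X with Valuable evidence can be summed from Strings
def sumStrings[X: Valuable](ss: Seq[String]): Try[X] = {
  val v = implicitly[Valuable[X]]
  ss.foldLeft(Try(v.zero)) { (acc, s) =>
    for (a <- acc; x <- v.fromString(s)) yield v.plus(a, x)
  }
}
// sumStrings[Int](Seq("1", "2", "3")) yields Success(6)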

The direct result of parsing a rule is a RuleLike object. RuleLike is a trait that extends AnyRef and has one method signature:

def asRule: Rule[String]

Rule[T] is a trait which is defined thus:

sealed trait Rule[T] extends (() => Try[Boolean])

and has a transform method (like map but taking two functions) such that we can transform a Rule[String] into, say, a Rule[Int] (with the help of a lookup function to evaluate variables). If you're interested in seeing the whole code base for the rule parser, you can find it here.

Monday, May 16, 2016

Learn Java or Scala first?

There is a common question type on Quora that asks whether you need to know Java in order to learn Scala. My experience tells me that it is not necessary. Indeed, I think it may actually be harmful to learn an imperative language (such as Java) first. Such languages are based on the Von Neumann model of computing (which was fine in its day): load-modify-store. But compute power today is so much greater than anything imaginable in the 1940s that a compiler doesn't need us to tell it how to solve a problem. We really only need to be able to describe our problem.

Rather than repeat my entire argument here, I will simply link to an answer I wrote for Quora on the question Should I need Java...?

Here also are some other Quora discussions relevant to this subject:
The consensus appears to be that it would be perfectly appropriate, and generally beneficial, to learn Scala first, without having to know Java or, indeed, any other language. Learning Scala first will teach you how to think and describe your problem to the compiler. You shouldn't have to be thinking about the way the machine will ultimately implement your application.

Wednesday, May 4, 2016

Transforming iteration into tail recursion

My attention was caught by a question on Quora: What specific Scala coding skills do I need to master in order to get a job as a Scala dev?
 
Since I had given my students at Northeastern University two questions in the final exam that related to this, I thought I would answer the question and write a series of blogs here to give a little more detail. The two questions that I gave them specifically were the first two bullet points in my answer. So let's start the ball rolling by seeing if there is indeed a general way to transform an iteration into a (tail) recursion.

Here's a typical, if trivial, example of the kind of thing that new Scala programmers want to do:

  val data = List(0.1, 0.4, 0.2, 0.7, -0.1, 1.1, 0.5)

  {
    var sum = 0.0
    for (i <- data.indices; if i % 2 == 0) sum = sum + data(i)
    println(s"sum of even positions: $sum")
  }

This works just as you'd expect. The even numbered elements of data are summed. The only fly in the ointment is that darned var sum. Obviously, we do want something to start out as zero and get successively augmented so that it results in the sum of the data values we're interested in. But how can we do that without iterating and updating a variable?

This is where recursion comes to our rescue. Recursion is really the functional-programming counterpart of iteration in imperative programming. But haven't we always been taught to avoid recursion in our programming? Well, yes. But because recursion is so important for functional programming, languages like Scala (and Haskell, etc.) allow a compiler optimization when our process is tail-recursive. What does that mean? Well, if the recursive call happens as the very last thing that the recursive method performs (i.e. at the tail end), there's actually nothing of interest that we need to place on the stack; we would essentially just be putting boilerplate there, nothing that we really need. So the compiler implements tail recursion every bit as efficiently as iteration.

Since the compiler is better than we are at checking whether a method is tail-recursive (well, that's certainly true for beginners), we can assert that our code is tail-recursive using the @tailrec annotation, and the compiler will report an error if we are wrong. Then we can fix it, rather than run the risk of it chewing up the entire stack in a production run.

So, here's our functional-programming style version of the previous code:

  val data = List(0.1, 0.4, 0.2, 0.7, -0.1, 1.1, 0.5)

  {
    import scala.annotation.tailrec
    @tailrec def loop(accumulator: Double, indices: List[Int]): Double = indices match {
      case Nil => accumulator
      case h :: t => loop(accumulator + data(h), t)
    }

    val sum = loop(0.0, (for (i <- data.indices; if i % 2 == 0) yield i).toList)
    println(s"sum of even positions: $sum")
  }

This particular example was so simple that it was easy to transform (refactor) the code. [And, yes, I'm aware that there are even simpler ways to sum these numbers using Scala's collections API.] In particular, we had a generator of indices (the guts of the for-comprehension) that could be invoked ahead of doing the recursion. This won't always be true of course, and in such cases we'll need a third parameter in our loop method. But here loop only needs two things: the list of indices we haven't looked at yet and the accumulated value (accumulator).

It's true that the code is a bit longer than it was. But the big advantage is that our new code is referentially transparent. That's to say we can replace any identifier with its definition and our program will function the same. That's not possible when there are mutable variables about (i.e. vars) because the behavior of the program will depend on exactly when we make the substitution. There are much better discussions of referential transparency out there. My purpose here was to suggest a simple refactoring of a simple var-based iteration.

In the next blog entry, I will talk about dealing with mutable state.