Wednesday, June 22, 2016

Type classes in Scala

Type classes in Scala are one of those scary concepts to be used only by the cognoscenti, right? Well, no, actually. In reality, they are quite simple.

As usual, Daniel Westheide has some good things to say in his Neophyte's Guide to Scala. Because his explanation is so good, I won't go into as much detail here as I might otherwise. The somewhat confusingly-named Type Class is really just a different sort of trait. If we have a Type Class "C" and a type "T" which is a member of "C", then "T" is required to implement the operations that "C" requires. But unlike the relationship between a trait and its implementations which is based on the polymorphism inherent in object-oriented programming, the relationship between a type class and its members is more ad hoc and, in practice, relies on the "implicits" mechanism of Scala.

The two most commonly encountered type classes in Scala are Numeric and Ordering. Since Numeric extends Ordering, I'll just show a typical usage of Ordering. The SeqLike trait (extended by all the familiar sequence-type collections) defines a sorted method which takes an implicit parameter of type Ordering. If such a parameter is available, it will be used to define the ordering for the sort. If you write the following Scala code, the result will be a properly sorted list:

val a = List(10, 5, 8, 1, 7).sorted

You don't have to provide the ordering parameter: it is defined for you, although you're free to pass an explicit orderer of course. This is all part of the magic of implicits, which I don't have space to go into here.
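To see the implicit parameter in action (and how to override it), here's a quick sketch; the reverse ordering is purely illustrative:

```scala
// The implicit Ordering[Int] is supplied automatically; we can also pass our own.
object OrderingDemo {
  val ascending: List[Int] = List(10, 5, 8, 1, 7).sorted
  val descending: List[Int] = List(10, 5, 8, 1, 7).sorted(Ordering[Int].reverse)
}
```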

So, let's look at a practical example of a type class. I was working on a rule parser. This reads rules (as Strings) in the form of, for instance,

feels cold: t < 0 or t < 5 and w > 10

where, in this case, t is temperature (in Celsius) and w is wind speed in km/h. But you will notice that I haven't yet specified whether the calculations should be performed with Int or Double (or whatever). We might also have rules that are purely textual, perhaps based on dictionary order or something like that. So, it would be inappropriate to parse these rules as, for instance, being based on Double. I therefore set out to parse the rules in String form and then (and only then) convert them into numeric rules.

Then I ran into a snag. The Numeric type class, which extends Ordering, isn't quite powerful enough to do everything I needed. In particular, I needed to be able to take a String and convert it into my generic type. I tried everything I could think of including, obviously, extending Numeric. Finally, I turned to StackOverflow. The answer was so simple: instead of trying to extend the Numeric type class, I should simply create my own. Until that moment, I hadn't really grasped the amazing flexibility of type classes. Unlike in Java, where the various numeric types all extend the abstract class Number (a totally inadequate way of doing things, by the way), in Scala the numeric types such as Int do not have to extend anything. Instead, a programmer can simply create a type class, adding members that implement the operations for any given type, for example Int or Double. This is the origin of the name "type class".

So, to cut a long story short, I created my own type class Valuable, which you can see in the answer to the StackOverflow question. Actually, I ended up adding a few more operations (why not?) and was able to complete my rule parser.
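In case you don't want to click through, here's a minimal sketch of what such a type class might look like. To be clear: the member names here (fromString, plus, zero) are illustrative assumptions, not necessarily the actual Valuable API:

```scala
import scala.util.Try

// Illustrative sketch only: the member names (fromString, plus, zero)
// are assumptions, not necessarily the actual Valuable API.
trait Valuable[X] {
  def fromString(s: String): Try[X]
  def plus(a: X, b: X): X
  def zero: X
}

object Valuable {
  implicit object ValuableInt extends Valuable[Int] {
    def fromString(s: String): Try[Int] = Try(s.toInt)
    def plus(a: Int, b: Int): Int = a + b
    def zero: Int = 0
  }
  implicit object ValuableDouble extends Valuable[Double] {
    def fromString(s: String): Try[Double] = Try(s.toDouble)
    def plus(a: Double, b: Double): Double = a + b
    def zero: Double = 0.0
  }
}

object ValuableDemo {
  // Works for any X that is a member of the Valuable type class
  def sumStrings[X: Valuable](ws: Seq[String]): Try[X] = {
    val v = implicitly[Valuable[X]]
    ws.foldLeft(Try(v.zero)) { (acc, w) =>
      for (a <- acc; x <- v.fromString(w)) yield v.plus(a, x)
    }
  }
}
```

Note that Int and Double don't extend anything new: membership in the type class is conferred entirely by the implicit objects.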

The direct result of parsing a rule is a RuleLike object. RuleLike is a trait that extends AnyRef and has one method signature:

def asRule: Rule[String]

Rule[T] is a trait which is defined thus:

sealed trait Rule[T] extends (() => Try[Boolean])

and has a transform method (like map, but taking two functions) such that we can transform a Rule[String] into, say, a Rule[Int] (with the help of a lookup function to evaluate variables). If you're interested in seeing the whole code base for the rule parser, you can find it here.
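To give a flavor of what that looks like, here's a much-simplified sketch (not the actual code base): a Rule consisting of a single comparison, where transform maps variables with one function and constants with another:

```scala
import scala.util.Try

// Much-simplified sketch (not the actual code base): a Rule with one comparison.
sealed trait Rule[T] extends (() => Try[Boolean])

case class Comparison[T: Ordering](lhs: T, rhs: T) extends Rule[T] {
  def apply(): Try[Boolean] = Try(implicitly[Ordering[T]].lt(lhs, rhs))
  // Like map, but with two functions: one for variables, one for constants
  def transform[U: Ordering](f: T => U, g: T => U): Comparison[U] =
    Comparison(f(lhs), g(rhs))
}

object RuleDemo {
  val lookup: Map[String, Int] = Map("t" -> -2)     // evaluates variables
  val stringRule = Comparison("t", "0")             // the parsed rule: t < 0
  // As a String rule, "t" < "0" is false (dictionary order);
  // transformed to Int, -2 < 0 is true
  val intRule: Comparison[Int] = stringRule.transform(lookup.apply, _.toInt)
}
```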

Monday, May 16, 2016

Learn Java or Scala first?

There is a common question type on Quora that asks whether you need to know Java in order to learn Scala. My experience tells me that it is not necessary. Indeed, I think it may actually be harmful to learn an imperative language (such as Java) first. Such languages are based on the Von Neumann model of computing (which was fine in its day): load-modify-store. But compute power today is so much greater than anything imaginable in the 1940s that a compiler doesn't need us to tell it how to solve a problem. We really only need to be able to describe our problem.

Rather than repeat my entire argument here, I will simply link to an answer I wrote for Quora on the question Should I need Java...?

Here also are some other Quora discussions relevant to this subject:
The consensus appears to be that it would be perfectly appropriate, and generally beneficial, to learn Scala first, without having to know Java or, indeed, any other language. Learning Scala first will teach you how to think and describe your problem to the compiler. You shouldn't have to be thinking about the way the machine will ultimately implement your application.

Wednesday, May 4, 2016

Transforming iteration into tail recursion

My attention was caught by a question on Quora: What specific Scala coding skills do I need to master in order to get a job as a Scala dev?
 
Since I had given my students at Northeastern University two questions in the final exam that related to this, I thought I would answer the question and write a series of blogs here to give a little more detail. The two questions that I gave them specifically were the first two bullet points in my answer. So let's start the ball rolling by seeing if there is indeed a general way to transform an iteration into a (tail) recursion.

Here's a typical, if trivial, example of the kind of thing that new Scala programmers want to do:

  val data = List(0.1, 0.4, 0.2, 0.7, -0.1, 1.1, 0.5)

  {
    var sum = 0.0
    for (i <- data.indices; if i % 2 == 0) sum = sum + data(i)
    println(s"sum of even positions: $sum")
  }

This works just as you'd expect. The even-numbered elements of data are summed. The only fly in the ointment is that darned var sum. Obviously, we do want something to start out as zero and get successively augmented so that it results in the sum of the data values we're interested in. But how can we do that without iterating and updating a variable?

This is where recursion comes to our rescue. Recursion is really the functional programming counterpart of iteration for imperative programming. But haven't we always been taught to avoid recursion in our programming? Well, yes. But because recursion is so important for functional programming, languages like Scala (and Haskell, etc.) allow a compiler optimization when our process is tail-recursive. What does that mean? Well, if the recursion happens as the very last thing that the recursive method performs (i.e. at the tail end), there's actually nothing of interest that we need to place on the stack. We would essentially just be putting boiler-plate code on the stack--nothing that we really need.  So, the compiler implements tail-recursion every bit as efficiently as iteration.

Since the compiler is better than we are at checking if a method is tail-recursive (well, that's certainly true for beginners), we can assert that our code is tail-recursive using the @tailrec annotation, and the compiler will warn us if we are wrong. Then we can fix it, rather than run the risk of it chewing up the entire stack in a production run.

So, here's our functional-programming style version of the previous code:

  val data = List(0.1, 0.4, 0.2, 0.7, -0.1, 1.1, 0.5)

  {
    import scala.annotation.tailrec
    @tailrec def loop(accumulator: Double, indices: List[Int]): Double = indices match {
      case Nil => accumulator
      case h :: t => loop(accumulator + data(h), t)
    }

    val sum = loop(0.0, (for (i <- data.indices; if i % 2 == 0) yield i).toList)
    println(s"sum of even positions: $sum")
  }

This particular example was so simple that it was easy to transform (refactor) the code. [And, yes, I'm aware that there are even simpler ways to sum these numbers using Scala's collections API.] In particular, we had a generator of indices (the guts of the for-comprehension) that could be invoked ahead of doing the recursion. This won't always be true of course, and in such cases we'll need a third parameter in our loop method. But here loop only needs two things: the list of indices we haven't looked at yet and the accumulated value (accumulator).
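For the record, here are a couple of those simpler formulations using the collections API:

```scala
object SumDemo {
  val data = List(0.1, 0.4, 0.2, 0.7, -0.1, 1.1, 0.5)
  // Filter the even indices, look up the values, and sum them
  val sum1: Double = data.indices.filter(_ % 2 == 0).map(data).sum
  // Or accumulate with foldLeft, avoiding any var in our own code
  val sum2: Double = data.zipWithIndex.foldLeft(0.0) {
    case (acc, (x, i)) if i % 2 == 0 => acc + x
    case (acc, _) => acc
  }
}
```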

It's true that the code is a bit longer than it was. But the big advantage is that our new code is referentially transparent. That's to say we can replace any identifier with its definition and our program will function the same. That's not possible when there are mutable variables about (i.e. vars) because the behavior of the program will depend on exactly when we make the substitution. There are much better discussions of referential transparency out there. My purpose here was to suggest a simple refactoring of a simple var-based iteration.

In the next blog entry, I will talk about dealing with mutable state.

Wednesday, April 6, 2016

Feeling lazy -- memoization

Scala has a neat feature in the lazy val, which is somewhere between a val and a def. A val is evaluated exactly once, while a def is evaluated any number of times (0..n). A lazy val is evaluated at most once. In that sense, it's like a single-valued cache: no evaluation takes place if it's never needed but it will be evaluated only once if it's needed several times. One particularly useful use-case is when repeatedly referencing the value of a call-by-name method parameter.
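A quick sketch of that at-most-once behavior:

```scala
object LazyDemo {
  var count = 0
  lazy val expensive: Int = { count += 1; 42 }   // evaluated at most once
  def use(): Int = expensive + expensive         // the second reference reuses the value
}
```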

But there's no multi-valued (collection) equivalent. You can create a "view" of a collection which is implemented as an IterableView. This is particularly useful when you want your collection to undergo several transformations (for example, by invoking map). Applying map to such a view essentially composes the existing function-to-be-applied with the new function. When the view is materialized (by calling force or apply) the (composed) function is applied to the member(s) that you need. However, if you apply force (or apply) a second or third time, the composed function will be invoked again. You can, if you like, "memoize" the result of the force or apply method, but you can't memoize them one element at a time.
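A small (hypothetical) sketch illustrates both the laziness and the lack of memoization:

```scala
object ViewDemo {
  var evaluations = 0
  val xs = List(1, 2, 3)
  // map on a view just records the function; nothing is evaluated yet
  val v = xs.view.map { x => evaluations += 1; x * 2 }
  def materialize(): List[Int] = v.toList   // runs the composed function each time
}
```

The first materialization runs the function three times; a second materialization runs it three times again, because nothing was memoized.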

So I recently asked a question about it on StackOverflow just in case there was something I was missing.

Let's have a brief interlude here on the meaning of memoization. A frequent issue when optimizing a program for performance is when a function is called with the same parameter (or parameters) as before. The opportunity to optimize this only occurs, obviously, if the function is referentially transparent (i.e. its context is immutable). Otherwise, we have no choice but to simply go through the evaluation again. But in functional programming in particular we will often have the situation where we can consider remembering the parameters and their result in a lookup table and, if those parameters ever arise again, we pass back the result instead of reevaluating the function. This is called "memoization", an ugly word and one which my spell-checker absolutely refuses to allow me to type without putting up a big fight. Why not just call it a cache? Well, according to Wikipedia, the reason is that it's a special case of caching (although I'm not quite buying the argument). For this reason (and because of spell-check) I'm going to use the word cache going forward.

You'd think that something that obvious would be part of the basic Scala language. But it isn't. What we really want is just a subclass of Function1 where we provide a means of fulfilling (evaluating) the value.

Here's the simplest way to define such a cache:

import scala.collection.mutable

class Cache[K,V](evaluate: K=>V) extends mutable.HashMap[K,V] {
  override def apply(k: K): V = getOrElseUpdate(k, evaluate(k))
}


This works exactly as you would expect. But the class (or its equivalent) doesn't appear in the Scala collections library. And all of the other methods of HashMap are exposed, which doesn't really conform to the contract of a simple memoization class.

The other drawback to this class is that it doesn't deal well with exceptions that might arise in the evaluate function. Basically, they simply bubble up to the top. And, even if they are caught somewhere, they will be thrown again the next time the wrong key is requested.

A somewhat more robust version of Cache, however, can easily be built. This is what I'm using in an (Akka) actor-based project that I'm working on:

import scala.collection.mutable
import scala.util.{Failure, Try}

class Cache[K,V](evaluate: K=>Try[V])(implicit carper: String=>Unit) extends (K=>Option[V]) {
  def apply(k: K): Option[V] = m.getOrElseUpdate(k, fulfill(k))
  private val m = mutable.HashMap[K,Option[V]]()
  private def fulfill(k: K): Option[V] =
    evaluate(k).recoverWith { case x => carper(s"Cache: evaluation exception for key $k: $x"); Failure(x) }.toOption
}


Note that the return type of evaluate has changed: it's now a Try[V] instead of just V. And the apply method now returns an Option[V] instead of simply V. This makes it much more like invoking get on a collection. As a side-effect, when evaluate yields a failure, the cause of the failure is passed to carper, along with an explanatory comment. The failure then results in None being memoized which means that future calls to apply with the same parameter will return a None without causing the exception to recur.
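Here's a sketch of how it might be used (the Cache class is repeated from above so the example stands alone; the carper here just collects warnings in a buffer):

```scala
import scala.collection.mutable
import scala.util.{Failure, Try}

// The Cache class from above, repeated so this sketch stands alone.
class Cache[K, V](evaluate: K => Try[V])(implicit carper: String => Unit) extends (K => Option[V]) {
  def apply(k: K): Option[V] = m.getOrElseUpdate(k, fulfill(k))
  private val m = mutable.HashMap[K, Option[V]]()
  private def fulfill(k: K): Option[V] =
    evaluate(k).recoverWith { case x => carper(s"Cache: evaluation exception for key $k: $x"); Failure(x) }.toOption
}

object CacheDemo {
  val warnings = mutable.ListBuffer[String]()           // the carper just collects messages
  implicit val carper: String => Unit = w => warnings += w
  var evaluations = 0                                   // counts actual evaluations
  val cache = new Cache[String, Int](s => Try { evaluations += 1; s.toInt })
}
```

A repeated lookup of the same key, whether it originally succeeded or failed, doesn't cause a second evaluation.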

Sunday, March 13, 2016

Composing with Try

Because scala.util.Try is a monad, we can compose tries together conjunctively. In the excellent Neophyte's Guide to Scala, Daniel Westheide shows some examples of this. Paraphrasing his for-comprehension exemplar slightly we have:

  for {
      url <- Try(new URL(s))
      connection <- Try(url.openConnection())
      is <- Try(connection.getInputStream)
    } yield Source.fromInputStream(is).getLines()

Each try properly depends on the success of its predecessor. That's to say, this is a sequential conjunction.

But suppose we want to compose tries disjunctively? In other words, success is defined not (as above) by all components succeeding, but by any one of them succeeding. And again, we want our composition to be sequential (the first try is tried first). How should we go about that?

The obvious solution is to use orElse:

  val x1 = "10.0" 
  val x: Try[Any] = Try(x1.toInt) orElse Try(x1.toDouble)

This works just as expected: the value of x is Success(10.0).

Indeed, orElse has the signature:

def orElse[U >: T](default: => Try[U]): Try[U]


Ideally, we'd like another signature something like this:

  def orElseTry[U >: T](alt: => U): Try[U] = orElse(Try(alt))

This compares with orElse as does map to flatMap or recover to recoverWith. And we might invoke it thus:

  def toNumber(x: String) = Try(x.toInt) orElseTry(x.toDouble)

This looks pleasantly elegant and simple--and it differs only from the normal way of doing things by one space. Indeed, I was able to implement it quite easily in a copy of scala.util.Try.scala. However, in their infinite wisdom, the fathers of Scala did not provide such a method (actually, Try derives from the folks at Twitter rather than EPFL).

Unlike with the conjunctive set of Try operations that we started with, the disjunctive set is much more likely to be based on a single input, as in the case of the toNumber example above. I therefore set about creating a class to support multiple successive Try operations:

  case class Trial[V, T](f: V=>Try[T]) extends TrialBase[V,T](f)
  abstract class TrialBase[V, T](f: V=>Try[T]) extends (V=>Try[T]) {
    def apply(v: V): Try[T] = f(v)
    private def orElse(f: V=>Try[T], g: V=>Try[T]): V=>Try[T] = { v => f(v) orElse g(v) }
    def |: (g: V => Try[T]): Trial[V,T] = Trial(orElse(g,f))
    def :| (g: V => Try[T]): Trial[V,T] = Trial(orElse(f,g))
    // Lift (defined elsewhere in the code base) turns a V=>T into a V=>Try[T]
    def ^: (g: V => T): Trial[V,T] = Trial(orElse(Lift(g),f))
    def :^ (g: V => T): Trial[V,T] = Trial(orElse(f,Lift(g)))
  }

This case class called Trial takes a function f of type V=>Try[T] such that its instances are composable with other functions of type V=>Try[T] (using |: or :|) or even V=>T functions (using ^: or :^). These latter functions are "lifted" to V=>Try[T]. Such Trial objects can be composed with other functions both leftwards and rightwards using the |: and ^: operators with the colon on the proper side. And, of course we can even use partial functions to allow us to fork the path of control. Let me provide an example taken from a generic CSV parser that I created:

  val defaultParser = Trial.none[String,Any] :|
    {case s @ (date0(_) | date4(_) | date1(_)) => dateParser(s)} :^
    {case quoted(w) => w} :^
    {case whole(s) => s.toInt} :^
    {case truth(w) => true} :^
    {case untruth(w) => false} :^
    {case s: String => s match { case floating(_) | floating(_,_) | floating(_,_,_) => s.toDouble}} :^
    {case s: String => s}
  val dateFormatStrings = Seq("yyyy-MM-dd", "yyyy-MM-dd-hh:mm:ss.s") // etc.
  val dateParser = Trial[String,Any]((parseDate _)(dateFormatStrings))
  val date0 = """(\d\d\d\d-\d\d-\d\d)""".r
  val date1 = ??? // omitting for brevity
  def parseDate(dfs: Seq[String])(s: String): Try[DateTime] = ??? // omitting for brevity
  val quoted = """"([^"]*)"""".r
  val whole = """(\d+)""".r
  val floating = """-?(\d+(\.\d*)?|\d*\.\d+)([eE][+-]?\d+)?[fFdD]?""".r
  val truth = """(?i)^([ty]|true|yes)$""".r
  val untruth = """(?i)^([fn]|false|no)$""".r



In this fragment, we define a default element parser for our CsvParser object. Trial.none simply starts the chain of functions (we could also put the date parsing function as the argument to Trial). If the input String is recognized as a valid date format (in reality there are many more possible date formats), then we invoke dateParser with the String as its parameter. Otherwise, we try to interpret the String as a quoted string, an integer, a boolean, etc.

If you're interested in looking at the code you can see it here: Trial.scala, while the specification is here: TrialSpec.scala.

Thursday, March 3, 2016

Scala job interview questions

One of my students from last semester asked me what questions about Scala might arise in an upcoming job interview. Of course, there are such lists available already, for example the following:
Naturally, I couldn't resist the idea of writing up some of my own questions/answers:

Question: Name two significant differences between a trait and an abstract class.
Answer: Abstract classes have constructors with zero or more parameters while traits do not; a class can extend any number of traits but only one abstract class; see also To trait or not to trait? for more differences.

Question: What type of construct follows the match keyword?
Answer: A partial function (syntactically, a block of case clauses).

Question: What exactly is wrong with a recursive function that is not tail-recursive?
Answer: You run the risk of running out of stack space and thus causing a StackOverflowError.

Question: Explain the concept and usage of an extractor.
Answer: An extractor is essentially the opposite of a constructor and is defined by the unapply(t: T) method in the companion object of a type T; for a case class, which provides a built-in unapply method, it returns an Option[TupleN] based on the number N of parameters in the constructor (exceptions: if N=0, the returned type is Boolean; if N=1, it is an Option of the single parameter's type); extractors are typically used following the case keyword in match expressions. For more information, see The Neophyte's Guide to Scala Part 1: Extractors.
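For example, a hand-written extractor might look like this (the Email example is purely illustrative):

```scala
// A hand-written extractor: unapply deconstructs what a constructor would build.
object Email {
  def unapply(s: String): Option[(String, String)] = s.split("@") match {
    case Array(user, domain) => Some((user, domain))
    case _ => None
  }
}

object ExtractorDemo {
  def describe(s: String): String = s match {
    case Email(user, domain) => s"$user at $domain"
    case _ => "not an email"
  }
}
```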

Question: What relatively minor difference in the definitions of their respective non-empty-constructors results in the major distinction between a List and a Stream?
Answer: The tail parameter is defined as call-by-name.
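A sketch of what that buys us: because the tail is call-by-name, we can define an infinite Stream and take only what we need (in later Scala versions, Stream has been renamed LazyList, but the principle is the same):

```scala
object StreamDemo {
  // The second argument of #:: is call-by-name, so this infinite definition terminates
  def from(n: Int): Stream[Int] = n #:: from(n + 1)
  // The analogous List definition would recurse forever, since List's tail is strict
  val firstFive: List[Int] = from(0).take(5).toList
}
```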






Wednesday, January 6, 2016

The curious incident of the Scala enum

One of the strangest things about Scala is its concept of enumerations, typically implemented by extending Enumeration, for example:

  object Suit extends Enumeration {
     type Suit = Value
     val Spades, Hearts, Diamonds, Clubs = Value
  }

I won't go into the details of the syntax of this (I don't understand it myself!) but I will observe that this mechanism works reasonably well. The only odd thing here is that if you actually want to refer to Spades, for instance, in your code, you will need to write something like this:

  import Suit._

But you will find that the suit values have an ordering that you would expect without you having to do anything about it, as in this bit of Scalatest code for instance:

   "suits" should "be ordered properly" in {
     val suitList = List(Clubs,Hearts,Spades,Diamonds)
     suitList.sorted shouldBe List(Spades,Hearts,Diamonds,Clubs)
   }

But what if we want to know the color of a suit? How can we add a method to this enumeration (in a manner similar to the way it is done in Java)? It's tricky -- my first (obvious) attempt did not succeed. I am indebted to Aaron Novstrup and StackOverflow for this somewhat elegant solution:

  object Suit extends Enumeration {
    type Suit = Value
    val Spades, Hearts, Diamonds, Clubs = Value
    class SuitValue(suit: Value) {
      def isRed = !isBlack
      def isBlack = suit match {
         case Clubs | Spades => true
         case _              => false
      }
    }
    implicit def value2SuitValue(suit: Value) = new SuitValue(suit)
  } 

There is a competing style of enum in Scala: based on a case object. If you're already familiar with case classes (you should be if you're reading and understanding this), then you will probably know that a case object is just like a case class but has no parameters. One big advantage of using case objects is that matching will result in warnings if we miss a case. It's also obvious how to add further methods such as our color methods. Note the trait Concept. In reality, I have also defined Rank in addition to Suit, and since they have very similar implementations, they both extend Concept.

So, let's implement our suits again:

  trait Concept extends Ordered[Concept] {
    val name: String
    val priority: Int
    override def toString = name
    def compare(that: Concept) = priority-that.priority
  }
  object Concept {
    implicit def ordering[A <: Concept]: Ordering[A] = Ordering.by(_.priority)
  }
  sealed trait Suit extends Concept {
    def isRed = !isBlack
    def isBlack = this match {
      case Spades | Clubs => true
      case _ => false
    }
  }
  case object Spades extends Suit { val name = "Spades"; val priority = 3 }
  case object Hearts extends Suit { val name = "Hearts"; val priority = 2 }
  case object Diamonds extends Suit { val name = "Diamonds"; val priority = 1 }
  case object Clubs extends Suit { val name = "Clubs"; val priority = 0 }

The key to this working correctly (and allowing, for example, List(Spades,Clubs) to be sorted without any fuss) is the ordering definition in Concept. It would be possible for Concept not to extend Ordered but then we would not be able to compare two suits directly using a "<" or similar operator.
If you decide to implement this without trait Concept, defining its methods directly in Suit, you will find that you do not need to define any implicit ordering. However, in order to sort a list of suits, you would have to explicitly specify the type of your list, for example:

  List[Suit](Clubs,Spades)

This is because the inferred element type otherwise appears as Suit with Product, etc. This is the kind of thing that makes working with implicit values so tricky.

If you'd like to see the code for all this, including unit tests, then go to my Scalaprof site on GitHub and drill down to edu.neu.coe.scala.enums.