Friday, November 4, 2016

A generic tail-recursive method for tree traversal

In an earlier article, Transforming iteration into tail recursion, I talked about the general functional-programming technique of replacing iteration by recursion. That method is quite simple and is fairly obvious for linear collections.

But this week, I was working on the general problem of tree traversal. In a binary tree, for instance, as you recurse through the tree, you need to keep track of the results of two recursive tree traversals at each node. They can't both be tail-recursive! But, fortunately, there is a well-known, more general technique available. The method basically consists, as usual, of creating an "inner" method with an accumulator (r for result) and a list of unprocessed elements (ts). The only extra complication is that when we visit a tree node, we need to add all the children (two in the case of a binary tree) to the unprocessed list at the same time as we process the node itself (if there is indeed anything at the node).

Because trees can take so many forms, I created a very general mechanism to implement this "inner" method, which I call traverse in the companion object of a type-class called Parent.

import scala.annotation.tailrec

/**
  * A trait which expresses parenthood.
  * This defines a type class.
  *
  * @tparam T the underlying type of the Parent
  */
trait Parent[T] {
  /**
    * Get the children of a T.
    *
    * @param t the parent whose children we want
    * @return the children as a Seq of T
    */
  def children(t: T): Seq[T]
}

object Parent {
  /**
    * Tail-recursive method to traverse a tree-like object (defined as something that implements Parent).
    *
    * @param f  the map function which takes a T and produces an R for just that parent object (not its children).
    * @param g  the reduce function which accumulates the result (the accumulator, i.e. r, is passed as its first parameter).
    * @param q  the quick return function which, if q(r) yields true, an immediate return of r is made.
    * @param z  the function which builds the T list (ts) from the existing T list and the given T (parent node).
    * @param ts a list of Ts to be worked on.
    * @param r  the current value of the result, i.e. the "accumulator".
    * @tparam T a type which extends Parent, and thus has children of type T -- this "context bound" is implemented via a compiler-generated implicit parameter of type Parent[T].
    * @tparam R the result type.
    * @return a value of R.
    */
  @tailrec
  final def traverse[T: Parent, R](f: T => R, g: (R, R) => R, q: R => Boolean, z: (List[T], T) => List[T])(ts: List[T], r: R): R =
    if (q(r))
      r
    else
      ts match {
        case Nil => r
        case h :: t => traverse(f, g, q, z)(z(t, h), g(r, f(h)))
      }
}

In order to use this for something like size, we need a tree structure made up of nodes, of type T, which implement Parent[T]. In the following code, Node[A] is a node in a tree.

/**
  * This trait expresses the notion of a Node from a tree.
  *
  */
sealed trait Node[+A] {
  /**
    * @return If this node contains an actual value x, return Some(x) else None
    */
  def get: Option[A]

  /**
    * Get the children of this Node, if any.
    * @return the children as a Seq of Nodes
    */
  def children: Seq[Node[A]]

  /**
    * @return the size of the subtree
    */
  def size: Int

  /**
    * @return true if node contains value or children; otherwise false
    */
  def isEmpty: Boolean

  /**
    * @param b the value to be searched
    * @tparam B the type of b
    * @return true if value b is found in the sub-tree defined by this
    */
  def includes[B >: A](b: B): Boolean
}

But, how can we make Node[A] extend Parent[Node[A]]? How about using the magic of type classes (see my earlier blog, Type classes, on the subject)? But a simple type class uses an implicit object which obviously can't itself have a parametric type.

The solution lies in an implicit conversion method. Normally, such methods convert from one type (T1) to another (T2) when the context calls for a T2 and only a T1 is available. However, in this case, T2 is Node[A] but there is no T1! But, the implicit mechanism works also for Unit=>T2. That's really cool!

So, here's the code that takes care of the implicit "evidence" for a node to be a parent:

object Node {
  /**
    * Implicit definition of a Parent for the (type-parameterized, i.e. generic) type Node[A]
    *
    * @tparam A the underlying node type
    * @return a Parent of type Node[A]
    */
  implicit def nodeParent[A]: Parent[Node[A]] = new Parent[Node[A]] {
    /**
      * @return the children as a Seq of Nodes
      */
    def children(t: Node[A]): Seq[Node[A]] = t.children
  }
}

And, finally, here is the code for the implementation of size and includes in a trait TreeLike[+A] which extends Node[A]:

  def size: Int = Parent.traverse[Node[A], Int]({
    case Empty => 0
    case _ => 1
  }, _+_, { x => false}, _ ++ _.children)(List(this), 0)

  def includes[B >: A](b: B): Boolean = Parent.traverse[Node[B], Boolean]({ n => n.get match {
    case Some(`b`) => true
    case _ => false
  }}, _|_, {x => x}, _ ++ _.children)(List(this), false)

Type classes come to the rescue again and we are able to define a wholly generic tail-recursive method for tree traversal.