The problem is with Stream and other non-strict collections such as Iterator or SeqView. These are inherently unserializable (except for empty collections). I'm sure there are several posts about this on Stack Overflow, although I couldn't find one when I searched just now. But, I hear you say, these are never the kind of thing you would want to store as a (non-transient) field in a class.
That's true, of course, but you might include one inadvertently, as I did recently. Let's say you have a case class like this:
case class Person(name: String, friends: Seq[String])
You create instances of this class by running a database query on the person and getting their friends as a collection of strings. Something like this:
val x = sqlContext.sql("select ...").collect
val p = Person("Ronaldo", x.toSeq)
The problem here is that if the type of x is Stream[String], then x.toSeq will not change its type, since Stream already extends Seq: the call simply hands back the same lazy Stream. You need to write x.toList (or x.toVector) instead to get a strict collection.
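To see this for yourself, here is a minimal sketch. The object name ToSeqDemo, the helper isStream and the sample names are made up for illustration; the Stream here just stands in for whatever the query returned. Note that on Scala 2.13 Stream is deprecated in favour of LazyList, so this compiles with a deprecation warning there.

```scala
object ToSeqDemo {
  // Hypothetical helper: true when the sequence is still a lazy Stream.
  def isStream(xs: Seq[_]): Boolean = xs.isInstanceOf[Stream[_]]

  def main(args: Array[String]): Unit = {
    // Stand-in for the query result in the example above.
    val x: Stream[String] = Stream("Messi", "Neymar")

    println(isStream(x.toSeq))    // true: toSeq hands back the same Stream
    println(isStream(x.toList))   // false: toList builds a strict List
    println(isStream(x.toVector)) // false: toVector builds a strict Vector
  }
}
```

So the static type Seq[String] tells you nothing about whether the value underneath is strict, which is exactly how a Stream can slip into a field unnoticed.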
Actually, it's much better practice never to use Seq for persistable/serializable objects. Far better to force an explicit collection type in the definition of the case class. In particular, you should use either List or Vector, unless you have some specific reason to use a different sequence. In this case, that would be:
case class Person(name: String, friends: List[String])
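With the field declared as List[String], passing x.toSeq no longer compiles, so the compiler catches the mistake for you. If you still want callers to be able to hand in any Seq, one option is a small factory on the companion object that forces the conversion. This is only a sketch: the method name fromSeq is my own invention, not anything standard.

```scala
case class Person(name: String, friends: List[String])

object Person {
  // Hypothetical convenience factory: accepts any Seq (including a lazy
  // Stream) and forces it into a strict List before storing it.
  def fromSeq(name: String, friends: Seq[String]): Person =
    Person(name, friends.toList)
}
```

Then Person.fromSeq("Ronaldo", x) works whether x is a Stream, a Vector or anything else that is a Seq, and the stored field is always a List.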