Migrating from Spark 1.6 to Spark 2.0
Introduction
Spark 2.0 has been released and contains many enhancements and new features. If you are using Spark 1.6 and want to upgrade your application to Spark 2.0, you have to take some API changes into account. Below are some of the code changes that need to be made.
Update build.sbt file
Update build.sbt with:
scalaVersion := "2.11.8" // Make sure you have Scala 2.11 installed
sparkVersion := "2.0.0"  // Make sure you have Spark 2.0 installed
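Note that sparkVersion is not a built-in sbt key; it comes from the sbt-spark-package plugin. If you are not using that plugin, declare the Spark artifacts directly as library dependencies instead. A minimal sketch (the project name is a placeholder; include only the Spark modules your application actually uses):
name := "my-spark-app" // placeholder name
scalaVersion := "2.11.8"
// Spark 2.0.0 artifacts are published for Scala 2.11
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"  % "2.0.0" % "provided",
  "org.apache.spark" %% "spark-sql"   % "2.0.0" % "provided",
  "org.apache.spark" %% "spark-mllib" % "2.0.0" % "provided"
)
The "provided" scope keeps the Spark jars out of your application jar, since spark-submit supplies them on the cluster at runtime.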
Note that when compiling with sbt package, the .jar will now be created in target/scala-2.11/, and the .jar name will also change, so the spark-submit command needs to be updated as well.
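For example, assuming a hypothetical application jar, the invocation would change roughly as follows (class and jar names are illustrative):
# Spark 1.6 build (Scala 2.10)
spark-submit --class com.example.Main target/scala-2.10/my-spark-app_2.10-1.0.jar
# Spark 2.0 build (Scala 2.11)
spark-submit --class com.example.Main target/scala-2.11/my-spark-app_2.11-1.0.jar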
Update ML Vector libraries
ML Transformers now generate org.apache.spark.ml.linalg.VectorUDT instead of org.apache.spark.mllib.linalg.VectorUDT. They are also mapped locally to subclasses of org.apache.spark.ml.linalg.Vector. These are not compatible with the old MLlib API, which is moving toward deprecation in Spark 2.0.0.
//import org.apache.spark.mllib.linalg.{Vector, Vectors} // Deprecated in Spark 2.0
import org.apache.spark.ml.linalg.{Vector, Vectors} // Use these instead
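The new package exposes the same factory methods as the old one, so most call sites only need the import changed. A small sketch (values are illustrative):
val dense: Vector  = Vectors.dense(1.0, 0.0, 3.0)                    // dense vector, as before
val sparse: Vector = Vectors.sparse(3, Array(0, 2), Array(1.0, 3.0)) // size, indices, values
If you still need to interoperate with code that uses the old MLlib vectors, Spark 2.0 also adds an asML method on the org.apache.spark.mllib.linalg vector classes to convert them to the new type.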