Spark Machine Learning pipeline works fine in Spark 1.6 but gives an error when executed on Spark 2.x

Question

I wrote some code in Spark 1.6 that was working fine. However, after converting it to Spark 2.0, I get the following error:

<console>:56: error: type mismatch;
 found   : Array[org.apache.spark.ml.feature.QuantileDiscretizer]
 required: Array[org.apache.spark.ml.PipelineStage with org.apache.spark.ml.param.shared.HasOutputCol with org.apache.spark.ml.util.DefaultParamsWritable{def copy(extra: org.apache.spark.ml.param.ParamMap): org.apache.spark.ml.PipelineStage with org.apache.spark.ml.param.shared.HasOutputCol with org.apache.spark.ml.util.DefaultParamsWritable{def copy(extra: org.apache.spark.ml.param.ParamMap): org.apache.spark.ml.PipelineStage with org.apache.spark.ml.param.shared.HasOutputCol with org.apache.spark.ml.util.DefaultParamsWritable}}]
Note: org.apache.spark.ml.feature.QuantileDiscretizer <: org.apache.spark.ml.PipelineStage with org.apache.spark.ml.param.shared.HasOutputCol with org.apache.spark.ml.util.DefaultParamsWritable{def copy(extra: org.apache.spark.ml.param.ParamMap): org.apache.spark.ml.PipelineStage with org.apache.spark.ml.param.shared.HasOutputCol with org.apache.spark.ml.util.DefaultParamsWritable{def copy(extra: org.apache.spark.ml.param.ParamMap): org.apache.spark.ml.PipelineStage with org.apache.spark.ml.param.shared.HasOutputCol with org.apache.spark.ml.util.DefaultParamsWritable}}, but class Array is invariant in type T.
You may wish to investigate a wildcard type such as `_ <: org.apache.spark.ml.PipelineStage with org.apache.spark.ml.param.shared.HasOutputCol with org.apache.spark.ml.util.DefaultParamsWritable{def copy(extra: org.apache.spark.ml.param.ParamMap): org.apache.spark.ml.PipelineStage with org.apache.spark.ml.param.shared.HasOutputCol with org.apache.spark.ml.util.DefaultParamsWritable{def copy(extra: org.apache.spark.ml.param.ParamMap): org.apache.spark.ml.PipelineStage with org.apache.spark.ml.param.shared.HasOutputCol with org.apache.spark.ml.util.DefaultParamsWritable}}`. (SLS 3.2.10)

Shubham · Answer 1 · May 31, 2018

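You need to change the line that builds the stages array. The error message itself points at the cause: Array is invariant in its element type, so the Array[QuantileDiscretizer] is rejected where an array of a different, wider element type is required. The line in question is: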

val pipeline = new Pipeline().setStages(discretizers ++ Array(assembler, selector))
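One way to get past the invariance problem, offered as a minimal sketch and not as the original poster's exact code, is to give every array the element type PipelineStage before concatenating. The definitions of discretizers, assembler and selector are not shown in the question, so the column names, the bucket count and the number of top features below are hypothetical placeholders; the sketch assumes Spark 2.x on the classpath (for example, pasted into spark-shell).

```scala
import org.apache.spark.ml.{Pipeline, PipelineStage}
import org.apache.spark.ml.feature.{ChiSqSelector, QuantileDiscretizer, VectorAssembler}

// Hypothetical input columns; the question does not show the real ones.
val inputCols = Array("age", "income")

// One discretizer per input column (assumed setup, 4 buckets chosen arbitrarily).
val discretizers: Array[QuantileDiscretizer] = inputCols.map { c =>
  new QuantileDiscretizer()
    .setInputCol(c)
    .setOutputCol(s"${c}_bucket")
    .setNumBuckets(4)
}

// Assemble the bucketed columns into a single feature vector.
val assembler = new VectorAssembler()
  .setInputCols(inputCols.map(c => s"${c}_bucket"))
  .setOutputCol("features")

// Assumed feature selector; "label" is a hypothetical label column.
val selector = new ChiSqSelector()
  .setFeaturesCol("features")
  .setLabelCol("label")
  .setOutputCol("selectedFeatures")
  .setNumTopFeatures(1)

// Widen each element to PipelineStage so both arrays share one element type.
val stages: Array[PipelineStage] =
  discretizers.map(d => (d: PipelineStage)) ++ Array[PipelineStage](assembler, selector)

val pipeline = new Pipeline().setStages(stages)
```

Because both operands of ++ are already Array[PipelineStage], the compiler no longer has to infer a compound element type for the second array, which is where the mismatch in the error message comes from.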


