Spark Machine Learning pipeline works fine in Spark 1.6 but gives an error when executed on Spark 2.x

0 votes

I have written code in Spark 1.6 which was working fine. However, when I converted it to Spark 2.0, I am getting the following error:

 <console>:56: error: type mismatch;
 found   : Array[org.apache.spark.ml.feature.QuantileDiscretizer]
 required: Array[org.apache.spark.ml.PipelineStage with org.apache.spark.ml.param.shared.HasOutputCol with org.apache.spark.ml.util.DefaultParamsWritable{def copy(extra: org.apache.spark.ml.param.ParamMap): org.apache.spark.ml.PipelineStage with org.apache.spark.ml.param.shared.HasOutputCol with org.apache.spark.ml.util.DefaultParamsWritable{def copy(extra: org.apache.spark.ml.param.ParamMap): org.apache.spark.ml.PipelineStage with org.apache.spark.ml.param.shared.HasOutputCol with org.apache.spark.ml.util.DefaultParamsWritable}}]
Note: org.apache.spark.ml.feature.QuantileDiscretizer <: org.apache.spark.ml.PipelineStage with org.apache.spark.ml.param.shared.HasOutputCol with org.apache.spark.ml.util.DefaultParamsWritable{def copy(extra: org.apache.spark.ml.param.ParamMap): org.apache.spark.ml.PipelineStage with org.apache.spark.ml.param.shared.HasOutputCol with org.apache.spark.ml.util.DefaultParamsWritable{def copy(extra: org.apache.spark.ml.param.ParamMap): org.apache.spark.ml.PipelineStage with org.apache.spark.ml.param.shared.HasOutputCol with org.apache.spark.ml.util.DefaultParamsWritable}}, but class Array is invariant in type T.
You may wish to investigate a wildcard type such as `_ <: org.apache.spark.ml.PipelineStage with org.apache.spark.ml.param.shared.HasOutputCol with org.apache.spark.ml.util.DefaultParamsWritable{def copy(extra: org.apache.spark.ml.param.ParamMap): org.apache.spark.ml.PipelineStage with org.apache.spark.ml.param.shared.HasOutputCol with org.apache.spark.ml.util.DefaultParamsWritable{def copy(extra: org.apache.spark.ml.param.ParamMap): org.apache.spark.ml.PipelineStage with org.apache.spark.ml.param.shared.HasOutputCol with org.apache.spark.ml.util.DefaultParamsWritable}}`. (SLS 3.2.10)
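
The key part is the compiler's note at the end: class Array is invariant in type T. Even though QuantileDiscretizer is a PipelineStage, an Array[QuantileDiscretizer] is not an Array[PipelineStage]. A minimal standalone illustration of the invariance issue (not the asker's code; the value names here are made up):

import org.apache.spark.ml.PipelineStage
import org.apache.spark.ml.feature.QuantileDiscretizer

val ds: Array[QuantileDiscretizer] = Array(new QuantileDiscretizer())
// val bad: Array[PipelineStage] = ds                    // type mismatch: Array is invariant
val ok: Array[PipelineStage] = ds.map(d => d: PipelineStage) // widen each element explicitly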
May 31, 2018 in Apache Spark by hack236
968 views

1 answer to this question.

0 votes

You need to change the line that builds the pipeline to the following:

val pipeline = new Pipeline().setStages(discretizers ++ Array(assembler, selector))
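
For context, here is a minimal sketch of a pipeline built that way. The column names, bucket count, and label column are assumptions for illustration; the point is that annotating the discretizer array as Array[PipelineStage] keeps Scala from inferring the structural type shown in the error:

import org.apache.spark.ml.{Pipeline, PipelineStage}
import org.apache.spark.ml.feature.{ChiSqSelector, QuantileDiscretizer, VectorAssembler}

// Hypothetical input columns; substitute your own.
val inputCols = Array("f1", "f2", "f3")

// One discretizer per column. The explicit Array[PipelineStage] annotation
// widens the element type up front, so no invariance error can occur later.
val discretizers: Array[PipelineStage] = inputCols.map { c =>
  new QuantileDiscretizer()
    .setInputCol(c)
    .setOutputCol(s"${c}_binned")
    .setNumBuckets(10): PipelineStage
}

val assembler = new VectorAssembler()
  .setInputCols(inputCols.map(c => s"${c}_binned"))
  .setOutputCol("features")

val selector = new ChiSqSelector()
  .setFeaturesCol("features")
  .setLabelCol("label")
  .setOutputCol("selectedFeatures")

// discretizers ++ Array(assembler, selector) now yields an
// Array[PipelineStage], which setStages accepts in Spark 2.x.
val pipeline = new Pipeline().setStages(discretizers ++ Array(assembler, selector))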


answered May 31, 2018 by Shubham
• 13,490 points
