How to import the Spark MLlib dependencies into an Eclipse project?

0 votes

I am new to Apache Spark. I am currently learning machine learning algorithms and want to apply them using Spark MLlib. I am using Eclipse, but I am finding it difficult to run my program there. I also tried downloading the jars and adding them to the build path, but I still could not get it to work.

May 31, 2018 in Apache Spark by hack236
252 views

1 answer to this question.

0 votes

I would recommend creating a Maven project, where you can specify the dependencies in the pom.xml:

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.0.0</version>
</dependency>

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-mllib_2.11</artifactId>
    <version>2.0.0</version>
</dependency>

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.11</artifactId>
    <version>2.0.0</version>
</dependency>

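The first dependency is Spark Core, which provides the core Spark functionality. The second is MLlib, the machine learning library, and the third is Spark SQL. The _2.11 suffix in each artifactId is the Scala version the artifact was built for, so keep that suffix and the Spark version the same across all three.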

You can add more dependencies as your project requires.
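
For context, here is a rough sketch of how a complete pom.xml built around these three dependencies might look. The project coordinates (com.example, spark-mllib-demo), the property names spark.version and scala.binary.version, and the Java 8 compiler settings are placeholders assumed for illustration, so adjust them to your own setup.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <!-- Placeholder coordinates for your own project (assumed, not from Spark) -->
    <groupId>com.example</groupId>
    <artifactId>spark-mllib-demo</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <packaging>jar</packaging>

    <properties>
        <!-- Defined once so all Spark artifacts stay on the same version -->
        <spark.version>2.0.0</spark.version>
        <scala.binary.version>2.11</scala.binary.version>
        <!-- Assumed Java level; change to match your JDK -->
        <maven.compiler.source>1.8</maven.compiler.source>
        <maven.compiler.target>1.8</maven.compiler.target>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    </properties>

    <dependencies>
        <!-- Spark Core -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_${scala.binary.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>

        <!-- Spark MLlib (machine learning) -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-mllib_${scala.binary.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>

        <!-- Spark SQL -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_${scala.binary.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>
    </dependencies>
</project>
```

Assuming the m2e plugin is installed, you should be able to bring this into Eclipse via File > Import > Maven > Existing Maven Projects, and then use Maven > Update Project from the project's context menu after changing the pom.xml so that Eclipse picks up the jars.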

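Alternatively, you can add the jars to the build path directly from your Spark installation. In Spark 2.x they are in the jars directory under the Spark root directory (Spark 1.x used lib).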

answered May 31, 2018 by Shubham
• 13,290 points
