How to import the Spark MLlib dependencies into an Eclipse project?

0 votes

I am new to Apache Spark. I am currently learning machine learning algorithms and want to apply them using Spark MLlib. I am using Eclipse and am having trouble running my program there. I also tried downloading the JARs and adding them to the build path, but I still find it difficult.

May 31, 2018 in Apache Spark by hack236

1 answer to this question.

0 votes

I would recommend creating a Maven project, where you can declare the dependencies in pom.xml and let Maven download them:

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.0.0</version>
</dependency>

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-mllib_2.11</artifactId>
    <version>2.0.0</version>
</dependency>

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.11</artifactId>
    <version>2.0.0</version>
</dependency>

The first dependency is Spark Core, which provides the base Spark APIs. The second is MLlib, the machine learning library, and the third is Spark SQL. The _2.11 suffix in each artifactId is the Scala version the artifact was built for, so it should be the same for all three.

You can add more dependencies as your project requires.
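For context, here is a minimal sketch of where those entries sit in a complete pom.xml. The project coordinates (com.example, spark-mllib-demo), the Java 8 compiler settings, and the two version properties are assumptions made for illustration and are not part of the original answer; adjust them to your setup.

```xml
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <!-- Hypothetical project coordinates; replace with your own -->
  <groupId>com.example</groupId>
  <artifactId>spark-mllib-demo</artifactId>
  <version>1.0-SNAPSHOT</version>

  <properties>
    <!-- Assumption: the project is compiled with Java 8 -->
    <maven.compiler.source>1.8</maven.compiler.source>
    <maven.compiler.target>1.8</maven.compiler.target>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <!-- Defined once so all Spark artifacts share the same versions -->
    <scala.binary.version>2.11</scala.binary.version>
    <spark.version>2.0.0</spark.version>
  </properties>

  <dependencies>
    <!-- Spark Core -->
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_${scala.binary.version}</artifactId>
      <version>${spark.version}</version>
    </dependency>
    <!-- Spark MLlib -->
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-mllib_${scala.binary.version}</artifactId>
      <version>${spark.version}</version>
    </dependency>
    <!-- Spark SQL -->
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-sql_${scala.binary.version}</artifactId>
      <version>${spark.version}</version>
    </dependency>
  </dependencies>
</project>
```

If the m2e plugin is installed in Eclipse, importing the project through File > Import > Maven > Existing Maven Projects should put these JARs on the build path without any manual steps.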

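Alternatively, you can add the JARs that ship with Spark to the Eclipse build path manually. In Spark 2.x they are in the jars directory under the Spark root directory (in Spark 1.x the directory was called lib).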

answered May 31, 2018 by Shubham
