How to import the dependencies of Spark MLlib into eclipse project?

0 votes

I am new to Apache Spark. I am currently learning machine learning algorithms and I want to apply them using Spark MLlib. I am using Eclipse and I am finding it difficult to execute my program there. I also tried downloading the jars and adding them to the build path, but it still looks difficult to me.

May 31, 2018 in Apache Spark by hack236
402 views

1 answer to this question.

0 votes

I would recommend you create and build a Maven project, where you can specify the dependencies in your pom.xml:

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.0.0</version>
</dependency>

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-mllib_2.11</artifactId>
    <version>2.0.0</version>
</dependency>

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.11</artifactId>
    <version>2.0.0</version>
</dependency>

These are the dependencies you need: the first, spark-core, provides the core Spark functionality; the second, spark-mllib, provides the machine learning dependencies; and the third, spark-sql, provides the Spark SQL dependencies.

You can go ahead and add more dependencies according to your requirement.
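For reference, a minimal pom.xml wrapping these dependencies could look like the sketch below. The groupId, artifactId, and project version here are placeholders for your own project; adjust the Spark version and Scala suffix (_2.11) to match your setup:

```xml
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
                             http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <!-- Placeholder coordinates; replace with your own project's -->
    <groupId>com.example</groupId>
    <artifactId>spark-mllib-demo</artifactId>
    <version>1.0-SNAPSHOT</version>

    <dependencies>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.11</artifactId>
            <version>2.0.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-mllib_2.11</artifactId>
            <version>2.0.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_2.11</artifactId>
            <version>2.0.0</version>
        </dependency>
    </dependencies>
</project>
```

Once you import this via File > Import > Maven > Existing Maven Projects, Eclipse (with the m2e plugin) downloads the jars for you, so there is no need to add them to the build path by hand.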

Alternatively, you can add the jars from the jars directory (called lib in Spark 1.x) under the Spark root directory to the Eclipse build path, but the Maven approach is easier to maintain.

answered May 31, 2018 by Shubham
• 13,370 points
