How to import the dependencies of Spark MLlib into eclipse project

0 votes

I am new to Apache Spark. Currently, I am learning machine learning algorithms and I want to apply those algorithms using Spark MLlib. I am using eclipse and I am finding it difficult to execute my program in eclipse. I also tried downloading the jars and adding it to the build path, but still, it looks difficult to me.

May 31, 2018 in Apache Spark by hack236
980 views

1 answer to this question.

0 votes

I would recommend you create & build a maven project. Where you can specify the dependencies.

<dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.11</artifactId>
        <version>2.0.0</version>
    </dependency>

    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-mllib_2.11</artifactId>
        <version>2.0.0</version>
    </dependency>

    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.11</artifactId>
        <version>2.0.0</version>
</dependency>

These are the dependencies, where first is your spark core, which provides the core spark dependencies. Then, second is your machine learning dependencies & third is your spark sql dependencies.

You can go ahead and add more dependencies according to your requirement.

You can also choose jars from the lib directory present in the Spark root directory.

answered May 31, 2018 by Shubham
• 13,480 points

Related Questions In Apache Spark

+1 vote
8 answers

How to print the contents of RDD in Apache Spark?

Save it to a text file: line.saveAsTextFile("alicia.txt") Print contains ...READ MORE

answered Dec 10, 2018 in Apache Spark by Akshay
43,384 views
0 votes
1 answer

How to change the location of Spark event logs?

You can change the location where you ...READ MORE

answered Mar 6, 2019 in Apache Spark by Rohit
1,833 views
0 votes
1 answer

How to get the number of elements in partition?

rdd.mapPartitions(iter => Array(iter.size).iterator, true) This command will ...READ MORE

answered May 8, 2018 in Apache Spark by kurt_cobain
• 9,390 points
764 views
0 votes
1 answer
0 votes
1 answer

Difference between Spark ML & Spark MLlib package

org.apache.spark.mllib is the old Spark API while ...READ MORE

answered Jul 5, 2018 in Apache Spark by Shubham
• 13,480 points
937 views
+1 vote
2 answers
0 votes
1 answer

Is it possible to run Apache Spark without Hadoop?

Though Spark and Hadoop were the frameworks designed ...READ MORE

answered May 2, 2019 in Big Data Hadoop by ravikiran
• 4,620 points
360 views
0 votes
1 answer
0 votes
5 answers

How to change the spark Session configuration in Pyspark?

You aren't actually overwriting anything with this ...READ MORE

answered Dec 13, 2020 in Apache Spark by Gitika
• 65,870 points
55,268 views