Building Yarn and Hive on Spark - Edureka Blog


In this blog, let us see how to build Spark for a specific Hadoop version.

We will also learn how to build Spark with Hive and YARN.

This walkthrough assumes that Hadoop, the JDK, Maven (mvn) and git are already installed and configured on your system.
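Before starting, it may help to confirm that the four tools are actually on your PATH. The following is a minimal sketch, not part of the original steps: the function name check_prereqs is hypothetical, and it assumes the JDK is exposed as the java command.

```shell
#!/bin/sh
# Report which of the given commands are not on PATH.
# Returns 0 when all are found, 1 otherwise.
# check_prereqs is a hypothetical helper name used only for this sketch.
check_prereqs() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# The four tools this walkthrough assumes are installed.
check_prereqs hadoop java mvn git || echo "Install the missing tools before building."
```

This only checks that each command can be found; it does not check versions or configuration.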

Open a browser (for example, Mozilla Firefox) and download Spark 1.1.1 using the link below.

https://edureka.wistia.com/medias/k14eamzaza/

Open terminal.

Command: tar -xvf Downloads/spark-1.1.1.tgz

Command: ls

Open spark-1.1.1 directory.

You can open the pom.xml file to see all the dependencies the build needs.

Do not edit it; changing it can break the build.

Command: cd spark-1.1.1/

Command: sudo gedit sbt/sbt-launch-lib.bash

Edit the file as shown in the snapshot below, then save and close it.

We reduce the memory settings to avoid the object heap space error shown in the snapshot below.

Now, run the command below in the terminal to build Spark for Hadoop 2.2.0 with Hive and YARN.

Command: ./sbt/sbt -Pyarn -Phive -Phadoop-2.2 -Dhadoop.version=2.2.0 -DskipTests assembly

Note: My Hadoop version is 2.2.0. Change the version in the command to match your own Hadoop version.

For other Hadoop versions, use these options in place of -Phadoop-2.2 -Dhadoop.version=2.2.0:

# Apache Hadoop 2.0.5-alpha

-Dhadoop.version=2.0.5-alpha

# Cloudera CDH 4.2.0

-Dhadoop.version=2.0.0-cdh4.2.0

# Apache Hadoop 0.23.x

-Phadoop-0.23 -Dhadoop.version=0.23.7

# Apache Hadoop 2.3.X

-Phadoop-2.3 -Dhadoop.version=2.3.0

# Apache Hadoop 2.4.X

-Phadoop-2.4 -Dhadoop.version=2.4.0
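The list above can be sketched as a small helper that prints the Hadoop options for a given version. This is only an illustration: the function name spark_build_flags is hypothetical, and any version the list does not cover falls back to the version property alone, which is an assumption and may not suit every release.

```shell
#!/bin/sh
# Print the Hadoop build options for a given Hadoop version,
# following the version list in this post.
# spark_build_flags is a hypothetical helper name used only for this sketch.
spark_build_flags() {
    version="$1"
    case "$version" in
        0.23.*) profile="-Phadoop-0.23" ;;
        2.2.*)  profile="-Phadoop-2.2" ;;
        2.3.*)  profile="-Phadoop-2.3" ;;
        2.4.*)  profile="-Phadoop-2.4" ;;
        *)      profile="" ;;   # assumption: no profile, version property only
    esac
    # ${profile:+$profile } adds the profile and a space only when one is set.
    printf '%s\n' "${profile:+$profile }-Dhadoop.version=$version"
}

spark_build_flags 2.4.0
```

The printed options would then go into the build command alongside -Pyarn -Phive -DskipTests assembly.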

Compiling and packaging will take some time; please wait until it completes.

Two jars, spark-assembly-1.1.1-hadoop2.2.0.jar and spark-examples-1.1.1-hadoop2.2.0.jar, are created.

Path of spark-assembly-1.1.1-hadoop2.2.0.jar : /home/edureka/spark-1.1.1/assembly/target/scala-2.10/spark-assembly-1.1.1-hadoop2.2.0.jar

Path of spark-examples-1.1.1-hadoop2.2.0.jar : /home/edureka/spark-1.1.1/examples/target/scala-2.10/spark-examples-1.1.1-hadoop2.2.0.jar
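To confirm that both jars were produced, a quick check along these lines may be useful. It is a sketch under assumptions: the function name check_jars is hypothetical, and the jar names and the /home/edureka/spark-1.1.1 directory are taken from the paths above, so adjust them if your user name or Hadoop version differs.

```shell
#!/bin/sh
# Check that each expected jar exists under the Spark source directory.
# Prints one line per jar and returns 1 if any is absent.
# check_jars is a hypothetical helper name used only for this sketch.
check_jars() {
    spark_home="$1"
    status=0
    for jar in \
        assembly/target/scala-2.10/spark-assembly-1.1.1-hadoop2.2.0.jar \
        examples/target/scala-2.10/spark-examples-1.1.1-hadoop2.2.0.jar
    do
        if [ -f "$spark_home/$jar" ]; then
            echo "ok:      $jar"
        else
            echo "missing: $jar"
            status=1
        fi
    done
    return "$status"
}

check_jars /home/edureka/spark-1.1.1 || echo "Build output not found; re-check the sbt log."
```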

Congratulations, you have successfully built Spark with Hive and YARN.
