Copy file from local to hdfs from the spark job in yarn mode

0 votes

How can I copy the file from local to hdfs from the spark job in yarn mode? Means, hdfs dfs -put command equivalent the the spark. Because I have a file in local i need to preprocess it the need to put the file in hdfs and then apply the transformation logic. Everything i need to do in one job.

Jul 24 in Apache Spark by Rishi
59 views

1 answer to this question.

0 votes

Refer to the below code:

import org.apache.hadoop.conf.Configuration

import org.apache.hadoop.fs.FileSystem

import org.apache.hadoop.fs.Path


val hadoopConf = new Configuration()

val hdfs = FileSystem.get(hadoopConf)


val srcPath = new Path("/home/edureka/Documents/data")

val destPath = new Path("hdfs:///tranferrred_data")


hdfs.copyFromLocalFile(srcPath, destPath)

Any Spark Job that you are executing, you might want to include the above code snippet according to your requirement use spark-submit to deploy your code in the cluster. Every time you deploy your spark application, the data in your local gets transferred to the hdfs and then you can perform your transformations accordingly.

You might use the below dependencies:

libraryDependencies += "org.apache.hadoop" % "hadoop-common" % "2.6.0"

libraryDependencies += "org.apache.commons" % "commons-io" % "1.3.2"

libraryDependencies += "org.apache.hadoop" % "hadoop-hdfs" % "2.6.0"
answered Jul 24 by Yogi

Related Questions In Apache Spark

0 votes
1 answer

Efficient way to read specific columns from parquet file in spark

As parquet is a column based storage ...READ MORE

answered Apr 20, 2018 in Apache Spark by kurt_cobain
• 9,240 points
1,222 views
0 votes
1 answer

How to save and retrieve the Spark RDD from HDFS?

You can save the RDD using saveAsObjectFile and saveAsTextFile method. ...READ MORE

answered May 29, 2018 in Apache Spark by Shubham
• 13,290 points
1,993 views
0 votes
1 answer

Copy all files from local (Windows) to HDFS with Scala code

Please try the following Scala code: import org.apache.hadoop.conf.Configuration import ...READ MORE

answered May 22 in Apache Spark by Karan
168 views
0 votes
1 answer

How to read a data from text file in Spark?

Hey, You can try this: from pyspark import SparkContext SparkContext.stop(sc) sc ...READ MORE

answered Aug 6 in Apache Spark by Gitika
• 25,340 points
89 views
0 votes
1 answer

Hadoop Mapreduce word count Program

Firstly you need to understand the concept ...READ MORE

answered Mar 16, 2018 in Data Analytics by nitinrawat895
• 10,670 points
2,698 views
0 votes
1 answer

hadoop.mapred vs hadoop.mapreduce?

org.apache.hadoop.mapred is the Old API  org.apache.hadoop.mapreduce is the ...READ MORE

answered Mar 16, 2018 in Data Analytics by nitinrawat895
• 10,670 points
284 views
0 votes
10 answers

hadoop fs -put command?

put syntax: put <localSrc> <dest> copy syntax: copyFr ...READ MORE

answered Dec 7, 2018 in Big Data Hadoop by Aditya
13,417 views
0 votes
1 answer

Hadoop dfs -ls command?

In your case there is no difference ...READ MORE

answered Mar 16, 2018 in Big Data Hadoop by kurt_cobain
• 9,240 points
983 views
0 votes
1 answer

How can I write a text file in HDFS not from an RDD, in Spark program?

Yes, you can go ahead and write ...READ MORE

answered May 29, 2018 in Apache Spark by Shubham
• 13,290 points
1,185 views
0 votes
4 answers

How to change the spark Session configuration in Pyspark?

You can dynamically load properties. First create ...READ MORE

answered Dec 10, 2018 in Apache Spark by Vini
13,402 views