Copy all files from local (Windows) to HDFS with Scala code

0 votes
I have requirement copying the files from local machine to Hadoop environment with Scala programming.

1. We have to copy all the files from the share point folder to the local machine. (This one I am able to copy from share folder to location machine)

2. Once files copied to a local machine then I have to move all files from local machine (windows) to HDFS location with scala code.

Can you please help me with code how to copy/move all the files from the local system to HDFS path (2nd point) with Scala programming.
May 22 in Apache Spark by Rishab
24 views

1 answer to this question.

Your answer

Your name to display (optional):
Privacy: Your email address will only be used for sending these notifications.
0 votes

Please try the following Scala code:

import org.apache.hadoop.conf.Configuration

import org.apache.hadoop.fs.FileSystem

import org.apache.hadoop.fs.Path


val hadoopConf = new Configuration()

val hdfs = FileSystem.get(hadoopConf)


val srcPath = new Path(srcFilePath)

val destPath = new Path(destFilePath)


hdfs.copyFromLocalFile(srcPath, destPath)

You should also check if Spark has the HADOOP_CONF_DIR variable set in the conf/spark-env.sh file. This will make sure that Spark is going to find the Hadoop configuration settings.

The dependencies for the build.sbt file:

libraryDependencies += "org.apache.hadoop" % "hadoop-common" % "2.6.0"

libraryDependencies += "org.apache.commons" % "commons-io" % "1.3.2"

libraryDependencies += "org.apache.hadoop" % "hadoop-hdfs" % "2.6.0"
answered May 22 by Karan

Related Questions In Apache Spark

0 votes
1 answer

How to save and retrieve the Spark RDD from HDFS?

You can save the RDD using saveAsObjectFile and saveAsTextFile method. ...READ MORE

answered May 29, 2018 in Apache Spark by Shubham
• 12,890 points
1,274 views
0 votes
1 answer

Efficient way to read specific columns from parquet file in spark

As parquet is a column based storage ...READ MORE

answered Apr 20, 2018 in Apache Spark by kurt_cobain
• 9,240 points
894 views
0 votes
1 answer

How to stop messages from being displayed on spark console?

In your log4j.properties file you need to ...READ MORE

answered Apr 24, 2018 in Apache Spark by kurt_cobain
• 9,240 points
759 views
0 votes
1 answer

Is it better to have one large parquet file or lots of smaller parquet files?

Ideally, you would use snappy compression (default) ...READ MORE

answered May 23, 2018 in Apache Spark by nitinrawat895
• 9,670 points
1,153 views
0 votes
1 answer

Is it possible to run Spark and Mesos along with Hadoop?

Yes, it is possible to run Spark ...READ MORE

answered May 29, 2018 in Apache Spark by Data_Nerd
• 2,350 points
30 views
0 votes
1 answer

How can I write a text file in HDFS not from an RDD, in Spark program?

Yes, you can go ahead and write ...READ MORE

answered May 29, 2018 in Apache Spark by Shubham
• 12,890 points
762 views
0 votes
1 answer

Scala pass input data as arguments

Please refer to the below code as ...READ MORE

answered Jun 19 in Apache Spark by Lisa
12 views
0 votes
1 answer

Doubt in display(id, name, salary) before display function

The statement display(id, name, salary) is written before the display function ...READ MORE

answered Jun 19 in Apache Spark by Ritu
11 views
0 votes
1 answer
0 votes
1 answer

How to select all columns with group by?

You can use the following to print ...READ MORE

answered Feb 18 in Apache Spark by Omkar
• 67,000 points
49 views

© 2018 Brain4ce Education Solutions Pvt. Ltd. All rights Reserved.
"PMP®","PMI®", "PMI-ACP®" and "PMBOK®" are registered marks of the Project Management Institute, Inc. MongoDB®, Mongo and the leaf logo are the registered trademarks of MongoDB, Inc.