Copy files to all Hadoop DFS directories

0 votes

I am trying to copy files to HDFS. I need to copy all files from a local folder (myData) to every directory in my HDFS, and I am using wildcards for this.

Here is my code:

bin/hdfs dfs -copyFromLocal myData/* /*/

This is my error:

copyFromLocal: `/var/': No such file or directory 

Normally, when I specify a single directory in HDFS, all my files are copied to that directory successfully.
But when I use wildcards (*) to copy to all directories in my HDFS, I get the above error.

Is there any way to get around this?
Thank you. 

Feb 23 in Big Data Hadoop by Bhavish
• 370 points
303 views

1 answer to this question.

+1 vote
Best answer

Hi @Bhavish. There is no Hadoop command that copies a local file to multiple directories in HDFS, but you can do it with a loop in a bash script.
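As for the error itself, a likely explanation (an assumption based on how shells handle globs, not something confirmed in this thread) is that the unquoted /*/ is expanded by your local shell against the local filesystem before hdfs ever runs. The last argument then becomes a local directory such as /var/, which copyFromLocal treats as the HDFS destination. A minimal sketch to see the difference, where show_args is a hypothetical helper that only prints the arguments it receives:

```shell
#!/bin/bash
# show_args is a hypothetical helper: it prints each argument it receives on its own line.
show_args() {
  printf '%s\n' "$@"
}

# Unquoted: the local shell replaces /*/ with the local top-level directories.
unquoted=$(show_args /*/)

# Quoted: the pattern is passed through literally, without local expansion.
quoted=$(show_args '/*/')

echo "unquoted expands to:"
echo "$unquoted"
echo "quoted stays as: $quoted"
```

Quoting alone is not expected to solve the problem, since the command still takes a single destination, which is why a loop is used here.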

Create a new bash script. I named mine copytoallhdfs.sh:

$ nano copytoallhdfs.sh

I came up with the following script for your problem statement:

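#!/bin/bash
# -ls -C prints only the paths, one per line
myarray=$(hdfs dfs -ls -C /path/to/directory)
for name in $myarray
do
  # drop the leading "/" if the listed paths are already absolute
  hdfs dfs -copyFromLocal /path/to/source /$name
done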

Save (Ctrl+O) and close (Ctrl+X) the file.

Now make this script executable:

$ chmod +x copytoallhdfs.sh

and run the script:

$ ./copytoallhdfs.sh

This worked for me. 
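A slightly more defensive variant of the loop, as a sketch rather than a tested recipe. It assumes that -ls -C prints one path per line, that hdfs dfs -test -d is available in your Hadoop version, and that the listed paths can be used as destinations unchanged. The function name copy_to_all_dirs and its two parameters are made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: copy every file in a local folder into each directory
# directly under an HDFS parent path.
#   $1 = local source folder, $2 = HDFS parent directory
copy_to_all_dirs() {
  local src="$1" parent="$2" target
  # Read the listing line by line so paths containing spaces stay intact.
  hdfs dfs -ls -C "$parent" | while IFS= read -r target; do
    # The listing also contains plain files, so keep only directories.
    # </dev/null stops the inner commands from consuming the listing on stdin.
    if hdfs dfs -test -d "$target" </dev/null; then
      hdfs dfs -copyFromLocal "$src"/* "$target" </dev/null \
        || echo "copy to $target failed" >&2
    fi
  done
}
```

It could be called as copy_to_all_dirs myData / once the function is defined. Each hdfs call starts its own JVM, so this will be slow if there are many directories.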

answered Feb 23 by Omkar
• 67,460 points

selected Feb 23 by Bhavish

Hello @Omkar,

Your answer works perfectly for me. I just made a few changes to the script.

Here is my script:

#!/bin/bash

myarray=`bin/hdfs dfs -ls -C /`

echo $myarray;
for name in $myarray
do bin/hdfs dfs -copyFromLocal myData/* $name;
done

The line echo $myarray; gives me /folder1 /folder2 /folder3.

Since the leading slash '/' is already in the output, I removed the slash before the variable $name. I also removed the slash before the source path, since it was not working.

Many thanks.

Happy to be of help @Bhavish!
