So I found a workaround for the problem above; it is essentially a different approach. Instead of uploading files to Hadoop with copyFromLocal, I used PHP cURL. I will try to explain this step by step.
First, create a PHP script and add the function below:
function call_curl($headers, $method, $url, $data, $file, $size) {
    $handle = curl_init();
    curl_setopt($handle, CURLOPT_URL, $url);
    curl_setopt($handle, CURLOPT_HTTPHEADER, $headers);
    curl_setopt($handle, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($handle, CURLOPT_SSL_VERIFYHOST, false);
    curl_setopt($handle, CURLOPT_SSL_VERIFYPEER, false);
    switch ($method) {
        case 'GET':
            break;
        case 'POST':
            curl_setopt($handle, CURLOPT_POST, true);
            curl_setopt($handle, CURLOPT_POSTFIELDS, $data);
            break;
        case 'PUT':
            // Stream the open file handle as the raw request body,
            // which is what WebHDFS expects for op=CREATE.
            curl_setopt($handle, CURLOPT_PUT, true);
            curl_setopt($handle, CURLOPT_INFILE, $file);
            curl_setopt($handle, CURLOPT_INFILESIZE, $size);
            break;
        case 'DELETE':
            curl_setopt($handle, CURLOPT_CUSTOMREQUEST, 'DELETE');
            break;
    }
    $response = curl_exec($handle);
    curl_close($handle);
    return $response;
}
The above function makes a request to Hadoop and performs whichever operation we pass in. In our case that is PUT, since we are uploading to HDFS.
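For a single file, a call would look roughly like this (a minimal sketch; $url is the datanode location explained further down, and the file path is a made-up example):
// Hypothetical one-off upload; $url comes from the "Location" step described below.
$header = array('Content-Type: application/octet-stream');
$filepath = "/var/www/html/myData/example.zip"; // example file, adjust to yours
$size = filesize($filepath);
$file = fopen($filepath, 'r');
$response = call_curl($header, "PUT", $url, null, $file, $size);
fclose($file);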
Next, define the path to the folder containing the files you wish to upload.
$dir = '/var/www/html/myData';
Now loop through all the files in that folder to get each filename, and from it any other details you need, such as the file size and file path (in my case I store each file's details in my database). I also set $username, $date, and $time first, since the upload URL and the database document use them.
// Values used by the upload URL and the database document below; adjust to your setup.
$username = "test4";     // root folder on HDFS (mine, from the example further down)
$date = date("Y-m-d");
$time = date("H:i:s");
foreach (new DirectoryIterator($dir) as $fileInfo) {
    if ($fileInfo->isDot()) continue;
    $filename = $fileInfo->getFilename();
    // Start a PHP cURL session to connect and upload the file to HDFS
    $header = array('Content-Type: application/octet-stream');
    $method = "PUT";
    // Full path and size of the file to upload
    $filepath = $dir . "/" . $filename;
    $size = filesize($filepath);
    // Choose the storage method: replication for files up to 1 GB, erasure coding above
    if ($size <= (1024 * 1024 * 1024)) {
        $rep = "3WayReplication";
    } else {
        $rep = "erasure";
    }
    $url = "http://chbpc-VirtualBox:9864/webhdfs/v1/" . $username . "/" . $rep . "/" . $filename . "?op=CREATE&namenoderpcaddress=localhost:9000&createflag=&createparent=true&overwrite=false";
    // Open the file and let call_curl() stream it as the PUT body
    $file = fopen($filepath, 'r');
    call_curl($header, $method, $url, null, $file, $size);
    fclose($file);
    // Store the file details in the database
    $m = new MongoClient();
    $collection = $m->ecoss->fileInfo;
    $document = array(
        "rootFolder" => $username,
        "fileName" => $filename,
        "filePath" => $filepath,
        "fileSize" => ($size / 1024) . "kb",
        "replicationType" => $rep,
        "uploadDate" => $date,
        "uploadTime" => $time
    );
    $collection->insert($document);
    // Allow permission chmod 777 on the root folder so the files can be deleted
    if (!unlink($filepath)) {
        //echo ("Error deleting " . $filename);
    } else {
        //echo ("Deleted " . $filename);
    }
}
As you can see above, I am using MongoDB to store the file details. This is why I posted the question above: with copyFromLocal there was no way for me to capture the details of each file being uploaded to HDFS. With PHP cURL, each file name ends up in the $filename variable, so I can store it in my database. You can simply skip the MongoDB code if you don't need it.
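If you do keep the MongoDB part, reading the details back is straightforward. Here is a quick sketch using the same legacy MongoClient driver as above (the database and collection names match my code; the root folder value is just an example):
// List every file uploaded under a given root folder.
$m = new MongoClient();
$collection = $m->ecoss->fileInfo;
$cursor = $collection->find(array("rootFolder" => "test4"));
foreach ($cursor as $doc) {
    echo $doc["fileName"] . " (" . $doc["fileSize"] . ")\n";
}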
Now there is something very important: the $url. You can copy and paste the code above, that is absolutely fine, but the URL in your case will be different. To get your own $url, please refer to the "Create and Write to a File" section here: https://hadoop.apache.org/docs/r1.0.4/webhdfs.html#MKDIRS
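If you would rather fetch the URL programmatically than copy it from the terminal, a sketch like the one below should work. This is my own addition, not part of the original script: it sends the first CREATE request to the namenode without following the redirect, then reads the datanode URL from cURL's redirect info. The namenode address is an assumption (9870 is the default namenode web port on Hadoop 3; adjust to your setup):
// Step 1 of the WebHDFS two-step create: ask the namenode where to write.
// It answers with a 307 redirect; CURLINFO_REDIRECT_URL holds the Location header.
function get_datanode_url($namenode, $path) {
    $handle = curl_init("http://" . $namenode . "/webhdfs/v1/" . $path . "?op=CREATE&overwrite=false");
    curl_setopt($handle, CURLOPT_CUSTOMREQUEST, 'PUT');
    curl_setopt($handle, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($handle, CURLOPT_FOLLOWLOCATION, false); // stop at the redirect
    curl_exec($handle);
    $location = curl_getinfo($handle, CURLINFO_REDIRECT_URL);
    curl_close($handle);
    return $location;
}

// Example (hostname and port are assumptions from my setup):
$url = get_datanode_url("chbpc-VirtualBox:9870", "test4/datafile");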
Basically you need to have cURL for PHP installed on your machine. If not, please follow this link: https://stackoverflow.com/questions/38800606/how-to-install-php-curl-in-ubuntu-16-04
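On Ubuntu this usually comes down to the following (per the linked answer; the service name assumes Apache):
sudo apt-get install php-curl
sudo service apache2 restart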
Now, open your terminal (as per the previous Hadoop Apache link) and type your curl command:
curl -i -X PUT "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=CREATE"
where <HOST> is your hostname, which you can find by typing the following in the terminal:
hostname
<PORT> is the port you use to connect to Hadoop, and <PATH> is the path to the folder you wish to upload your files to. After issuing this first command, you will get a response like the one below (the shape follows the WebHDFS spec; your datanode host and port will differ):
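HTTP/1.1 307 TEMPORARY_REDIRECT
Location: http://<DATANODE>:<PORT>/webhdfs/v1/<PATH>?op=CREATE...
Content-Length: 0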
After this, issue another curl command:
curl -i -X PUT -T <LOCAL_FILE> "<LOCATION>"
Here <LOCAL_FILE> is the path to the file you wish to upload to HDFS, and <LOCATION> is the Location header you received from the first command. In my case, the location is:
http://chbpc-VirtualBox:9864/webhdfs/v1/test4/datafile?op=CREATE&namenoderpcaddress=localhost:9000&createflag=&createparent=true&overwrite=false
So basically, the above location is what you need to put in your $url variable.
$url = "http://chbpc-VirtualBox:9864/webhdfs/v1/test4/datafile?op=CREATE&namenoderpcaddress=localhost:9000&createflag=&createparent=true&overwrite=false";
Naturally, before testing your PHP script, you can try to upload a file to HDFS from the terminal to check that everything is fine. Just run the second curl command, which should look something like this once the location has been added:
curl -i -X PUT -T /var/www/html/myData/21\ February\ 2019\ 11_31_55\ PM "http://chbpc-VirtualBox:9864/webhdfs/v1/test4/datafile?op=CREATE&namenoderpcaddress=localhost:9000&createflag=&createparent=true&overwrite=false"
Check in your HDFS whether the file has been uploaded. If it has, your $url should be good.
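A quick way to check from the terminal, assuming the Hadoop binaries are on your PATH (the folder name matches my example):
hdfs dfs -ls /test4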
The code below (already included in the loop above) just deletes each local file once it has been uploaded to HDFS.
// Allow permission chmod 777 on the root folder so the files can be deleted
if (!unlink($filepath)) {
    //echo ("Error deleting " . $filename);
} else {
    //echo ("Deleted " . $filename);
}
Before you run the PHP script, delete the files that were uploaded with the command-line curl (just in case you are uploading the same files with PHP cURL). Now run your script and check whether the files have been uploaded to your HDFS.
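You can also verify over WebHDFS itself, without the hadoop client, using the standard LISTSTATUS operation (host, port, and path are placeholders as before):
curl -i "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=LISTSTATUS"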
Please excuse the long reply; I have tried to give as much detail as possible, since it took quite some struggle for me to make this work.
I hope you can use this as a good reference.