PySpark is taking the default path

from pyspark import SparkFiles
rdd=sc.textFile("emp/employees/part-m-00000")
rdd.map(lambda line: line.upper()).collect()

This code executes with no issues, but my file is actually present at
/user/edureka_536711/emp/employees/part-m-00000

I am not sure how the path /user/edureka_536711/ is being applied by default, and the code below is failing:

def get_hdfspath(filename):
    my_hdfs = "user/{0}".format(user_id.lower())
    return os.path.join(my_hdfs, filename)

rdd = sc.textFile(sample)
rdd.map(lambda line: line.upper()).collect()

Can you help here?

Jul 16 in Apache Spark by Will



The HDFS home path for MyLab is /user/edureka_id. By default, any relative path you pass to sc.textFile() is resolved against that home directory, even if you do not mention it. For example, if a text file abc.txt is present in that Hadoop path, then /user/edureka_id/abc.txt and abc.txt refer to the same file.
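The resolution behavior above can be sketched in plain Python (no cluster needed). Note how the missing leading "/" leaves the joined path relative, which HDFS then resolves against your home directory a second time; the user_id below is a hypothetical stand-in for your actual lab ID.

```python
import os.path

user_id = "edureka_536711"  # hypothetical example ID

# Without a leading "/" the joined path stays relative:
relative = os.path.join("user/{0}".format(user_id), "abc.txt")
# With a leading "/" it is absolute:
absolute = os.path.join("/user/{0}".format(user_id), "abc.txt")

print(relative)  # user/edureka_536711/abc.txt
print(absolute)  # /user/edureka_536711/abc.txt

# HDFS resolves the relative form against the home dir /user/edureka_536711,
# effectively looking for /user/edureka_536711/user/edureka_536711/abc.txt,
# which does not exist -- hence the failure.
```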

Regarding the code: there are two problems. First, my_hdfs is built without a leading "/", so os.path.join() returns a relative path ("user/edureka_536711/..."), which Spark then resolves against your home directory again, producing a path that does not exist. Second, sample is never assigned the result of get_hdfspath(), so sc.textFile(sample) is not reading the path the helper builds. A corrected version:

def get_hdfspath(filename):
    # Use an absolute path; without the leading "/" the result is
    # relative and gets resolved against /user/<edureka_id> again
    my_hdfs = "/user/{0}".format(user_id.lower())
    return os.path.join(my_hdfs, filename)

sample = get_hdfspath("emp/employees/part-m-00000")
rdd = sc.textFile(sample)
rdd.map(lambda line: line.upper()).collect()
answered Jul 16 by Khushi
