PySpark is taking the default path

from pyspark import SparkFiles
rdd=sc.textFile("emp/employees/part-m-00000")
rdd.map(lambda line: line.upper()).collect()

This code executes with no issues, but my file is present in
/user/edureka_536711/emp/employees/part-m-00000

I am not sure how the prefix /user/edureka_536711/ is being added by default, and the code below is failing:

def get_hdfspath(filename):
    my_hdfs = "user/{0}".format(user_id.lower())
    return os.path.join(my_hdfs, filename)

rdd = sc.textFile(sample)
rdd.map(lambda line: line.upper()).collect()

Can you help here?

Jul 16, 2019 in Apache Spark by Will

1 answer to this question.


The HDFS home path for MyLab is /user/edureka_id, so by default HDFS resolves relative paths against that directory even if you do not mention it. For example, if a text file abc.txt is present in that Hadoop path, then /user/edureka_id/abc.txt and plain abc.txt refer to the same file.

Regarding the code: my_hdfs is built without a leading slash, so the resulting path is relative. HDFS resolves relative paths against your home directory (/user/edureka_536711), which prefixes the home directory a second time and makes the lookup fail. Also, sample is never assigned the result of get_hdfspath. A corrected version (user_id must already hold your lab user id, e.g. "edureka_536711"):

import os

def get_hdfspath(filename):
    # the leading "/" makes the path absolute in HDFS
    my_hdfs = "/user/{0}".format(user_id.lower())
    return os.path.join(my_hdfs, filename)

sample = get_hdfspath("emp/employees/part-m-00000")
rdd = sc.textFile(sample)
rdd.map(lambda line: line.upper()).collect()
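The path issue can be reproduced with plain os.path.join, without a Spark cluster: when the first component has no leading slash, the joined path stays relative, and HDFS resolves relative paths against the home directory /user/&lt;id&gt;. A minimal sketch (the edureka_536711 user id is taken from the question above):

```python
import os

def get_hdfspath(my_hdfs, filename):
    # os.path.join keeps the result relative when the
    # first component has no leading slash
    return os.path.join(my_hdfs, filename)

# Without the leading "/", the joined path is relative ...
rel = get_hdfspath("user/edureka_536711", "emp/employees/part-m-00000")
print(rel)   # user/edureka_536711/emp/employees/part-m-00000
# ... so HDFS would resolve it against /user/edureka_536711,
# i.e. /user/edureka_536711/user/edureka_536711/emp/employees/part-m-00000

# With the leading "/", the path is absolute and points at the file
absolute = get_hdfspath("/user/edureka_536711", "emp/employees/part-m-00000")
print(absolute)  # /user/edureka_536711/emp/employees/part-m-00000
```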


Hope this helps!


Thanks.

answered Jul 16, 2019 by Khushi
