Working of map function on data

Suppose we have a "customer" file with the data - 
1 vishal
2 vijay
3 vinay

If I create an RDD:

val cust = sc.textFile("home/customer.txt").map(_.split(" "))

What operations do map and split perform here? Can you please explain this to me?

Jul 11 in Apache Spark by Medha

1 answer to this question.

The textFile function reads the file as an RDD of lines, and map applies the given function to each of those lines. Here that function is split(" "), which breaks a line on the space delimiter, so each line becomes an array of strings and the overall result is an array of arrays (one array per customer record).
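
Here is a minimal spark-shell sketch of what happens (assuming the file sits at home/customer.txt and sc is the SparkContext the shell provides):

// textFile reads the file as an RDD[String], one element per line
val cust = sc.textFile("home/customer.txt").map(_.split(" "))

// Each line such as "1 vishal" has been split on the space into
// Array("1", "vishal"), so cust is an RDD[Array[String]]
cust.collect().foreach(arr => println(arr.mkString(" | ")))
// 1 | vishal
// 2 | vijay
// 3 | vinay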

Since the dataset is delimited by spaces, we wrote the split function as split(" "). If the dataset were delimited by tabs, we would specify "\t" in the split function instead.
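
For example, if a hypothetical tab-delimited version of the same file existed, only the delimiter argument would change:

// Same pattern for a tab-delimited file: only the split argument differs
val custTab = sc.textFile("home/customer.txt").map(_.split("\t"))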

answered Jul 11 by Krish
