Working of map function on data


Suppose we have a "customer" file with the following data:
1 vishal
2 vijay
3 vinay

If I create an RDD like this:

val cust = sc.textFile("home/customer.txt").map(_.split(" "))

What operations are map and split going to perform? Can you please explain this to me?

Jul 11 in Apache Spark by Medha

1 answer to this question.


The split(" ") call breaks each line of the file into an array of strings, using a space as the delimiter. Because map applies this to every line of the RDD, the overall result is an array of arrays: one inner array per line, holding that line's fields.

Since the dataset is delimited by spaces, we wrote the split function as split(" "). If the dataset were delimited by tabs, we would specify "\t" in the split function instead.
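To make this concrete, here is a minimal sketch you could run in the Spark shell. The home/customer.txt path and the collect() calls are illustrative assumptions, and the res outputs shown in comments are what the three-line file above would produce:

// Read the file and split each line on spaces (path is assumed for illustration)
val cust = sc.textFile("home/customer.txt").map(_.split(" "))

// Each line becomes an Array[String], so collecting yields an array of arrays
cust.collect()
// res0: Array[Array[String]] = Array(Array(1, vishal), Array(2, vijay), Array(3, vinay))

// Individual fields can then be accessed by index, e.g. just the names
cust.map(arr => arr(1)).collect()
// res1: Array[String] = Array(vishal, vijay, vinay)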

answered Jul 11 by Krish
