what is Paired RDD and how to create paired RDD in Spark?

0 votes
Can anyone explain how to create paired RDD in Spark?
Aug 2 in Apache Spark by disha
60 views

1 answer to this question.

0 votes

Hi,

Paired RDD is a distributed collection of data with the key-value pair. It is a subset of Resilient Distributed Dataset So it has all the features of RDD and some new feature for the key-value pair. There are many transformation operations available for Paired RDD. These operations on Paired RDD are very useful to solve many use cases that require sorting, grouping, reducing some value/function.
Commonly used operations on paired RDD are: groupByKey() reduceByKey() countByKey() join() etc.

Here is an example regarding creating of paired RDD:

val pRDD:[(String),(Int)]=sc.textFile(“path_of_your_file”)
.flatMap(line => line.split(” “))
.map{word=>(word,word.length)}
answered Aug 2 by Gitika
• 25,340 points

Related Questions In Apache Spark

+1 vote
3 answers

What is the difference between rdd and dataframes in Apache Spark ?

Comparison between Spark RDD vs DataFrame 1. Release ...READ MORE

answered Aug 27, 2018 in Apache Spark by shams
• 3,580 points
13,608 views
0 votes
1 answer

Can anyone explain what is RDD in Spark?

RDD is a fundamental data structure of ...READ MORE

answered May 24, 2018 in Apache Spark by Shubham
• 13,290 points
581 views
0 votes
1 answer

How to save and retrieve the Spark RDD from HDFS?

You can save the RDD using saveAsObjectFile and saveAsTextFile method. ...READ MORE

answered May 29, 2018 in Apache Spark by Shubham
• 13,290 points
1,989 views
0 votes
1 answer

How to convert rdd object to dataframe in spark

SqlContext has a number of createDataFrame methods ...READ MORE

answered May 30, 2018 in Apache Spark by nitinrawat895
• 10,670 points
1,306 views
0 votes
1 answer
0 votes
1 answer

Hadoop Mapreduce word count Program

Firstly you need to understand the concept ...READ MORE

answered Mar 16, 2018 in Data Analytics by nitinrawat895
• 10,670 points
2,690 views
0 votes
1 answer

hadoop.mapred vs hadoop.mapreduce?

org.apache.hadoop.mapred is the Old API  org.apache.hadoop.mapreduce is the ...READ MORE

answered Mar 16, 2018 in Data Analytics by nitinrawat895
• 10,670 points
283 views
0 votes
10 answers

hadoop fs -put command?

put syntax: put <localSrc> <dest> copy syntax: copyFr ...READ MORE

answered Dec 7, 2018 in Big Data Hadoop by Aditya
13,390 views
0 votes
1 answer

How to create paired RDD using subString method in Spark?

Hi, If you have a file with id ...READ MORE

answered Aug 2 in Apache Spark by Gitika
• 25,340 points
27 views
0 votes
1 answer

What is Spark UI and how to monitor a spark job?

Hey, Jobs- to view all the spark jobs Stages- ...READ MORE

answered Aug 6 in Apache Spark by Gitika
• 25,340 points
39 views