Join in RDD using keys

Hi Team,

How can I join two RDDs on their keys without converting them into DataFrames?

rdd_x=(k1, V_x)
rdd_y=(k1, V_y)

The result should look like this: (k1, (V_x, V_y))
Aug 2 in Apache Spark by Jishan

1 answer to this question.


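Suppose you have two datasets, results(id, result) and students(name, id). You can join them as RDDs on the common key id, with no DataFrame conversion, by mapping each one to a pair RDD of (key, value) and calling join. The result has the shape (id, (result record, student record)). The commands below do this in Spark: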

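// Record types for the two tab-separated input files
case class results(roll_id: Int, result: String)

case class students(name: String, roll_id: Int)

// Give the complete path to each input dataset
val a = sc.textFile("file:///home/edureka/Desktop/all-files/datsets/f1").map(_.split("\t"))

val b = sc.textFile("file:///home/edureka/Desktop/all-files/datsets/f2").map(_.split("\t"))

// Build pair RDDs keyed by roll_id: f1 has the id in column 0, f2 has it in column 1
val class_a = a.map(z => (z(0).toInt, results(z(0).toInt, z(1))))

val class_b = b.map(z => (z(1).toInt, students(z(0), z(1).toInt)))

// Inner join on the key, giving RDD[(Int, (results, students))]
val v_join = class_a.join(class_b)

// collect() brings the rows to the driver so they print there, not on the executors
v_join.collect().foreach(println)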
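To try the same idea without input files, here is a minimal sketch that uses the (k1, V_x) and (k1, V_y) shapes from the question. The object name RddJoinSketch, the helper names innerJoin and leftJoin, the sample keys k2 and k3, and the local[*] master are all assumptions made for illustration. It also shows leftOuterJoin, which may be useful if keys missing from the second RDD should be kept.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical object and sample data, for illustration only.
object RddJoinSketch {

  // Inner join: keeps only the keys present in both pair RDDs.
  def innerJoin(sc: SparkContext): Array[(String, (String, String))] = {
    val rddX = sc.parallelize(Seq(("k1", "V_x"), ("k2", "V_x2")))
    val rddY = sc.parallelize(Seq(("k1", "V_y"), ("k3", "V_y3")))
    rddX.join(rddY).sortByKey().collect()
  }

  // Left outer join: keeps every key of the left RDD;
  // a key with no match on the right gets None as its right-hand value.
  def leftJoin(sc: SparkContext): Array[(String, (String, Option[String]))] = {
    val rddX = sc.parallelize(Seq(("k1", "V_x"), ("k2", "V_x2")))
    val rddY = sc.parallelize(Seq(("k1", "V_y"), ("k3", "V_y3")))
    rddX.leftOuterJoin(rddY).sortByKey().collect()
  }

  def main(args: Array[String]): Unit = {
    // Assumed local master so the sketch runs on one machine.
    val conf = new SparkConf().setAppName("rdd-join-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      innerJoin(sc).foreach(println) // prints (k1,(V_x,V_y))
      leftJoin(sc).foreach(println)  // prints (k1,(V_x,Some(V_y))) then (k2,(V_x2,None))
    } finally {
      sc.stop()
    }
  }
}
```

In spark-shell, where sc already exists, only the parallelize and join lines are needed. The sortByKey call is there just to make the printed order predictable.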

answered Aug 2 by Trisha
