Difference between cogroup and full outer join in Spark

0 votes
Please explain the difference between cogroup and full outer join in Spark.
Jul 14, 2019 in Apache Spark by Hiran
9,406 views

1 answer to this question.

0 votes

Please go through the explanation below:

Full Outer Join

A full outer join on RDDs is the same as a full outer join in SQL.

  • FULL JOIN returns all records from both tables, whether or not the other table has a matching record; columns from the side without a match are filled with NULL.
  • FULL JOIN can potentially return very large datasets.
  • FULL JOIN and FULL OUTER JOIN are the same.
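The points above can be sketched in plain Python, without Spark. This is only an emulation of the semantics under stated assumptions: the function name `full_outer_join` and the sample data are hypothetical, and `None` stands in for SQL NULL. In PySpark, the corresponding pair-RDD method is `fullOuterJoin`.

```python
# Plain-Python emulation of a full outer join on (key, value) pairs.
# No Spark required; None stands in for SQL NULL on the unmatched side.
from collections import defaultdict


def full_outer_join(left, right):
    """Return (key, (left_value, right_value)) for every key in either input."""
    left_by_key = defaultdict(list)
    right_by_key = defaultdict(list)
    for key, value in left:
        left_by_key[key].append(value)
    for key, value in right:
        right_by_key[key].append(value)

    rows = []
    for key in sorted(set(left_by_key) | set(right_by_key)):
        # A side with no match contributes a single None placeholder.
        left_values = left_by_key.get(key) or [None]
        right_values = right_by_key.get(key) or [None]
        # Every matching pair of values becomes its own output row.
        for lv in left_values:
            for rv in right_values:
                rows.append((key, (lv, rv)))
    return rows


# Hypothetical sample data, not taken from the original answer.
ages = [("John", 18), ("Mary", 19)]
gpas = [("Mary", 3.8), ("Bill", 3.9)]

joined = full_outer_join(ages, gpas)
for row in joined:
    print(row)
```

Keys present on only one side (John, Bill) still appear in the result, paired with `None`. Because every matching pair produces a row, the output can grow quickly when keys repeat on both sides.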

Also, please go through the link below; it has a detailed explanation of full joins.

Group and Co-group

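The GROUP and COGROUP operators shown here come from Pig Latin; cogroup on Spark pair RDDs follows the same idea. The two operators are identical, but GROUP is used in statements involving one relation and COGROUP is used in statements involving two or more relations.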

Suppose we have one relation, A, as shown below:

A = LOAD 'student' AS (name:chararray,age:int,gpa:float);
DUMP A;
(John,18,4.0F)
(Mary,19,3.8F)
(Bill,20,3.9F)
(Joe,18,3.8F)

B = GROUP A BY age;
DUMP B;
(18,{(John,18,4.0F),(Joe,18,3.8F)})
(19,{(Mary,19,3.8F)})
(20,{(Bill,20,3.9F)})
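For readers more familiar with Python than Pig Latin, the same grouping can be sketched with a dictionary. This is a minimal illustration, not Spark code: the helper name `group_by` is hypothetical, and the GPA values are written as plain floats. In PySpark, `RDD.groupBy` plays a comparable role.

```python
# Plain-Python sketch of GROUP A BY age: one entry per key,
# holding the full records that share that key.
from collections import defaultdict


def group_by(records, key_index):
    """Group whole records by the field at position key_index."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key_index]].append(record)
    return dict(groups)


# Same rows as relation A above: (name, age, gpa).
students = [
    ("John", 18, 4.0),
    ("Mary", 19, 3.8),
    ("Bill", 20, 3.9),
    ("Joe", 18, 3.8),
]

by_age = group_by(students, 1)
for age in sorted(by_age):
    print(age, by_age[age])
```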

Now let us use COGROUP.

Suppose we have two relations, A and B, as shown below:

A = LOAD 'data1' AS (owner:chararray,pet:chararray);
DUMP A;
(Alice,turtle)
(Alice,goldfish)
(Alice,cat)
(Bob,dog)
(Bob,cat)

B = LOAD 'data2' AS (friend1:chararray,friend2:chararray);
DUMP B;
(Cindy,Alice)
(Mark,Alice)
(Paul,Bob)
(Paul,Jane)

X = COGROUP A BY owner, B BY friend2;
DUMP X;
(Alice,{(Alice,turtle),(Alice,goldfish),(Alice,cat)},{(Cindy,Alice),(Mark,Alice)})
(Bob,{(Bob,dog),(Bob,cat)},{(Paul,Bob)})
(Jane,{},{(Paul,Jane)})

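In the above example, the first bag contains the tuples from the first relation with the matching key field. The second bag contains the tuples from the second relation with the matching key field. If no tuples match the key field, the bag is empty. This is the key difference from a full outer join: COGROUP returns exactly one record per key, with the tuples of each relation kept in separate bags, whereas a full outer join returns one record for every matching pair of tuples and fills in NULL for the side that has no match.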
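To compare the two operations on the same data, here is a minimal plain-Python sketch. It is an emulation under stated assumptions, not Spark code: the names `cogroup` and `to_full_outer_join` are hypothetical helpers, relation B is keyed by `friend2` to mirror `B BY friend2`, and only the non-key field is kept as the value, which is closer to how Spark pair RDDs behave than to Pig's full tuples. In PySpark, the corresponding pair-RDD methods are `cogroup` and `fullOuterJoin`.

```python
# Plain-Python emulation contrasting cogroup with a full outer join.
from collections import defaultdict


def cogroup(left, right):
    """Return (key, (left_values, right_values)), one record per key."""
    grouped = defaultdict(lambda: ([], []))
    for key, value in left:
        grouped[key][0].append(value)
    for key, value in right:
        grouped[key][1].append(value)
    return [(key, grouped[key]) for key in sorted(grouped)]


def to_full_outer_join(cogrouped):
    """Flatten cogroup output into full outer join rows (None = no match)."""
    rows = []
    for key, (left_values, right_values) in cogrouped:
        for lv in left_values or [None]:
            for rv in right_values or [None]:
                rows.append((key, (lv, rv)))
    return rows


# (owner, pet) pairs from relation A.
pets = [("Alice", "turtle"), ("Alice", "goldfish"), ("Alice", "cat"),
        ("Bob", "dog"), ("Bob", "cat")]
# Relation B re-keyed as (friend2, friend1).
friends = [("Alice", "Cindy"), ("Alice", "Mark"), ("Bob", "Paul"), ("Jane", "Paul")]

cogrouped = cogroup(pets, friends)
joined_rows = to_full_outer_join(cogrouped)

print(len(cogrouped), "cogroup records")
print(len(joined_rows), "full outer join rows")
```

The cogroup result has one record per key (Alice, Bob, Jane), with an empty list where a relation has no values for that key. Flattening it into a full outer join multiplies the values within each key, so Alice alone contributes six rows.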

answered Jul 14, 2019 by Kiran
