How to merge data frames using joins


Below are two data frames:

data1 = data.frame(Cust_Id = c(1:6), Product = c(rep("Headphones", 3), rep("Toaster", 3)))
data2 = data.frame(Cust_Id = c(2, 4, 6), State = c(rep("New York", 2), rep("California", 1)))

data1
#  Cust_Id    Product
#        1 Headphones
#        2 Headphones
#        3 Headphones
#        4    Toaster
#        5    Toaster
#        6    Toaster

data2
#  Cust_Id      State
#        2   New York
#        4   New York
#        6 California

How can I apply joins on these data frames?

  • An inner join of data1 and data2:
    This returns only those rows in which the left table has matching keys in the right table.
  • An outer join of data1 and data2:
    This should return all rows from both tables, combining rows whose keys match and filling the columns of the other table with NA where there is no match.
  • A left outer join or a simple left join of data1 and data2:
    This should return all rows from the left table, and any rows with matching keys from the right table.
  • A right outer join of data1 and data2:
    This should return all rows from the right table, and any rows with matching keys from the left table.
Apr 12, 2018 in Data Analytics by CodingByHeart77

1 answer to this question.


You can use the merge function with its optional parameters.

Inner join: merge(data1, data2) will work.

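This is because, by default, merge joins the frames on the variable names they have in common (here, Cust_Id) and keeps only the rows whose keys appear in both.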
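As a minimal sketch of that default behaviour, using the two data frames from the question (the variable name inner is only illustrative):

```r
# Data frames from the question
data1 <- data.frame(Cust_Id = 1:6,
                    Product = c(rep("Headphones", 3), rep("Toaster", 3)))
data2 <- data.frame(Cust_Id = c(2, 4, 6),
                    State = c(rep("New York", 2), rep("California", 1)))

# Column names shared by both frames; merge() uses these as keys by default
intersect(names(data1), names(data2))

# Inner join: only customers 2, 4 and 6 appear in both frames
inner <- merge(data1, data2)
inner
```

Customers 1, 3 and 5 are dropped because they have no matching row in data2.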

To make sure that you are matching by only those fields that you desire, you can specify the 'by' parameter like this:

 merge(data1, data2, by = "Cust_Id")

You can also use the by.x and by.y parameters if the key columns have different names in the two data frames: by.x names the key column in the first frame and by.y names it in the second.
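A short sketch of by.x and by.y. The frame data3 and its Customer column are hypothetical, made up here only to show keys with different names; they are not part of the question:

```r
# data1 as in the question
data1 <- data.frame(Cust_Id = 1:6,
                    Product = c(rep("Headphones", 3), rep("Toaster", 3)))

# Hypothetical frame: same content as data2, but the key column is called Customer
data3 <- data.frame(Customer = c(2, 4, 6),
                    State = c("New York", "New York", "California"))

# Match data1$Cust_Id against data3$Customer
joined <- merge(data1, data3, by.x = "Cust_Id", by.y = "Customer")
joined
```

The key column in the result takes its name from the first frame (Cust_Id).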

For all other joins you can use the commands mentioned below:

Outer join: merge(x = data1, y = data2, by = "Cust_Id", all = TRUE)

Left outer: merge(x = data1, y = data2, by = "Cust_Id", all.x = TRUE)

Right outer: merge(x = data1, y = data2, by = "Cust_Id", all.y = TRUE)

Cross join: merge(x = data1, y = data2, by = NULL)
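To see how these variants differ, you could run them on the data frames from the question and compare row counts. This is only a sketch; the result names (outer, left, right, cross) are illustrative:

```r
# Data frames from the question
data1 <- data.frame(Cust_Id = 1:6,
                    Product = c(rep("Headphones", 3), rep("Toaster", 3)))
data2 <- data.frame(Cust_Id = c(2, 4, 6),
                    State = c(rep("New York", 2), rep("California", 1)))

outer <- merge(x = data1, y = data2, by = "Cust_Id", all = TRUE)
left  <- merge(x = data1, y = data2, by = "Cust_Id", all.x = TRUE)
right <- merge(x = data1, y = data2, by = "Cust_Id", all.y = TRUE)
cross <- merge(x = data1, y = data2, by = NULL)

# Number of rows produced by each join
sapply(list(outer = outer, left = left, right = right, cross = cross), nrow)

# Unmatched customers (1, 3, 5) get NA for State in the left join
left[is.na(left$State), ]
```

Because every Cust_Id in data2 also exists in data1, the outer and left joins both return 6 rows here, the right join returns 3, and the cross join returns every pairing of rows (6 x 3 = 18).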

answered Apr 12, 2018 by kappa3010
