How to achieve pivot like data using tidyverse library in R

0 votes

Below is the sample data set:

ss <- "   Date          ColAB ColCD
1         2018-01-17    A     C
2         2018-01-17    B     D
3         2018-01-17    A     D     
4         2018-01-17    B     C     
5         2018-01-17    B     C     
6         2018-01-18    A     D     
7         2018-01-18    A     C     
8         2018-01-18    A     C
9         2018-01-18    A     D"

data <- read.table(text=ss, header = TRUE)}

I would like transform the data in such as way that I can see the number of occurrences per day

Date           A        B        C       D
2018-01-17     2        3        3       2
2018-01-18     2        0        1       1

I tired spread function that works great on ColAB column.

data %>%
  group_by(Date, ColAB, ColCD) %>%
  summarise(occr = n()) %>%
  spread(ColAB, occr, fill = 0) %>%
  mutate(occrCD = A+B)

 The result is as follows:

# A tibble: 4 x 5
# Groups:   Date [2]
    Date        ColCD     A     B   occrCD
  <fctr>        <fctr> <dbl> <dbl>  <dbl>
1 2018-01-17      C     1     2      3
2 2018-01-17      D     1     1      2
3 2018-01-18      C     2     0      2
4 2018-01-18      D     2     0      2

But when I tried to spread second time it does not work as intended.

The data for column A (and) B is not added for a particular day for C and D rows. As a result wrong data was given as output

Code for both the steps:

data %>%
  group_by(Date, ColAB, ColCD) %>%
  summarise(occr = n()) %>%
  spread(ColAB, occr, fill = 0) %>% # first spread - result as expected
  mutate(occrCD = A+B) %>%
  spread(ColCD, occrCD, fill = 0) %>% 
# second spread, lost sum for A and B
  group_by(Date) %>%
  summarise_all(sum)

The result that is not what I wanted. The error is visible as A+B should be equal for C +D but that's not the case for 2018-01-18.

# A tibble: 2 x 5
        Date     A     B     C     D
      <fctr> <dbl> <dbl> <dbl> <dbl>
1 2018-01-17     2     3     3     2
2 2018-01-18     2     0     2     2

Any help is highly appreciated!

Apr 4, 2018 in Data Analytics by CodingByHeart77
• 3,740 points

edited Apr 4, 2018 by CodingByHeart77 841 views

1 answer to this question.

0 votes

You need not spread twice, if you want to put all the parameters in a single column.

You can try the following code:

library(dplyr)
library(tidyr)

ss <- "   Date          ColAB ColCD
1         2018-01-17    A     C
2         2018-01-17    B     D
3         2018-01-17    A     D     
4         2018-01-17    B     C     
5         2018-01-17    B     C     
6         2018-01-18    A     D     
7         2018-01-18    A     C     
8         2018-01-18    A     C
9         2018-01-18    A     D"

data <- read.table(text=ss, header = TRUE, stringsAsFactors = F)


data %>%
  gather(v1,value,-Date) %>%
  count(Date, value) %>%
  spread(value, n, fill = 0)

# # A tibble: 2 x 5
#         Date     A     B     C     D
# *      <chr> <dbl> <dbl> <dbl> <dbl>
# 1 2017-05-27     2     3     3     2
# 2 2017-05-28     4     0     2     2
answered Apr 4, 2018 by kappa3010
• 2,090 points

Related Questions In Data Analytics

0 votes
1 answer

How to forecast season and trend of data using STL and ARIMA in R?

You can use the forecast.stl function for the ...READ MORE

answered May 19, 2018 in Data Analytics by DataKing99
• 8,240 points
1,919 views
+1 vote
3 answers

How to change the value of a variable using R programming in a data frame?

Try this: df$symbol <- as.character(df$symbol) df$symbol[df$sym ...READ MORE

answered Jan 11, 2019 in Data Analytics by Tyrion anex
• 8,700 points
35,140 views
+1 vote
0 answers

How to show CLUSTERS AFTER USING Kmeans(data,n) in R?

Hi, I'm new to R. I used ...READ MORE

Jul 12, 2019 in Data Analytics by prakash
332 views
0 votes
1 answer

How to change y axis max in time series using R?

The axis limits are being set using ...READ MORE

answered Apr 3, 2018 in Data Analytics by Sahiti
• 6,370 points
3,496 views
0 votes
1 answer

How to apply list to a function which give data frame as output

If you use  tidyverse, you can use ...READ MORE

answered Apr 11, 2018 in Data Analytics by Sahiti
• 6,370 points
443 views
0 votes
1 answer

How to use dplyr functions such as filter() inside nested data frames with map()

You can use map() call as follows:  map(full, ...READ MORE

answered Apr 6, 2018 in Data Analytics by Sahiti
• 6,370 points
4,259 views
0 votes
1 answer
+1 vote
2 answers

How to sort a data frame by columns in R?

You can use dplyr function arrange() like ...READ MORE

answered Aug 21, 2019 in Data Analytics by anonymous
• 33,030 points
1,401 views
0 votes
1 answer
0 votes
1 answer

How to join two tables (tibbles) by *list* columns in R

You can use the hash from digest ...READ MORE

answered Apr 6, 2018 in Data Analytics by kappa3010
• 2,090 points
1,387 views
webinar REGISTER FOR FREE WEBINAR X
REGISTER NOW
webinar_success Thank you for registering Join Edureka Meetup community for 100+ Free Webinars each month JOIN MEETUP GROUP