How to achieve pivot like data using tidyverse library in R?

0 votes

Below is the sample data set:

ss <- "   Date          ColAB ColCD
1         2018-01-17    A     C
2         2018-01-17    B     D
3         2018-01-17    A     D     
4         2018-01-17    B     C     
5         2018-01-17    B     C     
6         2018-01-18    A     D     
7         2018-01-18    A     C     
8         2018-01-18    A     C
9         2018-01-18    A     D"

data <- read.table(text=ss, header = TRUE)}

I would like transform the data in such as way that I can see the number of occurrences per day

Date           A        B        C       D
2018-01-17     2        3        3       2
2018-01-18     2        0        1       1

I tired spread function that works great on ColAB column.

data %>%
  group_by(Date, ColAB, ColCD) %>%
  summarise(occr = n()) %>%
  spread(ColAB, occr, fill = 0) %>%
  mutate(occrCD = A+B)

 The result is as follows:

# A tibble: 4 x 5
# Groups:   Date [2]
    Date        ColCD     A     B   occrCD
  <fctr>        <fctr> <dbl> <dbl>  <dbl>
1 2018-01-17      C     1     2      3
2 2018-01-17      D     1     1      2
3 2018-01-18      C     2     0      2
4 2018-01-18      D     2     0      2

But when I tried to spread second time it does not work as intended.

The data for column A (and) B is not added for a particular day for C and D rows. As a result wrong data was given as output

Code for both the steps:

data %>%
  group_by(Date, ColAB, ColCD) %>%
  summarise(occr = n()) %>%
  spread(ColAB, occr, fill = 0) %>% # first spread - result as expected
  mutate(occrCD = A+B) %>%
  spread(ColCD, occrCD, fill = 0) %>% 
# second spread, lost sum for A and B
  group_by(Date) %>%
  summarise_all(sum)

The result that is not what I wanted. The error is visible as A+B should be equal for C +D but that's not the case for 2018-01-18.

# A tibble: 2 x 5
        Date     A     B     C     D
      <fctr> <dbl> <dbl> <dbl> <dbl>
1 2018-01-17     2     3     3     2
2 2018-01-18     2     0     2     2

Any help is highly appreciated!

Apr 4, 2018 in Data Analytics by CodingByHeart77
• 3,680 points

edited Apr 4, 2018 by CodingByHeart77 62 views

1 answer to this question.

Your answer

Your name to display (optional):
Privacy: Your email address will only be used for sending these notifications.
0 votes

You need not spread twice, if you want to put all the parameters in a single column.

You can try the following code:

library(dplyr)
library(tidyr)

ss <- "   Date          ColAB ColCD
1         2018-01-17    A     C
2         2018-01-17    B     D
3         2018-01-17    A     D     
4         2018-01-17    B     C     
5         2018-01-17    B     C     
6         2018-01-18    A     D     
7         2018-01-18    A     C     
8         2018-01-18    A     C
9         2018-01-18    A     D"

data <- read.table(text=ss, header = TRUE, stringsAsFactors = F)


data %>%
  gather(v1,value,-Date) %>%
  count(Date, value) %>%
  spread(value, n, fill = 0)

# # A tibble: 2 x 5
#         Date     A     B     C     D
# *      <chr> <dbl> <dbl> <dbl> <dbl>
# 1 2017-05-27     2     3     3     2
# 2 2017-05-28     4     0     2     2
answered Apr 4, 2018 by kappa3010
• 2,010 points

Related Questions In Data Analytics

0 votes
1 answer

How to forecast season and trend of data using STL and ARIMA in R?

You can use the forecast.stl function for the ...READ MORE

answered May 18, 2018 in Data Analytics by DataKing99
• 8,100 points
364 views
0 votes
1 answer

How to change the value of a variable using R programming in a data frame?

Try this: df$symbol <- as.character(df$symbol) df$symbol[df$symb ...READ MORE

answered Jan 11 in Data Analytics by Tyrion anex
• 8,280 points
46 views
0 votes
1 answer

How to change y axis max in time series using R?

The axis limits are being set using ...READ MORE

answered Apr 3, 2018 in Data Analytics by darklord
• 6,140 points
41 views
0 votes
1 answer

How to use group by for multiple columns in dplyr, using string vector input in R?

dplyr added versions for group_by. This allows you ...READ MORE

answered Apr 12, 2018 in Data Analytics by CodingByHeart77
• 3,680 points

edited Apr 12, 2018 by CodingByHeart77 1,621 views
0 votes
1 answer
0 votes
1 answer

How to use dplyr functions such as filter() inside nested data frames with map()

You can use map() call as follows:  map(full, ...READ MORE

answered Apr 6, 2018 in Data Analytics by darklord
• 6,140 points
85 views
0 votes
1 answer
0 votes
1 answer

How to sort a data frame by columns in R?

You can just use the order function ...READ MORE

answered Apr 10, 2018 in Data Analytics by darklord
• 6,140 points
49 views
0 votes
1 answer
0 votes
1 answer

How to join two tables (tibbles) by *list* columns in R

You can use the hash from digest ...READ MORE

answered Apr 5, 2018 in Data Analytics by kappa3010
• 2,010 points
36 views

© 2018 Brain4ce Education Solutions Pvt. Ltd. All rights Reserved.
"PMP®","PMI®", "PMI-ACP®" and "PMBOK®" are registered marks of the Project Management Institute, Inc. MongoDB®, Mongo and the leaf logo are the registered trademarks of MongoDB, Inc.