How to achieve pivot like data using tidyverse library in R?

0 votes

Below is the sample data set:

ss <- "   Date          ColAB ColCD
1         2018-01-17    A     C
2         2018-01-17    B     D
3         2018-01-17    A     D     
4         2018-01-17    B     C     
5         2018-01-17    B     C     
6         2018-01-18    A     D     
7         2018-01-18    A     C     
8         2018-01-18    A     C
9         2018-01-18    A     D"

data <- read.table(text=ss, header = TRUE)}

I would like transform the data in such as way that I can see the number of occurrences per day

Date           A        B        C       D
2018-01-17     2        3        3       2
2018-01-18     2        0        1       1

I tired spread function that works great on ColAB column.

data %>%
  group_by(Date, ColAB, ColCD) %>%
  summarise(occr = n()) %>%
  spread(ColAB, occr, fill = 0) %>%
  mutate(occrCD = A+B)

 The result is as follows:

# A tibble: 4 x 5
# Groups:   Date [2]
    Date        ColCD     A     B   occrCD
  <fctr>        <fctr> <dbl> <dbl>  <dbl>
1 2018-01-17      C     1     2      3
2 2018-01-17      D     1     1      2
3 2018-01-18      C     2     0      2
4 2018-01-18      D     2     0      2

But when I tried to spread second time it does not work as intended.

The data for column A (and) B is not added for a particular day for C and D rows. As a result wrong data was given as output

Code for both the steps:

data %>%
  group_by(Date, ColAB, ColCD) %>%
  summarise(occr = n()) %>%
  spread(ColAB, occr, fill = 0) %>% # first spread - result as expected
  mutate(occrCD = A+B) %>%
  spread(ColCD, occrCD, fill = 0) %>% 
# second spread, lost sum for A and B
  group_by(Date) %>%
  summarise_all(sum)

The result that is not what I wanted. The error is visible as A+B should be equal for C +D but that's not the case for 2018-01-18.

# A tibble: 2 x 5
        Date     A     B     C     D
      <fctr> <dbl> <dbl> <dbl> <dbl>
1 2018-01-17     2     3     3     2
2 2018-01-18     2     0     2     2

Any help is highly appreciated!

Apr 4, 2018 in Data Analytics by CodingByHeart77
• 3,680 points

edited Apr 4, 2018 by CodingByHeart77 94 views

1 answer to this question.

0 votes

You need not spread twice, if you want to put all the parameters in a single column.

You can try the following code:

library(dplyr)
library(tidyr)

ss <- "   Date          ColAB ColCD
1         2018-01-17    A     C
2         2018-01-17    B     D
3         2018-01-17    A     D     
4         2018-01-17    B     C     
5         2018-01-17    B     C     
6         2018-01-18    A     D     
7         2018-01-18    A     C     
8         2018-01-18    A     C
9         2018-01-18    A     D"

data <- read.table(text=ss, header = TRUE, stringsAsFactors = F)


data %>%
  gather(v1,value,-Date) %>%
  count(Date, value) %>%
  spread(value, n, fill = 0)

# # A tibble: 2 x 5
#         Date     A     B     C     D
# *      <chr> <dbl> <dbl> <dbl> <dbl>
# 1 2017-05-27     2     3     3     2
# 2 2017-05-28     4     0     2     2
answered Apr 4, 2018 by kappa3010
• 2,020 points

Related Questions In Data Analytics

0 votes
1 answer

How to forecast season and trend of data using STL and ARIMA in R?

You can use the forecast.stl function for the ...READ MORE

answered May 18, 2018 in Data Analytics by DataKing99
• 8,130 points
510 views
0 votes
2 answers
0 votes
0 answers

How to show CLUSTERS AFTER USING Kmeans(data,n) in R?

Hi, I'm new to R. I used ...READ MORE

Jul 12 in Data Analytics by prakash
19 views
0 votes
1 answer

How to change y axis max in time series using R?

The axis limits are being set using ...READ MORE

answered Apr 3, 2018 in Data Analytics by darklord
• 6,140 points
104 views
0 votes
1 answer
0 votes
1 answer

How to use dplyr functions such as filter() inside nested data frames with map()

You can use map() call as follows:  map(full, ...READ MORE

answered Apr 6, 2018 in Data Analytics by darklord
• 6,140 points
235 views
0 votes
1 answer
0 votes
2 answers

How to sort a data frame by columns in R?

You can use dplyr function arrange() like ...READ MORE

answered Aug 21 in Data Analytics by anonymous
• 25,900 points
121 views
0 votes
1 answer
0 votes
1 answer

How to join two tables (tibbles) by *list* columns in R

You can use the hash from digest ...READ MORE

answered Apr 5, 2018 in Data Analytics by kappa3010
• 2,020 points
54 views