Below is the sample data set:
ss <- " Date ColAB ColCD
1 2018-01-17 A C
2 2018-01-17 B D
3 2018-01-17 A D
4 2018-01-17 B C
5 2018-01-17 B C
6 2018-01-18 A D
7 2018-01-18 A C
8 2018-01-18 A C
9 2018-01-18 A D"
data <- read.table(text=ss, header = TRUE)}
I would like transform the data in such as way that I can see the number of occurrences per day
Date A B C D
2018-01-17 2 3 3 2
2018-01-18 2 0 1 1
I tired spread function that works great on ColAB column.
data %>%
group_by(Date, ColAB, ColCD) %>%
summarise(occr = n()) %>%
spread(ColAB, occr, fill = 0) %>%
mutate(occrCD = A+B)
The result is as follows:
# A tibble: 4 x 5
# Groups: Date [2]
Date ColCD A B occrCD
<fctr> <fctr> <dbl> <dbl> <dbl>
1 2018-01-17 C 1 2 3
2 2018-01-17 D 1 1 2
3 2018-01-18 C 2 0 2
4 2018-01-18 D 2 0 2
But when I tried to spread second time it does not work as intended.
The data for column A (and) B is not added for a particular day for C and D rows. As a result wrong data was given as output
Code for both the steps:
data %>%
group_by(Date, ColAB, ColCD) %>%
summarise(occr = n()) %>%
spread(ColAB, occr, fill = 0) %>% # first spread - result as expected
mutate(occrCD = A+B) %>%
spread(ColCD, occrCD, fill = 0) %>%
# second spread, lost sum for A and B
group_by(Date) %>%
summarise_all(sum)
The result that is not what I wanted. The error is visible as A+B should be equal for C +D but that's not the case for 2018-01-18.
# A tibble: 2 x 5
Date A B C D
<fctr> <dbl> <dbl> <dbl> <dbl>
1 2018-01-17 2 3 3 2
2 2018-01-18 2 0 2 2
Any help is highly appreciated!