How to use group by for multiple columns in dplyr using string vector input in R

Question

I'm trying to learn dplyr and understand the difference between plyr and dplyr, but I've hit one major problem: I can't get group_by to work on multiple columns when their names are stored in a character vector.

Below is my code:

# Make columns with weird names
data = data.frame(
  zzz11def = sample(LETTERS[1:3], 100, replace=TRUE),
  zbc123qws1 = sample(LETTERS[1:3], 100, replace=TRUE),
  value = rnorm(100)
)
# To get the columns I want to average within
columns = names(data)[-3]
# plyr - works
ddply(data, columns, summarize, value=mean(value))
# dplyr -  Does not work! 
data %.%
  group_by(columns) %.%
  summarise(Value = mean(value))

#The error is as follows:
#Error in eval(expr, envir, enclos) : index out of bounds

Can anyone please help me out?

CodingByHeart77 · Answer 1 · Apr 12, 2018

dplyr has added scoped variants of group_by(), such as group_by_at().

These let you pick the grouping columns with the same helper functions you would use with select().

For example:

data = data.frame(
  zzz11def = sample(LETTERS[1:3], 100, replace=TRUE),
  zbc123qws1 = sample(LETTERS[1:3], 100, replace=TRUE),
  value = rnorm(100)
)
# To get the columns I want to average within
columns = names(data)[-3]
library(dplyr)
data1 <- data %>%
  group_by_at(vars(one_of(columns))) %>%
  summarize(Value = mean(value))
# Now compare with the plyr result to check that both approaches agree
data2 <- plyr::ddply(data, columns, plyr::summarize, value=mean(value))
table(data1 == data2, useNA = 'ifany')
## TRUE 
##  27

The output is as expected:

# A tibble: 9 x 3
  zzz11def zbc123qws1       Value
    <fctr>     <fctr>       <dbl>
1        A          A  0.04095002
2        A          B  0.24943935
3        A          C -0.25783892
4        B          A  0.15161805
5        B          B  0.27189974
6        B          C  0.20858897
7        C          A  0.19502221
8        C          B  0.56837548
9        C          C -0.22682998

Note that dplyr::summarize only strips off one layer of grouping at a time, so the resulting tibble is still grouped by the first column (zzz11def).

If you want to avoid this unexpected behavior, add %>% ungroup() to your pipeline after summarize.
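
To see the leftover grouping for yourself, here is a minimal sketch that reuses the data and columns from this answer. The set.seed() call and the names grouped and flat are my own additions for illustration; group_vars() is the dplyr function that reports the current grouping columns.

```r
library(dplyr)

set.seed(1)  # assumption: fixed seed only so the example is reproducible
data <- data.frame(
  zzz11def = sample(LETTERS[1:3], 100, replace = TRUE),
  zbc123qws1 = sample(LETTERS[1:3], 100, replace = TRUE),
  value = rnorm(100)
)
columns <- names(data)[-3]

# Group by both columns, then summarize: only the last grouping layer is removed
grouped <- data %>%
  group_by_at(vars(one_of(columns))) %>%
  summarize(Value = mean(value))

group_vars(grouped)  # still grouped by "zzz11def"

# ungroup() removes the remaining layer
flat <- grouped %>% ungroup()

group_vars(flat)     # no grouping columns left
```

This matters if you go on to call mutate() or another summarize() on the result, since those would otherwise still operate per group.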

score 0 · Answer 2 · Aug 6, 2019

data = data.frame(
  zzz11def = sample(LETTERS[1:3], 100, replace=TRUE),
  zbc123qws1 = sample(LETTERS[1:3], 100, replace=TRUE),
  value = rnorm(100)
)
# To get the columns I want to average within
col = names(data)[-3]
library(dplyr)
# dplyr - works once the column names are passed through group_by_at(vars(...))
data %>%
  group_by_at(vars(col)) %>%
  summarise(Value = mean(value))
detach("package:dplyr", unload = TRUE)
library(plyr)
# plyr - works
ddply(data, col, summarize, value=mean(value))
detach("package:plyr", unload = TRUE)
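
For readers on dplyr 1.0.0 or later, where the scoped verbs such as group_by_at() are superseded, the same grouping can be sketched with across() and all_of(). This is an alternative to the calls above rather than what either answer used. The set.seed() call and the name result are my own additions, and .groups = "drop" is an optional choice here that returns an ungrouped tibble in one step.

```r
library(dplyr)

set.seed(42)  # assumption: fixed seed only so the example is reproducible
data <- data.frame(
  zzz11def = sample(LETTERS[1:3], 100, replace = TRUE),
  zbc123qws1 = sample(LETTERS[1:3], 100, replace = TRUE),
  value = rnorm(100)
)
columns <- names(data)[-3]

# all_of() says explicitly that `columns` is a character vector of column names
result <- data %>%
  group_by(across(all_of(columns))) %>%
  summarise(Value = mean(value), .groups = "drop")

result
```

Using all_of() also avoids ambiguity when a data column happens to share its name with the vector that holds the column names.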