How to use group by for multiple columns in dplyr, using string vector input in R?

0 votes

I'm trying to learn dplyr and understand the difference between plyr and dplyr. But there is one major problem: I'm not able to use the group_by function for multiple columns when the column names are supplied as a character vector.

Below is my code:

# Make columns with weird names
data = data.frame(
  zzz11def = sample(LETTERS[1:3], 100, replace=TRUE),
  zbc123qws1 = sample(LETTERS[1:3], 100, replace=TRUE),
  value = rnorm(100)
)
# To get the columns I want to average within
columns = names(data)[-3]
# plyr - works
ddply(data, columns, summarize, value=mean(value))
# dplyr - does not work!
data %.%
  group_by(columns) %.%
  summarise(Value = mean(value))
# The error is as follows:
# Error in eval(expr, envir, enclos) : index out of bounds

Can anyone please help me out?

Apr 12, 2018 in Data Analytics by nirvana
• 3,060 points
3,585 views

2 answers to this question.

0 votes

dplyr added scoped versions of group_by(), such as group_by_at().

These let you pick the grouping columns with the same selection helpers you would use with select().

For example:

data = data.frame(
  zzz11def = sample(LETTERS[1:3], 100, replace=TRUE),
  zbc123qws1 = sample(LETTERS[1:3], 100, replace=TRUE),
  value = rnorm(100)
)
# To get the columns I want to average within
columns = names(data)[-3]
library(dplyr)
data1 <- data %>%
  group_by_at(vars(one_of(columns))) %>%
  summarize(Value = mean(value))
# Now compare with plyr for better understanding
data2 <- plyr::ddply(data, columns, plyr::summarize, value=mean(value))
table(data1 == data2, useNA = 'ifany')
## TRUE 
##  27 

Printing data1 gives the expected output:

# A tibble: 9 x 3

  zzz11def zbc123qws1       Value
    <fctr>     <fctr>       <dbl>
1        A          A  0.04095002
2        A          B  0.24943935
3        A          C -0.25783892
4        B          A  0.15161805
5        B          B  0.27189974
6        B          C  0.20858897
7        C          A  0.19502221
8        C          B  0.56837548
9        C          C -0.22682998

Note that dplyr::summarize strips off only one layer of grouping at a time, so the resulting tibble is still grouped by the first column (zzz11def).

If you want to avoid this unexpected behavior, add %>% ungroup() to your pipeline after you summarize.
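
To see the leftover grouping for yourself, here is a minimal sketch. It assumes a reasonably recent dplyr; the set.seed() call and the names grouped and flat are my own additions for illustration, not part of the original code.

```r
library(dplyr)

set.seed(1)  # assumed seed, only so the sketch is reproducible
data <- data.frame(
  zzz11def = sample(LETTERS[1:3], 100, replace = TRUE),
  zbc123qws1 = sample(LETTERS[1:3], 100, replace = TRUE),
  value = rnorm(100)
)
columns <- names(data)[-3]

# summarize() drops only the last grouping column by default,
# so the result is still grouped by the first one
grouped <- data %>%
  group_by_at(vars(one_of(columns))) %>%
  summarize(Value = mean(value))
group_vars(grouped)   # "zzz11def"

# ungroup() removes whatever grouping is left
flat <- grouped %>% ungroup()
group_vars(flat)      # character(0)
```

If your dplyr is 1.0.0 or later, passing .groups = "drop" to summarize() should give the same ungrouped result in one step.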

answered Apr 12, 2018 by CodingByHeart77
• 3,690 points

edited Apr 12, 2018 by CodingByHeart77
0 votes
data = data.frame(
  zzz11def = sample(LETTERS[1:3], 100, replace=TRUE),
  zbc123qws1 = sample(LETTERS[1:3], 100, replace=TRUE),
  value = rnorm(100)
)
# To get the columns I want to average within
col = names(data)[-3]
library(dplyr)
# dplyr - works when the character vector is wrapped in vars()
data %>% group_by_at(vars(col)) %>% summarise(Value = mean(value))
detach("package:dplyr", unload = TRUE)
library(plyr)
# plyr - works
ddply(data, col, summarize, value=mean(value))
detach("package:plyr", unload = TRUE)
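
On newer dplyr versions (1.0.0 or later, as far as I know), group_by_at() is marked as superseded. A rough equivalent that takes the same character vector uses across() with all_of(). This is only a sketch under that version assumption; the seed and the name result are mine.

```r
library(dplyr)

set.seed(42)  # assumed seed, only so the sketch is reproducible
data <- data.frame(
  zzz11def = sample(LETTERS[1:3], 100, replace = TRUE),
  zbc123qws1 = sample(LETTERS[1:3], 100, replace = TRUE),
  value = rnorm(100)
)
col <- names(data)[-3]

# all_of() selects exactly the columns named in the character vector;
# across() hands that selection to group_by()
result <- data %>%
  group_by(across(all_of(col))) %>%
  summarise(Value = mean(value), .groups = "drop")

result
```

all_of() raises an error if any name in col is missing from the data, which is usually what you want when the column names come from a variable.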
answered Aug 5 by anonymous
