How to use group by for multiple columns in dplyr using string vector input in R

0 votes

I'm trying to learn dplyr and understand the difference between plyr and dplyr. But there is one major problem: I'm not able to use the group_by function for multiple columns when the column names are stored in a character vector.

Below is my code:

# Make columns with weird names
data = data.frame(
  zzz11def = sample(LETTERS[1:3], 100, replace=TRUE),
  zbc123qws1 = sample(LETTERS[1:3], 100, replace=TRUE),
  value = rnorm(100)
)
# To get the columns I want to average within
columns = names(data)[-3]
# plyr - works
ddply(data, columns, summarize, value=mean(value))
# dplyr -  Does not work! 
data %.%
  group_by(columns) %.%
  summarise(Value = mean(value))
#The error is as follows:
#Error in eval(expr, envir, enclos) : index out of bounds

Can anyone please help me out?

Apr 12, 2018 in Data Analytics by nirvana
• 3,130 points
13,638 views

2 answers to this question.

0 votes

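dplyr added scoped versions of group_by(): group_by_at(), group_by_if() and group_by_all(). In dplyr 1.0.0 and later these are superseded by across(), but they still work.

With group_by_at() you can use the same selection helpers as you would use with select().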

For example:

data = data.frame(
  zzz11def = sample(LETTERS[1:3], 100, replace=TRUE),
  zbc123qws1 = sample(LETTERS[1:3], 100, replace=TRUE),
  value = rnorm(100)
)
# To get the columns I want to average within
columns = names(data)[-3]
library(dplyr)
data1 <- data %>%
  group_by_at(vars(one_of(columns))) %>%
  summarize(Value = mean(value))
# Now compare with plyr to check that both approaches give the same result
data2 <- plyr::ddply(data, columns, plyr::summarize, value=mean(value))
table(data1 == data2, useNA = 'ifany')
## TRUE 
##  27 

The output is as expected:

# A tibble: 9 x 3
  zzz11def zbc123qws1       Value
    <fctr>     <fctr>       <dbl>
1        A          A  0.04095002
2        A          B  0.24943935
3        A          C -0.25783892
4        B          A  0.15161805
5        B          B  0.27189974
6        B          C  0.20858897
7        C          A  0.19502221
8        C          B  0.56837548
9        C          C -0.22682998
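
For readers on dplyr 1.0.0 or later, where the scoped verbs such as group_by_at() are superseded, the same grouping can probably be written with across() and all_of(). This is a minimal sketch and not part of the original answer; the fixed seed and the name result are assumptions added only to make the example reproducible.

```r
library(dplyr)  # assumes dplyr >= 1.0.0, which provides across()

set.seed(42)  # assumption: fixed seed so repeated runs give the same data
data <- data.frame(
  zzz11def   = sample(LETTERS[1:3], 100, replace = TRUE),
  zbc123qws1 = sample(LETTERS[1:3], 100, replace = TRUE),
  value      = rnorm(100)
)

# Character vector holding the names of the grouping columns
columns <- names(data)[-3]

# all_of() marks `columns` as an external character vector of column names;
# .groups = "drop" returns an ungrouped tibble
result <- data %>%
  group_by(across(all_of(columns))) %>%
  summarise(Value = mean(value), .groups = "drop")

print(result)
```

Here all_of() raises an error if any name in columns is missing from the data, which tends to surface typos early.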

Note that dplyr::summarize() only strips off one layer of grouping at a time, so the tibble returned by the group_by_at() pipeline is still grouped by the first column, zzz11def.

If you want to avoid this unexpected behavior, you can add %>% ungroup() to your pipeline after you summarize.
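
The leftover grouping can be inspected with group_vars(). The sketch below reuses the answer's group_by_at() pipeline; the fixed seed and the names summarised and flat are assumptions introduced for illustration.

```r
library(dplyr)

set.seed(1)  # assumption: fixed seed, not part of the original answer
data <- data.frame(
  zzz11def   = sample(LETTERS[1:3], 100, replace = TRUE),
  zbc123qws1 = sample(LETTERS[1:3], 100, replace = TRUE),
  value      = rnorm(100)
)
columns <- names(data)[-3]

# Same pipeline as in the answer, without ungroup()
summarised <- data %>%
  group_by_at(vars(one_of(columns))) %>%
  summarise(Value = mean(value))

# summarise() dropped only the last grouping column,
# so the result is still grouped by zzz11def
group_vars(summarised)

# ungroup() removes the remaining grouping layer
flat <- ungroup(summarised)
group_vars(flat)
```

In dplyr 1.0.0 and later, passing .groups = "drop" to summarise() should have the same effect as the trailing ungroup().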

answered Apr 12, 2018 by CodingByHeart77
• 3,740 points

edited Apr 12, 2018 by CodingByHeart77
0 votes
data = data.frame(
  zzz11def = sample(LETTERS[1:3], 100, replace=TRUE),
  zbc123qws1 = sample(LETTERS[1:3], 100, replace=TRUE),
  value = rnorm(100)
)
# To get the columns I want to average within
col = names(data)[-3]
library(dplyr)
# dplyr - works when the character vector is passed through group_by_at()
data %>% group_by_at(vars(col)) %>% summarise(Value = mean(value))
detach("package:dplyr", unload = TRUE)
library(plyr)
# plyr - works
ddply(data, col, summarize, value=mean(value))
detach("package:plyr", unload = TRUE)
answered Aug 6, 2019 by anonymous
