How to use group by for multiple columns in dplyr, using string vector input in R?

I'm trying to use dplyr and understand the difference between plyr and dplyr. My main problem is that I can't get the group_by function to work with multiple columns when the column names are supplied as a character vector.

Below is my code:

# Make columns with weird names
data = data.frame(
  zzz11def = sample(LETTERS[1:3], 100, replace=TRUE),
  zbc123qws1 = sample(LETTERS[1:3], 100, replace=TRUE),
  value = rnorm(100)
)
# To get the columns I want to average within
columns = names(data)[-3]
# plyr - works
ddply(data, columns, summarize, value=mean(value))
# dplyr -  Does not work! 
data %.%
  group_by(columns) %.%
  summarise(Value = mean(value))
#The error is as follows:
#Error in eval(expr, envir, enclos) : index out of bounds

Can anyone please help me out?

Apr 12, 2018 in Data Analytics by nirvana

1 answer to this question.

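dplyr added scoped versions of group_by (group_by_at, group_by_if and group_by_all).

These let you use the same selection helpers that you would use with select(), such as one_of(), so a character vector of column names can be passed directly.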

For example:

library(dplyr)

data = data.frame(
  zzz11def = sample(LETTERS[1:3], 100, replace=TRUE),
  zbc123qws1 = sample(LETTERS[1:3], 100, replace=TRUE),
  value = rnorm(100)
)
# To get the columns I want to average within
columns = names(data)[-3]
# Group by every column named in the character vector, then average
data1 <- data %>%
  group_by_at(vars(one_of(columns))) %>%
  summarize(Value = mean(value))
# Compare with the plyr result to confirm that the two approaches agree
data2 <- plyr::ddply(data, columns, plyr::summarize, value=mean(value))
table(data1 == data2, useNA = 'ifany')
## TRUE 
##  27 

The output (data1) is as expected:

# A tibble: 9 x 3
  zzz11def zbc123qws1       Value
    <fctr>     <fctr>       <dbl>
1        A          A  0.04095002
2        A          B  0.24943935
3        A          C -0.25783892
4        B          A  0.15161805
5        B          B  0.27189974
6        B          C  0.20858897
7        C          A  0.19502221
8        C          B  0.56837548
9        C          C -0.22682998
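
In more recent dplyr releases (1.0.0 and later, as far as I know), the scoped verbs such as group_by_at() are marked as superseded, and the same grouping can be written with across() and all_of(). Here is a minimal sketch, assuming dplyr >= 1.0.0 is installed. The names df, cols and res and the values in the data frame are made up for illustration and are not the data from the question.

```r
library(dplyr)

# Small illustrative data frame (hypothetical values)
df <- data.frame(
  g1 = c("A", "A", "B", "B"),
  g2 = c("x", "x", "x", "y"),
  value = c(1, 3, 5, 7)
)

# Grouping columns supplied as a character vector
cols <- c("g1", "g2")

# all_of() selects the columns named in cols; across() hands them to group_by()
res <- df %>%
  group_by(across(all_of(cols))) %>%
  summarise(Value = mean(value))

print(res)
```

Unlike one_of(), all_of() raises an error if a name in the vector is missing from the data, which tends to catch typos in column names early.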

Note that dplyr::summarize strips off only one layer of grouping at a time, so the resulting tibble is still grouped by the first column (zzz11def).

If you want to avoid this potentially unexpected behavior, add %>% ungroup() to your pipeline after summarize.
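
The leftover grouping can be checked directly with group_vars(). The following is a small sketch, assuming a dplyr version that still provides group_by_at(). The data frame df and the names cols, grouped and flat are hypothetical and only serve the illustration.

```r
library(dplyr)

# Small illustrative data frame (hypothetical values)
df <- data.frame(
  g1 = c("A", "A", "B", "B"),
  g2 = c("x", "y", "x", "y"),
  value = c(1, 2, 3, 4)
)
cols <- c("g1", "g2")

# summarise() removes only the last grouping variable (g2)
grouped <- df %>%
  group_by_at(vars(one_of(cols))) %>%
  summarise(Value = mean(value))

group_vars(grouped)  # "g1": the result is still grouped by the first column

# ungroup() drops the remaining grouping
flat <- grouped %>% ungroup()

group_vars(flat)     # character(0)
```

Leaving the grouping in place matters mostly for later steps: a subsequent mutate() or summarise() on grouped would still operate within each g1 group.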

answered Apr 12, 2018 by CodingByHeart77

edited Apr 12, 2018 by CodingByHeart77
