How to use a function to repeat a set of procedures on specific set of columns in a data frame

0 votes

Below is my data frame:

sample_data
# A tibble: 10 x 7
      REVENUEID AMOUNT  YEAR REPORT_CODE PAYMENT_METHOD INBOUND_CHANNEL  AMOUNT_CAT
          <chr>  <dbl> <chr>       <chr>          <chr>           <chr>      <fctr>
1 rev-24985629     30  FY18           S          Check            Mail     [25,50)
2 rev-22812413      1  FY16           Q          Other      Canvassing   [0.01,10)
3 rev-23508794    100  FY17           Q    Credit card             Web   [100,250)
4 rev-23506121    300  FY17           S    Credit card            Mail   [250,500)
5 rev-23550444    100  FY17           S    Credit card             Web   [100,250)
6 rev-21508672     25  FY14           J          Check            Mail     [25,50)
7 rev-24981769    500  FY18           S    Credit card             Web [500,1e+03)
8 rev-23503684     50  FY17           R          Check            Mail     [50,75)
9 rev-24982087     25  FY18           R          Check            Mail     [25,50)
10 rev-24979834     50  FY18           R    Credit card             Web     [50,75)

Here is my code:

MOUNT_CAT<- sample_data %>% group_by(AMOUNT_CAT,YEAR) %>% summarize(num=n(),total=sum(AMOUNT)) %>% rename(REPORT_VALUE=AMOUNT_CAT) %>% mutate(REPORT_CATEGORY="AMOUNT_CAT")
INBOUND_CHANNEL<- sample_data %>% group_by(INBOUND_CHANNEL,YEAR) %>% summarize(num=n(),total=sum(AMOUNT)) %>% rename(REPORT_VALUE=INBOUND_CHANNEL) %>% mutate(REPORT_CATEGORY="INBOUND_CHANNEL")
PAYMENT_METHOD<- sample_data %>% group_by(PAYMENT_METHOD,YEAR) %>% summarize(num=n(),total=sum(AMOUNT)) %>% rename(REPORT_VALUE=PAYMENT_METHOD) %>% mutate(REPORT_CATEGORY="PAYMENT_METHOD")
REPORT_CODE<- sample_data %>% group_by(REPORT_CODE,YEAR) %>% summarize(num=n(),total=sum(AMOUNT)) %>% rename(REPORT_VALUE=REPORT_CODE) %>% mutate(REPORT_CATEGORY="REPORT_CODE")
final_product<-bind_rows(REPORT_CODE,PAYMENT_METHOD,INBOUND_CHANNEL,AMOUNT_CAT)

Here is the final product of that code:

final_product
        # A tibble: 27 x 5
        # Groups:   REPORT_VALUE [16]
           REPORT_CATEGORY REPORT_VALUE  YEAR   num total

                 <chr>        <chr> <chr> <int> <dbl>
     1     REPORT_CODE            J  FY14     1    25
     2     REPORT_CODE            Q  FY16     1     1
     3     REPORT_CODE            Q  FY17     1   100
     4     REPORT_CODE            R  FY17     1    50
     5     REPORT_CODE            R  FY18     2    75
     6     REPORT_CODE            S  FY17     2   400
     7     REPORT_CODE            S  FY18     2   530
     8  PAYMENT_METHOD        Check  FY14     1    25
     9  PAYMENT_METHOD        Check  FY17     1    50
    10  PAYMENT_METHOD        Check  FY18     2    55
    # ... with 17 more rows

This is my attempt to condense the code to make it more efficient. (This code doesn't work):

cat.list <- c("REPORT_CODE","PAYMENT_METHOD","INBOUND_CHANNEL","AMOUNT_CAT")
repeat_procs <- lapply(cat.list, function(x) x <- sample_data %>% group_by(x,YEAR) %>% summarize(num=n(),total=sum(AMOUNT)) %>% rename(REPORT_VALUE=x) %>% mutate(REPORT_CATEGORY="x")

Can someone please tell me how to write "more efficient" code that doesn't repeat that often?

Apr 3, 2018 in Data Analytics by DataKing99
• 8,240 points

edited Apr 3, 2018 by DataKing99 1,243 views

1 answer to this question.

0 votes

You can parse the strings to symbols. Also, your cat.list is already a string vector, so there is no need to add double quotes around x in mutate.

Below is the code you can refer to:

library(dplyr)
library(rlang)
cat.list <- c("REPORT_CODE","PAYMENT_METHOD","INBOUND_CHANNEL","AMOUNT_CAT")
repeat_procs <- lapply(cat.list, function(x){
  final_data <- sample_data %>% 
    group_by(!!sym(x), YEAR) %>% 
    summarize(num=n(),total=sum(AMOUNT)) %>% 
    rename(REPORT_VALUE=!!sym(x)) %>% 
    mutate(REPORT_CATEGORY=x)
}) %>%
  bind_rows()

Result would be as follows:

> repeat_procs
# A tibble: 27 x 5
# Groups:   REPORT_VALUE [16]
   REPORT_VALUE   YEAR   num total REPORT_CATEGORY
          <chr> <fctr> <int> <int>           <chr>
 1            J   FY14     1    25     REPORT_CODE
 2            Q   FY16     1     1     REPORT_CODE
 3            Q   FY17     1   100     REPORT_CODE
 4            R   FY17     1    50     REPORT_CODE
 5            R   FY18     2    75     REPORT_CODE
 6            S   FY17     2   400     REPORT_CODE
 7            S   FY18     2   530     REPORT_CODE
 8        Check   FY14     1    25  PAYMENT_METHOD
 9        Check   FY17     1    50  PAYMENT_METHOD
10        Check   FY18     2    55  PAYMENT_METHOD
# ... with 17 more rows
answered Apr 3, 2018 by kappa3010
• 2,090 points

Related Questions In Data Analytics

0 votes
1 answer

How to spilt a column of a data frame into multiple columns

it is easily achievable by using "stringr" ...READ MORE

answered Apr 9, 2018 in Data Analytics by DeepCoder786
• 1,720 points
1,432 views
+1 vote
2 answers

How to sort a data frame by columns in R?

You can use dplyr function arrange() like ...READ MORE

answered Aug 21, 2019 in Data Analytics by anonymous
• 33,030 points
1,401 views
+1 vote
2 answers

How to replace a value in a data frame based on a conditional 'If' statement?

It's easier to convert alpha to characters ...READ MORE

answered Jun 6, 2018 in Data Analytics by Sahiti
• 6,370 points
36,254 views
0 votes
2 answers

How to arrange a data set in ascending order based on a variable?

In your case it'll be, orderedviews = arrange(movie_views, ...READ MORE

answered Nov 27, 2018 in Data Analytics by Kalgi
• 52,360 points
825 views
0 votes
5 answers

How to remove NA values with dplyr::filter()

Try this: df %>% filter(!is.na(col1)) READ MORE

answered Mar 26, 2019 in Data Analytics by anonymous
318,765 views
0 votes
1 answer

How can I use parallel so that it preserves the list of data frames

You can use pmap as follows: nc <- ...READ MORE

answered Apr 4, 2018 in Data Analytics by kappa3010
• 2,090 points
758 views
0 votes
1 answer
0 votes
1 answer

How to join two tables (tibbles) by *list* columns in R

You can use the hash from digest ...READ MORE

answered Apr 6, 2018 in Data Analytics by kappa3010
• 2,090 points
1,387 views
0 votes
2 answers

How to subset rows containing NA in a chosen column of a data frame?

You can give this a try. subset(dataframe, is.na(dataframe$col2)) ...READ MORE

answered Aug 21, 2019 in Data Analytics by anonymous
• 33,030 points
9,750 views
0 votes
1 answer

How to drop factor levels in a subsetted data frame?

You can use factor(ff) to drop levels ...READ MORE

answered Apr 17, 2018 in Data Analytics by kappa3010
• 2,090 points

edited Apr 17, 2018 by kappa3010 5,023 views
webinar REGISTER FOR FREE WEBINAR X
REGISTER NOW
webinar_success Thank you for registering Join Edureka Meetup community for 100+ Free Webinars each month JOIN MEETUP GROUP