How to subset data so that it contains only columns whose names match a condition

Is there a way to subset a data frame based on column names that start with a particular string?

Suppose I have columns like ABC_1, ABC_2, ABC_3 and others like XYZ_1, XYZ_2, XYZ_3.

How do I subset the data frame to the columns whose names contain either ABC or XYZ?

I don't want to use indices, since the columns are scattered throughout the data.

Also, how do I keep only the rows where at least one of these columns has a value > 0?
Apr 26, 2018 in Data Analytics by CodingByHeart77

1 answer to this question.


You can use grepl on the column names of the data frame.

grepl matches a regular expression to a target and returns TRUE if a match is found and FALSE otherwise.

#  Data
data <- data.frame( ABC_1 = runif(3),
            ABC_2 = runif(3),
            XYZ_1 = runif(3),
            XYZ_2 = runif(3) )

#      ABC_1     ABC_2     XYZ_1     XYZ_2
#1 0.3792645 0.3614199 0.9793573 0.7139381
#2 0.1313246 0.9746691 0.7276705 0.0126057
#3 0.7282680 0.6518444 0.9531389 0.9673290

#  Use grepl
data[ , grepl( "ABC" , names( data ) ) ]
#      ABC_1     ABC_2
#1 0.3792645 0.3614199
#2 0.1313246 0.9746691
#3 0.7282680 0.6518444

#  grepl returns a logical vector
grepl( "ABC" , names( data ) )
#[1]  TRUE  TRUE FALSE FALSE
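
The question asks for names starting with ABC or XYZ, which the single pattern above does not cover. A minimal sketch of one way to do it, assuming a small made-up data frame (the columns `other` and `my_ABC` are hypothetical, added only to show what gets excluded): the `^` anchors the match to the start of the name and `|` means "or".

```r
# Hypothetical example data; 'other' and 'my_ABC' are here only to show exclusions.
data <- data.frame(ABC_1 = 1:3, other = 4:6, XYZ_1 = 7:9, my_ABC = 10:12)

# "^" anchors the match at the start of the name; "|" separates the alternatives.
keep <- grepl("^(ABC|XYZ)", names(data))

# drop = FALSE keeps the result a data frame even if only one column matches.
subset_cols <- data[, keep, drop = FALSE]

names(subset_cols)
```

Dropping the `^` would match ABC or XYZ anywhere in the name, so `my_ABC` would then be kept as well.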

To answer the second part of the question, build the column subset first, then create a logical vector that marks the rows to keep:

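# Note: the values shown below come from R < 3.6.0; newer versions use a
# different default sampling method, so the same seed may give other values.
set.seed(1)
data <- data.frame( ABC_1 = sample(0:1,3,replace = TRUE),
            ABC_2 = sample(0:1,3,replace = TRUE),
            XYZ_1 = sample(0:1,3,replace = TRUE),
            XYZ_2 = sample(0:1,3,replace = TRUE) )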

# We want to discard the second row because 'all' ABC values are 0:
#  ABC_1 ABC_2 XYZ_1 XYZ_2
#1     0     1     1     0
#2     0     0     1     0
#3     1     1     1     0


#  drop = FALSE keeps a data frame even when only one column matches
data1 <- data[ , grepl( "ABC" , names( data ) ) , drop = FALSE ]

ind <- apply( data1 , 1 , function(x) any( x > 0 ) )

data1[ ind , ]
#  ABC_1 ABC_2
#1     0     1
#3     1     1
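
As a possible alternative to apply, the row filter can also be written with rowSums, and combined with a pattern that covers both prefixes from the question. This is only a sketch on made-up data (the `id` column and the values are assumptions for illustration), not the answer's original method.

```r
# Hypothetical data: row 2 has no value > 0 in any ABC or XYZ column.
data <- data.frame(ABC_1 = c(0, 0, 1), ABC_2 = c(1, 0, 1),
                   XYZ_1 = c(0, 0, 1), XYZ_2 = c(0, 0, 0), id = 1:3)

# Keep only the columns whose names start with ABC or XYZ.
sel <- data[, grepl("^(ABC|XYZ)", names(data)), drop = FALSE]

# sel > 0 is a logical matrix; rowSums counts the TRUE values in each row,
# so a count above zero means at least one value in that row is > 0.
ind <- rowSums(sel > 0) > 0

result <- sel[ind, , drop = FALSE]
result
```

If the columns can contain NA, passing `na.rm = TRUE` to rowSums avoids NA entries in the index.
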
answered Apr 26, 2018 by darklord