Cleaning a Data Frame Using Regexp in R

 NUMERO_APPEL
1          NNA
2     VQ-40989
3        41993
4        41993
5        42597
6     VQ-42597
7         DER8
8   40001-2010

I would like to extract the 5 consecutive digits from strings that match one of the following formats, and only those formats. Every other string should be replaced by NA.

AO-11111
VQ-11111
11111
Nov 13, 2018 in Data Analytics by Ali

1 answer to this question.


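A simple way is to anchor the pattern so that only the three accepted formats match, then keep the captured digits. An unanchored "[0-9]{5}" would also pull 40001 out of "40001-2010", which should be NA.

library(dplyr)
library(stringi)

# Optional "AO-" or "VQ-" prefix, then exactly 5 digits, nothing else
pattern <- "^(?:(?:AO|VQ)-)?([0-9]{5})$"

df %>%
  mutate(NUMERO_APPEL.fix =
           # column 2 of the match matrix is the captured digit group;
           # strings that do not match give NA
           as.numeric(
             stri_match_first_regex(as.character(NUMERO_APPEL), pattern)[, 2]))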
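If you would rather avoid extra packages, here is a minimal base-R sketch of the same cleaning step, run on the sample data from the question. The data frame name df and the new column name NUMERO_APPEL.fix follow the snippet above; the anchored pattern is my reading of the three accepted formats (AO-11111, VQ-11111, 11111), so adjust it if other prefixes are valid.

```r
# Sample data reconstructed from the question
df <- data.frame(
  NUMERO_APPEL = c("NNA", "VQ-40989", "41993", "41993",
                   "42597", "VQ-42597", "DER8", "40001-2010"),
  stringsAsFactors = FALSE
)

# Optional "AO-" or "VQ-" prefix, then exactly 5 digits, nothing else
pattern <- "^((AO|VQ)-)?([0-9]{5})$"

# Which rows match one of the accepted formats
matches <- grepl(pattern, df$NUMERO_APPEL)

# Start with all NA, then fill in the captured digits for matching rows only
digits <- rep(NA_character_, nrow(df))
digits[matches] <- sub(pattern, "\\3", df$NUMERO_APPEL[matches])

df$NUMERO_APPEL.fix <- as.numeric(digits)

print(df)
```

On the sample, rows 2 to 6 give 40989, 41993, 41993, 42597 and 42597, while "NNA", "DER8" and "40001-2010" become NA.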
answered Nov 13, 2018 by Maverick
