Cleaning a Data Frame Using Regexp in R

 NUMERO_APPEL
1          NNA
2     VQ-40989
3        41993
4        41993
5        42597
6     VQ-42597
7         DER8
8   40001-2010

I would like to extract the 5 consecutive digits from strings that match one of the following formats, and only those formats. All other strings should be replaced by NA.

AO-11111
VQ-11111
11111
Nov 13, 2018 in Data Analytics by Ali

1 answer to this question.

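A simple way is to use dplyr and stringi with a regex anchored to the whole string. An unanchored "[0-9]{5}" would also pull 40001 out of "40001-2010", which is not one of the allowed formats.

library(dplyr)
library(stringi)

# Match the whole string: optional "AO-" or "VQ-" prefix, then exactly 5 digits.
# Column 2 of the match matrix is the captured digit group; non-matches are NA.
df %>%
  mutate(NUMERO_APPEL.fix = as.numeric(
    stri_match_first_regex(NUMERO_APPEL, "^(?:(?:AO|VQ)-)?([0-9]{5})$")[, 2]
  ))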
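
If you would rather avoid extra packages, a minimal base R sketch of the same idea might look like the following. It rebuilds the sample column from the question, and the helper name `fix_numero` is hypothetical, chosen only for illustration. It assumes the allowed prefixes are exactly "AO-" and "VQ-", as listed in the question.

```r
# Sample column rebuilt from the printout in the question.
df <- data.frame(
  NUMERO_APPEL = c("NNA", "VQ-40989", "41993", "41993",
                   "42597", "VQ-42597", "DER8", "40001-2010"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep the 5 digits only when the whole string is
# "AO-ddddd", "VQ-ddddd" or "ddddd"; everything else becomes NA.
fix_numero <- function(x) {
  x <- as.character(x)                 # also handles factor columns
  pattern <- "^(AO-|VQ-)?([0-9]{5})$"  # anchored at both ends
  ok <- grepl(pattern, x)              # FALSE for non-matches and NA input
  out <- rep(NA_real_, length(x))
  out[ok] <- as.numeric(sub(pattern, "\\2", x[ok]))
  out
}

df$NUMERO_APPEL.fix <- fix_numero(df$NUMERO_APPEL)
df$NUMERO_APPEL.fix
```

Because the pattern is anchored with `^` and `$`, values such as "40001-2010" or "DER8" fall through to NA instead of contributing a partial match.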
answered Nov 13, 2018 by Maverick
