Cleaning a Data Frame Using Regexp in R

0 votes
 NUMERO_APPEL
1          NNA
2     VQ-40989
3        41993
4        41993
5        42597
6     VQ-42597
7         DER8
8   40001-2010

I would like to extract the 5 consecutive digits of the strings that have the following format and only the following format, all other strings will be replaced by NAs.

AO-11111
VQ-11111
11111
Nov 13, 2018 in Data Analytics by Ali
• 10,450 points
19 views

1 answer to this question.

0 votes

The simplest way:

library(dplyr)
library(stringi)

df %>%
  mutate(NUMERO_APPEL.fix = 
           NUMERO_APPEL %>% 
             stri_extract_first_regex("[0-9]{5}") %>%
             as.numeric)
answered Nov 13, 2018 by Maverick
• 10,040 points

Related Questions In Data Analytics

0 votes
1 answer

How to change the value of a variable using R programming in a data frame?

Try this: df$symbol <- as.character(df$symbol) df$symbol[df$symb ...READ MORE

answered Jan 11 in Data Analytics by Tyrion anex
• 8,280 points
83 views
0 votes
1 answer

How to sort a data frame by columns in R?

You can just use the order function ...READ MORE

answered Apr 10, 2018 in Data Analytics by darklord
• 6,140 points
74 views
0 votes
1 answer

How to convert a list to data frame in R?

Let's assume your list of lists is ...READ MORE

answered Apr 12, 2018 in Data Analytics by nirvana
• 3,060 points

edited Apr 12, 2018 by nirvana 1,886 views
0 votes
1 answer

How to convert tables to a data frame in R ?

> trial.table.df <- as.data.frame(trial.table) //assuming that trial.table ...READ MORE

answered Apr 20, 2018 in Data Analytics by zombie
• 3,690 points
31 views
0 votes
1 answer

How do I remove unnecessary redundant data from a dataset?

You can use dimensionality reduction methods such as ...READ MORE

answered Nov 13, 2018 in Data Analytics by Maverick
• 10,040 points
31 views
0 votes
1 answer

Clean and standardize words using R

You might want to checkout the stringdist package, e.g.: library(stringdist) toMatch ...READ MORE

answered Nov 13, 2018 in Data Analytics by Maverick
• 10,040 points
17 views
0 votes
1 answer

Cleaning raw data

Try this using read.fwf d <- read.fwf(textConnection( " ...READ MORE

answered Nov 13, 2018 in Data Analytics by Ali
• 10,450 points
20 views
0 votes
1 answer

Getting rid of extra periods - cleaning data using R

Just try removing the periods using sub ...READ MORE

answered Nov 13, 2018 in Data Analytics by Maverick
• 10,040 points
16 views
0 votes
1 answer

Replace comma with a period in data cleaning using R

You can use the scan function in ...READ MORE

answered Nov 13, 2018 in Data Analytics by Maverick
• 10,040 points
30 views
0 votes
1 answer

Manipulate character string using gsub() and perform multivariate data cleaning efficiently in R

gsubfn is perfect for this task: library(gsubfn) as.vector(sapply(gsubfn("[A-Z]", list(B="* 1", ...READ MORE

answered Nov 13, 2018 in Data Analytics by Maverick
• 10,040 points
21 views