Cleaning a Data Frame Using Regexp in R

  NUMERO_APPEL
1          NNA
2     VQ-40989
3        41993
4        41993
5        42597
6     VQ-42597
7         DER8
8   40001-2010

I would like to extract the five consecutive digits from strings that match one of the following formats, and only those formats. Every other string should be replaced by NA.

AO-11111
VQ-11111
11111
Nov 13, 2018 in Data Analytics by Ali

1 answer to this question.


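The simplest way is to anchor the pattern so that it must match the whole string. An unanchored "[0-9]{5}" would also pull 40001 out of 40001-2010, which should become NA:

library(dplyr)
library(stringi)

df %>%
  mutate(NUMERO_APPEL.fix =
           NUMERO_APPEL %>%
             # optional AO- or VQ- prefix, then exactly 5 digits, nothing else
             stri_match_first_regex("^(?:AO-|VQ-)?([0-9]{5})$") %>%
             # column 2 holds the captured digits; non-matching rows are NA
             .[, 2] %>%
             as.numeric)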
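
For checking the result against the sample column from the question, a base-R sketch without extra packages might look like the following. The data frame name df, the helper names pattern, matched and digits, and the choice of a whole-string anchored pattern are assumptions made for illustration; they are not part of the original answer.

```r
# Sample data reconstructed from the question (df is an assumed name).
df <- data.frame(
  NUMERO_APPEL = c("NNA", "VQ-40989", "41993", "41993",
                   "42597", "VQ-42597", "DER8", "40001-2010"),
  stringsAsFactors = FALSE
)

# Whole-string pattern: optional "AO-" or "VQ-" prefix, then exactly 5 digits.
pattern <- "^(?:AO-|VQ-)?([0-9]{5})$"

# Flag the rows that have one of the accepted formats.
matched <- grepl(pattern, df$NUMERO_APPEL, perl = TRUE)

# Start with NA everywhere, then fill in the captured digits for matching rows.
digits <- rep(NA_character_, nrow(df))
digits[matched] <- sub(pattern, "\\1", df$NUMERO_APPEL[matched], perl = TRUE)

df$NUMERO_APPEL.fix <- as.numeric(digits)
print(df)
```

With this sketch, rows 2 to 6 yield 40989, 41993, 41993, 42597 and 42597, while NNA, DER8 and 40001-2010 become NA.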
answered Nov 13, 2018 by Maverick
