Clean and standardize words using R

data<-data.frame(comment=c('scan','scanned','SCANNED','scan and sent','FAXED','faxed to','faxed- pt'))


1          scan
2       scanned
3       SCANNED
4 scan and sent
5         FAXED
6      faxed to
7     faxed- pt

How can I use R to clean the data so that it becomes:

1  scanned
2  scanned
3  scanned
4  scanned
5    faxed
6    faxed
7    faxed
Nov 13, 2018 in Data Analytics by Ali


You might want to check out the stringdist package, e.g.:

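library(stringdist)

toMatch <- c('scan', 'scanned', 'SCANNED', 'scan and sent', 'FAXED', 'faxed to', 'faxed- pt')
possibleValues <- c("scanned", "faxed")

# amatch() returns, for each element of x, the index of the closest string in
# table; maxDist = Inf means every element is matched to some value.
possibleValues[amatch(x = toMatch, table = possibleValues, maxDist = Inf)]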

Returns:

[1] "scanned" "scanned" "scanned" "scanned" "faxed"   "faxed"   "faxed"
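To clean the column of the original data frame in place, the same call can be applied to data$comment. This sketch also lower-cases the input with tolower() first, which is an addition not in the code above: with amatch()'s default "osa" distance the comparison is case-sensitive, so 'SCANNED' is 7 edits away from both "scanned" and "faxed" and only resolves to "scanned" because amatch() returns the first of tied matches.

```r
library(stringdist)

data <- data.frame(comment = c('scan', 'scanned', 'SCANNED', 'scan and sent',
                               'FAXED', 'faxed to', 'faxed- pt'),
                   stringsAsFactors = FALSE)
possibleValues <- c("scanned", "faxed")

# Lower-case first so 'SCANNED' is an exact match for "scanned"
# instead of being equally distant from both candidates.
idx <- amatch(x = tolower(data$comment), table = possibleValues, maxDist = Inf)

# Overwrite the column with the closest standard value.
data$comment <- possibleValues[idx]
data
```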
answered Nov 13, 2018 by Maverick
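If the stringdist dependency is not wanted, a base-R alternative is keyword matching with grepl(). This is a different technique from the fuzzy matching above. The helper standardize_comment() is hypothetical, and the prefixes "^scan" and "^fax" are assumptions about how the comments start.

```r
# Hypothetical helper: map each comment to a standard value by
# matching an assumed keyword prefix at the start of the string.
standardize_comment <- function(x) {
  x <- tolower(trimws(x))
  out <- rep(NA_character_, length(x))  # comments matching no keyword stay NA
  out[grepl("^scan", x)] <- "scanned"
  out[grepl("^fax", x)]  <- "faxed"
  out
}

data <- data.frame(comment = c('scan', 'scanned', 'SCANNED', 'scan and sent',
                               'FAXED', 'faxed to', 'faxed- pt'),
                   stringsAsFactors = FALSE)
data$comment <- standardize_comment(data$comment)
data
```

Unlike amatch() with maxDist = Inf, which always assigns the nearest value, this version leaves comments that match neither keyword as NA, so unexpected entries stay visible instead of being forced into a category.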
