Clean and standardize words using R

data <- data.frame(comment = c('scan', 'scanned', 'SCANNED', 'scan and sent', 'FAXED', 'faxed to', 'faxed- pt'))

        comment
1          scan
2       scanned
3       SCANNED
4 scan and sent
5         FAXED
6      faxed to
7     faxed- pt

How can I use R to standardize these values so that the column becomes:

1  scanned
2  scanned
3  scanned
4  scanned
5    faxed
6    faxed
7    faxed
Nov 13, 2018 in Data Analytics by Ali

1 answer to this question.

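You might want to check out the stringdist package. Its amatch() function does approximate (fuzzy) matching against a table of allowed values, and setting maxDist = Inf assigns every input to its nearest candidate, however far away it is, e.g.: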

library(stringdist)

toMatch <- c('scan', 'scanned', 'SCANNED', 'scan and sent', 'FAXED', 'faxed to', 'faxed- pt')
possibleValues <- c("scanned", "faxed")

possibleValues[amatch(x = toMatch, table = possibleValues, maxDist = Inf)]

Returns:

[1] "scanned" "scanned" "scanned" "scanned" "faxed"   "faxed"   "faxed"
answered Nov 13, 2018 by Maverick
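
A possible follow-up to the answer above: edit distances are case-sensitive by default, so an all-caps value such as 'SCANNED' shares no characters with "scanned" as far as the distance is concerned, and the match becomes fragile. The sketch below is not the answerer's code. It swaps in base R's adist() (from the utils package) with ignore.case = TRUE so that no extra package is needed, and writes the result back to the data frame from the question. The column name cleaned is a hypothetical name chosen for illustration.

```r
# Example data from the question
data <- data.frame(
  comment = c("scan", "scanned", "SCANNED", "scan and sent",
              "FAXED", "faxed to", "faxed- pt"),
  stringsAsFactors = FALSE
)

# Canonical labels, as in the answer above
possibleValues <- c("scanned", "faxed")

# Edit-distance matrix: one row per comment, one column per label.
# ignore.case = TRUE makes "SCANNED" and "scanned" compare as equal.
d <- utils::adist(data$comment, possibleValues, ignore.case = TRUE)

# For each row, pick the label with the smallest distance
# (which.min returns the first minimum if there is a tie).
data$cleaned <- possibleValues[apply(d, 1, which.min)]  # 'cleaned' is a hypothetical column name

print(data)
```

If you prefer to stay with stringdist, passing tolower(toMatch) to amatch() instead of toMatch should have a similar effect. Either way, every value is forced onto one of the listed labels, so it is worth inspecting rows with a large distance before trusting the result.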