Clean and standardize words using R

data <- data.frame(comment = c('scan', 'scanned', 'SCANNED', 'scan and sent', 'FAXED', 'faxed to', 'faxed- pt'))

        comment
1          scan
2       scanned
3       SCANNED
4 scan and sent
5         FAXED
6      faxed to
7     faxed- pt

How can I use R to standardize these values so that the data become:

1  scanned
2  scanned
3  scanned
4  scanned
5    faxed
6    faxed
7    faxed
Nov 13, 2018 in Data Analytics by Ali

1 answer to this question.


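You might want to check out the stringdist package. Its amatch() function returns, for each input string, the index of the closest entry in a table of allowed values, e.g.: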

library(stringdist)

toMatch <- c('scan', 'scanned', 'SCANNED', 'scan and sent', 'FAXED', 'faxed to', 'faxed- pt')
possibleValues <- c("scanned", "faxed")

# maxDist = Inf means every input is mapped to its nearest candidate, however distant
possibleValues[amatch(x = toMatch, table = possibleValues, maxDist = Inf)]

Returns:

[1] "scanned" "scanned" "scanned" "scanned" "faxed"   "faxed"   "faxed"
answered Nov 13, 2018 by Maverick
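One caveat with the answer above: string distances are case-sensitive by default, so 'SCANNED' is equally far from "scanned" and "faxed" (every character differs), and it lands on "scanned" only because, as far as the stringdist documentation describes it, amatch() returns the first of tied candidates. Below is a minimal sketch of a case-insensitive variant. It swaps in base R's utils::adist() (generalized Levenshtein distance) instead of stringdist, assumes the only target labels are "scanned" and "faxed", and uses a hypothetical helper name, standardize_comment:

```r
# Hypothetical helper (not part of any package): map each free-text value
# to the nearest allowed label, ignoring case.
standardize_comment <- function(x, allowed) {
  # adist() returns a length(x) by length(allowed) matrix of edit distances.
  # Lower-casing both sides makes 'SCANNED' and 'scanned' identical.
  d <- utils::adist(tolower(x), tolower(allowed))
  # For each row, pick the column with the smallest distance
  # (which.min() returns the first minimum when there is a tie).
  allowed[apply(d, 1, which.min)]
}

data <- data.frame(
  comment = c('scan', 'scanned', 'SCANNED', 'scan and sent',
              'FAXED', 'faxed to', 'faxed- pt'),
  stringsAsFactors = FALSE
)

# Overwrite the column with the standardized labels.
data$comment <- standardize_comment(data$comment, c("scanned", "faxed"))
print(data)
```

Like maxDist = Inf in the answer, this sketch forces every value onto one of the labels, so an unrelated comment would still be classified as "scanned" or "faxed". If that is a concern, a distance cutoff that leaves distant values as NA may be worth adding. Missing values in the input are not handled here.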

© 2018 Brain4ce Education Solutions Pvt. Ltd. All rights Reserved.