Cleaning data using R


I'm trying to clean a dataset. This is my dataset:

"id","gender","age","category1","category2","category3","category4","category5","category6","category7","category8","category9","category10"
1,"Male",22,"movies","music","travel","cloths","grocery",,,,,
2,"Male",28,"travel","books","movies",,,,,,,
3,"Female",27,"rent","fuel","grocery","cloths",,,,,,
4,"Female",22,"rent","grocery","travel","movies","cloths",,,,,
5,"Female",22,"rent","online-shopping","utility",,,,,,,

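I want to reshape it into a long table like this, with one row per id and category, where rank is the position in which that person listed the category (NA if they did not list it):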

id  gender  age  category         rank
1   Male    22   movies           1
1   Male    22   music            2
1   Male    22   travel           3
1   Male    22   cloths           4
1   Male    22   grocery          5
1   Male    22   books            NA
1   Male    22   rent             NA
1   Male    22   fuel             NA
1   Male    22   utility          NA
1   Male    22   online-shopping  NA
...
5   Female  22   movies           NA
5   Female  22   music            NA
5   Female  22   travel           NA
5   Female  22   cloths           NA
5   Female  22   grocery          NA
5   Female  22   books            NA
5   Female  22   rent             1
5   Female  22   fuel             NA
5   Female  22   utility          3
5   Female  22   online-shopping  2

How do I achieve it?

Nov 13, 2018 in Data Analytics by Ali

1 answer to this question.


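Try reshaping the data from wide to long with melt() from the reshape2 package. The number in each column name (category1, category2, ...) becomes the rank: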

text1='"id","gender","age","category1","category2","category3","category4","category5","category6","category7","category8","category9","category10"
1,"Male",22,"movies","music","travel","cloths","grocery",,,,,
2,"Male",28,"travel","books","movies",,,,,,,
3,"Female",27,"rent","fuel","grocery","cloths",,,,,,
4,"Female",22,"rent","grocery","travel","movies","cloths",,,,,
5,"Female",22,"rent","online-shopping","utility",,,,,,,'
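d1 <- read.table(text = text1, sep = ",", header = TRUE, as.is = TRUE)

library(reshape2)
# Wide to long: one row per id and category column
d2 <- melt(d1, id.vars = c("id", "gender", "age"))
# melt() calls the new columns "variable" and "value"; rename them
names(d2)[4] <- "rank"
names(d2)[5] <- "category"
# Turn "category3" into 3
d2$rank <- as.integer(gsub("category", "", d2$rank))
head(d2)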
#   id gender age rank category
# 1  1   Male  22    1   movies
# 2  2   Male  28    1   travel
# 3  3 Female  27    1     rent
# 4  4 Female  22    1     rent
# 5  5 Female  22    1     rent
# 6  1   Male  22    2    music
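
The melt() result still has rows for the empty category slots, and it has no NA-rank rows for categories a person did not list. Here is a minimal base R sketch of one way to get closer to the table in the question. It assumes "utiliy" in the sample data was meant to be "utility", and the names id_cols, cat_cols, long, grid and result are only illustrative. It may need adjusting for real data.

```r
# Sample data from the question ("utility" spelling is an assumption).
text1 <- '"id","gender","age","category1","category2","category3","category4","category5","category6","category7","category8","category9","category10"
1,"Male",22,"movies","music","travel","cloths","grocery",,,,,
2,"Male",28,"travel","books","movies",,,,,,,
3,"Female",27,"rent","fuel","grocery","cloths",,,,,,
4,"Female",22,"rent","grocery","travel","movies","cloths",,,,,
5,"Female",22,"rent","online-shopping","utility",,,,,,,'
d1 <- read.csv(text = text1, stringsAsFactors = FALSE)

id_cols  <- c("id", "gender", "age")
cat_cols <- grep("^category[0-9]+$", names(d1), value = TRUE)

# Stack the category columns: one row per person and category slot,
# with the rank taken from the number in the column name.
long <- do.call(rbind, lapply(cat_cols, function(col) {
  data.frame(d1[id_cols],
             category = as.character(d1[[col]]),
             rank = as.integer(sub("category", "", col)),
             stringsAsFactors = FALSE)
}))

# Drop the empty slots (read in as "" or NA depending on the column).
long <- long[!is.na(long$category) & long$category != "", ]

# Cross join: every person paired with every category seen in the data.
all_cats <- data.frame(category = unique(long$category),
                       stringsAsFactors = FALSE)
grid <- merge(d1[id_cols], all_cats, by = NULL)

# Left join the observed ranks; unlisted categories get rank NA.
result <- merge(grid, long, by = c(id_cols, "category"), all.x = TRUE)
result <- result[order(result$id, result$rank), ]
rownames(result) <- NULL

head(result, 10)
```

Within each id the rows are sorted by rank, with the NA rows last. If the category order from the question matters, category can be converted to a factor with the desired levels before sorting.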
answered Nov 13, 2018 by Maverick

