Cleaning raw data

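I have a dataset that needs to be cleaned. Each block starts with a line holding an ID, followed by six rows of seven single-character fields: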

        1
    *******
    *******
    *******
    *******
      S  H
     HHHHH
        2
    *******
    JSH   K
    *******
    *******
    *******
    *******

This is how it's supposed to look, with blanks turned into NA:

 ID   a1   a2 a3   a4   a5   a6   a7
1   1    *    *  *    *    *    *    *
2   1    *    *  *    *    *    *    *
3   1    *    *  *    *    *    *    *
4   1    *    *  *    *    *    *    *
5   1 <NA> <NA>  S <NA> <NA>    H <NA>
6   1 <NA>    H  H    H    H    H <NA>
7   2    *    *  *    *    *    *    *
8   2    J    S  H <NA> <NA> <NA>    K
9   2    *    *  *    *    *    *    *
10  2    *    *  *    *    *    *    *
11  2    *    *  *    *    *    *    *
12  2    *    *  *    *    *    *    *
Nov 13, 2018 in Data Analytics by Maverick

1 answer to this question.

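Try read.fwf, which reads fixed-width fields. Read every character as its own column, take the ID from the first line of each 7-line block, then drop those ID lines: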

d <- read.fwf(textConnection(
"    1  
*******
*******
*******
*******
  S  H 
 HHHHH 
    2  
*******
JSH   K
*******
*******
*******
*******"), 
    widths = rep(1, 7),
    na.strings = " ",
    stringsAsFactors = FALSE)

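# Every 7th line, starting at line 1, is a header holding the ID in column 5
header <- seq(1, nrow(d), 7)
id <- as.numeric(d[header, 5])
id <- rep(id, each = 6)  # six data rows follow each header

# Drop the header lines (note the minus sign) and attach the ID
d <- d[-header, ]
d <- cbind(id, d)
names(d)[-1] <- paste0("a", 1:7)
d

   id   a1   a2   a3   a4   a5   a6   a7
2   1    *    *    *    *    *    *    *
3   1    *    *    *    *    *    *    *
4   1    *    *    *    *    *    *    *
5   1    *    *    *    *    *    *    *
6   1 <NA> <NA>    S <NA> <NA>    H <NA>
7   1 <NA>    H    H    H    H    H <NA>
9   2    *    *    *    *    *    *    *
10  2    J    S    H <NA> <NA> <NA>    K
11  2    *    *    *    *    *    *    *
12  2    *    *    *    *    *    *    *
13  2    *    *    *    *    *    *    *
14  2    *    *    *    *    *    *    *

The row names keep the original line numbers; run rownames(d) <- NULL to renumber them 1 to 12.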
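
The code above relies on every ID being followed by exactly six rows. If the block sizes can vary, one possible alternative is to detect the ID lines and carry each ID down to the rows below it with cumsum. This is only a minimal sketch under two assumptions: an ID line contains nothing but digits and spaces, and the first line is an ID line. The names clean_blocks and n_fields are hypothetical and not part of the original answer.

```r
# Hypothetical helper: split raw lines into ID blocks of any length.
clean_blocks <- function(lines, n_fields = 7) {
  # Assumption: an ID (header) line holds only digits and spaces
  is_header <- grepl("^ *[0-9]+ *$", lines)
  stopifnot(is_header[1])  # assumption: the data starts with an ID line

  # Each line takes the ID of the most recent header at or above it
  ids <- as.numeric(trimws(lines[is_header]))[cumsum(is_header)]

  # One character per column; positions past the end of a line come back as ""
  fields <- t(vapply(
    lines[!is_header],
    function(x) substring(x, seq_len(n_fields), seq_len(n_fields)),
    character(n_fields),
    USE.NAMES = FALSE
  ))
  fields[fields %in% c(" ", "")] <- NA  # blanks become NA

  out <- data.frame(id = ids[!is_header], fields, stringsAsFactors = FALSE)
  names(out)[-1] <- paste0("a", seq_len(n_fields))
  out
}

# Small example with blocks of different lengths (2 rows, then 3 rows)
raw <- c("    1  ", "*******", "  S  H ",
         "    2  ", "JSH   K", "*******", "*******")
res <- clean_blocks(raw)
```

Here res has five rows, with id 1 for the first two and id 2 for the remaining three. A data row made up only of digits and spaces would be mistaken for an ID line, so the header pattern may need adjusting for other files.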
answered Nov 13, 2018 by Ali