How to standardize rows when cleaning data?

0 votes
How to standardize rows when cleaning data?
Nov 14, 2018 in Data Analytics by Ali
• 10,450 points
15 views

1 answer to this question.

0 votes

The goal of this step is to make sure that 1) every row has the same number of fields and 2) the fields are in the right order. In read.table, lines that contain less fields than the maximum number of fields detected are appended with NA. One advantage of the do-it-yourself approach shown here is that we do not have to make this assumption. The easiest way to standardize rows is to write a function that takes a single character vector as input and assigns the values in the right order.

assignFields <- function(x){
out <- character(3) 
# get names 
i <- grepl("[[:alpha:]]",x)
out[1] <- x[i]
# get birth date (if any)
i <- which(as.numeric(x) < 1890)
out[2] <- ifelse(length(i)>0, x[i], NA)
# get death date (if any)
i <- which(as.numeric(x) > 1890) 
out[3] <- ifelse(length(i)>0, x[i], NA)
out
}
answered Nov 14, 2018 by Maverick
• 10,040 points

Related Questions In Data Analytics

0 votes
1 answer

How to remove rows with missing values (NAs) in a data frame?

You can use complete.cases in the following ...READ MORE

answered Apr 13, 2018 in Data Analytics by darklord
• 6,140 points
3,696 views
0 votes
1 answer

How to subset rows containing NA in a chosen column of a data frame?

I would suggest you, to never to ...READ MORE

answered Apr 26, 2018 in Data Analytics by kappa3010
• 2,010 points
72 views
0 votes
1 answer

How to order data frame rows according to vector with specific order using R?

You can try using match: data <- data.frame(alphabets=letters[1:4], ...READ MORE

answered Apr 30, 2018 in Data Analytics by darklord
• 6,140 points
48 views
0 votes
1 answer

How can I append rows to an R data frame?

Consider a dataSet i.e cicar(present under library ...READ MORE

answered May 8, 2018 in Data Analytics by zombie
• 3,690 points
30 views
0 votes
1 answer

How to remove certain character from a vector

We can use sub to remove the * by specifying fixed = ...READ MORE

answered Nov 14, 2018 in Data Analytics by Maverick
• 10,040 points
39 views
0 votes
1 answer

How to sort a data frame by columns in R?

You can just use the order function ...READ MORE

answered Apr 10, 2018 in Data Analytics by darklord
• 6,140 points
73 views
0 votes
1 answer

Replace comma with a period in data cleaning using R

You can use the scan function in ...READ MORE

answered Nov 13, 2018 in Data Analytics by Maverick
• 10,040 points
28 views
0 votes
1 answer

Look for certain values from not cleaned data

First see what rows meet t$ps04==1 & t$rectyp==1. ...READ MORE

answered Nov 13, 2018 in Data Analytics by Maverick
• 10,040 points
20 views
0 votes
1 answer

an error occurred" because rows do not match when trying to use lm to perform an ANOVA test

Maybe you could do something like this. ...READ MORE

answered Nov 2, 2018 in Data Analytics by Maverick
• 10,040 points
24 views
0 votes
1 answer

Error saying "duplicate 'row.names' are not allowed" when trying to setup my data for the mlogit-package

Take out the chid.var argument in your call to mlogit.data, ...READ MORE

answered Nov 12, 2018 in Data Analytics by Maverick
• 10,040 points
223 views