How to standardize rows when cleaning data

Nov 14, 2018 in Data Analytics by Ali

1 answer to this question.

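The goal of this step is to make sure that (1) every row has the same number of fields and (2) the fields are in the same order in every row. In read.table, lines that contain fewer fields than the maximum number of fields detected are padded with NA at the end, which implicitly assumes that the missing fields are always the trailing ones. One advantage of the do-it-yourself approach shown here is that we do not have to make this assumption. The easiest way to standardize rows is to write a function that takes a single character vector (the fields of one record) as input and assigns the values to fixed positions. The function below returns the name, birth date, and death date, in that order, treating a number below 1890 as a birth date and a number above 1890 as a death date.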

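assignFields <- function(x) {
  out <- character(3)
  # get name: the first field containing a letter (NA if there is none)
  i <- grepl("[[:alpha:]]", x)
  out[1] <- if (any(i)) x[i][1] else NA
  # convert once; non-numeric fields such as names become NA,
  # so the coercion warning is suppressed
  num <- suppressWarnings(as.numeric(x))
  # get birth date (if any); the cutoff year 1890 is hard-coded,
  # and a value of exactly 1890 matches neither test
  i <- which(num < 1890)
  out[2] <- if (length(i) > 0) x[i[1]] else NA
  # get death date (if any)
  i <- which(num > 1890)
  out[3] <- if (length(i) > 0) x[i[1]] else NA
  out
}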
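
As a rough usage sketch, the function can be applied to every record with lapply and the results stacked into a character matrix. The three sample records, the column names, and the variable names (records, fieldList, standardFields, M) are made up for illustration and are not part of the original answer. A compact copy of the function is repeated so that the snippet runs on its own.

```r
# Compact copy of assignFields, repeated so this sketch is self-contained.
assignFields <- function(x) {
  out <- character(3)
  num <- suppressWarnings(as.numeric(x))  # names become NA
  is_name <- grepl("[[:alpha:]]", x)
  out[1] <- if (any(is_name)) x[is_name][1] else NA
  birth <- which(num < 1890)
  out[2] <- if (length(birth) > 0) x[birth[1]] else NA
  death <- which(num > 1890)
  out[3] <- if (length(death) > 0) x[death[1]] else NA
  out
}

# Hypothetical raw records: fields may be missing or out of order.
records <- c("Gratt,1861,1892", "Bob,1892", "1871,Emmet,1937")

# Split each record into its fields, then standardize each one.
fieldList <- strsplit(records, split = ",")
standardFields <- lapply(fieldList, assignFields)

# Stack the standardized records row by row into a 3-column matrix.
M <- matrix(unlist(standardFields),
            nrow = length(standardFields), byrow = TRUE)
colnames(M) <- c("name", "birth", "death")  # assumed column labels
print(M)
```

Every row of M now has three fields in the same order. The second record has no birth date, so that cell is NA, and the third record is reordered so that the name comes first.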
answered Nov 14, 2018 by Maverick
