How to standardize rows when cleaning data?

Nov 14, 2018 in Data Analytics by Ali

1 answer to this question.

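The goal of this step is to make sure that 1) every row has the same number of fields and 2) the fields are in the right order. With read.table (when fill = TRUE), lines that contain fewer fields than the maximum number of fields detected are padded with NA at the end, which assumes that any missing fields are the trailing ones. One advantage of the do-it-yourself approach shown here is that we do not have to make this assumption. The easiest way to standardize rows is to write a function that takes a single character vector (the fields of one row) as input and assigns the values in the right order. The function below assumes three fields per row: a name, a birth date and a death date, where years before 1890 are treated as birth dates and years after 1890 as death dates.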

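assignFields <- function(x) {
  out <- character(3)
  # get name: the field containing letters (the first one if several match)
  i <- grepl("[[:alpha:]]", x)
  out[1] <- x[i][1]
  # convert once; non-numeric fields become NA, so suppress the coercion warning
  year <- suppressWarnings(as.numeric(x))
  # get birth date (if any)
  i <- which(year < 1890)
  out[2] <- if (length(i) > 0) x[i[1]] else NA
  # get death date (if any)
  i <- which(year > 1890)
  out[3] <- if (length(i) > 0) x[i[1]] else NA
  out
}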
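
As a rough usage sketch, the function can be applied to every row with lapply and the results bound into a matrix. The input list `rows`, the names in it, and the column names are hypothetical and chosen only for illustration; the function is repeated, with coercion warnings suppressed, so the block runs on its own.

```r
# Field-assignment function from the answer (coercion warnings suppressed).
assignFields <- function(x) {
  out <- character(3)
  i <- grepl("[[:alpha:]]", x)
  out[1] <- x[i][1]
  year <- suppressWarnings(as.numeric(x))
  i <- which(year < 1890)
  out[2] <- if (length(i) > 0) x[i[1]] else NA
  i <- which(year > 1890)
  out[3] <- if (length(i) > 0) x[i[1]] else NA
  out
}

# Hypothetical input: one character vector per row, with fields
# in varying order and one row missing its birth date.
rows <- list(
  c("Alice", "1861", "1892"),
  c("1871", "Emmet", "1937"),
  c("Bob", "1892")
)

# Standardize every row to (name, birth, death).
standardRows <- lapply(rows, assignFields)

# All vectors now have length 3, so they can be bound into a matrix,
# one row per record.
M <- matrix(unlist(standardRows), nrow = length(standardRows), byrow = TRUE)
colnames(M) <- c("name", "birth", "death")
print(M)
```

Because every standardized vector has the same length, the conversion to a matrix (or later to a data frame) does not need any padding rules of its own.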
answered Nov 14, 2018 by Maverick
