How to standardize rows when cleaning data?

Nov 14, 2018 in Data Analytics by Ali

1 answer to this question.

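The goal of this step is to make sure that 1) every row has the same number of fields and 2) the fields are in the right order. When read.table is called with fill = TRUE, lines that contain fewer fields than the maximum number of fields detected are padded at the end, which assumes that any missing fields are always the last ones. One advantage of the do-it-yourself approach shown here is that we do not have to make this assumption. The easiest way to standardize rows is to write a function that takes a single character vector as input (the fields of one row) and assigns the values to fixed positions in the right order. The function below assumes each row holds a name plus, optionally, a birth year (before 1890) and a death year (after 1890).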

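assignFields <- function(x){
  out <- character(3)
  # get name: first field that contains a letter (NA if there is none)
  i <- grepl("[[:alpha:]]", x)
  out[1] <- if (any(i)) x[i][1] else NA
  # convert once; non-numeric fields such as names become NA
  num <- suppressWarnings(as.numeric(x))
  # get birth date (if any); the 1890 cutoff is hard-coded,
  # so a value of exactly 1890 matches neither test
  i <- which(num < 1890)
  out[2] <- if (length(i) > 0) x[i[1]] else NA
  # get death date (if any)
  i <- which(num > 1890)
  out[3] <- if (length(i) > 0) x[i[1]] else NA
  out
}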
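
As a usage sketch, the function can be applied to every row with lapply and the results stacked into a matrix. The list fieldList and its values are made-up sample data (not from the original question), and the function is repeated here in compact form, with guards for missing matches, only so that the block runs on its own.

```r
# Compact copy of assignFields so this sketch is self-contained.
assignFields <- function(x) {
  out <- rep(NA_character_, 3)
  # name: first field containing a letter
  isName <- grepl("[[:alpha:]]", x)
  if (any(isName)) out[1] <- x[isName][1]
  # numeric view of the fields; names become NA without a warning
  num <- suppressWarnings(as.numeric(x))
  born <- which(num < 1890)
  if (length(born) > 0) out[2] <- x[born[1]]
  died <- which(num > 1890)
  if (length(died) > 0) out[3] <- x[died[1]]
  out
}

# Hypothetical sample rows: some fields are missing or out of order.
fieldList <- list(
  c("Alice", "1861", "1892"),
  c("1871", "Bob", "1937"),
  c("Carol", "1892"),
  c("Dave")
)

# Standardize every row, then stack the rows into a 3-column matrix.
standardFields <- lapply(fieldList, assignFields)
M <- matrix(unlist(standardFields), nrow = length(standardFields), byrow = TRUE)
colnames(M) <- c("name", "birth", "death")
print(M)
```

Fields that were absent from a row show up as NA in the matrix, which can then be passed to as.data.frame for further cleaning.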
answered Nov 14, 2018 by Maverick
