Is there an easy way to fill in missing data?

I have a table with results from an optimization algorithm, covering 100 runs.

x represents the time, and a value is only stored when the run finds an improvement.

x1; y1  ; x2 ; y2
1 ; 100 ; 1  ; 150
4 ; 90  ; 2  ; 85
7 ; 85  ; 10 ; 60
10; 80  ;

I am looking for a method to easily process this, because I want to calculate averages at each x-value. For example, the average at x = 4 needs to take into account that for run 2, y at 4 is 85.

So the expected output would look like this:

x1; y1  ; x2 ; y2
1 ; 100 ; 1  ; 150
2 ; 100 ; 2  ; 85
4 ; 90  ; 4  ; 85
7 ; 85  ; 7  ; 85
10; 80  ; 10 ; 60

I have tried the code below:

library(ggplot2)
library(zoo)

data1 = read.table("rundata1", sep= " ", col.names=c("tm1","score1","current1"))
data2 = read.table("rundata1", sep= " ", col.names=c("tm2","score2","current2"))

newdata<- merge(data1[,1:2],data2[,1:2],by=1,all=T)
newdata <- newdata[!is.na(newdata$tm1),]
newdata$score1 <- zoo::na.locf(newdata$score1)
newdata$score2 <- zoo::na.locf(newdata$score2)

It's almost working now, but it shows the following error:

newdata$score2 <- zoo::na.locf(newdata$score2)
Error in `$<-.data.frame`(`*tmp*`, "score2", value = c(40152.6, 40152.6,  : 
  replacement has 11767 rows, data has 11768

Jun 20, 2018 in Data Analytics by CodingByHeart77

1 answer to this question.

You can try the following approach:

First, merge your 2 runs on the time column, then fill each missing value with the last non-missing value before it (last observation carried forward).

I am using na.locf from the zoo package for this.
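
Here is a minimal sketch of what na.locf does, using a made-up vector (the values in score are illustrative, not taken from the real run files). It also shows a behaviour that most likely explains the error in the question: by default na.locf drops leading NAs (na.rm = TRUE), so if a column starts with NA the result is shorter than the data frame. That would match a replacement of 11767 rows against 11768 rows of data.

```r
library(zoo)

# Hypothetical scores for one run after merging: NA marks time points
# at which this run recorded no improvement.
score <- c(NA, 150, 85, NA, NA, 60)

# Default (na.rm = TRUE): the leading NA has nothing to carry forward
# and is dropped, so the result is one element shorter than the input.
filled_default <- na.locf(score)
length(filled_default)   # 5

# na.rm = FALSE keeps the leading NA, so the length matches the input
# and the assignment back into a data frame column works.
filled_keep <- na.locf(score, na.rm = FALSE)
length(filled_keep)      # 6
```

If the leading NA should also be filled, that needs a separate decision (for example, a starting value for the run); na.locf alone cannot carry a value forward from nothing. The full example on the sample data from the question follows.
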

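# Read the sample data; fill = TRUE pads the short last row with NA
xx <- read.table(text='x1; y1  ; x2 ; y2
1 ; 100 ; 1  ; 150
4 ; 90  ; 2  ; 85
7 ; 85  ; 10 ; 60
10; 80  ;',sep=';',fill=TRUE,header=TRUE)

# Full outer join of the two runs on their time columns
dm <- merge(xx[,1:2], xx[,3:4], by = 1, all = TRUE)
# Drop the row created by the empty x2 cell
dm <- dm[!is.na(dm$x1),]
# Carry the last observed score forward; na.rm = FALSE keeps leading NAs,
# so the result always has as many elements as dm has rows
dm$y1 <- zoo::na.locf(dm$y1, na.rm = FALSE)
dm$y2 <- zoo::na.locf(dm$y2, na.rm = FALSE)
dm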
  x1  y1  y2
1  1 100 150
2  2 100  85
3  4  90  85
4  7  85  85
5 10  80  60
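
Since the goal in the question is an average at each x-value over many runs, here is a rough sketch of how the same idea could be extended beyond two runs without merging them pairwise. It uses only base R. The list runs, the helper fill_run and the column name avg are hypothetical names chosen for illustration, and the sketch assumes the times within each run are sorted in increasing order.

```r
# Hypothetical list of runs; each holds the times (x) at which an
# improvement was recorded and the score (y) at that time.
runs <- list(
  data.frame(x = c(1, 4, 7, 10), y = c(100, 90, 85, 80)),
  data.frame(x = c(1, 2, 10),    y = c(150, 85, 60))
)

# Common grid: every time point that occurs in any run.
grid <- sort(unique(unlist(lapply(runs, `[[`, "x"))))

# For each run, look up the last recorded score at or before each grid
# time. findInterval() returns 0 for grid times before the run's first
# record; those positions become NA.
fill_run <- function(run, grid) {
  idx <- findInterval(grid, run$x)
  idx[idx == 0] <- NA
  run$y[idx]
}

filled <- sapply(runs, fill_run, grid = grid)   # one column per run
result <- data.frame(x = grid, filled, avg = rowMeans(filled, na.rm = TRUE))
result$avg   # 125.0 92.5 87.5 85.0 70.0
```

With na.rm = TRUE in rowMeans, a run that has not started yet at a given time is simply left out of that average; whether that is the right treatment depends on the experiment.
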

answered Jun 20, 2018 by DataKing99
