Is there any easy way to fill in missing data?

I have a table with results from an optimization algorithm, covering 100 runs.

x represents the time, and a row is only stored when an improvement is found.

x1; y1  ; x2 ; y2
1 ; 100 ; 1  ; 150
4 ; 90  ; 2  ; 85
7 ; 85  ; 10 ; 60
10; 80  ;

I am looking for a way to process this easily.

I want to calculate the average at each x-value. For example, the average at x = 4 needs to take into account that for run 2, y at 4 is 85 (the last stored value, carried forward from x = 2).

So the expected output would look like this:

x1; y1  ; x2 ; y2
1 ; 100 ; 1  ; 150
2 ; 100 ; 2  ; 85
4 ; 90  ; 4  ; 85
7 ; 85  ; 7  ; 85
10; 80  ;10 ; 60

I have tried out the below code:

library(ggplot2)
library(zoo)

data1 <- read.table("rundata1", sep = " ", col.names = c("tm1", "score1", "current1"))
data2 <- read.table("rundata1", sep = " ", col.names = c("tm2", "score2", "current2"))

newdata <- merge(data1[, 1:2], data2[, 1:2], by = 1, all = TRUE)
newdata <- newdata[!is.na(newdata$tm1), ]
newdata$score1 <- zoo::na.locf(newdata$score1)
newdata$score2 <- zoo::na.locf(newdata$score2)

It's almost working now, but it shows the following error:

newdata$score2 <- zoo::na.locf(newdata$score2)
Error in `$<-.data.frame`(`*tmp*`, "score2", value = c(40152.6, 40152.6,  : 
  replacement has 11767 rows, data has 11768
Jun 20, 2018 in Data Analytics by CodingByHeart77

1 answer to this question.

You can try the following approach.

First, merge your two runs on the time column; then fill each missing value with the last non-missing one.

I am using na.locf from the zoo package for the fill step.
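As a small illustration of the fill step, here is a sketch of how na.locf behaves on a made-up vector (the values are illustrative, not taken from the asker's files). By default na.locf uses na.rm = TRUE, which drops leading NAs, so the result can be shorter than the input. That is the most likely cause of the "replacement has 11767 rows, data has 11768" error in the question, assuming score2 starts with one NA after the merge.

```r
library(zoo)

# Hypothetical score column whose first value is missing after a merge
y <- c(NA, 150, NA, NA, 60)

# Default (na.rm = TRUE): the leading NA is dropped, so one element is lost
short <- na.locf(y)
length(short)

# na.rm = FALSE keeps the leading NA, so the length matches the input
full <- na.locf(y, na.rm = FALSE)
length(full)
```

Passing na.rm = FALSE when assigning back into a data frame column keeps the lengths aligned; the leading NA simply stays NA because there is no earlier value to carry forward.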

xx <- read.table(text='x1; y1  ; x2 ; y2
1 ; 100 ; 1  ; 150
4 ; 90  ; 2  ; 85
7 ; 85  ; 10 ; 60
10; 80  ;',sep=';',fill=TRUE,header=TRUE)

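# Full outer join of the two runs on the time column
dm <- merge(xx[, 1:2], xx[, 3:4], by = 1, all = TRUE)
# Drop the row with a missing time value (run 2 has fewer rows than run 1)
dm <- dm[!is.na(dm$x1), ]
# Carry the last observed score forward in each run.
# na.rm = FALSE keeps leading NAs so the result has the same length as the column
dm$y1 <- zoo::na.locf(dm$y1, na.rm = FALSE)
dm$y2 <- zoo::na.locf(dm$y2, na.rm = FALSE)
dm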
  x1  y1  y2
1  1 100 150
2  2 100  85
3  4  90  85
4  7  85  85
5 10  80  60
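
Since the goal in the question is the average at each x-value, here is a minimal sketch of that last step on the same sample data. It rebuilds the filled table and adds a hypothetical avg column with rowMeans; the column name avg is my own choice, and na.rm = TRUE in rowMeans is an assumption about how to treat runs that have no value yet at a given time.

```r
library(zoo)

xx <- read.table(text = 'x1; y1  ; x2 ; y2
1 ; 100 ; 1  ; 150
4 ; 90  ; 2  ; 85
7 ; 85  ; 10 ; 60
10; 80  ;', sep = ';', fill = TRUE, header = TRUE)

# Merge both runs on time and carry the last score forward, as above
dm <- merge(xx[, 1:2], xx[, 3:4], by = 1, all = TRUE)
dm <- dm[!is.na(dm$x1), ]
dm$y1 <- na.locf(dm$y1, na.rm = FALSE)
dm$y2 <- na.locf(dm$y2, na.rm = FALSE)

# Mean across runs at each time point; runs with no value yet are skipped
dm$avg <- rowMeans(dm[, c("y1", "y2")], na.rm = TRUE)
dm$avg
```

With 100 runs the same idea should carry over: merge all runs on the time column, fill each score column, and take rowMeans over the score columns.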
answered Jun 20, 2018 by DataKing99