Is there any way I can use imputeTS for time series prediction with multiple regression variables? I have blanks (NAs) in y, which is minute-level data, while all my X variables (x1, x2, ..., xn) are continuous and have no NAs.
DateTime        Processed  Avg  1_Q  Median  3_Q
04/01/20 3:22   3          1.8  1    2       2.5
04/01/20 3:23   3          1.6  1    1       2
04/01/20 3:24   1          1.5  1    1       2
04/01/20 3:25   1          1.2  1    1       1
04/01/20 3:28   1          1.1  1    1       1
04/01/20 3:29   1          1.7  1    1.5     2.8
04/01/20 3:32   1          1.6  1    1       2
04/01/20 3:33   2          1.4  1    1       2
04/01/20 3:35   1          1.4  1    1       1.8
04/01/20 3:38   NA         1.4  1    1       2
04/01/20 3:39   2          1.4  1    1       2
04/01/20 3:41   NA         1.2  1    1       2
04/01/20 3:42   NA         1.2  1    1       1.8
04/01/20 3:44   1          1.3  1    1       2
04/01/20 3:45   1          1.2  1    1       1
04/01/20 3:46   1          1.6  1    2       2
04/01/20 3:47   1          1.8  1    2       2
04/01/20 3:48   NA         1.2  -    1       2
04/01/20 3:52   NA         1.3  1    1       1.3
04/01/20 3:53   2          1.9  1    2       2
04/01/20 3:54   1          0.9  1    1       1
04/01/20 3:56   1          1.3  1    1       1
04/01/20 3:57   2          1.1  1    1       1
A complete data set can be found here.
imputeTS is really good for time series imputation (where you exploit the correlations of one variable over time).
In your case there is a lot of useful information in the other variables (inter-variable correlations). imputeTS performs univariate time series imputation, so it looks at each variable and its correlation in time separately.
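To make the univariate point concrete, here is a minimal sketch that imputes Processed from its own history only. It assumes the sample rows from the question are treated as an evenly spaced vector (the timestamps actually have gaps, so this is a simplification), and it falls back to base R's approx() when imputeTS is not installed. The variable names are illustrative.

```r
# Processed values from the sample rows; NA marks the blanks.
processed <- c(3, 3, 1, 1, 1, 1, 1, 2, 1, NA, 2, NA, NA,
               1, 1, 1, 1, NA, NA, 2, 1, 1, 2)

if (requireNamespace("imputeTS", quietly = TRUE)) {
  # Univariate linear interpolation: only Processed itself is used,
  # none of Avg, 1_Q, Median or 3_Q enters the calculation.
  processed_interp <- imputeTS::na_interpolation(processed, option = "linear")
} else {
  # Fallback (assumption: imputeTS unavailable): base R linear interpolation.
  idx <- seq_along(processed)
  obs <- !is.na(processed)
  processed_interp <- approx(idx[obs], processed[obs], xout = idx, rule = 2)$y
}

print(processed_interp)
```

Note that the filled values come purely from neighbouring Processed observations, which is why the information in the other columns goes unused.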
Since your variables Avg, 1_Q, Median and 3_Q seem to be highly correlated with Processed (where your missing data are), another package is probably a better choice. missForest, imputeR and other packages that employ inter-variable correlations (but not inter-time correlations) would be a better fit.
You might get even better results if you come up with your own imputation routine for the missing data. The missing data always seem to be in Processed, and Avg, Median and 3_Q seem to be statistics about Processed. For example, always using Avg rounded to the nearest whole number as the replacement for Processed may already be quite good.
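The rounded-Avg idea could be sketched roughly as follows. This is only an illustration on the sample rows from the question: the data frame name df and the column Processed_imputed are made up for the example, and the agreement check on the observed rows is an added sanity check, not part of the suggestion above.

```r
# Sample rows from the question; NA marks the blank Processed values.
df <- data.frame(
  Processed = c(3, 3, 1, 1, 1, 1, 1, 2, 1, NA, 2, NA, NA,
                1, 1, 1, 1, NA, NA, 2, 1, 1, 2),
  Avg       = c(1.8, 1.6, 1.5, 1.2, 1.1, 1.7, 1.6, 1.4, 1.4, 1.4, 1.4, 1.2,
                1.2, 1.3, 1.2, 1.6, 1.8, 1.2, 1.3, 1.9, 0.9, 1.3, 1.1)
)

missing <- is.na(df$Processed)

# Hypothetical new column: keep observed values, fill the gaps with Avg
# rounded to the nearest whole number.
df$Processed_imputed <- ifelse(missing, round(df$Avg), df$Processed)

# Sanity check: on rows where Processed is known, how often would the
# rounded Avg have reproduced it?
agreement <- mean(round(df$Avg[!missing]) == df$Processed[!missing])
print(agreement)
print(df[missing, ])
```

Running the agreement check on the complete data set before trusting the rule is worthwhile; if it is low, a model-based package such as missForest may be the safer option.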