I am using keras in R to fit neural networks to multivariate time series data, generate predictions on test data (a withheld subset of the original data), and estimate RMSE by comparing the predictions to the real data. This works fine for DNNs, GRUs, and LSTMs. I am currently trying to create a CNN-GRU (or CNN-LSTM) with a 1D convolutional layer, after reading some posts (e.g. https://www.kaggle.com/code/davidchilders/time-series-prediction-in-r-keras). After some tinkering I can get this to train just fine.

However, when I make a prediction I get an unexpected result: the output vector of predictions is only a fraction of the length it should be. For example, if I withhold 1000 time steps for test predictions, the output of predict() on the CNN-GRU has a length of around 300. This seems to happen in any model where I use layer_conv_1d(), so it's not about the combination with other types of layers. What is going on?
Here is some example code which reproduces this:
library(tidyverse)
library(keras)
library(reticulate)
# Function to generate sample data
generate_data = function(n_samples) {
  set.seed(123)
  time = seq(1, n_samples)
  covariate1 = rnorm(n_samples, mean = 0, sd = 1)
  covariate2 = rnorm(n_samples, mean = 5, sd = 2)
  covariate3 = rnorm(n_samples, mean = -3, sd = 3)
  target = sin(seq(1, n_samples) * 0.1) + rnorm(n_samples, mean = 0, sd = 0.2)
  data = tibble(Time = time, Covariate1 = covariate1, Covariate2 = covariate2,
                Covariate3 = covariate3, Target = target)
  return(data)
}
# Generate sample data
nsamp = 5000
sample_data = as.matrix(generate_data(nsamp))
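(Quick sanity check, added for this post and not part of the workflow itself: the resulting matrix should be 5000 rows by 5 columns, i.e. Time, the three covariates, and Target.)
dim(sample_data)
# 5000 5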
I'm using standard generator functions (for example, from the Kaggle post referenced above). I've included them here for completeness, so apologies for the extra space; skip down for the model definition, etc.
generator <- function(data, lookback, delay, min_index, max_index,
                      shuffle = FALSE, batch_size, step,
                      predseries) {
  if (is.null(max_index)) max_index <- nrow(data) - delay - 1
  i <- min_index + lookback
  function() {
    if (shuffle) {
      rows <- sample(c((min_index+lookback):max_index), size = batch_size)
    } else {
      if (i + batch_size >= max_index)
        i <<- min_index + lookback
      rows <- c(i:min(i+batch_size, max_index))
      i <<- i + length(rows)
    }
    #samples: one input window per row (rows x lookback/step x features)
    samples <- array(0, dim = c(length(rows),
                                lookback / step,
                                dim(data)[[-1]]))
    #targets: value of the predicted series 'delay' steps ahead
    targets <- array(0, dim = c(length(rows)))
    for (j in 1:length(rows)) {
      indices <- seq(rows[[j]] - lookback, rows[[j]],
                     length.out = dim(samples)[[2]])
      samples[j,,] <- data[indices,]
      targets[[j]] <- data[rows[[j]] + delay, predseries]
    }
    list(samples, targets)
  }
}
#Parameters for generator functions:
#How long of a series to use at a time
lookback = 10
#Use every time point
step = 1
#Number of time steps into the future to predict
delay = 1
#Samples
batch_size = 20
predser = 5 #Index of the label (the Target column)
#Set variables for training, validation, and testing data sets
#Range of training, validation, and test sets:
min_train = 1
max_train = floor(nsamp*2/3)
min_val = max_train+1
max_val = min_val + floor(0.5*(nsamp-max_train))
min_test = max_val+1
max_test = NULL
#Training, validation, and test steps
train_steps = floor( (max_train - min_train - lookback) / batch_size )
val_steps = floor( (max_val - min_val - lookback) / batch_size )
test_steps = floor( (nrow(sample_data) - max_val - lookback) / batch_size )
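For concreteness, these settings give the following split points and step counts (my arithmetic with nsamp = 5000, included to make checking easier):
c(max_train, min_val, max_val, min_test) # 3333 3334 4167 4168
c(train_steps, val_steps, test_steps)    # 166 41 41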
#Training set
train_gen = generator(
  sample_data,
  lookback = lookback,
  delay = delay,
  min_index = min_train,
  max_index = max_train,
  #shuffle = TRUE,
  step = step,
  batch_size = batch_size,
  predseries = predser
)
#Validation set
val_gen = generator(
  sample_data,
  lookback = lookback,
  delay = delay,
  min_index = min_val,
  max_index = max_val,
  step = step,
  batch_size = batch_size,
  predseries = predser
)
#Test set looks at remaining
test_gen = generator(
  sample_data,
  lookback = lookback,
  delay = delay,
  min_index = min_test,
  max_index = NULL,
  step = step,
  batch_size = batch_size,
  predseries = predser
)
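Before training, here's a quick way to sanity-check what the generators return (this snippet is mine, not from the Kaggle post):
batch = train_gen()
str(batch)
# batch[[1]]: samples array, roughly batch_size x (lookback/step) x 5 features
# batch[[2]]: targets vector, one value per sample
Note that each call advances the generator's internal counter, so this consumes one batch.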
Here is a simple version of the model, with a 1D convolutional layer and a dense layer. I assume this must be where I'm missing something, possibly another layer or transformation of some kind?
build_and_compile_model = function() {
  model = keras_model_sequential() %>%
    layer_conv_1d(
      filters = 64,
      kernel_size = 2,
      activation = "relu",
      input_shape = list(NULL, dim(sample_data)[[-1]])
    ) %>%
    layer_max_pooling_1d(pool_size = 3) %>%
    layer_dense(64, activation = 'relu') %>%
    layer_dense(units = 1)
  model %>% compile(
    loss = 'mean_absolute_error',
    optimizer = optimizer_adam()
  )
  model
}
#Build the model
model1 = build_and_compile_model()
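In case it helps diagnose this, summary() on the built model shows (shapes below are what I'd expect given the layer definitions, since input_shape = list(NULL, 5) leaves the number of time steps unspecified) that every layer, including the final dense layer, keeps a time dimension:
summary(model1)
# conv1d (Conv1D)              (None, None, 64)
# max_pooling1d (MaxPooling1D) (None, None, 64)
# dense (Dense)                (None, None, 64)
# dense_1 (Dense)              (None, None, 1)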
#Fit the model to training data
model1 %>% fit(
  train_gen,
  steps_per_epoch = train_steps,
  epochs = 20,
  validation_data = val_gen,
  validation_steps = val_steps
)
Final step: prediction. In this example the test data spans 833 time points, but the variable returned by predict(), test_pred, has only 277 items.
#Generate predictions from test data
test_tmp = sample_data[min_test:nsamp, ]
test_data = array(test_tmp,
                  dim = c(1, dim(test_tmp)[1], dim(test_tmp)[2]))
test_pred = model1 %>% predict(test_data)
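Inspecting the result (the 277 is from my run; if I understand predict()'s output for this architecture correctly, it comes back as a 3D array):
length(test_pred) # 277 -- I expected 833
dim(test_pred)    # presumably 1 277 1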
Thanks so much everyone!