I have a dataset in which some of the response (Y) values are missing from the training data. How can I set up a GPflow training regime that ignores these missing values?
As a starting point (see https://gpflow.github.io/GPflow/2.8.0/notebooks/getting_started/large_data.html and https://gpflow.github.io/GPflow/2.8.0/notebooks/advanced/multioutput.html), here is a multi-output SVGP set up on the full data:
import numpy as np
import gpflow as gpf

X = np.array(
    [
        [0.70, 0.70], [0.53, 0.81], [0.78, 0.36], [0.83, 0.09], [0.71, 0.55],
        [0.66, 0.75], [0.87, 0.50], [0.63, 0.65], [0.37, 0.90], [0.82, 0.11],
        [0.58, 0.61], [0.93, 0.21], [0.98, 0.18], [0.85, 0.27], [0.64, 0.77],
        [0.49, 0.73], [0.13, 0.82], [0.93, 0.08], [0.65, 0.71], [0.54, 0.83],
        [0.85, 0.20], [0.90, 0.07], [0.00, 0.84], [0.64, 0.81], [0.62, 0.70],
    ]
)
Y = np.array(
    [
        [0.83, 0.42], [0.82, 0.41], [0.60, np.nan], [0.31, np.nan], [0.73, 0.37],
        [0.85, 0.42], [0.70, np.nan], [0.77, np.nan], [0.77, 0.38], [0.34, 0.17],
        [0.73, np.nan], [0.45, 0.22], [0.39, 0.19], [0.51, np.nan], [0.85, 0.42],
        [0.76, 0.38], [0.46, 0.23], [0.27, 0.13], [0.82, 0.41], [0.85, np.nan],
        [0.44, 0.22], [0.26, 0.13], [0.28, 0.14], [0.87, 0.43], [0.81, np.nan],
    ]
)
rng = np.random.default_rng(1234)
n_inducing = 4
# use a random subset of the training inputs as inducing locations
inducing_variable = rng.choice(X, size=n_inducing, replace=False)
# one SquaredExponential kernel per output, with inducing locations shared across outputs
kern_list = [gpf.kernels.SquaredExponential(), gpf.kernels.SquaredExponential()]
kernel = gpf.kernels.SeparateIndependent(kern_list)
iv = gpf.inducing_variables.SharedIndependentInducingVariables(
    gpf.inducing_variables.InducingPoints(inducing_variable)
)
m = gpf.models.SVGP(
    kernel, gpf.likelihoods.Gaussian(), inducing_variable=iv, num_latent_gps=2
)
optimizer = gpf.optimizers.Scipy()
optimizer.minimize(
    m.training_loss_closure((X, Y)),  # data is passed as a single (X, Y) tuple
    variables=m.trainable_variables,
    method="l-bfgs-b",
    options={"disp": 50, "maxiter": 50000, "maxfun": 1000000},
)