Out-of-memory error when creating a Gaussian process model because the dataset is too large


I get an out-of-memory error when creating the following Gaussian process model, and I would like to know whether GPflow has a feature for loading the data in batches instead of reading it all at once.

I tried this code:

import gpflow

data = (X, Y)  # roughly 1e6 data points
model = gpflow.models.VGP(
    data,
    kernel=gpflow.kernels.SquaredExponential(),
    likelihood=gpflow.likelihoods.Bernoulli(),
)

but it runs out of memory (OOM).


1 Answer

Answer by STJ:

If you use the SVGP model instead of VGP, you can train the model on data loaded in batches ("mini-batch"). This is demonstrated in this notebook: https://gpflow.github.io/GPflow/2.7.1/notebooks/advanced/gps_for_big_data.html
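As a rough sketch of that approach (the synthetic data, inducing-point count, batch size, and training loop below are my own placeholder choices, not from the question; see the linked notebook for the canonical version):

import numpy as np
import tensorflow as tf
import gpflow

# Placeholder stand-ins for the asker's ~1e6-point binary classification data
N, D, M = 1_000_000, 1, 100  # M inducing points -> memory scales with M, not N
X = np.random.rand(N, D)
Y = (np.random.rand(N, 1) > 0.5).astype(float)
Z = X[np.random.choice(N, M, replace=False)].copy()  # inducing point locations

model = gpflow.models.SVGP(
    kernel=gpflow.kernels.SquaredExponential(),
    likelihood=gpflow.likelihoods.Bernoulli(),
    inducing_variable=Z,
    num_data=N,  # so the minibatch ELBO is rescaled to the full dataset
)

# Stream minibatches instead of handing the whole dataset to the model
batch_size = 256
dataset = tf.data.Dataset.from_tensor_slices((X, Y)).repeat().shuffle(10_000).batch(batch_size)
train_iter = iter(dataset)

optimizer = tf.optimizers.Adam(0.01)
training_loss = model.training_loss_closure(train_iter, compile=True)  # negative ELBO per batch

for step in range(10_000):
    optimizer.minimize(training_loss, model.trainable_variables)

The key point is that memory now scales with the number of inducing points M and the batch size rather than with N.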

If you're only just past the edge of "not enough memory", there might be ways of computing things piece by piece (though I can't advise on how you would do that), but a VGP model for N data points will ultimately still need O(N^2) memory.
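To put a number on that (my own back-of-the-envelope arithmetic, not from the answer):

N = 1_000_000
kernel_matrix_bytes = N * N * 8      # one dense N x N float64 kernel matrix
print(kernel_matrix_bytes / 1e12)    # 8.0 -- about 8 TB, hence the OOM with VGP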