I get an out-of-memory error when creating the following Gaussian process model, and I would like to know whether GPflow has a feature that allows loading the data in batches instead of reading it all at once.
With this code:

import gpflow

data = (X, Y)  # roughly 1e6 data points
model = gpflow.models.VGP(
    data,
    kernel=gpflow.kernels.SquaredExponential(),
    likelihood=gpflow.likelihoods.Bernoulli(),
)

I encounter an OOM error.
If you use the SVGP model instead of VGP, you can train the model on mini-batches of data. This is demonstrated in this notebook: https://gpflow.github.io/GPflow/2.7.1/notebooks/advanced/gps_for_big_data.html
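A minimal sketch of what that can look like, assuming X is an (N, 1) array and Y holds {0, 1} labels as in your snippet; the inducing-point count, batch size, learning rate, and number of steps are illustrative assumptions, not recommendations:

import numpy as np
import tensorflow as tf
import gpflow

N = 100_000  # stand-in for your ~1e6 points
X = np.random.rand(N, 1)
Y = (np.random.rand(N, 1) > 0.5).astype(np.float64)

M = 100              # number of inducing points (a hyperparameter to tune)
Z = X[:M, :].copy()  # initialise inducing locations from a subset of the data

model = gpflow.models.SVGP(
    kernel=gpflow.kernels.SquaredExponential(),
    likelihood=gpflow.likelihoods.Bernoulli(),
    inducing_variable=Z,
    num_data=N,  # needed so the mini-batch ELBO estimate is scaled correctly
)

# Stream the data in mini-batches so only one batch is processed per step
batch_size = 256
dataset = tf.data.Dataset.from_tensor_slices((X, Y)).repeat().shuffle(10_000)
train_iter = iter(dataset.batch(batch_size))

training_loss = model.training_loss_closure(train_iter, compile=True)
optimizer = tf.optimizers.Adam(learning_rate=0.01)

for _ in range(10_000):  # number of optimisation steps is illustrative
    optimizer.minimize(training_loss, model.trainable_variables)

Memory now scales with the number of inducing points M rather than with N, which is what avoids the OOM; see the linked notebook for a fuller walkthrough, including monitoring the ELBO during training.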
If you're only just past the edge of running out of memory, there might be ways of computing things piece by piece (though I can't give any advice on how you would do that), but with a VGP model for N data points you will ultimately still need to allocate O(N^2) memory.