I get the following error on BigQuery while trying to train a matrix factorization model:
Per-customer shuffle size limit exceeded. Please wait and retry, or reduce the size of your model, change training data query less shuffle dependent.
create or replace model `my_model`
options(
model_type="matrix_factorization",
feedback_type="implicit",
user_col="user_id",
item_col="professional_id",
rating_col="contact_force_scaled",
l2_reg=30,
num_factors=200,
max_iterations=5,
min_rel_progress=0.01,
data_split_method="no_split"
) as (
select * from `my_dataset_scaled`
);
I'm not sure I understand what this error means or how to fix it. Is my dataset too large for matrix factorization? It has 45,412,383 rows and is a simple user/item/rating matrix (with mostly empty values).
The only documented limitation I was able to find for BigQuery is 100 million ratings for a single user.
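To put numbers on both points (the per-user ratings limit and the overall size of the model), the training table can be profiled with two small queries. This is only a sketch: it assumes the table and column names from the model statement above, and the output column names (`num_ratings`, `num_users`, `num_items`) are made up for illustration. It does not by itself show what triggers the shuffle limit. It only gives the counts the model size depends on, since each factor matrix has one row per distinct user or item and `num_factors` columns.

```sql
-- Overall shape of the training data: total rows, distinct users, distinct items.
-- With num_factors=200, the factor matrices hold roughly
-- 200 * (num_users + num_items) values.
select
  count(*) as num_ratings,
  count(distinct user_id) as num_users,
  count(distinct professional_id) as num_items
from `my_dataset_scaled`;

-- Largest number of ratings held by any single user,
-- to compare against the 100 million per-user limit.
select
  user_id,
  count(*) as num_ratings
from `my_dataset_scaled`
group by user_id
order by num_ratings desc
limit 1;
```

If the user and item counts turn out to be large, lowering `num_factors` is one way to act on the "reduce the size of your model" part of the error message.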