BigQuery shuffle size limit


I get the following error on BigQuery while trying to train a matrix factorization model:

Per-customer shuffle size limit exceeded. Please wait and retry, or reduce the size of your model, change training data query less shuffle dependent.

create or replace model `my_model`
    options(
        model_type="matrix_factorization",
        feedback_type="implicit",
        user_col="user_id",
        item_col="professional_id",
        rating_col="contact_force_scaled",
        l2_reg=30,
        num_factors=200,
        max_iterations=5,
        min_rel_progress=0.01,
        data_split_method="no_split"
    ) as (
        select * from `my_dataset_scaled`
    );

I'm not sure I understand what this means or how I can fix it. Is my dataset too large for matrix factorization? It has 45,412,383 rows and is a simple user/item/rating matrix (with mostly empty values).
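For context on the model size, the dimensions of the matrix could be checked with a query along these lines. This is only a sketch: it assumes the table and column names used in the `create model` statement above, and it relies on my assumption that the model size grows roughly as (number of users + number of items) × `num_factors`, since matrix factorization learns one factor vector per user and per item.

```sql
-- Sketch: count the rows, distinct users and distinct items in the training data.
-- Table and column names are taken from the CREATE MODEL statement above.
select
    count(*) as num_rows,
    count(distinct user_id) as num_users,
    count(distinct professional_id) as num_items
from `my_dataset_scaled`;
```

With `num_factors=200`, multiplying `num_users + num_items` by 200 would give a rough idea of how many factor values the model has to learn.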

The only known limitation I was able to find for BigQuery is 100 million ratings for a single user.
