How to run multiple custom jobs at the same time in Vertex AI?

We run custom training jobs in Vertex AI, scheduled to run once a week using Airflow. All of the jobs are submitted to Vertex AI at the same time, but they run sequentially (one at a time). Each job takes around 10 minutes to run, while the other 20+ jobs stay pending.

Since we submit the custom jobs at the same time, we expected them to run at least in batches (5 at a time, for example). Instead, they are started one after another. This is the Vertex AI config that we are using:

{
    "displayName": display_name,
    "trainingTaskDefinition": PREDICTION_JOB_SCHEMA_URI,
    "trainingTaskInputs": {
        "serviceAccount": VERTEX_SERVICE_ACCOUNT,
        "workerPoolSpecs": [
            {
                "machineSpec": {
                    "machineType": "n2-standard-16",
                },
                "replicaCount": 1,
                "pythonPackageSpec": {
                    "executorImageUri": PREDICTION_EXECUTOR_IMAGE_URI,
                    "packageUris": task_params["package_uris"],
                    "pythonModule": task_params["python_module"],
                    "args": task_params["args"],
                    "env": task_params["envs"],
                },
            }
        ],
    },
}
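
For context, here is a minimal sketch of how payloads like the one above could be built per job and handed to a submit call in parallel, which may help rule out the submission step itself as the place where things get serialized. The names `build_payload`, `submit_all`, `submit_fn`, and the `max_workers` value are hypothetical and not taken from our actual DAG; `submit_fn` stands in for whatever client call creates the job in Vertex AI.

```python
from concurrent.futures import ThreadPoolExecutor


def build_payload(display_name, task_params, schema_uri, service_account, image_uri):
    """Build one job payload with the same shape as the config above."""
    return {
        "displayName": display_name,
        "trainingTaskDefinition": schema_uri,
        "trainingTaskInputs": {
            "serviceAccount": service_account,
            "workerPoolSpecs": [
                {
                    "machineSpec": {"machineType": "n2-standard-16"},
                    "replicaCount": 1,
                    "pythonPackageSpec": {
                        "executorImageUri": image_uri,
                        "packageUris": task_params["package_uris"],
                        "pythonModule": task_params["python_module"],
                        "args": task_params["args"],
                        "env": task_params["envs"],
                    },
                }
            ],
        },
    }


def submit_all(payloads, submit_fn, max_workers=5):
    """Call submit_fn on every payload using a thread pool.

    submit_fn is a placeholder (assumption) for the real job-creation call.
    Results come back in the same order as the payloads.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(submit_fn, payloads))
```

This only controls how the requests are sent from the client side. Whether the jobs then actually run concurrently still depends on the scheduler and the service, which is the part we are asking about.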