GCP Dataflow Changing Date Format Causing Errors/Retries


I'm replicating a MySQL database with the cloud_datastream_to_bigquery template in Dataflow. The pipeline is MySQL -> Datastream -> GCS -> Dataflow -> BigQuery.

I'm seeing this in my DLQ/retry files:

"error_message":{"errors":[{"debugInfo":"","location":"estimated_delivery_date","message":"Invalid date: '2023-07-26Z'","reason":"invalid"}],"index":0}}

When I look at what Datastream is writing to the GCS bucket, the value is "ESTIMATED_DELIVERY_DATE":"2023-07-26T00:00:00.000Z". This stream had been running for months without problems until today, when a handful of different date fields started causing a ton of retries and worker spikes. I would love any input, thanks!

I've been using a UDF to fix each date field individually, but there has to be a root cause. I expected the error to be limited to a single field, but I keep finding more. The owners of the source MySQL database say that nothing has changed. Thanks again for looking.
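
For anyone hitting the same thing, here is a rough sketch of the per-field workaround generalized to every field in a record, so new date columns don't need to be added one by one. It is written in Python only to show the logic; as far as I know the template's UDF hook takes JavaScript, so this would have to be ported. The names normalize_dates and transform are made up for illustration, and the assumption that each record arrives as one flat JSON object is mine, not something confirmed by the template docs.

```python
import json
import re

# Matches 'YYYY-MM-DD' followed by either a bare 'Z' (the form BigQuery
# rejected) or a midnight UTC time part (the form seen in the GCS files).
# Caveat: a genuine DATETIME/TIMESTAMP value at exactly midnight would also
# match, so a real fix should probably be limited to known DATE columns.
_DATE_ONLY = re.compile(r"^(\d{4}-\d{2}-\d{2})(?:Z|T00:00:00(?:\.0+)?Z)$")


def normalize_dates(record: dict) -> dict:
    """Return a copy of `record` with date-like strings cut down to YYYY-MM-DD."""
    fixed = {}
    for key, value in record.items():
        if isinstance(value, str):
            match = _DATE_ONLY.match(value)
            if match:
                value = match.group(1)
        fixed[key] = value
    return fixed


def transform(line: str) -> str:
    """Hypothetical UDF-style entry point: JSON string in, JSON string out."""
    return json.dumps(normalize_dates(json.loads(line)))


if __name__ == "__main__":
    print(transform('{"estimated_delivery_date": "2023-07-26Z"}'))
    # prints {"estimated_delivery_date": "2023-07-26"}
```

This only hides the symptom. It does not explain why the date values started reaching BigQuery as '2023-07-26Z' in the first place.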
