I'm hitting a NUL (ASCII 0) error while loading a tab-separated (TSV) GZIP file from Google Cloud Storage (GCS) into BigQuery with the GCSToBigQueryOperator in Apache Airflow. NUL characters embedded in the data appear to make the load job fail. How can I handle these characters and load the data successfully? I've sketched two workarounds I'm considering below the error, but I'm not sure either is the right approach.
code:
from airflow.providers.google.cloud.transfers.gcs_to_bigquery import GCSToBigQueryOperator

task = GCSToBigQueryOperator(
    task_id='task',
    bucket=bucket_name,
    source_objects=['places/dt=2024-01-01/*'],
    destination_project_dataset_table='dataset.tablename',
    source_format='CSV',            # TSV is loaded as CSV with a tab delimiter
    write_disposition='WRITE_TRUNCATE',
    autodetect=True,
    quote_character='',             # disable quoting entirely
    field_delimiter='\t',
    encoding='UTF-8',
    allow_jagged_rows=True,
    ignore_unknown_values=True,
    allow_quoted_newlines=True,
    skip_leading_rows=1,            # the TSV has a header row
    dag=dag,
)
Error:
Error while reading data, error message: Bad character (ASCII 0) encountered.;
line_number: 611503 byte_offset_to_start_of_line: 181596443 column_index: 0
column_name: "fsq_id" column_type: STRING value: "Atakum\000" File:
gs://bucket_name/places/dt=2024-03-24/places_tr.tsv.gz
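One option I've come across is BigQuery's CSV load setting preserveAsciiControlCharacters, which as I understand it tells the load job to keep control characters such as NUL instead of rejecting the row. The operator accepts a src_fmt_configs dict for extra source-format options, so I'm hoping something like the following would pass it through, though I haven't confirmed the operator accepts this key:

task = GCSToBigQueryOperator(
    task_id='task',
    bucket=bucket_name,
    source_objects=['places/dt=2024-01-01/*'],
    destination_project_dataset_table='dataset.tablename',
    source_format='CSV',
    field_delimiter='\t',
    skip_leading_rows=1,
    # Assumption: src_fmt_configs forwards this BigQuery CSV load option,
    # which preserves ASCII control characters (including NUL) in the data.
    src_fmt_configs={'preserveAsciiControlCharacters': True},
    dag=dag,
)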

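The only other fallback I can think of is rewriting the files before the load runs, stripping NUL bytes in a preprocessing task. A minimal sketch using the google-cloud-storage client, assuming the files are small enough to rewrite in memory (strip_nul_bytes and the in-memory approach are mine, not anything built into the operator):

import gzip

from google.cloud import storage

def strip_nul_bytes(bucket_name: str, object_name: str) -> None:
    # Download the gzipped TSV, drop every NUL byte, and overwrite the object.
    blob = storage.Client().bucket(bucket_name).blob(object_name)
    raw = gzip.decompress(blob.download_as_bytes())
    cleaned = raw.replace(b'\x00', b'')
    blob.upload_from_string(gzip.compress(cleaned), content_type='application/gzip')

This could run in a PythonOperator upstream of the load task, but it doubles the I/O on large files, so I'd prefer a load-side fix if one exists.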