I'm hitting a NUL (ASCII 0) error while loading a tab-separated (TSV) GZIP file from Google Cloud Storage (GCS) into BigQuery with the GCSToBigQueryOperator in Apache Airflow. NUL characters embedded in the data appear to make the load job fail. How can I handle these characters and load the data successfully? I've sketched two workarounds I'm considering below the error, but I'm not sure either is the right approach.
code:
from airflow.providers.google.cloud.transfers.gcs_to_bigquery import GCSToBigQueryOperator

task = GCSToBigQueryOperator(
    task_id='task',
    bucket=bucket_name,
    source_objects=['places/dt=2024-01-01/*'],
    destination_project_dataset_table='dataset.tablename',
    source_format='CSV',            # TSV is loaded as CSV with a tab delimiter
    write_disposition='WRITE_TRUNCATE',
    autodetect=True,
    quote_character='',             # disable quoting entirely
    field_delimiter='\t',
    encoding='UTF-8',
    allow_jagged_rows=True,
    ignore_unknown_values=True,
    allow_quoted_newlines=True,
    skip_leading_rows=1,            # the TSV has a header row
    dag=dag,
)
Error:
Error while reading data, error message: Bad character (ASCII 0) encountered.;
line_number: 611503 byte_offset_to_start_of_line: 181596443 column_index: 0
column_name: "fsq_id" column_type: STRING value: "Atakum\000" File:
gs://bucket_name/places/dt=2024-03-24/places_tr.tsv.gz
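One option I've come across is BigQuery's CSV load setting preserveAsciiControlCharacters, which as I understand it tells the load job to keep control characters such as NUL instead of rejecting the row. The operator accepts a src_fmt_configs dict for extra source-format options, so I'm hoping something like the following would pass it through, though I haven't confirmed the operator accepts this key:

task = GCSToBigQueryOperator(
    task_id='task',
    bucket=bucket_name,
    source_objects=['places/dt=2024-01-01/*'],
    destination_project_dataset_table='dataset.tablename',
    source_format='CSV',
    field_delimiter='\t',
    skip_leading_rows=1,
    # Assumption: src_fmt_configs forwards this BigQuery CSV load option,
    # which preserves ASCII control characters (including NUL) in the data.
    src_fmt_configs={'preserveAsciiControlCharacters': True},
    dag=dag,
)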

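The only other fallback I can think of is rewriting the files before the load runs, stripping NUL bytes in a preprocessing task. A minimal sketch using the google-cloud-storage client, assuming the files are small enough to rewrite in memory (strip_nul_bytes and the in-memory approach are mine, not anything built into the operator):

import gzip

from google.cloud import storage

def strip_nul_bytes(bucket_name: str, object_name: str) -> None:
    # Download the gzipped TSV, drop every NUL byte, and overwrite the object.
    blob = storage.Client().bucket(bucket_name).blob(object_name)
    raw = gzip.decompress(blob.download_as_bytes())
    cleaned = raw.replace(b'\x00', b'')
    blob.upload_from_string(gzip.compress(cleaned), content_type='application/gzip')

This could run in a PythonOperator upstream of the load task, but it doubles the I/O on large files, so I'd prefer a load-side fix if one exists.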