BigQuery could not parse "\N" as INT or DATETIME using nullMarker: "\N" in table definition for external federated table

1.9k Views Asked by At

I am using \N for null either for INTEGER or DATETIME data types in the JSON table definition(nullMarker:"\N") for external tables but then also BigQuery is not able to parse it and throwing the below error:

error message: Could not parse '\N' as int for field [Field Name](position 2) 
starting at location 0

Could anyone please help me, what can be the possible fix?

1

There are 1 best solutions below

0
Roman Czerwinski On

I am using the BigQuery Cloud SDK CLI to attempt to do the same. I am utilizing the flag --null_marker="\N" and --source_format NEWLINE_DELIMITED_JSON to load data. Here is the error message I get:

BigQuery error in load operation: Only CSV imports may specify a null marker.

So it appears that null markers are not supported for BigQuery imports in JSON format. Pretty disappointing for me.

Here is the exact code and sample I used to test and verify this:

this is a file called example.json where I want the \N character to represent the NULL marker.

    {"col1": "1", "col2": "A Fat Man", "col3": "3.5", "col4": "SOme moRE TExt"}
    {"col1": "2", "col2": "\N", "col3": "4.2", "col4": "texttexttext"}
    {"col1": "3", "col2": "\N", "col3": "\N", "col4": "JUSTTEXT"}
    {"col1": "4", "col2": "uno", "col3": "2.0", "col4": ""}

this is the command to make the table (my dataset is temp14):

    bq mk temp14.null_json_test col1:INTEGER,col2:STRING,col3:FLOAT,col4:STRING

and here is the command to load the data which did not work :(

    bq load --source_format NEWLINE_DELIMITED_JSON --null_marker="\N" temp14.null_json_test ./example.json col1:INTEGER,col2:STRING,col3:FLOAT,col4:STRING

I can replace the \N in the JSON with the empty string like

    {"col1": "1", "col2": "A Fat Man", "col3": "3.5", "col4": "SOme moRE TExt"}
    {"col1": "2", "col2": "", "col3": "4.2", "col4": "texttexttext"}
    {"col1": "3", "col2": "", "col3": "", "col4": "JUSTTEXT"}
    {"col1": "4", "col2": "uno", "col3": "2.0", "col4": ""}

and it will load if I remove the flag:

    bq load --source_format NEWLINE_DELIMITED_JSON temp14.null_json_test ./example.json col1:INTEGER,col2:STRING,col3:FLOAT,col4:STRING

But the strings load into the table as "" instead of null and I need them to be NULL. The FLOAT field correctly assigns NULL, but the STRING does not.

My work around is to pre-process the empty strings out of my dataset in a staging area which is kind of like a scratch space, then have my final table represent that data with something like this:

    SELECT 
      col1
      , CASE WHEN col2 = '' THEN NULL ELSE col2 END as col2
      , col3
      , CASE WHEN col4 = '' THEN NULL ELSE col4 END as col4
    FROM
      temp14.null_json_test

I cannot use CSV for my data sadly and need JSON to work but also need NULL values to be represented correctly.

EDIT: After coming back to this I did discover that JSON formatted data will allow for NULLs, but not with a null marker. You simply need to omit the key/value pair for JSON data to load to BigQuery as NULL. So the example.json should look like this to nullify those above fields where the \N used to be.

    {"col1": "1", "col2": "A Fat Man", "col3": "3.5", "col4": "SOme moRE TExt"}
    {"col1": "2", "col3": "4.2", "col4": "texttexttext"}
    {"col1": "3", "col4": "JUSTTEXT"}
    {"col1": "4", "col2": "uno", "col3": "2.0", "col4": ""}

And I verified that the above does work with no additional flags required, as well as with external tables.