Bigquery export data output is not in proper order

48 Views Asked by At

I am trying to export a table from Bigquery to single file in GCS. But my output is not in sorted order.

export data options(uri = 'gs://<bucket><object>/file1_*.csv', format = 'CSV',overwrite = true, header = true, field_delimiter = '|') as select distinct * from (select * from table1 where country not in (select distinct(country) from table2) order by col_rnk)

distinct * has been used to get the output into single file in GCS.

Any help would be great !

1

There are 1 best solutions below

4
Rainy sidewalks On

i think you should move ORDER BY col_rnk out side .

with that please try the below

EXPORT DATA OPTIONS(
  uri = 'gs://<bucket>/<object>/file1_*.csv',
  format = 'CSV',
  overwrite = true,
  header = true,
  field_delimiter = '|'
) AS
SELECT DISTINCT *
FROM (
  SELECT *
  FROM table1
  WHERE country NOT IN (
    SELECT DISTINCT(country)
    FROM table2
  )
)
ORDER BY col_rnk

NOT THE PART OF ANSWER TO THE ABOVE QUESTION BUT TO DRAWN ATTANTATION TO THE FOLLOWING POINTS.

please take a look at query-syntax#select_distinct

SELECT DISTINCT

A SELECT DISTINCT statement discards duplicate rows and returns only the remaining rows. SELECT DISTINCT cannot return columns of the following types:

1.STRUCT

2.ARRAY

also can you verify the logic and try without the uses of SELECT DISTINCT *

please consider the following approach

EXPORT DATA OPTIONS(
  uri = 'gs://<bucket>/<object>/file1_*.csv',
  format = 'CSV',
  overwrite = true,
  header = true,
  field_delimiter = '|'
) AS
SELECT col1, col2, col_rnk
FROM (
  SELECT col1, col2, col_rnk
  FROM table1
  WHERE country NOT IN (
    SELECT DISTINCT(country)
    FROM table2
  )
)
ORDER BY col_rnk