I have the following pandas dataframe:
import pandas as pd
df = pd.DataFrame({"id": [1, 2, 3], "items": [('a', 'b'), ('a', 'b', 'c'), tuple('d')]})
print(df)
   id      items
0   1     (a, b)
1   2  (a, b, c)
2   3       (d,)
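To see what ends up being handed to pyarrow, it can help to inspect the column before exporting. This is a minimal sketch that reuses the DataFrame above. The `items` column gets pandas' generic `object` dtype, and every cell is still a Python tuple, which matches the "got a 'tuple' object" part of the error below:

```python
import pandas as pd

# Same frame as above.
df = pd.DataFrame({"id": [1, 2, 3], "items": [("a", "b"), ("a", "b", "c"), tuple("d")]})

# The column of tuples is stored with pandas' generic "object" dtype...
print(df["items"].dtype)
# ...and each cell is still a Python tuple, not a str.
print(df["items"].map(type).tolist())
```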
After registering my GCP/BQ credentials in the normal way...
import os
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "path_to_my_creds.json"
... I try to export it to a BQ table:
import pandas_gbq
pandas_gbq.to_gbq(df, "my_table_name", if_exists="replace")
but I keep getting the following error:
Traceback (most recent call last):
File "<string>", line 4, in <module>
File "/Users/max.epstein/opt/anaconda3/envs/rec2env/lib/python3.7/site-packages/pandas_gbq/gbq.py", line 1205, in to_gbq
...
File "/Users/max.epstein/opt/anaconda3/envs/rec2env/lib/python3.7/site-packages/google/cloud/bigquery/_pandas_helpers.py", line 342, in bq_to_arrow_array
return pyarrow.Array.from_pandas(series, type=arrow_type)
File "pyarrow/array.pxi", line 915, in pyarrow.lib.Array.from_pandas
File "pyarrow/array.pxi", line 312, in pyarrow.lib.array
File "pyarrow/array.pxi", line 83, in pyarrow.lib._ndarray_to_array
File "pyarrow/error.pxi", line 122, in pyarrow.lib.check_status
pyarrow.lib.ArrowTypeError: Expected bytes, got a 'tuple' object
I have tried converting the tuple column to strings with df = df.astype({"items": str}) and adding a table_schema parameter to the pandas_gbq.to_gbq(...) call, but I keep getting the same error.
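For reference, here is a sketch of what those two attempts look like in isolation. It uses only pandas and runs without BigQuery. The `table_schema` below is an assumption about what was intended (declaring `items` as STRING). It follows pandas_gbq's list-of-dicts format and is only built here, not passed to to_gbq:

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 2, 3], "items": [("a", "b"), ("a", "b", "c"), tuple("d")]})

# The conversion tried above: each tuple becomes its str() text.
converted = df.astype({"items": str})
print(converted["items"].tolist())  # ["('a', 'b')", "('a', 'b', 'c')", "('d',)"]
# On pandas 1.x/2.x this still reports the generic "object" dtype,
# even though every cell is now a Python str.
print(converted["items"].dtype)

# Assumed schema: pandas_gbq's table_schema is a list of {"name", "type"} dicts.
table_schema = [
    {"name": "id", "type": "INTEGER"},
    {"name": "items", "type": "STRING"},
]
```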
I have also tried replacing the pandas_gbq.to_gbq(...) call with the bq_client.load_table_from_dataframe method described here, but I still get the same pyarrow.lib.ArrowTypeError: Expected bytes, got a 'tuple' object error.
I think this is an issue with pandas dtypes being separate from Python types: astype converts each value to a Python str, but it does not change the column's pandas dtype, which stays object. Try also converting the dtype to match the type after the astype statement, so that the astype line from the question is replaced with:
Let me know if this works.
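One way to apply this suggestion is sketched below. It assumes pandas 1.0 or later, which added the dedicated "string" dtype. It has not been checked against BigQuery, and the export call itself stays as in the question:

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 2, 3], "items": [("a", "b"), ("a", "b", "c"), tuple("d")]})

# Convert the values, as in the question...
df = df.astype({"items": str})
# ...then also move the column onto pandas' "string" dtype,
# so the dtype matches the Python type of the values.
df["items"] = df["items"].astype("string")

print(df.dtypes)
# The export would then be unchanged, e.g.:
# pandas_gbq.to_gbq(df, "my_table_name", if_exists="replace")
```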