I feel like this method has changed multiple times over the years and I haven't been able to keep up with the changes.
Quite simply, how can I upload a regular data frame/table from R to BigQuery?
Here was my original approach, and the error it produced:
job <- insert_upload_job(
  project = "projectid",
  dataset = "dataset",
  table = "tablename",
  values = table_wanting_to_upload,
  write_disposition = "WRITE_TRUNCATE",
  create_disposition = "CREATE_IF_NEEDED"
) # pushing table to BQ
And this is the error:
Error: billing is not a string (a length one character vector).
In addition: Warning message:
'insert_upload_job' is deprecated.
Use 'bq_perform_upload' instead.
See help("Deprecated") and help("bigrquery-deprecated").
Naturally, I follow R's suggestion and use bq_perform_upload instead:
job <- bq_perform_upload(
  project = "projectid",
  dataset = "dataset",
  table = "table_name",
  values = table_wanting_to_upload,
  write_disposition = "WRITE_TRUNCATE",
  create_disposition = "CREATE_IF_NEEDED"
) # pushing table to BQ
And I get the following error:
**Error in as_bq_table(x) : argument "x" is missing, with no default**
I don't understand what "x" means, and the documentation around the bq_perform_upload function is lacking. I then tried the following to satisfy the x error:
job <- bq_perform_upload(
  x = predictions,
  project = "projectid",
  dataset = "dataset",
  table = "table_name",
  write_disposition = "WRITE_TRUNCATE",
  create_disposition = "CREATE_IF_NEEDED"
) # pushing table to BQ
And I then get the following error, where it's unclear what the issue is:
Error in UseMethod("as_bq_table") :
  no applicable method for 'as_bq_table' applied to an object of class "c('grouped_df', 'tbl_df', 'tbl', 'data.frame')"
The method used to be much simpler, for example:
job <- bq_perform_upload("project","dataset","table_name", dataframe)
Answer
One option is `bqr_upload_data()` from the bigQueryR package:

bqr_upload_data(
  projectId = "project",
  datasetId = "dataset",
  tableId = "table_name",
  upload_data = predictions,
  create = c("CREATE_IF_NEEDED"),
  schema = NULL,
  sourceFormat = c("CSV", "DATASTORE_BACKUP", "NEWLINE_DELIMITED_JSON", "AVRO"),
  wait = TRUE,
  autodetect = TRUE,
  nullMarker = NULL,
  maxBadRecords = NULL,
  allowJaggedRows = FALSE,
  allowQuotedNewlines = FALSE,
  fieldDelimiter = ","
)
You can also do this without bigQueryR (which I read is no longer maintained) and just use the bigrquery package.

So, in relation to your error: `x` is something that can be coerced to a `bq_table` object, which is a *reference* to a table in BigQuery, as opposed to an object that is a table in itself. The simplest way to create something coercible to a `bq_table` is a string with the table reference, "project.dataset.table_name", the same as in a FROM statement in SQL. Conversely, when you pointed it at the data frame you wanted to upload, it didn't know how to coerce that into a table reference (`bq_table`) for BigQuery.

I'm assuming you already have a pre-existing project and dataset in BigQuery, given you're specifying those.
From there it's as simple as this:
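A minimal sketch of the call, reusing the placeholder names from the question (projectid, dataset, table_name, and the data frame table_wanting_to_upload):

library(bigrquery)

# The string is coerced to a bq_table reference; the data frame goes in `values`
job <- bq_perform_upload(
  x = "projectid.dataset.table_name",
  values = table_wanting_to_upload,
  write_disposition = "WRITE_TRUNCATE",
  create_disposition = "CREATE_IF_NEEDED"
)

# bq_perform_upload() returns a bq_job; wait for it to finish
bq_job_wait(job)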
If the table doesn't yet exist in BigQuery, it will try to create it for you, but I found that it doesn't do this in a sensible way (i.e., replicating the table schema from the data frame). Instead it seems to base the column data types on a sample of the data within the data frame (e.g., even for a character variable in the data frame, if the sample it scans looks like INT64, it will create the column as INT64, then fail when it hits an alpha character).
So you can first create a table replicating the data frame's schema, if it doesn't already exist:
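A sketch of that check-and-create step (same placeholder names as above):

tbl <- bq_table("projectid", "dataset", "table_name")

if (!bq_table_exists(tbl)) {
  # Define the columns from the data frame's own structure
  bq_table_create(tbl, fields = as_bq_fields(table_wanting_to_upload))
}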
`as_bq_fields()` is a handy way of coercing the structure of the data frame into a `bq_fields` object to define the columns. And the other piece I didn't know yet (having never used BigQuery before) was that you need to have a project, and then a dataset within it (analogous to a schema or database in other DBMSs), so I first had to create my dataset.
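Creating the dataset is one call; a sketch, again assuming the projectid/dataset placeholders:

ds <- bq_dataset("projectid", "dataset")

if (!bq_dataset_exists(ds)) {
  bq_dataset_create(ds)
}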
Now, why it took me (and I assume also you) an hour or two to figure this out, and why it seems to have stumped so many people, I don't understand. It's something that should be clearly documented and easy to understand. But hey...
Worth noting: the bigrquery functions are named `bq_noun_action`, which helped me figure things out (`bq_perform_upload` seems to break from this convention).