Use dplyr to query a temporal table at a point in time

32 Views Asked by At

In Microsoft SQL Server you can define system times for tables and query them like this:

select top 10 * from table FOR SYSTEM_TIME AS OF '2024-01-01'

I would like to use R and dplyr to query the same table. Assuming I have a connection con, how would I do this?

I have tried:

tbl(con, "table FOR SYSTEM_TIME AS OF '2024-01-01'")

and

tbl(con, "table") |>
   filter(sql("FOR SYSTEM_TIME AS OF '2024-01-01'"))

Anyone know if it is possible?

1

There are 1 best solutions below

0
Simon.S.A. On BEST ANSWER

The documentation on temporal tables states:

If the PERIOD columns aren't hidden, their values appear in a SELECT * query. If you specified PERIOD columns as HIDDEN, their values don't appear in a SELECT * query. When the PERIOD columns are hidden, you must reference the PERIOD columns specifically in the SELECT clause to return the values for these columns.

It is possible that R may still observe the temporal columns even if they are hidden. I'd recommend something like the following to check.

remote_table = tbl(con, "table")
colnames(remote_table)

If the temporal columns are visible to R, then you should be able to include them in standard filter clauses.


Otherwise, I would recommend making use of the fact that a dbplyr object is essentially two pieces: a database connection and a query. Hence, the following is plausible:

remote_table = tbl(con, sql("SELECT * FROM table FOR SYSTEM_TIME AS OF '2024-01-01'"))

This is similar to code in your question, the key difference being that this needs to be a complete query.


If this works, my inclination would be to make a custom function, something like:

system_time_as_of = function(table, date){

  db_connection = table$src$con
  
  sql_query = dbplyr::build_sql(
    con = db_connection,
    dbplyr::sql_render(table), " FOR SYSTEM_TIME AS OF ", date
  )
  return(dplyr::tbl(db_connection, dbplyr::sql(sql_query)))
}

# intended use
remote_table = tbl(con, "table")
remote_table = system_time_as_of(remote_table, '2024-01-01')