How to filter on a date with dbplyr and ROracle?

41 Views Asked by At

Setup

I am using tidyverse and ROracle.

I have established a connection conn:

> conn
## User name:              
##   Connect string:        IPIAMPR2.WORLD 
## Server version:        19.0.0.0.0 
## Server type:           Oracle RDBMS 
## Results processed:     60 
## OCI prefetch:          FALSE 
## Bulk read:             1000 
## Bulk write:            1000 
## Statement cache size:  0 
## Open results:          4 

I have also created a table on the oracle server side from a dataframe:

df <- tibble(x = 1:3, date = as.Date("2023-10-29"))

dbWriteTable(conn, "TEST_DATE", df)

Then, I am using a lazy connection to access the table on the oracle server side:

DF <- tbl(conn, "TEST_DATE")

DF
# # Source:   table<TEST_DATE> [?? x 2]
# # Database: OraConnection
# x date               
# <dbl> <dttm>             
#    1  1 2023-10-29 00:00:00
#    2  2 2023-10-29 00:00:00
#    3  3 2023-10-29 00:00:00

Question

How to filter the oracle table on a date?

Issue

(
  DF
  |> filter(date == as.Date("2023-10-29"))
)
## Error in .oci.SendQuery(conn, statement, data = data, prefetch = prefetch,  : 
## ORA-00936: missing expression

I am not getting why this error happens...

A workaround that I suspect to be slow on large tables

(
  DF
  |> filter(as.character(date) == "29/10/23 00:00:00,000000"))
)

I am afraid that the call to as.character would be extremely slow on a table with billions of lines. More precisely, I am working with very large tables indexed on the date column, and any operation on this column will be too long.

/!\ I am working on french systems, I guess that is why the date become "29/10/23" with as.character()

2

There are 2 best solutions below

0
Simon.S.A. On BEST ANSWER

Cause:

Based on the translation you provided:

SELECT *
FROM ("TEST_DATE")
WHERE ("date" = DATE('2023-10-29'))

My best guess (I am not an Oracle user) is that there is a fault in the dbplyr translation. For Oracle, the TO_DATE function appears to be the conversion function (reference). Hence dbplyr is using the wrong function - this should be a bug report on dbplyr.

Regarding the error message:

The DATE function in Oracle takes four arguments: Date (Format, Day, Month, Year) (reference). So I would guess that the error message

ORA-00936: missing expression

is because only a single argument is provided in the translation.

There is also a chance that the error is due to confusion by the database as to whether date is a column name or a function. Oracle does not accept 'table' as a table name, perhaps a similar dynamic is at work here?


So what to do about this?

There are two options that seem plausible:

(1) rely on implicit conversion

If you provide the date as a string, the database will likely convert this into a date to do the comparison (reference).

DF %>%
  filter(date == "2023-10-29")

I use this approach in SQL Server consistently. This will require you to provide the date in the default format of your database (likely YYYY-MM-DD or DD-MON-YYYY).

(2) use non-translation

Similar to @pietrodito's answer, if dbplyr does not have a translation defined, then it will pass the command untranslated. Hence you can do the following:

DF %>%
  filter(date == TO_DATE("2023-10-29", "YYYY-MM-DD"))

Other info

I tested both lubridate::as_date and base::as.Date. Both gave me translation errors. This may have been fixed in the latest version of dbplyr, otherwise it is more evidence of a bug.

# setup
library(dbplyr)
library(dplyr)

df = data.frame(
  id = 1:3,
  my_date = as.Date("2023-01-01", "2023-02-01", "2024-07-19")
)

remote_df = tbl_lazy(df, con = simulate_oracle())

Testing base::as.Date:

# this works
as.Date("2023-10-29", format = "%Y-%m-%d")
# this errors
remote_df %>%
  filter(my_date == as.Date("2023-10-29", format = "%Y-%m-%d")) %>%
  show_query()

Testing lubraidate::as_date:

library(lubridate)
# this works
as_date("20231029", format = "%Y%m%d")
# this errors
remote_df %>%
  filter(my_date == as_date("2023-10-29", format = "%Y-%m-%d")) %>%
  show_query()

I also tested converting to date format prior to the dbplyr query:

converted_date = as.Date("2023-10-29", format = "%Y-%m-%d")

remote_df %>%
  filter(my_date == converted_date) %>%
  show_query()

However, during translation this gets converted back to text:

SELECT *
FROM (`df`) 
WHERE (`my_date` = '2023-10-29')

which is part of why I suggest relying on implicit conversion.

0
pietrodito On

Maybe one of the best way is to inject SQL in the query like that:

(
  ER_PRS_F
  |> filter(FLX_DIS_DTD == sql("TO_DATE ('2013-02-01', 'YYYY-MM-DD')"))
)