Using RpostgreSQL with sqldf crashes R

290 Views Asked by At

I am using sqldf to train some r-users in SQL. Loading RPostgreSQL or RH2 before loading sqldf will change the default SQL used by sqldf. This works fine with h2, but every time I load RPostgreSQL, R will crash on first attempted query. I want to make sqldf work with RPostgreSQL to enable use of date functions which are lacking in SQLite and h2.

# packages
library(tidyverse)
# library(RH2) # comment out this row or the next to determine default SQL for sqldf
library(RPostgreSQL) 
library(sqldf)

Output confirms that "sqldf will default to using PostgreSQL"

Create some data:

set.seed(42)
N <- 1e6

sales <- tibble(
    buyer_id = 1:N/100,
    sales_date = sample(seq(as.Date('2018/01/01'), as.Date('2021/01/01'), by="day"), N, replace = TRUE),
    sales_amount = rpois(N, 200)
) 

Crash R:

sqldf("
select 
    max(sales_date) 
from sales
")

"R Session Aborted R encountered a fatal error The session was terminated"

1

There are 1 best solutions below

0
G. Grothendieck On BEST ANSWER

If the only reason not to use H2 is date_trunc then here are some ways around that.

1) Macros Here is a workaround for truncation to the beginning of year/quarter/month/week. These functions act as macros which expand into code accepted by H2. Be sure to prefix sqldf with fn$ and surround each with back quotes to turn on substitution. Note that in each case the argument is a string. Add the verbose=TRUE argument to sqldf to see the generated code.

library(RH2)
library(sqldf)

trunc_year <- function(x) sprintf("DATEADD(day, 1-day_of_year(%s), %s)", x, x)
trunc_qtr <- function(x) sprintf("DATEADD(month, 3*(quarter(%s)-1), %s)", x, 
  trunc_year(x))
trunc_month <- function(x) sprintf("DATEADD(day, 1-day_of_month(%s), %s)", x, x)
trunc_week <- function(x) sprintf("DATEADD(day, -iso_day_of_week(%s), %s)", x, x)

# test
DF <- data.frame(x = as.Date("2021-11-15"))
fn$sqldf("select x, 
                `trunc_year('x')` year, 
                `trunc_qtr('x')` qtr,
                `trunc_month('x')` month,
                `trunc_week('x')` week
          from DF")
##            x       year        qtr      month       week
## 1 2021-11-15 2021-01-01 2021-10-01 2021-11-01 2021-11-14

2) Patch RH2 Another possibility is to patch your installation of RH2 with the newer version of H2 which has date_trunc. To do this below we remove the the h2-1.3.175.jar file in RH2, which contains H2, and replace it with the newer version h2-1.4.200.jar. This should work as long as you have write permission in the java subdirectory of your RH2 installation. Just run this code in R and your RH2 installation will be patched.

u <- "https://h2database.com/h2-2019-10-14.zip"
z <- basename(u)
tmp <- tempdir()  # create temporary dir 
old.dir <- setwd(tmp)  # go to it
download.file(u, z)  # download u
unzip(z, "h2/bin/h2-1.4.200.jar")  # extract jar from zip file

# copy new jar file to RH2 installation and remove old one
new.jar <- file.path(tmp, "h2", "bin", "h2-1.4.200.jar")
old.jar <- dir(system.file("java", package = "RH2"), "h2.*jar$", full.names = TRUE)
if (basename(old.jar) != basename(new.jar)) {
  file.copy(new.jar, dirname(old.jar))
  file.remove(old.jar)
}
setwd(old.dir)  # return to original directory

# test
library(RH2)
library(sqldf)
sqldf("select h2version()")
##   '1.4.200'
## 1   1.4.200
# see more tests in (3) below

3) Rebuild RH2 Another possibility is create a new source version of RH2 with the newer version of H2 that has date_trunc. The first argument is 'year', 'quarter', 'month' or 'week' and the second argument is a date. Thus, if you are willing to rebuild RH2 with this new H2 version then it is available.

(a) Download the source of RH2 from https://cran.r-project.org/package=RH2 and detar it to create a directory tree RH2.

(b) Download this h2 zip https://h2database.com/h2-2019-10-14.zip and extract h2/bin/h2-1.4.200.jar from that zip file and place the jar file in the RH2/inst/java directory of the RH2 source removing the old one, h2-1.3.175.jar

(c) rebuild and test RH2 like this:

# rebuild RH2
# cd to the RH2 directory containing its DESCRIPTION file
setwd("...whatever.../RH2")

# build RH2 
#  on Windows this needs Rtools40 (not an R package)
#  https://cran.r-project.org/bin/windows/Rtools/
library(devtools)
build()
install(args = "--no-multiarch")

# test
library(RH2)
library(sqldf)
sqldf("select h2version()") # check it's the latest version
##   '1.4.200'
## 1   1.4.200

DF <- data.frame(x = as.Date("2000-11-15"))
sqldf("select date_trunc('month', x) from DF")
##   DATE_TRUNC('month', x)
## 1             2000-11-01