I'm running R 4.3.1 on Windows 10 using RGui. If I connect to my database:
library(RODBC)
connstr <- sprintf('driver={ODBC Driver 17 for SQL Server};server=%s;database=%s;uid=%s;pwd=%s',
                   server, database, uid, pwd)  # my parameters
dbhandle <- odbcDriverConnect(connstr)
I can get back a single column:
tbl1 <- sqlQuery(dbhandle, sprintf('select runID from myTable WHERE runID = 34'))
but if I try to get both columns:
tbl2 <- sqlQuery(dbhandle, sprintf('select runID, fit from myTable WHERE runID = 34'))
RGui just crashes. It doesn't display a message; its windows just close and that's the end of that.
R.EXE at the command line does pretty much the same: it ends and returns me to the command prompt with no error message.
Why does such a trivial exercise crash R? How can I use R without it crashing?
Also on Ubuntu
Using R 3.6.3 on Ubuntu 20.04, I have the same problem but at least I get a segfault message:
> tbl2 <- sqlQuery(dbhandle, sprintf('select fit from myTable WHERE runID = 34'))
gives this:
*** caught segfault ***
address 0x560b5bbef000, cause 'memory not mapped'
Traceback:
1: odbcFetchRows(channel, max = max, buffsize = buffsize, nullstring = nullstring, believeNRows = believeNRows)
2: sqlGetResults(channel, errors = errors, ...)
3: sqlQuery(dbhandle, sprintf("select fit from myTable WHERE runID = 34"))
How can I successfully query data from ODBC using R?
After working on this and the related issue, I think it's pretty clear there's a bug in RODBC's handling of binary data types on SQL Server.
If I start with a connection:
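# Same connection string as in the question above.
dbhandle <- odbcDriverConnect(connstr)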
and in that target database, I create a little table:
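# A sketch of the repro table; the exact DDL is my reconstruction, but the
# post specifies a varbinary(max) column holding the five bytes 01 02 03 04 05.
sqlQuery(dbhandle, 'CREATE TABLE myTable (runID int, fit varbinary(max))')
sqlQuery(dbhandle, 'INSERT INTO myTable (runID, fit) VALUES (34, 0x0102030405)')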
I should be able to select the data out. Here, the data is just five bytes, so the chance of hitting the memory-corruption issue is low, but not zero. With my connection in R:
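# Fetch the row back (query shape taken from the question).
tbl <- sqlQuery(dbhandle, 'select fit from myTable where runID = 34')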
the data is returned. But it's incorrect: it's all zeroes, when 01 02 03 04 05 should be returned.
Casting the data to image will get correct results back on Linux (Ubuntu):
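# My reconstruction of the workaround query: cast the varbinary(max)
# column to the legacy IMAGE type before fetching.
tbl <- sqlQuery(dbhandle, 'select cast(fit as image) as fit from myTable where runID = 34')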
However, on the Windows RGui client, the same code fails with an error about allocating an impossible quantity of memory (the 214748364800-byte request worked out below).
On Linux, getting a non-trivial amount of data from a varbinary(max) column (in my case, about 13 megabytes) sometimes causes a segfault, and the rest of the time returns an object with the correct length but uninitialized data. (Is this a security concern?) However, using the IMAGE cast seems to almost always work.

The offending code is in the RODBC cachenbind() function, where the column description is inspected and data binding is set up. The call to R_Calloc is passed a size of one and a quantity computed as nRows * (datalen + 1), where datalen came from the SQLDescribeCol() API for this column.

For the IMAGE data type, we get SQL_LONGVARBINARY as the type and datalen is -1. With that computation and its careless sign management, plus ignoring overflow, the quantity requested of R_Calloc is 214748364800, matching the error message we see on Windows. (Since I'm not set up for C development on Linux just now, I can't comment on what's going on there. Maybe Linux manages to map some memory in response to this R_Calloc call, or maybe it ends up getting different numbers from the Linux version of the same driver. Or it could be that something else happens.)
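Taking that figure at face value, it factors exactly as nRows * 2^31; a quick check in R (just arithmetic, nothing from RODBC itself):
# 100 (nRows) * 2^31 bytes = 214748364800, the request seen on Windows.
format(100 * 2^31, scientific = FALSE)
# [1] "214748364800"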
For the VARBINARY(max) type, we get SQL_VARBINARY as the type and a datalen of 0. The number of bytes requested of R_Calloc, then, is just nRows, which is far smaller than expected -- it looks like 100 bytes.

Curiously, nRows is used here rather than the computed NROWS. Regardless, it seems we're simply lucky that RODBC can return binary data at all, because it's not correctly handling the binding of the data or the allocation of the binding memory.
I've written to the package maintainer today, but I haven't received a response yet.