I want to read a .dbf file in R that contains "deleted" rows. However, I don't know how to avoid reading the deleted records along with the active ones. I am using the package "foreign" and the function read.dbf.
According to the .dbf file description, each record begins with a 1-byte "deletion" flag. The byte's value is a space (0x20) if the record is active, or an asterisk (0x2A) if the record is deleted.
How can I extract this information using R? E.g. for a small sample of the iris data set saved as a .dbf file:
library(foreign)
dbf.file <- 'iris.dbf'
write.dbf(iris[1:5, ], file=dbf.file)
We can use the readBin() function to read the .dbf file as binary data. Then, based on the .dbf format description, we can read the information necessary to navigate to the first byte of each record. I use a custom function to convert the appropriate bytes from the .dbf header into an unsigned integer.
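The original code for this step is not shown, so here is a minimal sketch of what it might look like. It assumes the common dBASE header layout (record count in file bytes 4-7, header length in bytes 8-9, record length in bytes 10-11, all little-endian). The helper name le_uint and the use of a temporary file are illustrative choices, not part of the original answer.

```r
library(foreign)

# Recreate the small example file (a temporary path is used here).
dbf.file <- tempfile(fileext = ".dbf")
write.dbf(iris[1:5, ], file = dbf.file)

# Read the whole file into a raw vector.
bytes <- readBin(dbf.file, what = "raw", n = file.size(dbf.file))

# Hypothetical helper: little-endian raw bytes -> unsigned integer.
le_uint <- function(x) sum(as.integer(x) * 256^(seq_along(x) - 1))

# R vectors are 1-based, so file byte k is bytes[k + 1].
n.records  <- le_uint(bytes[5:8])    # file bytes 4-7: number of records
header.len <- le_uint(bytes[9:10])   # file bytes 8-9: header length in bytes
record.len <- le_uint(bytes[11:12])  # file bytes 10-11: length of one record
```

Summing the bytes as doubles sidesteps the fact that R has no native unsigned 32-bit integer type.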
With these, it is possible to compute the position of each record's first byte and check whether it marks the record as deleted.
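As a rough sketch of that computation, under the same header-layout assumption (record count in file bytes 4-7, header length in bytes 8-9, record length in bytes 10-11, little-endian), the flag bytes can be located and compared against 0x20 and 0x2A. The names le_uint, first.bytes, and flags are illustrative only.

```r
library(foreign)

dbf.file <- tempfile(fileext = ".dbf")
write.dbf(iris[1:5, ], file = dbf.file)
bytes <- readBin(dbf.file, what = "raw", n = file.size(dbf.file))

# Hypothetical helper: little-endian raw bytes -> unsigned integer.
le_uint <- function(x) sum(as.integer(x) * 256^(seq_along(x) - 1))

n.records  <- le_uint(bytes[5:8])
header.len <- le_uint(bytes[9:10])
record.len <- le_uint(bytes[11:12])

# Records start right after the header; +1 converts the 0-based file
# offset into a 1-based index into `bytes`.
first.bytes <- header.len + (seq_len(n.records) - 1) * record.len + 1

flags      <- bytes[first.bytes]
is.active  <- flags == as.raw(0x20)  # space: record is active
is.deleted <- flags == as.raw(0x2A)  # asterisk: record is deleted
```

The logical vector is.deleted could then be used to drop rows from a data frame, provided the reader returns all records in file order.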
Indeed, none of the records were marked as deleted, so we can at least check that the flag bytes hold the expected value of 0x20.
On a side note, the documentation does not make clear how read.dbf() treats deleted records, so chances are it ignores them and you won't have to deal with this issue at all. It would be interesting to know, so please let us know in the comments.
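One way to find out empirically is to mark a record as deleted by hand and see what read.dbf() returns. The following is only a sketch of such an experiment, again assuming the header layout described above. It makes no claim about the outcome, which may depend on the installed version of foreign.

```r
library(foreign)

dbf.file <- tempfile(fileext = ".dbf")
write.dbf(iris[1:5, ], file = dbf.file)
bytes <- readBin(dbf.file, what = "raw", n = file.size(dbf.file))

# Hypothetical helper: little-endian raw bytes -> unsigned integer.
le_uint <- function(x) sum(as.integer(x) * 256^(seq_along(x) - 1))

n.records   <- le_uint(bytes[5:8])
header.len  <- le_uint(bytes[9:10])
record.len  <- le_uint(bytes[11:12])
first.bytes <- header.len + (seq_len(n.records) - 1) * record.len + 1

# Overwrite the deletion flag of the second record with an asterisk.
bytes[first.bytes[2]] <- as.raw(0x2A)

# Write the modified bytes to a new file and read it back.
modified.file <- tempfile(fileext = ".dbf")
writeBin(bytes, modified.file)
result <- read.dbf(modified.file)

# 5 rows would mean deleted records are kept, 4 that they are skipped.
nrow(result)
```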