R functions on workbook skipping blank cells

Question

R functions on workbook skipping blank cells

59 Views Asked by js4032 At 22 September 2023 at 14:09

I'm trying to write a function that edits workbooks imported via loadWorkbook and have a problem where blank cells get ignored rather than treated as NAs. The following code demonstrates this issue.

df <- data.frame(x = c(1, 2, 3, 4, NA, 6), y = c(7, 8, 9, 10, NA, 12))
xlsx::write.xlsx(df, file = 'testdf.xlsx', showNA = FALSE, row.names = FALSE)
testwb <- loadWorkbook(file = 'testdf.xlsx')

sheetnames <- c('Sheet1')
sheets <- lapply(sheetnames, function(name) getSheets(testwb)[[name]])
rows <- getRows(sheets[[1]], rowIndex=1:7)
cells <- getCells(rows, colIndex=1:2)
values <- sapply(unlist(cells), getCellValue)
values
length(values)

When I run it as is I get what I expected, with NAs included:

1.1 1.2 2.1 2.2 3.1 3.2 4.1 4.2 5.1 5.2 6.1 6.2 7.1 7.2

"x" "y" "1" "7" "2" "8" "3" "9" "4" "10" NA NA "6" "12"

However, if I open the excel file and manually delete the first row of data and then run it again, I get this:

1.1 1.2 3.1 3.2 4.1 4.2 5.1 5.2 7.1 7.2

"x" "y" "2" "8" "3" "9" "4" "10" "6" "12"

This causes problems for me because when doing comparisons between two sheets they end up with different numbers of values if one has more blank cells than the other. How do I force it to include NA values?

I tried manually rebuilding a copy of the workbook so I could use the keepNA argument in writeData like this:

testwb2 <- openxlsx::createWorkbook()
addWorksheet(testwb2, 'Sheet1')
writeData(testwb2, 'Sheet1', read_xlsx('testdf.xlsx', sheet = 1), keepNA = TRUE)

sheetnames <- c('Sheet1')
sheets <- lapply(sheetnames, function(name) getSheets(testwb2)[[name]])
rows <- getRows(sheets[[1]], rowIndex=1:7)
cells <- getCells(rows, colIndex=1:2)
values <- sapply(unlist(cells), getCellValue)
values
length(values)

but then I get the error

Error in envRefInferField(x, what, getClass(class(x)), selfEnv) : ‘getNumberOfSheets’ is not a valid field or method name for reference class “Workbook”

which I also don't understand. Is there a simple way to handle this?

Original Q&A

There are 1 best solutions below

**Jan Marvin** · Answer 1 · 2023-09-23T13:24:04.990000

The question is a little confusing, but the problem has probably been answered a few times already. Here's what I think is happening: When you remove the first row, your spreadsheet software does some kind of housecleaning and removes the empty row from the data (it is no longer part of the xml structure in the xlsx file you wrote), and the xlsx package returns only the cells it could read in the workbook.

I have tried to fix similar problems with openxlsx, so I assume that xlsx behaves similarly.

Below I have created a file that behaves as you described the problem, showing that openxlsx also behaves naughty, but that there are other packages that behave nice.

df <- data.frame(
  x = c(1, 2, 3, 4, NA, 6),
  y = c(7, 8, 9, 10, NA, 12)
)

tmp <- openxlsx2::temp_xlsx()
openxlsx2::wb_workbook()$
  add_worksheet("Sheet1")$
  add_data(x = df[2:4,], dims = "A1:B4")$
  add_data(x = df[6,], dims = "A6:B6", col_names = FALSE)$
  save(tmp)

testwb <- xlsx::loadWorkbook(file = tmp)

sheetnames <- c('Sheet1')
sheets <- lapply(sheetnames, function(name) xlsx::getSheets(testwb)[[name]])
rows   <- xlsx::getRows(sheets[[1]], rowIndex=1:7)
cells  <- xlsx::getCells(rows, colIndex=1:2)
values <- sapply(unlist(cells), xlsx::getCellValue)
values
#>  1.1  1.2  2.1  2.2  3.1  3.2  4.1  4.2  6.1  6.2 
#>  "x"  "y"  "2"  "8"  "3"  "9"  "4" "10"  "6" "12"
length(values)
#> [1] 10

# naughty
openxlsx::read.xlsx(tmp)
#>   x  y
#> 1 2  8
#> 2 3  9
#> 3 4 10
#> 4 6 12

# nice
readxl::read_xlsx(tmp)
#> # A tibble: 5 × 2
#>       x     y
#>   <dbl> <dbl>
#> 1     2     8
#> 2     3     9
#> 3     4    10
#> 4    NA    NA
#> 5     6    12

# nice
openxlsx2::read_xlsx(tmp)
#>    x  y
#> 2  2  8
#> 3  3  9
#> 4  4 10
#> 5 NA NA
#> 6  6 12

R functions on workbook skipping blank cells

There are 1 best solutions below

Related Questions in R

Related Questions in OPENXLSX

Related Questions in R-XLSX

Trending Questions

Popular # Hahtags

Popular Questions