I am trying to open an Excel dataset in R studio, exactly how it looks in Excel, so I can learn how to manipulate it in different ways within R studio, rather than doing prep via Excel first. I am essentially trying a blind practical self teaching of R, with no prior coding knowledge and see this as a good way to figure my way around R. The issue I am having is that some of the format of the data has translated differently in R (for example, a filename extension of 8.7 in Excel coming through as 8.6666529999776429). I have tried to work around this, but it has just made all cells have the correct numbers but with the same number of decimal places, which is still not what I need, and I am now stumped as to how I need to adapt my open.xlsx read code to have it match exactly what Excel shows.
I have been able to read my Excel file into R studio using openxlsx package and with the following code:
read.xlsx('*filepath*.xlsx', sheet = "overview", startRow = 1, colNames = TRUE, rowNames = FALSE, detectDates = FALSE, skipEmptyRows = TRUE, skipEmptyCols = TRUE, rows = NULL, cols = NULL, check.names = FALSE, sep.names = " ", namedRegion = NULL, na.strings = NA, fillMergedCells = FALSE,
followed by:
data<- read.xlsx('*filepath*.xlsx', sheet = "overview", startRow = 1, colNames = TRUE, rowNames = FALSE, detectDates = FALSE, skipEmptyRows = TRUE, skipEmptyCols = TRUE, rows = NULL, cols = NULL, check.names = FALSE, sep.names = " ", namedRegion = NULL, na.strings = NA, fillMergedCells = FALSE)
But when I look at the table it has imported the format of some numbers in the 'Filename extension' column incorrectly. I have added the following code:
#reframing the column for numeric setting in the same format as the source
column_name <- 'Filename extension'
#setting the Filename extension column as numeric
data[[column_name]] <- as.numeric(data[[column_name]])
But this has added decimal places where they are not needed as I want the filename numbers as they are presented in Excel. For example, the left column here shows Excel, and the right column shows what R presents.
| Excel | R Studio |
|---|---|
| 1.0 | 1.00 |
| 2.1 | 2.10 |
| 3.2 | 3.20 |
| 10.9 | 10.90 |
| 11.10 | 11.10 |
| 12.11 | 12.11 |
I want to present the filenames correctly and have read that I need to change the column classes to 'character', after which I can then assign that particular column as above, and this should retain the numbers in that column exactly as they are in the excel sheet. I have tried several variations of code in which I have added colClasses arguments to the end of my read_xlsx code, but each time I get the error showing the colClasses as an unused argument. I've also added 'as.data.frame = TRUE alongside the colClasses and had the same response. What am I doing wrong? And will this colClasses addition actually get me to my desired end result? I am massively new to R studio and coding, so explanations in great detail would be fantastic as I'm getting most of my code from pages like this, or chatGPT, so it's very much a trial and error learning process for me.