R - gdata package read.xls - how to deal with quote marks in original text?

939 Views Asked by At

Apologies - I'm fairly new to R, so I hope this isn't too obvious a question.

I have been trying to read an Excel file in xls format into R. I am using the read.xls function from the gdata package.

I initially tried:

V2 <- read.xls("file.xls",header=TRUE)

But this couldn't deal with the fact that some of the strings in the file contain quotation marks (") to denote inches. This a) caused some fields in V2 to incorporate multiple strings from the original, and b) generated an "EOF within string" error where it simply stops reading.

So I read up on this problem and discovered that the solution is to disable quoting, and copied a format suggested for someone having the same issue with the base read.csv function:

V2 <- read.xls("file.xls",sep=",",quote="",row.names=NULL)

This reads all the rows correctly and avoids the previous problems. However, this means that every single non-blank string in V2 has double-quote marks around it - even where there was previously no quotation mark in the data.

It also still generates a backslash character in each string which contained an inch quotation mark in the original, e.g.:

Product GG 7" Tablet 

in the original Excel file becomes

"Product GG 7\" Tablet"

In V2.

How can I a) avoid these backslashes appearing in V2 and b) avoid the quotation marks appearing around every single string?

1

There are 1 best solutions below

0
tigerloveslobsters On

Feel free to use gsub() to remove the \" as below.

If there is one column that has the issue.

1.Using mtcars as an example

mtcars <- mtcars

2.Hardcode mpg column to be having \"

mtcars$mpg <- "Product GG 7\" Tablet"

3.Use gsub() to remove \" for this column

mtcars$mpg <- gsub(pattern = "\"",replacement = "",x = mtcars$mpg) 

mtcars$mpg

If there are multiple columns that have the issue.

1.Using mtcars as an example

mtcars <- mtcars

2.Hardcode two columns

mtcars$mpg <- "Product GG 7\" Tablet"
mtcars$cyl <- "6\" Tablet"

3.Create a function that does the removal.

remove_slash_quote <- function(data) {
  data <- gsub(pattern = "\"",replacement = "",x = data) 
}

4.Using dplyr and apply() to do the operation on all character columns

library(dplyr)

mtcars %>% 
  select_if(is.character) %>% 
apply(2,remove_slash_quote)