Apologies - I'm fairly new to R, so I hope this isn't too obvious a question.
I have been trying to read an Excel file in xls format into R. I am using the read.xls function from the gdata package.
I initially tried:
V2 <- read.xls("file.xls",header=TRUE)
But this couldn't deal with the fact that some of the strings in the file contain quotation marks (") to denote inches. This a) caused some fields in V2 to incorporate multiple strings from the original, and b) generated an "EOF within string" error where it simply stops reading.
So I read up on this problem and discovered that the solution is to disable quoting, and copied a format suggested for someone having the same issue with the base read.csv function:
V2 <- read.xls("file.xls",sep=",",quote="",row.names=NULL)
This reads all the rows correctly and avoids the previous problems. However, this means that every single non-blank string in V2 has double-quote marks around it - even where there was previously no quotation mark in the data.
It also still generates a backslash character in each string which contained an inch quotation mark in the original, e.g.:
Product GG 7" Tablet
in the original Excel file becomes
"Product GG 7\" Tablet"
In V2.
How can I a) avoid these backslashes appearing in V2 and b) avoid the quotation marks appearing around every single string?
Feel free to use
gsub()to remove the \" as below.If there is one column that has the issue.
1.Using mtcars as an example
2.Hardcode mpg column to be having \"
3.Use
gsub()to remove \" for this columnIf there are multiple columns that have the issue.
1.Using mtcars as an example
2.Hardcode two columns
3.Create a function that does the removal.
4.Using
dplyrandapply()to do the operation on all character columns