I need to merge and clear several dozen Excel files into one database in R. Some further steps are prevented by cells containing in manually line break in Excel cells of imported files. This is called variously: line break, manual content break, hard enter, etc. This is obtained in Excel by using the Alt+Enter combination, searched by the Alt+010 value. The effect is that "enters" / line breaks are inserted within one Excel cell. If this description is not sufficient, I will try to upload an example file somewhere.
After importing mentioned data from Excel into a dataframe in R, the individual cells or names look like this:
data <- as.data.frame(read_excel('sample_database1.xlsx'))
data[8]
> Select\r\none answer\r\nonly
> 1 dont\r\nknow
> 2 yes
> 3 no
where these Excel line breaks are written as \r\n. I tried replacing them with a simple command
data <- gsub('[\n\r]',' ',fixed = TRUE,data)
but it mess all dataframe:
> data[8]
> [1] "c(\"dont\\r\\nknow\", \"yes\", \"no\", \"no\", ...
I then tried different replacement combinations for individual cells (the way I understand it is that subsequent uses of lapply simply work for each cell in dataframe individually)"
data[] <- lapply(data, gsub, pattern = "[\r\n]", replacement = " ", fixed = TRUE)
data[] <- lapply(data, gsub, pattern = "\r\n", replacement = " ", fixed = TRUE)
data[] <- lapply(data, gsub, pattern = "\\r\\n", replacement = " ", fixed = TRUE)
but none of them seem to have any use, either for data or variable names. Nothing happens, the data set looks the same. I don't know, maybe I somehow overthink the problem? Maybe I don't understand something in gsub expressions? Could someone just tell me which direction to go?
EDIT: I made as @r2evans sugested:
> dput(data[8])
structure(list(`Select
one answer
only` = c("dont\r\nknow",
"yes", "no", "dont\r\nknow", "yes", "no", "dont\r\nknow", "yes",
...
, row.names = c(NA, -80L), class = "data.frame")
and result stil shows me \r\n. Even weirder - the names simply display enter.
As @Onyambu sugested, i used command without 'fixed':
data[] <- lapply(data, gsub, pattern = "[\r\n]", replacement = " ")
And it worked for my data:
> data[8]
Select\r\none answer\r\nonly
1 dont know
2 yes
3 no
Now, same expresion used for names:
names(data) <- lapply(names(data), gsub, pattern = "[\r\n]", replacement = " ")
Which worked also :)
> data[8]
Select one answer only
1 dont know
2 yes
Thank You, that's all. I'm just wondering if there is any method that I can apply to both data and names in the dataset at the same time?
P.S. Unfortunately, the solution proposed in @r2evans' answer does NOT work for my data. Simply applying gsub (without lapply) to dataset in r messes the data, making it unprocessable:
> data <- gsub('[\n\r]',' ',data)
> data[8]
[1] "c(\"dont know\", \"yes\", \"no\", \"dont know\"...
This even changes the class of the object:
> class(data)
[1] "character"
So, I think this is the wrong approach, but maybe I'm missing something.