Remove a backslash from a word in R


I have been trying to do topic modeling on a set of articles. The raw data contains a lot of backslashes and numbers, so I cleaned it first. However, even after removing punctuation, backslashes, and numbers, backslashes followed by numbers still appear among the top terms of topic 1. This is the preprocessing code I used:

library(tm)

# Convert to lower case
articles <- tm_map(articles, content_transformer(tolower))
# Remove numbers
articles <- tm_map(articles, removeNumbers)
# Remove common English stopwords
articles <- tm_map(articles, removeWords, stopwords("english"))
# Remove punctuation
articles <- tm_map(articles, removePunctuation)
# Eliminate extra white space
articles <- tm_map(articles, stripWhitespace)
# Replace backslashes with spaces
toSpace <- content_transformer(function(x, pattern) gsub(pattern, " ", x))
articles <- tm_map(articles, toSpace, "\\\\")

Despite this cleaning, the top terms in the topics still contain backslashes and numbers:

design
robot
class
medical
device wkh\003
students
dcbl
ri\003
course

Terms like these, with a backslash and numbers, are meaningless in the topics. How can I get rid of them?
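
Before choosing a fix, it may help to check what those terms actually contain. When R prints a string, it shows a non-printable control character such as ASCII 0x03 as \003, while a real backslash is printed doubled (\\). If the topic terms were printed that way, wkh\003 may contain no backslash and no digits at all, which would explain why removeNumbers, removePunctuation and the backslash replacement all leave it alone. The sketch below uses only base R and two illustrative strings (assumptions, not taken from the actual corpus) to show how nchar() and utf8ToInt() tell the two cases apart.

```r
# Two strings that can look alike in printed topic terms (illustrative values)
literal_backslash <- "wkh\\003"  # w, k, h, a real backslash, then the digits 0, 0, 3
control_char      <- "wkh\003"   # w, k, h, then the single control character U+0003

print(literal_backslash)  # the backslash is shown escaped as \\
print(control_char)       # the control character is shown as \003

nchar(literal_backslash)  # 7 characters
nchar(control_char)       # 4 characters

# Code points reveal what is really there: 92 is a backslash, 48 is "0", 3 is the control character
utf8ToInt(literal_backslash)
utf8ToInt(control_char)

# Removing backslashes only changes the first string
gsub("\\\\", "", literal_backslash)  # "wkh003"
gsub("\\\\", "", control_char)       # unchanged: still contains U+0003
```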

Answer by william3031

You can use str_remove_all() from the stringr package (attached by library(tidyverse)) to strip the backslashes. For example:

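library(tidyverse)

# Sample terms with literal backslashes ("\\003" in an R string is a backslash followed by 003)
df <- tibble(text = c("robot", "class", "medical", "device wkh\\003", "students", "dcbl", "ri\\003", "course", NA))

# The regex "\\\\" matches a single literal backslash
df %>%
  mutate(text = str_remove_all(text, "\\\\"))
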
# A tibble: 9 × 1
  text         
  <chr>        
1 robot        
2 class        
3 medical      
4 device wkh003
5 students     
6 dcbl         
7 ri003        
8 course       
9 NA
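
If the \003 in the topic terms is how R displays a control character (ASCII 0x03) rather than a literal backslash followed by digits, removing backslashes will not touch it. A minimal sketch, assuming that is the case: a cleaning function that replaces control characters with spaces, using the POSIX class [[:cntrl:]], along with backslashes and digits. The name clean_text() is hypothetical and the code uses only base R regular expressions; the tm line at the end is shown as a comment because it needs the corpus from the question.

```r
# Hypothetical helper: replace control characters, backslashes and digits with spaces,
# then collapse repeated whitespace. Base R only.
clean_text <- function(x) {
  x <- gsub("[[:cntrl:]]", " ", x)   # control characters such as U+0003 (also tabs, newlines)
  x <- gsub("\\\\", " ", x)          # literal backslashes
  x <- gsub("[[:digit:]]+", " ", x)  # runs of digits
  x <- gsub("[[:space:]]+", " ", x)  # collapse whitespace
  trimws(x)
}

sample_terms <- c("device wkh\003", "ri\003", "device wkh\\003", "course")
clean_text(sample_terms)
# returns c("device wkh", "ri", "device wkh", "course")

# With tm attached, the same function could be used as a corpus step, e.g.:
# articles <- tm_map(articles, content_transformer(clean_text))
```

Fragments such as wkh and ri survive this cleaning. They read as "the" and "of" with each letter shifted three places forward, which may point to a problem in how the text was extracted from the source files, something this kind of cleanup cannot repair.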