Let's assume I have this dataframe:
df =data.frame(text=c("This is a very long sentence that I would like to trim because I might need to put it as a label somewhere",
"This is another very long sentence that I would also like to trim because I might need to put it as who knows what"),col2=c("1234","5678"))
Following this post I have been able to get a new column that gets me the start of the sentence with complete words, which is fine.
df$short_txt = sapply(strsplit(df$text, ' '), function(i) paste(i[cumsum(nchar(i)) <= 20], collapse = ' '))
> df$short_txt
[1] "This is a very long" "This is another very"
However, I would also like to append the end of the sentence, again as complete words, taken from the last 20 characters, to get something close to this output.
> df$short_txt
[1] "This is a very long...it as a label somewhere" "This is another very...it as who knows what"
I can't figure out how to complete the sapply function to reach this outcome. I tried adding arguments to the paste call and a second cumsum condition, as in df$short_txt = sapply(strsplit(df$text, ' '), function(i) paste(i[cumsum(nchar(i)) <= 20],"...",i[cumsum(nchar(i)) >= (nchar(i)-20)], collapse = ' ')), but it does not return what I want.
Appreciate the help.
Perhaps we can regex this?
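A minimal sketch of one regex that behaves this way, not necessarily the exact pattern intended here. The helper name trim_middle and its arguments n_head, n_tail and sep are hypothetical, and it assumes words are separated by whitespace. The first group greedily takes up to n_head characters that end at a word boundary, the lazy middle is dropped, and the second group takes the longest run of complete words that fits in the last n_tail characters.

```r
# Hypothetical helper: keep complete words from the head and the tail of each
# string and replace everything in between with `sep`.
trim_middle <- function(x, n_head = 20, n_tail = 20, sep = "...") {
  x <- as.character(x)
  # (.{1,n_head}) : up to n_head leading characters, must be followed by whitespace
  # .*?           : lazy middle part, discarded
  # (.{1,n_tail}) : up to n_tail trailing characters, must be preceded by whitespace
  pattern <- sprintf("^(.{1,%d})\\s.*?\\s(.{1,%d})$",
                     as.integer(n_head), as.integer(n_tail))
  shortened <- sub(pattern, paste0("\\1", sep, "\\2"), x, perl = TRUE)
  # Leave strings alone when they are already short enough.
  ifelse(nchar(x) > n_head + n_tail + nchar(sep), shortened, x)
}

text <- c(
  "This is a very long sentence that I would like to trim because I might need to put it as a label somewhere",
  "This is another very long sentence that I would also like to trim because I might need to put it as who knows what"
)

trim_middle(text)
# [1] "This is a very long...as a label somewhere"
# [2] "This is another very...it as who knows what"
```

With a data frame this would be used as df$short_txt <- trim_middle(df$text). Strings whose first word is longer than n_head do not match the pattern and are returned unchanged.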
Regex explanation:
This did not include your "it" at the beginning because without it, the substring is 20-long.
I'll look at df$text[1] with various numbers for leading/trailing complete-word substrings.
I don't know off-hand how to protect against the spaces before/after the added "..." here, but it can be cleaned up post-editing (safe as long as your strings don't natively contain "...").
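The post-editing cleanup could look something like the sketch below. The helper name clean_ellipsis and the sample input are hypothetical; it simply removes any whitespace directly around a literal "...", so it is only safe under the same assumption that the original strings never contain "..." themselves.

```r
# Hypothetical cleanup step: drop whitespace on either side of a literal "...".
clean_ellipsis <- function(x) {
  gsub("\\s*\\.{3}\\s*", "...", x)
}

clean_ellipsis("This is a very long ... as a label somewhere")
# [1] "This is a very long...as a label somewhere"
```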