In R, how to extract a string that itself contains a quoted string

library(checkmate)  # assertString() comes from the checkmate package

ex02ChildrenInverse <- function(sentence) {
  assertString(sentence)

  matches <- regmatches(
    sentence,
    regexec('^(.*?) is the (father|mother) of "(.*?)"', sentence))[[1]]

  parent <- matches[[2]]
  male <- matches[[3]] == "father"
  child <- matches[[4]]
  child <- gsub('".*"', '', matches[4])

  return(list(parent = parent, male = male, child = child))
}

Here is my code. My problem is that I want to output the child's name even when it contains double quotes. For example:

Input: 'Gudrun is the mother of "Rosamunde ("Rosi")".'

My output:

$parent
[1] "Gudrun"

$male
[1] FALSE

$child
[1] "Rosamunde ("

But I want:

$parent
[1] "Gudrun"

$male
[1] FALSE

$child
[1] "Rosamunde ("Rosi")"

I tried the code above, but it didn't give the output I wanted.

I want to change the line child <- gsub(.......)

Answer by The fourth bird (best answer)

If the child's name is always the last part of the string, you can match the dot after the last double quote, so the lazy group cannot stop at an inner quote:

(.*?) is the (father|mother) of "(.*?)"\.

For example:

ex02ChildrenInverse <- function(sentence) {
  
  matches <- regmatches(
    sentence,
    regexec('(.*?) is the (father|mother) of "(.*?)"\\.', sentence))[[1]]
    
  parent <- matches[[2]]
  male   <- matches[[3]] == "father"
  child  <- matches[[4]]
  
  return(list(parent = parent, male = male, child = child))
}
ex02ChildrenInverse('Gudrun is the mother of "Rosamunde ("Rosi")".')

Output

$parent
[1] "Gudrun"

$male
[1] FALSE

$child
[1] "Rosamunde (\"Rosi\")"

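If the sentence might also come without the final period, a greedy child group anchored at the end of the string is another option. The sketch below is a hypothetical base-R variant (the name `ex02ChildrenInverseGreedy` is made up here, not from either answer): `(.*)` first runs to the end of the string and then backtracks to the last double quote, so inner quotes stay in the name. It assumes the child's name is always the last quoted part of the sentence, passes `perl = TRUE` so that `regexec()` applies PCRE quantifier rules, and replaces `checkmate::assertString()` with a plain `stopifnot()` check to avoid the extra package.

```r
# Hypothetical variant: greedy child group, optional final '.', end anchor.
ex02ChildrenInverseGreedy <- function(sentence) {
  # base-R stand-in for checkmate::assertString()
  stopifnot(is.character(sentence), length(sentence) == 1L)

  # (.*?) lazy: parent ends at the first " is the "
  # (.*)  greedy: child runs to the LAST '"' before an optional '.' and $
  matches <- regmatches(
    sentence,
    regexec('^(.*?) is the (father|mother) of "(.*)"\\.?$', sentence, perl = TRUE)
  )[[1]]

  list(parent = matches[[2]],
       male   = matches[[3]] == "father",
       child  = matches[[4]])
}

res <- ex02ChildrenInverseGreedy('Gudrun is the mother of "Rosamunde ("Rosi")".')
str(res)
```

Because the period is optional, 'Gudrun is the father of "Rosamunde ("Rosi")"' (no final dot) yields the same child, with `male` set to `TRUE`.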

Answer by jpsmith

Another approach is to write fresh code that uses gsub and grepl to extract each piece of information separately, instead of trying to do it all with regmatches:

freshCode <- function(sentence) {
  parent <- gsub("(\\w+).*", "\\1", sentence)
  male <- grepl("father", sentence)
  child <- gsub("\\.", "", substring(sentence, regexpr('"', sentence) + 1))
  list(parent = parent, male = male, child = child)
}

freshCode('Gudrun is the mother of "Rosamunde ("Rosi")".')

# $parent
# [1] "Gudrun"
# 
# $male
# [1] FALSE
# 
# $child
# [1] "Rosamunde (\"Rosi\")\""

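# Note: the backslashes above are escape characters added by print(), not part
# of the string. cat() shows the actual value, which still ends with the
# closing double quote:
# > cat(freshCode('Gudrun is the mother of "Rosamunde ("Rosi")".')[[3]])
# Rosamunde ("Rosi")"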

Or slightly modify your existing code:

ex02ChildrenInverse <- function(sentence) {
  matches <- regmatches(
    sentence,
    regexec('^(.*?) is the (father|mother) of "(.*?)"', sentence))[[1]]
  parent <- matches[[2]]
  male <- matches[[3]] == "father"
  child <- gsub("\\.", "", substring(sentence, regexpr('"', sentence) + 1))
  
  return(list(parent = parent, male = male, child = child))
}

This returns the same output as above.
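In both versions above, `substring()` keeps everything after the first double quote and `gsub("\\.", "", ...)` deletes every period, so the closing quote remains at the end of the name (visible in the `cat()` line) and a period inside a name such as "J. R. Smith" would be removed too. A minimal sketch of one way around this, assuming the sentence always ends with the name's closing quote plus an optional period, is to strip only that trailing `"` or `".` with an end-anchored `sub()`. The helper name `extractChild` is hypothetical.

```r
# Hypothetical helper: text after the first '"', minus the final '"'
# and an optional trailing '.' (only the end of the string is touched)
extractChild <- function(sentence) {
  afterFirstQuote <- substring(sentence, regexpr('"', sentence) + 1)
  sub('"\\.?$', '', afterFirstQuote)
}

child <- extractChild('Gudrun is the mother of "Rosamunde ("Rosi")".')
cat(child, "\n", sep = "")  # prints: Rosamunde ("Rosi")
```

Inside either function, the child line would then become `child <- extractChild(sentence)`.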