Extract strings from a long vector

It seems to be a simple question, but I cannot figure out a way to solve this problem.

I have a text file in a single long vector with expressions like:

&nuSessao=26.2023
&nuSessao=21.2013.N
&nuSessao=24.2023
&Data=22/12/2023
&txFaseSessao=Ordem do Dia
&txFaseSessao=Fechamento

and many others.

I would like to extract this information so that all instances of a given variable end up in a single vector:

[1] "&nuSessao=26.2023"   "&nuSessao=21.2013.N" "&nuSessao=24.2023" 

So far I have tried the following command for each variable:

results<-stringr::str_extract_all(MyPage, "&amp;nuSessao=*") |> unlist()

But all I get is a vector containing the exact match of my search pattern, not the entire expression that I need:

results
 [1] "&amp;nuSessao=" "&amp;nuSessao=" "&amp;nuSessao=" "&amp;nuSessao="
 [5] "&amp;nuSessao=" "&amp;nuSessao=" "&amp;nuSessao=" "&amp;nuSessao="
 [9] "&amp;nuSessao=" "&amp;nuSessao=" "&amp;nuSessao=" "&amp;nuSessao="
[13] "&amp;nuSessao=" "&amp;nuSessao=" "&amp;nuSessao=" "&amp;nuSessao="
[17] "&amp;nuSessao=" "&amp;nuSessao=" "&amp;nuSessao=" "&amp;nuSessao="
[21] "&amp;nuSessao=" "&amp;nuSessao=" "&amp;nuSessao=" "&amp;nuSessao="
[25] "&amp;nuSessao=" "&amp;nuSessao=" "&amp;nuSessao=" "&amp;nuSessao="
[29] "&amp;nuSessao=" "&amp;nuSessao=" "&amp;nuSessao=" "&amp;nuSessao="
[33] "&amp;nuSessao=" "&amp;nuSessao=" "&amp;nuSessao=" "&amp;nuSessao="
[37] "&amp;nuSessao=" "&amp;nuSessao=" "&amp;nuSessao=" "&amp;nuSessao="
[41] "&amp;nuSessao=" "&amp;nuSessao=" "&amp;nuSessao=" "&amp;nuSessao="
[45] "&amp;nuSessao=" "&amp;nuSessao=" "&amp;nuSessao=" "&amp;nuSessao="
[49] "&amp;nuSessao=" "&amp;nuSessao="

I would appreciate any help.

There is 1 solution below

Answer by jpsmith

In base R, you can use strsplit on \n to split the large string into lines, then grep to keep the lines that match your desired pattern:

strng <- "&amp;nuSessao=26.2023
&amp;nuSessao=21.2013.N
&amp;nuSessao=24.2023
&amp;Data=22/12/2023
&amp;txFaseSessao=Ordem do Dia
&amp;txFaseSessao=Fechamento"

mySplits <- strsplit(strng, "\n")[[1]]
# [1] "&amp;nuSessao=26.2023"          "&amp;nuSessao=21.2013.N"        "&amp;nuSessao=24.2023"         
# [4] "&amp;Data=22/12/2023"           "&amp;txFaseSessao=Ordem do Dia" "&amp;txFaseSessao=Fechamento"  

grep("nuSessao", mySplits, value = TRUE)
# [1] "&amp;nuSessao=26.2023"   "&amp;nuSessao=21.2013.N" "&amp;nuSessao=24.2023"  
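If you need every variable rather than only nuSessao, one possible extension is to group the lines by the text before the first "=". This is only a sketch: it assumes every line has the form &amp;name=value, and the names varNames and byVar are made up for illustration.

```r
strng <- "&amp;nuSessao=26.2023
&amp;nuSessao=21.2013.N
&amp;nuSessao=24.2023
&amp;Data=22/12/2023
&amp;txFaseSessao=Ordem do Dia
&amp;txFaseSessao=Fechamento"

mySplits <- strsplit(strng, "\n")[[1]]

# Variable name = everything before the first "=" (assumes "&amp;name=value" lines)
varNames <- sub("=.*$", "", mySplits)

# Named list holding one character vector per variable name
byVar <- split(mySplits, varNames)

byVar[["&amp;nuSessao"]]
# [1] "&amp;nuSessao=26.2023"   "&amp;nuSessao=21.2013.N" "&amp;nuSessao=24.2023"
```

Each variable is then available by name, so there is no need to repeat the grep call once per variable.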

Since you tagged stringr, the analogous approach would be to use str_split and str_detect:

library(stringr)
mySplits <- str_split(strng, "\n")[[1]]
mySplits[str_detect(mySplits, "nuSessao")]
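As for why the attempt in the question returned only "&amp;nuSessao=": in a regular expression, =* means "zero or more = characters", not "anything after the =". If you would rather keep str_extract_all, a pattern along these lines may work. It is a sketch that assumes a value never contains "&" or a line break; a value that does would be cut short.

```r
library(stringr)

strng <- "&amp;nuSessao=26.2023
&amp;nuSessao=21.2013.N
&amp;nuSessao=24.2023
&amp;Data=22/12/2023
&amp;txFaseSessao=Ordem do Dia
&amp;txFaseSessao=Fechamento"

# "[^&\\n]*" consumes everything after "=" up to the next "&" or line break
res <- str_extract_all(strng, "&amp;nuSessao=[^&\\n]*")[[1]]
res
# [1] "&amp;nuSessao=26.2023"   "&amp;nuSessao=21.2013.N" "&amp;nuSessao=24.2023"
```

Unlike the split-and-filter approach, this does not rely on each expression sitting on its own line, only on the next "&" or line break marking the end of the value.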