I am working in R with files that contain blocks, e.g.:
block name { block contents can be anything: strings, numbers or even curly braces {} or whatever}
blockn4m3 containing numbers {
Can be something junk like:
ans{a{a[sf'asödfä'asdösdö'äasdö'äasdö}}}
}}
And then I would like to extract them into a vector so that:
"block name { block contents can be anything: strings, numbers or even curly braces {} or whatever}","blockn4m3 containing numbers {
Can be something junk like:
ans{a{a[sf'asödfä'asdösdö'äasdö'äasdö}}}
}}"
I assume regular expressions do not work, since there can be curly braces (and nested blocks) within blocks?
So I thought I could just read each file character by character, and I wrote the following function:
library(magrittr) #Provides the %>% pipe used below

separateBlocksFromFile <- \(file) {
  #Read the whole file into a single string
  input <- file %>% readLines %>% {paste(., collapse = "\n")}
  blocks <- c()
  blockNumber <- 1 #We start from the first block
  netBracketValue <- 0 #0, when reading a block name
  for(i in seq_len(nchar(input))) {
    currentCharacter <- substr(input, i, i)
    #Did we enter a block?
    netBracketValue <- netBracketValue + (currentCharacter == "{")
    #Write the character into its correct place.
    #Previous characters in the current block...
    previousCharacters <- ifelse(is.na(blocks[blockNumber]), "", blocks[blockNumber])
    #...are put before current character
    blocks[blockNumber] <- paste0(previousCharacters, currentCharacter)
    #Did we exit a block? If so, the netBracketValue becomes 0 here.
    netBracketValue <- netBracketValue - (currentCharacter == "}")
    #Block number is updated, if needed.
    #Updated when we pass "}" character and the character ends a block i.e.
    #netBracketValue == 0
    blockNumber <- blockNumber + (netBracketValue == 0) * (currentCharacter == "}")
  }
  return(blocks)
}
While this works, it is slow on larger files. Is there a faster method to accomplish this?
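One possible direction, shown only as a sketch: the same depth-counting idea can be expressed with vectorised operations instead of a loop that grows strings one character at a time. The name `splitBlocks` is hypothetical, the function takes the already-read text rather than a file name, and it has not been benchmarked here, so any speed-up is an assumption.

```r
# Hypothetical helper: split text after every "}" that closes a top-level block.
splitBlocks <- function(text) {
  if (!nzchar(text)) return(character(0))
  # One element per character
  chars <- strsplit(text, "")[[1]]
  # Running brace depth after each character
  depth <- cumsum((chars == "{") - (chars == "}"))
  # A block ends at a "}" that brings the depth back to 0
  ends <- which(chars == "}" & depth == 0)
  # Any text after the last closing brace becomes a final piece
  if (length(ends) == 0 || tail(ends, 1) < length(chars)) {
    ends <- c(ends, length(chars))
  }
  starts <- c(1, head(ends, -1) + 1)
  substring(text, starts, ends)
}

splitBlocks("a {x{y}z}b {q}")
# [1] "a {x{y}z}" "b {q}"
```

As in the loop version, whitespace or newlines between blocks end up at the start of the following piece.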
EDIT: The block contents cannot have a closing } before its opening {. If they could, there would be no way of knowing for sure whether we had exited a block.
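The constraint in the edit can be written as a small check: the running brace depth must never drop below zero. This is a minimal sketch, and `bracesAreOrdered` is a hypothetical name introduced only for illustration.

```r
# Hypothetical helper: TRUE if no "}" appears before its matching "{".
bracesAreOrdered <- function(text) {
  chars <- strsplit(text, "")[[1]]
  # Running brace depth after each character
  depth <- cumsum((chars == "{") - (chars == "}"))
  all(depth >= 0)
}

bracesAreOrdered("name {a {b}}")  # TRUE
bracesAreOrdered("name }{")       # FALSE
```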