How do I find the first # after an even number of "?

Reading a text file with the format:

e2c=["(vsim-86)" ,'kkk', "pppp", 
"bbbbbb", #"old", "uio",
" sds # sds", #"old2",
" sds # sds", " a # b",#"old2",
# ' sds # sds',
# - "example of string to override"
 "aaaaaa" ]

I'm trying to use a regular expression to get the index of the first # that comes after zero or more complete double-quoted substrings, i.e. the first # that is not inside a "..." string, but I can't find the right pattern.

An example line is " sds # sds", #"old2",

The code is:

while IFS= read -r rline; do
  echo "$rline" # prints a line from the file
  index=$(grep -P '(^[^\"]*(["][^\"]*["][^\"]*){0,}[^\"]*#)'  <<< "$rline" | awk '{print index($0, "#")-1}')
  echo "The index of the first not in string # is: $index"
done < file.txt

And the output:

e2c=["(vsim-86)" ,'kkk', "pppp", 
The index of the first not in string # is: 
"bbbbbb", #"old", "uio",
The index of the first not in string # is: 10
" sds # sds", #"old2",
The index of the first not in string # is: 6
" sds # sds", " a # b",#"old2",
The index of the first not in string # is: 6
# ' sds # sds',
The index of the first not in string # is: 0
# - "example of string to override"
The index of the first not in string # is: 0
 "aaaaaa" ]
The index of the first not in string # is: 

It keeps returning index 6 instead of 14.

If the line is " sds # sds", " a # b",#"old2", it should give 23, but it also gives 6.
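The pipeline above fails because grep -P prints the whole matching line, and awk's index() then returns the position of the first # anywhere in it, quoted or not. A minimal sketch of the regex idea using only bash's built-in [[ =~ ]] matching (assuming bash, and assuming quotes are never escaped with \"): the pattern can only consume characters other than " and #, or complete "..." strings, so its match necessarily ends at the first # outside quotes, and the match length minus 1 is the 0-based index. The function name first_unquoted_hash is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: 0-based index of the first # that is not inside a "..." string.
# Assumption: quotes are never escaped (\" is not handled).

first_unquoted_hash() {
  # From the start of the line, repeatedly consume either one character that
  # is neither " nor #, or one complete "..." string, then require a #.
  local re='^([^"#]|"[^"]*")*#'
  # BASH_REMATCH[0] ends right after the first unquoted #, so its length
  # minus 1 is the 0-based index; nothing is printed when there is no match.
  if [[ $1 =~ $re ]]; then
    echo $(( ${#BASH_REMATCH[0]} - 1 ))
  fi
}

while IFS= read -r rline; do
  echo "$rline"
  echo "The index of the first not in string # is: $(first_unquoted_hash "$rline")"
done <<'EOF'
" sds # sds", #"old2",
" sds # sds", " a # b",#"old2",
EOF
```

For the two sample lines this prints 14 and 23. In the question's loop, the here-document would be replaced by done < file.txt.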

Answer by anubhava

You don't need grep | awk here. A single awk command like this does the job in any awk:

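awk -F '"' '{
   # Splitting on double quotes: odd-numbered fields are outside quotes,
   # even-numbered fields are inside a quoted string.
   s = 0                      # characters consumed before the current field
   for (i=1; i<=NF; ++i)
      if (i%2 && (p = index($i, "#"))) {
         print s+p            # 1-based column of the first unquoted #
         next
      }
      else
         s += length($i)+1    # field length plus the " delimiter after it
}' file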

11
15
24
1

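PS: These indices start at 1, because awk counts character positions (including the value returned by index()) from 1. Use print s+p-1 instead to get the 0-based indices the question expects.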

This awk command splits each line on the " character. The i%2 != 0 condition restricts the search to odd-numbered fields, which are exactly the parts of the line outside double-quoted text, so only a # outside quotes is reported.
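To see how a line is divided into odd and even fields, the small helper below (show_fields is a hypothetical name, used only for illustration) prints every field of a line split on ", labeled outside or inside by the same i%2 test the answer uses:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each "-delimited field of every input line,
# labeled outside/inside quotes using the same i%2 test as the answer.
show_fields() {
  awk -F '"' '{
    for (i = 1; i <= NF; i++)
      printf "%d %s [%s]\n", i, (i % 2 ? "outside" : "inside"), $i
  }'
}

printf '%s\n' '" sds # sds", #"old2",' | show_fields
# 1 outside []
# 2 inside [ sds # sds]
# 3 outside [, #]
# 4 inside [old2]
# 5 outside [,]
```

The # in field 2 is skipped because that field is inside quotes. Field 3 is the first odd field containing #, where index() returns 3; by then s = (0+1) + (10+1) = 12, so the answer prints 12 + 3 = 15.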