I have a performance issue with gawk 3.1.5 (Linux) related to the use of a for-in loop.
Running this script
BEGIN {
    TagOpen["Req"]   = "<S:Envelope"
    TagClose["Req"]  = "<\\/S:Envelope"
    TagOpen["Resp"]  = "<SOAP-ENV:Envelope"
    TagClose["Resp"] = "<\\/SOAP-ENV:Envelope"
}
# progress indicator every 10,000 lines
{ if ( NR % 10000 == 0 ) print NR }
# test every line against both tag pairs via a for-in loop
{
    for (i in TagOpen) {
        if ( match($0, TagOpen[i]) ) printf "Open [%s]\n", i
        if ( match($0, TagClose[i]) ) printf "Close [%s]\n", i
    }
}
on an 800,000-line text file takes:
real 0m56.84s
user 0m56.02s
sys 0m0.29s
Running the apparently identical script
BEGIN {
    TagOpen["Req"]   = "<S:Envelope"
    TagClose["Req"]  = "<\\/S:Envelope"
    TagOpen["Resp"]  = "<SOAP-ENV:Envelope"
    TagClose["Resp"] = "<\\/SOAP-ENV:Envelope"
}
# progress indicator every 10,000 lines
{ if ( NR % 10000 == 0 ) print NR }
# same tests, with the loop unrolled by hand
{
    i = "Req"
    if ( match($0, TagOpen[i]) ) printf "Open [%s]\n", i
    if ( match($0, TagClose[i]) ) printf "Close [%s]\n", i
    i = "Resp"
    if ( match($0, TagOpen[i]) ) printf "Open [%s]\n", i
    if ( match($0, TagClose[i]) ) printf "Close [%s]\n", i
}
on the same file takes:
real 0m3.36s
user 0m3.23s
sys 0m0.21s
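The post does not include the 800,000-line input file. The following bash sketch reproduces the comparison on synthetic data. Apart from the two awk programs, everything in it is a hypothetical assumption and not part of the original setup. That covers the generated line contents, the helper names make_input and compare, and the AWK variable. The sketch first checks that both programs print the same output. That should hold here because each synthetic line contains at most one tag, so awk's unspecified for-in iteration order cannot change the result. It then times each program.

```bash
#!/bin/bash
# Hypothetical reproduction harness: the generated line contents, the
# helper names and the AWK variable are assumptions, not from the post.

AWK=${AWK:-awk}   # e.g. AWK=gawk to exercise gawk specifically

# Variant 1 from the post: for-in loop over the tag arrays.
LOOP_PROG='
BEGIN {
    TagOpen["Req"]   = "<S:Envelope"
    TagClose["Req"]  = "<\\/S:Envelope"
    TagOpen["Resp"]  = "<SOAP-ENV:Envelope"
    TagClose["Resp"] = "<\\/SOAP-ENV:Envelope"
}
{ if ( NR % 10000 == 0 ) print NR }
{
    for (i in TagOpen) {
        if ( match($0, TagOpen[i]) ) printf "Open [%s]\n", i
        if ( match($0, TagClose[i]) ) printf "Close [%s]\n", i
    }
}'

# Variant 2 from the post: the same tests unrolled by hand.
UNROLLED_PROG='
BEGIN {
    TagOpen["Req"]   = "<S:Envelope"
    TagClose["Req"]  = "<\\/S:Envelope"
    TagOpen["Resp"]  = "<SOAP-ENV:Envelope"
    TagClose["Resp"] = "<\\/SOAP-ENV:Envelope"
}
{ if ( NR % 10000 == 0 ) print NR }
{
    i = "Req"
    if ( match($0, TagOpen[i]) ) printf "Open [%s]\n", i
    if ( match($0, TagClose[i]) ) printf "Close [%s]\n", i
    i = "Resp"
    if ( match($0, TagOpen[i]) ) printf "Open [%s]\n", i
    if ( match($0, TagClose[i]) ) printf "Close [%s]\n", i
}'

# make_input N FILE: write N synthetic lines. In every group of 8 lines,
# four carry exactly one envelope tag each; the others are filler.
make_input() {
    "$AWK" -v n="$1" 'BEGIN {
        for (k = 1; k <= n; k++) {
            r = k % 8
            if (r == 1)      print "<S:Envelope>"
            else if (r == 2) print "</S:Envelope>"
            else if (r == 3) print "<SOAP-ENV:Envelope>"
            else if (r == 4) print "</SOAP-ENV:Envelope>"
            else             print "payload line " k
        }
    }' > "$2"
}

# compare N: build an N-line input, check that both variants print the
# same output, then time each one (bash's time keyword writes to stderr).
compare() {
    local dir
    dir=$(mktemp -d) || return 1
    make_input "$1" "$dir/input.txt"
    "$AWK" "$LOOP_PROG"     "$dir/input.txt" > "$dir/loop.out"
    "$AWK" "$UNROLLED_PROG" "$dir/input.txt" > "$dir/unrolled.out"
    if cmp -s "$dir/loop.out" "$dir/unrolled.out"; then
        echo "outputs identical"
    else
        echo "outputs differ"
    fi
    echo "for-in loop:"
    time "$AWK" "$LOOP_PROG" "$dir/input.txt" > /dev/null
    echo "unrolled:"
    time "$AWK" "$UNROLLED_PROG" "$dir/input.txt" > /dev/null
    rm -rf "$dir"
}
```

For example, after sourcing the script, running AWK=gawk compare 800000 approximates the setup in the post, with synthetic content in place of the real file.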
That is roughly 17 times faster. I can't believe what I'm seeing!
Any idea why?
P.S. This doesn't happen with the "legacy" awk on HP-UX.