gawk for-loop performance issue


I have a performance issue with gawk 3.1.5 (Linux) that seems to be related to the use of a for loop.

Running this code

BEGIN {
    TagOpen["Req"] = "<S:Envelope"
    TagClose["Req"] = "<\\/S:Envelope"
    TagOpen["Resp"] = "<SOAP-ENV:Envelope"
    TagClose["Resp"]= "<\\/SOAP-ENV:Envelope"
}

{ if ( NR % 10000 == 0 ) print NR }

{
    for (i in TagOpen) {

        if ( match($0, TagOpen[i]) )  printf "Open  [%s]\n", i

        if ( match($0, TagClose[i]) ) printf "Close [%s]\n", i
    }
}

on an 800,000-line text file takes:

real    0m56.84s
user    0m56.02s
sys     0m0.29s

Running this apparently equivalent code, with the loop unrolled by hand,

BEGIN {
    TagOpen["Req"]  = "<S:Envelope"
    TagClose["Req"] = "<\\/S:Envelope"
    TagOpen["Resp"] = "<SOAP-ENV:Envelope"
    TagClose["Resp"]= "<\\/SOAP-ENV:Envelope"
}

{ if ( NR % 10000 == 0 ) print NR }

{
    i="Req"

    if ( match($0, TagOpen[i]) )  printf "Open  [%s]\n", i
    if ( match($0, TagClose[i]) ) printf "Close [%s]\n", i
    i="Resp"
    if ( match($0, TagOpen[i]) )  printf "Open  [%s]\n", i
    if ( match($0, TagClose[i]) ) printf "Close [%s]\n", i
}

takes:

real    0m3.36s
user    0m3.23s
sys     0m0.21s

I can't believe what I'm seeing!

Any idea what could cause this?

P.S. This doesn't happen with "legacy" awk on HP-UX.
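
For anyone who wants to reproduce the comparison, here is a minimal shell sketch that runs both variants against the same generated input and reports the elapsed time of each. The original input file is not shown, so the sample line contents, the `NLINES` and `AWK` variables, and the file names are all assumptions made for illustration. The progress line (`print NR`) is left out so that the two outputs can be compared directly.

```shell
#!/bin/sh
# Reproduction sketch. NLINES, AWK and the generated sample lines are
# assumptions; the input file from the post is not available.
NLINES=${NLINES:-800000}
AWK=${AWK:-awk}              # e.g. AWK=gawk to test gawk specifically
work=$(mktemp -d)

# Variant 1: for-in loop over the tag arrays.
cat > "$work/loop.awk" <<'EOF'
BEGIN {
    TagOpen["Req"]   = "<S:Envelope"
    TagClose["Req"]  = "<\\/S:Envelope"
    TagOpen["Resp"]  = "<SOAP-ENV:Envelope"
    TagClose["Resp"] = "<\\/SOAP-ENV:Envelope"
}
{
    for (i in TagOpen) {
        if (match($0, TagOpen[i]))  printf "Open  [%s]\n", i
        if (match($0, TagClose[i])) printf "Close [%s]\n", i
    }
}
EOF

# Variant 2: the same checks with the loop unrolled by hand.
cat > "$work/unrolled.awk" <<'EOF'
BEGIN {
    TagOpen["Req"]   = "<S:Envelope"
    TagClose["Req"]  = "<\\/S:Envelope"
    TagOpen["Resp"]  = "<SOAP-ENV:Envelope"
    TagClose["Resp"] = "<\\/SOAP-ENV:Envelope"
}
{
    i = "Req"
    if (match($0, TagOpen[i]))  printf "Open  [%s]\n", i
    if (match($0, TagClose[i])) printf "Close [%s]\n", i
    i = "Resp"
    if (match($0, TagOpen[i]))  printf "Open  [%s]\n", i
    if (match($0, TagClose[i])) printf "Close [%s]\n", i
}
EOF

# Generate the input: mostly filler, with an envelope line now and then.
"$AWK" -v n="$NLINES" 'BEGIN {
    for (k = 1; k <= n; k++) {
        if (k % 1000 == 1)
            print "<S:Envelope>request " k "</S:Envelope>"
        else if (k % 1000 == 2)
            print "<SOAP-ENV:Envelope>response " k "</SOAP-ENV:Envelope>"
        else
            print "filler line " k
    }
}' > "$work/input.txt"

# run <script>: execute it on the input and print elapsed whole seconds.
run() {
    start=$(date +%s)
    "$AWK" -f "$1" "$work/input.txt" > "$1.out"
    end=$(date +%s)
    echo "$1: $((end - start))s"
}

run "$work/loop.awk"
run "$work/unrolled.awk"

# for-in iteration order is unspecified, so sort before comparing.
sort "$work/loop.awk.out"     > "$work/loop.sorted"
sort "$work/unrolled.awk.out" > "$work/unrolled.sorted"
if cmp -s "$work/loop.sorted" "$work/unrolled.sorted"; then
    echo "outputs match"
else
    echo "outputs differ"
fi
```

Running the script once with `AWK=gawk` and once with another awk implementation should show whether the slowdown is specific to one interpreter. The sketch only measures the difference and does not explain it.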
