Ruby 2.4: How to speed up dynamically built Regexps for use with .match?

I just read about Regexp#match? (and String#match?) in Ruby 2.4 and was very excited by the benchmark results! But when I tried it in my application, I saw hardly any gains.

require 'benchmark'

str = 's'
Benchmark.bm do |b|
  b.report(".match         ") { 100000.times { 'string'.match(/s/) } }
  b.report(".match?        ") { 100000.times { 'string'.match?(/s/) } }
  b.report(".match dynamic ") { 100000.times { 'string'.match(/#{str}/) } }
  b.report(".match? dynamic") { 100000.times { 'string'.match?(/#{str}/) } }
end
                 user     system      total        real
.match           0.140000   0.000000   0.140000 (  0.143658)
.match?          0.020000   0.000000   0.020000 (  0.029628)
.match dynamic   0.370000   0.010000   0.380000 (  0.371935)
.match? dynamic  0.260000   0.010000   0.270000 (  0.278614)

From the benchmark, we see a tremendous gain from .match to .match?, but once I start creating complicated regexes dynamically, as my app requires, I lose most of those gains.

My question is: why is there such a drastic difference, and can I somehow create a dynamic regexp that gets the performance of .match? in the example below? I ran my benchmarks on Ruby 2.4.2p198.

require 'benchmark'

str = 'my text with words'
reg_str = '((^|[\s\"“])(cherry pie|cherry pies)($|[\s\"”\.\,\:\?\!])|(\#(cherrypie|cherrypies)($|\s|\#|\.|\,|\:|\?|\!)))'
puts Benchmark.measure {
  100000.times { str.match?(/#{reg_str}/i) }
}
9.380000   0.010000   9.390000 (  9.403821)

puts Benchmark.measure {
  100000.times { str.match?(/((^|[\s\"“])(cherry pie|cherry pies)($|[\s\"”\.\,\:\?\!])|(\#(cherrypie|cherrypies)($|\s|\#|\.|\,|\:|\?|\!)))/i) }
}
0.020000   0.000000   0.020000 (  0.017900)

Answer 1 (score 0)

The speed improvement of match? comes from not allocating a MatchData object and not setting globals like $~ and $1; it just returns true or false. You can't use match? if you need to get something back from the regex, such as a capture group.
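A small sketch of that difference (the method name last_match_demo is just for illustration). It relies on the documented behaviour that match? returns a boolean without updating $~ and the related variables, so the globals keep whatever the previous match left there:

```ruby
# Hypothetical demo method; $~ and $1 are local to this method's frame.
def last_match_demo
  md = 'string'.match(/(s)/)       # allocates a MatchData and sets $~ and $1
  captured = $1                    # "s"
  found = 'other'.match?(/(o)/)    # true, but $~ and $1 are left untouched
  [md.class, captured, found, $1]  # $1 is still "s" from the earlier match
end

p last_match_demo  # => [MatchData, "s", true, "s"]
```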

match? won't be any faster at compiling regex strings into Regexp objects.

Perhaps in your code you can first create the regexes and then use them in the loop instead of constantly recreating them:

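pattern = 'abcd'
lines = ['abcd', 'efgh']

# bad: a new Regexp is compiled from the interpolated string on every iteration
lines.each { |line| puts "Found a match!" if line.match?(/#{pattern}/) }

# good: compile once, then reuse the same Regexp inside the loop
regex = /#{pattern}/
lines.each { |line| puts "Found a match!" if line.match?(regex) }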
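If the patterns are only known at runtime but tend to repeat (for example, the same phrase lists over and over), one option is to compile each distinct pattern string once and keep it around. A minimal sketch, assuming a hypothetical REGEXP_CACHE hash and phrase_match? helper (neither is part of Ruby); note that this cache grows without bound and has no locking, so it is only a starting point:

```ruby
# Hypothetical cache: each distinct pattern string is compiled only once,
# the first time it is looked up; later lookups return the stored Regexp.
REGEXP_CACHE = Hash.new do |cache, pattern|
  cache[pattern] = Regexp.new(pattern, Regexp::IGNORECASE)
end

# Hypothetical helper: match? against the cached Regexp.
def phrase_match?(text, pattern)
  text.match?(REGEXP_CACHE[pattern])
end

pattern = '(cherry pie|cherry pies)'
p phrase_match?('I love Cherry Pie', pattern)          # => true
p phrase_match?('apple tart', pattern)                 # => false
p REGEXP_CACHE[pattern].equal?(REGEXP_CACHE[pattern])  # => true (same object reused)
```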
Answer 2 (score 3)

Use the /o modifier, so the interpolation (and the Regexp compilation) is performed only once, the first time the literal is evaluated:

require 'benchmark'

str = 's'
Benchmark.bm do |b|
  b.report(".match         ") { 100000.times { 'string'.match(/s/) } }
  b.report(".match?        ") { 100000.times { 'string'.match?(/s/) } }
  b.report(".match dynamic ") { 100000.times { 'string'.match(/#{str}/o) } }
  b.report(".match? dynamic") { 100000.times { 'string'.match?(/#{str}/o) } }
end
                 user     system      total        real
.match           0.120000   0.010000   0.130000 (  0.117889)
.match?          0.020000   0.000000   0.020000 (  0.027255)
.match dynamic   0.110000   0.000000   0.110000 (  0.113300)
.match? dynamic  0.030000   0.000000   0.030000 (  0.034755)
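What "only once" means in practice, as a minimal sketch (the method names are hypothetical): a /o literal keeps the Regexp built from the first value it interpolated, even when later calls pass a different pattern.

```ruby
# Hypothetical helpers contrasting an interpolated literal with and without /o.
def match_with_o?(text, pattern)
  # /o: the interpolation runs the first time this literal is evaluated;
  # every later call reuses that Regexp, whatever `pattern` is now.
  text.match?(/#{pattern}/o)
end

def match_without_o?(text, pattern)
  text.match?(/#{pattern}/) # a new Regexp is compiled on every call
end

p match_with_o?('apple', 'a')     # => true  (the Regexp /a/ is now fixed)
p match_with_o?('berry', 'b')     # => false (still matching against /a/)
p match_without_o?('berry', 'b')  # => true
```

So /o suits patterns that stay the same for the life of the process; for patterns that change between calls, compiling each distinct pattern once and reusing it is the safer route.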
Answer 3 (score 1)

What you are basically measuring is regexp interpolation (a new Regexp compiled on every iteration) versus a literal that is instantiated only once. The time spent in match? itself barely affects the result.
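One way to see this, as a small sketch (the method names are illustrative): a regexp literal without interpolation is created once when the code is compiled, so every evaluation returns the same object, while an interpolated literal builds a new Regexp each time it runs.

```ruby
def literal_regexp
  /s/ # created once when this method is compiled; reused on every call
end

def interpolated_regexp(str)
  /#{str}/ # compiled into a brand-new Regexp on every call
end

p literal_regexp.equal?(literal_regexp)                     # => true
p interpolated_regexp('s').equal?(interpolated_regexp('s')) # => false
```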

To compare match? against match, one should instantiate the regexp upfront:

require 'benchmark'

str = 'my text with words'
reg_str = '...'
reg = /#{reg_str}/i
puts Benchmark.measure {
  100000.times { str.match? reg }
}

The result will be roughly the same as in your second test, the one that uses a regexp literal.
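For a fair comparison of match and match? themselves, a sketch that uses the pattern from the question and compiles it once, outside the timed loops. Timings depend on the machine and Ruby version, so none are shown here.

```ruby
require 'benchmark'

str = 'my text with words'
reg_str = '((^|[\s\"“])(cherry pie|cherry pies)($|[\s\"”\.\,\:\?\!])|(\#(cherrypie|cherrypies)($|\s|\#|\.|\,|\:|\?|\!)))'
reg = /#{reg_str}/i # compiled once; both loops below reuse this object

Benchmark.bm(8) do |b|
  b.report('match')  { 100_000.times { str.match(reg) } }  # allocates MatchData on success
  b.report('match?') { 100_000.times { str.match?(reg) } } # returns only true/false
end
```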

That said, building the regexp from an interpolated string on every iteration is what takes most of the time. If you need a complicated interpolated regular expression that is rebuilt inside the loop, the difference between match? and match won’t be noticeable, since compiling the regexp, not the matching, is the bottleneck.