Regular expressions rapidly become too complex (for me) to understand. Even something as simple as [ab][cd] has several logical branches. My goal is to improve the maintainability of our code base, so answers to these questions could help us detect and fix complex code:
- Are there computational complexity metrics (similar to cyclomatic complexity) that include the complexity inherent in regular expressions?
- Are there any tools that produce a complexity number for regular expressions?
- Are there tools that can suggest simplifications of regular expressions?
You might try taking the compiled form of the regexp and mapping some code complexity metrics onto it, such as lines of code or cyclomatic complexity. To see what I mean, look at the following Stack Overflow answer: https://stackoverflow.com/a/2348725/5747415, which shows how Perl lets you access the compiled form of a regexp. Another example is shown here: http://perldoc.perl.org/perldebguts.html#Debugging-Regular-Expressions, quoting the tool output from that page:
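A rough Python sketch of the same idea, assuming CPython: it parses the pattern with CPython's private regex parser (`re._parser` on 3.11+, `sre_parse` on older versions; both are undocumented internals that may change) and counts decision points in the resulting tree, much as cyclomatic complexity counts branches. The scoring rules (a character class with k members adds k - 1, an alternation with k branches adds k - 1, a variable-length quantifier adds 1) and the name `regex_complexity` are hypothetical choices for illustration, not an established metric.

```python
# Hypothetical regex "cyclomatic complexity": counts decision points in the
# parse tree produced by CPython's *private* regex parser (not a public API).
try:
    import re._parser as sre_parse  # CPython 3.11+ (undocumented internal)
except ImportError:
    import sre_parse  # older CPython (undocumented internal)


def _decisions(subpattern) -> int:
    """Return the number of extra paths (decision points) in a parsed pattern."""
    total = 0
    for op, av in subpattern:
        name = op.name  # opcodes are int constants that carry a .name
        if name == "IN":
            # Character class with k members -> k - 1 extra paths.
            # A range such as a-z counts as one member; NEGATE is only a flag.
            members = [item for item in av if item[0].name != "NEGATE"]
            total += max(len(members) - 1, 0)
        elif name == "BRANCH":
            # Alternation with k alternatives -> k - 1 extra paths.
            _, alternatives = av
            total += len(alternatives) - 1
            total += sum(_decisions(alt) for alt in alternatives)
        elif name in ("MAX_REPEAT", "MIN_REPEAT", "POSSESSIVE_REPEAT"):
            # A quantifier whose min and max differ (*, +, ?, {m,n}) adds one
            # "repeat again or stop" choice; a fixed count like {3} adds none.
            low, high, item = av
            total += (1 if low != high else 0) + _decisions(item)
        elif name == "SUBPATTERN":
            total += _decisions(av[-1])  # last element is the group's body
        elif name in ("ASSERT", "ASSERT_NOT"):
            total += _decisions(av[1])  # av is (direction, body)
        elif name == "ATOMIC_GROUP":
            total += _decisions(av)
        elif name == "GROUPREF_EXISTS":
            # (?(1)yes|no): one two-way decision plus whatever is inside
            _group, yes, no = av
            total += 1 + _decisions(yes) + (_decisions(no) if no is not None else 0)
    return total


def regex_complexity(pattern: str) -> int:
    """1 + number of decision points, mirroring McCabe's 'decisions + 1'."""
    return 1 + _decisions(sre_parse.parse(pattern))


if __name__ == "__main__":
    for p in ["abc", "[ab][cd]", "(foo|bar)+", r"\d+"]:
        print(f"{p!r}: {regex_complexity(p)}")
```

Under these rules `abc` scores 1, `[ab][cd]` and `(foo|bar)+` score 3, and `\d+` scores 2. The parser already normalizes some constructs (for example `a|b|c` becomes the character class `[abc]`), so equivalent spellings tend to get the same score. To inspect the parse tree directly, `re.compile(pattern, re.DEBUG)` prints it, which is roughly Python's counterpart to Perl's regex debugging output.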
By the way, I congratulate you on the decision to improve code maintainability. That said, I have to express my doubts that any formal metric provides better guidance than (or can even come close to) the judgement of experienced developers...