Basically, the regex I'm looking to create is something that would match every Google domain except google.com and google.com.au.
So google.org, google.uk or google.com.pk would be a match. I'm working within the limitations of RE2, and the best I've been able to come up with is:
google\.([^c][^o][^m]\.?[^a]?[^u]?)
This doesn't work for extended domains like google.com.pk, and it doesn't work if the top-level part is only two letters long, e.g. .cn instead of .org.
It works if there's no extended domain and the top-level part isn't two letters long: google.org matches and google.com doesn't.
Here's the link with test cases. regexr.com/7rbkn
I'm looking for a workaround for negative lookahead, or to find out whether it's possible to accommodate this within a single regex string.
Sure you can. The pattern will look a bit ugly, but what you are asking for is totally possible.
Let's assume, for ease of explanation, that the input already satisfies the regex google(?:\.[a-z]+)+ (i.e. google followed by at least one domain name). If you want more precision, see this answer.

Match a name that is not a given name
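The inverse of com would be any name that does not start with c, or starts with c but does not continue with o, or starts with co but does not continue with m, or is longer than com. Translate that to regex and we have: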
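As a minimal sketch of that translation (not necessarily the exact pattern originally posted here), the inverse of com can be written with plain character classes, which RE2 accepts. NOT_COM and is_not_com are names made up for this example, names are assumed to be lowercase [a-z]+ as above, and Python's re module is used only as a convenient test harness:

```python
import re

# Hypothetical name: a sub-pattern for "any lowercase name except com".
# Each branch is one way a name can differ from "com".
NOT_COM = (
    r"(?:"
    r"[abd-z][a-z]*"           # does not start with c
    r"|c(?:[a-np-z][a-z]*)?"   # c alone, or c followed by anything but o
    r"|co(?:[a-ln-z][a-z]*)?"  # co alone, or co followed by anything but m
    r"|com[a-z]+"              # com followed by at least one more letter
    r")"
)


def is_not_com(name: str) -> bool:
    """Return True if the whole name matches NOT_COM."""
    return re.fullmatch(NOT_COM, name) is not None


for name in ["com", "org", "cn", "co", "comx"]:
    print(name, is_not_com(name))
```

Only com itself should be rejected; shorter prefixes such as c and co, and longer names such as comx, are different names and are accepted.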
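The same applies to au: the name either does not start with a, or starts with a but does not continue with u, or is longer than au.

Match a hostname that is not a given hostname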
There are two cases you want to avoid: google.com and google.com.au. The inverse of that would be the union of the following cases:
- google.* where * is any name but com
- google.*.* where the first * is any name but com, or google.com.* where * is any name but au
- google.*.*.* ...

Or, a bit more logically:
- If the first name is not com, it doesn't matter how many names are left.
- If the first name is com and the second name is not au, the rest of the names are also irrelevant.
- If the first and second names are com and au correspondingly, then there must be at least one other name, which means there are at least three extra names.

That said, we only need three branches. Let !com be the inverse of com (and likewise !au for au); here's what the pattern looks like in pseudo-regex:

See the common parts? We can extract them out:
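To make the three branches and the factoring concrete, here is a rough sketch that builds both forms from placeholders and checks that they agree on a few hostnames. NOT_COM, NOT_AU, NAME, REST, THREE_BRANCHES and FACTORED are names chosen for this example, the sub-patterns are one possible way to write the inverses of com and au, and Python's re is only standing in for RE2 (no lookarounds are used):

```python
import re

# Assumed sub-patterns for "any name but com" and "any name but au".
NOT_COM = r"(?:[abd-z][a-z]*|c(?:[a-np-z][a-z]*)?|co(?:[a-ln-z][a-z]*)?|com[a-z]+)"
NOT_AU = r"(?:[b-z][a-z]*|a(?:[a-tv-z][a-z]*)?|au[a-z]+)"
NAME = r"[a-z]+"
REST = rf"(?:\.{NAME})*"  # zero or more further names

THREE_BRANCHES = (
    r"google\.(?:"
    rf"{NOT_COM}{REST}"        # first name is not com
    rf"|com\.{NOT_AU}{REST}"   # com, then a second name that is not au
    rf"|com\.au(?:\.{NAME})+"  # com.au, then at least one more name
    r")"
)

# Every branch ends with REST, so it can be pulled out to the right.
FACTORED = rf"google\.(?:{NOT_COM}|com\.(?:{NOT_AU}|au\.{NAME})){REST}"

hosts = ["google.com", "google.com.au", "google.org", "google.cn",
         "google.com.pk", "google.co.uk", "google.com.au.pk"]
for host in hosts:
    three = re.fullmatch(THREE_BRANCHES, host) is not None
    factored = re.fullmatch(FACTORED, host) is not None
    print(host, three, factored)
```

The third branch, com.au followed by one or more names, is the same as com.au plus exactly one name plus REST, which is why REST can be shared by all three branches.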
Insert what we had from section 1, and voilà.
The final pattern
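One way to write the full pattern under the assumptions above is sketched below; it is not necessarily identical to the pattern behind the links that follow. PATTERN and is_other_google_domain are hypothetical names, hostnames are assumed to be lowercase, and Python's re is used only to exercise the regex, which sticks to character classes, groups, alternation and anchors so that RE2 should accept it too:

```python
import re

# One regex string, split across lines only for readability.
PATTERN = re.compile(
    r"^google\."
    r"(?:"
    # first name is anything but com
    r"(?:[abd-z][a-z]*|c(?:[a-np-z][a-z]*)?|co(?:[a-ln-z][a-z]*)?|com[a-z]+)"
    # or com, then either a name that is not au, or au plus one more name
    r"|com\.(?:(?:[b-z][a-z]*|a(?:[a-tv-z][a-z]*)?|au[a-z]+)|au\.[a-z]+)"
    r")"
    r"(?:\.[a-z]+)*$"  # any number of further names
)


def is_other_google_domain(host: str) -> bool:
    """Return True for google.* hostnames other than google.com and google.com.au."""
    return PATTERN.search(host) is not None


for host in ["google.org", "google.uk", "google.com.pk", "google.cn",
             "google.com", "google.com.au"]:
    print(host, is_other_google_domain(host))
```

The anchors matter: without ^ and $ the engine could match a prefix such as google.co inside google.com and report a false positive.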
Try it on regex101.com: PCRE2 with comments, Go, multiline mode.