Regex ignore specific string between two strings


So I have a string like this:

test //ita https://example.com lics// test // another // one

I can capture the text between two "//" markers easily enough like so:

\/\/(.*?)\/\/

This returns the groups "ita https:" and " test ". However, I'm trying to get it to ignore the "//" that belongs to "http://" or "https://".

In other words, it should return only "ita https://example.com lics" and "another".
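The behaviour described above can be reproduced with a short sketch. It assumes PHP and preg_match_all(), which the question itself does not specify, and the variable names are illustrative only.

```php
<?php
// Sample string from the question.
$text = 'test //ita https://example.com lics// test // another // one';

// Naive pattern: lazily capture whatever sits between two "//" markers.
// "~" is used as the delimiter so the slashes need no escaping.
preg_match_all('~//(.*?)//~', $text, $matches);

// Group 1 holds the captured substrings. The "//" inside "https://"
// is treated as a closing marker, which is the unwanted behaviour.
var_export($matches[1]);
```

This prints the two unwanted groups, 'ita https:' and ' test ', because the lazy quantifier stops at the first "//" it meets, including the one inside the URL.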


There are 4 best solutions below

Patrick Simard

Looks like you're on the right track.

(?<!https?:\/\/)\/\/(.*?)(?=(?:\s|https?:\/\/|$))

Here's how it works:

(?<!https?:\/\/)

Negative lookbehind assertion to ensure that there is no "http://" or "https://" before "//".

\/\/

Matches the "//" strings.

(.*?)

Captures any text between "//" using a non-greedy match.

(?=(?:\s|https?:\/\/|$))

Positive lookahead assertion to ensure that what follows is either a whitespace character, "http://" or "https://", or the end of the string.

You should post a few more example strings so this can be tested better, but based on what I tried, it looks like this works.

Nick

You can use this regex to match your strings:

(?<!https:|http:)//\s*((?:https?://|(?!//).)*)(?<!\s)\s*//

This will match:

  • (?<!https:|http:)// : //, not preceded by https: or http:
  • \s* : some amount of whitespace
  • ((?:https?://|(?!//).)*) : capture group 1, any number of either:
    • https?:// : // preceded by https: or http:; or
    • (?!//). : a character which is not the start of //
  • (?<!\s)\s* : some amount of whitespace, not preceded by whitespace (this prevents capturing any whitespace before the closing // in group 1)
  • // : literal //
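To illustrate what the (?<!\s)\s* part contributes, here is a small sketch comparing the pattern with and without that lookbehind. The shortened sample string, the variant pattern, and the variable names are assumptions made for illustration and are not part of the original answer.

```php
<?php
// Shortened sample string (hypothetical, for illustration only).
$text = 'x // another // y';

// The pattern from this answer.
$withLookbehind = '~(?<!https:|http:)//\s*((?:https?://|(?!//).)*)(?<!\s)\s*//~';

// Hypothetical variant with the (?<!\s) lookbehind removed.
$withoutLookbehind = '~(?<!https:|http:)//\s*((?:https?://|(?!//).)*)\s*//~';

preg_match($withLookbehind, $text, $a);
preg_match($withoutLookbehind, $text, $b);

var_export($a[1]); // 'another'
echo PHP_EOL;
var_export($b[1]); // 'another ' (the trailing space stays in the group)
```

Without the lookbehind, the greedy group swallows the space before the closing //. With it, the engine has to backtrack until the group ends on a non-whitespace character.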

Regex demo on regex101

The strings you are interested in will be captured in group 1. In PHP:

$text = 'test //ita https://example.com lics// test // another // one';
$regex = '~(?<!https:|http:)//\s*((?:https?://|(?!//).)*)(?<!\s)\s*//~';
preg_match_all($regex, $text, $matches);
var_export($matches[1]);

Output:

array (
  0 => 'ita https://example.com lics',
  1 => 'another',
)

PHP demo on 3v4l.org

Hao Wu

(https?://)(*SKIP)(*F)|//\s*((?:(?1)|.)*?)\s*//

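I propose this solution using the backtracking control verbs (*SKIP) and (*F): any http:// or https:// is matched first and then discarded, so its // is never treated as a marker.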

Here's the regex101 proof and PHP proof
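A minimal sketch of how this pattern might be used from PHP, assuming preg_match_all() and the sample string from the question. Because group 1 is the URL prefix, the wanted substrings are read from group 2.

```php
<?php
// Sample string from the question.
$text = 'test //ita https://example.com lics// test // another // one';

// Group 1 (https?://) exists only to be skipped via (*SKIP)(*F) and to be
// reused through the subroutine call (?1); the wanted text lands in group 2.
$regex = '~(https?://)(*SKIP)(*F)|//\s*((?:(?1)|.)*?)\s*//~';

preg_match_all($regex, $text, $matches);

// Prints the two wanted strings:
// 'ita https://example.com lics' and 'another'.
var_export($matches[2]);
```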

mickmackusa

Just to show another implemented solution, I'll demonstrate preg_split() instead of preg_match_all() so that the isolated strings are immediately returned from the native function call instead of accessing a reference variable.

Code: (Demo)

$text = 'test //ita https://example.com lics// test // another // one';
var_export(
    preg_split(
        '~(?1)(?://\s*((?:https?://|.)*?)\s*//)?~',
        $text,
        -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
    )
);

Output:

array (
  0 => 'ita https://example.com lics',
  1 => 'another',
)

Breakdown:

(?1)                  #match subpattern logic from capture group 1 ("non-marker" characters -- in this context)
(?:                   #start non-capturing group
   //                 #two literal forward slashes
   \s*                #zero or more whitespace characters
   (                  #start capture group 1
      (?:             #start non-capturing group
         https?://    #match http:// or https://
         |            #OR
         .            #match a single character
      )*?             #end non-capturing group and lazily match zero or more repetitions of the pattern
   )                  #end capture group 1
   \s*                #zero or more whitespace characters
   //                 #two literal forward slashes
)?                    #close non-capturing group and allow zero or one repetition