Extract version-specific upgrade notice from readme text


I am currently writing a PHP function which should help me to extract an upgrade notice from a given readme text.

This is my source text:

Some stuff before this notice like a changelog with versioning and explanation text.

== Upgrade Notice ==

= 1.3.0 =

When using Master Pro, 1.3.0 is the new minimal required version!

= 1.1.0 =

When using Master Pro, 1.1.0 is the new minimal required version!

= 1.0.0 =

No upgrade - just install :)

[See changelog for all versions](https://plugins.svn.wordpress.org/master-pro/trunk/CHANGELOG.md).

This is the function:

/**
 * Parse update notice from readme file
 *
 * @param string $content
 * @param string $new_version
 *
 * @return void
 */
private function parse_update_notice( string $content, string $new_version ) {
    $regexp  = '~==\s*Upgrade Notice\s*==\s*(.*?=+\s*' . preg_quote( $new_version ) . '\s*=+\s*(.*?)(?=^=+\s*\d+\.\d+\.\d+\s*=+|$))~ms';

    if ( preg_match( $regexp, $content, $matches ) ) {
        $version = trim( $matches[1] );
        $notices = (array) preg_split( '~[\r\n]+~', trim( $matches[2] ) );

        error_log( $version );
        error_log( print_r( $notices, true ) );
    }
}

I am currently stuck at my RegEx. I'm not really getting it to work. This was my initial idea:

  1. Only search after == Upgrade Notice ==
  2. Check if we have a version matching $new_version
  3. Get the matched version between the = x.x.x = as match 1 e.g. 1.1.0
  4. Get the content after the version as match 2 but stopping after an empty new line. The upgrade notice can go over multiple lines but without an empty new line.

There are 4 best solutions below

0
The fourth bird On BEST ANSWER

To get only the first block after "Upgrade Notice", up to the first empty line, you can omit the s flag so that the dot does not match a newline, and capture all following lines that contain at least one non-whitespace character.

^==\h*Upgrade Notice\h*==\R\s*^=\h*(1\.3\.0)\h*=\R\s*^((?:\h*\S.*(?:\R\h*\S.*)*)+)

The line in PHP:

$regexp = '~^==\h*Upgrade Notice\h*==\R\s*^=\h*(' . preg_quote( $new_version ) . ')\h*=\R\s*^((?:\h*\S.*(?:\R\h*\S.*)*)+)~m';
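
As a rough sketch of how that line could be used, the pattern can be wrapped in a small helper. The function name first_upgrade_notice() and the sample readme are made up for illustration, and the '~' delimiter passed to preg_quote() is an addition not present in the original line. Note that this pattern only matches when the requested version is the first block after the heading.

```php
<?php
// Hypothetical helper (not from the original post): returns the notice for
// $version only when it is the first version block after the
// "== Upgrade Notice ==" heading, otherwise null.
function first_upgrade_notice( string $readme, string $version ): ?string {
    $regexp = '~^==\h*Upgrade Notice\h*==\R\s*^=\h*('
        . preg_quote( $version, '~' )
        . ')\h*=\R\s*^((?:\h*\S.*(?:\R\h*\S.*)*)+)~m';

    // $matches[1] is the version, $matches[2] the block of non-empty lines.
    return preg_match( $regexp, $readme, $matches ) ? $matches[2] : null;
}

// Made-up sample text for illustration.
$readme = "== Upgrade Notice ==\n\n= 1.3.0 =\n\nFirst line.\nSecond line.\n\n= 1.1.0 =\n\nOlder notice.\n";

var_dump( first_upgrade_notice( $readme, '1.3.0' ) ); // both lines of the 1.3.0 notice
var_dump( first_upgrade_notice( $readme, '1.1.0' ) ); // NULL, 1.1.0 is not the first block
```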

If you want to choose which version block after "Upgrade Notice" is matched, you can use a quantifier to set how many blocks starting with the version pattern are consumed; the notice of the last one is captured:

^==\h*Upgrade Notice\h*==(?:(?:\R(?!=\h*\d+\.\d+\.\d+\h*=$).*)*\R=\h*(\d+\.\d+\.\d+)\h*=$\s*){2}^((?:\h*\S.*(?:\R\h*\S.*)*)+)
  • ^ Start of a line (the pattern is used with the m flag)
  • ==\h*Upgrade Notice\h*== The starting pattern, where \h* match optional horizontal whitespace characters
  • (?: Non capture group
    • (?:\R(?!=\h*\d+\.\d+\.\d+\h*=$).*)* Match all lines that do not start with a version pattern
    • \R=\h* Match a newline and = followed by optional horizontal whitespace characters
    • (\d+\.\d+\.\d+) Capture group 1, match the version
    • \h*=$\s* Match optional horizontal whitespace characters and =, assert the end of the line, then match optional whitespace characters (including newlines)
  • ){2} Use a quantifier (in this case {2}) to match n times a version pattern
  • ^ Start of a line
  • ( Capture group 2
    • (?:\h*\S.*(?:\R\h*\S.*)*)+ Match 1 or more lines that contain at least a single non whitespace character
  • ) Close the group
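
A minimal sketch of how the quantifier could be made configurable from PHP. The helper name nth_upgrade_notice() and the sample readme are assumptions for illustration, and the capture group is written as in the breakdown above, (?:\h*\S.*(?:\R\h*\S.*)*)+, so that single-line notices also match.

```php
<?php
// Hypothetical helper (not from the original post): returns [version, notice]
// for the $n-th version block (1-based) after "== Upgrade Notice ==",
// or null when there is no such block.
function nth_upgrade_notice( string $readme, int $n ): ?array {
    if ( $n < 1 ) {
        return null;
    }

    $regexp = '~^==\h*Upgrade Notice\h*=='
        . '(?:(?:\R(?!=\h*\d+\.\d+\.\d+\h*=$).*)*\R=\h*(\d+\.\d+\.\d+)\h*=$\s*){' . $n . '}'
        . '^((?:\h*\S.*(?:\R\h*\S.*)*)+)~m';

    if ( ! preg_match( $regexp, $readme, $matches ) ) {
        return null;
    }

    // Group 1 holds the version of the last repetition, group 2 its notice.
    return [ $matches[1], $matches[2] ];
}

// Made-up sample text for illustration.
$readme = "== Upgrade Notice ==\n\n= 1.3.0 =\n\nFirst line.\nSecond line.\n\n= 1.1.0 =\n\nOlder notice.\n";

print_r( nth_upgrade_notice( $readme, 2 ) ); // version 1.1.0 and its notice
```

Because a repeated capture group keeps only its last iteration, group 1 always reports the version of the block whose notice was captured.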

2
Markus Zeller On

You don't need to do everything with a regex. Just use a regex for the version detection. Here's a simplified version:

Demo: https://3v4l.org/aMdXF

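// $md holds the readme text. str_starts_with() requires PHP 8.0 or later.
$versions = [];       // version => list of lines that follow its heading
$currentVersion = '';
$ignore = true;       // stays true until the "== Upgrade Notice ==" heading is seen
foreach(explode("\n", $md) as $line) {
    if (str_starts_with($line, '== Upgrade Notice ==')) {
        $ignore = false;
        $currentVersion = ''; // forget versions seen in earlier sections (e.g. a changelog)
        continue;
    }

    // A "= x.y.z =" heading starts a new version block
    if (preg_match('/^= ([0-9.]+) =/', $line, $matches)) {
        $currentVersion = $matches[1];
        continue;
    }

    if (true === $ignore || '' === $currentVersion) {
        continue;
    }

    // Every following line is collected, including empty ones
    $versions[$currentVersion][] = $line;
}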
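
If the notice should also stop at the first empty line, as the question asks, the same line-by-line idea could be extended roughly like this. This is only a sketch under assumptions: the function name collect_upgrade_notices() is made up, and it assumes that any other "== Heading ==" line ends the upgrade notice section.

```php
<?php
// Hypothetical variant (not from the original answer): collects the notice
// lines per version inside the "== Upgrade Notice ==" section only, and ends
// each notice at the first empty line that follows its text.
function collect_upgrade_notices( string $readme ): array {
    $notices    = [];
    $current    = '';
    $in_section = false;

    foreach ( preg_split( '~\R~', $readme ) as $line ) {
        // Any "== Heading ==" line opens or closes the section.
        if ( preg_match( '~^==[^=].*==\s*$~', $line ) ) {
            $in_section = ( 'Upgrade Notice' === trim( $line, "= \t" ) );
            $current    = '';
            continue;
        }
        if ( ! $in_section ) {
            continue;
        }
        // A "= x.y.z =" line starts a new version block.
        if ( preg_match( '~^=\s*([0-9.]+)\s*=\s*$~', $line, $m ) ) {
            $current             = $m[1];
            $notices[ $current ] = [];
            continue;
        }
        if ( '' === $current ) {
            continue;
        }
        if ( '' === trim( $line ) ) {
            // An empty line ends the notice once it has at least one line.
            if ( [] !== $notices[ $current ] ) {
                $current = '';
            }
            continue;
        }
        $notices[ $current ][] = $line;
    }

    return $notices;
}

// Made-up sample text for illustration.
$readme = "== Upgrade Notice ==\n\n= 1.3.0 =\n\nFirst line.\nSecond line.\n\n= 1.1.0 =\n\nOlder notice.\n\nTrailing text.\n";
print_r( collect_upgrade_notices( $readme ) );
```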
4
Olivier On

Here is a solution not based on regex but good old strpos():

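function getNotice($readme, $version)
{
    // Normalize line endings so that "\n\n" reliably marks an empty line
    $txt = str_replace("\r", '', $readme);
    $p1 = strpos($txt, "== Upgrade Notice ==");
    if($p1 !== false)
    {
        $ver = "= $version =";
        // Look for the version heading only after the section heading
        $p2 = strpos($txt, $ver, $p1);
        if($p2 !== false)
        {
            // Skip the heading and the two newlines after it
            // (assumes exactly one empty line before the notice text)
            $p2 += strlen($ver) + 2;
            // The notice ends at the next empty line, or at the end of the text
            $p3 = strpos($txt, "\n\n", $p2);
            if($p3 !== false)
                return substr($txt, $p2, $p3 - $p2);
            else
                return substr($txt, $p2);
        }
    }
    return '';
}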

$readme = <<<README
Some stuff before this notice which is not relevant.

== Upgrade Notice ==

= 1.3.0 =

When using Master Pro, 1.3.0 is the new minimal required version!
Additional line.

= 1.1.0 =

When using Master Pro, 1.1.0 is the new minimal required version!

= 1.0.0 =

No upgrade - just install :)

[See changelog for all versions](https://plugins.svn.wordpress.org/master-pro/trunk/CHANGELOG.md).
README;

echo getNotice($readme, '1.3.0');

Output:

When using Master Pro, 1.3.0 is the new minimal required version!
Additional line.
3
shingo On

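It seems like it's just a mistake in the position of your parentheses: the first capture group should wrap only the version. In addition, the lookahead below uses ^\s*?$ instead of $ so that the notice stops at the first empty line: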

'~==\s*Upgrade Notice\s*==\s*.*?=+\s*(' . preg_quote( $new_version ) .
  ')\s*=+\s*(.*?)(?=^=+\s*\d+\.\d+\.\d+\s*=+|^\s*?$)~ms'

https://3v4l.org/WY3aE
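
For reference, a sketch of how the corrected pattern could be dropped into a standalone function. The name extract_update_notice(), the array return value and the '~' delimiter passed to preg_quote() are assumptions for illustration, not part of the original code. With this lookahead the notice has to be followed by an empty line or another version heading.

```php
<?php
// Hypothetical standalone variant of the function from the question:
// returns the matched version and its notice lines, or null if nothing matches.
function extract_update_notice( string $content, string $new_version ): ?array {
    $regexp = '~==\s*Upgrade Notice\s*==\s*.*?=+\s*(' . preg_quote( $new_version, '~' )
        . ')\s*=+\s*(.*?)(?=^=+\s*\d+\.\d+\.\d+\s*=+|^\s*?$)~ms';

    if ( ! preg_match( $regexp, $content, $matches ) ) {
        return null;
    }

    return [
        'version' => trim( $matches[1] ),
        // One array entry per line of the notice.
        'notices' => preg_split( '~[\r\n]+~', trim( $matches[2] ) ),
    ];
}

// Made-up sample text for illustration.
$readme = "== Upgrade Notice ==\n\n= 1.3.0 =\n\nFirst line.\nSecond line.\n\n= 1.1.0 =\n\nOlder notice.\n\nTrailing text.\n";
print_r( extract_update_notice( $readme, '1.1.0' ) );
```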