Regex - match strings ending with one or more + characters


I have a text file listing packages (name, id, current version, new version, source) taken from the output of winget upgrade. I removed the first two lines and the last line of that output.

Content of the text file:

Brave                        Brave.Brave         111.1.49.120         111.1.49.128        winget
Git                          Git.Git             2.39.2               2.40.0              winget
Notepad++ (64-bit x64)       Notepad++.Notepad++ 8.5                  8.5.1               winget
Spotify                      Spotify.Spotify     1.2.7.1277.g2b3ce637 1.2.8.907.g36fbeacc winget
Teams Machine-Wide Installer Microsoft.Teams     1.5.0.30767          1.6.00.4472         winget
PDFsam Basic                 PDFsam.PDFsam       5.0.3.0              5.1.1.0             winget

I am trying to use Python 3 to extract all package ids, because the output of winget upgrade is plain text.

What I have tried so far:

import re

with open(r"C:\Users\Username\Desktop\winget_upgrade.txt", "r") as f:
    for line in f:
        match = re.search(r"\b([a-zA-Z]+[a-zA-Z0-9!@#$%^&*()+\-.]*\.[a-zA-Z]+[a-zA-Z0-9!@#$%^&*()+\-.]*\+*)\b", line)
        if match:
            print(match.group(1))

The output is:

Brave.Brave
Git.Git
Notepad++.Notepad
Spotify.Spotify
Microsoft.Teams
PDFsam.PDFsam

The problem is that the Notepad++ package id is missing the two + characters at the end. How can I change my regex so that it prints:

Notepad++.Notepad++ instead of Notepad++.Notepad

I think I need to change something in the + part of the pattern: ()+\-.]*\+*)
But I am not sure what.
Can you help me?

Best answer, by markalex:

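The problem is caused by the trailing \b: the transition from + to a space is not a word boundary, because neither character is a word character. The regex therefore backtracks to the last position where \b does match, which is right after Notepad.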
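
A minimal sketch of that behaviour, using a shortened sample string (the string and variable names here are made up for illustration, they are not from the question):

```python
import re

sample = "Notepad++ 8.5"

# \b only matches where exactly one neighbour is a word character
# (letter, digit or underscore). Between "+" and " " neither is,
# so a \b placed right after "++" cannot match.
with_boundary = re.search(r"Notepad\+\+\b", sample)

# Between "d" and "+" there is a word boundary, so this stops early.
stops_early = re.search(r"Notepad\b", sample)

# A lookahead for whitespace does not care about word characters.
with_lookahead = re.search(r"Notepad\+\+(?=\s)", sample)

print(with_boundary)            # None
print(stops_early.group())      # Notepad
print(with_lookahead.group())   # Notepad++
```
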

Use a lookahead, (?=\s), instead:

import re

lines = [
'Brave                        Brave.Brave         111.1.49.120         111.1.49.128        winget',
'Git                          Git.Git             2.39.2               2.40.0              winget',
'Notepad++ (64-bit x64)       Notepad++.Notepad++ 8.5                  8.5.1               winget',
'Spotify                      Spotify.Spotify     1.2.7.1277.g2b3ce637 1.2.8.907.g36fbeacc winget',
'Teams Machine-Wide Installer Microsoft.Teams     1.5.0.30767          1.6.00.4472         winget',
'PDFsam Basic                 PDFsam.PDFsam       5.0.3.0              5.1.1.0             winget',
    ]

for line in lines:
    match = re.search(r"\b([a-zA-Z]+[a-zA-Z0-9!@#$%^&*()+\-.]*\.[a-zA-Z]+[a-zA-Z0-9!@#$%^&*()+\-.]*\+*)(?=\s)", line)
    if match:
        print(match.group(1))

Output:

Brave.Brave
Git.Git
Notepad++.Notepad++
Spotify.Spotify
Microsoft.Teams
PDFsam.PDFsam
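
If the regex proves hard to maintain, a different approach (not part of the answer above) might be to skip the regex and count columns from the right. This is only a sketch: it assumes that the id, both versions and the source never contain spaces, which holds for the sample lines but is not guaranteed for every winget package. The function name package_id is hypothetical.

```python
from typing import Optional


def package_id(line: str) -> Optional[str]:
    """Return the id column of one winget upgrade line, or None.

    Assumption: the last four whitespace-separated fields are
    id, current version, new version and source.
    """
    fields = line.split()
    # Need at least one name token plus the four fixed columns.
    if len(fields) < 5:
        return None
    return fields[-4]


lines = [
    'Notepad++ (64-bit x64)       Notepad++.Notepad++ 8.5                  8.5.1               winget',
    'Teams Machine-Wide Installer Microsoft.Teams     1.5.0.30767          1.6.00.4472         winget',
]

ids = [package_id(line) for line in lines]
print(ids)
```

Counting from the right means spaces in the package name (the first column) do not matter, since only the name is expected to contain them.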