Fixing Roman numerals with regex and VBScript

I have text containing Roman numerals with broken capitalization and want to fix them using VBScript.

I have the following example string:

some text Part I, some text Part Ii, some more text IiI, anything DXiV

I found the following regular expression, which captures Roman numerals very well and doesn't capture regular words:

\b(?=[MDCLXVI])(M{0,3})(C[DM]|D?C{0,3})(X[LC]|L?X{0,3})(I[VX]|V?I{0,3})\b

Now I'm writing a script that will replace the broken numerals with correct (i.e. all-caps) ones. Here is what I have so far:

Set re = New RegExp
re.Pattern = "\b(?=[MDCLXVI])(M{0,3})(C[DM]|D?C{0,3})(X[LC]|L?X{0,3})(I[VX]|V?I{0,3})\b"
re.IgnoreCase = True
re.Global = True
str = "some text Part I, some text Part Ii, some more text IiI, anything DXiV"
result = re.Replace(str,UCase("$1$2$3$4"))

I'm getting the same string back as the input.

But if I write re.Replace(str,"xxx"), all Roman numerals are correctly replaced with xxx.

I have read some tutorials and the RegExp object documentation but still can't find a way. I guess this will not be as easy as I thought.

Thanks for your help!

LesFerch On BEST ANSWER

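You're referencing the capture groups in the Replace call, but the case change can't be applied that way: UCase("$1$2$3$4") is evaluated before Replace runs, and since "$1$2$3$4" contains no letters it comes back unchanged, so each match is simply replaced with itself. Here's a more verbose solution that upper-cases each match explicitly: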

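Function CorrectRomanNumeralsCase(inputString)
    Dim regex, matches, match, result, pos
    Set regex = New RegExp
    ' Simple pattern: any whole word made only of Roman-numeral letters.
    ' With IgnoreCase it also matches ordinary words such as "mix" or "did".
    regex.Pattern = "\b[IVXLCDM]+\b"
    regex.IgnoreCase = True
    regex.Global = True
    Set matches = regex.Execute(inputString)
    result = ""
    pos = 1  ' next unread character (1-based, as Mid expects)
    For Each match In matches
        ' FirstIndex is 0-based: copy the text before the match, then the upper-cased match.
        ' Rebuilding by position avoids changing the same letters elsewhere in the string.
        result = result & Mid(inputString, pos, match.FirstIndex + 1 - pos) & UCase(match.Value)
        pos = match.FirstIndex + match.Length + 1
    Next
    ' Append whatever follows the last match
    CorrectRomanNumeralsCase = result & Mid(inputString, pos)
End Function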

inputString = "some text Part I, some text Part Ii, some more text IiI, anything DXiV"
correctedString = CorrectRomanNumeralsCase(inputString)
WScript.Echo correctedString
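
A more compact alternative, offered as a sketch rather than a tested solution: as far as I know, the VBScript RegExp object (version 5.5 and later) also accepts a function reference from GetRef as the replacement argument and calls it once per match. The function name UpperMatch is made up for this example. Its three-parameter signature (matched text, position, source string) assumes the pattern has no capture groups; each capture group would add one parameter after the first.

```vbscript
' Hypothetical callback: receives the matched text, its 0-based position,
' and the full source string; the return value replaces the match.
Function UpperMatch(matchedText, position, source)
    UpperMatch = UCase(matchedText)
End Function

Dim re, str
Set re = New RegExp
re.Pattern = "\b[IVXLCDM]+\b"   ' no capture groups, so the callback takes 3 arguments
re.IgnoreCase = True
re.Global = True

str = "some text Part I, some text Part Ii, some more text IiI, anything DXiV"
' Assumption: Replace accepts a function reference in place of a replacement string
WScript.Echo re.Replace(str, GetRef("UpperMatch"))
```

If the callback form is not available in your environment, the Execute-based function above does the same job using only documented members of the Match object.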

FYI, here's a PowerShell version

function Correct-RomanNumeralsCase {
    param ([string]$inputString)
    # (?i) makes the match case-insensitive; [regex] is case-sensitive by default,
    # so without it mixed-case numerals such as "Ii" would not match.
    $pattern = '(?i)\b[IVXLCDM]+\b'
    # The script block runs once per match and returns the replacement text,
    # so only the matched words are upper-cased.
    return [regex]::Replace($inputString, $pattern, { param($m) $m.Value.ToUpper() })
}

$inputString = "some text Part I, some text Part Ii, some more text IiI, anything DXiV"
$correctedString = Correct-RomanNumeralsCase -inputString $inputString
Write-Host $correctedString