I'm trying to filter invalid characters from an XML file, and have the following test project:
    class Program
    {
        private static Regex _invalidXMLChars = new Regex(@"(?<![\uD800-\uDBFF])[\uDC00-\uDFFF]|[\uD800-\uDBFF](?![\uDC00-\uDFFF])|[\x00-\x08\x0B\x0C\x0E-\x1F\x7F-\x9F\uFEFF\uFFFE\uFFFF]", RegexOptions.Compiled);

        static void Main(string[] args)
        {
            var text = "assdabv";
            Console.WriteLine(_invalidXMLChars.IsMatch(text));
        }
    }
This test project outputs the expected result (True) in .NET Fiddle.
But when I run the same code in my own project, the invalid characters are not found and it outputs "False".
Why does this work in .NET Fiddle but not in my project?
Altering the source XML file is not an option.
Visual Studio is right. None of the characters &, #, x, F or ; are part of your Regex. However, in HTML, &#xF; translates to the C# counterpart \u000f, which is then matched by the Regex range \x0E-\x1F.
Using \u000f in Visual Studio gives a match.
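A minimal sketch of the difference, assuming the test string contained the character reference &#xF; (the sample strings and the names InvalidXmlCharDemo and HasInvalidChars are made up for illustration). The literal reference consists only of ordinary characters, while \u000f is the actual control character that the pattern targets. The last line uses WebUtility.HtmlDecode to imitate what an HTML page does with the reference, and is expected to yield a match as well:

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class InvalidXmlCharDemo
{
    // Same pattern as in the question.
    private static readonly Regex InvalidXmlChars = new Regex(
        @"(?<![\uD800-\uDBFF])[\uDC00-\uDFFF]|[\uD800-\uDBFF](?![\uDC00-\uDFFF])|[\x00-\x08\x0B\x0C\x0E-\x1F\x7F-\x9F\uFEFF\uFFFE\uFFFF]",
        RegexOptions.Compiled);

    // Hypothetical helper: true if the text contains a character the pattern rejects.
    public static bool HasInvalidChars(string text) => InvalidXmlChars.IsMatch(text);

    public static void Main()
    {
        // Assumed sample: the reference typed literally, i.e. the five
        // ordinary characters '&', '#', 'x', 'F', ';'.
        string asReference = "assd&#xF;abv";

        // The same text with the actual control character U+000F.
        string asControlChar = "assd\u000fabv";

        Console.WriteLine(HasInvalidChars(asReference));   // False
        Console.WriteLine(HasInvalidChars(asControlChar)); // True

        // Decoding the reference first, roughly what a browser does
        // when it renders the page.
        Console.WriteLine(HasInvalidChars(WebUtility.HtmlDecode(asReference)));
    }
}
```

If the real input still contains undecoded references like this, the regex can only find them after the text has been decoded, or once an XML parser has expanded them.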