Returning the matching characters with the result with IndexOf

I have a large text file and need to find some data in it that looks like this:

Start1234 …data…End

I need to match on Start followed by four characters (1234 is just an example) and then four spaces, and then read up to End. Using IndexOf to return the data would be fine, but I also need to return the 1234 with the result, and since it is part of the match it won't be included. Any ideas on how I can do this?

There are 3 best solutions below

1
Tim Schmelter

I need to return the 1234 with the result, but it is part of the match so won’t be included

That's wrong. text.IndexOf("1234") returns the index of the first character of the match, so a Substring taken from that index includes 1234:

string text = "Start1234 …data…End";
string find = "1234";
int index = text.IndexOf(find);
string result = index == -1
    ? null
    : text.Substring(index); // 1234 …data…End
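
For the shape described in the question (Start, four characters, four spaces, data, End), the same IndexOf idea can be extended to return the four characters together with the data. This is only a minimal sketch: ExtractBlock is a hypothetical helper name, the tuple return type is my own choice, and it assumes ordinal (case-sensitive) matching and that the four characters may be anything.

```csharp
using System;

// Hypothetical helper: returns the four characters after "Start" and the text
// between the four spaces and the next "End", or null if nothing fits.
static (string Code, string Data)? ExtractBlock(string text)
{
    const string startTag = "Start";
    const string endTag = "End";
    const string gap = "    "; // four spaces
    const int codeLength = 4;

    int searchFrom = 0;
    while (true)
    {
        int start = text.IndexOf(startTag, searchFrom, StringComparison.Ordinal);
        if (start < 0) return null; // no further "Start" in the text

        int codeStart = start + startTag.Length;
        int gapStart = codeStart + codeLength;
        int dataStart = gapStart + gap.Length;

        // The four characters must be followed by exactly the four-space gap.
        if (dataStart <= text.Length
            && string.CompareOrdinal(text, gapStart, gap, 0, gap.Length) == 0)
        {
            int end = text.IndexOf(endTag, dataStart, StringComparison.Ordinal);
            if (end >= 0)
            {
                return (text.Substring(codeStart, codeLength),
                        text.Substring(dataStart, end - dataStart));
            }
        }

        searchFrom = start + 1; // try the next occurrence of "Start"
    }
}

var hit = ExtractBlock("noise Start1234    payloadEnd tail");
Console.WriteLine(hit is null ? "no match" : $"{hit.Value.Code} | {hit.Value.Data}");
```

Returning both pieces from one call avoids having to re-derive the offset of 1234 from the index of the data.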
0
breadswonders

Correct me if I'm wrong: you need to find the first instance of the word "Start" which is followed by four non-whitespace characters, then four spaces. You then need to extract those four characters and then everything after the four spaces.

If I am correct in my understanding, then this function should do what you want:

// Requires: using System.Text.RegularExpressions;
public bool ParseData(string input, out string trailingFour, out string data)
{
    //Regex explanation
    //(Start): matches the word Start
    //(\S{4}): matches any four characters that are not whitespace
    //( {4}): matches four spaces
    //([\s\S]*): matches all remaining characters in the string, including newlines
    Regex rx = new Regex(@"(Start)(\S{4})( {4})([\s\S]*)");
    
    //A successful match has 5 entries in Groups: the whole match (index 0)
    //followed by the 4 capture groups (indexes 1 to 4)
    var match = rx.Match(input);
    trailingFour = match.Success ? match.Groups[2].Value : string.Empty;
    data = match.Success ? match.Groups[4].Value : string.Empty;
    
    return match.Success;
}
0
dr.null

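Use RegEx instead of the string methods for this problem. Form a pattern that captures text which:

  • Starts with the literal word Start.
  • Is followed by any four characters (.{4}), grouped to capture the value.
  • Is followed by four whitespace characters (\s{4}). Note that \s also matches tabs and line breaks; use a literal space with {4} to accept spaces only.
  • Is followed by some text (.*?), matched lazily and grouped to capture the value, and
  • Ends with the literal word End.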

Put it together:

Start(.{4})\s{4}(.*?)End

Example

var input = "Start1234 …data…End Start3453    sdfsdfsdfsEnd\nStartSLDE    some data.End";
var pattern = @"Start(.{4})\s{4}(.*?)End";

foreach (Match m in Regex.Matches(input, pattern))
    Console.WriteLine($"{m.Value}, 1st Group: {m.Groups[1].Value}, 2nd Group: {m.Groups[2].Value}");

This returns only two matches, because the first candidate (Start1234 …data…End) has a single space after 1234 rather than four:

Start3453    sdfsdfsdfsEnd, 1st Group: 3453, 2nd Group: sdfsdfsdfs
StartSLDE    some data.End, 1st Group: SLDE, 2nd Group: some data.
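
One thing to watch with a large file: by default `.` does not match a line break in .NET, so (.*?) will not capture data that spans several lines. The sketch below is one possible variant, not part of the pattern above: it assumes the data may contain newlines, swaps in RegexOptions.Singleline, uses literal spaces instead of \s, and introduces the group names code and data purely for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Assumed variant of the pattern: named groups, non-whitespace code, literal spaces.
var pattern = @"Start(?<code>\S{4}) {4}(?<data>.*?)End";
var input = "StartAB12    line one\nline twoEnd StartZZ99    tailEnd";

// Singleline lets '.' match '\n', so the first block is captured across the line break.
foreach (Match m in Regex.Matches(input, pattern, RegexOptions.Singleline))
{
    Console.WriteLine($"code={m.Groups["code"].Value}, data={m.Groups["data"].Value}");
}
```

Without RegexOptions.Singleline the first block in this sample input is skipped, because no End appears before the line break.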

Call the Regex.Replace method if you need to replace the matches with something else and return a new string:

var replaced = Regex.Replace(input, pattern, string.Empty);
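
If the replacement should keep the captured values rather than drop the whole match, the groups can be referenced in the replacement string as $1 and $2. A small sketch, where the sample input and the <code:data> output format are made up for illustration:

```csharp
using System;
using System.Text.RegularExpressions;

var pattern = @"Start(.{4})\s{4}(.*?)End";
var input = "before Start3453    sdfsdfsdfsEnd after";

// $1 is the four-character group, $2 is the data group.
var rewritten = Regex.Replace(input, pattern, "<$1:$2>");
Console.WriteLine(rewritten);
```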
