Java substring of multi-line String up to nth line and character index on that line


In Java, given a multi-line String I want to get the substring from the beginning up to the nth line and the character index on that line. (Both line and character indexes are zero-based.)

For example, if we were to implement a method like this:

  /**
   * Returns the substring of the given string up to the given character index on the given line index.
   *
   * @param text      input text
   * @param line      line index
   * @param character character index
   * @return substring
   */
  public static String substring(String text, int line, int character);

Then, consider the following multi-line String:

hello
world
how
are
you?

For given inputs, the above method should return

  • substring(text, 0, 2);

he

  • substring(text, 1, 3);

hello
wor

  • substring(text, 3, 0);

hello
world
how

I've considered a couple of approaches:

  1. Construct the substring, by operating on the String up to the nth line:
    One approach is to use string.lines(), and build the substring. Something like this:
    UPDATE: updated with an improved & neater implementation, based upon Eritrean's answer:
  public static String buildSubstring(String text, int line, int character) {
    long textLines = text.lines().limit(line + 1).count();
    if (line >= textLines) {
      return text;
    } else {
      String[] rows = text.lines().toArray(String[]::new);
      return IntStream.range(0, line + 1)
          .mapToObj(i -> {
            String lineText = rows[i];
            return i == line ? lineText.substring(0, Math.min(character, lineText.length())) : lineText;
          })
          .collect(Collectors.joining(System.lineSeparator()));
    }
  }

However, my main concern would be the performance impact of excessive String creation.

  2. Get the substring up to the character index in the original String:
    A more intuitive approach might be to use string.substring(0, x), where x is the character index - in the original multi-line String - for the nth line and the position in that line.
    However, I don't have a clear idea as to what might be the best approach of finding that character index in the original String.
    One approach could be to iteratively use string.indexOf(System.lineSeparator(),lineIndex) to identify the location of the line in the original String, and add the character index on that line. Something like this:
  public static String indexSubstring(String text, int line, int character) {
    String separator = System.lineSeparator();
    int separatorLength = separator.length();

    int lineIndex = 0;
    if (line > 0) {
      lineIndex = text.indexOf(separator) + separatorLength;
      for (int i = 1; i < line; i++) {
        lineIndex = text.indexOf(separator, lineIndex) + separatorLength;
      }
    }
    return text.substring(0, lineIndex + character);
  }

However, this will not handle cases where the line separators in the text differ from System.lineSeparator(), which is the case in my situation: the original text could come from a Unix or Windows environment, and this functionality might be executed on either, so they need to be interoperable.
Of course, one could do a string.replaceAll("\\r?\\n", System.lineSeparator()), but that's going to do even more String creation than the first approach using string.lines().
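To make the mismatch concrete, here is a small sketch of the same indexOf loop with the separator passed in as a parameter, so the failure can be reproduced on any platform. The class name SeparatorMismatchDemo and the method indexSubstringWith are hypothetical, introduced only for this illustration.

```java
final class SeparatorMismatchDemo {
    // Same logic as indexSubstring above, but the separator is a parameter
    // (an assumption for illustration) instead of System.lineSeparator().
    static String indexSubstringWith(String text, String separator, int line, int character) {
        int lineIndex = 0;
        for (int i = 0; i < line; i++) {
            // indexOf returns -1 when the separator does not occur in the text,
            // so lineIndex silently becomes separator.length() - 1.
            lineIndex = text.indexOf(separator, lineIndex) + separator.length();
        }
        return text.substring(0, lineIndex + character);
    }
}
```

For the text "hello\nworld\nhow" and inputs (1, 3), passing "\n" returns "hello\nwor" as expected, while passing "\r\n" makes indexOf return -1 and the method quietly returns "hell".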

Note: For the purposes of this question, I'm not dealing with error cases - for example, that either of the line/character indexes are beyond the length of the original String, or the character index is beyond the length of the line. Those will be factored in later, once I've decided upon the underlying approach; or, for simplicity, we can assume that it will return everything on the line or in the input text.

Questions:

  1. How can one get the character position in a multi-line String for the nth line and the character index on that line?
    i.e. for use in string.substring(0, x).
  2. Is there a better approach than either of those I've set out above to get the substring?

1
Konstanius EU On

Using existing JDK classes and methods will usually get you further: they are well tested, and Scanner.nextLine() recognizes \n, \r\n and \r line endings regardless of the platform the code runs on.

import java.util.Scanner;

public class Main {
    public static void main(String[] args) {
        String text = """
                hello
                world
                how
                are
                you?""";
        System.out.println(substring(text, 0, 2)); // he
        System.out.println(substring(text, 1, 3)); // hello\nwor
        System.out.println(substring(text, 3, 0)); // hello\nworld\nhow\n
        try {
            System.out.println(substring(text, 6, 0)); // Line index out of bounds
        } catch (IndexOutOfBoundsException e) {
            System.out.println(e.getMessage());
        }
        try {
            System.out.println(substring(text, 0, 6)); // Range [0, 6) out of bounds for length 5
        } catch (IndexOutOfBoundsException e) {
            System.out.println(e.getMessage());
        }
    }

    /**
     * Returns the substring of the given string up to the given character index on the given line index.
     *
     * @param text      input text
     * @param line      line index (starting at 0 for the first line)
     * @param character character index (starting at 0 for the first character)
     * @return substring
     */
    public static String substring(String text, int line, int character) throws IndexOutOfBoundsException {
        Scanner scanner = new Scanner(text);
        int lineCount = 0;
        StringBuilder sb = new StringBuilder();
        while (scanner.hasNextLine()) {
            String lineText = scanner.nextLine();
            if (lineCount == line) {
                sb.append(lineText, 0, character);
                break;
            } else {
                sb.append(lineText);
                sb.append(System.lineSeparator());
            }
            lineCount++;
        }
        if (lineCount < line) {
            throw new IndexOutOfBoundsException("Line index out of bounds");
        }

        return sb.toString();
    }
}
1
Eritrean On

Assuming you don't have a huge input, I would just split the input into rows, store them in an array, and use an IntStream to map each row index to the whole row, except for the row whose index equals the line parameter, which is mapped to a substring. Something like:

public static String buildSubstring(String text, int line, int character){
    String[] rows = text.lines().toArray(String[]::new);

    return IntStream.range(0, line + 1)
                    .mapToObj(i -> i == line ? rows[i].substring(0,character) : rows[i])
                    .collect(Collectors.joining(System.lineSeparator()));
}
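
If the input might be large, one possible variation is to stop reading after the requested line instead of materializing every row. The following is only a sketch of that idea, not the code above: the class name LimitedLines is hypothetical, it requires Java 11+ for String.lines(), and it assumes the lenient behavior from the question's note (out-of-range indexes return as much as is available).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

final class LimitedLines {
    static String buildSubstring(String text, int line, int character) {
        // Collect at most line + 1 rows; later rows are never split out.
        List<String> rows = text.lines()
                .limit(line + 1L)
                .collect(Collectors.toCollection(ArrayList::new));
        if (rows.size() <= line) {
            return text; // assumption: fewer lines than requested returns the whole input
        }
        // Trim only the last collected row, clamping to its length.
        String last = rows.get(line);
        rows.set(line, last.substring(0, Math.min(character, last.length())));
        // Like the stream version above, this rejoins with the platform separator.
        return String.join(System.lineSeparator(), rows);
    }
}
```

Note that, as with the array-based version, the result uses System.lineSeparator() between rows, so the line breaks of the output may differ from those of the input.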
0
DONGMO BERNARD GERAUD On

I think this solution will work for almost all Java versions.

public static String indexSubstring(String text, int line, int character) {
    String result = "";
    try {
        String[] lines = text.split("\n");
        for (int i = 0; i < line; i++) {
            result += lines[i] + "\n";
        }
        result += lines[line].substring(0, character);
        return result;
    } catch (Exception e) {
        e.printStackTrace();
    }
    return result;
}

I tested it with Java 15 and it works for all multi-line strings of the form """your multi-line string here""".

0
tevemadar On

If you are okay with getting back the substring with the original line breaks (*), you can loop over the characters, and do a single actual substring() call at the very end only:

public static void main(String[] args) {
    String n = "hello\nworld\nhow\nare\nyou?";
    String r = "hello\rworld\rhow\rare\ryou?";
    String rn = "hello\r\nworld\r\nhow\r\nare\r\nyou?";
    System.out.println(substring(n, 0, 2));
    System.out.println(substring(r, 0, 2));
    System.out.println(substring(rn, 0, 2));
    System.out.println(substring(n, 1, 3));
    System.out.println(substring(r, 1, 3));
    System.out.println(substring(rn, 1, 3));
    System.out.println(substring(n, 3, 0));
    System.out.println(substring(r, 3, 0));
    System.out.println(substring(rn, 3, 0));
}

public static String substring(String text, int line, int character) {
    int pos = 0;
    char sep = 0;
    while (line > 0) {
        char c = text.charAt(pos++);
        if (c == '\n' || c == '\r') {
            if (sep == 0)
                sep = c;
            if (c == sep)
                line--;
        }
    }
    char c = text.charAt(pos);
    if (c != sep && (c == '\n' || c == '\r'))
        pos++;
    return text.substring(0, pos + character);
}

The assumption is that line breaks are consistent in the string, so encountering the first actual line break character means all the other ones will look the same, and the other character is either not used, or can be ignored (but still needs a bit of special handling after the loop).

The code actually works here: https://ideone.com/AWLBuD. However, (*) applies: IdeOne succeeds with the conversion most of the time, but when substring(x, 3, 0) produces a string with an "original" line break at the end and it is then println()-ed (adding a "native" line break), the result is 2 or 1 total line breaks printed, depending on whether the "original" line break matches the "native" one. I think it may happen in actual consoles too.
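
If the consistency assumption cannot be guaranteed, a variation on the same character loop can treat "\r\n", "\n" and "\r" each as a single line break. This is a minimal sketch under that assumption, not the code above: the class name LineOffsets and the helper offsetOf are hypothetical, and the clamping of out-of-range indexes follows the lenient behavior described in the question's note.

```java
final class LineOffsets {
    // Returns the offset in text of (line, character); both are zero-based.
    static int offsetOf(String text, int line, int character) {
        int n = text.length();
        int pos = 0;
        // Skip `line` line breaks, counting CRLF as one break.
        while (line > 0 && pos < n) {
            char c = text.charAt(pos++);
            if (c == '\r') {
                if (pos < n && text.charAt(pos) == '\n') {
                    pos++; // consume the LF of a CRLF pair
                }
                line--;
            } else if (c == '\n') {
                line--;
            }
        }
        if (line > 0) {
            return n; // assumption: fewer lines than requested means "whole text"
        }
        // Advance up to `character` chars, but stop at the end of the line.
        int end = pos;
        while (end < n && end - pos < character
                && text.charAt(end) != '\n' && text.charAt(end) != '\r') {
            end++;
        }
        return end;
    }

    // Still a single substring() call, so the original line breaks are preserved.
    static String substring(String text, int line, int character) {
        return text.substring(0, offsetOf(text, line, character));
    }
}
```

For the mixed input "hello\r\nworld\nhow\rare\nyou?", the inputs (1, 3) give "hello\r\nwor" and (3, 0) give "hello\r\nworld\nhow\r".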

0
Reilas On

"... How can one get the character position in a multi-line String for the nth line and the character index on that line?
i.e. for use in string.substring(0, x). ..."

Count the new-line delimiters with a regular expression pattern.

Here is an example.

String substring(String text, int line, int character) {
    Pattern p = Pattern.compile("\\R");
    Matcher m = p.matcher(text);
    int o = 0;
    while (line-- > 0 && m.find()) o = m.end();
    return text.substring(0, o + character);
}
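
As a possible extension of the same idea, the Matcher can also be used to find where the target line ends, so that a character index past the end of the line is clamped rather than spilling into the next line. This is only a sketch: the class name RegexLineSubstring is hypothetical, the clamping follows the lenient behavior mentioned in the question's note, and non-negative indexes are assumed. Keep in mind that \R also matches a few less common breaks (such as form feed and U+2028) in addition to \n, \r and \r\n.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RegexLineSubstring {
    // \R matches any Unicode linebreak sequence, with "\r\n" matched as one unit.
    private static final Pattern LINE_BREAK = Pattern.compile("\\R");

    static String substring(String text, int line, int character) {
        Matcher m = LINE_BREAK.matcher(text);
        int start = 0;
        for (int i = 0; i < line; i++) {
            if (!m.find()) {
                return text; // assumption: fewer lines than requested returns everything
            }
            start = m.end(); // the target line begins right after this break
        }
        // The next break (if any) marks the end of the target line.
        int lineEnd = m.find() ? m.start() : text.length();
        return text.substring(0, Math.min(start + character, lineEnd));
    }
}
```

Compiling the pattern once in a static field avoids recompiling it on every call.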