In Java, given a multi-line String I want to get the substring from the beginning up to the nth line and the character index on that line. (Both line and character indexes are zero-based.)
For example, if we were to implement a method like this:
/**
 * Returns the substring of the given string up to the given character index on the given line index.
 *
 * @param text input text
 * @param line line index
 * @param character character index
 * @return substring
 */
public static String substring(String text, int line, int character);
Then, consider the following multi-line String:
hello
world
how
are
you?
For given inputs, the above method should return
- substring(text, 0, 2);
he
- substring(text, 1, 3);
hello
wor
- substring(text, 3, 0);
hello
world
how
I've considered a couple of approaches:
- Construct the substring by operating on the String up to the nth line:
One approach is to use string.lines() and build the substring. Something like this:
UPDATE: updated with an improved & neater implementation, based upon Eritrean's answer:
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public static String buildSubstring(String text, int line, int character) {
    // Number of lines available, capped at line + 1.
    long textLines = text.lines().limit(line + 1).count();
    if (line >= textLines) {
        // The text has no line with this index: return everything.
        return text;
    } else {
        String[] rows = text.lines().toArray(String[]::new);
        return IntStream.range(0, line + 1)
                .mapToObj(i -> {
                    String lineText = rows[i];
                    // Only the last requested line is truncated.
                    return i == line ? lineText.substring(0, Math.min(character, lineText.length())) : lineText;
                })
                .collect(Collectors.joining(System.lineSeparator()));
    }
}
However, my main concern would be the performance impact of excessive String creation.
- Get the substring up to the character index in the original String:
A more intuitive approach might be to use string.substring(0, x), where x is the character index, in the original multi-line String, of the given position on the nth line.
However, I don't have a clear idea as to what might be the best approach of finding that character index in the original String.
One approach could be to iteratively use string.indexOf(System.lineSeparator(), lineIndex) to identify the location of the line in the original String, and add the character index on that line. Something like this:
public static String indexSubstring(String text, int line, int character) {
    String separator = System.lineSeparator();
    int separatorLength = separator.length();
    int lineIndex = 0;
    if (line > 0) {
        lineIndex = text.indexOf(separator) + separatorLength;
        for (int i = 1; i < line; i++) {
            lineIndex = text.indexOf(separator, lineIndex) + separatorLength;
        }
    }
    return text.substring(0, lineIndex + character);
}
However, this does not work when the line separators in the text differ from System.lineSeparator(), which is the case in my situation: the original text could come from a Unix or Windows environment, this functionality might be executed on either, and the two need to be interoperable.
Of course, one could do a string.replaceAll("\\r?\\n", System.lineSeparator()), but that would create even more Strings than the first approach using string.lines().
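One way to sidestep the separator mismatch without normalizing the text might be to search for '\n' alone, since both Unix (\n) and Windows (\r\n) line breaks end with that character. The following is only a minimal sketch under that assumption: it does not treat a lone \r as a line break, and the class name LineIndexSubstring and the helper lineStart are illustrative names, not something from the question. For simplicity it returns the whole text when the line index is out of range, and everything up to the end of the line when the character index is.

```java
class LineIndexSubstring {

    /**
     * Returns the offset at which the given zero-based line starts,
     * or -1 if the text has fewer lines than that.
     * Assumption: every line break ends with '\n' (covers \n and \r\n).
     */
    static int lineStart(String text, int line) {
        int start = 0;
        for (int i = 0; i < line; i++) {
            int newline = text.indexOf('\n', start);
            if (newline < 0) {
                return -1;
            }
            start = newline + 1;
        }
        return start;
    }

    static String substring(String text, int line, int character) {
        int start = lineStart(text, line);
        if (start < 0) {
            return text; // no such line: return everything
        }
        // Find where the content of this line ends, excluding \n or \r\n.
        int end = text.indexOf('\n', start);
        if (end < 0) {
            end = text.length();
        } else if (end > start && text.charAt(end - 1) == '\r') {
            end--;
        }
        // Clamp the character index to the end of the line.
        return text.substring(0, Math.min(start + character, end));
    }

    public static void main(String[] args) {
        String unix = "hello\nworld\nhow\nare\nyou?";
        String windows = "hello\r\nworld\r\nhow\r\nare\r\nyou?";
        System.out.println(substring(unix, 0, 2));    // prints "he"
        System.out.println(substring(unix, 1, 3));
        System.out.println(substring(windows, 1, 3));
    }
}
```

Only one new String is created (the final substring), and the original line separators are preserved in the result rather than being replaced by System.lineSeparator().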
Note: For the purposes of this question, I'm not dealing with error cases - for example, that either of the line/character indexes are beyond the length of the original String, or the character index is beyond the length of the line. Those will be factored in later, once I've decided upon the underlying approach; or, for simplicity, we can assume that it will return everything on the line or in the input text.
Questions:
- How can one get the character position in a multi-line String for the nth line and the character index on that line? i.e. for use in string.substring(0, x).
- Is there a better approach than either of those I've set out above to get the substring?
Using existing JDK classes and methods will usually get you further: they tend to be more efficient and get you to your result with more precision than hand-written code.
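As one illustration of leaning on JDK classes, the linebreak matcher \R in java.util.regex.Pattern (available since Java 8) could be used to locate line boundaries regardless of platform. This is a hedged sketch rather than a definitive implementation: the class name LinebreakMatcherSubstring is made up for the example, and note that \R also matches a lone \r and a few other Unicode line terminators, which is broader than what String.lines() recognizes. Out-of-range indexes are clamped in the same simplified way the question allows.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LinebreakMatcherSubstring {

    // \R matches \r\n as a single unit, as well as \n, \r and other
    // Unicode line terminators.
    private static final Pattern LINE_BREAK = Pattern.compile("\\R");

    static String substring(String text, int line, int character) {
        Matcher matcher = LINE_BREAK.matcher(text);
        int start = 0;
        // Skip over the first `line` line breaks to find the line start.
        for (int i = 0; i < line; i++) {
            if (!matcher.find()) {
                return text; // no such line: return everything
            }
            start = matcher.end();
        }
        // The next line break (if any) marks the end of the requested line.
        int end = matcher.find() ? matcher.start() : text.length();
        // Clamp the character index to the end of the line.
        return text.substring(0, Math.min(start + character, end));
    }

    public static void main(String[] args) {
        String mixed = "hello\r\nworld\nhow\rare\nyou?";
        System.out.println(substring(mixed, 0, 2)); // prints "he"
        System.out.println(substring(mixed, 1, 3));
    }
}
```

The text is scanned only up to the end of the requested line, and the only String allocated is the returned substring, which addresses the concern about excessive String creation.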