Natural Sorting in Java


Just a heads up: This is maybe not as much of a 'help me program' question, but more of an 'is this a bug or just not clearly defined behaviour'.

Let's take the following list of strings:

item01mark2, item10mark2, item10mark03, item1mark3

Intuitively it should (of course) sort to:

item01mark2, item1mark3, item10mark2, item10mark03

Using Java 17's standard 'natural' ordering, 'Comparator.naturalOrder()', this list sorts to:

item01mark2, item10mark03, item10mark2, item1mark3
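
A minimal reproduction of that result, as a sketch. The class and method names (NaturalOrderDemo, sortWithNaturalOrder) are made up for illustration; only Comparator.naturalOrder() and List.sort come from the JDK.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class NaturalOrderDemo {

    // Hypothetical helper: returns a sorted copy using String's natural order.
    static List<String> sortWithNaturalOrder(List<String> input) {
        List<String> copy = new ArrayList<>(input);
        copy.sort(Comparator.naturalOrder());
        return copy;
    }

    public static void main(String[] args) {
        List<String> items = List.of(
                "item01mark2", "item10mark2", "item10mark03", "item1mark3");
        // prints [item01mark2, item10mark03, item10mark2, item1mark3]
        System.out.println(sortWithNaturalOrder(items));
    }
}
```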

After some quick checks, other natural sorting libraries (e.g. Python 'natsort') agree with the intuitive sorting. Would you consider this a bug (worth reporting), or is this just due to the vagueness of the definition of natural string sorting?

Update: Okay, so basically this is just a terminology thing. I expected something called 'naturalOrder' to be an implementation of natural sorting, which it just isn't. Case closed.

Answer by Alwin Joseph (best answer)

Java's Comparator.naturalOrder() simply delegates to String.compareTo, which compares strings lexicographically, character by character. Because '1' comes before '2', the string "10" sorts before "2", regardless of the numeric values the digits spell out. This is how strings are typically sorted as plain text.

On the other hand, libraries like Python's natsort use a more human-friendly interpretation of natural sorting. They attempt to recognize and compare numbers within the strings as actual numeric values, which leads to a different sorting result that matches human expectations more closely.

Whether this is a bug or not largely depends on the specific use case and the expectations of the users. If the goal is to sort strings in a way that matches human intuition, then the behavior of natsort may be preferred. However, if you require strictly lexicographic sorting, Java's Comparator.naturalOrder() behavior would be more appropriate.

Answer by NoDataFound

If you want such a comparator, you need a dedicated library such as https://github.com/gpanther/java-nat-sort (I simply searched DuckDuckGo for that one; I have used another in the past, but I can't find it again).

The main idea of this kind of Comparator is to:

  • split each string into two types of token: a token is either a run of numeric characters or a run of any other characters
  • compare tokens pairwise using the appropriate ordering (e.g. lexical order for string tokens, numeric order for numeric tokens)
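
As a rough sketch of that idea (not the code of java-nat-sort or of natsort), the comparator below walks both strings in parallel instead of building token lists. All names in it (NaturalSortSketch, compareNatural, and the helpers) are hypothetical, and it assumes only ASCII digits count as numbers, with no signs or decimals.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class NaturalSortSketch {

    // Hypothetical comparator: digit runs compare by numeric value, other characters by code unit.
    static final Comparator<String> NATURAL = NaturalSortSketch::compareNatural;

    static int compareNatural(String a, String b) {
        int i = 0, j = 0;
        while (i < a.length() && j < b.length()) {
            char ca = a.charAt(i), cb = b.charAt(j);
            if (isDigit(ca) && isDigit(cb)) {
                // Both sides start a numeric token: compare the whole runs as numbers.
                int endA = digitRunEnd(a, i), endB = digitRunEnd(b, j);
                int cmp = compareDigitRuns(a, i, endA, b, j, endB);
                if (cmp != 0) return cmp;
                i = endA;
                j = endB;
            } else {
                // Otherwise fall back to plain character comparison.
                if (ca != cb) return Character.compare(ca, cb);
                i++;
                j++;
            }
        }
        // One string ran out first: the one with nothing left sorts first.
        int cmp = Integer.compare(a.length() - i, b.length() - j);
        // Tie-break so that "item01" and "item1" are ordered but never reported as equal.
        return cmp != 0 ? cmp : a.compareTo(b);
    }

    private static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    private static int digitRunEnd(String s, int start) {
        int end = start;
        while (end < s.length() && isDigit(s.charAt(end))) end++;
        return end;
    }

    // Compares two digit runs numerically without parsing, so long runs cannot overflow.
    private static int compareDigitRuns(String a, int startA, int endA,
                                        String b, int startB, int endB) {
        while (startA < endA && a.charAt(startA) == '0') startA++; // skip leading zeros
        while (startB < endB && b.charAt(startB) == '0') startB++;
        int lenA = endA - startA, lenB = endB - startB;
        if (lenA != lenB) return Integer.compare(lenA, lenB); // more digits means larger
        for (int k = 0; k < lenA; k++) {
            int cmp = Character.compare(a.charAt(startA + k), b.charAt(startB + k));
            if (cmp != 0) return cmp;
        }
        return 0;
    }

    // Hypothetical helper: returns a sorted copy using the sketch comparator.
    static List<String> sortNaturally(List<String> input) {
        List<String> copy = new ArrayList<>(input);
        copy.sort(NATURAL);
        return copy;
    }

    public static void main(String[] args) {
        // prints [item01mark2, item1mark3, item10mark2, item10mark03]
        System.out.println(sortNaturally(List.of(
                "item01mark2", "item10mark2", "item10mark03", "item1mark3")));
    }
}
```

Numerically equal runs such as "01" and "1" are treated as ties and the comparison moves on to the next token, which is why item01mark2 ends up before item1mark3 here. Real libraries may make different choices for leading zeros, case, or locale.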

In Java, "natural order" mostly means turning a Comparable into a Comparator: as in most other languages, the natural comparison of Strings uses the lexical order of their characters.
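
To illustrate, here is a small sketch in which the same Comparator.naturalOrder() call sorts Integers numerically and Strings lexically, because it only forwards to each type's own compareTo. The class and method names (NaturalOrderIsCompareTo, sorted) are invented for the example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class NaturalOrderIsCompareTo {

    // Hypothetical helper: sorts a copy using whatever compareTo the element type defines.
    static <T extends Comparable<? super T>> List<T> sorted(List<T> input) {
        List<T> copy = new ArrayList<>(input);
        copy.sort(Comparator.naturalOrder());
        return copy;
    }

    public static void main(String[] args) {
        // Integer.compareTo is numeric: prints [1, 2, 10]
        System.out.println(sorted(List.of(10, 2, 1)));
        // String.compareTo is lexicographic: prints [1, 10, 2]
        System.out.println(sorted(List.of("10", "2", "1")));
    }
}
```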