At JCrete, I learned that the sort methods in the Java standard library can throw an exception with the curious message “Comparison method violates its general contract!”. Of course I had to dig into the gory details.
The algorithm that is used by Arrays.sort
and Collections.sort
is called TimSort, after his his creator Tim Peters. The Java implementation can throw an exception with the message “Comparison method violates its general contract!”. I've never run into it, but an internet search reveals that an innumerable number of hapless coders have. If your comparison function doesn't produce a total order, the algorithm can sometimes detect it at minimal cost, and then it alerts the hapless coders with that exception.
This presentation by Stuart Marks explains (at the very end) in more detail when this detection happens.
Since I don't normally write bad comparators, I had a heck of a time constructing an example where the exception is reliably triggered. I finally produced one, inspired by this report:
import java.util.*; public class Test { public static int count = 0; public static void main(String[] args) { Integer[] values = { 1940, 1893, 3363, 62, 2049, 1253, 4377, 4766, 2984, 1672, 3578, 4227, 3509, 3708, 3381, 290, 2841, 1679, 233, 2072, 1439, 1254, 228, 3311, 874, 3187, 2011, 3106, 2846, 2072, 7, 1557, 961, 4189, 1200, 1197, 4735, 1761, 1796, 1786, 2803, 2543, 1404, 4833, 2091, 3169, 2212, 3254, 2358, 1711 }; Arrays.sort(values, (a, b) -> { count++; return count % 42 == 21 ? -1 : 1; }); System.out.println(Arrays.toString(values)); } }
Run it and weep.
Clearly this comparison function is pathological. It doesn't even look at the values to be compared, but always claims that b
comes after a
, except every 42nd call, when it claims the opposite.
If you change any of the values, the program is likely to complete without an exception, of course with a useless result.
Next, I ran this example from a diligent coder:
import java.sql.Timestamp; import java.util.ArrayList; import java.util.Collections; import java.util.Date; import java.util.List; public class Test { // the same test data with all Dates, all Timestamps, all Strings or all Longs does NOT fail. // only fails with mixed Timestamp and Date objects public static void main(String[] args) { List<Date> dates = new ArrayList<>(); dates.add(new Timestamp(1498621254602L)); dates.add(new Timestamp(1498621254603L)); dates.add(new Timestamp(1498621254603L)); dates.add(new Timestamp(1498621254604L)); dates.add(new Timestamp(1498621254604L)); dates.add(new Timestamp(1498621254605L)); dates.add(new Timestamp(1498621254605L)); dates.add(new Timestamp(1498621254605L)); dates.add(new Timestamp(1498621254605L)); dates.add(new Timestamp(1498621254606L)); dates.add(new Timestamp(1498621254607L)); dates.add(new Date(1498621254605L)); dates.add(new Timestamp(1498621254607L)); dates.add(new Timestamp(1498621254609L)); dates.add(new Date(1498621254603L)); dates.add(new Date(1498621254604L)); dates.add(new Date(1498621254605L)); dates.add(new Date(1498621254605L)); dates.add(new Date(1498621254607L)); dates.add(new Timestamp(1498621254607L)); dates.add(new Date(1498621254608L)); dates.add(new Timestamp(1498621254608L)); dates.add(new Date(1498621254611L)); dates.add(new Timestamp(1498621254612L)); dates.add(new Timestamp(1498621254613L)); dates.add(new Date(1498621254607L)); dates.add(new Timestamp(1498621254607L)); dates.add(new Timestamp(1498621254608L)); dates.add(new Timestamp(1498621254609L)); dates.add(new Timestamp(1498621254611L)); dates.add(new Date(1498621254603L)); dates.add(new Date(1498621254606L)); for (int i = 0; i < 200; i++) { System.out.println(dates); Collections.sort(dates); Collections.shuffle(dates); } } }
Weirdly, this fails with the “Comparison method violates its general contract!” exception in Java 8 but it works fine in Java 17.
Marc Hoffmann found the Git commit that is responsible for the change.
It turns out that even though I myself never write bad comparators, the JDK had one.
Recall that the java.util.Date
class represents a point in time with a precision of a millisecond. The java.sql.Timestamp
is a subclass of Date
with nanosecond precision. The nanoseconds are stored in the subclass, and everything up to the seconds and zero millseconds in the superclass. It seemed like a good idea at the time.
Up to JDK 8, Date.compareTo(Date other)
compared the number of milliseconds from the epoch in this
and other
, using its internal state. But that state is not accurate with Timestamp
instances. When other
has milliseconds, they were ignored. This was fixed in Java 9, by calling getTime
instead of relying on the internal state whenever other
is an instance of a subclass of Date
.
Date d = new Date(1000L); // One (!) second after the epoch, i.e. 1970-01-01 00:00:01 UTC Timestamp t = new Timestamp(1050L); // 50 milliseconds later System.out.println(d.compareTo(t)); // 0 up to Java 8, -1 since Java 9 System.out.println(t.compareTo(d)); // 1 in all versions of Java
Up to Java 8, the compareTo
method is not symmetric, which is a violation of the general contract. As long as you use Java 9 or above, you don't have to worry about this.
What about those who are stuck with Java 8? If you run with -Djava.util.Arrays.useLegacyMergeSort=true
, then Timsort won't be used, and there won't be a “general contract” exception. The array may not be properly sorted, but who cares as long as the program doesn't crash, right?
With fewer than 32 elements, Timsort switches to insertion sort and never throws a “general contract” exception.
Here is a program with a completely random comparison method, sorting a list of 1000 objects. The sort has an 11% chance of throwing a “general contract” exception.
import java.util.*; import java.util.stream.*; record DrunkBox(int index) implements Comparable<DrunkBox> { private static Random random = new Random(); public int compareTo(DrunkBox other) { return random.nextInt(-10, 10); } } public class Test { public static void main(String[] args) { int count = 0; final int NELEM = 1000; final int NTRIES = 10000; for (int i = 0; i < NTRIES; i++) { try { var boxes = new ArrayList<DrunkBox>(IntStream.range(0, NELEM).mapToObj(DrunkBox::new).toList()); Collections.sort(boxes); count++; } catch (Exception ex) { } } System.out.println(count); } }
With 32 elements, the probability of failure is about 0.5%. Still, that's plenty if you repeat it often enough.
Here is another tale of woe. Swing programmers complain that tab navigation in their apps became unusable when Timsort was introduced. Instead of moving to the next input element with focus, their app threw a “general contract” exception.
The culprit was yet another bad comparator in the JDK. This time, javax.swing.LayoutComparator.java
. Here is the relevant code:
int ax = a.getX(), ay = a.getY(), bx = b.getX(), by = b.getY(); int zOrder = a.getParent().getComponentZOrder(a) - b.getParent().getComponentZOrder(b); if (horizontal) { if (leftToRight) { // LT - Western Europe (optional for Japanese, Chinese, Korean) if (Math.abs(ay - by) < ROW_TOLERANCE) { return (ax < bx) ? -1 : ((ax > bx) ? 1 : zOrder); } else { return (ay < by) ? -1 : 1; } } else { ... } } else { ... }
The other cases (vertical, right to left) are similar.
What's wrong? There are separate rules for components whose y-coordinates are close to each other (less than 10 pixels apart). That can never work. When ay
and by
are close, and by
and cy
are close, it does not follow that ay
and cy
are also close. It is easy to make a scenario where the comparison is not transitive.
Ok, it's a little bit tedious because LayoutComparator
is a package-private class in the java.desktop
module, and these days the module system blocks reflective access. But as it turns out, the public DefaultFocusManager
class has an instance variable
private final LayoutComparator comparator = new LayoutComparator();
and a method
public boolean compareTabOrder(Component a, Component b) { return (comparator.compare(a, b) < 0); }
that we can use to explore the broken LayoutComparator
. The following failure in JDK 17 shows that it was never fixed.
import javax.swing.*; var dfm = new DefaultFocusManager() var a = new JLabel("Fred"); var b = new JLabel("Fred"); var c = new JLabel("Fred"); a.setLocation(110, 100); b.setLocation(105, 105); c.setLocation(100, 110); var p = new JPanel() p.add(a) p.add(b) p.add(c) dfm.compareTabOrder(a, b) // false, i.e. a ≥ b dfm.compareTabOrder(b, c) // false, i.e. b ≥ c dfm.compareTabOrder(a, c) // true, i.e. a < c
If you are a Swing programmer, this is pretty depressing. You can't control the comparator. Your only remedy is to use the -Djava.util.Arrays.useLegacyMergeSort=true
setting and live with the fact that the sorting will sometimes be wrong. The users who care are the ones who can't use a mouse and rely on keyboard navigation. With few Swing programmers and few such users, this may never get fixed.
What's also depressing is the terrible advice that some people give when they attempt to diagnose such a situation. Here is one hapless user asking for help, and he is told that he should only access Swing components in the event dispatch thread. That's surely true, but it has nothing to do with the stack trace.
A call comparator.compare(a, b)
or a.compareTo(b)
returns a negative integer if a
comes before b
, a positive integer if a
comes after b
, and 0 if a
and b
are indistinguishable.
There are three conditions that you need to ensure:
a.equals(b)
, the comparison yields 0 a
and b
swaps the sign of the resulta
comes before b
and b
comes before c
, then a
comes before c
.When you see something like
if (Math.abs(ay - by) < ROW_TOLERANCE) { return (ax < bx) ? -1 : ((ax > bx) ? 1 : zOrder); } else { return (ay < by) ? -1 : 1; }
you should immediately perk up. “Closer than a tolerance” is never transitive. Now suppose you remove that:
if (ay == by) { return (ax < bx) ? -1 : ((ax > bx) ? 1 : zOrder); } else { return (ay < by) ? -1 : 1; }
That doesn't look antisymmetric. Why is it -1 in one branch and either 1 or zOrder
in the other? You can create arguments where the sign doesn't flip when you flip them.
Generally, lovingly hand-crafted comparators are problematic. There are only a couple of common patterns for making correct comparators. Typically, you compare against a single field, or first against one, and if that's a tie, against another. The Comparator
interface has utility methods comparing
and thenComparing
for these cases. Stuart Marks' video shows these. Use them instead of handcrafting your comparators!
What happens if a.equals(b)
is not the same as a.compareTo(b) == 0
? Check this out:
var a = new BigDecimal("0"); var b = new BigDecimal("0.00"); a.compareTo(b) // 0 a.equals(b) // false (!) var h = new HashSet<BigDecimal>(); var t = new TreeSet<BigDecimal>(); h.add(a); t.add(b); t.equals(h) // true h.equals(t) // false (!) t.add(a), h.add(b); h.size() // 2 t.size() // 1 (!)
There is a puzzler in there somewhere.
It is perfectly ok if equals
lumps together objects with different state. The designers of BigDecimal
could have gone that route and made new BigDecimal("0")
and new BigDecimal("0.00")
equal, just like Set.of(1, 2)
and Set.of(2, 1)
are equal. The only requirement for equals
is that it is an equivalence relation that partitions the values into mutually indistinguishable sets.
For classes that implement Comparable
, it is not a good idea to have an equals
that is pickier than compareTo
.
Comments powered by Talkyard.