# Tidbits #2 from JCrete 2022—TimSort

At JCrete, I learned that the sort methods in the Java standard library can throw an exception with the curious message “Comparison method violates its general contract!”. Of course I had to dig into the gory details.

## Hey Tim, Are You Feeling Out Of Sorts?

The algorithm that is used by `Arrays.sort` and `Collections.sort` is called TimSort, after his his creator Tim Peters. The Java implementation can throw an exception with the message “Comparison method violates its general contract!”. I've never run into it, but an internet search reveals that an innumerable number of hapless coders have. If your comparison function doesn't produce a total order, the algorithm can sometimes detect it at minimal cost, and then it alerts the hapless coders with that exception.

This presentation by Stuart Marks explains (at the very end) in more detail when this detection happens.

Since I don't normally write bad comparators, I had a heck of a time constructing an example where the exception is reliably triggered. I finally produced one, inspired by this report:

```import java.util.*;

public class Test {
public static int count = 0;
public static void main(String[] args) {
Integer[] values = { 1940, 1893, 3363, 62, 2049, 1253, 4377, 4766, 2984,
1672, 3578, 4227, 3509, 3708, 3381, 290, 2841, 1679, 233, 2072,
1439, 1254, 228, 3311, 874, 3187, 2011, 3106, 2846, 2072, 7, 1557,
961, 4189, 1200, 1197, 4735, 1761, 1796, 1786, 2803, 2543, 1404,
4833, 2091, 3169, 2212, 3254, 2358, 1711 };
Arrays.sort(values, (a, b) -> {
count++;
return count % 42 == 21 ? -1 : 1;
});
System.out.println(Arrays.toString(values));
}
}
```

Run it and weep.

Clearly this comparison function is pathological. It doesn't even look at the values to be compared, but always claims that `b` comes after `a`, except every 42nd call, when it claims the opposite.

If you change any of the values, the program is likely to complete without an exception, of course with a useless result.

## Time Stamps

Next, I ran this example from a diligent coder:

```import java.sql.Timestamp;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Date;
import java.util.List;

public class Test {
// the same test data with all Dates, all Timestamps, all Strings or all Longs does NOT fail.
// only fails with mixed Timestamp and Date objects
public static void main(String[] args) {
List<Date> dates = new ArrayList<>();

for (int i = 0; i < 200; i++) {
System.out.println(dates);
Collections.sort(dates);
Collections.shuffle(dates);
}
}
}
```

Weirdly, this fails with the “Comparison method violates its general contract!” exception in Java 8 but it works fine in Java 17.

Marc Hoffmann found the Git commit that is responsible for the change.

It turns out that even though I myself never write bad comparators, the JDK had one.

Recall that the `java.util.Date` class represents a point in time with a precision of a millisecond. The `java.sql.Timestamp` is a subclass of `Date` with nanosecond precision. The nanoseconds are stored in the subclass, and everything up to the seconds and zero millseconds in the superclass. It seemed like a good idea at the time.

Up to JDK 8, `Date.compareTo(Date other)` compared the number of milliseconds from the epoch in `this` and `other`, using its internal state. But that state is not accurate with `Timestamp` instances. When `other` has milliseconds, they were ignored. This was fixed in Java 9, by calling `getTime` instead of relying on the internal state whenever `other` is an instance of a subclass of `Date`.

```Date d = new Date(1000L); // One (!) second after the epoch, i.e. 1970-01-01 00:00:01 UTC
Timestamp t = new Timestamp(1050L); // 50 milliseconds later
System.out.println(d.compareTo(t)); // 0 up to Java 8, -1 since Java 9
System.out.println(t.compareTo(d)); // 1 in all versions of Java
```

Up to Java 8, the `compareTo` method is not symmetric, which is a violation of the general contract. As long as you use Java 9 or above, you don't have to worry about this.

What about those who are stuck with Java 8? If you run with `-Djava.util.Arrays.useLegacyMergeSort=true`, then Timsort won't be used, and there won't be a “general contract” exception. The array may not be properly sorted, but who cares as long as the program doesn't crash, right?

## How Frequent Are These Failures?

With fewer than 32 elements, Timsort switches to insertion sort and never throws a “general contract” exception.

Here is a program with a completely random comparison method, sorting a list of 1000 objects. The sort has an 11% chance of throwing a “general contract” exception.

```import java.util.*;
import java.util.stream.*;

record DrunkBox(int index) implements Comparable<DrunkBox> {
private static Random random = new Random();

public int compareTo(DrunkBox other) {
return random.nextInt(-10, 10);
}
}

public class Test {
public static void main(String[] args) {
int count = 0;
final int NELEM = 1000;
final int NTRIES = 10000;
for (int i = 0; i < NTRIES; i++) {
try {
var boxes = new ArrayList<DrunkBox>(IntStream.range(0, NELEM).mapToObj(DrunkBox::new).toList());
Collections.sort(boxes);
count++;
}
catch (Exception ex) {
}
}
System.out.println(count);
}
}
```

With 32 elements, the probability of failure is about 0.5%. Still, that's plenty if you repeat it often enough.

## They Keep On Coming

Here is another tale of woe. Swing programmers complain that tab navigation in their apps became unusable when Timsort was introduced. Instead of moving to the next input element with focus, their app threw a “general contract” exception.

The culprit was yet another bad comparator in the JDK. This time, `javax.swing.LayoutComparator.java`. Here is the relevant code:

```int ax = a.getX(), ay = a.getY(), bx = b.getX(), by = b.getY();
int zOrder = a.getParent().getComponentZOrder(a) - b.getParent().getComponentZOrder(b);
if (horizontal) {
if (leftToRight) {
// LT - Western Europe (optional for Japanese, Chinese, Korean)
if (Math.abs(ay - by) < ROW_TOLERANCE) {
return (ax < bx) ? -1 : ((ax > bx) ? 1 : zOrder);
} else {
return (ay < by) ? -1 : 1;
}
}
else { ... }
}
else { ... }
```

The other cases (vertical, right to left) are similar.

What's wrong? There are separate rules for components whose y-coordinates are close to each other (less than 10 pixels apart). That can never work. When `ay` and `by` are close, and `by` and `cy` are close, it does not follow that `ay` and `cy` are also close. It is easy to make a scenario where the comparison is not transitive.

Ok, it's a little bit tedious because `LayoutComparator` is a package-private class in the `java.desktop` module, and these days the module system blocks reflective access. But as it turns out, the public `DefaultFocusManager` class has an instance variable

```private final LayoutComparator comparator =
new LayoutComparator();
```

and a method

```public boolean compareTabOrder(Component a, Component b) {
return (comparator.compare(a, b) < 0);
}
```

that we can use to explore the broken `LayoutComparator`. The following failure in JDK 17 shows that it was never fixed.

```import javax.swing.*;
var dfm = new DefaultFocusManager()
var a = new JLabel("Fred");
var b = new JLabel("Fred");
var c = new JLabel("Fred");
a.setLocation(110, 100);
b.setLocation(105, 105);
c.setLocation(100, 110);
var p = new JPanel()
dfm.compareTabOrder(a, b) // false, i.e. a ≥ b
dfm.compareTabOrder(b, c) // false, i.e. b ≥ c
dfm.compareTabOrder(a, c) // true, i.e. a < c
```

If you are a Swing programmer, this is pretty depressing. You can't control the comparator. Your only remedy is to use the `-Djava.util.Arrays.useLegacyMergeSort=true` setting and live with the fact that the sorting will sometimes be wrong. The users who care are the ones who can't use a mouse and rely on keyboard navigation. With few Swing programmers and few such users, this may never get fixed.

What's also depressing is the terrible advice that some people give when they attempt to diagnose such a situation. Here is one hapless user asking for help, and he is told that he should only access Swing components in the event dispatch thread. That's surely true, but it has nothing to do with the stack trace.

## How Not to Write a Comparator

A call `comparator.compare(a, b)` or `a.compareTo(b)` returns a negative integer if `a` comes before `b`, a positive integer if `a` comes after `b`, and 0 if `a` and `b` are indistinguishable.

There are three conditions that you need to ensure:

Reflexivity
If `a.equals(b)`, the comparison yields 0
Antisymmetry
Swapping `a` and `b` swaps the sign of the result
Transitivity
If `a` comes before `b` and `b` comes before `c`, then `a` comes before `c`.

When you see something like

```if (Math.abs(ay - by) < ROW_TOLERANCE) {
return (ax < bx) ? -1 : ((ax > bx) ? 1 : zOrder);
} else {
return (ay < by) ? -1 : 1;
}
```

you should immediately perk up. “Closer than a tolerance” is never transitive. Now suppose you remove that:

```if (ay == by) {
return (ax < bx) ? -1 : ((ax > bx) ? 1 : zOrder);
} else {
return (ay < by) ? -1 : 1;
}
```

That doesn't look antisymmetric. Why is it -1 in one branch and either 1 or `zOrder` in the other? You can create arguments where the sign doesn't flip when you flip them.

Generally, lovingly hand-crafted comparators are problematic. There are only a couple of common patterns for making correct comparators. Typically, you compare against a single field, or first against one, and if that's a tie, against another. The `Comparator` interface has utility methods `comparing` and `thenComparing` for these cases. Stuart Marks' video shows these. Use them instead of handcrafting your comparators!

## A Problem with Indistinguishable Elements

What happens if `a.equals(b)` is not the same as `a.compareTo(b) == 0`? Check this out:

```var a = new BigDecimal("0");
var b = new BigDecimal("0.00");
a.compareTo(b) // 0
a.equals(b) // false (!)
var h = new HashSet<BigDecimal>();
var t = new TreeSet<BigDecimal>();
t.equals(h) // true
h.equals(t) // false (!)
It is perfectly ok if `equals` lumps together objects with different state. The designers of `BigDecimal` could have gone that route and made `new BigDecimal("0")` and `new BigDecimal("0.00")` equal, just like `Set.of(1, 2)` and `Set.of(2, 1)` are equal. The only requirement for `equals` is that it is an equivalence relation that partitions the values into mutually indistinguishable sets.
For classes that implement `Comparable`, it is not a good idea to have an `equals` that is pickier than `compareTo`.