
Project Valhalla promises to flatten objects in the Java virtual machine. Of course, it is much better to have an array of fifty million values than fifty million object references, each with a header and the value. Just before giving a presentation about that at the JFall conference, I saw an article at inside.java that promised such flattening with an array of LocalDate objects. But I could not reproduce it. The culprit? Serialization, of course. Read on for the gory details!
Eleven years in the making, Project Valhalla promises: “Codes like a class, works like an int.” And it is really getting there. You define a value class or value record, such as
public value record Point(double x, double y) {
    public Point moveBy(double dx, double dy) {
        return new Point(x + dx, y + dy);
    }
}
The JVM will try to represent such a point as a 16-byte value instead of a reference to a heap object with a header plus 16 bytes of fields.
Right now, that is not easy to observe. In my conference presentation, I had to use an internal API to create an array that could be flattened:
Point[] path = (Point[]) ValueClass.newNullRestrictedNonAtomicArray(Point.class,
    NPOINTS, new Point(0, 0));
At some point in the future, you'll be able to write something like
var path = new Point![NPOINTS]{ new Point(0, 0) } // Syntax may change
The morning before the presentation, I stumbled upon an article by Dan Smith that made it all look so easy. He makes a set of LocalDate objects, calls toArray, and processes the values. With Valhalla, this trivial benchmark is significantly faster. No internal API required.
But it only worked with the “early access” release posted at https://jdk.java.net/valhalla/. When I built the Valhalla JDK from the Git repo, there was no improvement. I was very confused.
The whole point of my presentation was “Trust, but verify”. So, following my own sage advice, I verified with
jcmd DateTest GC.class_histogram | head
And indeed, with the EA build, we can see flattening:
1: 1 400000016 [Ljava.time.LocalDate; (java.base@26-jep401ea2)
Woohoo—a flat array of 50_000_000 × 8 bytes + a header.
But when building Valhalla from source, nope:
1: 50000003 1200000072 java.time.LocalDate (java.base@26-internal)
2: 1 200000016 [Ljava.time.LocalDate; (java.base@26-internal)
That’s fifty million objects and an array of fifty million references thereto. Plus a header.
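Those histogram numbers add up. Here is a back-of-the-envelope check, assuming a 16-byte object header, 8-byte object alignment, and 4-byte compressed references (typical HotSpot defaults):

```java
// Back-of-the-envelope check of the jcmd histogram numbers.
// Assumes a 16-byte object header, 8-byte object alignment,
// and 4-byte compressed references (typical HotSpot defaults).
public class HistogramMath {
    static long objectBytes(long count) {
        // int + short + short = 8 bytes of fields, so each object
        // takes a 16-byte header plus 8 bytes, aligned as-is
        return count * (16 + 8);
    }
    static long referenceArrayBytes(long length) {
        return 16 + length * 4; // header + one compressed reference per element
    }
    static long flatArrayBytes(long length) {
        return 16 + length * 8; // header + one flattened 8-byte value per element
    }
    public static void main(String[] args) {
        System.out.println(objectBytes(50_000_003L));         // 1200000072
        System.out.println(referenceArrayBytes(50_000_000L)); // 200000016
        System.out.println(flatArrayBytes(50_000_000L));      // 400000016
    }
}
```

The three results match the histogram lines above: 1,200,000,072 bytes of LocalDate objects plus a 200,000,016-byte reference array without flattening, versus a single 400,000,016-byte flat array with it.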
Why the difference? A mystery.
On the way to resolving this mystery, we need to talk about null. Java being Java, any object can potentially be null. Being an object of a value class (or value record) doesn’t change that. So, with a value type Point
Point p = null;
will be perfectly ok.
That puts pressure on flattening values. One needs an extra bit to indicate whether a value is null or full of flattened field goodness. In the example of Point, a potentially null value needs 129 bits. In my presentation, I was able to demonstrate effective flattening with the non-public method newNullRestrictedNonAtomicArray, which promises that there are no null values, because Valhalla currently has no syntax for declaring that restriction. The ! syntax will come later.
With Valhalla, LocalDate is a value type. Flattened to its year/month/day representation. Or null. After all, there is surely some code out there where a null LocalDate indicates that a date was not available.
On current hardware, an implicitly flattenable value needs to fit into eight bytes, including the nullness indicator bit. It seems that LocalDate would not qualify. It has three instance fields:
private final int year;
private final short month;
private final short day;
Altogether 64 bits, or 65 with the nullness bit. That’s why I could not see any optimization when building Valhalla from source.
Why did the early access build do better? It changed LocalDate to
private final int year;
private final byte month;
private final byte day;
And why not? Calendar months and days fit into a byte. And now LocalDate takes 48 bits, or 49 with a nullness bit. (The latter actually takes a byte.) Small enough to flatten.
Sadly, the change from short to byte in LocalDate was soon undone. And why? Serialization. Of course.
Of course? I had a closer look. The LocalDate class doesn’t simply write the field values to an ObjectOutputStream. That would indeed be fragile. Instead, it uses the writeReplace mechanism to serialize a separate object, holding the year as an int and the month and day as bytes. For efficiency. This has been the case ever since Java 8.
So, changing the fields from short to byte should make no difference. That’s what the JDK experts thought too, because in Java 25, they did just that.
Which allowed for the significant performance improvement in the inside.java article.
Why did they undo such an auspicious change? This is pretty subtle. The writeReplace mechanism did exactly what it was supposed to do. It separated the wire format from the internal representation. Serializing java.time.LocalDate objects with Java 8 and deserializing them with Java 25, or the other way around, worked perfectly.
But then, someone serialized the LocalDate.class object, probably deep in the guts of some framework. And that did not work well.

Within weeks after Java 1.0 was released, way back when in 1996, the first edition of Core Java was published. Its code supplement contained an elementary serialization library, a feature that was obviously missing from Java at the time. Serialization isn’t hard; the only tricky point is identifying objects that have been previously encountered. When Java 1.1 came along, with a proper serialization mechanism, I updated the coverage to drop our own and described the official one, including a deep dive into the bytes of the wire format.
The basic idea is simple. An object is serialized by writing out its fields, skipping the ones that are static or transient. Conversely, when reading in an object, those fields are restored from the values in the object stream.
What happens when a class changes after the time that an object was written? The data of a serialized object include the serialVersionUID of the class: a hash of the names and modifiers of the class, interfaces, fields, and methods. If that hash changes, deserialization fails. And that makes sense. If the class has changed in any way, all bets are off as to what those out-of-date field values might mean.
But classes evolve all the time, and wouldn’t it be a shame if a harmless change stood in the way of deserialization? For that reason, the Java designers provide a mechanism for “versioning”.
In order to opt into versioning, a class must declare a static field
private static final long serialVersionUID = ...L;
Then that value, and not the hash, is used to identify the class. The programmer is responsible for changing the serialVersionUID whenever the data representation changes incompatibly, or to keep it the same and do something to take care of any compatibility issues.
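You can watch both behaviors with the ObjectStreamClass API: without a declared serialVersionUID, lookup reports the computed hash; with one, it reports the declared value verbatim. The two demo classes here are made up for illustration:

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

public class UidDemo {
    // No declared serialVersionUID: the JVM computes a hash of the class shape.
    static class Hashed implements Serializable {
        int x;
    }

    // Declared serialVersionUID: that value is used, no matter how the class evolves.
    static class Versioned implements Serializable {
        @java.io.Serial
        private static final long serialVersionUID = 42L;
        int x;
    }

    public static void main(String[] args) {
        // Some implementation-computed hash; it changes if the class shape changes
        System.out.println(ObjectStreamClass.lookup(Hashed.class).getSerialVersionUID());
        // The declared value
        System.out.println(ObjectStreamClass.lookup(Versioned.class).getSerialVersionUID()); // 42
    }
}
```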
As an example, the java.util.Date class provides a serialVersionUID and declares all instance fields as transient. None of them are saved. Instead, the number of milliseconds since the Unix epoch is written:
@java.io.Serial
private void writeObject(ObjectOutputStream s) throws IOException {
    s.defaultWriteObject();
    s.writeLong(getTimeImpl());
}
With this setup, the Date class is free to change the internal representation at will, as long as it always converts it to and from those milliseconds.
What’s with the java.io.Serial annotation? Over time, the rules for serialization became rather byzantine. Annotating all serialization features is good practice and allows for checking that the programmer gets everything right.
An unhappier example is java.math.BigInteger. At one point way back when, a bunch of fields were written that no longer exist. To maintain backwards compatibility, all those field data are still written. The class declares a static serialPersistentFields array with all the “fields” that should be saved, thereby overriding the default mechanism of saving the non-static and non-transient fields.
@java.io.Serial
private static final ObjectStreamField[] serialPersistentFields = {
    new ObjectStreamField("signum", Integer.TYPE),
    new ObjectStreamField("magnitude", byte[].class),
    new ObjectStreamField("bitCount", Integer.TYPE),
    new ObjectStreamField("bitLength", Integer.TYPE),
    new ObjectStreamField("firstNonzeroByteNum", Integer.TYPE),
    new ObjectStreamField("lowestSetBit", Integer.TYPE)
};
Then the writeObject method writes the field data:
@java.io.Serial
private void writeObject(ObjectOutputStream s) throws IOException {
    ObjectOutputStream.PutField fields = s.putFields();
    fields.put("signum", signum);
    fields.put("magnitude", magSerializedForm());
    // The values written for cached fields are compatible with older
    // versions, but are ignored in readObject so don't otherwise matter.
    fields.put("bitCount", -1);
    fields.put("bitLength", -1);
    fields.put("lowestSetBit", -2);
    fields.put("firstNonzeroByteNum", -2);
    s.writeFields();
}
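The same machinery is available to your own classes. Here is a hypothetical sketch in the same spirit as BigInteger: the class still writes a legacy field that no longer exists, for the benefit of old readers. All names (Counter, legacyCount) are made up for illustration:

```java
import java.io.*;

// Hypothetical class that, like BigInteger, still writes a "legacyCount"
// field that no longer exists in the class itself.
class Counter implements Serializable {
    @java.io.Serial
    private static final long serialVersionUID = 1L;

    // The serialized form: the real field plus the long-gone legacy field.
    // This overrides the default of saving non-static, non-transient fields.
    @java.io.Serial
    private static final ObjectStreamField[] serialPersistentFields = {
        new ObjectStreamField("value", Integer.TYPE),
        new ObjectStreamField("legacyCount", Integer.TYPE)
    };

    private int value;
    Counter(int value) { this.value = value; }
    int value() { return value; }

    @java.io.Serial
    private void writeObject(ObjectOutputStream s) throws IOException {
        ObjectOutputStream.PutField fields = s.putFields();
        fields.put("value", value);
        fields.put("legacyCount", -1); // written for old readers, ignored here
        s.writeFields();
    }

    @java.io.Serial
    private void readObject(ObjectInputStream s)
            throws IOException, ClassNotFoundException {
        ObjectInputStream.GetField fields = s.readFields();
        value = fields.get("value", 0);
    }
}

public class PutFieldDemo {
    static Counter roundTrip(Counter c) throws IOException, ClassNotFoundException {
        var bytes = new ByteArrayOutputStream();
        try (var out = new ObjectOutputStream(bytes)) { out.writeObject(c); }
        try (var in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Counter) in.readObject();
        }
    }
    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip(new Counter(7)).value()); // 7
    }
}
```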
The java.time.LocalDate class handles serialization more elegantly. It bypasses the normal serialization with the writeReplace method:
@java.io.Serial
private Object writeReplace() {
    return new Ser(Ser.LOCAL_DATE_TYPE, this);
}
This means that a Ser object is written to the object stream instead of the LocalDate object.
That Ser object writes the year, month, and day as an int and two bytes, even though LocalDate has three instance variables of type int, short, and short.

Also, the Ser object uses the Externalizable interface for slightly more efficient serialization, but that's not important right now.
What is important is that classes can take control over serialization, writing and reading data that are stable and independent of the current implementation.
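You can observe the writeReplace mechanism directly by peeking at the stream bytes: they name the proxy class java.time.Ser, and the string LocalDate appears nowhere, since the original class never enters the stream.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.time.LocalDate;

public class WriteReplaceDemo {
    static String streamText(Object obj) throws IOException {
        var bytes = new ByteArrayOutputStream();
        try (var out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        // ISO-8859-1 maps every byte to a char, so class names stay searchable
        return new String(bytes.toByteArray(), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) throws IOException {
        String raw = streamText(LocalDate.of(2025, 11, 7));
        System.out.println(raw.contains("java.time.Ser")); // true: the proxy is written
        System.out.println(raw.contains("LocalDate"));     // false: the class never appears
    }
}
```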
So, let’s recapitulate. Valhalla wants to change the LocalDate fields from int/short/short to int/byte/byte. And serialization is flexible enough to accommodate that. Because LocalDate had the foresight to skip the default mechanism in favor of writeReplace.
And that really works. You can freely exchange serialized LocalDate instances between Java 25 and the Valhalla early access build.
But it fails when serializing the class object LocalDate.class.
To see why, one needs to dig deeply into the wire protocol. A class object is saved as
0x76 classDesc
where classDesc includes descriptors of the non-static, non-transient fields. If those fields don’t match between the writing and the reading side, deserialization of the class object fails, because the fields are not compatible.
Let that sink in. Even though the fields are completely ignored when serializing instances, their mismatch causes failure with the class object.
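This is easy to verify on a current JDK: serialize LocalDate.class and the instance field names show up in the class descriptor bytes, even though writeReplace keeps them out of serialized instances.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.time.LocalDate;

public class ClassDescDemo {
    static String streamText(Object obj) throws IOException {
        var bytes = new ByteArrayOutputStream();
        try (var out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        // ISO-8859-1 maps every byte to a char, so names stay searchable
        return new String(bytes.toByteArray(), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) throws IOException {
        String raw = streamText(LocalDate.class);
        // The class descriptor spells out the instance fields
        System.out.println(raw.contains("year"));  // true
        System.out.println(raw.contains("month")); // true
        System.out.println(raw.contains("day"));   // true
    }
}
```

If the proposed remedy lands and LocalDate declares an empty serialPersistentFields, those names would disappear from the descriptor, and the mismatch would go away.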
There is a remedy. The LocalDate class needs to disavow knowledge of any fields for the purpose of serialization:
private static final ObjectStreamField[] serialPersistentFields = {};
Then the fields are again compatible because missing fields are ok. https://bugs.openjdk.org/browse/JDK-8371410 proposes just that remedy.
Let’s hope that happens, so that arrays of LocalDate can be flattened, as they should be.
Everyone loves to hate serialization, and for once, everyone is right.
If you really want to make your own class serializable, do not simply use the default that dumps the instance fields. Instead, design a wire format that is stable in the long term. Use the writeReplace mechanism to write those stable data, just like LocalDate does. Set serialVersionUID to 42L, or whatever your favorite constant is. And finally, set serialPersistentFields to an empty array. Or null, but that’s two more characters.
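Put together, the recipe looks like this sketch. The class, its fields, and the proxy are all made up; only the pattern matters: a fixed serialVersionUID, an empty serialPersistentFields, and a writeReplace proxy that defines the stable wire format.

```java
import java.io.*;

// Sketch of the recommended setup: a stable wire format via a writeReplace
// proxy, a fixed serialVersionUID, and no serialized fields.
final class Fraction implements Serializable {
    @java.io.Serial
    private static final long serialVersionUID = 42L;
    @java.io.Serial
    private static final ObjectStreamField[] serialPersistentFields = {};

    private final int num, den; // internal representation, free to change
    Fraction(int num, int den) { this.num = num; this.den = den; }
    int num() { return num; }
    int den() { return den; }

    @java.io.Serial
    private Object writeReplace() { return new Proxy(num, den); }

    // The proxy is the stable wire format
    private record Proxy(int num, int den) implements Serializable {
        @java.io.Serial
        private Object readResolve() { return new Fraction(num, den); }
    }
}

public class RecipeDemo {
    static Fraction roundTrip(Fraction f) throws IOException, ClassNotFoundException {
        var bytes = new ByteArrayOutputStream();
        try (var out = new ObjectOutputStream(bytes)) { out.writeObject(f); }
        try (var in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Fraction) in.readObject();
        }
    }
    public static void main(String[] args) throws Exception {
        Fraction f = roundTrip(new Fraction(2, 3));
        System.out.println(f.num() + "/" + f.den()); // 2/3
    }
}
```

With this setup, Fraction can later change its internal fields however it likes, as long as the Proxy record stays the same.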