A Fibonacci microbenchmark runs slightly faster with Java 8 than Java 17 on some fellow's laptop. Should you stick with Java 8?
In this blog, Marian Čaikovski claims to demonstrate that “Java 17 is slower at calculations” than Java 8.
He offers two pieces of evidence.
The well-known slow Fibonacci method
public static long fibonacci(long n) {
    if (n <= 1) {
        return n;
    } else {
        return fibonacci(n - 1) + fibonacci(n - 2);
    }
}
runs 9% slower with Java 17 than with Java 8 on his laptop when computing fibonacci(40) a hundred times.
And computing the same number is 16% slower when using the code in the RecursiveTask API docs:
class Fibonacci extends RecursiveTask<Integer> {
    final int n;

    Fibonacci(int n) { this.n = n; }

    protected Integer compute() {
        if (n <= 1)
            return n;
        Fibonacci f1 = new Fibonacci(n - 1);
        f1.fork();
        Fibonacci f2 = new Fibonacci(n - 2);
        return f2.compute() + f1.join();
    }
}
The complete code is in this repo.
Elsewhere he points out that the language features between 8 and 17 are mostly meh. Combined with what he calls “performance regressions”, he doubts whether it's worth “upgrading an existing perfectly working application” to the latest Java version.
I agree that there haven't been huge language and API changes. The principal reason I upgrade is the ongoing maintenance. There are over 50,000 fixed issues between Java 9 and Java 17 in the bug database.
Of course, many of these issues are pretty technical, but I do run into them on a regular basis. One example is the System.getProperty performance problem described at https://blog.fastthread.io/2021/10/06/performance-impact-of-java-lang-system-getproperty/, which was quietly solved in Java 11.
Back to those benchmarks. Do they really measure “calculations”? Not really. The first one does mostly method calls. The second exercises the fork join pool. Based on these data, I wouldn't say “whoa, our app does calculations; let's stick with Java 8”.
Are the numbers accurate? Java Champion Henri Tremblay wrote a JMH benchmark (code here) that is a bit more reliable than running the program directly, since it warms up the JVM first. On his laptop, Java 17 is also slower.
Java 8

Benchmark           Mode  Cnt    Score   Error  Units
MyBenchmark.fibPar  avgt   20   71.610 ± 3.832  ms/op
MyBenchmark.fibSeq  avgt   20  326.094 ± 0.536  ms/op

Java 17

Benchmark           Mode  Cnt    Score   Error  Units
MyBenchmark.fibPar  avgt   20   82.433 ± 2.607  ms/op
MyBenchmark.fibSeq  avgt   20  356.393 ± 0.601  ms/op
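To see why warming up matters, here is a minimal hand-rolled sketch that times the sequential method before and after a warm-up phase. It is not Henri's benchmark and it is no substitute for JMH. The class name `WarmupSketch`, the helper `averageMillis`, and the choice of n = 30 (so that it finishes quickly) are my own assumptions for illustration.

```java
// Hand-rolled timing sketch. JMH handles warm-up, forking, and dead-code
// elimination far more rigorously; this only illustrates the warm-up effect.
class WarmupSketch {
    // Written to from the timing loop so the JIT cannot discard the calls.
    static volatile long sink;

    static long fibonacci(long n) {
        if (n <= 1) {
            return n;
        } else {
            return fibonacci(n - 1) + fibonacci(n - 2);
        }
    }

    // Calls fibonacci(n) `rounds` times and returns the average time per call in ms.
    static double averageMillis(int n, int rounds) {
        long total = 0;
        long start = System.nanoTime();
        for (int i = 0; i < rounds; i++) {
            total += fibonacci(n);
        }
        long elapsed = System.nanoTime() - start;
        sink = total;
        return elapsed / 1e6 / rounds;
    }

    public static void main(String[] args) {
        int n = 30; // assumption: smaller than 40 so the sketch runs in seconds
        System.out.printf("cold:   %.3f ms/op%n", averageMillis(n, 5));
        averageMillis(n, 50); // warm-up rounds, result discarded
        System.out.printf("warmed: %.3f ms/op%n", averageMillis(n, 5));
    }
}
```

On most machines the first figure should come out larger than the second, because the early rounds include interpretation and compilation time. How much larger depends on the JVM and the hardware, which is exactly why a one-shot timing is hard to interpret.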
Could it be that the code generated by the just in time compiler has gotten worse? I looked at the assembly code that the JIT generated for Java 8 and Java 17. This article by Gunnar Morling, another Java Champion, showed me how to generate the HSDIS library that is necessary for disassembly, and how to run the program with a log that can be viewed by JITWatch.
java -XX:+UnlockDiagnosticVMOptions -XX:+LogCompilation -XX:LogFile=8.log -cp src/main/java com.acme.performance.Main 40 100
The C2 disassembly looked essentially identical for both versions (Java 8 line 69253, Java 17 line 2939). Interestingly, it inlined one level of call, computing
fibonacci(n - 2) + fibonacci(n - 3) + fibonacci(n - 3) + fibonacci(n - 4)
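That expression is just the definition applied once more to each operand. A quick sanity check, as a sketch: the class name `InlineIdentity` and the method `expanded` are made up for illustration, and the hand-expanded form is only valid for n >= 4, where neither inner call hits the base case. (The compiled code keeps the base-case tests, of course.)

```java
class InlineIdentity {
    static long fibonacci(long n) {
        if (n <= 1) {
            return n;
        } else {
            return fibonacci(n - 1) + fibonacci(n - 2);
        }
    }

    // fibonacci(n - 1) and fibonacci(n - 2) each expanded by hand one level.
    // Only valid for n >= 4, because the base-case checks are omitted here.
    static long expanded(long n) {
        return fibonacci(n - 2) + fibonacci(n - 3)
             + fibonacci(n - 3) + fibonacci(n - 4);
    }

    public static void main(String[] args) {
        for (long n = 4; n <= 25; n++) {
            if (fibonacci(n) != expanded(n)) {
                throw new AssertionError("mismatch at n = " + n);
            }
        }
        System.out.println("expanded form matches for n = 4..25");
    }
}
```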
I also played with the parallel solution. I never liked this example. Clearly computing Fibonacci without memoization is dumb. I know it's just an example, and it's supposed to teach recursive subdivision. But the structure is curiously asymmetric. Why call
f2.compute() + f1.join();
What if you fork each subtask and combine the results? I rewrote the code like that:
Fibonacci f1 = new Fibonacci(n - 1);
f1.fork();
Fibonacci f2 = new Fibonacci(n - 2);
f2.fork();
return f1.join() + f2.join();
And when I ran that variant, it ran faster than the original. You would think that it shouldn't make much of a difference. And it ran faster with JDK 17 than with JDK 8. At least on my laptop. But when I ran it again today, I couldn't reproduce that.
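If you want to try both variants side by side, here is a self-contained sketch. The class names `ForkVariants`, `ForkOne`, and `ForkBoth` and the helper `run` are my own. It only checks that the two variants agree on the result; it says nothing about their relative speed, which, as noted, I could not measure reliably.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

class ForkVariants {
    // Variant from the RecursiveTask API docs: fork one subtask,
    // compute the other in the current thread.
    static class ForkOne extends RecursiveTask<Integer> {
        final int n;
        ForkOne(int n) { this.n = n; }

        @Override
        protected Integer compute() {
            if (n <= 1) return n;
            ForkOne f1 = new ForkOne(n - 1);
            f1.fork();
            ForkOne f2 = new ForkOne(n - 2);
            return f2.compute() + f1.join();
        }
    }

    // Symmetric variant: fork both subtasks, then join both.
    static class ForkBoth extends RecursiveTask<Integer> {
        final int n;
        ForkBoth(int n) { this.n = n; }

        @Override
        protected Integer compute() {
            if (n <= 1) return n;
            ForkBoth f1 = new ForkBoth(n - 1);
            f1.fork();
            ForkBoth f2 = new ForkBoth(n - 2);
            f2.fork();
            return f1.join() + f2.join();
        }
    }

    // Runs a task in the common pool and returns its result.
    static int run(RecursiveTask<Integer> task) {
        return ForkJoinPool.commonPool().invoke(task);
    }

    public static void main(String[] args) {
        int n = 20; // assumption: kept small so the sketch finishes quickly
        System.out.println(run(new ForkOne(n)) + " " + run(new ForkBoth(n)));
        // prints 6765 6765
    }
}
```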
Heinz Kabutz (also a Java Champion) reported this:
Hi Henri, your benchmark on my server:

Java 8
# VM version: JDK 1.8.0_302, OpenJDK 64-Bit Server VM, 25.302-b08

Benchmark           Mode  Cnt    Score   Error  Units
MyBenchmark.fibPar  avgt   20   60.407 ± 0.527  ms/op
MyBenchmark.fibSeq  avgt   20  495.456 ± 0.035  ms/op

Java 17
# VM version: JDK 17, OpenJDK 64-Bit Server VM, 17+35-2724

Benchmark           Mode  Cnt    Score   Error  Units
MyBenchmark.fibPar  avgt   20   59.035 ± 0.525  ms/op
MyBenchmark.fibSeq  avgt   20  495.776 ± 0.219  ms/op
And he later wrote: I'm not a great fan of running benchmarks on laptops. Things that I thought were issues often went away when running on server hardware.
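To put the two sets of JMH scores on the same footing, you can compute the relative change from Java 8 to Java 17 for each. A small sketch (the class name `RelativeSlowdown` and the method `percentChange` are made up; the scores are copied from the two reports above):

```java
class RelativeSlowdown {
    // Percentage by which `after` differs from `before`.
    // Positive means slower (more ms/op), negative means faster.
    static double percentChange(double before, double after) {
        return (after - before) / before * 100.0;
    }

    public static void main(String[] args) {
        // ms/op scores: Java 8 first, Java 17 second
        System.out.printf("laptop fibPar: %+.1f%%%n", percentChange(71.610, 82.433));
        System.out.printf("laptop fibSeq: %+.1f%%%n", percentChange(326.094, 356.393));
        System.out.printf("server fibPar: %+.1f%%%n", percentChange(60.407, 59.035));
        System.out.printf("server fibSeq: %+.1f%%%n", percentChange(495.456, 495.776));
    }
}
```

By this arithmetic, the laptop shows Java 17 roughly 15% and 9% slower, while on the server the parallel version is about 2% faster and the sequential version differs by less than 0.1%.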
The takeaway is that these microbenchmarks are really hard to get right. There is a reason that JMH has the following message after it completes a run:
REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial experiments, perform baseline and negative tests that provide experimental control, make sure the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts. Do not assume the numbers tell you what you want them to tell.
So, should you stay with JDK 8? Unless you want to compute Fibonacci numbers the dumb way on a laptop, you might want to take these results with many grains of salt. If in doubt, benchmark your workload under realistic conditions and give yourself the time to do it right. It's not an easy thing to do.