I got a query about a curious incident of two recursive calls, both causing a stack overflow. But when the calls are reversed, one of them succeeds. Read on to find out how the case unfolds.

Ever so often, I get queries from people who are under the misapprehension that I know everything about Java. David Taylor from San José State University sent me this puzzler that he got from his colleague Ben Reed.

import java.util.Arrays; public class StackOverflowMystery { static int[] memo; static int fib(int i) { if (i < 2) {return 1;} if (memo[i] != -1) {return memo[i];} memo[i] = (fib(i - 2) + fib(i - 1)) % Short.MAX_VALUE; return memo[i]; } static void doFib(int i) { try { memo = new int[i + 1]; Arrays.fill(memo, -1); System.out.print("fib: " + fib(i)); } catch (StackOverflowError e) { System.out.println("fib: stack overflow at " + i); } } static int fib2(int i) { if (i < 2) {return 1;} if (memo[i - 1] == -1) {memo[i - 1] = fib2(i - 1);} if (memo[i - 2] == -1) {memo[i - 2] = fib2(i - 2);} memo[i] = (memo[i - 1] + memo[i - 2]) % Short.MAX_VALUE; return memo[i]; } static void doFib2(int i) { try { memo = new int[i + 1]; Arrays.fill(memo, -1); System.out.print("fib2: " + fib2(i)); } catch (StackOverflowError e) { System.out.println("fib2: stack overflow at " + i); } } public static void main(String[] args) { int n = args.length < 1 ? 20_000 : Integer.parseInt(args[1]); doFib(n); doFib2(n); System.out.println("====> reversing the order"); doFib2(n); doFib(n); } }

Never mind that there is a much better algorithm for computing Fibonacci numbers. The Fibonacci numbers aren't the point. They just provide an example of an algorithm with high stack depth.

Go ahead, compile and run it. There are two versions of computing the Fibonacci sequence with memoization (i.e., remembering previously computed values). A bit of background. Without memoization, computing

fib(n) = n < 2 ? 1 : fib(n - 2) + fib(n - 1)

is really bad. You keep recomputing the same values. For example:

fib(10) -> fib(8) -> fib(6) -> ... fib(7) -> ... fib(9) -> fib(7) -> ... fib(8) -> ...

Note that `fib(7)`

and `fib(8)`

will be computed again even though their results are already known. The recursive calls grow exponentially. This is commonly covered in an introductory programming class as a poor example of using recursion.

Ben's code doesn't have that problem. It stores previously computed values in an array. The versions have subtly different characteristics. When calling `fib(i)`

. the depth of recursion is `i / 2`

. Here are the calls made by `fib(10)`

:

fib(10) -> fib(8) -> fib(6) -> fib(4) -> fib(2) -> fib(0) fib(1) fib(3) -> fib(1) fib(2) fib(5) -> fib(3) fib(4) fib(7) -> fib(5) fib(6) fib(9) -> fib(7) fib(8)

Also note that the number of recursive calls no longer grows exponentially since they are looked up in the `memo`

array once known.

BTW, I am guessing that the `% Short.MAX_VALUE`

is there just to make sure that the result can never be -1, which denotes a previously unknown result. Again, the focus is not on computing Fibonacci numbers, but on studying recursive call patterns.

The depth of recursion with `fib2(i)`

is higher, namely `i - 1`

, as you can see from this chain of calls:

fib2(10) -> fib2(9) -> fib2(8) -> fib2(7) -> ... -> fib(1)

Ben expected `fib2`

to conk out at some stack depth where `fib`

still succeeds. But the order of the calls shouldn't matter. Yet it does:

fib: stack overflow at 20000 fib2: stack overflow at 20000 ====> reversing the order fib2: stack overflow at 20000 fib: 11628

That was his question. Why does the second call to `fib`

succeed?

First off, I added a counter for the stack depth. That gave even more confusing results.

fib: stack overflow at 20000 maxDepth: 7744 fib2: stack overflow at 20000 maxDepth: 7441 ====> reversing the order fib2: stack overflow at 20000 maxDepth: 6558 fib: 11628 maxDepth: 10000

Why would the stack first get shorter and then long enough to make `fib(20000)`

complete?

It almost looks like the stack size can adapt. I could find very little information about how the JVM stack is actually implemented. The JVM spec §2.5.2 is very vague: Because the Java Virtual Machine stack is never manipulated directly except to push and pop frames, frames may be heap allocated. The memory for a Java Virtual Machine stack does not need to be contiguous... This specification permits Java Virtual Machine stacks either to be of a fixed size or to dynamically expand and contract as required by the computation.

I experimented with the `-Xss`

flag that sets the stack size. It influenced the size of the stack, but the strange behavior was still the same.

I found an article Stack Overflow handling in HotSpot JVM by Andrei Pangin that seemed interesting but unrelated. It talks about native calls and a “guard zone”.

The only other article that I ran across was On Stack Replacement in HotSpot JVM by Stanley Guan, published under the pseudonym of “XML and More”. It had some log records in XML, and my eyes glazed over. This was a mistake. It contained the seed to the actual explanation.

In desperation, I asked Heinz (“Hercule”) Kabutz. He sent me the program back with a simple change in `main`

and the terse comment “run it several times and the mystery goes away ...”

public static void main(String[] args) { int m = args.length < 1 ? 10 : Integer.parseInt(args[0]); int n = args.length < 2 ? 20_000 : Integer.parseInt(args[1]); for (int i = 0; i < m; i++) { doFib(n); doFib2(n); System.out.println("====> reversing the order"); doFib2(n); doFib(n); System.out.println(); } }

Sure enough, after a few iterations, the result becomes regular.

fib: stack overflow at 20000 maxDepth: 7737 fib2: stack overflow at 20000 maxDepth: 7480 ====> reversing the order fib2: stack overflow at 20000 maxDepth: 6558 fib: 11628 maxDepth: 10000 fib: stack overflow at 20000 maxDepth: 7378 fib2: 11628 maxDepth: 19999 ====> reversing the order fib2: stack overflow at 20000 maxDepth: 7232 fib: 11628 maxDepth: 10000 fib: 11628 maxDepth: 10000 fib2: stack overflow at 20000 maxDepth: 6558 ====> reversing the order fib2: stack overflow at 20000 maxDepth: 6558 fib: 11628 maxDepth: 10000 fib: 11628 maxDepth: 10000 fib2: stack overflow at 20000 maxDepth: 6558 ====> reversing the order fib2: 11628 maxDepth: 19999 fib: 11628 maxDepth: 10000 fib: 11628 maxDepth: 10000 fib2: 11628 maxDepth: 19999 ====> reversing the order fib2: 11628 maxDepth: 19999 fib: 11628 maxDepth: 10000 fib: 11628 maxDepth: 10000 fib2: 11628 maxDepth: 19999 ====> reversing the order fib2: 11628 maxDepth: 19999 fib: 11628 maxDepth: 10000 fib: 11628 maxDepth: 10000 fib2: 11628 maxDepth: 19999 ====> reversing the order fib2: 11628 maxDepth: 19999 fib: 11628 maxDepth: 10000 ...

Apparently the behavior becomes stable once the just-in-time compiler kicks in. The compiled instructions run just as expected. With larger values of `n`

, calls to `fib2(n)`

overflows the stack, but also without surprises (after a few initial confusing iterations).

To see when the JIT kicks in, run the program with

java -XX:+UnlockDiagnosticVMOptions "-XX:CompileCommand=print StackOverflowMystery::fib*" \ StackOverflowMystery 10 30000

On Ubuntu, I had to install `libhsdis0-fcml`

. This detailed article shows you how to build the helper library on other platforms, and it goes into great detail about the disassembly process.

There are actually two levels of compilation, called C1 and C2. More details here.

You can turn off both levels, leaving the code to run under the virtual machine without Hotspot:

java -XX:+UnlockDiagnosticVMOptions -XX:TieredStopAtLevel=0 \ StackOverflowMystery 10 30000

Then the results are completely uniform again. The virtual machine does not magically grow the stack. (Thanks to this article for pointing out how to use the `TieredStopAtLevel`

flag.)

The problem that Ben observed is a simple consequence of just-in-time compilation. Of course, the JIT doesn't neatly proceed after a top-level call to `fib`

or `fib2`

has completed. Somewhere in the recursive call sequence, the JIT kicks in (twice). When the compiled call replaces the virtual machine call or C1 call, the stack contains a mixture of C2, C1, and virtual machine frames, which will have different sizes, as shown in this figure from Andrei Pangin's article:

With such a mixture, it is hard to predict when exactly the stack overflows. Once all code is at C2, the behavior is uniform.

- When I explained my findings, Ben replied: “it's tragic that this is another gotcha with using java.” I think that's a little unfair. In Python, the current darling in the ed biz, it's no different. Try teaching Big-Oh to students when they realize that it's much faster to sort an array to find the maximum than to run the O(n) algorithm that you taught them. Because
`sort`

makes a native call, and no Python code can compete with that, O(n) or not. Is that tragic? If you want a teaching environment with a clear and consistent execution model, and no tragedy, consider either C or Scheme. - The traditional wisdom in Java is that the JIT is your friend. And it really is in most situations. If you have a long-running computation, who cares how it performs the first few times? Of course, it would be bothersome to have a stack overflow before the JIT kicks in, but that is both rare and manageable.
- What if you don't have a long-running computation? Then maybe Java isn't the best choice. Or if you have a good reason for using Java, can you use native compilation (Graal)? The native image of
`StackOverflowMystery`

has no surprises, is faster and uses less memory.

Comments powered by Talkyard.