Side Effect Puzzles


Inspired by Heinz Kabutz' daily JGym workout, I present some puzzlers that show how horrible side effects in nested expressions can be.

Side Effects in Expressions

An expression has a value. For example, the expression (int) Math.sqrt(36 * 49), composed of a multiplication, a method call, and a cast (in that order), has the value 42.

If an expression has any effect other than chewing up time, then the expression has a “side effect”. Examples are:

System.out.println(i) // Makes stuff appear in System.out
i++ // Increments i

Note that these are expressions, not statements. They become statements when you add a semicolon:

System.out.println(i);
i++;

A statement has no value. It is executed solely for its side effect.

The trouble comes when an expression with a side effect is used inside another expression.

We are safe with System.out.println(i). The return type of PrintStream.println(int) is void. You can never nest it inside another expression.

But i++ has both a side effect (incrementing i) and a value (the value of i before the increment). In contrast, the value of ++i is the value after the increment. Someone may have pestered you with puzzles such as the following:

// What does this print?
int i = 42;
System.out.println(i++ + ++i);

The right answer should of course be “Who cares”. But every so often, one sees an array lookup like a[i++] (2969 times in the JDK 11 source) or a[++i] (173 times), so one needs to be familiar with those idioms.
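If you want to check the puzzle anyway, here is a small sketch that runs it along with the two array idioms. The class and method names are made up for this example.

```java
import java.util.Arrays;

public class IncrementPuzzle {
    // Evaluates i++ + ++i with i starting at 42.
    static int puzzle() {
        int i = 42;
        // i++ yields 42 and leaves i at 43; ++i then makes i 44 and yields 44
        return i++ + ++i;
    }

    // Shows which element each lookup idiom reads; returns {first, second, final i}.
    static int[] lookups() {
        int[] a = { 10, 20, 30, 40 };
        int i = 0;
        int first = a[i++];  // reads a[0], then i becomes 1
        int second = a[++i]; // i becomes 2 first, then reads a[2]
        return new int[] { first, second, i };
    }

    public static void main(String[] args) {
        System.out.println(puzzle());                    // prints 86
        System.out.println(Arrays.toString(lookups()));  // prints [10, 30, 2]
    }
}
```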

Assignment has a Value

In Java, as in C and C++, = is an operator. An assignment such as i = 42 (without a semicolon) is an expression. Its value is the value being assigned, in this case 42. Note that this is different from a variable declaration. Declarations such as

int i = 0;
var j = 0;

are statements, not expressions. You can't omit the semicolons, and you can't nest them in larger expressions.

Why make assignment into an expression? As Kernighan and Ritchie write in “The C Programming Language”:

The line
nl = nw = nc = 0;
sets all three variables to zero. This is not a special case, but a consequence of the fact that an assignment is an expression with the value and assignments associated from right to left. It's as if we had written
nl = (nw = (nc = 0));
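Java assignment works the same way: = is right-associative and yields the assigned value. Here is a minimal sketch. The variable names are borrowed from K&R, and the class name is hypothetical.

```java
public class AssignmentValue {
    // Chained assignment, parsed as nl = (nw = (nc = 0)); returns {nl, nw, nc}.
    static int[] chained() {
        int nl = 5, nw = 6, nc = 7;
        nl = nw = nc = 0;
        return new int[] { nl, nw, nc };
    }

    // The assignment (i = 42) is an expression whose value is 42; returns {i, j}.
    static int[] nested() {
        int i;
        int j = (i = 42) + 1;
        return new int[] { i, j };
    }

    public static void main(String[] args) {
        int[] c = chained();
        System.out.println(c[0] + " " + c[1] + " " + c[2]); // prints 0 0 0
        int[] n = nested();
        System.out.println(n[0] + " " + n[1]);              // prints 42 43
    }
}
```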

Of course, that means you can nest assignments anywhere. This is fair game:

j = (i = 3) * i;

Since C leaves the order of evaluation up to the implementation, this statement might set i to 3, then multiply by 3. Or it might first retrieve the value of i, then set i to 3 and compute the product. Because of this ambiguity, C programmers use nested assignments with restraint.

Left to right, operands before operation

Java is very proud not to leave details such as evaluation order to the whims of implementors. As The Java Language: A White Paper pronounces: “Unlike C and C++, there are no ‘implementation dependent’ aspects of the specification.”

With expressions, evaluation order mostly works like you expect: left to right, inside to outside. For example, in the expression (i = 3) * i, the operands (i = 3) and i must be evaluated before the *, and the left-hand operand of * is guaranteed to be evaluated first.
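Here is a quick sketch that pins down the Java result for the expression from the C example. It also shows what happens when the operands are swapped. The class name is hypothetical.

```java
public class LeftToRight {
    // j = (i = 3) * i: the left operand runs first and sets i to 3,
    // so the right operand already reads 3, giving 3 * 3.
    static int product() {
        int i = 0;
        int j = (i = 3) * i;
        return j;
    }

    // Swapped operands: i is read while it is still 0, giving 0 * 3.
    static int reversed() {
        int i = 0;
        int j = i * (i = 3);
        return j;
    }

    public static void main(String[] args) {
        System.out.println(product());  // prints 9
        System.out.println(reversed()); // prints 0
    }
}
```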

How complex can this get? In my printed copy of the Java Language Specification, Section 15.7, “Evaluation Order,” runs over six mind-numbing pages, starting with:

15.7.1 Evaluate Left-Hand Operand First

The left-hand operand of a binary operator appears to be fully evaluated before any part of the right-hand operand is evaluated.

Let's look at an example:

int[] a = { 1, 2, 3 };
int[] b = { 2, 3, 4 };
int[] c = a;
a[0] = a[(a = b)[0]];

Is it legal? Of course it is. The assignment expression a = b has a value, namely a reference to the array b.

Clearly, the value of (a = b)[0] is b[0] or 2. And a[2] is 3. Or is that b[2]? After all, a has just become b. No, it's left-to-right, so the old value of a is used.

And what about the leftmost assignment? Is the value assigned to the old or the new a? Again, the left-to-right rule applies, even though the = operator is right-associative. Since the value of a is only changed further to the right, the old value is used. I saved the a reference in c, so that you can run this in JShell and verify that c[0] is now 3.
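If you'd rather not type this into JShell, here is the same statement wrapped in a runnable class. The class name ArrayPuzzle and its run method are just for illustration.

```java
import java.util.Arrays;

public class ArrayPuzzle {
    // Executes a[0] = a[(a = b)[0]] and returns {a, b, c} afterwards.
    static int[][] run() {
        int[] a = { 1, 2, 3 };
        int[] b = { 2, 3, 4 };
        int[] c = a; // keep a handle on the original array
        // Left to right: the target a[0] and the outer a both refer to the old array;
        // only the index (a = b)[0] sees the assignment, yielding b[0] == 2
        a[0] = a[(a = b)[0]];
        return new int[][] { a, b, c };
    }

    public static void main(String[] args) {
        int[][] r = run();
        System.out.println(Arrays.toString(r[2])); // prints [3, 2, 3]
        System.out.println(r[0] == r[1]);          // prints true: a now refers to b
        System.out.println(Arrays.toString(r[1])); // prints [2, 3, 4]: b is untouched
    }
}
```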

Lazy evaluation

There is one important exception to the operands-before-operator rule. The operators

&&
||
? :

are evaluated lazily. The left operand is always evaluated, but the others may not be. For example:

i = 1;
j = 2;
if ((i = j) == 1 && (j = i) == 1) {
   System.out.println("Now they are both 1");
}

The first assignment i = j is carried out, but since its value is 2, the left operand of && is false. Therefore, the right operand is never evaluated, and the second assignment doesn't happen.
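Since j = i would leave j at 2 anyway, the example above can't show on its own that the right operand was skipped. This sketch replaces the second assignment with an assignment to a hypothetical flag, rightRan, that serves as a witness. It also adds the ? : operator for good measure.

```java
import java.util.Arrays;

public class LazyEvaluation {
    // Returns {i, j, 1 if the right operand of && ran, else 0}.
    static int[] shortCircuit() {
        int i = 1;
        int j = 2;
        boolean rightRan = false;
        // (i = j) sets i to 2, and 2 == 1 is false,
        // so the right operand, including its assignment, is skipped
        if ((i = j) == 1 && (rightRan = true)) {
            System.out.println("never printed");
        }
        return new int[] { i, j, rightRan ? 1 : 0 };
    }

    // ? : evaluates only the selected branch; returns {result, k, m}.
    static int[] conditional(boolean flag) {
        int k = 0;
        int m = 0;
        int r = flag ? (k = 1) : (m = 1);
        return new int[] { r, k, m };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(shortCircuit()));      // prints [2, 2, 0]
        System.out.println(Arrays.toString(conditional(true)));   // prints [1, 1, 0]
        System.out.println(Arrays.toString(conditional(false)));  // prints [1, 0, 1]
    }
}
```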

Should you care? Surely nobody writes code anywhere near as horrible as this. Except that I just spotted these beauties in the JDK source:

while ((s = stack) != null && (index += (len = s.length)) >= n)
while ((h = f.stack) != null ||
   (f != this && (h = (f = this).stack) != null))

Exceptions

What happens if the evaluation of a subexpression throws an exception? Of course, the expression is not further evaluated. However, side effects of previously evaluated subexpressions persist. For example, when evaluating

(x = 3) * (1 / 0)

x is set to 3 before the evaluation of the right-hand operand throws an ArithmeticException. This is simply a consequence of the left-to-right rule.
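Here is a sketch that observes the surviving side effect. It divides by a zero parameter instead of a literal 0 to sidestep a compiler warning about 1 / 0. The names are hypothetical.

```java
public class SideEffectBeforeException {
    // Evaluates (x = 3) * (1 / zero) and returns x afterwards.
    static int run(int zero) {
        int x = 0;
        try {
            int product = (x = 3) * (1 / zero);
            System.out.println("product = " + product); // only reached if zero != 0
        } catch (ArithmeticException e) {
            // The left operand has already run, so its assignment to x persists
        }
        return x;
    }

    public static void main(String[] args) {
        System.out.println(run(0)); // prints 3
    }
}
```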

What about

(a = null)[1 / 0]

Does the array access throw a NullPointerException? No. The operands-before-operator rule specifies that both operands are evaluated first. Therefore, a is set to null, so the first operand of the bracket operator is null. The evaluation of the second operand then throws an ArithmeticException, and the bracket operator never gets around to throwing a NullPointerException.
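Here is a sketch that reports which exception wins. A zero divisor produces the ArithmeticException. A nonzero divisor leaves the index valid, and the null check finally gets its turn. Class and method names are made up.

```java
public class NullArrayIndex {
    // Evaluates (a = null)[1 / divisor] and describes what was thrown.
    static String run(int divisor) {
        int[] a = { 1, 2, 3 };
        try {
            int v = (a = null)[1 / divisor];
            return "no exception: " + v;
        } catch (ArithmeticException | NullPointerException e) {
            // Both operands are evaluated before the null check,
            // and a has been set to null either way
            return e.getClass().getSimpleName() + ", a == null: " + (a == null);
        }
    }

    public static void main(String[] args) {
        System.out.println(run(0)); // prints ArithmeticException, a == null: true
        System.out.println(run(1)); // prints NullPointerException, a == null: true
    }
}
```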

Nothing new to see here. Keep it moving. Move along.

The authors of the Java Language Specification don't seem to have much faith in the impact of the previously mentioned mind-numbing pages in Section 15.7. In Section 15.10.4, the evaluation order of array access is explained all over again.

The Wisdom of the Ages

It seems obvious that subexpressions with side effects should be used sparingly, but I could not find a style guide that gives useful guidance. The Twitter style guide has something to say about subexpressions: “Be explicit about operator precedence. Don't make your reader open the spec to confirm, if you expect a specific operation ordering, make it obvious with parenthesis [sic].”

// Bad.
return a << 8 * n + 1 | 0xFF;

// Good.
return (a << (8 * n) + 1) | 0xFF;

Huh? If I must be explicit, shouldn't it be

return ((a << ((8 * n) + 1)) | 0xFF);
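For the skeptical, here is a sketch that checks that all three return statements compute the same value. In Java, * binds tighter than +, + binds tighter than <<, and << binds tighter than |. The method names bad, good, and explicit simply mirror the labels above.

```java
public class PrecedenceCheck {
    static int bad(int a, int n)      { return a << 8 * n + 1 | 0xFF; }
    static int good(int a, int n)     { return (a << (8 * n) + 1) | 0xFF; }
    static int explicit(int a, int n) { return ((a << ((8 * n) + 1)) | 0xFF); }

    public static void main(String[] args) {
        // 1 << 9 is 512, and 512 | 0xFF is 767
        System.out.println(bad(1, 1) + " " + good(1, 1) + " " + explicit(1, 1)); // prints 767 767 767
    }
}
```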

Anyway, that has nothing to do with side effects. It's not easy to formulate an effective rule. The device

int c;
while ((c = in.read()) != -1)

is common enough. But why stop there? What if we also want to store the element in an array? We can't do

while ((buf[i++] = (byte) in.read()) != -1)

because that (a) stores the sentinel -1 and (b) falsely stops at a byte of 0xFF. Here's how to do it:

while ((c = in.read()) != -1 && (buf[i++] = (byte) c) < 128)

The condition < 128 is necessary to turn the right operand of && into a Boolean. Its only job is to be true. Every byte value is less than 128, so the loop ends only when in.read() returns -1. The shorter < 0 would not do. It is false for every byte from 0 to 127, and the loop would stop after storing the first such byte.
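Here is a sketch that runs both loop conditions on a small ByteArrayInputStream. It assumes the buffer is sized to fit the input, and the class name ReadLoop is hypothetical. The fixed version tests the int from read() before storing it. It then compares the stored byte with 128, which holds for every byte value, so only the end of the stream stops the loop.

```java
import java.io.ByteArrayInputStream;
import java.util.Arrays;

public class ReadLoop {
    // The one-liner that does not work: the -1 check happens after the cast to byte,
    // so it stores the sentinel and mistakes a 0xFF byte for the end of the stream.
    static byte[] broken(byte[] data) {
        ByteArrayInputStream in = new ByteArrayInputStream(data);
        byte[] buf = new byte[data.length + 1]; // room for the stored sentinel
        int i = 0;
        while ((buf[i++] = (byte) in.read()) != -1) {
            // all the work happens in the condition
        }
        return Arrays.copyOf(buf, i);
    }

    // Checks the int returned by read() first, then stores it.
    // The always-true comparison < 128 only turns the assignment into a Boolean.
    static byte[] fixed(byte[] data) {
        ByteArrayInputStream in = new ByteArrayInputStream(data);
        byte[] buf = new byte[data.length];
        int i = 0;
        int c;
        while ((c = in.read()) != -1 && (buf[i++] = (byte) c) < 128) {
            // all the work happens in the condition
        }
        return Arrays.copyOf(buf, i);
    }

    public static void main(String[] args) {
        byte[] data = { 0x41, (byte) 0xFF, 0x00, 0x42 };
        System.out.println(Arrays.toString(broken(data)));                      // prints [65, -1]
        System.out.println(Arrays.toString(broken(new byte[] { 0x41, 0x42 }))); // prints [65, 66, -1]
        System.out.println(Arrays.toString(fixed(data)));                       // prints [65, -1, 0, 66]
    }
}
```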

Just kidding, of course. Don't code like that.

After grepping the JDK source, I did the same for some of my own code. I couldn't find any embedded ++, and only a few embedded =, all in loops that read from an input stream. These days, there are methods for reading an entire stream, such as InputStream.readAllBytes, so I won't need those loops again. I feel pretty confident that I can pledge not to use embedded assignments in the future.
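For comparison, here is a sketch of two boring alternatives. The first is InputStream.readAllBytes (Java 9 and later). The second is a byte-at-a-time loop that reads once before the loop and again at the end of each iteration, with no embedded assignment. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;

public class ReadAll {
    // Reads the whole stream in one call.
    static byte[] readAll(byte[] data) {
        try (InputStream in = new ByteArrayInputStream(data)) {
            return in.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Copies byte by byte without assigning inside the loop condition.
    static byte[] copyLoop(byte[] data) {
        byte[] buf = new byte[data.length];
        int i = 0;
        try (InputStream in = new ByteArrayInputStream(data)) {
            int c = in.read();
            while (c != -1) {
                buf[i] = (byte) c;
                i++;
                c = in.read();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return Arrays.copyOf(buf, i);
    }

    public static void main(String[] args) {
        byte[] data = { 0x41, (byte) 0xFF, 0x00, 0x42 };
        System.out.println(Arrays.toString(readAll(data)));  // prints [65, -1, 0, 66]
        System.out.println(Arrays.toString(copyLoop(data))); // prints [65, -1, 0, 66]
    }
}
```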
