Twelve Days of Pattern Matching

Pattern matching in Java was derived from the syntax of the switch and instanceof statements in order to leverage the familiarity that programmers have with those constructs. As pattern matching has become more powerful in recent Java versions, that familiarity is colliding with the new needs of the pattern matching syntax and semantics. In this article, also published at Java Advent, I present six puzzlers and six principles to help you understand the latest pattern matching features.


Puzzler #1: Colons, Arrows, Braces, Break, Yield

Which of the following lines can occur in a switch? (Not necessarily all in the same switch.) And in which kind of switch?

case 5: if (Math.random() < 0.5) break;

Sure. This is legal in a classic switch statement. Half the time, execution falls through to the next case. Don't code like this at home.

case 5 -> log("TGIF"); yield "Friday";

No. The -> must be followed by an expression, throw, or a block. It would be ok if you enclosed the code following the -> in braces: case 5 -> { log("TGIF"); yield "Friday"; }

case 5: log("TGIF"); yield "Friday";

Yes. This is legal in a switch expression with fall through. This branch doesn't fall through, actually, but yields the expression's value. No braces needed because, colon.

case 5 -> { if (Math.random() < 0.5) break; log("TGIF"); }

All good in a switch statement without fall through. Half the time, the call to log is skipped. Don't code like this at home. Note that the braces are necessary.

Did you get all four right? Congratulations! You earned a partridge in a pear tree. Skip the next section and move on to puzzler #2.

Principle #1: Two Axes

The classic switch of the C language had a simple purpose: to be compiled into a “jump table” that holds the memory addresses of the code for each case. The value of the “selector”—the expression inside switch (...)—is used as table index, either as an offset or with a binary search. That is more efficient than a linear if/else if/else if/else branch sequence, particularly if the number of cases is large. In a high-level language, there is no way to code a jump table directly. Hence the switch statement.

Many programmers, when learning switch, were warned of the weirdness of “fall through”. By default, execution flows from one case to the next. Of course it does. That's how jump tables work. They only care about the efficient jump. If you don't want fall through, just add a break, which is compiled into a jump to the end.
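To see fall through and break side by side, here is a minimal sketch. The class name FallThroughDemo and the method visit are made up for illustration; the method simply records which case bodies were executed for a given selector value.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not part of the original article.
class FallThroughDemo {
    // Returns the labels of every case body that ran for the given selector.
    static List<String> visit(int code) {
        List<String> visited = new ArrayList<>();
        switch (code) {
            case 0:
                visited.add("zero");
                // No break here: execution continues into case 1.
            case 1:
                visited.add("one");
                break; // Jump to the end of the switch.
            case 2:
                visited.add("two");
                break;
            default:
                visited.add("other");
        }
        return visited;
    }

    public static void main(String[] args) {
        System.out.println(visit(0)); // [zero, one]
        System.out.println(visit(1)); // [one]
        System.out.println(visit(5)); // [other]
    }
}
```

With selector 0, both the "zero" and the "one" bodies run, because nothing stops the flow between them.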

Thirty years later, many modern programming languages support pattern matching. In its simplest form, using Java syntax:

String seasonName = switch (seasonCode)
   {
      case 0 -> "Spring";
      case 1 -> "Summer";
      case 2 -> "Fall";
      case 3 -> "Winter";
      default -> "???";
   };

There are two crucial differences: the switch is an expression that yields a value, and there is no fall through.

Are these differences crucial enough to come up with a different syntax for pattern matching? The Java designers didn't think so. This is what they wrote in JEP 361:

“By teasing the desired benefits (expression-ness, better control flow, saner scoping) into orthogonal features, switch expressions and switch statements could have more in common. The greater the divergence between switch expressions and switch statements, the more complex the language is to learn, and the more sharp edges there are for developers to cut themselves on.”

Not everyone agreed.

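So, now we have four forms of switch: a statement with fall-through, a statement without fall-through, an expression with fall-through, and an expression without fall-through.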

For a switch to be an expression, it must be in expression position: assigned to a variable or passed as a method argument. Also, if you see break, you know it must be a statement. And if you see yield, it must be an expression.

The colon : denotes classic fall-through. The -> indicates no fall-through. Mercifully, you can't mix them in the same switch.

After a colon, you can have any number of statements. As always. With a switch expression, there must be one or more yield statements.

Conversely, after an arrow, there can only be an expression, or throw, or a block. In a switch expression, that block must have yield.

Caution: Some programmers think that -> signals an expression switch because it looks like a lambda expression. And because it must be followed by an expression or block. That is not so. A no-fall-through switch statement uses case ... -> { ... }.
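The two axes can be laid out side by side. The following is a sketch, assuming Java 14 or later; the class SwitchFormsDemo and its method names are invented for this illustration. All four methods compute the same answer, once in each form.

```java
// Hypothetical demo class: one method per form of switch.
class SwitchFormsDemo {
    // Statement with colon: fall-through is possible, break leaves the switch.
    static String statementColon(int day) {
        String result = "weekday";
        switch (day) {
            case 6: // falls through to case 7
            case 7:
                result = "weekend";
                break;
            default:
                break;
        }
        return result;
    }

    // Statement with arrow: no fall-through, no break needed.
    static String statementArrow(int day) {
        String result = "weekday";
        switch (day) {
            case 6, 7 -> result = "weekend";
            default -> { }
        }
        return result;
    }

    // Expression with colon: statements after the colon, value produced by yield.
    static String expressionColon(int day) {
        return switch (day) {
            case 6: // falls through to case 7
            case 7:
                yield "weekend";
            default:
                yield "weekday";
        };
    }

    // Expression with arrow: a bare expression, or a block that yields.
    static String expressionArrow(int day) {
        return switch (day) {
            case 6, 7 -> "weekend";
            default -> {
                String kind = "week" + "day";
                yield kind;
            }
        };
    }
}
```

Note that statementArrow uses -> without being an expression, which is the case that the caution above is about.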

Puzzler #2

Is this legal?

Object x = ...;
String result = switch (x) {
   case "" -> "empty";
   case 0 -> "zero";
   default -> "something else";
};

No—a constant label of type java.lang.String and of type int is not compatible with switch selector type Object

What about

enum Size { SMALL, MEDIUM, LARGE, EXTRA_LARGE };
Object x = ...;
String result = switch (x) {
   case Size.EXTRA_LARGE -> "extra large";
   default -> "something else";
};

Perfectly legal.

Why isn't it like the preceding code snippet? The constant label has type Size, and the switch selector type is Object.

The rules are different for enum case constants. Their value must merely be assignment compatible to the selector type.

Principle #2: Selector types

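The selector types of switch have expanded over time: int, short, byte, and char in Java 1.0, their wrappers and enum types in Java 5, String in Java 7, and, with pattern matching, any reference type.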

With pre-pattern matching switches, a constant case label must be a compile-time constant, and it must be assignment-compatible to the selector expression type. For example, you can have case 5 when the selector type is Integer.

With pattern-matching switches, the rules are different and complex. When the selector type is Object or some other supertype of String, Integer, Short, Byte, or Character, you can't have constant labels. For example,

case 0 -> "zero";

won't work when the selector type is something other than int, short, byte, char, Integer, Short, Byte, Character.

The remedy is:

case Integer i when i == 0 -> "zero";
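As a runnable sketch of that remedy, assuming Java 21 or later: the class ConstantRemedyDemo and the method describe are hypothetical names, and the guarded patterns stand in for the constant labels that an Object selector rejects.

```java
// Hypothetical demo class, assuming Java 21+ (pattern matching for switch).
class ConstantRemedyDemo {
    static String describe(Object x) {
        return switch (x) {
            // case "" and case 0 would not compile with an Object selector,
            // so each constant is expressed as a type pattern with a guard.
            case String s when s.isEmpty() -> "empty";
            case Integer i when i == 0 -> "zero";
            default -> "something else";
        };
    }

    public static void main(String[] args) {
        System.out.println(describe(""));   // empty
        System.out.println(describe(0));    // zero
        System.out.println(describe(42));   // something else
    }
}
```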

But for enum, the rules have evolved differently. First off, the rules have changed for the case constants. Previously, you wrote

case EXTRA_LARGE -> "extra large";

The enum type was inferred from the selector type. Now that the selector type can be a supertype, you can use qualified enum names:

case Size.EXTRA_LARGE -> "extra large";

You can use them even if you don't have to, with an enum selector type.

More importantly, you are allowed to use enum constants in case labels. This is useful for pattern matching in a sealed hierarchy where some of the implementing classes are enumerations, such as in this (incomplete) JSON primitive type hierarchy:

sealed interface JSONPrimitive permits JSONNumber, JSONString, JSONBoolean {}
final record JSONNumber(double value) implements JSONPrimitive {}
final record JSONString(String value) implements JSONPrimitive {}
enum JSONBoolean implements JSONPrimitive { FALSE, TRUE; }
JSONPrimitive p = ...;
result = switch (p) {
   case JSONNumber(var v) when v == 0 -> "zero";
   case JSONString(var s) when s.isEmpty() -> "empty";
   case JSONBoolean.FALSE -> "false";
   default -> "something else";
};

Finally, note that constants are not allowed inside record patterns. For example, you cannot use

case JSONNumber(0) -> "zero";

You can use a when clause, as in the preceding example. Nicer syntax may come in the future.
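For readers who want to try this out, here is a self-contained sketch of the JSON example, assuming Java 21 or later. The wrapper class JsonLabelsDemo and the method describe are names made up for this illustration; the hierarchy is nested inside the class only to keep everything in one file.

```java
// Hypothetical demo class, assuming Java 21+.
class JsonLabelsDemo {
    sealed interface JSONPrimitive permits JSONNumber, JSONString, JSONBoolean {}
    record JSONNumber(double value) implements JSONPrimitive {}
    record JSONString(String value) implements JSONPrimitive {}
    enum JSONBoolean implements JSONPrimitive { FALSE, TRUE }

    static String describe(JSONPrimitive p) {
        return switch (p) {
            // A constant inside the record pattern, JSONNumber(0), is not allowed;
            // the guard expresses the same test.
            case JSONNumber(var v) when v == 0 -> "zero";
            case JSONString(var s) when s.isEmpty() -> "empty";
            // A qualified enum constant, with a selector type that is not the enum.
            case JSONBoolean.FALSE -> "false";
            default -> "something else";
        };
    }
}
```

Calling describe(new JsonLabelsDemo.JSONNumber(0)) yields "zero", while JSONBoolean.TRUE ends up in the default branch.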

Puzzler #3

Looking again at this (incomplete) JSON primitive type hierarchy:

sealed interface JSONPrimitive permits JSONNumber, JSONString, JSONBoolean {}
final record JSONNumber(double value) implements JSONPrimitive {}
final record JSONString(String value) implements JSONPrimitive {}
enum JSONBoolean implements JSONPrimitive { FALSE, TRUE; }

compare

if (j instanceof JSONNumber(var v)) d = "" + v;
else if (j instanceof JSONString(var s)) d = s;
else if (j instanceof JSONBoolean b) d = b.name();

and

switch (j) {
   case JSONNumber(var v): d = "" + v; break;
   case JSONString(var s): d = s; break;
   case JSONBoolean b: d = b.name(); break;
};

Do they do exactly the same thing? If no, for which value of j do they differ?

By design, pattern matching for instanceof and switch have the same behavior, including the binding to the matched variable (v or b in the example).

But there is one crucial difference. For historical reasons, instanceof is null-friendly. The expression null instanceof ... is simply false. But switch is null-hostile: switch (null) { ... } throws a NullPointerException.

So, the answer is: the two statements have the same effect except when j is null.

Knowing this, let's move on to record patterns:

record Box<T>(T contents) { }

Box<String> boxed = null;
String unboxed = switch (boxed) {
   case Box(String s) -> s;
};

What happens?


A NullPointerException. No surprise.

What about

Box<String> boxed = new Box<>(null);
String unboxed = switch (boxed) {
   case Box(String s) -> s;
};

No problem. s is bound to null, and unboxed becomes null.

What about

Box<Box<String>> doubleBoxed = new Box<>(null);
String unboxed = switch (doubleBoxed) {
   case Box(Box(String s)) -> s;
};

The outer record pattern matches the Box, and its contents, null, must then be matched against the nested pattern Box(String s). A record pattern never matches null, so the match is deemed to fail, and there are no further matching cases. Therefore, a MatchException is thrown. Not a NullPointerException.

Principle #3: Null

To nobody's surprise, null is always a cause of grief. In Java 1.0, switch was only defined for primitive types, so null wasn't an issue. When wrappers were added, it made sense to say that null was exceptional. When enum was added in Java 5, that still made sense. Why would an enum value ever be null? And with switching on strings in Java 7, there was no reason to rock the boat either. A switch with a null selector simply throws a NullPointerException.

But with pattern matching, it was decided that it would be ugly to surround switch with checks against null, and a case null was allowed. For example:

String unboxed = switch (boxed) {
   case Box(String s) -> s;
   case null -> "empty";
};

Note that when boxed is null, the first case is not a match: a record pattern never matches null. That explains the doubleBoxed puzzler.

You can combine case null with default, but not with any other case:

case null, default -> "something else"; // Ok
case null, 0 -> "nullish"; // ERROR

Adding case null to any switch makes the switch null-friendly, but it also turns it into an “enhanced” switch, which has more stringent requirements than its classic cousin. See the following sections.
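The null behaviors discussed in this section can be observed with a small experiment. This is a sketch assuming Java 21 or later; NullDemo, unbox, unboxTwice, and outcome are hypothetical names, and outcome exists only to turn the thrown exception into a string that is easy to inspect.

```java
// Hypothetical demo class, assuming Java 21+.
class NullDemo {
    record Box<T>(T contents) {}

    // Null-friendly: case null handles a null selector.
    static String unbox(Box<String> boxed) {
        return switch (boxed) {
            case null -> "no box";
            case Box(String s) -> s; // s may be bound to null
        };
    }

    // Null-hostile: no case null, and a nested record pattern.
    static String unboxTwice(Box<Box<String>> doubleBoxed) {
        return switch (doubleBoxed) {
            case Box(Box(String s)) -> s;
        };
    }

    // Runs unboxTwice and reports either the value or the exception type.
    static String outcome(Box<Box<String>> doubleBoxed) {
        try {
            return "value: " + unboxTwice(doubleBoxed);
        } catch (NullPointerException | MatchException e) {
            return e.getClass().getSimpleName();
        }
    }
}
```

A null selector gives NullPointerException, a Box whose contents are null gives MatchException, and a Box containing a Box with null contents binds s to null without complaint.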

Puzzler #4

Compare the following two uses of switch. Which one is incorrect, and why?

int x = ...;
String d = switch (x) {
   case 0 -> "zero";
   case 1, 2, 3 -> "small";
};

switch (x) {
   case 0: d = "zero"; break;
   case 1, 2, 3: d = "small"; break;
}

The first switch—an expression—won't compile. It is not exhaustive. If x is something other than 0, 1, 2, 3, it can't produce a value.

The second switch—a classic statement—doesn't have to be exhaustive. If x is something other than 0, 1, 2, 3, nothing happens.

Ok, now what about

Integer x = ...;
String d = "";
switch (x) {
   case 0: d = "zero"; break;
   case 1, 2, 3: d = "small"; break;
   case null: d = "null"; break;
}

This switch statement doesn't compile. It is not exhaustive.

Wait...since when do switch statements have to be exhaustive? If you are surprised, read on.

Principle #4: Exhaustiveness

All switch expressions must be exhaustive. For any selector value, there must be a matching case. This is necessary since the expression must always yield a value.

Classic switch statements need not be exhaustive. But “enhanced” switch statements must be. If you mean to do nothing when none of the cases match, add a default: break; or default -> {};

A switch is enhanced if it has a pattern, case null, or a selector other than a primitive/primitive wrapper/String/enum.

Note that cases with when clauses are ignored for exhaustiveness checking (unless the when clause is a compile-time constant). This switch is not exhaustive:

Integer x = ...;
String d = switch (x) {
   case 0 -> "zero";
   case Integer n when n > 0 -> "positive";
   case Integer n when n < 0 -> "negative";
};

The compiler isn't a mathematician. It doesn't try to reason that every integer must be zero, positive, or negative.

Remedy: case Integer _ or default in the last clause.

Exhaustiveness is particularly useful with sealed hierarchies:

String d = switch (j) {
   case JSONNumber(var v) -> "" + v;
   case JSONString(var v) -> v;
   case JSONBoolean.FALSE -> "false";
}; // oops--what about JSONBoolean.TRUE?

Finally, note that null is never used in exhaustiveness checking. A switch can be exhaustive without case null. It is just null-hostile and throws an NPE with a null selector. Or a MatchException when there is a nested null in a record.
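Here is a sketch of two switches that do pass the exhaustiveness check without a default, assuming Java 21 or later. The class ExhaustiveDemo and the methods show and sign are invented for illustration. The first covers every permitted subtype and every enum constant; the second ends with an unguarded pattern instead of a third guard.

```java
// Hypothetical demo class, assuming Java 21+.
class ExhaustiveDemo {
    sealed interface JSONPrimitive permits JSONNumber, JSONString, JSONBoolean {}
    record JSONNumber(double value) implements JSONPrimitive {}
    record JSONString(String value) implements JSONPrimitive {}
    enum JSONBoolean implements JSONPrimitive { FALSE, TRUE }

    // Exhaustive: both records and both enum constants are covered.
    static String show(JSONPrimitive j) {
        return switch (j) {
            case JSONNumber(var v) -> "" + v;
            case JSONString(var v) -> v;
            case JSONBoolean.FALSE -> "false";
            case JSONBoolean.TRUE -> "true";
        };
    }

    // Exhaustive: the last case has no guard, so it catches everything else.
    static String sign(Integer x) {
        return switch (x) {
            case 0 -> "zero";
            case Integer n when n > 0 -> "positive";
            case Integer n -> "negative";
        };
    }
}
```

Removing the JSONBoolean.TRUE case, or putting a guard on the last case of sign, should bring the compiler error back.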

Puzzler #5

What is wrong with this switch?

String d = switch (obj) {
   case Number n -> "a number";
   case Integer i -> "an integer";
   default -> "something else";
};

With type and record patterns, order matters. The first case dominates the second. That is a compile-time error.

What about

Integer x = 0;
String d = switch (x) {
   case Integer i when i > 0 -> "positive";
   default -> "negative";
   case 0 -> "zero";
};

It's perfectly fine. For historical reasons, default has inconsistent dominance rules. Read on for the details.

Principle #5: Dominance

Type and record patterns are processed top to bottom. The compiler generates an error if one case dominates the other. For example:

case Number n

dominates

case Integer i

and

case Number n when n.intValue() == 0

The record pattern

case Box(var b)

dominates

case Box(JSONString(var s))

As with exhaustiveness checking, the contents of when clauses are not analyzed (unless they are compile-time constants). The compiler can't tell that

case Number n when n.intValue() >= 0

dominates

case Number n when n.intValue() == 0

The default clause must come after any patterns. But for historical reasons, it can come before constant cases.

With classic switch statements, the order of the cases doesn't matter, except when there is fall through. Because you can fall through from the default clause, it can be anywhere:

switch (n) {
  case 0: log("zero"); break;
  default: log("ignore the next log entry"); // FALL THROUGH
  case 1: log("one"); break;
}

I couldn't think of a realistic example where this behavior would be useful. Just put default last.
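To stay clear of dominance errors, order the cases from most specific to most general. The following is a minimal sketch, assuming Java 21 or later, with the made-up names DominanceDemo and classify.

```java
// Hypothetical demo class, assuming Java 21+.
class DominanceDemo {
    static String classify(Object obj) {
        return switch (obj) {
            // A guarded pattern comes before the unguarded pattern of the same type.
            case Integer i when i == 0 -> "integer zero";
            // The subtype comes before the supertype.
            case Integer i -> "an integer";
            case Number n -> "a number";
            // default comes last.
            default -> "something else";
        };
    }

    public static void main(String[] args) {
        System.out.println(classify(0));    // integer zero
        System.out.println(classify(42));   // an integer
        System.out.println(classify(1.5));  // a number
        System.out.println(classify("x"));  // something else
    }
}
```

Swapping the Number case above either Integer case should produce the dominance error described in this section.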

Puzzler #6

Can you declare variables with the same name in different cases?

switch (n) {
   case 0, 1: String d = "binary"; log(d); break;
   default: String d = "not binary"; log(d); break;
}

Since Java 1.0, it has been legal to declare a variable inside a switch. The scope extends from the point of declaration until the end of the switch.

Therefore, the switch above doesn't compile. The variable d is declared twice. Remedy: Use braces to confine d to a block.

What about variables introduced in patterns?

JSONPrimitive j = ...;
String d;
switch (j) {
   case JSONNumber(var v): d = "" + v; break;
   case JSONString(var v): d = v; break;
   case JSONBoolean v: d = v.name(); break;
};

This switch compiles. The scope of each pattern variable v extends to the end of the statements in the case.

Principle #6: Variable Scopes

There are three ways of declaring variables inside a switch:

  1. Inside a block: { var a = ...; ... }. These are unsurprising. The scope ends with the block.
  2. Inside a pattern: case JSONNumber(var v). The scope starts with the declaration, so you can use the variable in guards, as in case JSONNumber(var v) when v >= 0. The scope is confined to the case.
  3. In a statement following a colon of a case. This is a weird historical artifact. More below.

Ever since the switch statement in the C programming language, it has been legal to declare a variable anywhere in the switch. Its scope extends to the end of the statement. After all, the case labels are just jump targets. This is perfectly legal:

int n = ...;
switch (n) {
   case 0, 1:
      String d = "binary";
      log(d);
      // FALL THROUGH
   default:
      d = "default";
      log(d);
}

Note that the default branch must assign something to d before using it. Otherwise, the compiler reports an error about a possibly uninitialized variable.

Because of the tracking of uninitialized variables, such switch-scoped variables are never useful. I have only seen them in certification exam questions. Just stick to block-scoped and pattern variables.

The alert reader may wonder what happens with fall-through into a pattern:

case Integer n: log(n); // FALL THROUGH
case String s: log(s.length()); break; // ERROR

This is an error. When falling through from case Integer n, it is impossible to bind the selector value to s. But you can fall into a type pattern that does not bind the match to a variable:

case Integer n: log(n); // FALL THROUGH
case String _: log("string"); break; // Ok

Don't code like that at home!
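By contrast, here is a sketch that sticks to block-scoped and pattern variables, assuming Java 21 or later. ScopeDemo and describe are hypothetical names. The same names d and v are reused in every case without conflict, because each d lives in its own block and each v is confined to its own case.

```java
// Hypothetical demo class, assuming Java 21+.
class ScopeDemo {
    static String describe(Object obj) {
        String result;
        switch (obj) {
            case Integer v when v >= 0: { // pattern variable v is in scope in the guard
                String d = "non-negative integer " + v; // block-scoped d
                result = d;
                break;
            }
            case Integer v: { // a new v, unrelated to the one above
                String d = "negative integer " + v; // a new d in a new block
                result = d;
                break;
            }
            case String v: {
                String d = "string of length " + v.length();
                result = d;
                break;
            }
            default:
                result = "something else";
        }
        return result;
    }
}
```

Each block ends with break, so no case falls through into a pattern that binds a variable.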

Conclusion

Pattern matching has the potential to make code easier to read, particularly when working with sealed type hierarchies that are designed with pattern matching in mind. This is common practice in functional programming languages. I imagine it will become much more common in Java when we have efficient value objects.

Java has chosen to incorporate pattern matching into the classic switch and instanceof syntax. That leverages programmer experience in straightforward cases. But it can create confusion in edge cases, as you can probably confirm from your performance on those puzzlers. (If you got them all correct, award yourself five gold rings.)

To keep out of trouble, I send you these six geese a-laying rules of thumb:
