Lessons from Advent Of Code 2023 - Part 2: Streams

In this series of articles, I reflect on what I learned from working through the 2023 Advent of Code problems in Java. This installment is all about streams. When should you use them, and when should you stick with loops?

.jpeg

In my previous post, I compared my pedestrian solution to a Advent of Code against a considerably shorter one that uses streams heavily. I liked some uses of the streams but not others.

On the Oracle https://dev.java site, Venkat Subramanian has a series of tutorials entitled Refactoring from the Imperative to the Functional Style. I find some of those refactorings compelling, others less so.

JEP 461 proposes an enhancement to the stream API. So-called gatherers support custom intermediate operations, similar to what collectors do for terminal operations. I use collectors to produce maps, but I find that they get complex quickly. The examples in the JEP seem pretty complex too.

In general, why the rush to streams? In situations, where the stream code is clear and to the point, I absolutely see the benefit. I also understand that with practice, some patterns that look strange today will look normal once they are commonly used. On the other hand, consider the selling point of virtual threads: to free programmers from the tyranny of the reactive style. What is bad about reactive? Chained callback methods are less flexible than the branches, loops, and exceptions that Java already has. And then, debugging. Stack traces are uninformative. Breakpoints are triggered in unintuitive orders.

The same is true for streams. It is much simpler to reason about programs that act sequentially.

In this post, I'd like to propose some simple rules when to use streams, and when to stick with traditional loops and branches.

Max and Min

Suppose you need to find the largest or smallest element of a collection, using some comparison order. The Java API gives you two choices:

T largest = Collections.max(collection, comparator);
T largest = collection.stream().max(comparator).get();

I used to reach for the first solution. Its behavior and performance are easy to understand.

However, there are some issues. What happens when the collection is empty? Does Collections.max return null? No, it throws a NoSuchElementException. In contrast, Stream.max returns an Optional<T>, which gives me more options to hande the exceptional case.

Also, what if the data are in an array? There is no Arrays.max, but streams work fine.

Stream.of(array).max(comparator)

Verdict: In 2024, use streams.

Sums

Advent of Code 2023 Problem 1 asks to add the number made from the first and last digit in each input line. For example, with the inputs

1abc2
pqr3stu8vwx
a1b2c3d4e5f

the sum is 12 + 38 + 15 = 65.

Here is a solution with a loop:

List<String> lines = Files.readAllLines(path);
int sum = 0;
Pattern p = Pattern.compile("^[^\\d]*(\\d).*(\\d)[^\\d]*$");
for (String line : lines) {
    Matcher m = p.matcher(line);
    if (m.matches()) {
        int firstLast = Integer.parseInt(m.group(1) + m.group(2));
        sum += firstLast;
    }
}

That's simple enough to understand. For each line that matches the regex, form the two-digit number and add it to sum. You have probably written many similar loops.

This loop can be neatly transformed into a stream computation:

int sum = lines.stream()
    .map(line -> p.matcher(line))
    .filter(m -> m.matches())
    .mapToInt(m -> Integer.parseInt(m.group(1) + m.group(2)))
    .sum();

I think that is a win. The stream code conveys an important message, namely that there is nothing complex going on. Note that after the initial map, we only use the Matcher objects. The line is no longer used (and it is no longer accessible).

Verdict: Use (Int|Long|Double)Stream.sum if you compute the sum of a property.

Method Expressions

One can make the stream computation even neater by using method expressions:

int sum = lines.stream()
    .map(p::matcher)
    .filter(Matcher::matches)
    .mapToInt(m -> Integer.parseInt(m.group(1) + m.group(2)))
    .sum();

I must confess that these always slow me down a bit. Here is a potentially useful table:

Variant Lambda expression Method expression
Class::staticMethod x -> Pattern.compile(x) Pattern::compile
Class::instanceMethod x -> x.matches() Matcher::matches
object::instanceMethod x -> p.matcher(x) p::matcher
Out of luck x -> x.substring(k)

One can argue whether method expressions are actually easier to read than the equivalent lambda expressions. However, they clearly signal that there is a single method call.

Verdict: Use them when you can.

For Each

Now actually, the Advent of Code problem has a twist. If the input line has a single digit, it becomes the first and last digit. For example,

treb7uchet

yields a summand of 77. My actual solution was:

Pattern p = Pattern.compile("^[^\\d]*(\\d).*(\\d)[^\\d]*$");
Pattern p2 = Pattern.compile("^[^\\d]*(\\d)[^\\d]*$");
for (String line : lines) {
    Matcher m = p.matcher(line);
    if (m.matches()) {
        int firstLast = Integer.parseInt(m.group(1) + m.group(2));
        sum += firstLast;
    }
    else {
        m = p2.matcher(line);
        if (m.matches()) {
            int firstLast = Integer.parseInt(m.group(1) + m.group(1));
            sum += firstLast;
        }
    }
}

Now the sum is incremented in two branches, and there is no obvious stream equivalent.

At this point, some programmers turn to forEach:

lines.forEach(line -> {
        Matcher m = p.matcher(line);
        if (m.matches()) {
            . . .
        }
        else {
            . . .
        }
    });

For the puzzle, this doesn't even work. You cannot have

sum += firstLast;

in a lambda expression. A lambda expression cannot access a non-final local variable from the enclosing scope.

I have seen programmers turn sum into a field, or an array of length 1, to overcome this restriction.

But why? There is no intrinsic advantage to forEach. Just use a loop. Or extract the if/else into a helper function and sum its results, as shown in the next section,

Complex Lambdas

Instead of using forEach, we can compute the contribution in a lambda expression, and then get the sum of all contributions:

int sum = lines.mapToInt(line -> {
        Matcher m = p.matcher(line);
        if (m.matches()) return Integer.parseInt(m.group(1) + m.group(2));
        else {
            m = p2.matcher(line);
            if (m.matches()) return Integer.parseInt(m.group(1) + m.group(1));
            else return 0;
        }
    }).sum();

That works, but it is pretty disruptive to understand both the stream pipeline and the complex code in the lambda expression.

Avoid such complex lambdas by refactoring into a method:

public static int firstLast(String line) {
    Matcher m = p.matcher(line);
    if (m.matches()) return Integer.parseInt(m.group(1) + m.group(2));
    else {
        m = p2.matcher(line);
        if (m.matches()) return Integer.parseInt(m.group(1) + m.group(1));
        else return 0;
    }
}

. . .

int sum = lines.mapToInt(line -> firstLast(line)).sum();

Finding, Counting, Collecting Matches

To find an element in a list, you can use the indexOf method. But only if you are searching for an exact match. If you want to find a value that matches some criterion, then you need a loop:

boolean found = false;
int i = 0; 
while (i < list.length() && !found) {
   if (list.get(i) is a match) found = true;
   i++;
}
if (found) ... 
  

Here, streams are a big win:

Optional<T> result = list.stream().filter(e -> e is a match).findFirst();

Counting and collecting all matches are also simple:

long count = list.stream().filter(e -> e is a match).count();
List<T> matches = list.stream().filter(e -> e is a match).toList();

Integer Ranges

What if you need the position of a match? You get it from indexOf or the loop of the preceding section. But not with the stream solutions.

The stream alternative is to make a stream of the index values:

int position = IntStream.range(0, lst.size())
  .filter(i -> list.get(i) is a match)
  .findFirst()
  .orElse(-1);

At first sight, this may look a bit odd because we are used to putting the elements, and not their positions, into the stream. But it is not really different from the for loop that my fingers can type on autopilot:

for (int i = 0; i < list.size(); i++) 

It is also easy to collect the positions of all matches:

int[] positions = IntStream.range(0, lst.size())
  .filter(i -> list.get(i) is a match)
  .toArray();

Unlike this tutorial, I would not rewrite every for loop as an IntStream. But if you compute one or more index values, then it makes sense to start with a stream of all of them.

Reduce

The primitive type streams have methods for computing sum and average, but what if you need to compute a product? In my day job, that doesn't often happen, but it's a thing in Advent of Code.

Consider Day 6. You are given a list of times and record distances for toy boat races, like this:

Time:      7  15   30
Distance:  9  40  200

When racing the boat, you get to choose a parameter x so that the distance traveled is x * (t - x), where t is an entry from the first row. Read the problem description for the back story.

You are asked to find in how many ways you can beat the record distance d (the corresponding entry from the second row.). That is, for how many integer x is x * (t - x) >= d?

And then you are supposed to multiply those wins.

To compute the number of wins for a given t and d, find the zeroes of the quadratic equation and count the integer points between them. You can do the middle school math yourself, or you can just trust that this method does it:

public static long wins(double t, double d) {
    double disc = Math.sqrt(t * t / 4 - d);
    long x1 = (long) Math.ceil(Math.nextUp(t / 2 - disc));
    long x2 = (long) Math.floor(Math.nextDown(t / 2 + disc));
    return x2 - x1 + 1;
}

I just wrote a loop:

long p = 1;
for (int i = 0; i < times.length; i++) {
   p *= wins(times[i], distances[i]);
}

But if I had the wins collected in an array, I could have written

long p = LongStream.of(numberOfWins).reduce((x, y) -> x * y).getAsLong();

Note that reduce returns an Optional since the stream might be empty. You can avoid that by providing a “neutral element”:

long p = LongStream.of(numberOfWins).reduce(1, (x, y) -> x * y);

There is actually no need to collect the wins. One can make a stream of them:

IntStream.range(0, times.length)
    .map(i -> times[i] * distances[i])
    .reduce(1, (x, y) -> x * y);

Is that clearer than the for loop? I guess it depends on how familiar the readers of your code are with the concept of reduction.

Verdict: Probably not worth it unless you already have the data in a stream.

Iterate

In this tutorial, you find the recommendation to rewrite complex loops with Stream.iterate:

for(int i = 0; i < 15; i = i + 3) {
  . . .
}

⇒

IntStream.iterate(0, i -> i < 15, i -> i + 3)
  .forEach(i -> { . . . });

In theory, you can rewrite every loop in this way. Consider for example Advent of Code 2023 Day 9.

Given an array of integers, you are asked to form adjacent differences until you get all zeroes. For example:

10  13  16  21  30  45
   3   3   5   9  15  
     0   2   4   6  
       2   2   2  
         0   0  

Then form the sum of the last elements of each row. In this case, 2 + 6 + 15 + 45 = 68.

As a historical aside, this is a method for evaluating values of polynomials. The polynomial generating the first row is 1/3 x3 - x2 + 11/3 x + 10, where x ranges from 0 to 5. The computation yields the value of the polynomial when x = 6. Charles Babbage made use of this algorithm in his Difference Engine.

My pedestrian solution:

static int nextValue(int[] values) {
   int nv = 0;
   int[] a = values;
   while (!allZero(a)) {
      nv += last(a);
      a = differences(a);
   }
   return nv;
}

Here allZero, last, and differences are helper methods that are hopefully obvious.

The loop transforms the state given by the variables nv and a. You can write that with an iterator:

record State(int nv, int[] a) {}

Stream<State> states = Stream.iterate(
   new State(0, values),
   state -> !allZero(state.a()),
   state -> new State(state.nv() + last(state.a()), differences(state.a())));

Then you need to get the last element from that stream, and extract its nv value:

int nv = states.reduce((x, y) -> y).map(State::nv).orElse(0);

Ok, it can be done, but surely it is not a happy path in general.

In this case, one can do a little better since the values that are added up to form nv can be computed from a. So, we can just iterate over the a, then obtain a stream of the summands, and sum them:

nv = Stream.iterate(
      a,
      Predicate.not(Day09::allZero),
      Day09::differences)
   .mapToInt(Day09::last)
   .sum();

This is more “functional” than the iterative solution, but is it better? It's nice that one sees the sum at the end. You have to decide whether that is a big enough win.

Verdict: Unless there is something awesome following it, avoid iterate.

Nested Loops

In my previous article, I looked at this snippet in Rémi Forax' solution:

var neighbors = starts.stream()
    .flatMap(start -> Stream.of(-length, 0, length)
       .flatMap(j -> IntStream.of(-1, 0, 1)
          .mapToObj(i -> start + i + j)));

This is the stream equivalent of nested loops:

for (int start : starts) {
   for (int i = -length; i <= length; i += length) {
      for (j = -1; j <= 1; j++) {
         neighbor = start + i + j;
         . . .
      }
   }
}

One gets better at flatMap ... flatMap map with practice. I think it is a win only if the stream operations that follow are compelling.

Downstream Collectors

In Advent of Code 2023 Day 7, you have to determine the type of a hand of five cards:

  1. All distinct ABCDE
  2. One pair AABCD
  3. Two pairs AABBC
  4. Three of a kind AAABC
  5. Full house AAABB
  6. Four of a kind AAAAB
  7. Five of a kind AAAAA

For example, the hand described by the string T55J5 is “three of a kind”.

I determine the frequency of each character, then sort the frequencies:

var freq = new HashMap<Character, Integer>();
for (int i = 0; i < cards.length(); i++) {
   char c = cards.charAt(i);
   freq.put(c, freq.getOrDefault(c, 0) + 1);
}
var freqValues = new ArrayList<Integer>(freq.values());
Collections.sort(freqValues);

For example, T55J5 yields a List(1, 1, 3).

Then I look up the type in a map:

private static Map<List<Integer>, Integer> TYPE = Map.of(
   List.of(1, 1, 1, 1, 1), 1,
   List.of(1, 1, 1, 2), 2, 
   List.of(1, 2, 2), 3, 
   List.of(1, 1, 3), 4,
   List.of(2, 3), 5, 
   List.of(1, 4), 6, 
   List.of(5), 7);

Could one produce that frequency map with streams? You make a map with the groupingBy collector. Provide a classifier function that yields a map key for every stream element, and a downstream collector that collects the values with the same key and yields the key's value.

In our case, the map keys are just the characters themselves, and the collector is counting:

Map<Integer, Long> freq = cards.chars()
   .boxed()
   .collect(groupingBy(Function.identity(), counting()));

Then you get the list as

List<Integer> freqValues = freq.values()
   .stream()
   .map(Long::intValue)
   .sorted()
   .toList();

That's all nice. You have to pay attention to the types, though. If you omit .boxed(), you get a baffling error message:

|  Error:
|  method collect in interface java.util.stream.IntStream cannot be applied to given types;
|    required: java.util.function.Supplier<R>,java.util.function.ObjIntConsumer<R>,java.util.function.BiConsumer<R,R>
|    found:    java.util.stream.Collector<java.lang.Object,capture#1 of ?,java.util.Map<java.lang.Object,java.lang.Long>>
|    reason: cannot infer type-variable(s) R
|      (actual and formal argument lists differ in length)
|  Map<Integer, Long> freq = cards.chars()
|                            ^------------...
|  Error:
|  incompatible types: cannot infer type-variable(s) R
|      (argument mismatch; no instance(s) of type variable(s) capture#2 of ?,T,K,A,D,T,capture#3 of ?,T exist so that java.util.stream.Collector<T,?,java.util.Map<K,D>> conforms to java.util.function.Supplier<R>)
|  Map<Integer, Long> freq = cards.chars()
|                            ^------------...

That's because Stream has two overloaded collect methods, one for a Collector and one for a “mutable reduction”. But IntStream only has the latter.

In this case, groupingBy works pretty well. But it can get complex quickly. If you find yourself with downstream collectors such as teeing or collectingAndThen, maybe it is time to step back?

As a cautionary example, let's look at Rémi's solution to day 9. In the problem, each hand also has an integer bid amount. We are supposed to sort the hands by type, breaking ties with lexicographic comparison of the card ranks (2...9, 10, Jack Queen, King, Ace). This is what I did:

List<Hand> hands = new ArrayList<>();
for (String line : Files.readAllLines(Path.of("input07"))) {
    String[] tokens = line.split(" ");
    hands.add(new Hand(tokens[0], Integer.parseInt(tokens[1])));
}
Collections.sort(hands, 
    Comparator.comparingInt(Hand::type)
    .thenComparing((h1, h2) -> Arrays.compare(h1.ranks(), h2.ranks())));

Here, record Hand(String cards, int bid) has methods type, returning the type from 1 to 7, as explained before, and ranks, which returns an int[] of the ranks of all cards.

I should have used streams:

List<Hand> hands = Files.lines(Path.of("input07"))
   .map(line -> line.split(" "))
   .map(tokens -> new Hand(tokens[0], Integer.parseInt(tokens[1])))
   .sorted(Comparator.comparingInt(Hand::type)
      .thenComparing((h1, h2) -> Arrays.compare(h1.ranks(), h2.ranks())))
   .toList();

Here is Rémi's solution, slightly renamed to match mine:

record Hand(long hash, int bid) {}

private static final Map<Integer, Integer> MAP =
   range(0, 13).boxed().collect(toMap(i -> (int) "AKQJT98765432".charAt(i), i -> 12 - i));

List<Hand> hands = Stream.iterate(new Scanner(input), Scanner::hasNext, s -> s)
   .map(scanner -> scanner.next().chars().mapToObj(MAP::get)
   .collect(teeing(toList(),
      collectingAndThen(groupingBy(v -> v,counting()),
         m -> m.values().stream().sorted(reverseOrder()).limit(2).toList()),
         (hand, counts) -> new Hand(((long) counts.hashCode()) << 32 | hand.hashCode(), scanner.nextInt()))))
   .sorted(comparingLong(Hand::hash))

It's a tour de force. If you have some free time on your hands, it is well worth figuring out why it works. But don't code like that at home. I am pretty sure that's exactly the point he is trying to make.

Exceptions

In Issue 314 of his JavaSpecialists newsletter, Heinz Kabutz writes about a publishing process that involves uploading stuff to ChatGPT. As one does in 2024. But there was also an issue with copying symlinks, and Heinz asked ChatGPT how to do that. As one does in 2024. He got this code back:

private static void copySymbolicLinks(Path source, Path target)
        throws IOException {
    Files.walkFileTree(source, new SimpleFileVisitor<>() {
        @Override
        public FileVisitResult visitFile(Path file,
                                         BasicFileAttributes attrs)
                throws IOException {
            if (Files.isSymbolicLink(file)) {
                Path targetLink = target.resolve(source.relativize(file));
                Files.createDirectories(targetLink.getParent());
                Files.copy(file, targetLink,
                        StandardCopyOption.REPLACE_EXISTING,
                        LinkOption.NOFOLLOW_LINKS);
            }
            return FileVisitResult.CONTINUE;
        }
    });
}

I was just updating some advice about modern file I/O, in which I recommended not to use the obviously complex walkFileTree. He should just use Files.walk, which yields a Stream<Path>.

He replied, like this?

private static void copySymbolicLinks(Path source, Path target) throws IOException {
    try (Stream<Path> walk = Files.walk(source)) {
         walk.forEach(file -> {
             if (Files.isSymbolicLink(file)) {
                 Path targetLink =target.resolve(source.relativize(file));
                 try {
                     Files.createDirectories(targetLink.getParent());
                     Files.copy(file, targetLink, StandardCopyOption.REPLACE_EXISTING, LinkOption.NOFOLLOW_LINKS);
                 } catch (IOException e) {
                     throw new UncheckedIOException(e);
                 }
             }
         });
    }
    catch (UncheckedIOException e) {
         throw e.getCause();
    }
}

Ugh—that's longer than the walkFileTree approach. Because of checked exceptions.

Streams really don't like checked exceptions. This blog by Nicolas Fränkel gives an overview of some remedies, none of which is a silver bullet.

Conclusion

I hope this somewhat rambling analysis convinces you that it is worth thinking about when streams are better and when they are not.

Comments powered by Talkyard.