Lessons from Advent Of Code 2023 - Part 1: Introduction

This year, I was inspired to participate in the Advent of Code puzzle event and doggedly worked every problem until I collected my fifty stars. I learned a lot about algorithms that I had previously only seen in dull theory. And I reflected on effective Java programming. This is the first of several posts where I try to relate my advent experience to everyday programming.

.png

The Advent of Code is an annual coding puzzle event, put together by Eric Wastl. Each year, the puzzles start out pretty simple and get fiendishly complex towards the end. Amazingly enough, the real pros can solve the problems in less time than it takes most of us to read them. This is the first time I participated, and it took me until 4 days after christmas to turn in my last solution.

Most of the solutions posted at the subreddit use Python. Many participants used the event to become more fluent in a new language, which seems like a good idea. Or not—there was a fair amount of grumbling about fighting the Rust borrow checker. And of course, a few showed off mad skillz, using sed, SQL, Befunge or something similarly challenging.

Obviously, I am not competing for first place in speed coding or mad skillz. Instead, I wanted to find out what I could learn and teach, from the point of view of a “blue collar” programmer. I will summarize what I learned in this and the following posts.

Since Java was famously designed to be a “blue collar language”, I figured I should use Java.

Overall, I was very happy with that choice. I liked having an IDE with good code completion and a rock-solid debugger. The core API has what you need. There were a few rough edges, which I will summarize in a future post.

To Stream or Not to Stream

Rémi Forax posted solutions for the first part of every day, using Java streams as much as possible. That's impressive, but is it a good idea for the blue collar programmer?

Let's have a look at the Day 3 puzzle. You are given numbers and symbols in a grid of characters, such as

467..114..
...*......
..35..633.
......#...
617*......
.....+.58.
..592.....
......755.
...$.*....
.664.598..

The backstory says that all numbers that are adjacent (even diagonally) to a symbol other than the background . are “part numbers”. In the example above, the top * symbol has adjacent part numbers 467 and 35. Your task is to find all part numbers and, as as evidence, submit their sum.

Here is Rémi's solution:

private static final Pattern PATTERN = Pattern.compile("([0-9]+)|([*#+*$])");

static int sumOfParts(String input) {
    var length = 1 + input.indexOf('\n');
    var map = PATTERN.matcher(input).results()
        .collect(partitioningBy(result -> result.start(1) != -1,
            toMap(MatchResult::start, MatchResult::group, (_1, _2) -> null, TreeMap::new)));
    return map.get(false).keySet().stream()
        .flatMap(start -> Stream.of(-length, 0, length)
            .flatMap(j -> IntStream.of(-1, 0, 1).mapToObj(i -> start + i + j)))
        .flatMap(start -> Stream.ofNullable(map.get(true).floorEntry(start))
            .filter(e -> start < e.getKey() + e.getValue().length()))
        .collect(collectingAndThen(toSet(), set -> set.stream().mapToInt(e -> parseInt(e.getValue())).sum()));
}

I was impressed. Mine was a lot longer and more pedestrian:

private static String FILENAME = "input3s";

record Location(int row, int col) {}

record Number(String v, Location loc) {        
    boolean isNeighbor(Location otherLoc) {
        return Math.abs(otherLoc.row() - loc.row()) <= 1 && 
        otherLoc.col() >= loc.col() - 1 && otherLoc.col() <= loc.col() + v.length();        
    }
    int value() { return Integer.parseInt(v); }
}
    
public static void main(String[] args) throws IOException {
    List<String> lines = Files.readAllLines(Path.of(FILENAME));
    Pattern PATTERN = Pattern.compile("(\\d+)|[^.\\d]");
    var numbers = new ArrayList<Number>();
    var symbolLocations = new HashSet<Location>();
        
    for (int i = 0; i < lines.size(); i++) {
        Matcher matcher = PATTERN.matcher(lines.get(i));
        while (matcher.find()) {
            var loc = new Location(i, matcher.start());
            if (matcher.start(1) != -1) 
                numbers.add(new Number(matcher.group(), loc));
            else 
                symbolLocations.add(loc);
        }
    }
    int sum = 0;
    for (Number n : numbers) {
        for (Location loc : symbolLocations) {
            if (n.isNeighbor(loc)) {
                sum += n.value();
                break;
            }
        }
    }
    System.out.println(sum);
}

Which is better? Of course, that depends on what you mean by “better”. In the context of a coding competition, brevity is good. But I am more interested in lessons for everyday programming, where code must be comprehensible and maintainable.

Patterns

.jpg

I teach my students to recognize common loop patterns, such as finding and counting matches, computing sums and max/min, inserting separators, and so on. I want them to know at a glance that this loop computes the sum of the matches:

int sum = 0;
for (var e : elements) {
   if (some condition)
      sum += some value derived from e;
}

If I was teaching the stream API, I would instead make them recognize:

int sum = elements.stream()
    .filter(e -> some condition)
    .mapToInt(e -> derive a value)
    .sum();

Here the stream variant is very appealing. It clearly says what is done. That's the “what, not how” principle at its finest.

But what if we aren't in that “happy day” scenario? With loops, you have a lot of flexibility, as you saw from my rather brutish nested loop:

for (Number n : numbers) {
    for (Location loc : symbolLocations) {
        if (n.isNeighbor(loc)) {
            sum += n.value();
            break;
        }
    }
}

Can we express this with streams? Yes, and it's actually nicer:

for (Number n : numbers)
    if (symbolLocations.stream().anyMatch(n::isNeighbor))
        sum += n.value();

Now all together:

int sum = numbers.stream()
    .filter(n -> symbolLocations.stream().anyMatch(n::isNeighbor))
    .mapToInt(Number::value)
    .sum();

This is where my eyes glaze over. From years of experience, I can glance at nested for and if statements and method calls and intuitively grasp the structure. When I see nested streams and method expressions, I have to carefully look at every part. Like, why is it n::isNeighbor but Number::value? Maybe it's time for me to get better at stream patterns.

Branches

Consider Rémi's code snippet:

var map = PATTERN.matcher(input).results()
    .collect(partitioningBy(result -> result.start(1) != -1,
        toMap(MatchResult::start, MatchResult::group, (_1, _2) -> null, TreeMap::new)));

I think PATTERN.matcher(input).results() is nicer than the explicit loop

Matcher matcher = PATTERN.matcher(lines.get(i));
while (matcher.find()) {
    . . .
}

I always found that stateful Matcher weird.

.jpg

But now we need to process the matches in two different ways, depending whether they match the first or second group in the regular expression. Rémi clevery avoids if branches by using partitioningBy, which stores the results in a Map<Boolean, ...>.

I don't love that as a general strategy. What if we want to collect them in different collections, like the List<Number> and Set<Location> in my solution?

You can solve that with an if inside a forEach:

PATTERN.matcher(input).results().forEach(result -> {
    if (result.start(1) != -1) {
        numbers.add(. . .);
    } else {
        symbolLocations.add(. . .);
    }
});

That's less pure than a solution without side effects, but ok. Life isn't always pure.

Unfortunately, there is a technical disadvantage. I looped over the input lines and parsed each of them separately. Then I don't have access to the row index in the lambda:

for (int i = 0; i < lines.size(); i++) {
    PATTERN.matcher(input).results().forEach(result -> {
        var loc = new Location(i, matcher.start()); // Error--can't access non-final i
        . . .
    });
}

That needs to be finessed with a slightly icky workaround:

for (int i = 0; i < lines.size(); i++) {
    int row = i; 
    PATTERN.matcher(input).results().forEach(result -> {
        var loc = new Location(row, matcher.start()); // Ok--can access implicitly final row
        . . .
    });
}

Rémi avoids this problem by parsing the entire input string and using an integer offset from the start, which must have the form row * length + col, since all lines have the same length. That is clever, but it only works when all rows have the same length.

My point is that I sometimes start out with a stream, and then find that it is not going so well. Then I have to soldier on with something less than optimal, or undo everything and write a loop after all.

Nested Loops

.jpg

The second part of Rémi's solution is a bit of a puzzler:

return map.get(false).keySet().stream()
    .flatMap(start -> Stream.of(-length, 0, length)
        .flatMap(j -> IntStream.of(-1, 0, 1).mapToObj(i -> start + i + j)))
    .flatMap(start -> Stream.ofNullable(map.get(true).floorEntry(start))
        .filter(e -> start < e.getKey() + e.getValue().length()))
    .collect(collectingAndThen(toSet(), set -> set.stream().mapToInt(e -> parseInt(e.getValue())).sum()));

You can think of flatMap followed by map as a pattern that is equivalent to nested loops. Consider

Stream.of(-length, 0, length)
    .flatMap(j -> IntStream.of(-1, 0, 1).mapToObj(i -> start + i + j))

This has the same effect as the more familiar

for (int i = -length; i <= length; i += length)
   for (j = -1; j <= 1; j++)
      yield start + i + j

For example, if length is 10 and start is 24, the lambda expression

j -> IntStream.of(-1, 0, 1).mapToObj(i -> start + i + j)
yields a miniature stream for each of the three invocations:

13 14 15
23 24 25
33 34 35

The flatMap is necessary because streams are one-dimensional. The mini streams are flattened into a single stream, containing the neighbor index values

13 14 15 23 24 25 33 34 35

.jpg

Having seen this pattern before, I was able to trace my way through to the last flatMap, which yields a stream of Map.Entry<Integer, String> whose keys are locations and whose values are numbers that neighbor one of the symbol locations (which are stored in map.get(false)):

0→467, 24→35, 24→35, 28→633, 28→633, 44→617, 68→592, 100→664, 100→664, 83→755, 104→598, 104→598

After getting this far, I was baffled by the call to collect. Why not use the shorter and clearer

    .distinct().mapToInt(e -> parseInt(e.getValue())).sum()

All in all, I think of this as “dancing bear” code. The wonderful thing about a dancing bear is not how well it dances, but that it dances at all.

Conclusion

I hope this somewhat rambling analysis convinces you that it is worth thinking about when streams are better and when they are not. I'll write more about that in the next post.

I also want to write more about records. I came to love records during Advent of Code. It is so much nicer to have a record Location(int row, int col) than having to deal with an array of length 2. Or a single int that holds both the row and column. It is also nice to have a convenient place for utility methods, such as the isNeighbor method in my Number record.

Admittedly, that Number record is somewhat fake because I only use those utility methods, and never the components. Is that bad? In another post, I will summarize what I learned about record design.

Also note that neither Rémi's nor my solution store the grid of characters. In fact, many Advent of Code puzzles have nasty turns in their second parts, where the grid becomes humongous or even infinite, and a char[][] array is not practical. Now, as it happens, if there is anyone who should know something about grids, it's me. Many years ago, I implemented a framework for the “GridWorld” AP Computer Science case study that was used for several years in the AP CS exam. For the benefit of future advent coders, I'll write a post with grid tips, and another with tips for graph algorithms.

Comments powered by Talkyard.