Having learned about DeepSeek, the new kid on the AI chatbot block, I tasked it with writing a solver for an annoying puzzle. I was surprised how well it did, quite a bit better than the current state of the art. Check out the code that it produced as I kept challenging it. Reading and evaluating code, rather than writing it, is what we will all need to get better at.
The code snippets are executable, so you can experiment with them. I wish the chatbots had that feature.
During the holidays, I tried to solve the annoying “Crazy Cheese” puzzle. It has nine tiles, and four sizes of holes, each with a small and a large part. I didn't get very far, put it aside, and planned to write a program to solve it, if and when I could find the time.
How many possible arrangements are there? Each tile can be flipped and rotated. That's 9! × 89 arrangements. Except, some of those are flips and rotations of each other. 9! × 88 = 6088116142080 arrangements.
Fortunately, a backtracking solver only examines partial matches, so it won't try them all. See these slides. (Hit the space bar or swipe left to move forward.) It should not be much work to change the eight-queens solver from those slides.
Frankly, I might never have have gotten around to writing that program, but I got an email extolling the virtues of a new AI bot, DeepSeek. So I tried it. You can follow the chat here.
I didn't want to have the explain the holes, so I used colors.
“I need a Java program that can solve the following puzzle. Nine squares, each with four sides of different colors. Put them together to a 3x3 square so that the colors match.”
The bot produced the program below, together with a long and tedious explanation. Scroll to the end and click the Run button to see the output. (This has nothing to do with DeepSeek—it uses CodeCheck to run the generated program.)
Running the program yields an unimpressive
S S S S S S S S S
I'll tackle that later.
The crazy cheese puzzle actually comes with one solution. I could of course just copy that and declare victory, but that's lame. Are there more solutions? How many?
“Can you make it print all solutions?”
Here the response:
This program prints 20,000 solutions, which seems a lot, My first thought was that the solutions (still unhelpfully displayed as arrays of nine S
characters) weren't really unique. If you rotate any solution, you get another one. Looking at the program logic, it is clear that there is no effort to filter those out.
“Ok, but now it prints rotated solutions. Make it so that only solutions that are unique up to rotation are displayed.”
I immediately got an improved program. But it is pretty complex.
And it is incredibly slow. If you run it with CodeCheck, it will time out after 50 or so iterations. I don't even want to find out why.
With a 3×3 puzzle, it is dumb to generate all solutions and then eliminate duplicates. Just fix the center tile!
This isn't the kind of insight that I expect from an AI code generator, but let's prompt for it.
“That's a lot of trouble. Wouldn't it be simpler if you made the MIDDLE piece unrotated?”
The response: “Yes, you're absolutely right! Fixing the middle piece in an unrotated position is a much simpler and more efficient way to ensure that solutions are unique up to rotation. By anchoring the middle piece, we eliminate the need to generate and compare all rotations of the grid, significantly simplifying the logic. Here’s how we can implement this approach:”
This is where I started to be impressed with DeepSeek. When I tried the same with ChatGPT, I got a brutish and incompetent attempt.
Have a look how the “fixing the middle piece” was implemented. Maybe not the most elegant ever, but correct. Unlike ChatGPT. Is this just regurgitating a solution from the corpus of puzzle problems? Is that where the “anchoring” language came from? Or is there some other spark of virtual insight? We'll never know.
The printout is dumb. I need to see the actual squares, not just S S S ....
The model takes criticiscm in stride. “You're absolutely right! Printing just S for each square doesn't provide any useful information about the actual arrangement of the squares and their colors. Let's update the program to print the actual squares with their colors in a readable format.”
The output was in a format that only a mother could love.
Solution 1: Red Green Red Yellow BlueBlue YellowYellow Blue Green Red Green ----------------------------- Green Red Green Red BlueYellow BlueBlue Red Yellow Green Yellow ----------------------------- Yellow Green Yellow Red BlueBlue RedRed Blue Green Yellow Green -----------------------------
“That looks weird. I'd rather see each square formatted asG B R Yusing only the first letter of each color”
“Got it! Let's simplify the output to show each square in a compact format, using only the first letter of each color. This format is much cleaner and easier to read. Here's the updated Java program:”
Yeah, I can finally see the solutions. Here is the first one:
Solution 1: R G R Y BB YY B G R G ----------------------------- G R G R BY BB R Y G Y ----------------------------- Y G Y R BB RR B G Y G
Have a look at the printSolution
method in the program code. It's not rocket science, but it perfectly implements the tedium of showing the squares.
If you look at the preceding output, you'll see that the matching doesn't work at the middle tile. The code still has the classic matching algorithm, where you start at the top left and check for matches at the top and left edge for added tiles.
“But now just checking the top and left matches is no longer sufficient. They also must match the previously placed middle.”
The bot concedes the point: “You're absolutely correct! If the middle square is fixed, then when placing squares around it, we must ensure that they match not only their top and left neighbors but also the middle square if they are adjacent to it.”
The next iteration of the program fixes it. Not super elegant but serviceable.
When you run this program, it will not find a solution. Hold that thought.
I introduced those matching colors because I didn't think I was going to get anywhere with describing holes in cheese. But those holes come in two sizes.
Time to break it to the bot:
“Ok, it turns out my problem is a little more complicated than that. Some of the color names start with an uppercase letter (e.g. Yellow) and some with a lowercase letter (e.g. yellow). To match, one of them must be uppercase and the other lowercase. That is, yellow matches Yellow but not yellow.”
And:
“Oh, one more thing. You can also flip the tiles.”
This is where I appreciated the chattiness of the response. The bot replied:
Got it! Now the tiles can also be flipped (mirrored), in addition to being rotated. This means each tile has 8 possible orientations: Original orientation Rotated 90° Rotated 180° Rotated 270° Flipped (mirrored) Flipped and rotated 90° Flipped and rotated 180° Flipped and rotated 270°
Attaboy. And the resulting program looked sound.
Except, it could not find any solutions.
The program runs correctly. Try it with the crazy cheeze puzzle tiles:
new Square("s", "S", "M", "x"), new Square("s", "m", "L", "X"), new Square("s", "M", "L", "x"), new Square("s", "M", "L", "x"), new Square("s", "M", "L", "x"), new Square("S", "m", "x", "X"), new Square("S", "l", "m", "X"), new Square("S", "l", "m", "X"), new Square("S", "x", "m", "X")
You get 216 solutions. Here is one:
Solution 1: L m X M xX lL s s S m ----------------------------- S s M x Xx Ss L m M x ----------------------------- M m X s Ll Xx S x S m
The first solution was different from the one that the puzzle provided, so I was able to show off my achievement to my (sadly disinterested) family.
There are actually only 18 = 216 / (2! × 3!) distinct solutions, since one square occurs twice and another three times.
“When I run that program, it prints "No solutions". Can you choose a set of nine squares so that there is at least one solution?”
Following established bot etiquette, the bot cheerfully admits its mistake and promises to do better: “You're absolutely right! The program might not find a solution if the provided set of squares doesn't have a valid arrangement. Let's define a new set of nine squares that are guaranteed to have at least one valid solution. I'll carefully construct the squares so that their colors match according to the rules (case-sensitive matching and flipping allowed).”
Sadly, the carefully constructed squares still have no solution.
As so often with an AI idiot savant, it is baffling to see how it can do complex tasks extremely well, while struggling with trivial ones.
I was very surprised how well DeepSeek handled my change requests. This is where my experience with other chatbots fell flat. After a few rounds, I always found it to be less trouble to take the code and fix it myself. I know that some people have the patience to coax the bot to a finished solution, but I never felt it was worth the effort.
I guess that hasn't changed. Perhaps I could have kept going, suggesting how to produce a viable sample, dealing with duplicate tiles, and cleaning up the code.
I wasn't too hopeful about asking for a rewrite in modern Java, since I assume that it was trained on code from the last thirty years. Instead, I did that myself. (For this demo, I had to add public static void main
back in since CodeCheck is at Java 21 without preview features.)
Sure, that is nicer. And shorter. And it works. Having the AI generated code made it much faster to get there.
In the end, I did something that I rarely do. I bookmarked the DeepSeek page.
PS. As I post this, the stock market has caught onto the fact that DeepSeek is potentially disruptive.
With a Mastodon account (or any account on the fediverse), please visit this link to add a comment.
Thanks to Carl Schwann for the code for loading the comments.
Not on the fediverse yet? Comment below with Talkyard.