Two different approaches to debugging a software problem:
The Sudoku approach: stare at the limited set of clues you have, and think harder and harder about them until you find a way to deduce something useful.
The Minesweeper approach: don't even try to figure out the solution from only the clues you have right now. Instead, focus on finding a way to acquire another clue, and then using that to get another, and so on. Eventually you've collected so many clues that the answer is obvious.
Sometimes the Sudoku approach is necessary, because you've got all the clues you're ever going to get. But I think my new motto is "Never Sudoku a problem when you can Minesweeper it."
@Scmbradley true, you have to interpret "can" with a certain amount of pragmatism. I agree that there are situations in which Minesweepering is technically possible but prohibitively expensive.
In this situation I generally try extra hard to get the client to provide a reproducible example. Sometimes they still can't, or won't, but sometimes they just haven't thought of trying, so in my experience it's worth a shot.
And there is the problem... It is almost impossible to get the "client" to give you the answers you need and when they do, it is often like pulling teeth to get the second question answered.
Often in that situation, I will start with Sudoku until the client gives me enough that I can re-create their failure myself, and then switch to Minesweeper to do the actual troubleshooting.
(and also @artemis who made a similar comment in parallel): true, of course, but I'm thinking in particular of the part where even deducing _one_ extra number to fill in can require arbitrarily complicated thinking about the clues you already have.
Also, in Sudoku, the "extra information" was already contained within the existing information – the clues you fill in are logical consequences of what you already had. In Minesweeper you have to get the extra information from outside yourself. That makes them philosophically different.
(Indeed, in the Sudoku style of debugging, the same phenomenon can happen!)
Hmm. I would have said this describes solving a Sudoku: "finding a way to acquire another clue, and then using that to get another, and so on." You identify how to acquire one piece of information, one clue. As you add pieces of information, you can eventually solve the puzzle. You don't stare at it until you solve it...you figure out how to use the information you have to get another piece of information.
I guess I think Minesweeper & Sudoku are solved in a very similar manner?
When things don't Sudoku or Minesweeper, I'll start at the top and start tracing values through until I see something completely wrong. But if you are debugging AI-written concurrent code or event-driven code or some unholy combination of distributed technologies across several computers, then 🤷?
The other big trick I learned that can help a lot when nothing is making sense is to ask myself what stupid thing I've done. Seriously, hunting for the stupid thing I did is usually far more productive than hunting for some obscure compiler or library or OS bug. ... just please don't be the one to remind me of this, it's no fun when people imply I do stupid things!
What computers are best at is doing slightly different things over and over very, very fast.
Both games reveal more given digits as you work on the problem. I get it: There's a qualitative difference between the puzzle revealing additional facts as a reward for correct deductions, and the solver filling in additional clues for themself. But that doesn't really impact the approach to solving all that much, so I don't find this metaphor particularly meaningful or revealing.
Btw., look up “fog of war Sudoku” for a fun combination of both puzzle types 😺
https://blog.plover.com/prog/katara-advice.html
“Novice programmers often imagine that they can figure out what is wrong from looking at the final output and intuiting the solution Sherlock Holmes style. This is mistaken. Nobody can do this. Debugging is an engineering discipline: You come up with a hypothesis, then test the hypothesis. Then you do it again.”
I shouldn't have said "nobody", maybe you can do it. But as you said, it's not something to count on.
And novice programmers watching experienced programmers debug often *think* that is what is happening because they don't yet understand the process and because it goes by too fast to understand.
The Universe of Discourse: Advice to a novice programmer (from the highly eclectic blog of Mark Dominus)
Although your description doesn't seem to match my "finger of blame" model. Perhaps there are real differences.
I'm a blamey person. I don't pretend it's a good quality, but we all have to work with what we have.
@mjd oh, I think both of the approaches I describe still end up pointing the finger of blame at a specific piece of code, it's only a question of how you get there.
I think the most common way that my debugging problems are more multidimensional than the kind you describe there (binary search through the data flow of the failing example until you find the point where correct data goes in and incorrect data comes out) is that it's not always obvious what _is_ correct data, because the data being manipulated is huge, or its correctness depends on a complicated spec that I don't keep all of in my head. (The intermediate representation inside a compiler is a good example.) So the question you have to answer at each step of the search is also nontrivial.
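For what it's worth, here is a minimal Python sketch of that binary search through the data flow. Everything in it is made up for illustration: the `stages` list, the deliberately buggy third stage, and especially `looks_correct`, which stands in for exactly the nontrivial "is this intermediate data correct?" question described above. It also assumes that once the data goes bad it stays bad, which real pipelines don't always guarantee.

```python
from typing import Any, Callable, Sequence


def run_prefix(stages: Sequence[Callable[[Any], Any]], data: Any, n: int) -> Any:
    """Apply only the first n stages to the input and return the result."""
    for stage in stages[:n]:
        data = stage(data)
    return data


def first_bad_stage(
    stages: Sequence[Callable[[Any], Any]],
    data: Any,
    looks_correct: Callable[[Any], bool],
) -> int:
    """Return the index of the stage where correct data goes in and incorrect data comes out.

    Assumes the raw input is correct, the final output is incorrect,
    and correctness never recovers once it has been lost.
    """
    good, bad = 0, len(stages)  # number of stages applied: known good / known bad
    while bad - good > 1:
        mid = (good + bad) // 2
        if looks_correct(run_prefix(stages, data, mid)):
            good = mid
        else:
            bad = mid
    return bad - 1


# Hypothetical four-stage pipeline with a deliberate bug in stage index 2.
stages = [
    lambda x: x + 1,
    lambda x: x * 2,
    lambda x: x - 100,  # the bug: values should never go negative
    lambda x: x + 3,
]


def looks_correct(value: int) -> bool:
    # Stand-in for a real (and usually much harder) correctness check.
    return value >= 0


print(first_bad_stage(stages, 1, looks_correct))  # prints 2
```

The search itself is the easy part; in the compiler case all the effort goes into writing something you'd trust as `looks_correct`.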
@mjd having slept on this, the mention of Sherlock Holmes is an interesting one, because in the Holmes stories, I think there's actually a pretty good mix of the Sudoku and Minesweeper styles.
In some stories, Holmes has to rely entirely on a statement given by a client, and deduces the answer entirely in his head from the information he's given. But in others, he goes out to the crime scene and minutely examines it for forensic evidence, or investigates in other ways, up to and including tricking the bug^Wculprit into betraying themself.
“Why do you not solve it yourself, Mycroft? You can see as far as I.”
“Possibly, Sherlock. But it is a question of getting details. Give me your details, and from an armchair I will return you an excellent expert opinion. But to run here and run there, to cross-question railway guards, and lie on my face with a lens to my eye—it is not my métier. No, you are the one man who can clear the matter up.”
@rich ha! That reminds me of a totally different 'two approaches to debugging' incident I remember from university.
I was sitting in a friend's room while he wrote a program. He'd written a large amount of code before testing any of it – more than I would have been inclined to do myself. So when he started trying to run it, I observed:
"You know, when I debug, it's like hunting. Lay tempting bait, sit very still, try to entice the elusive bug to show itself, then bring it down with a single shot. But you're spurring your horse into a whole army of bugs, laying about you left and right with a broadsword, bringing several down in every stroke, but there are always more."
(He did get the thing to work, though, as I remember! Just a difference of preference in what order we liked to do things.)
@essjayjay I agree that when you feel as if you've solved a problem in a flash of intuition, there might have been more logic in your subconscious's approach than your conscious mind realised.
But I include in the Sudoku approach the _conscious_ use of logic to reason about the clues you already have and narrow down possibilities. It's still the Sudoku approach if you're perfectly well aware of all the logic you're using!
@djm62 that would be a great way to show that a character in film or TV was superintelligent.
I've never tried it with even a _small_ Sudoku (like the kind Solo calls 2x3), but I would assume it's way too hard.
On the other hand, there's a puzzle type found in some British newspapers (whose name I've forgotten) which involves arranging the numbers 1–9 in a 3×3 grid given some clues about the sum of digits in certain subsets of the grid. I used to find those lying around in the office kitchen, and I could (just about) do one entirely in my head, which avoided spoiling it for the next person who came along.
Oh yeah, you were assuming best practices; carry on.
("Everyone has a development instance. Some teams also have the luxury of a production instance, separate from the development instance. Do not trust environment names to tell you which you are operating on." 😉 )
You sound like a person who would have had Minesweeper blow up in your face quite often. 😆
I've always seen Minesweeper as primarily a logic game.
But sometimes devolving into having some probability-driven moves -- as in, when logic is insufficient, which move is least risky?
And sometimes, relatively rarely, just a gamble -- when you have N moves, each equally risky.
Logic.
Admittedly, your first move is done blind.
Probability wise, I find that hitting a corner first is good. Sometimes that "opens up the board." And sometimes you just have to make a probabilistic move for your move #2. But when you get past that, it's mostly logic.
I just did a game on the site you just mentioned. After the first move, I used logic to mark the known flags and known safe spaces to click. [See attached images.]
"Easy Peasy" 💕
This sounds good.
But I like to make hypotheses which I then try to disprove. Often a moment of thought can outweigh a flurry of action, and instead guide fewer, more focused actions. Like Hercule Poirot.
I try to disprove the easy-to-disprove ones first. So "have I actually saved the file" is out of the way.
I also try to make it so I know the potential causes are limited; write a bit of code and tests in small steps, constrain with types.
Another thing: the Sudoku solution is great for really common problems that we've all seen before, and it gives a certain degree of satisfaction to solve the problem by simple inspection (plus decades of experience).
But, it's an absolutely *terrible* example to people starting out in development. It looks like you just guess the solution and get it right. This leads to (e.g.) students approaching problems with random hacking.
1/n
So you end up with students doing things like throwing a * in here, and a & there, and seeing if the compiler accepts it.
When working with beginners, you need to grind through the Minesweeper steps for even the simplest things, so that they can see the process, and the effort that it takes to understand the code. Even if it's perfectly obvious to you that they missed out a & on line 174.
2/2
And I think this speaks directly to this comment up-thread:
I find it interesting that in mathematics the two approaches, Sudoku and Minesweeper, are really the same: in math the way you gather further clues is by thinking and deducing something useful!
In debugging a program the two approaches are different because there are non-thinking-and-deducing ways of running experiments to obtain clues: reading log files or executing the code, possibly with minor modifications or under control of a debugger.
The same thing applies to hardware debug.
Except that sometimes minesweeper debugging involves probes that cost >$20K or frying another expensive prototype PCB to see how it behaves if you twiddle some variable.
I feel like you're describing something similar to using experience vs method to debug. Experience tells you that this clue or that is likely to mean a particular thing is causing the problem. Being methodical means investigating exactly what's happening, using a debugger/adding print statements, reading through code paths etc. until you know exactly what the problem must be.
For me, the skill is in knowing when to switch from experience to method. Trade off speed/reliability.
yes, that's part of what I was hoping to say – not just giving snappy names to the two styles, but flagging up that sometimes you can choose which one to do.
As I said in the root toot, sometimes you _have_ to Sudoku, because no more clues are ever going to come your way. But also, sometimes you have to Minesweeper, because you _clearly_ don't have enough information right now to narrow down to a single cause of failure. The interesting case is where you have both options.
In that situation I'm not a fan of "just think harder". If a solution comes to mind very quickly, great, you've saved some effort; if it doesn't, go get more information. Maybe an 'aha' will come from your subconscious anyway, because it was pondering in the background while you were working on setting up the next test run. If so, great – but if not, you haven't wasted the time, because that next test run is set up now and will give you another clue. Whereas staring at the screen and thinking harder and harder is an all-or-nothing approach: if it _doesn't_ yield an aha, you've wasted the time completely, and still haven't started on the alternative approach.
You forgot the "workaround" approach, when you know the symptoms but not the source of the problem and you just invent a loose patch that is likely to fall off or have side effects.
But I don't know which game is the right allegory.
@deusfigendi both the approaches I mentioned are ways to get a true understanding of the bug. The approach you describe is abandoning that goal completely and avoiding the whole problem.
So perhaps that quotation from 'WarGames' is relevant, about both tic-tac-toe and nuclear war: "The only winning move is not to play."
@ask I don't think you're contradicting me! The thing I characterised as the Minesweeper approach to debugging involves _not_ stopping to think hard.
(But also, it does depend on the Minesweeper variant. My own version, which eliminates guesswork by ensuring the board can be solved by logic, also lets you turn up the mine count so that there are 4,5,6 clues everywhere, since the guaranteed solubility makes it not impractical. In _that_ mode, thinking becomes more significant compared to clicking.)
Interesting. I think about this a lot. I do a version of reading core dumps: using whatever knobs, tools, or features you're writing that induce the problem to make it repeatable, then bench reading code, mentally stepping through logic, and determining how that (unwanted) outcome is produced.
But in that is some of your mine sweeper: poking it from various ways to find what causes the failure, or similar things that cause the one failure, or things that caused a similar but different failure, more clues.
Making a failure repeatable is key. Then you can usually find it, from the difference between expectation and reality: it's right there in your code.
Heisenbugs are the worst! And "wtf causes this failure!" problems.
@a32 my type example is where you first become aware of a bug because some kind of automated test run fails, and presents you with a log file containing a cryptic error message.
The Sudoku approach to this involves trying to figure out what the error message might mean, e.g. by grepping the source code for the message string and backtracking from there to see where the failing function might have been called from. Sometimes there's more than one possibility: a characteristic of the Sudoku approach is that you try to _guess_ which one seems more likely in this case.
The Minesweeper approach starts by reproducing the same failure on your own machine; stripping off all the layers of test-runner script and makefile until you have a single shell command that runs the actual process that crashes; then running that command again and again in ways that give you more information, such as
• in a debugger
• under strace
• with extra verbose or diagnostic options
• with modified versions of the input file that provoked the failure
A Minesweeper-oriented developer wouldn't bother trying to _guess_ which part of the program had called the failing function. They'd put a breakpoint on it, and get the debugger to _tell_ them.
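As a rough illustration of getting the program to _tell_ you who called the failing function, here is a small Python sketch. The function names (`failing_function`, `path_a`, `path_b`) and the error message are invented for the example; in a real session you would more likely set a breakpoint in a debugger and ask for a backtrace, but the standard `traceback` module gives the same kind of answer from inside the process.

```python
import traceback


def callers_of_current_function() -> list[str]:
    """Return the names of the frames that led to the function calling this helper."""
    # extract_stack() lists frames oldest first. The last entry is this helper
    # and the one before it is the function under suspicion, so drop both.
    return [frame.name for frame in traceback.extract_stack()[:-2]]


def failing_function(log: list) -> None:
    # Hypothetical failing function: record who called us, then fail.
    log.append(callers_of_current_function())
    raise ValueError("cryptic error 42")


def path_a(log: list) -> None:
    failing_function(log)


def path_b(log: list) -> None:
    failing_function(log)


call_log: list = []
try:
    path_b(call_log)
except ValueError:
    pass

# The innermost recorded caller is the answer, with no guessing involved.
print(call_log[0][-1])
```

Reading the source might leave you weighing up `path_a` against `path_b`; one run of this settles it.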
@a32 I think there's also a minesweeper approach where you comment out a bunch of code to see if it actually fails in a different way. Note that this may not involve stripping off the layers to find a clue.
Many years ago, I was impressed by a call put out at Google for what a particular flag was intended to do. At that time, Google binaries (and all incorporated libraries) were configured by flags, and a typical Google binary might have a few thousand flags. This flag was apparently implicated in crashing a cluster of a few hundred machines running a machine learning training job, so fixing it was high priority.
How did the requester know this flag (which was a common low-level flag built into almost every binary) was to blame, if they didn't understand the code?
The cluster would crash after 10 minutes. The engineer listed all the flags with default values of 600, 600_000, etc. (that is, 10 minutes expressed in seconds, milliseconds, and so on). This reduced the candidate controls to around 20. They then set each of these flags to a distinct value between 5m and 25m, and waited to see when the cluster crashed. 11m later, they had a very likely candidate to understand. (I never found out the actual fix, but the debugging technique made an impression...)
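A sketch of how that trick might look in Python, under loud assumptions: the flag names are placeholders, all values are taken to be in seconds (the real flags evidently used mixed units), and I've used 21 candidates rather than "around 20" purely so that the spacing comes out as a round 60 seconds. Nothing here reflects the actual tooling involved.

```python
def assign_probe_values(flags: list[str], start_s: int = 300, stop_s: int = 1500) -> dict[str, int]:
    """Give each candidate flag a distinct timeout, spread evenly from start_s to stop_s."""
    step = (stop_s - start_s) / max(len(flags) - 1, 1)
    return {flag: round(start_s + i * step) for i, flag in enumerate(flags)}


def suspect_flag(probe_values: dict[str, int], crash_after_s: float) -> str:
    """Return the flag whose probe value is closest to the observed time-to-crash."""
    return min(probe_values, key=lambda flag: abs(probe_values[flag] - crash_after_s))


# Hypothetical candidates: every flag whose default amounted to 10 minutes.
candidates = [f"flag_{i:02d}" for i in range(21)]
probes = assign_probe_values(candidates)  # 300 s, 360 s, ..., 1500 s

# The cluster now crashes after roughly 11 minutes instead of 10.
print(suspect_flag(probes, crash_after_s=11 * 60))  # prints flag_06
```

The wider the spacing between probe values, the more jitter in the crash time you can tolerate before two candidates become indistinguishable.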