[...] Here is the right way to address this bug:
- Learn more about manifests, so I know what a good one looks like.
- Take a look at the one we’re generating for Kiln; see if anything obvious screams out.
- If so, dive into the build system [blech] and have it fix up the manifest, or generate a better one, or whatever’s involved here. This part’s a second black box to me, since the Kiln Storage Service is just a py2exe executable, meaning that we might be hitting a bug in py2exe, not our build system.
- If not, burn a Microsoft support ticket so I can learn how to get some more debugging info out of the error message.
Here’s the first thing I actually did:
- Look at the executable using a dependency checker to see what DLLs it was using, then make sure they were present on Windows 2003.
This is not the behavior of a rational man. [...]
http://bitquabit.com/post/cargo-cult-debugging/
...except that, reading what he did, it makes perfect sense.
He recognized one potential cause of the issue; it was cheap to test, so he tested it.
The best approach to solving problems isn't to look into the most likely cause first, but the most cost-effective one: weigh the probability that it is the issue against the time it takes to test.
I have issues with character encodings fairly regularly in my job, from systems upstream or downstream giving me incorrect information about what to expect from them or what to send to them. I have a toolbox of preprogrammed solutions. It's cheaper for me to test every tool in my toolbox (takes less than an hour), see what fixes it, and use that, than it is for me to figure out exactly what the system upstream or downstream is doing wrong (takes many hours). On rare occasions, none of my tools work, and I need to debug the problem properly - then I modularize the solution and add it to the toolbox.
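To make the toolbox idea concrete, here is a rough sketch of what "try every tool and see what sticks" might look like. The three fixes, the `TOOLBOX` list, and the `looks_right` check are all hypothetical stand-ins, not my actual tools; a real toolbox would hold whatever fixes past debugging sessions produced.

```python
from typing import Callable, Optional, Tuple

# Hypothetical fixes; each one is cheap to try and either works or fails.
def as_utf8(raw: bytes) -> str:
    return raw.decode("utf-8")

def as_cp1252(raw: bytes) -> str:
    return raw.decode("cp1252")

def undo_double_encoding(raw: bytes) -> str:
    # UTF-8 bytes that were misread as cp1252 and then re-encoded as UTF-8.
    return raw.decode("utf-8").encode("cp1252").decode("utf-8")

TOOLBOX = [undo_double_encoding, as_utf8, as_cp1252]

def first_fix_that_works(
    raw: bytes, looks_right: Callable[[str], bool]
) -> Optional[Tuple[str, str]]:
    """Try each fix in turn; return (fix name, text) for the first that passes."""
    for fix in TOOLBOX:
        try:
            text = fix(raw)
        except (UnicodeDecodeError, UnicodeEncodeError):
            continue  # this tool does not apply; move on to the next one
        if looks_right(text):
            return fix.__name__, text
    return None  # nothing worked: time to debug it properly
```

When this returns `None`, that is the one case in ten where the real investigation starts, and whatever comes out of it becomes a new entry in the list.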
I could spend six hours every time, or one hour nine times out of ten and seven hours the tenth. The first is the "proper" way to solve the problem; the second is trying random solutions (cargo cult debugging) and seeing what sticks.
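Working the numbers through, using only the figures above (the variable names are just for illustration):

```python
proper_hours = 6.0      # debug it properly every time
toolbox_hours = 1.0     # run through the whole toolbox
p_toolbox_works = 0.9   # nine times out of ten
fallback_hours = 7.0    # one wasted toolbox hour plus six proper hours

# Average cost per incident when the toolbox is tried first.
expected_toolbox_first = (
    p_toolbox_works * toolbox_hours
    + (1 - p_toolbox_works) * fallback_hours
)

print(f"proper every time: {proper_hours:.1f} h")
print(f"toolbox first:     {expected_toolbox_first:.1f} h on average")
```

That comes to about 1.6 hours per incident on average against 6, even counting the occasions where the toolbox hour is wasted.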
This article sounds good. In practice, I don't think it measures up.
Because if it takes twenty minutes to check your dependencies, and twenty hours to learn how to read a manifest, you need to be sixty times more certain that it's a problem with the manifest to justify running that test first.
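Here is a small sketch of where the "sixty times" comes from. It assumes the candidate causes are mutually exclusive and that each check conclusively confirms or rules out its cause; the probabilities are made-up numbers for illustration, and only the twenty minutes and twenty hours come from the example above.

```python
def expected_hours(order):
    """Expected time spent until the cause is found.

    order: list of (probability_this_is_the_cause, hours_to_check),
    checked in sequence. Assumes mutually exclusive causes and
    conclusive checks.
    """
    total, p_unresolved = 0.0, 1.0
    for p, hours in order:
        total += p_unresolved * hours  # only paid if still unresolved
        p_unresolved -= p
    return total

def best_order(checks):
    """Sort checks by probability per hour, highest first."""
    return sorted(checks, key=lambda c: c[0] / c[1], reverse=True)

deps = (0.05, 20 / 60)    # illustrative 5% chance, twenty minutes
manifest = (0.90, 20.0)   # illustrative 90% chance, twenty hours

print(expected_hours([deps, manifest]))   # dependencies first
print(expected_hours([manifest, deps]))   # manifest first
```

With these numbers the manifest is eighteen times more likely to be the culprit, and checking the dependencies first still comes out ahead. Under these assumptions the two orders only break even once the manifest is sixty times more likely, which matches the ratio of the two costs.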
Yeah. If checking under the streetlight is cheap, there's little reason not to do that anyway. (Even if the chances of payoff don't outweigh the time, it counts as due diligence to check the obvious stuff. And IME, checking the obvious couldn't-possibly-be stuff gets a win often enough to make a good habit.)