Here's the new thread for posting quotes, with the usual rules:
- Please post all quotes separately, so that they can be voted up/down separately. (If they are strongly related, reply to your own comments. If strongly ordered, then go ahead and post them together.)
- Do not quote yourself.
- Do not quote comments/posts on LW/OB.
- No more than 5 quotes per person per monthly thread, please.
Another quote from the same piece, just before that para:
I really, really like this. Thanks for posting it!
To elucidate the "bug model" a bit, consider "bugs" not in a single piece of software, but in a system. The following is drawn from my professional experience as a sysadmin for large-scale web applications, but I've tried to make it clear:
Suppose that you have a web server; or better yet, a cluster of servers. It's providing some application to users — maybe a wiki, a forum, or a game. Most of the time when a query comes in from a user's browser, the server gives a good response. However, sometimes it gives a bad response — maybe it's unusually slow, or it times out, or it gives an error or an incomplete page instead of what the user was looking for.
It turns out that if you want to fix these sorts of problems, considering them merely to be "flakiness" and stopping there is not enough. You have to actually find out where the errors are coming from. "Flaky web server" is an aggregate property, not a simple one; specifically, it is the sum of all the different sources of error, slowness, and other badness — the disk contention; the database queries against un-indexed tables; the slowly failing NIC; the excess load from the web spider that's copying the main page ten times a second looking for updates; the design choice of retrying failed transactions repeatedly, thus causing overload to make itself worse.
There is some fact of the matter about which error sources are causing more failures than others, too. If 1% of failed queries are caused by a failing NIC, but 90% are caused by transactions timing out due to slow database queries to an overloaded MySQL instance, then swapping the NIC out is not going to help much. And two flaky websites may be flaky for completely unrelated reasons.
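To make the "which source matters most" point concrete, here is a minimal sketch of tallying failed queries by root cause. Everything in it is an assumption for illustration: the cause labels, the made-up counts (chosen to match the 1% and 90% figures above), and the helper name `rank_failure_causes`. In a real system the labels would come from whatever diagnosis you can actually do on your logs.

```python
from collections import Counter

# Hypothetical data: one root-cause label per failed query.
# The counts are invented to mirror the 90% / 1% example above.
failed_queries = (
    ["slow_db_query"] * 90
    + ["disk_contention"] * 6
    + ["spider_load"] * 3
    + ["failing_nic"] * 1
)


def rank_failure_causes(causes):
    """Return (cause, share_of_failures) pairs, most common first."""
    counts = Counter(causes)
    total = sum(counts.values())
    return [(cause, n / total) for cause, n in counts.most_common()]


ranking = rank_failure_causes(failed_queries)
for cause, share in ranking:
    # Print each cause with its share of all failures as a percentage.
    print(f"{cause:16s} {share:.0%}")
```

The ranking only tells you where to look first; it says nothing about how hard each cause is to fix, and it is only as good as the labels going in.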
Talking about how flaky or reliable a web server is lets you compare two web servers side-by-side and decide which one is preferable. But by itself it doesn't let you fix anything. You can't just point at the better web server and tell the worse one, "Why can't you be more like your sister?" — or rather, you can, but it doesn't work. The differences between the two do matter, but you have to know which differences matter in order to actually change things.
To bring the analogy back to human cognitive behavior: yes, you can probably measure which of two people is "more rational" than the other, or even "more intelligent". But if someone wants to become more rational, they can't do it by just trying to imitate an exemplary rational person — they have to actually diagnose what kinds of not-rational they are being, and find ways to correct them. There is no royal road to rationality; you have to actually struggle with (or work around) the specific bugs you have.
I agree with the general thrust of the essay (that broad, fuzzy labels like "bad at" are more useful if reduced to specific bug descriptions), but I'll note that being aware of the specific bugs that cause people to make the mistakes they're making does not stop me from thinking of peopl...