Test-driven development is a common subject of affective death spirals, and this post seems to be the product of one. In general, programmers ought to write more unit tests than they usually do, but not everything should be unit tested, and unit testing alone is absolutely not sufficient to ensure good code or good abstractions.
Writing tests is not free; it takes time, and while it often pays for itself, there are plenty of scenarios where it doesn't. Time is a limited resource which is also spent on thinking about abstractions, coding, refactoring, testing by hand, documenting and many other worthy activities. The amount of time required to write a unit test depends on the software environment, and the specific thing being tested. Things that involve interactions crossing out of your program's domain, like user interfaces, tend to be hard to test automatically. The amount of time required to write a test is not guaranteed to be reasonable. The benefits of a test also vary, depending on the thing being tested. Trivially simple code is not worth testing; for example, it would be wrong to test Java getters and setters except as an implicit part of a larger test.
All of the above is true.
Also true is that most of the specific cases where you think you should skip testing first are errors, and you should have started with the test.
(Extensive unit testing without a good mocking and stubbing framework is hard. Testing around external interfaces is also hard, but not prohibitively so.)
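For readers who haven't seen stubbing, here is a minimal sketch using Python's standard `unittest.mock`. The `tile_count` function, the `/tiles` path and the shape of the response are all invented for illustration; the only real API used is `Mock`.

```python
from unittest.mock import Mock


def tile_count(client, region):
    """Ask a (hypothetical) map service how many tiles cover a region."""
    response = client.get("/tiles", params={"region": region})
    return len(response["tiles"])


# Stand in for the external service so the test never touches the network.
fake_client = Mock()
fake_client.get.return_value = {"tiles": ["a", "b", "c"]}

assert tile_count(fake_client, "north") == 3
# The stub also records how it was called, so the request itself can be checked.
fake_client.get.assert_called_once_with("/tiles", params={"region": "north"})
```

The test exercises only your side of the interface; whether the real service behaves like the stub is a separate question that this kind of test can't answer.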
("most" != "all"; and jim, your beard may be longer than mine, in which case you are assumed to be an exception to the vast over-generalisation I commit above.)
Exploratory programming is a way of getting to that point, fast. You can bring out the tests when you've got a project (or piece of a project) that you're pretty sure you'll be using for a while.
There are two problems with this idea. First, I've found TDD to be extraordinarily effective at helping break down a problem that I have no idea how to solve. That is, if I don't even know what to sketch, I sometimes start with the tests. (Test-Driven Development By Example has some good examples of when/why you'd do that.)
Second: we can be really bad at guessing whether or not something will get thrown away. Rarely does version 0 really get thrown away, and so by the time you've built up a bunch of code, the odds that you'll go back and write the tests are negligible.
Like most things in IT, test-driven development has a sweet spot of applicability. Good programmers must recognize when a specific technique would help and when it would hurt. See Joel's article about the five worlds of software development and ask yourself which "worlds" would benefit most from TDD. For example, what percentage of a typical game's code is amenable to TDD at all? What percentage of a typical webapp? A typical shrinkwrap Windows app? A typical device driver?
The answer is that, in general, Extreme Programming practices like TDD were created to make writing internal corporate apps easier and more predictable, but other "worlds" have limited use for them - because these practices solve only the very easy problems in these worlds, but still require you to twist your development process so you no longer know how to solve the harder ones.
At my current job (web UI development and bits and pieces of the backend for an online map) I used TDD exactly once - when I needed a mini-parser for a subset of SQL. But even there the deciding factor was my familiarity with the concept of "parser combinators", not my familiarity with TDD.
What you're saying is too abstract, I can't understand any of it. What would be the "preconditions and postconditions" for Google Maps? "The tiles must join seamlessly at the edges"? "When the user clicks and drags, all tiles move along with the cursor"? How do you write automated tests for such things?
In a child comment wnoise says that "every bug that is found should have a unit test written". For the record, I don't agree with that either. Take this bug: "In Opera 10.2 the mousewheel delta comes in with the wrong sign, which breaks zooming." (I can't vouch for the exact version, but I do remember that Opera pulled this trick once upon a minor version update.) It's a very typical bug, I get hundreds of those; but how do you write a test for that?
You could say web development is "special" this way. Well, it isn't. Ask game developers what their typical bugs look like. (Ever try writing a 3D terrain engine test-first?) Ask a Windows developer fighting with version hell. Honestly I'm at a loss for words. What kind of apps have you seen developed with TDD start to finish? Anything interesting?
Maybe related: Ron Jeffries (wel...
Just a side note, but essentially the entire field of mind hacking as I teach it is based on TDD. Specifically, the testing of autonomous anticipations: i.e., what your "near" brain expects will happen in a concrete situation, and especially the emotional tags it assigns to that expectation.
If you know how to do that type of testing, you can test any self-help technique designed for changing beliefs, and determine in a matter of minutes whether it's actually any good (assuming you're able to execute the technique correctly, of course).
Most of th...
I program without TDD.
I only sometimes write automatically evaluated tests.
One thing that appeals to me about the idea of TDD is that perhaps it seems like a puzzle or game to pass some (easy to create) test. That is, the feedback is free once established.
Since I'm good at finding and fixing bugs without an always-run suite of tests, I don't bother creating them at all for most small programs (some of which are easy to implement correctly, but annoying to automatically test comprehensively). At some scale or difficulty, my abilities will fail and I will ...
Sorry, I have a meta question. (Will comment on the content later.) When you moved this article from the discussion area to LW proper, did its upvotes get converted at a 1-to-10 rate? :-)
Another (less important) item of curiosity: Are people who upvoted before the move (for 1 point) permitted to upvote again (for 10 points)? Can they back out their 1-point upvotes, generating a -9 point no-op, and then continue to downvote for an additional -10?
PS: I used to work in software test. :)
Okay, I've edited my article again, reducing some of the (retrospectively obvious) over-enthusiasm, adding some more details about anti-akrasia benefits and the potential downfalls of TDD, and also sprinkling in a few more links. I appreciate the comments I've got so far, and further suggestions are very welcome.
Should I submit this article to the LW post queue proper?
I find it interesting that you used graphics for your examples, which is among the areas it is hard to automate tests for (there is a lot of complexity behind "verify a rectangle was drawn on the screen", that we don't notice because our visual cortexes take care of it for us), but you don't address any of the problems with testing in this domain.
This leads me to agree with jimrandomh's affective death spiral theory.
EDIT: Moving the article from discussion has changed its URL and the comments' URL. Corrected link (Original left to illustrate prob...
Upvoted for your very precise articulation of what encountering a bug feels like. ;)
I find it interesting that this style of development seems very familiar to me ... except for the bit about automated testing! (Bear with me.) I tend to work on small projects, and because of the way I'm wired (i.e. because I haven't rewired myself out of it), when I'm working on something larger, I can't really keep many pieces of it in my head at once. When I try to, bits which aren't drawing my attention right now will fall out of my brain, and sometimes those bits were ...
It's confusing to think of TDD as its own rationality technique: testing your belief that a piece of code works is not fundamentally different from testing any other belief. Okay, so that part is just running unit tests once. But whether a piece of code works is a different belief from whether that piece of code with a few modifications works, and since writing tests is work, for efficiency you need to keep those tests around and rerun them. So, that's unit testing. TDD is just writing your tests beforehand, which makes a difference in the process of designi...
We are slowly moving towards languages with contracts (like Spec#), where "unit tests" are replaced by "contract tuning" + "functional tests".
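To make "contracts" concrete for anyone who hasn't used them: a rough approximation in Python, using plain `assert` statements as the precondition and postcondition. This is only a sketch of the idea, not how Spec# works (Spec# checks contracts statically where it can), and `integer_sqrt` is an invented example. Note that Python drops `assert` statements when run with `-O`.

```python
import math


def integer_sqrt(n):
    """Return the largest integer whose square does not exceed n."""
    # Precondition: what the caller must guarantee.
    assert isinstance(n, int) and n >= 0, "precondition: n must be a non-negative int"

    root = math.isqrt(n)

    # Postcondition: what the function guarantees in return.
    assert root * root <= n < (root + 1) ** 2, "postcondition violated"
    return root


assert integer_sqrt(17) == 4
```

The contract states the rule once, for every input, where a unit test would check it for a handful of chosen inputs.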
Computer programming can be a lot of fun, or it can be brain-rendingly frustrating. The transition between these two states often goes something like this:
Paula the Programmer: Computer, using the "Paula's_Neat_Geometry" library, draw a triangle near the top of the screen once per frame.
Cronus the Computer: Sure, no problem.
P: After drawing that triangle, draw a rectangle 50 units below it.
C: Will do, boss.
P: Sweet. Alright, after the rectangle, draw a circle 20 units to the right, then another 20 units to the left.
C: GLARBL GLARBL GLARBL I hear it's amazing when the famous purple stuff wormed in flap-jaw space with the tuning fork does a raw blink on Hari Kiri Rock! I need scissors! 61! [1] System error.
P: Crap! Crap crap crap. Um, okay, let's see...
And then Paula must spend the next 45 minutes turning the circle drawing code on and off and figuring out where the wackiness originates from. When the circle code is off, she sees everything work fine. When she turns it back on, she sees everything that she thought she understood so well, that she was previously able to manipulate with the calm joyful deftness of a virtuoso playing a violin, turn into a world of mystery and ice-slick confusion. Something about that request to draw that circle at that particular time and place is exposing a difference between Paula's model of the computer and the computer's reality.
When this happens to a programmer often enough, they begin to realize that even when things seem to be working fine, these differences still probably lurk unseen beneath the surface, waiting invisibly to strike. This is an unsettling feeling. As a technique of rationality, or just because being uncomfortable is unpleasant, they seek diligently to avoid creating these cross-model inconsistencies (known colloquially as "bugs") in their own code, so as to avoid subjecting themselves to GLARBL GLARBL GLARBL moments.
Having a sincere desire to be less wrong in one's thinking is fine, but not enough. One also needs an effective process to follow, a system for making it harder to fool oneself, or at least for noticing when it's happened. Test Driven Development is one such system; not the only one, and not without its practical problems (which will be at most briefly glossed over in this introductory article), but one of my personal favorites, primarily because of the way it makes me feel confident about the quality of my work.
Why Computer Programming Requires Rationality
Computer programming is the process of getting a messy, incomplete, often self-contradictory, and overall badly organized idea out of one's head and explaining it completely and thoroughly to a quite stupid machine that has no common sense whatsoever. This is beneficial for the users of the program, but also for the programmer, because the computer does not have a programmer's human biases, such as mistaking the name of an idea for an understanding of how that idea works.
It has been said that you only truly understand how to do something when you can teach a computer how to do it for you. This doesn't mean that you have to understand the thing perfectly before you can begin programming; the process of programming itself will change and refine the idea in the programmer's mind, chipping away rotten bits and smoothing connections as the idea moves piece-by-piece from the programmer's mind into a harsh reality that doesn't care about how neat something sounds, just whether or not it works.
Through the process of explaining the problem and solution to the computer, the programmer is also explaining it to themselves, checking that that explanation is correct as they go, and adjusting it in their own minds as necessary to make it match.
In a typical single-person development process, a programmer will think about the problem as a whole, mentally sketch out a framework of the tools and structures they will have to write to make the problem solvable, then begin implementing those tools in whatever order seems most intuitive. At this point, great loud alarm bells should be ringing in the heads of Less Wrong readers, indicating that this is a problematically far-mode way to go about things.
Why Test Driven Development Is Rational
The purpose of Test Driven Development is to formalize, and divide into tiny pieces, the step that comes right before a programmer starts writing code: the part where they think about what they expect the code to do. They are then encouraged to think about each of those small pieces individually, in near-mode, using the following steps:
RED: Figure out what feature you want to add next; make it a small feature, like "draw a triangle". Write a test, a tiny test, a test that only checks for the one new feature, and that will only pass if the feature is working properly. This part can be hard if you didn't really have a clear idea of the feature in the first place, but at least you're dealing with that difficulty now and not when 20 other things in the program already depend on your slightly flawed understanding. Anyways, once you've written the test, run it and make sure it fails in the expected manner, since the feature hasn't actually been implemented yet.
GREEN: Now actually go and write the code to make the test pass. Write as little code as possible, with minimum cleverness, to make this one immediate goal happen. Don't write any code that isn't necessary for making the test pass.
REFACTOR: Huzzah, the test passes! But the code has some bad smells: it's repetitious, it's hard to read, it generally creates a feeling of creeping unease. Make it clean, remove all the duplicated parts, both in the test and the implementation.
BLISS: Run all the prior tests; they should still be green. Feel a sense of serene satisfaction that all your expectations continue to be met; be confident your mental model of the whole program continues to be a pretty close match. If you have a version control system (and you really should), commit your changes to it now with a witty yet descriptive message.
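One pass through the four steps above might look like the following sketch, written with Python's standard `unittest` module. The `Canvas` class is a hypothetical stand-in for a drawing library: it records what it was asked to draw instead of putting pixels on a screen, which is an assumption made purely to keep the example checkable.

```python
import unittest


class Canvas:
    """Hypothetical drawing surface that records shapes instead of rendering them."""

    def __init__(self):
        self.shapes = []

    def draw_triangle(self, x, y):
        # GREEN: the least code that makes the test below pass.
        self.shapes.append(("triangle", x, y))


class TriangleTest(unittest.TestCase):
    def test_triangle_is_drawable(self):
        # RED: this test is written first. Before draw_triangle exists it
        # fails with an AttributeError, which is the expected failure.
        canvas = Canvas()
        canvas.draw_triangle(10, 20)
        self.assertEqual(canvas.shapes, [("triangle", 10, 20)])


# BLISS: run every test written so far and check that all are still green.
# The suite is run explicitly so the snippet also works inside a larger script.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TriangleTest)
result = unittest.TextTestRunner().run(suite)
```

There is nothing to REFACTOR yet with only one shape; that step starts to matter once a second and third shape introduce duplication.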
Working piece by tiny piece, your code will become as complicated as you need it to be, but no more so. You are not as likely to waste time creating vast wonderful code castles made of crystal and silver that turn out to be pointless and useless because you were thinking of the wrong abstraction. You are more likely to notice right away if you accidentally break something, because that something shouldn't be there in the first place unless it had a test to justify it, and that test will complain.
TDD is a good anti-akrasia technique for writing tests. Classically, tests are written after the program is working, but such tests are rarely very thorough, because it feels superfluous to write a test that only tells you what you (think that you) already know: that the program works.
TDD also helps fight programming akrasia in general. You receive continuous feedback that what you are doing is accomplishing something and not breaking anything. It becomes more difficult to dawdle, since there's always an immediate short-term goal to focus on.
Finally, for me and for many other people who've tried it, TDD makes programming more fun, and more satisfying. There's nothing quite like the feeling of confidence that comes from knowing that your program does just what you think it does.
Or, well, thinking that you know.
Why Test Driven Development Isn't Perfect
Basking innocently in the feeling of the BLISS stage, you check your email and get an angry bug report: when the draw color is set to turquoise, instead of rectangles your program is drawing something that looks vaguely like a silhouette of Carl Friedrich Gauss engaged in a swordfight against a trout. What's going on here? Why wasn't this bug caught by the tests? There's a "Rectangles_Are_Drawable" test, and a "Turquoise_Things_Are_Drawable" test, and they both pass, so how can drawing turquoise rectangles fail?
Something about turquoiseness and rectangleness is lining up just right and causing things to fall apart, and this outcome is certainly not predicted by the programmer's mental model of the program. This means either that something in the program is not actually being tested at all, or (more likely) that one of the tests doesn't test everything the programmer thinks it does. TDD (among its other benefits) does reduce the chance of bugs being created, but doesn't eliminate it, because even within the short near-mode phases of Red-Green-Refactor-Bliss there's still opportunity for us to foul things up. Eliminating all bugs is a grand dream, but not likely to happen in reality as long as the program isn't dead simple (or formally verifiable, but that's a technique for another day).
However, because we can express bugs as testable assumptions, TDD applies just as well to creating bugfixes as it does to adding new features:
RED: Write a new test "Turquoise_Rectangles_Are_Drawable", which sets the color to turquoise, tells the library to draw a rectangle, and makes sure a rectangle and not some other shape was drawn. Run the test; it should fail. If it doesn't, then the bug report was incomplete, and the situation that needs to be set up before Gauss is drawn is more elaborate.
GREEN: Figure out what's making the bug happen. Fix it. Test passes.
REFACTOR: Make the fix pretty.
BLISS: The rest of the program still works as expected (to the degree that your expectations were expressed, anyways). Also, this particular bug will never come back, because if someone does accidentally reintroduce it then the test that checks this newly created expectation will complain. Commit changes with a joke about Gaussian blurring.
Why Test Driven Development Isn't Always Appropriate
A word of warning: this article is intended to be readable for people who are unfamiliar with programming, which is why simple, easily visualized examples like drawing shapes were used. Unfortunately, in real life, graphics-drawing is just the sort of thing that's hardest to write tests for.
As an extreme example, consider CAPTCHA, software that tries to detect whether a human being or a spambot is trying to get an account on your site by asking them to read back an image of squirrelly-looking text. TDD would at best be minimally useful for this; you could bring in the best OCR algorithms you have available and pass the test if they *cannot* pull text out of the image... but it would be hard to tell if that was because the program was producing properly hard-to-scan images, or because it was producing useless nonsense!
It's part of a larger category of things which are hard to automatically test because their typical operation involves working with a human, and we can't simulate humans very well at all (yet). Any program that's meant to interact with a human, and depend upon that human behaving in a sophisticated human way (or in other words, any program that has a user interface which isn't incredibly simple), will have difficulty being thoroughly tested in a non-brittle way. This problem is exacerbated because user interfaces tend to change significantly as they are subjected to usability testing and rethought, necessitating tedious changes in any tests that depend on their specifics. That doesn't mean TDD isn't applicable to such programs, just that it is more useful when working on their inner machinery than their user-facing shell.
(There are also ways of minimizing this problem in certain sorts of user interface scenarios, but that's beyond the scope of this article.)
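As a small illustration of testing the inner machinery rather than the user-facing shell, here is a sketch that splits Paula's scene into a pure planning function and a thin rendering function. The names `plan_scene` and `render` are hypothetical, and the sketch assumes screen coordinates where y grows downward, so "50 units below" means adding 50.

```python
def plan_scene(top_y):
    """Inner machinery: decide what to draw and where, without drawing anything."""
    rect_y = top_y + 50  # the rectangle sits 50 units below the triangle
    return [
        ("triangle", 0, top_y),
        ("rectangle", 0, rect_y),
        ("circle", 20, rect_y),   # 20 units to the right of the rectangle
        ("circle", -20, rect_y),  # 20 units to the left of the rectangle
    ]


def render(scene, draw):
    """User-facing shell: hand each planned shape to a drawing callback."""
    for kind, x, y in scene:
        draw(kind, x, y)


# The planning logic can be tested exhaustively with plain assertions.
scene = plan_scene(top_y=10)
assert scene[1] == ("rectangle", 0, 60)
assert [s for s in scene if s[0] == "circle"] == [("circle", 20, 60), ("circle", -20, 60)]

# The shell is thin enough to check with a recording callback instead of a screen.
drawn = []
render(scene, lambda kind, x, y: drawn.append((kind, x, y)))
assert drawn == scene
```

What the shapes actually look like on screen is still untested here; the split only shrinks the part that needs a human eye.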
Test Driven $BEHAVIOR
It is unfortunate that this technique is not more widely applicable to situations other than computer programming. As a rationalist, the process of improving my beliefs should be like TDD: doing one specific near-mode thing at a time, running checks that can definitively pass or fail, and building up through this process a set of tests/experiments that thoroughly represent and drive changes to the program implementation, aka my model of the world.
The major disadvantage my beliefs have compared to a computerized test suite is that they won't hold still and be counted. I cannot do an on-demand enumeration through every single one of my beliefs and test them individually to make sure they all still hold up; I have to rely on my memories of them, which might well be shifting and splitting up and making a mess of themselves whenever I'm not looking. I can do RED and GREEN phases on particular ideas when they come to mind, but I'm unfortunately unable to do anything like a thorough and complete BLISS phase.
This article has partly been about introducing a coding technique which I think is pretty neat and of relevance to rationalists, but it's also about leading up to this question that I'd like to ask Less Wrong: how can I improve my ability to do Test Driven Thinking?
1. This bit of wonderfully silly text is from Konami's Metal Gear Solid 2.