While I think your comment is generally true, I feel it's almost a disservice to emphasize this point. A huge number of problems in the statistical sciences could be overcome by just a tiny bit of uniformity in model checking procedures. If it were seen as "bad form" to submit a journal article without doing some model expansion checks, or without providing test statistic analysis that goes beyond classical p-values, the quality of publications would jump. Even uniformity in classical p-value testing would help. I don't much like classical p-values and test statistics, but they do say something about model validity. Yet even in that domain, the test statistics are not always computed correctly; how they were computed is rarely reported; and there are tons of systematic errors made by folks unfamiliar with the theory behind the tests. Even if we had to keep using classical hypothesis testing, just getting people to apply the tests in a correct, systematic way would be a huge improvement. I would happily wager eating a stick of butter for a world in which I didn't have to read statistical results while thinking, "Okay, how did these authors mess this up? Are they reporting the right thing? Did they just keep gathering data until they hit the significance level they wanted? Etc."
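That last worry is easy to demonstrate. Here is a minimal simulation sketch (all settings, batch sizes, and thresholds are illustrative assumptions, not anyone's published procedure) showing that "collect more data until p < 0.05" wrecks the nominal error rate even when the null is exactly true:

```python
import numpy as np
from scipy import stats

# Under the null (true mean 0), peek at the data after every new batch and
# stop as soon as p < 0.05. The nominal 5% false-positive rate is badly
# inflated by this "keep gathering until significant" strategy.
rng = np.random.default_rng(0)
n_sims, batch, max_n = 2000, 10, 200
false_positives = 0
for _ in range(n_sims):
    data = []
    while len(data) < max_n:
        data.extend(rng.standard_normal(batch))
        _, p = stats.ttest_1samp(data, popmean=0.0)
        if p < 0.05:
            false_positives += 1
            break
print(f"Empirical false-positive rate: {false_positives / n_sims:.3f}")
# Typically lands well above 0.05 with these settings.
```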
Essentially, I think your comparison breaks down in one important way. While it may be possible to write software that is bug-free, it's not as easy to prove that your code is as efficient as it needs to be, or that it will generalize to new use cases. Unit testing certainly focuses on proving correctness and the absence of bugs. But another, less directly objective part of it is showing that your code is well suited to the computational task: why did you pick the algorithm, design pattern, or language you chose? If you design unit tests well, some of them will also address these slightly higher-level issues, which are closer to the model checking issues.
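To make the distinction concrete, here is a toy sketch (the function, test cases, and timing threshold are all hypothetical) of a test suite that checks plain correctness alongside one of those "higher level" properties, namely that the chosen algorithm scales the way we claimed it would:

```python
import time
import unittest

def running_mean(xs):
    """Hypothetical function under test: cumulative means of a sequence."""
    out, total = [], 0.0
    for i, x in enumerate(xs, start=1):
        total += x
        out.append(total / i)
    return out

class TestRunningMean(unittest.TestCase):
    def test_correctness(self):
        # Plain bug-hunting: known input, known output.
        self.assertEqual(running_mean([2, 4, 6]), [2.0, 3.0, 4.0])

    def test_scales_roughly_linearly(self):
        # The "higher level" kind of test: a crude guard that the algorithm
        # stays O(n) rather than O(n^2). Timing assertions like this are
        # fragile; treat it as an illustration, not a recommendation.
        t0 = time.perf_counter(); running_mean(range(10_000))
        t1 = time.perf_counter(); running_mean(range(100_000))
        t2 = time.perf_counter()
        self.assertLess(t2 - t1, 50 * (t1 - t0))

if __name__ == "__main__":
    unittest.main()
```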
Also, I think the flip side of the Box quote is just as important: "All models are right; most are useless." This is discussed here.
Andrew Gelman recently linked a new article entitled "Induction and Deduction in Bayesian Data Analysis." At his blog, he also described some of the comments made by reviewers and his rebuttal/discussion of those comments. It is interesting that he departs significantly from the common induction-based view of Bayesian approaches. As a practitioner myself, I am happiest about the discussion on model checking -- something one can definitely do in the Bayesian framework but which almost no one does. Model checking is to Bayesian data analysis as unit testing is to software engineering.
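For readers who haven't seen it done, the basic move is a posterior predictive check, the kind of model checking Gelman advocates. The following is a minimal sketch; the heavy-tailed "observed" data, the deliberately misspecified conjugate normal model, and the choice of test statistic are all assumptions made for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.standard_t(df=3, size=50)          # "observed" data, heavier-tailed than the model assumes

# Model: Normal(mu, 1) with a flat prior on mu, so the posterior for the
# mean is Normal(ybar, 1/n). Deliberately too thin-tailed for this data.
n, ybar = len(y), y.mean()

# Simulate replicated datasets from the posterior predictive distribution
# and compare a test statistic T (here: the maximum absolute value).
T_obs = np.max(np.abs(y))
T_rep = []
for _ in range(4000):
    mu = rng.normal(ybar, 1 / np.sqrt(n))  # draw a parameter from the posterior
    y_rep = rng.normal(mu, 1.0, size=n)    # draw a replicated dataset
    T_rep.append(np.max(np.abs(y_rep)))

# Posterior predictive p-value: values near 0 or 1 flag model misfit,
# which is exactly what happens here since the model's tails are too thin.
p = np.mean(np.array(T_rep) >= T_obs)
print(f"Pr(T(y_rep) >= T(y)) = {p:.3f}")
```

The analogy to unit testing is direct: you write down a property the fitted model should reproduce, generate "replicates" the way a test generates cases, and fail loudly when reality and model disagree.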
Added 03/11/12
Gelman has a new blog post today discussing another reaction to his paper and giving some additional details. Notably: