Then it did something that would be amazing in this era, much less 1969: [snip description of reboot]
That's not really amazing. It's par for the course for modern microcontrollers, of the sort that litter the innards of modern cars and tractors and such. They usually keep their programs in NOR Flash memory, so they don't need to be read from a hard drive on start-up, and don't need to keep much state in volatile memory. And they are usually designed to be able to start up in the blink of an eye. There are fairly cheap microcontrollers with better specs than the Apollo Guidance Computer, and they're common in applications that need reliable embedded software. It's a safe bet that the private space industry uses quite a lot of them. And the job prioritization is typical for any system designed to be hard realtime.
Even in big computers like the one on your desk, failing really quickly and well can help with reliability. There's a school of thought in server design which says that servers should consist of large numbers of isolated parts, which crash if anything goes wrong, and can be rebooted very quickly. This is how most web sites stay up despite bugs, random crashes, and server failures.
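To make the "crash if anything goes wrong, then restart quickly" idea concrete, here is a minimal sketch. The `Worker` class, the `supervise` function, and the use of `None` as a stand-in for "something went wrong" are all hypothetical names chosen for illustration, not taken from any particular server framework.

```python
class Worker:
    """A hypothetical isolated component that fails fast on bad input."""

    def __init__(self):
        self.handled = 0

    def handle(self, request):
        if request is None:  # assumption: None stands in for any fault
            raise RuntimeError("corrupt request")
        self.handled += 1
        return f"ok:{request}"


def supervise(requests):
    """Feed requests to a worker, replacing the worker whenever it crashes."""
    worker = Worker()
    restarts = 0
    results = []
    for request in requests:
        try:
            results.append(worker.handle(request))
        except RuntimeError:
            # No attempt to repair the crashed worker: discard it and
            # start a fresh one with clean state.
            worker = Worker()
            restarts += 1
            results.append("dropped")
    return results, restarts
```

The design choice is that the supervisor never inspects or patches a failed worker. It only needs starting a new one to be cheap, which is the same property the fast-booting microcontrollers above have.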
I think what is interesting is not the reboot but the fact that every task was prioritized and unimportant ones were inherently discarded. I do not think this is a feature typical of embedded programming.
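A rough sketch of that kind of prioritized discarding, not the Apollo Guidance Computer's actual mechanism: the job names, the numeric priorities, and the fixed `capacity` are assumptions made up for the example.

```python
def restart(jobs, capacity):
    """Keep the `capacity` most important jobs and discard the rest.

    jobs: list of (priority, name) pairs; a lower number is more important.
    Returns (kept, dropped) as lists of job names.
    """
    ordered = sorted(jobs)  # most important first
    kept = [name for _, name in ordered[:capacity]]
    dropped = [name for _, name in ordered[capacity:]]
    return kept, dropped


# Hypothetical job table for illustration only.
jobs = [
    (3, "display update"),
    (1, "guidance"),
    (2, "radar"),
    (5, "telemetry log"),
]
kept, dropped = restart(jobs, capacity=2)
```

With these made-up jobs, `kept` holds the two highest-priority jobs and everything else is dropped instead of being queued for later.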
"They Write the Right Stuff" is about software which "never crashes. It never needs to be re-booted. This software is bug-free. It is perfect, as perfect as human beings have achieved. Consider these stats: the last three versions of the program -- each 420,000 lines long -- had just one error each. The last 11 versions of this software had a total of 17 errors. Commercial programs of equivalent complexity would have 5,000 errors."
The programmers work from 8 to 5, with occasional late nights. They wear dressy clothes, not flashy or grungy. I assume there's a dress code, but I have no idea whether conventional clothes are actually an important part of the process. I'm sure that working reasonable numbers of hours is crucial, though I also wonder whether those hours need to be standard office hours.
"And the culture is equally intolerant of creativity, the individual coding flourishes and styles that are the signature of the all-night software world. 'People ask, doesn't this process stifle creativity? You have to do exactly what the manual says, and you've got someone looking over your shoulder,' says Keller. 'The answer is, yes, the process does stifle creativity.'" I have no idea what's in the manual, or whether there can be a manual for something as new as self-optimizing AI. I assume there could be a manual for some aspects.
What follows are the main points, quoted from the article:
The important thing is the process:
1. The product is only as good as the plan for the product. About one-third of the process of writing software happens before anyone writes a line of code.
2. The best teamwork is a healthy rivalry. The central group breaks down into two key teams: the coders -- the people who sit and write code -- and the verifiers -- the people who try to find flaws in the code. The two outfits report to separate bosses and function under opposing marching orders. The development group is supposed to deliver completely error-free code, so perfect that the testers find no flaws at all. The testing group is supposed to pummel away at the code with flight scenarios and simulations that reveal as many flaws as possible. The result is what Tom Peterson calls "a friendly adversarial relationship."
I note that it's rivalry between people who are doing different things, not people competing to get control of a project.
3. The database is the software base.
One [of the two databases] is the history of the code itself -- with every line annotated, showing every time it was changed, why it was changed, when it was changed, what the purpose of the change was, what specifications documents detail the change. Everything that happens to the program is recorded in its master history. The genealogy of every line of code -- the reason it is the way it is -- is instantly available to everyone.
The other database -- the error database -- stands as a kind of monument to the way the on-board shuttle group goes about its work. Here is recorded every single error ever made while writing or working on the software, going back almost 20 years. For every one of those errors, the database records when the error was discovered; what set of commands revealed the error; who discovered it; what activity was going on when it was discovered -- testing, training, or flight. It tracks how the error was introduced into the program; how the error managed to slip past the filters set up at every stage to catch errors -- why wasn't it caught during design? during development inspections? during verification? Finally, the database records how the error was corrected, and whether similar errors might have slipped through the same holes.
The group has so much data accumulated about how it does its work that it has written software programs that model the code-writing process. Like computer models predicting the weather, the coding models predict how many errors the group should make in writing each new version of the software. True to form, if the coders and testers find too few errors, everyone works the process until reality and the predictions match.
4. Don't just fix the mistakes -- fix whatever permitted the mistake in the first place.
The process is so pervasive, it gets the blame for any error -- if there is a flaw in the software, there must be something wrong with the way it's being written, something that can be corrected. Any error not found at the planning stage has slipped through at least some checks. Why? Is there something wrong with the inspection process? Does a question need to be added to a checklist?
Importantly, the group avoids blaming people for errors. The process assumes blame -- and it's the process that is analyzed to discover why and how an error got through. At the same time, accountability is a team concept: no one person is ever solely responsible for writing or inspecting code. "You don't get punished for making errors," says Marjorie Seiter, a senior member of the technical staff. "If I make a mistake, and others reviewed my work, then I'm not alone. I'm not being blamed for this."
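As a rough illustration of the error database from point 3 and the "blame the process" idea from point 4, here is a sketch of one record and a query over such records. The field names and the `leakiest_stages` helper are my own guesses based on the article's description, not the group's actual schema. Note that the record deliberately has no "who introduced it" field.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class ErrorRecord:
    """One entry in a hypothetical error database (fields are assumptions)."""
    discovered_on: str            # when the error was discovered
    revealing_commands: str       # what set of commands revealed it
    discovered_by: str            # who discovered it
    activity: str                 # "testing", "training", or "flight"
    how_introduced: str           # how it got into the program
    missed_by: list               # stages that failed to catch it
    correction: str               # how it was corrected
    similar_errors_possible: bool # could others slip through the same hole?


def leakiest_stages(records):
    """Count how often each process stage let an error slip through."""
    counts = Counter(stage for r in records for stage in r.missed_by)
    return counts.most_common()
```

A query like `leakiest_stages` points at a stage of the process to fix (an inspection step, a checklist) instead of at a person.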