Fair question.
My point is that if improving techniques could take you from (arbitrarily chosen percentages here) a 50% chance that an unfriendly AI would cause an existential crisis, to 25% chance that it would - you really didn't gain all that much, and the wiser course of action is still not to make the AI.
The actual percentages are wildly debatable, of course, but I would say that if you think there is any chance - no matter how small - of triggering ye olde existential crisis, you don't do it - and I do not believe that technique alone could get us anywhere close to that.
The ideas you propose in OP seem wise, and good for society - and wholly ineffective in actually stopping us from creating an unfriendly AI, The reasons are simply that the complexity defies analysis, at least by human beings. The fear is that the unfriendly arises from unintended design consequences, from unanticipated system effects rather than bugs in code or faulty intent
It's a consequence of entropy - there are simply far, far more ways for something to get screwed up than for it to be right. So unexpected effects arising from complexity are far, far more likely to cause issues than be beneficial unless you can somehow correct for them - planning ahead only will get you so far.
Your OP suggests that we might be more successful if we got more of it right "the first time". But - things this complex are not created, finished, de-novo - they are an iterative, evolutionary task. The training could well be helpful, but I suspect not for the reasons you suggested. The real trick is to design things so that when they go wrong - it still works correctly. You have to plan for and expect failure, or that inevitable failure is the end of the line.
My point is that if improving techniques could take you from (arbitrarily chosen percentages here) a 50% chance that an unfriendly AI would cause an existential crisis, to 25% chance that it would - you really didn't gain all that much, and the wiser course of action is still not to make the AI.
That's not the choice we are making. The choice we are making is to decide to develop those techniques.
These are quick notes on an idea for an indirect strategy to increase the likelihood of society acquiring robustly safe and beneficial AI.
Motivation:
Most challenges we can approach with trial-and-error, so many of our habits and social structures are set up to encourage this. There are some challenges where we may not get this opportunity, and it could be very helpful to know what methods help you to tackle a complex challenge that you need to get right first time.
Giving an artificial intelligence good values may be a particularly important challenge, and one where we need to be correct first time. (Distinct from creating systems that act intelligently at all, which can be done by trial and error.)
Building stronger societal knowledge about how to approach such problems may make us more robustly prepared for such challenges. Having more programmers in the AI field familiar with the techniques is likely to be particularly important.
Idea: Develop methods for training people to write code without bugs.
Trying to teach the skill of getting things right first time.
Writing or editing code that has to be bug-free without any testing is a fairly easy challenge to set up, and has several of the right kind of properties. There are some parallels between value specification and programming.
Set-up puts people in scenarios where they only get one chance -- no opportunity to test part/all of the code, just analyse closely before submitting.
Interested in personal habits as well as social norms or procedures that help this.
Daniel Dewey points to standards for code on the space shuttle as a good example of getting high reliability code edits.
How to implement:
Ideal: Offer this training to staff at software companies, for profit.
Although it’s teaching a skill under artificial hardship, it seems plausible that it could teach enough good habits and lines of thinking to noticeably increase productivity, so people would be willing to pay for this.
Because such training could create social value in the short run, this might give a good opportunity to launch as a business that is simultaneously doing valuable direct work.
Similarly, there might be a market for a consultancy that helped organisations to get general tasks right the first time, if we knew how to teach that skill.
More funding-intensive, less labour intensive: run competitions with cash prizes
Try to establish it as something like a competitive sport for teams.
Outsource the work of determining good methods to the contestants.
This is all quite preliminary and I’d love to get more thoughts on it. I offer up this idea because I think it would be valuable but not my comparative advantage. If anyone is interested in a project in this direction, I’m very happy to talk about it.