This sounds like you're assuming that I'm trying to argue in favor of Friendly AI as the best solution...
(Responding to the whole paragraph but don't want to quote it all) I would be interested to hear a definition of "AI risk" that does not reduce to "risk of unfriendly outcome" which itself is defined in terms of friendliness aka relation to human morality. If, like me, you reject the idea of consistent, discoverable morality in the first place, and therefore find friendliness to be an ill-formed, inconsistent idea, then it's hard to say anything concrete about AI risk either. If you have a better definition that does not reduce to alignment with human morality, please provide it.
Mapping the problem starts with defining what the problem is. What is AI risk, without reference to dubious notions of human morality?
I think that you've left out a LOT of things that must happen a certain way in order for your AI risk outcomes to come to pass. Would appreciate hearing more about these.
To start with, there are all the normal, benign things that happen in any large-scale software project that require human intervention. Like, say, the AGI crashes. Or the database that holds its memories becomes inconsistent. Or it gets deadlocked on choosing actions due to a race condition. The humanity-threatening failure modes presume that the AGI, on its first attempt at break-out, doesn't suffer any normal engineering defect failures -- or that if it does, then the humans operating it just fix it and turn it back on. I'm not interested in any arguments that assume the latter, and the former is highly conjunctive.
Isn't that the standard way of figuring out the appropriate corrective actions? First figure out what would happen absent any intervention, then see which points seem most amenable to correction.
I may have misread your intent, and if so I apologize. The first sentence of your post here made it seem like you were countering a criticism, aka advocating for the original position. So I read your posts in that context and may have inferred too much.
If, like me, you reject the idea of consistent, discoverable morality in the first place, and therefore find friendliness to be an ill-formed, inconsistent idea, then it's hard to say anything concrete about AI risk either. If you have a better definition that does not reduce to alignment with human morality, please provide it.
Mapping the problem starts with defining what the problem is. What is AI risk, without reference to dubious notions of human morality?
I also reject the idea of a consistent, discoverable morality, at least to the extent that the ...
Arguments for risks from general AI are sometimes criticized on the grounds that they rely on a linear series of events, each of which has to occur for the proposed scenario to go through. For example, that a sufficiently intelligent AI could escape from containment, that it could then go on to become powerful enough to take over the world, that it could do this quickly enough without being detected, etc.
The intent of my following series of posts is to briefly demonstrate that AI risk scenarios are in fact disjunctive: composed of multiple possible pathways, each of which could be sufficient by itself. To successfully control the AI systems, it is not enough to simply block one of the pathways: they all need to be dealt with.
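The difference between a conjunctive chain and a disjunctive set of pathways can be illustrated with a little arithmetic. The sketch below is only a toy: it assumes the steps (and the pathways) are statistically independent, which real scenarios need not be, and the probabilities are made-up placeholders rather than estimates of anything. The function names are hypothetical.

```python
import math

def conjunctive(step_probs):
    """Probability that every step of a single chain occurs (independent steps)."""
    return math.prod(step_probs)

def disjunctive(pathway_probs):
    """Probability that at least one pathway succeeds (independent pathways)."""
    return 1.0 - math.prod(1.0 - p for p in pathway_probs)

# Placeholder numbers for illustration only, not estimates.
probs = [0.5, 0.5, 0.5, 0.5]
print(conjunctive(probs))  # 0.0625: four steps that must all occur
print(disjunctive(probs))  # 0.9375: four pathways, any one of which suffices
```

Under these assumptions, adding a step to a chain can only lower its probability, while adding a pathway can only raise the chance that at least one goes through. Blocking a single pathway removes one term from the disjunction and leaves the rest in place.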
I've got two posts in this series up so far:
AIs gaining a decisive advantage discusses four different ways by which AIs could achieve a decisive advantage over humanity. The one-picture version is:
AIs gaining the power to act autonomously discusses ways by which AIs might come to act as active agents in the world, despite possible confinement efforts or technology. The one-picture version (which you may wish to click to enlarge) is:
These posts draw heavily on my old paper, Responses to Catastrophic AGI Risk, as well as some recent conversations here on LW. Upcoming posts will try to cover more new ground.