"what can we do to prevent some small group of humans (the SIAI, a secret conspiracy of billionaires, a secret conspiracy of Google employees, whoever) from steering a first-mover scenario in a direction that's beneficial to themselves and perhaps their blood relatives, but harmful to the rest of humanity?"
Actually, if they managed to do that, then they would have managed to build an FAI. The large (perhaps largest) risk, according to some (the SIAI I think, but I am not an expert), is that they think they are building an FAI (or perhaps an AI too weak to be really dangerous) but are mistaken in that assumption. In reality they would have built a uFAI that takes over the world, and humanity as a whole would be doomed, including the small minority of humanity that the AI was supposed to be friendly to.
There seem to be three different problems here: analyzing how dangerous AIs in general are; if they are dangerous, how one can make an FAI, that is, an AGI that is at least beneficial to some; and, if an FAI can be built, to whom it should be friendly. As I interpret your post, you are discussing the third question and the dangers related to it, while hypothetically assuming that the small group building the AGI has solved the second question? If so, you are not really discussing why someone would build a uFAI halfway on purpose, but why someone would build an FAI that is unfriendly to most humans?
That's not how I understood the "on the whole, beneficial to humans and humanity." It would benefit some humans, but it wouldn't fulfill the "on the whole" part of the quoted definition of Friendly AI.
That does, though, highlight some of the confusions that seem to surround the term "Friendly AI."
Some quotes from Eliezer's contributions to the Global Catastrophic Risks anthology. First, from Cognitive biases potentially affecting judgement of global risks:
And from Artificial Intelligence as a Positive and Negative Factor in Global Risk:
I think the first quote is exactly right. But it leaves out something important: the effects of someone's actions do not need to destroy the world in order to be very, very harmful. These definitions of Friendly and unFriendly AI are worth quoting (I don't know how consistently they're actually used by people associated with the SIAI, but they're useful for my purposes):
Again, an action does not need to destroy the world to be, on the whole, harmful to humans and humanity; malevolent rather than benevolent. An assurance that a human or humans will not do the former is no assurance that they will not do the latter. So if there ends up being a strong first-mover effect in the development of AI, we have to worry about the possibility that whoever gets control of the AI will use it selfishly, at the expense of the rest of humanity.
The title of this post says "halfway on purpose" instead of "on purpose," because in human history even the villains tend to see themselves as heroes of their own story. I've previously written about how we deceive ourselves so as to better deceive others, and how I suspect this is the most harmful kind of human irrationality.
Too many people--at least, too many writers of the kind of fiction where the villain turns out to be an all-right guy in the end--seem to believe that if someone is the hero of their own story and genuinely believes they're doing the right thing, they can't really be evil. But you know who was the hero of his own story and genuinely believed he was doing the right thing? Hitler. He believed he was saving the world from the Jews and promoting the greatness of the German volk.
We have every reason to think that the psychological tendencies that created these hero-villains are nearly universal. Evolution has no way to give us nice impulses for the sake of having nice impulses. Theory predicts, and observation confirms, that we tend to care more about blood relatives than mere allies, and more about allies than strangers. As Hume observed (remarkably, without any knowledge of Hamilton's rule), "A man naturally loves his children better than his nephews, his nephews better than his cousins, his cousins better than strangers, where every thing else is equal." And we care more about ourselves than about any single other individual on the planet (even if we might sacrifice ourselves for two brothers or eight cousins).
Most of us are not murderers, but then most of us have never been in a situation where it would be in our interest to commit murder. The really disturbing thing is that there is much evidence that ordinary people can become monsters as soon as the situation changes. Science gives us the Stanford Prison Experiment and Milgram's experiment on obedience to authority; history gives us even more disturbing facts about how many soldiers commit atrocities in wartime. Of the soldiers who came from societies where atrocities are frowned on, most must have seemed perfectly normal before they went off to war. Probably most of them, if they'd thought about it, would have sincerely believed they were incapable of doing such things.
This makes a frightening amount of evolutionary sense. There's reason for evolution to give us, as much as possible, conditional rules for behavior, so that we only do certain things when it's fitness-increasing to do so. Normally, doing the kind of things done during the Rape of Nanking leads to swift punishment, but the circumstances where such things actually happen tend to be circumstances where punishment is much less likely--where the other guys are trying to kill you anyway and your superior officer is willing to at minimum look the other way. But if you're in a situation where doing such things is not in your interest, where's the evolutionary benefit of even being aware of what you're capable of?
Taking this all together, the risk is not that someone will deliberately use AI to harm humanity (do it on purpose). The risk is that they'll use AI to harm humanity for selfish reasons, while persuading themselves they're actually benefiting humanity (doing it halfway on purpose). If whoever gets control of a first-mover scenario sincerely believed, prior to gaining unlimited power, that they really wanted to be really, really careful not to do that, that's no assurance of anything, because they'll have been thinking that before the situation changed and the conditional rule, "Screw over other people for personal gain if you're sure of getting away with it," had a chance to trigger.
I don't want to find out what I'd do with unlimited power. Or rather, all else being equal I would like to find out, but I don't think putting myself in a position where I actually could find out would be worth the risk. This is despite the fact that my even worrying about these things may be a sign that I'd be less of a risk than other people. That should give you an idea of how little I would trust other people with such power.
The fact that Eliezer has stated his intention to have the Singularity Institute create FOOM-capable AI doesn't worry me much, because I think the SIAI is highly unlikely to succeed at that. I think if we do end up in a first-mover scenario, it will probably be the result of some project backed by a rich organization like IBM, the United States Department of Defense, or Google.
Forgetting about that, though, this looks to me like an absolutely crazy strategy. Eliezer has said creating FAI will be a very meta operation, and I think I once heard him mention putting prospective FAI coders through a lot of rationality training before beginning the process, but I have no idea why he would think those are remotely sufficient safeguards for giving a group of humans unlimited power. Even if you believe there's a significant risk that creating FOOM-capable FAI could be necessary to human survival, shouldn't there, in that case, be a major effort to first answer the question, "Is there any possible way to give a group of humans unlimited power without it ending in disaster?"
More broadly, given even a small chance that the future of AI will end up in some first-mover scenario, it's worth asking, "what can we do to prevent some small group of humans (the SIAI, a secret conspiracy of billionaires, a secret conspiracy of Google employees, whoever) from steering a first-mover scenario in a direction that's beneficial to themselves and perhaps their blood relatives, but harmful to the rest of humanity?"