In the comments on this post (which in retrospect I feel was not very clearly written), someone linked me to a post Eliezer wrote five years ago, "The Hidden Complexity of Wishes." After reading it, I think I've figured out why the term "Friendly AI" is used so inconsistently.
This post explicitly lays out a view that seems to be implicit in, but not entirely clear from, many of Eliezer's other writings. That view is this:
There are three kinds of genies: Genies to whom you can safely say "I wish for you to do what I should wish for"; genies for which no wish is safe; and genies that aren't very powerful or intelligent.
Even if Eliezer is right about that, I think that view of his has led to confusing usage of the term "Friendly AI." If you accept Eliezer's view, it may seem to make sense not to worry too much about whether by "Friendly AI" you mean:
1. A utopia-making machine (the AI "to whom you can safely say, 'I wish for you to do what I should wish for.'"), or
2. A non-doomsday machine (a doomsday machine being the AI "for which no wish is safe").
And it would make sense not to worry too much about that distinction if you were talking only to people who also believe those two concepts are very nearly co-extensive for powerful AI. But failing to make that distinction is obviously going to be confusing when you're talking to people who don't think that: it will make it harder to communicate to them both your ideas and your reasons for holding those ideas.
One solution would be to link people back to "The Hidden Complexity of Wishes" more frequently (or to other writing by Eliezer that makes similar points--what else would be suitable?). But while it's a good post and Eliezer makes some very good points with the "Outcome Pump" thought experiment, the argument isn't entirely convincing.
As Eliezer himself has argued at great length (see also section 6.1 of this paper), humans' own understanding of our values is far from perfect. None of us are, right now, qualified to design a utopia. But we do have some understanding of our own values: we can identify some things that would be improvements over our current situation while marking other scenarios as "this would be a disaster." It seems like there might be a point in the future where we can design an AI whose understanding of human values is similarly serviceable, though no better than our own.
Maybe I'm wrong about that. But if I am, then until there's a better, easy-to-read explanation of why I'm wrong for everybody to link to, it would be helpful to have different terms for (1) and (2) above. Perhaps call them "utopia AI" and "safe AI," respectively?
Just because it doesn't do exactly what you want doesn't mean it is going to fail in some utterly spectacular way.
You aren't searching for solutions to a real-world problem; you are searching for solutions to a model (ultimately, for solutions to systems of equations), and not only do you have a limited solution space, you don't model anything irrelevant. Furthermore, the search space is not 2d, not 3d, not even 100d; its volume increases very rapidly with size. The predictions of many systems are fundamentally limited by the Lyapunov exponent. I suggest you stop thinking in terms of concepts like 'improve'.
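As a minimal sketch of the Lyapunov point the comment gestures at (my illustration, not the commenter's; λ is the largest Lyapunov exponent, δ₀ an assumed initial measurement error, Δ an assumed error tolerance):

```latex
% Nearby trajectories of a chaotic system diverge exponentially:
\delta(t) \approx \delta_0 \, e^{\lambda t}
% so the usable prediction horizon, before the error exceeds a tolerance \Delta,
% grows only logarithmically with the initial precision:
t_{\text{horizon}} \approx \frac{1}{\lambda} \ln\frac{\Delta}{\delta_0}
```

On this picture, halving δ₀ buys only an extra ln 2 / λ of lead time, which is one sense in which long-range prediction stays "fundamentally limited" no matter how much computation is thrown at it.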
If something self-improves at the software level, it will be a piece of software created with a very well-defined model of changes to itself, and the self-improvement itself will be concerned with cutting down the solution space and cutting down the model. If something self-improves at the hardware level, likewise for the model of physics. Everyone wants an artificial Rain Man. Autism is what you get from all sorts of random variations to the baseline human brain; it looks like the kind of general intelligence that expands its model, rather than just focusing intensely, occupies a tiny spot in the design space. I don't see why we should expect general intelligence to suddenly overtake specialized intelligences: the specialized intelligences have better people working on them, have the funding, and their specialization massively improves efficiency; superhuman specialized intelligences require less hardware power.
I certainly agree, and I am not even sure what the official SI position is on the probability of such failure. I know that Eliezer in his writing does give the impression that any mistake will mean certain doom, which I believe to be an exaggeration. But failure of this kind is fundamentally unpredictable, and if a low-probability event kills you, you are still dead, and I think that it is high enough that the Friendly AI type effort would n... (read more)