Highly specific predictions should have their probability lowered when updating on a statement like "unpredictable."
That depends on what your initial probability is and why. If it is already low due to updates on predictions about the system, then updating on "unpredictable" will increase the probability by lowering the strength of those predictions. Since the destruction of humanity is rather important, even if the existential AI risk scenario is of low probability, it matters exactly how low.
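The update described here can be sketched in odds form. The sketch below is purely illustrative: the numbers, the `posterior_odds` helper, and the specific likelihood ratios are all assumptions invented for the example, not estimates drawn from the discussion.

```python
# Illustrative sketch: how learning a system is "unpredictable" can raise
# a probability estimate that was previously lowered by confident predictions.
# All numbers are made up for the sake of the example.

def posterior_odds(prior_odds, likelihood_ratio):
    """Bayesian update in odds form: posterior odds = prior odds * LR."""
    return prior_odds * likelihood_ratio

prior_odds = 0.10  # hypothetical initial odds of the risk scenario

# A confident prediction about the system's behavior argues against the
# scenario: a likelihood ratio well below 1 pushes the odds down.
confident_lr = 0.05
odds_with_prediction = posterior_odds(prior_odds, confident_lr)

# Learning the system is "unpredictable" weakens that prediction, dragging
# its likelihood ratio back toward 1 (i.e. toward being uninformative).
weakened_lr = 0.5
odds_after_unpredictable = posterior_odds(prior_odds, weakened_lr)

# The weakened evidence leaves the odds higher than the confident
# prediction did, even though both updates lower the prior.
assert odds_after_unpredictable > odds_with_prediction
```

The point of the sketch is only the direction of the effect: weakening evidence against a scenario moves its probability back up toward the prior.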
This of course has the same shape as Pascal's mugging, but I do not believe that SI claims are of low enough probability to be dismissed as effectively zero.
Not everything is equally easy to describe as equations.
That was in fact my point, which might indicate that we are likely to be talking past each other. What I tried to say is that an artificial intelligence system is not necessarily constructed as an explicit optimization process over an explicit model. If the model and the process are implicit in its cognitive architecture, then making predictions about what the system will do in terms of a search is of limited usefulness.
And even talking about models, getting back to this:
cutting down the solution space and cutting down the model
On further thought, this is not even necessarily true. The solution space and the model will have to be pre-cut by someone (presumably human engineers) who doesn't know where the solution actually is. A self-improving system will have to expand both if the solution lies outside them in order to find it. A system that can reach a solution even when initially over-constrained is more useful than one that can't, and so someone will build it.
I think you have a very narrow vision of 'unstable'.
I do not understand what you are saying here. If you mean that by "unstable" I refer to a highly specific trajectory that a system which has lost stability will follow, that is because all the trajectories where the system crashes and burns are unimportant. If you have a trillion optimization systems running on a planet at the same time, you have to be really sure that nothing can go wrong.
I just realized I derailed the discussion. The whole question of AGI in a specialized-AI world is irrelevant to what started this thread. As to which is developed first, I cannot tell how likely it is that AGI could overtake specialized intelligences. It really depends on whether there is a critical insight missing for the construction of AI. If it is just an extension of current software, then specialized intelligences will win for the reasons you state, although some of the caveats I wrote above still apply.
If there is a critical difference in architecture between current software and AI, then whoever hits that insight will likely overtake everyone else. If they happen to be working on AGI, or indeed any system entangled with the real world, I don't see how one can guarantee that the consequences will not be catastrophic.
Too much anthropomorphization.
Well, I in turn believe you are applying overzealous anti-anthropomorphization. It is normally a perfectly good heuristic when dealing with software, but the fact is that human intelligence is the only thing in the "intelligence" reference class we have, and although AIs will almost certainly be different, they will not necessarily be different in every possible way. This holds especially when considering AIs that are either directly based on human-like architecture or designed to interact directly with humans, which requires having at least some human-compatible models and behaviours.
That depends on what your initial probability is and why. If it is already low due to updates on predictions about the system, then updating on "unpredictable" will increase the probability by lowering the strength of those predictions. Since the destruction of humanity is rather important, even if the existential AI risk scenario is of low probability, it matters exactly how low.
The importance should not weigh upon our estimation, unless you proclaim that I should succumb to a bias. Furthermore, it is the destruction of mankind that is the predicti...
In the comments on this post (which in retrospect I feel was not very clearly written), someone linked me to a post Eliezer wrote five years ago, "The Hidden Complexity of Wishes." After reading it, I think I've figured out why the term "Friendly AI" is used so inconsistently.
This post explicitly lays out a view that seems to be implicit in, but not entirely clear from, many of Eliezer's other writings. That view is this:
Even if Eliezer is right about that, I think that view of his has led to confusing usage of the term "Friendly AI." If you accept Eliezer's view, it may seem to make sense not to worry too much about whether by "Friendly AI" you mean:
(1) A utopia-making machine (the AI "to whom you can safely say, 'I wish for you to do what I should wish for.'") Or:
(2) A non-doomsday machine (a doomsday machine being the AI "for which no wish is safe.")
And it would make sense not to worry too much about that distinction, if you were talking only to people who also believe those two concepts are very nearly co-extensive for powerful AI. But failing to make that distinction is obviously going to be confusing when you're talking to people who don't think that. It will make it harder to communicate both your ideas and your reasons for holding those ideas to them.
One solution would be to more frequently link people back to "The Hidden Complexity of Wishes" (or other writing by Eliezer that makes similar points--what else would be suitable?). But while it's a good post, and Eliezer makes some very good points with the "Outcome Pump" thought-experiment, the argument isn't entirely convincing.
As Eliezer himself has argued at great length, (see also section 6.1 of this paper) humans' own understanding of our values is far from perfect. None of us are, right now, qualified to design a utopia. But we do have some understanding of our own values; we can identify some things that would be improvements over our current situation while marking other scenarios as "this would be a disaster." It seems like there might be a point in the future where we can design an AI whose understanding of human values is similarly serviceable but no better than that.
Maybe I'm wrong about that. But if I am, until there's a better easy-to-read explanation of why I'm wrong for everybody to link to, it would be helpful to have different terms for (1) and (2) above. Perhaps call them "utopia AI" and "safe AI," respectively?