gattsuru comments on The genie knows, but doesn't care - Less Wrong
You are viewing a comment permalink. View the original post to see all comments and the full post content.
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Comments (515)
On one hand, Friendly AI people want to convert "make me happy" to a formal specification. Doing that has many potential pitfalls. because it is a formal specification.
On the other hand, Richard, I think, wants to simply tell the AI, in English, "Make me happy." Given that approach, he makes the reasonable point that any AI smart enough to be dangerous would also be smart enough to interpret that at least as intelligently as a human would.
I think the important question here is, Which approach is better? LW always assumes the first, formal approach.
To be more specific (and Bayesian): Which approach gives a higher expected value? Formal specification is compatible with Eliezer's ideas for friendly AI as something that will provably avoid disaster. It has some non-epsilon possibility of actually working. But its failure modes are many, and can be literally unimaginably bad. When it fails, it fails catastrophically, like a monotonic logic system with one false belief.
"Tell the AI in English" can fail, but the worst case is closer to a "With Folded Hands" scenario than to paperclips.
I've never considered the "Tell the AI what to do in English" approach before, but on first inspection it seems safer to me.
I don't think that's how the analysis goes. Eliezer says that AI must be very carefully and specifically made friendly or it will be disasterous, but that disaster is not a part of being only nearly careful or specifically made enough : he believes an AGI told merely to maximize human pleasure is very dangerous (and probably even more dangerous) than an AGI with a merely 80% Friendly-Complete specification.
Mr. Loosemore seems to hold the opposite opinion, that an AGI will not take instructions to unlikely results, unless it was exceptionally unintelligent and thus not very powerful. I don't believe his position says that a near-Friendly-Complete specification is very risky -- after all, a "smart" AGI would know what you really meant -- but that such a specification would be superfluous.
Whether Mr. Loosemore is correct isn't cause by whether we believe he is correct, just as whether Mr. Eliezer is not wrong just because we choose a different theory. The risks have to be measured in terms of their likelihood from available facts.
The problem is that I don't see much evidence that Mr. Loosemore is correct. I can quite easily conceive of a superhuman intelligence that was built with the specification of "human pleasure = brain dopamine levels", not least of all because there are people who'd want to be wireheads and there's a massive amount of physiological research showing human pleasure to be caused by dopamine levels. I can quite easily conceive of a superhuman intelligence that knows humans prefer more complicated enjoyment, and even do complex modeling of how it would have to manipulate people away from those more complicated enjoyments, and still have that superhuman intelligence not care.
I think it's a question of what you program in, and what you let it figure out for itself. If you want to prove formally that it will behave in certain ways, you would like to program in explicitly, formally, what its goals mean. But I think that "human pleasure" is such a complicated idea that trying to program it in formally is asking for disaster. That's one of the things that you should definitely let the AI figure out for itself. Richard is saying that an AI as smart as a smart person would never conclude that human pleasure equals brain dopamine levels.
Eliezer is aware of this problem, but hopes to avoid disaster by being especially smart and careful. That approach has what I think is a bad expected value of outcome.
Huh I thought he wanted to use CEV?
You are right. I think PhilGoetz must be confused. EY has at least certainly never suggested programming an AI to maximise human pleasure.
Being the same species comes with certain advantages for the possiibility of cooperation. But I wasn't very friendly towards a wasp-nest I discovered in my attic. People aren't very friendly to the vast majority of different species they deal with.
I'm superintelligent in comparison to wasps, and I still chose to kill them all.
Humans are made to do that by evolution AIs are not. So you have to figure what the heck evolution did, in ways specific enough to program into a computer.
Also, who mentioned giving AIs a priori knowledge of our preferences? It doesn't seem to be in what you replied to.
Harder than saying it in English, that's all.
No he wants to program the AI to deduce morality from us it is called CEV. He seems to be still working out how the heck to reduce that to math.
I don't think Loosemore was addressing deliberately unfriendly AI, and for that matter EY hasn't been either. Both are addressing intentionally friendly or neutral AI that goes wrong.
Wouldn't it care about getting things right?