PhilGoetz comments on The genie knows, but doesn't care - Less Wrong

54 Post author: RobbBB 06 September 2013 06:42AM

Comments (515)

Comment author: gattsuru 06 September 2013 03:37:48AM 4 points [-]

Which approach gives a higher expected value? Formal specification is compatible with Eliezer's ideas for friendly AI as something that will provably avoid disaster. It has some non-epsilon possibility of actually working. But its failure modes are many, and can be literally unimaginably bad. When it fails, it fails catastrophically, like a monotonic logic system with one false belief. "Tell the AI in English" can fail, but the worst case is closer to a "With Folded Hands" scenario than to paperclips.

I don't think that's how the analysis goes. Eliezer says that AI must be very carefully and specifically made friendly or it will be disastrous, but the disaster is not peculiar to being only nearly careful or specific enough: he believes an AGI told merely to maximize human pleasure is very dangerous, and probably even more dangerous than an AGI with a merely 80% Friendly-Complete specification.

Mr. Loosemore seems to hold the opposite opinion, that an AGI will not take instructions to unlikely results, unless it was exceptionally unintelligent and thus not very powerful. I don't believe his position says that a near-Friendly-Complete specification is very risky -- after all, a "smart" AGI would know what you really meant -- but that such a specification would be superfluous.

Whether Mr. Loosemore is correct isn't caused by whether we believe he is correct, just as Mr. Eliezer is not wrong just because we choose a different theory. The risks have to be measured in terms of their likelihood from available facts.

The problem is that I don't see much evidence that Mr. Loosemore is correct. I can quite easily conceive of a superhuman intelligence that was built with the specification of "human pleasure = brain dopamine levels", not least because there are people who'd want to be wireheads and there's a massive amount of physiological research showing human pleasure to be caused by dopamine levels. I can quite easily conceive of a superhuman intelligence that knows humans prefer more complicated enjoyment, and even does complex modeling of how it would have to manipulate people away from those more complicated enjoyments, and still have that superhuman intelligence not care.

Comment author: PhilGoetz 07 September 2013 08:21:14PM *  2 points [-]

I think it's a question of what you program in, and what you let it figure out for itself. If you want to prove formally that it will behave in certain ways, you would like to program in explicitly, formally, what its goals mean. But I think that "human pleasure" is such a complicated idea that trying to program it in formally is asking for disaster. That's one of the things that you should definitely let the AI figure out for itself. Richard is saying that an AI as smart as a smart person would never conclude that human pleasure equals brain dopamine levels.

Eliezer is aware of this problem, but hopes to avoid disaster by being especially smart and careful. That approach has what I think is a bad expected value of outcome.

Comment author: Fronken 14 September 2013 05:06:53PM *  1 point [-]

I think that "human pleasure" is such a complicated idea that trying to program it in formally is asking for disaster. That's one of the things that you should definitely let the AI figure out for itself.

[...]

Eliezer is aware of this problem, but hopes to avoid disaster by being especially smart and careful. That approach has what I think is a bad expected value of outcome.

Huh, I thought he wanted to use CEV?

Comment author: nshepperd 15 September 2013 01:46:37AM 2 points [-]

You are right. I think PhilGoetz must be confused. EY has at least certainly never suggested programming an AI to maximise human pleasure.

Comment deleted 12 September 2013 10:51:22AM [-]
Comment author: ArisKatsaris 12 September 2013 10:59:45AM *  4 points [-]

People manage to be friendly without a priori knowledge of everyone else's preferences. Human values are very complex...and one person's preferences are not another's.

Being the same species comes with certain advantages for the possibility of cooperation. But I wasn't very friendly towards a wasp-nest I discovered in my attic. People aren't very friendly to the vast majority of different species they deal with.

Comment deleted 12 September 2013 12:36:10PM [-]
Comment author: ArisKatsaris 12 September 2013 05:14:05PM 3 points [-]

I'm superintelligent in comparison to wasps, and I still chose to kill them all.

Comment author: Fronken 12 September 2013 02:42:59PM *  1 point [-]

Humans are made to do that by evolution; AIs are not. So you have to figure out what the heck evolution did, in ways specific enough to program into a computer.

Also, who mentioned giving AIs a priori knowledge of our preferences? It doesn't seem to be in what you replied to.

Comment deleted 12 September 2013 05:16:46PM [-]
Comment author: Fronken 13 September 2013 06:35:37PM *  1 point [-]

Is that going to be harder than coming up with a mathematical expression of morality and preloading it?

Harder than saying it in English, that's all.

EY. It's his answer to friendliness.

No, he wants to program the AI to deduce morality from us; it is called CEV. He still seems to be working out how the heck to reduce that to math.