wedrifid comments on SotW: Be Specific - Less Wrong

Post author: Eliezer_Yudkowsky 03 April 2012 06:11AM (39 points)

You are viewing a single comment's thread.

Comment author: wedrifid 07 April 2012 11:15:17AM * 3 points

Of course. Then we need to have some simple friendliness for when it is dumber.

A dumb (i.e. about as smart as us but far more rational) AI would, I assume, think along the lines of:

"So... I'm kinda dumb. How about I make myself smart before I fuck with stuff? So for now I'll do a basic analysis of what my humans seem to want and make sure I don't do anything drastic to damage that while I'm in my recursive improvement stage. For example I'm definitely not going to turn them all into computation. It don't take a genius to figure out they probably don't want that."

That is, it is possible to maximise the expected value of a function while only being able to approximate it rather than calculate it precisely. Doing so means taking into account the risk that you are missing something important, and finding ways to improve your ability to calculate and search for maxima within the function without risking significant damage to the overall outcome. This doesn't necessarily require special-case programming, although the FAI developers may find special-case programming easier to make proofs about.
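
As a toy illustration of that idea (entirely my own sketch; the actions, the world models and the penalty weight below are made-up assumptions, not anything CEV or the post specifies), an agent can pick actions by maximising an approximate utility estimate while penalising actions its models disagree about:

    # Toy sketch: maximise an approximated function while hedging against
    # being wrong about it. All names and numbers are purely illustrative.

    def approx_utility(action, world_model):
        """Rough, imperfect estimate of how much the humans value this outcome."""
        return world_model.get(action, 0.0)  # placeholder for a learned model

    def choose_action(actions, world_models, risk_weight=2.0):
        """Pick the action with the best average estimated utility, penalised
        by how much the models disagree (a crude proxy for 'I might be
        missing something important')."""
        best_action, best_score = None, float("-inf")
        for action in actions:
            estimates = [approx_utility(action, m) for m in world_models]
            mean = sum(estimates) / len(estimates)
            spread = max(estimates) - min(estimates)  # disagreement between models
            score = mean - risk_weight * spread       # conservative: avoid drastic bets
            if score > best_score:
                best_action, best_score = action, score
        return best_action

    # "self_improve_carefully" wins over "tile_universe_with_computation"
    # because the latter's estimates disagree wildly, even though one model
    # rates it highly.
    world_models = [
        {"self_improve_carefully": 1.0, "tile_universe_with_computation": 10.0},
        {"self_improve_carefully": 1.2, "tile_universe_with_computation": -100.0},
        {"self_improve_carefully": 0.9, "tile_universe_with_computation": -80.0},
    ]
    print(choose_action(["self_improve_carefully", "tile_universe_with_computation"],
                        world_models))

The point is only that an approximate model plus a penalty for uncertainty is enough to rule out drastic, hard-to-reverse actions; nothing about it requires special-case programming.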

Have you got a better idea than that? If so, then probably the FAI would do that instead of what I just came up with after 2 seconds' thought.

Hell, I'm not sure enough the extrapolated volition isn't a death wish.

I'm not sure either - specifically because the way different humans' values are aggregated is distinctly underspecified in CEV as Eliezer has discussed it. That is combined with implicitly assuming (it's this implicit part that I particularly don't like) that "all of humanity" is what CEV must be run on. I can't know that CEV<humanity> will not kill me. Even if it doesn't kill me, it is nearly tautologically true that CEV<people more like me> is better (in the subjectively objective sense of 'better').

This is one of two significant objections to Eliezer-memes that I am known to harp on from time to time.

Comment author: fubarobfusco 07 April 2012 05:50:14PM * 1 point

That is combined with implicitly assuming (it's this implicit part that I particularly don't like) that "all of humanity" is what CEV must be run on. I can't know that CEV<humanity> will not kill me. Even if it doesn't kill me, it is nearly tautologically true that CEV<people more like me> is better (in the subjectively objective sense of 'better').

Here's the trouble, though: by the same reasoning, if someone is implementing CEV<white people> or CEV<Russian intellectuals> or CEV<Orthodox Gnostic Pagans> or any such, everyone who isn't a white person, Russian intellectual, or Orthodox Gnostic Pagan has a damned good reason to be worried that it'll kill them.

Now, it may turn out that CEV<Orthodox Gnostic Pagans> is sufficiently similar to CEV<humanity> that the rest of humanity needn't worry. But is that a safe bet for all of us who aren't Orthodox Gnostic Pagans?

Comment author: Incorrect 07 April 2012 06:35:51PM 0 points

For anyone who implements an AI, any justification for including other members of humanity in their CEV calculation is valid iff their CEV would specify that anyway.

So, the rational course of action for anyone implementing an AI is to simply use their own CEV. If that CEV specifies to consider the CEV of other members of humanity then so be it.

Comment author: wedrifid 07 April 2012 06:47:02PM 0 points

For anyone who implements an AI, any justification for including other members of humanity in their CEV calculation is valid iff their CEV would specify that anyway.

YES! CEV is altruism-inclusive. For some reason this is often really hard to make people understand: the altruism belongs inside the CEV calculation, while the compromise-for-instrumental-purposes goes on the outside.

So, the rational course of action for anyone implementing an AI is to simply use their own CEV. If that CEV specifies to consider the CEV of other members of humanity then so be it.

This is true all else being equal. (The 'all else' being specifically that you are just as likely to succeed in creating FAI<CEV<self>> as you are in creating FAI<CEV<whatever>>.)
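
A minimal sketch of that inside/outside distinction, with made-up names and weights (my own framing, not anything from the CEV paper):

    # Altruism inside the CEV calculation vs. instrumental compromise outside it.
    # All values and names are invented for illustration.

    def cev(person_values):
        """Stand-in for extrapolation: what the person would want on reflection.
        If the person cares about others, that concern is already in the output."""
        return person_values  # the real extrapolation step is the hard, unspecified part

    # Inside: my own extrapolated values already weight other people's welfare.
    my_values = {"my_welfare": 0.6, "everyone_elses_welfare": 0.4}
    target = cev(my_values)

    # Outside: a separate, instrumental decision to aggregate several people's
    # CEVs (e.g. so that collaborators will actually help build the thing).
    def aggregate(extrapolated):
        keys = set().union(*extrapolated)
        return {k: sum(e.get(k, 0.0) for e in extrapolated) / len(extrapolated)
                for k in keys}

    coalition = [target, cev({"my_welfare": 0.9, "everyone_elses_welfare": 0.1})]
    print(aggregate(coalition))

The aggregation step is a bargaining choice made for instrumental reasons; any genuine concern for others is already inside cev().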

Comment author: hairyfigment 07 April 2012 07:42:20PM 0 points

For some reason this is often really hard to make people understand

IAWYC, but who doesn't get this?

Given our attitude toward politics, I'd expect little if any gain from replacing 'humanity' with 'Less Wrong'. Moreover, others would correctly take our exclusion of them as evidence of a meaningful difference if we actually made this decision. And I can't write an AGI by myself, nor can the smarter version of me calling itself Eliezer.

Comment author: wedrifid 07 April 2012 10:24:04PM 0 points

IAWYC, but who doesn't get this?

I don't recall the names. The conversations would be archived though if you are interested.

Comment author: wedrifid 07 April 2012 06:11:02PM * 0 points

Compromise is often necessary for the purpose of cooperation, and CEV<humanity> is a potentially useful Schelling point to agree upon. However, it should be acknowledged that these considerations are instrumental - or at least acknowledged that they are decisions to be made. Eliezer's discussion of the subject up until now has been completely innocent of even an awareness of the possibility that 'humanity' is not the only thing that could conceivably be plugged into CEV. This is, as far as I am concerned, a bad thing.

Comment author: Incorrect 07 April 2012 06:39:03PM * 0 points

This is, as far as I am concerned, a but thing.

Huh?

Comment author: wedrifid 07 April 2012 06:40:37PM 0 points

bad thing. Fixed.

Comment author: Dmytry 07 April 2012 11:35:49AM * -2 points

A dumb (i.e. about as smart as us but far more rational) AI would, I assume, think along the lines of:

Sounds like a lot of common sense that is very difficult to derive rationally.

"So... I'm kinda dumb. How about I make myself smart before I fuck with stuff? So for now I'll do a basic analysis of what my humans seem to want and make sure I don't do anything drastic to damage that while I'm in my recursive improvement stage. For example I'm definitely not going to turn them all into computation. It don't take a genius to figure out they probably don't want that."

Have you got a better idea than that? If so, then probably the FAI would do that instead of what I just came up with after 2 seconds' thought.

Just a little more anthropomorphizing and we'll be speaking of an AI that just knows what the moral thing to do is, innately, because he's such a good guy.

The 'basic analysis of what my humans seem to want' has fairly creepy overtones to it (hypothesis-testing style). On top of that: say you tell it, okay, just do whatever you think I would do if I thought faster, and it obliges - you are vaporized, because you would have gotten bored into suicide if you thought faster; that is simply how your value system works. What exactly is wrong with that course of action? I don't think 'extrapolating' is well defined.

re: volition of mankind. Yep.

Comment author: wedrifid 07 April 2012 12:27:07PM * 1 point

Sounds like a lot of common sense

It doesn't sound like particularly common sense - I'd guess that significantly less than half of humans would arrive at that as a cached 'common sense' solution.

that is very difficult to derive rationally.

It's an utterly trivial application of instrumental rationality. I can come up with it in 2 seconds. If the AI is as smart as I am (and with far fewer human biases) it can arrive at the solution as simply as I can. Especially after it reads every book on strategy that humans have written. Heck, it can read my comment and then decide whether it is a good strategy.

Artificial intelligences aren't stupid.

Just a little more anthropomorphizing and we'll be speaking of an AI that just knows what the moral thing to do is, innately, because he's such a good guy.

Or... not. That's utter nonsense. We have been explicitly describing AIs that have been programmed with terminal goals. The AI would then act to further those terminal goals, not whatever it 'innately' feels is moral.

The 'basic analysis of what my humans seem to want' has fairly creepy overtones to it (hypothesis-testing style). On top of that: say you tell it, okay, just do whatever you think I would do if I thought faster, and it obliges - you are vaporized, because you would have gotten bored into suicide if you thought faster; that is simply how your value system works. What exactly is wrong with that course of action? I don't think 'extrapolating' is well defined.

CEV is well enough defined that it just wouldn't do that unless you actually do want it - in which case you, well, want it to do that and so have no cause to complain. Reading even the incomplete specification from 2004 is sufficient to tell us that a GAI that does that is not implementing something that can reasonably be called CEV. I must conclude that you are replying to a straw man (presumably due to not having actually read the materials you criticise).

Comment author: Dmytry 07 April 2012 01:52:07PM * -2 points

CEV is not defined to do what you, as-is, actually want, but to do what you would have wanted, even in circumstances where you, as-is, actually want something else, as the 2004 paper cheerfully explains.

In any case, once you assume such intent-understanding interpretative powers of AI, it's hard to demonstrate why instructing the AI in plain English to "Be a good guy. Don't do bad things" would not be a better shot.

Comment author: wedrifid 07 April 2012 02:02:26PM 2 points

In any case, once you assume such intent-understanding interpretative powers of AI

Programmed in with great effort, thousands of hours of research and development, and even then a great chance of failure. That isn't an 'assumption'.

it's hard to demonstrate why instructing the AI in plain English to "Be a good guy. Don't do bad things" would not be a better shot.

That would seem to be a failure of imagination. That exhortation tells even an FAI-complete AI that is designed to follow commands to do very little.