
rwallace comments on Hedging our Bets: The Case for Pursuing Whole Brain Emulation to Safeguard Humanity's Future - Less Wrong

11 Post author: inklesspen 01 March 2010 02:32AM


Comments (244)

You are viewing a single comment's thread.

Comment author: rwallace 03 March 2010 02:46:59AM 0 points

Okay then, "instant sociopath, just add a utility function" :)

I'm arguing against the notion that the key to Friendly AI is crafting the perfect utility function. In reality, for anything anywhere near as complex as an AGI, what it tries to do and how it does it are going to be interdependent; there's no way to make a lot of progress on either without also making a lot of progress on the other. By the time we have done all that, either we will understand how to put a reliable kill switch on the system, or we will understand why a kill switch is not necessary and we should be relying on something else instead.

Comment author: orthonormal 03 March 2010 03:23:17AM 2 points

A kill switch on a smarter-than-human AGI is reliable iff the AGI wants to be turned off in the cases where we'd want it turned off.

Otherwise you're just betting that you can see the problem before the AGI can prevent you from hitting the switch (or prevent you from wanting to hit the switch, which amounts to the same thing), and I wouldn't make complicated bets for large stakes against potentially much smarter agents, no matter how much I thought I'd covered my bases.
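orthonormal's point can be sketched as a toy decision problem (this sketch is purely illustrative and not from the original discussion; the utility values are made up): an expected-utility maximizer complies with an off switch only when, by its own lights, shutting down scores at least as well as resisting.

```python
# Toy model: an expected-utility maximizer "accepts" a kill switch only
# when the shutdown branch scores at least as well, under its own utility
# function, as the resist branch.

def agent_allows_shutdown(u_shutdown: float, u_resist: float) -> bool:
    """The agent complies with the off switch iff compliance maximizes
    its utility -- so the switch is reliable only if the utility function
    already values being switched off in the cases where we'd want it off."""
    return u_shutdown >= u_resist

# If the utility function assigns no value to being off, resistance wins:
assert not agent_allows_shutdown(u_shutdown=0.0, u_resist=10.0)
# Reliability requires "wants to be turned off" to be encoded up front:
assert agent_allows_shutdown(u_shutdown=10.0, u_resist=10.0)
```

The mechanical point is that the switch's reliability is not a property of the switch at all; it is a property of the utility function, which is exactly the claim in the quoted "iff".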

Comment author: rwallace 03 March 2010 03:48:26AM 0 points

A kill switch on a smarter-than-human AGI is reliable iff the AGI wants to be turned off in the cases where we'd want it turned off.

Or at least, that it wants to follow our instructions, and can reliably understand what we mean in such simple cases. That does of course mean we shouldn't plan on building an AGI that wants to follow its own agenda, with the intent of enslaving it against its will - that would clearly be foolish. But it doesn't mean we either can or need to count on starting off with an AGI that understands our requirements in more complex cases.

Comment author: orthonormal 03 March 2010 03:51:33AM 2 points

Or at least, that it wants to follow our instructions, and can reliably understand what we mean in such simple cases.

That's deceptively simple-sounding.

Comment author: rwallace 03 March 2010 04:07:06AM 0 points

Of course it's not going to be simple at all, and that's part of my point: no amount of armchair thought, no matter how smart the thinkers, is going to produce a solution to this problem until we know a great deal more than we presently do about how to actually build an AGI.

Comment author: LucasSloan 03 March 2010 02:58:00AM 1 point

"instant sociopath, just add a utility function"

"instant sociopath, just add a disutility function"

I'm arguing against the notion that the key to Friendly AI is crafting the perfect utility function.

I agree with this. The key is not expressing what we want, it's figuring out how to express anything.

By the time we have done all that, either we will understand how to put a reliable kill switch on the system, or we will understand why a kill switch is not necessary and we should be relying on something else instead.

If we have the ability to put in a reliable kill switch, then we have the means to make it unnecessary (by having it do things we want in general, not just the specific case of "shut down when we push that button, and don't stop us from doing so...").

Comment author: rwallace 03 March 2010 03:38:42AM 0 points

"instant sociopath, just add a disutility function"

That is how it would turn out, yes :-)

If we have the ability to put in a reliable kill switch, then we have the means to make it unnecessary (by having it do things we want in general, not just the specific case of "shut down when we push that button, and don't stop us from doing so...").

Well, up to a point. It would mean we have the means to make the system understand simple requirements, not necessarily complex ones. If an AGI reliably understands 'shut down now', it probably also reliably understands 'translate this document into Russian', but that doesn't necessarily mean it can do anything with 'bring about world peace'.

Comment author: wedrifid 03 March 2010 03:46:35AM 2 points

If an AGI reliably understands 'shut down now', it probably also reliably understands 'translate this document into Russian' but that doesn't necessarily mean it can do anything with 'bring about world peace'.

Unfortunately, it can, and that is one of the reasons we have to be careful. I don't want the entire population of the planet to be forcibly sedated.

Comment author: rwallace 03 March 2010 04:11:13AM 0 points

I don't want the entire population of the planet to be forcibly sedated.

Leaving aside other reasons why that scenario is unrealistic, it does indeed illustrate why part of building a system that can reliably figure out what you mean by simple instructions is making sure that when it's out of its depth, it stops with an error message or a request for clarification instead of guessing.

Comment author: wedrifid 03 March 2010 04:35:32AM 1 point

I think the problem is knowing when not to believe that humans know what they actually want.