CCC comments on Domesticating reduced impact AIs - Less Wrong

Post author: Stuart_Armstrong 14 February 2013 04:59PM


Comment author: CCC 14 February 2013 06:51:22PM 6 points [-]

Hmmm. With this setup, there is one piece of useful information that escapes the explosion: whether or not the paperclip was, in fact, built. This minor piece of information has a non-zero probability of changing the future (by encouraging you to continue further in AI research).

Therefore, it seems likely that the disciple AI will try to build a paperclip without you finding out. This may require demolishing one or more sensors placed inside the antimatter cloud (or, for extra thoroughness, subverting them). Alternatively, the AI may do absolutely nothing until a few seconds before the explosion, and then frantically make a single paperclip right at the end, to reduce the number of observations of the paperclip that escape.

As AI failure modes go, though, that seems pretty innocuous.

Comment author: Stuart_Armstrong 15 February 2013 12:13:39PM 2 points [-]

Extra clarification: in this example, I'm assuming that we don't observe the AI, and that we are very unlikely to detect the paperclip. How to get useful work out of the AI is the next challenge, if this model holds up.

Comment author: CCC 16 February 2013 05:53:07AM 0 points [-]

Ah. I see. That does, at least temporarily, resolve my point, though as you point out, it raises the question of how to get anything useful out of the system.

Comment author: ikrase 25 February 2013 10:16:22AM 0 points [-]

Wait, can this thing make infohazards? I... don't 100 percent understand it.

Comment author: CCC 05 March 2013 12:01:08PM 0 points [-]

What is an 'infohazard'? I am unfamiliar with your terminology.

Comment author: TheOtherDave 05 March 2013 02:45:48PM 1 point [-]
Comment author: CCC 05 March 2013 06:40:09PM 0 points [-]

Huh.

In that case, I don't see why a self-improving AI wouldn't be able to create infohazards, if it wanted to. The question here is whether the AI model under consideration would want to.

...it seems possible, if it could construct an infohazard that would hide the fact that said infohazard existed.

Comment author: TheOtherDave 05 March 2013 06:41:16PM 0 points [-]

You probably meant to reply to ikrase.

Comment author: ikrase 06 March 2013 11:49:31PM 0 points [-]

I've been wondering about the limits of AI boxing myself. The ultimate hazard from an AI that is set up to even possibly be useful is when an AI is set up to transmit a (very limited) message before self-destruct (and so that the AI cannot witness the result of any of its actions including that message) and that message is still hazardous.

That, or that the AI can somehow pervert mundane computing hardware into manipulators and sensors through software alone, despite the fact that mundane computing hardware is designed to be the opposite of that.

Comment author: CCC 13 March 2013 07:58:22AM 1 point [-]

The ultimate hazard from an AI that is set up to even possibly be useful is when an AI is set up to transmit a (very limited) message before self-destruct (and so that the AI cannot witness the result of any of its actions including that message) and that message is still hazardous.

The self-fulfilling prophecy has been well-known in fiction for centuries. Or the ambiguous prophecy - consider what is said to have happened when Croesus asked the Oracle whether he should attack the Persians. "If you attack," the Oracle reputedly said, "you will destroy a great empire." Wanting to destroy the great Persian empire, and encouraged by this answer, Croesus immediately attacked...

...an action which led to the Persians promptly destroying Croesus' empire.

Prophecy can be a weapon, and it can be turned against those who know what the prophecy says.

Comment author: shminux 07 March 2013 12:11:03AM 0 points [-]

I never cease to be amazed that some people here seriously think that a lesser intelligence can reliably contain a greater one. There are just too many outs that a lesser intelligence cannot even conceive of. Even the best cage thought up by five-year-olds will not contain an adult, and the gap between a human and a full-blown AGI is much larger than that.

Comment author: ikrase 07 March 2013 08:29:57PM 0 points [-]

I agree that the boxing problem is BIG and that a useful super-AI cannot be boxed.

However, some things can just be too one-sided. I'm pretty much entirely talking about 1. No manipulators and 2. No communication.

Even a 5 year old can come up with three meter thick metal boxes.

My sample is: AI using 1000kg of present-day computer hardware, powered by battery, attached to no manipulators, sensors, or comms hardware whatsoever, encased in multiple layers of vacuums, Faraday cages, antimatter screens, gaps containing ionizing and neutron radiation, and vibration isolators, inside a completely separate spacecraft with no extra fuel and controlled by primitive (possibly analog or mechanical) flight controllers, containing multiple nukes with fail-deadly time-delay or remote-activated fuses. The ship is placed at a Venusian Lagrange point and the AI (pre-loaded with some degree of knowledge) is turned on and allowed to FOOM for a period of time T before:

Option A) The nukes explode

Option B) N output bytes are copied by an isolated supervisor system and transmitted; the nukes then explode before there is time for a light-speed round trip from Venus Lagrange to Earth and back.
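As a rough sanity check on the timing constraint in Option B (a sketch, not from the original comment: the Earth-Venus separation figures of roughly 0.28 AU at closest approach and 1.72 AU at farthest are assumed, and a Sun-Venus Lagrange point is treated as being at Venus's distance):

```python
# Bound on how long the supervisor system has before the nukes must
# fire: less than the light-speed round trip from the Venus Lagrange
# point to Earth and back.
AU_M = 1.495978707e11   # metres per astronomical unit
C = 299_792_458.0       # speed of light, m/s

def round_trip_seconds(distance_au: float) -> float:
    """Light-speed round-trip time for a given separation in AU."""
    return 2 * distance_au * AU_M / C

closest = round_trip_seconds(0.28)   # closest approach (assumed)
farthest = round_trip_seconds(1.72)  # farthest separation (assumed)

print(f"closest:  {closest / 60:.1f} min")   # about 4.7 minutes
print(f"farthest: {farthest / 60:.1f} min")  # about 28.6 minutes
```

So under these assumptions the fuses would have a window of a few minutes to a half hour in which to detonate after transmission, depending on where the planets are in their orbits.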

I seem to remember hearing something about a one-line AI-Box, for extremely compact Oddly Persuasive Arguments or infohazards. Still, that involved the possibility of being able to unbox the AI, whereas this does not. Some people have implied that even Option A above is dangerous, which seems impossible to me unless the AI has extremely detailed, almost atom-by-atom physics and physical data PLUS just the right sort of (poorly designed?) electronics. I find this... preposterous. This is of course a completely useless AI and does not obviate the need for Friendliness, or at least Obedience / Limitedness.

Comment author: wedrifid 07 March 2013 05:28:37AM 0 points [-]

Your conclusion is good, this premise isn't:

Even the best cage thought up by five-year-olds will not contain an adult

"Let's throw them down that well!"

"I am going to lock you in Daddy's Jail cell!"

Many of the best cages thought up by five-year-olds will easily contain an adult (and sometimes accidentally outright kill or incapacitate them).