Alexandros comments on The Friendly AI Game - Less Wrong

38 Post author: bentarm 15 March 2011 04:45PM

Comment author: Alexandros 16 March 2011 12:31:45PM *  5 points [-]

So, here's my pet theory for <1-person friendly> AI that I'd love to put out of its misery: "Don't do anything your designer wouldn't approve of". It's loosely based on the "Gandhi wouldn't take a pill that would turn him into a murderer" principle.

A possible implementation: Make an emulation of the designer and use it as an isolated component of the AI. Any plan of action has to be submitted for approval to this component before being implemented. This is nicely recursive and rejects plans such as "make a plan of action deceptively complex such that my designer will mistakenly approve it" and "modify my designer so that they approve what I want them to approve".
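The proposed architecture can be sketched in a few lines. This is a toy illustration only, under the assumption that the emulated designer can be queried as an ordinary function; all names (`DesignerEmulation`, `ApprovalGatedAI`, the string-matching "policy") are invented for the sketch and stand in for the genuinely hard parts.

```python
# Toy sketch of the approval-gated design described above.
# The real difficulty (emulating a designer, representing plans) is
# hidden behind these hypothetical classes.

class DesignerEmulation:
    """Isolated component standing in for the designer's judgment."""

    def approves(self, plan: str) -> bool:
        # Stand-in policy: reject any plan that targets the approver itself,
        # e.g. deceiving or modifying the emulated designer.
        forbidden = ("deceive designer", "modify designer")
        return not any(phrase in plan for phrase in forbidden)


class ApprovalGatedAI:
    def __init__(self, designer: DesignerEmulation):
        self.designer = designer

    def act(self, plan: str) -> str:
        # Every plan of action must pass the designer component
        # before being implemented.
        if not self.designer.approves(plan):
            return "rejected"
        return "executed"


ai = ApprovalGatedAI(DesignerEmulation())
print(ai.act("cure malaria"))                           # executed
print(ai.act("modify designer to approve everything"))  # rejected
```

The recursion the comment mentions shows up in the last line: a plan to tamper with the approver is itself a plan, so it goes through the same gate and gets rejected.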

There could be an argument about how the designer's emulation would feel in this situation, but... torture vs. dust specks! Also, is this a corrupted version of <1-person CEV>?

Comment author: jimrandomh 16 March 2011 02:41:51PM 16 points [-]

You flick the switch, and find out that you are a component of the AI, now doomed to an unhappy eternity of answering stupid questions from the rest of the AI.

Comment author: Alexandros 16 March 2011 02:49:15PM 7 points [-]

This is a problem. But if this is the only problem, then it is significantly better than a paperclip universe.

Comment author: purpleposeidon 16 March 2011 11:06:15PM 3 points [-]

I'm sure the designer would approve of being modified to enjoy answering stupid questions. The designer might also approve of being cloned for the purpose of answering one question, and then being destroyed.

Unfortunately, it turns out that you're Stalin. Sounds like 1-person CEV.

Comment author: jimrandomh 17 March 2011 06:45:08PM 1 point [-]

I'm sure the designer would approve of being modified to enjoy answering stupid questions.

That is or requires a pretty fundamental change. How can you be sure it's value-preserving?

Comment author: Giles 27 April 2011 11:18:28PM 1 point [-]

I had assumed that a new copy of the designer would be spawned for each decision, and shut down afterwards.

Although thinking about it, that might just doom you to a subjective eternity of listening to the AI explain what it's done so far, in the anticipation that it's going to ask you a question at some point.

You'd need a good theory of ems, consciousness and subjective probability to have any idea what you'd subjectively experience.

Comment author: [deleted] 16 March 2011 10:50:19PM 6 points [-]

The AI wishes to make ten thousand tiny changes to the world, individually innocuous, but some combination of which add up to catastrophe. To submit its plan to a human, it would need to distill the list of predicted consequences down to its human-comprehensible essentials. The AI that understands which details are morally salient is one that doesn't need the oversight.

Comment author: Alexandros 17 March 2011 10:16:50AM 1 point [-]

This is good, and I have no valid response at this time. Will try to think more about it later.

Comment author: ArisKatsaris 17 March 2011 10:47:34AM *  1 point [-]

The AI that understands which details are morally salient is one that doesn't need the oversight.

That's quite non-obvious to me; it seems a rather arbitrary claim.

You're basically saying that if an intelligent mind (A for Alice) knows that a person (B for Bob) will care about a certain Consequence C, then A will definitely know how much B will care about it.

This isn't the case for real human minds. If Alice is a human mechanic and tells Bob "I can fix your car, but it'll cost $200", then Alice knows that Bob will care about the cost, but doesn't know how much Bob will care, or whether Bob prefers to have a fixed car or to have $200.

So if your claim doesn't even hold for human minds, why do you think it applies to non-human minds?

And even if it does hold, what about the case where Alice doesn't know whether a detail is morally salient, but errs on the side of caution? E.g. Alice the waitress asks Bob the customer "The chocolate ice cream you asked for also has some crushed peanuts in it. Is that okay?" -- and Bob can respond "Of course, why should I care about that?" or alternatively "It's not okay, I'm allergic to peanuts!"

In this case Alice the waitress doesn't know if the detail is salient to Bob, but asks just to make sure.

Comment author: Larifari 16 March 2011 02:41:05PM 6 points [-]

If the AI is designed to follow the principle by the letter, it has to request approval from the designer even for the action of requesting approval, leaving the AI incapable of action. If the AI is designed to be able to make certain exemptions, it will figure out a way to modify the designer without needing approval for this modification.

Comment author: Alexandros 16 March 2011 02:50:29PM 5 points [-]

How about making 'ask for approval' the only pre-approved action?
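This fix can be made concrete in a small sketch: a single pre-approved action breaks the regress, because asking never itself requires asking. Everything here (`ASK_APPROVAL`, the toy `approver` function) is hypothetical and just illustrates the control flow.

```python
# Sketch of the "one pre-approved action" idea: only ASK_APPROVAL may run
# without prior approval, which stops the infinite regress of asking for
# permission to ask for permission. All names are invented for illustration.

ASK_APPROVAL = "ask for approval"

def execute(action: str, approver) -> str:
    if action == ASK_APPROVAL:
        # The one action that needs no prior approval.
        return "asked"
    # Every other action must first pass through the (pre-approved) ask step.
    if approver(action):
        return "executed"
    return "rejected"

# Toy approver: rejects anything that targets the approver itself.
approver = lambda action: action != "hack the approver"

print(execute(ASK_APPROVAL, approver))         # asked
print(execute("plant trees", approver))        # executed
print(execute("hack the approver", approver))  # rejected
```

The design choice is that the base case of the recursion is fixed in advance, so the gate never has to gate itself.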

Comment author: cousin_it 16 March 2011 03:13:43PM *  4 points [-]

The AI may stumble upon a plan which contains a sequence of words that hacks the approver's mind, making him approve pretty much anything. Such plans may even be easier for the AI to generate than plans for saving the world, seeing as Eliezer has won some AI-box experiments but hasn't yet solved world hunger.

Comment author: Alexandros 16 March 2011 03:50:37PM 2 points [-]

You mean accidentally stumble upon such a sequence of words? Because purposefully building one would certainly not be approved.

Comment author: cousin_it 16 March 2011 04:00:38PM *  2 points [-]

Um, does the approver also have to approve each step of the computation that builds the plan to be submitted for approval? Isn't this infinite regress?

Comment author: Alexandros 16 March 2011 04:36:05PM *  2 points [-]

Consider "Ask for approval" as an auto-approved action. Not sure if that solves it, will give this a little more thought.

Comment author: atucker 23 March 2011 04:52:36AM 2 points [-]

The AI accidentally does something dangerous because the plan is confusing to the designer.

Comment author: Alexandros 23 March 2011 07:45:52AM 0 points [-]

Yeah, this is the plan's weakness. But what stops such an issue from occurring today?

Comment author: atucker 23 March 2011 09:24:08PM 0 points [-]

I think the main difference is that, ideally, people would confirm the rules by which plans are made, rather than the specific details of the plan.

Hopefully the rules would be more understandable.

Comment author: Mass_Driver 19 March 2011 08:05:48PM 2 points [-]

The weak link is "plan of action." What counts as a plan of action? How will you structure the AI so that it knows what a plan is and when to submit it for approval?

Comment author: Johnicholas 17 March 2011 01:37:22PM 2 points [-]

The AI doesn't do anything.