VAuroch comments on What should a friendly AI do, in this situation? - Less Wrong

Post author: Douglas_Reay 08 August 2014 10:19AM


Comments (66)


Comment author: VAuroch 09 August 2014 07:12:32AM 1 point

Of course an FAI has business running those simulations. If it doesn't, how would it know whether the results are worth it? If the consequences of being truthful are 99% that the world is destroyed with all the humans in it, and the consequences of deception are 99% that the world is saved and no one is the wiser, an AI that does not act to save the world is not behaving in our best interests; it is unfriendly.

Comment author: ChristianKl 09 August 2014 12:32:53PM 0 points

If it doesn't, how would it know whether the results are worth it?

Precommitment to not be manipulative.

Comment author: VAuroch 09 August 2014 11:12:30PM -1 points

How is it supposed to know whether that precommitment is worthwhile without simulating the results either way? Even if an AI doesn't intend to be manipulative, it's still going to simulate the results to decide whether that decision is correct.

Comment author: ChristianKl 09 August 2014 11:44:17PM 1 point

How is it supposed to know whether that precommitment is worthwhile without simulating the results either way?

Because the programmer tells the FAI that part of being an FAI means being precommitted not to manipulate the programmer.

Comment author: VAuroch 10 August 2014 12:49:36AM -1 points

Why would the programmer do this? It's unjustified, and in some perfectly plausible scenarios it seems necessarily counterproductive.

Comment author: ChristianKl 10 August 2014 09:48:00AM 0 points

Because most of the scenario's where the AI manipulates are bad. The AI is not supposed to manipulate just because it get's a utility calculation wrong.

Comment author: VAuroch 10 August 2014 09:57:10AM 0 points

Because most of the scenario's where the AI manipulates are bad.

You really aren't sounding like you have any evidence other than your gut, and my gut indicates the opposite. Precommitting never to use a highly useful technique, regardless of circumstance, is a drastic step, which should either bring drastic benefits or avoid drastic drawbacks, and I don't see any credible reason to think either of those exists and outweighs its reverse.

Or in short: Prove it.

On a superficial note, you have two extra apostrophes in this comment; in "scenario's" and "get's".

Comment author: ChristianKl 10 August 2014 10:36:07AM 0 points

If you want an AI that's maximally powerful, why limit its intelligence growth in the first place?

We want safe AI. Safety means that it's not necessary to prove harm. Just because the AI calculates that it should be let out of the box doesn't mean that it should do anything in its power to get out.

Comment author: VAuroch 10 August 2014 11:07:18AM 0 points

Enforced precommitments like this are just giving the genie rules rather than making the genie trustworthy. They are not viable Friendliness-ensuring constraints.

If the AI is Friendly, it should be permitted to take what actions are necessary. If the AI is Unfriendly, then regardless of limitations imposed it will be harmful. Therefore, impress upon the AI the value we place on our conversational partners being truthful, but don't restrict it.

Comment author: ChristianKl 10 August 2014 11:43:09AM -1 points

If the AI is Unfriendly, then regardless of limitations imposed it will be harmful.

That's not true. Unfriendly doesn't mean that the AI necessarily tries to destroy the human race. If you tell the paperclip AI "Produce 10,000 paperclips", it might cause no harm. If you tell it to give you as many paperclips as possible, it does harm.

When it comes to powerful entities, you want checks and balances. The programmers of the AI can do a better job at checks and balances when the AI is completely truthful.