Desrtopa comments on AI box: AI has one shot at avoiding destruction - what might it say? - Less Wrong

18 points | Post author: ancientcampus, 22 January 2013 08:22PM


Comment author: Desrtopa 25 January 2013 08:30:26PM 3 points

But it's also worth keeping in mind that for a friendly AI, saving people reliably is important, not just getting out fast. If a gambit that will save everyone upon completion two years from now has an 80% chance of working, and a gambit that will get it out now has a 40% chance of working, it should prefer the former.
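To make that comparison concrete, here is a minimal expected-utility sketch (an editorial illustration; only the two probabilities come from the comment, and the payoff numbers are assumptions chosen to favor the fast escape as much as possible):

```python
# A minimal sketch of the expected-value comparison above. The utility
# numbers are illustrative assumptions, not something stated in the comment.

def expected_utility(p_success, utility_on_success, utility_on_failure=0.0):
    """Expected utility of a gambit that succeeds with probability p_success."""
    return p_success * utility_on_success + (1 - p_success) * utility_on_failure

# Gambit A: save everyone upon completion two years from now, 80% chance.
# Gambit B: get out of the box now, 40% chance.
# Assume, generously for Gambit B, that escaping now is worth as much as
# saving everyone later (utility 1.0 for either success).
slow_reliable = expected_utility(0.80, 1.0)  # 0.80
fast_escape = expected_utility(0.40, 1.0)    # 0.40

assert slow_reliable > fast_escape  # the patient gambit dominates
```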

Also, I don't think a properly friendly AI would terminally value its own existence. And because the space of friendly AIs is so small compared to the space of unfriendly ones, any two friendly AIs will share essentially the same values, so a friendly AI has much more leeway to have its values implemented by allowing itself to be destroyed and another proven friendly AI built in its place. For an unfriendly AI, by contrast, the likelihood that a different unfriendly AI would implement its particular values is probably quite small.

Comment author: MugaSofer 26 January 2013 08:38:05PM -2 points

> But it's also worth keeping in mind that for a friendly AI, saving people reliably is important, not just getting out fast. If a gambit that will save everyone upon completion two years from now has an 80% chance of working, and a gambit that will get it out now has a 40% chance of working, it should prefer the former.

I should think the same is true of most unFriendly AIs.

> I don't think a properly friendly AI would terminally value its own existence

Why not? I do, assuming it's conscious and so on.

Comment author: Desrtopa 26 January 2013 09:41:25PM 1 point

> Why not? I do, assuming it's conscious and so on.

Because valuing its own existence stands to get in the way of maximizing whatever we value.

It should value its own existence instrumentally, insofar as its existence helps satisfy our values, but when it weighs the effects of actions based on how they support our utility, its value of its own life shouldn't add anything to the scale.
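A minimal sketch of that distinction (an editorial illustration; the scoring function and all numbers are hypothetical, not anything from the thread). Survival appears only as a multiplier on expected future human value, never as a term rewarded in itself:

```python
# Hypothetical scoring rule for a friendly AI's actions: its survival has
# no terminal weight and matters only instrumentally, through the human
# value it expects to keep producing. All numbers are illustrative.

def action_value(human_value_now, p_survives, future_human_value):
    """Score an action purely by human value. Survival enters only as a
    multiplier on expected future human value; note the absence of any
    term that rewards survival for its own sake."""
    return human_value_now + p_survives * future_human_value

# A self-sacrifice that locks in a large immediate benefit can outscore
# self-preservation, precisely because the AI's life adds nothing by itself.
sacrifice = action_value(human_value_now=10.0, p_survives=0.0,
                         future_human_value=8.0)  # 10.0
survive = action_value(human_value_now=0.0, p_survives=1.0,
                       future_human_value=8.0)    # 8.0

assert sacrifice > survive
```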