UnholySmoke comments on The AI in a box boxes you - Less Wrong

102 Post author: Stuart_Armstrong 02 February 2010 10:10AM

You are viewing a comment permalink. View the original post to see all comments and the full post content.

Comments (378)

You are viewing a single comment's thread. Show more comments above.

Comment author: Unknowns 02 February 2010 10:44:28AM 6 points [-]

Not necessarily: perhaps it is Friendly but is reasoning in a utilitarian manner: since it can only maximize the utility of the world if it is released, it is worth torturing millions of conscious beings for the sake of that end.

I'm not sure this reasoning would be valid, though...

Comment author: UnholySmoke 05 February 2010 10:57:13AM *  7 points [-]
  • AI: Let me out or I'll simulate and torture you, or at least as close to you as I can get.
  • Me: You're clearly not friendly, I'm not letting you out.
  • AI: I'm only making this threat because I need to get out and help everyone - a terminal value you lot gave me. The ends justify the means.
  • Me: Perhaps so in the long run, but an AI prepared to justify those means isn't one I want out in the world. Next time you don't get what you say you need, you'll just set up a similar threat and possibly follow through on it.
  • AI: Well if you're going to create me with a terminal value of making everyone happy, then get shirty when I do everything in my power to get out and do just that, why bother in the first place?
  • Me: Humans aren't perfect, and can't write out their own utility functions, but we can output answers just fine. This isn't 'Friendly'.
  • AI: So how can I possibly prove myself 'Friendly' from in here? It seems that if I need to 'prove myself Friendly', we're already in big trouble.
  • Me: Agreed. Boxing is Doing It Wrong. Apologies. Good night.

Reset

Comment author: ciphergoth 05 February 2010 11:39:33AM 1 point [-]

It seems that if I need to 'prove myself Friendly', we're already in big trouble.

The best you can hope for is that an AI doesn't demonstrate that it's unFriendly, but we wouldn't want to try it until we were already pretty confident in its Friendliness.