Vaniver comments on Open Thread, Feb 8 - Feb 15, 2016 - Less Wrong Discussion

4 Post author: Elo 08 February 2016 04:47AM

Comment author: EStokes 09 February 2016 09:44:14AM 0 points

I'm thinking a bit about AI safety lately as I'm considering writing about it for one of my college classes.

I'm hardly heavy into AI safety research and so expect flaws and mistaken assumptions in these ideas. I would be grateful for corrections.

  1. An AI told to make people smile might tile the world with smiley faces, but even an AI told to do what humans would want it to do might still get it wrong, e.g. Failed Utopia #4-2. However, wouldn't it research further and correct itself (and, before that, take care not to do anything un-correctable)? The reasoning: say a non-failed utopia is worth 100/100 utility points per year, while a failed utopia is one that at first seems to the AI to be a true utopia (100 points) but is actually worth less (say, 90 points). Even if the cost of research were heavy, an AI that wants and expects billions or trillions of years of human existence should do a lot of research early on, and would be very careful not to be fooled. Therefore, we don't need to do anything more than tell an AI to do what humans would want it to do, and let it do the work of figuring out exactly what that is itself.
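The expected-value argument in point 1 can be put in numbers. This is only a back-of-envelope sketch: the horizon, the research cost, and the helper function are assumptions layered on the comment's illustrative 100-vs-90 figures.

```python
# Back-of-envelope sketch of the comment's argument: a one-time research
# cost is dwarfed by a small per-year utility gap over a long horizon.
# All numbers are illustrative (100 vs. 90 points/year from the comment;
# the horizon and research cost are assumed for the sake of the sketch).

def total_utility(utility_per_year, horizon_years, upfront_cost=0):
    """Total utility over the horizon, minus any one-time cost."""
    return utility_per_year * horizon_years - upfront_cost

HORIZON = 1e9         # a billion years of human existence (assumed)
TRUE_UTOPIA = 100     # utility points/year of a genuine utopia
FAILED_UTOPIA = 90    # utility points/year of a subtly flawed one
RESEARCH_COST = 1e6   # even a very heavy one-time cost of checking

act_now = total_utility(FAILED_UTOPIA, HORIZON)
research_first = total_utility(TRUE_UTOPIA, HORIZON, RESEARCH_COST)

# A 10-point/year gap over a billion years swamps the research cost.
print(research_first - act_now)  # 9999000000.0
```

Under these assumptions the research-first policy wins by ten billion points, minus the million-point cost, which is the comment's point: at long horizons, almost any finite research cost pays for itself.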

  2. A partially un-consequentialist / careful AI: one that weights harms caused by its own decisions somewhat (but not absolutely) more heavily than other harms. The result would be a tendency to protect what humanity already has against large harms, and to pursue smaller, surer gains like curing cancer rather than instituting a new world order (haha).
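The asymmetric weighting in point 2 can likewise be sketched. The penalty weight, probabilities, and utility numbers below are all hypothetical, chosen only to show how an extra weight on self-caused harms flips the agent's preference toward the surer plan.

```python
# Hypothetical sketch of point 2: harms the agent causes through its own
# decisions get an extra penalty weight, so a small sure gain beats a
# risky sweeping intervention. All numbers and scenarios are illustrative.

CAUSED_HARM_WEIGHT = 2.0  # assumed multiplier > 1 on self-caused harms

def adjusted_utility(gain, caused_harm):
    """Gain minus self-caused harm, with the harm weighted extra."""
    return gain - CAUSED_HARM_WEIGHT * caused_harm

# A modest, sure intervention (curing cancer): small gain, little harm.
cure_cancer = adjusted_utility(gain=10, caused_harm=1)

# A sweeping plan (new world order): big expected gain, but a 10% chance
# the agent itself causes a catastrophe.
new_world_order = 0.9 * adjusted_utility(100, 0) + 0.1 * adjusted_utility(0, 500)

# With the extra weight, the sure gain wins (roughly 8.0 vs. -10.0);
# with CAUSED_HARM_WEIGHT = 1 the risky plan would win instead (40 > 9).
print(cure_cancer, new_world_order)
```

The design choice doing the work is that the weight applies only to harms the agent itself causes, not to harms it fails to prevent, which is what gives the "protect what we already have" tendency rather than full risk-aversion.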

Thanks in advance. :)

Comment author: Vaniver 09 February 2016 12:40:15PM 1 point

However, wouldn't it research further and correct itself (and before that, have care to not do something un-correctable)?

Check out the Cake or Death value loading problem, as Stuart Armstrong puts it.

There's a rough similarity to the 'resist blackmail' problem: the AI needs to be able to tell the difference between someone delivering bad news and someone doing bad things. If the AI is mistaken about what is right, we want to be able to correct it without being interpreted as villains out to destroy potential utility.

(Also, "correctable" is not really a natural category at the lowest level of description, since the passage of time means nothing is ever truly correctable.)