Vaniver comments on Open Thread, Feb 8 - Feb 15, 2016 - Less Wrong Discussion
You are viewing a comment permalink. View the original post to see all comments and the full post content.
I'm thinking a bit about AI safety lately as I'm considering writing about it for one of my college classes.
I'm hardly heavy into AI safety research and so expect flaws and mistaken assumptions in these ideas. I would be grateful for corrections.
An AI told to make people smile tiles the world with smiley faces, but an AI told to do what humans would want it to do might still get it wrong, e.g. Failed Utopia #4-2. However, wouldn't it research further and correct itself (and, before that, take care not to do anything uncorrectable)? Reasoning as follows: say a non-failed utopia is worth 100/100 utility points per year. A failed utopia is one that at first seems to the AI to be a true utopia (100 points) but is actually worth less (say, 90 points). Even if the cost of research were heavy, if the AI wants/expects billions or trillions of years of human existence, it should do a lot of research early on and be very careful not to be fooled. Therefore, we don't need to do anything more than tell an AI to do what humans would want it to do, and let it do the work of figuring out exactly what that is itself.
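The expected-value arithmetic behind this can be sketched with made-up numbers (the horizon, probabilities, and research cost below are all hypothetical illustrations, not anything from the original argument):

```python
# Hedged illustration with invented numbers: why an expected-utility
# maximizer with a very long horizon should pay a heavy one-time research
# cost rather than risk locking in a "failed utopia" forever.

YEARS = 1_000_000_000        # horizon the AI cares about (hypothetical)
TRUE_UTOPIA = 100            # utility per year of a real utopia
FAILED_UTOPIA = 90           # utility per year of a failed utopia
P_MISTAKEN = 0.10            # assumed chance the apparent utopia is the failed one
RESEARCH_COST = 1_000_000    # assumed one-time utility cost of extra research

# Act immediately: risk locking in the failed utopia for the whole horizon.
act_now = ((1 - P_MISTAKEN) * TRUE_UTOPIA * YEARS
           + P_MISTAKEN * FAILED_UTOPIA * YEARS)

# Research first: optimistically assume the research removes the mistake.
research_first = TRUE_UTOPIA * YEARS - RESEARCH_COST

print(research_first - act_now)  # positive: researching wins by a wide margin
```

With these numbers the one-time research cost is dwarfed by the expected loss from locking in the wrong outcome, which is the comment's point: over long enough horizons, almost any research cost is worth paying.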
Partially non-consequentialist AI / careful AI: weights harms caused by its own decisions (somewhat, but not absolutely) more heavily than other harms. The result is a tendency toward protecting what humanity already has against large harms, and toward smaller, surer gains like curing cancer rather than instituting a new world order (haha).
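One minimal way to sketch such a "careful" decision rule (all weights and payoffs below are invented for illustration) is to multiply self-caused harms by a penalty factor greater than one when scoring an action:

```python
# Hedged sketch with invented weights: an agent that penalizes harms caused
# by its own actions more heavily than harms it merely fails to prevent.
# This makes small, sure gains beat large, risky interventions.

CAUSED_HARM_WEIGHT = 3.0  # hypothetical multiplier on self-caused harm

def careful_utility(gain, harm_caused, p_success):
    """Expected utility with asymmetric weighting of self-caused harm."""
    return p_success * gain - (1 - p_success) * CAUSED_HARM_WEIGHT * harm_caused

# A small, sure gain (illustrative numbers): curing cancer.
cure_cancer = careful_utility(gain=10, harm_caused=1, p_success=0.95)

# A large, risky gain: instituting a new world order.
new_world_order = careful_utility(gain=100, harm_caused=100, p_success=0.6)

print(cure_cancer, new_world_order)  # the sure gain scores higher
```

Because the penalty factor inflates only harms the agent itself causes, the risky intervention scores far worse than its raw expected value would suggest, matching the comment's intuition about conservatism.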
Thanks in advance. :)
Check out the Cake or Death value loading problem, as Stuart Armstrong puts it.
There's a rough similarity to the 'resist blackmail' problem: you need to be able to tell the difference between someone delivering bad news and someone doing bad things. If the AI is mistaken about what is right, we want to be able to correct it without being interpreted as villains out to destroy potential utility.
(Also, "correctable" is not really a clean distinction at the level of physics, since the passage of time means nothing is ever truly correctable.)