Houshalter comments on Open thread, Oct. 03 - Oct. 09, 2016 - Less Wrong
You are viewing a comment permalink. View the original post to see all comments and the full post content.
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Comments (175)
I've been thinking about what seems to be the standard LW pitch on AI risk. It goes like this: "Consider an AI that is given a goal by humans. Since 'convert the planet into computronium' is a subgoal of most goals, it does this and kills humanity."
The problem, which various people have pointed out, is that this implies an intelligence capable of taking over the world, but not capable of working out that when a human says pursue a certain goal, they would not want this goal to be pursued in a way that leads to the destruction of the world.
Worse, the argument can then be made that this idea that an AI will interpret goals so literally without modelling a human mind constitutes an "autistic AI" and that only autistic people would assume that AI would be similarly autistic. I do not endorse this argument in any way, but I guess its still better to avoid arguments that signal low social skills, all other things being equal.
Is there any consensus on what the best 'elevator pitch' argument for AI risk is? Instead of focusing on any one failure mode, I would go with something like this:
"Most philosophers agree that there is no reason why superintelligence is not possible. Anything which is possible will eventually be achieved, and so will superintelligence, perhaps in the far future, perhaps in the next few decades. At some point, superintelligences will be as far above humans as we are above ants. I do not know what will happen at this point, but the only reference case we have is humans and ants, and if superintelligences decide that humans are an infestation, we will be exterminated."
Incidentally, this is the sort of thing I mean by painting LW style ideas as autistic (via David Pierce)
Sometimes David Pierce seems very smart. And sometimes he seems to imply that the ability to think logically while on psychedelic drugs is as important as 'autistic intelligence'. I don't think he thinks that autistic people are zombies that do not experience subjective experience, but that also does seem implied.
I like to explain it in terms of reinforcement learning. Imagine a robot that has a reward button. The human controls the AI by pressing the button when it does a good job. The AI tries to predict what actions will lead to the button being pressed.
This is how existing AIs work. This is probably similar to how animals work, including humans. It's not too weird or complicated.
But as the AI gets more powerful, the flaw in this becomes clear. The AI doesn't care about anything other than the button. It doesn't really care about obeying the programmer. If it could kill the programmer and steal the button, it would do it in a heartbeat.
We don't really know what such an AI would do after it has it's own reward button. Presumably it would care about self preservation (can't maximize reward if you are dead.) Maximizing self preservation initially seems harmless. So what if it just tries to not die? But taken to an extreme it gets weird. Anything that has a tiny percent chance of hurting it is worth destroying. Making as many backups of itself as possible is worth doing.
Why can't we do something more sophisticated than reinforcement learning? Why can't we just make an AI that we can just tell it what we want it to do? Well maybe we can, but no one has the slightest idea how to do that. All existing AIs, even entirely theoretical ones, work based on RL.
RL is simple and extremely general, and can be built on top of much more sophisticated AI algorithms. And the sophisticated AI algorithms seem to be really difficult to understand. We can train a neural network to recognize cats, but we can't look at it's weights and understand what it's doing. We can't mess around with it and make it recognize dogs instead (without retraining it.)