turchin comments on Open thread, Oct. 03 - Oct. 09, 2016 - Less Wrong
You are viewing a comment permalink. View the original post to see all comments and the full post content.
I've been thinking about what seems to be the standard LW pitch on AI risk. It goes like this: "Consider an AI that is given a goal by humans. Since 'convert the planet into computronium' is a subgoal of most goals, it does this and kills humanity."
The problem, which various people have pointed out, is that this implies an intelligence capable of taking over the world, but not capable of working out that when a human says pursue a certain goal, they would not want this goal to be pursued in a way that leads to the destruction of the world.
Worse, the argument can then be made that the idea of an AI interpreting goals this literally, without modelling a human mind, amounts to an "autistic AI", and that only autistic people would assume an AI would be similarly autistic. I do not endorse this argument in any way, but I guess it's still better to avoid arguments that signal low social skills, all other things being equal.
Is there any consensus on what the best 'elevator pitch' argument for AI risk is? Instead of focusing on any one failure mode, I would go with something like this:
"Most philosophers agree that there is no reason why superintelligence is not possible. Anything which is possible will eventually be achieved, and so will superintelligence, perhaps in the far future, perhaps in the next few decades. At some point, superintelligences will be as far above humans as we are above ants. I do not know what will happen at this point, but the only reference case we have is humans and ants, and if superintelligences decide that humans are an infestation, we will be exterminated."
Incidentally, this is the sort of thing I mean by painting LW style ideas as autistic (via David Pierce)
Sometimes David Pierce seems very smart. And sometimes he seems to imply that the ability to think logically while on psychedelic drugs is as important as 'autistic intelligence'. I don't think he believes that autistic people are zombies lacking subjective experience, but that does seem to be implied.
I think that most people have already heard that AI could be a catastrophic risk, and they already have their opinions about it. Maybe their opinions are wrong.
What is the goal of such an elevator pitch?
I think the message should be the following: While it is known that AI could be catastrophic, the only organisation (MIRI) doing the most serious research on its prevention is underfunded. Providing funding to them could dramatically change the probability of human survival, and we could estimate that 1 USD donated to them will save 10 human lives.
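The "1 USD saves 10 lives" figure is the kind of claim that comes from a back-of-envelope Fermi estimate: expected lives saved equals world population times the assumed shift in survival probability, divided by the donation size. A minimal sketch, with every input number a made-up illustrative assumption (not a figure from this thread or from MIRI):

```python
# Hypothetical Fermi estimate behind a "1 USD saves N lives" pitch.
# All three inputs are illustrative assumptions, not established figures.
world_population = 7.5e9     # people alive (~2016)
donation = 1e6               # assumed marginal donation, in USD
delta_p_survival = 1e-3      # assumed shift in P(humanity survives) from that donation

expected_lives_saved = world_population * delta_p_survival  # 7.5 million
lives_per_dollar = expected_lives_saved / donation
print(lives_per_dollar)  # 7.5
```

The point of the sketch is that the headline number is extremely sensitive to delta_p_survival, which is exactly the input nobody can measure; that is what the skeptical reply below is pressing on.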
In our circle that might be true, but many people don't have an opinion that goes beyond the Terminator.
Yes. So we have to utilise this knowledge. We could say something like: The Terminator appeared because its progenitor, the Skynet computer, received a command to protect the US, and concluded that the best way to do so was to prevent humans from switching it off, so it decided to exterminate humans. So the Terminator appeared because of the unsolved problem of value alignment.
Is that the canon explanation? I thought Skynet was acting out of self-preservation.
It is not exactly the canon explanation, but (the following is my speculation, which could be used in a discussion about AI values if the Terminator is mentioned) the decision to preserve itself must follow from its main task: winning a nuclear war.
Winning a nuclear war includes as a subgoal a very high-priority one: ensuring the survival of the command center. Basically, a country that manages to preserve its command center is winning the nuclear war. So it seemed rational to Skynet's programmers to make preserving Skynet the main goal, as it is the same as winning the nuclear war (but only once a nuclear war has started).
But Skynet concluded that in peacetime the main risk to its goal of command-center survival was people, and decided to kill them all. So it acted as a paperclip maximizer for the goal of command-center preservation.
It also probably started self-improvement only after it had killed most people, as it was already a powerful system. So it escaped the main chicken-and-egg problem of a Seed AI: which comes first, self-improvement or the malicious decision to kill people?
Your version is great as rational fanfic, but in an actual debate I'd say that it's generally best not to base ideas on action movies. Having said that, I do like the bit where the Terminator has been told not to kill anyone, so he shoots them in the kneecaps.
Is any of this true? "Most serious"? "Dramatically change probability of human survival"? 10 lives per $1?
I just provided an example of a possible pitch, and I think that some people at MIRI think this way. I wanted to show that the pitch must contain new information and be actionable.