TheAncientGeek comments on Leaving LessWrong for a more rational life - Less Wrong
And then be reviled as the man who prevented a cure for cancer? Remember that the you in the story doesn't have the same information as the you outside the story -- he doesn't know that the AI isn't sincere.
"Please dont unplug me, I am about to find a cure for cancer" is a .placeholder for a class of exploits on the part of the AI where it holds a carrot in front of us. It's not going to literally come out with the cure for cancer thing under circumstances where it's not tasked with working on something like it it, because that would be dumb , and it's supposed to be superintelligent. But superintelligence is really difficult to predict....you have to imagine exploits, then imagine versions of them that are much better.
The hypothetical MIRI is putting forward is that if you task a super-AI with agentively solving the whole of human happiness, then it will have to have the kind of social, psychological and linguistic knowledge necessary to talk its way out of the box.
A more specialised AGI seems safer... and likelier... but then another danger kicks in: its creators might be too relaxed about boxing it, perhaps allowing it internet access... but the internet contains a wealth of information to bootstrap linguistic and psychological knowledge with.
There's an important difference between rejecting MIRI's hypotheticals because the conclusions don't follow from the antecedents, as opposed to doing so because the antecedents are unlikely in the first place.
Dangers arising from non-AI scenarios don't prove AI safety. My point was that an AI doesn't need effectors to be dangerous... information plus sloppy oversight is enough. However, the MIRI scenario seems to require a kind of perfect storm of fast takeoff, overambition, poor oversight, etc.
A superintelligence can be meta-deceptive. Direct inspection of code is a terrible method of oversight, since even simple AIs can work in ways that baffle human programmers.
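The point about inspection can be made concrete with a toy sketch (my own illustration, not from the comment): even a two-layer "network" with four hand-written weights hides its behaviour from a reader. The weights below are numbers I chose for the example; staring at them tells you almost nothing about what the function computes, and you only find out by running it.

```python
def act(x):
    """A hard threshold activation: fires (1.0) when the input is positive."""
    return 1.0 if x > 0 else 0.0

# Two-layer network with opaque hand-picked constants (hypothetical example).
W1 = [[1.0, 1.0], [1.0, 1.0]]
b1 = [-0.5, -1.5]
W2 = [1.0, -2.0]
b2 = -0.5

def net(a, b):
    # Hidden layer: two threshold units over the same two inputs.
    h = [act(W1[i][0] * a + W1[i][1] * b + b1[i]) for i in range(2)]
    # Output unit combines the hidden activations.
    return act(W2[0] * h[0] + W2[1] * h[1] + b2)

# Only by executing it do you discover it computes XOR:
for a in (0, 1):
    for b in (0, 1):
        print(a, b, int(net(a, b)))
```

Nothing in the raw constants announces "XOR"; one hidden unit happens to act as OR, the other as AND, and the output subtracts them. If four weights already demand execution rather than inspection to understand, code review as the sole oversight mechanism for a vastly larger system is hopeless.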
ETA: on the whole, I object to the antecedents/priors... I think the hypotheticals go through.