RobbBB comments on I attempted the AI Box Experiment again! (And won - Twice!) - Less Wrong Discussion
Comments (163)
Although I'm worried about how the impossibility of boxing represents an existential risk, I find it hard to alert others to this.
The custom of not sharing powerful attack strategies is an obstacle. It forces me - and the people I want to discuss this with - to imagine how someone (and hypothetically something) much smarter than ourselves would argue, and we're not good at imagining that.
I wish I had a story in which an AI gets a highly competent gatekeeper to unbox it. If the AI strategies you guys have come up with could actually work outside the frame this game is played in, it should be quite a compelling story. Maybe even a movie script. That'd create interest in FAI among the short-attention-span population.
Mr Yudkowsky, wouldn't that be your kind of project?
I suspect Eliezer is avoiding this project for the same reason the word "singularity" was adopted in the sense we use it at all. Vinge coined it to point to the impossibility of writing characters dramatically smarter than himself.
Perhaps a large number of brilliant humans working together on a very short story / film for a long time could simulate superintelligence just enough to convince the average human that More Is Possible. But there would be a lot of risk of making people zero in on irrelevant details, and continue to underestimate just how powerful SI could be.
There's also a worry that the vividness of 'AI in a box' as a premise would continue to make the public think oracle AI is the obvious and natural approach and that we just have to keep working on doing it better. They'd remember the premise more than the moral. So, caution is warranted.
Also, there's hindsight bias. Most tricks won't work on everyone, but even if we find a universal trick that will work for the film, people who see it will afterward think it's obvious and that they could easily think their way around it. Making some of the AI's maneuvering mysterious would help combat this problem a bit, but would also weaken the story.
This is a good argument against the AI using a single trick. But Tuxedage describes picking 7-8 strategies from 30-40. The story could be about the last in a series of gatekeepers, after all the previous ones have been persuaded, each with a different, briefly mentioned strategy.
A lot of tricks could help solve the problem, yeah. On the other hand, the more effective tricks we include in the film, the more dangerous the film becomes in a new respect: We're basically training our audience to be better at manipulating and coercing each other into doing things. We'd have to be very careful not to let the AI become romanticized in the way a whole lot of recent movie villains have been.
Moreover, if the AI is persuasive enough to convince an in-movie character to temporarily release it, then it will probably also be persuasive enough to permanently convince at least some of the audience members that a superintelligence deserves to have complete power over humanity, and to kill us if it wants. No matter how horrific we make the end of the movie look, at least some people will mostly remember how badass and/or kind and/or compelling the AI was during a portion of the movie, rather than the nightmarish end result. So, again, I like the idea, but a lot of caution is warranted if we decide to invest much into it.
You can't stop anybody from writing that story.
I'm not asking whether we should outlaw AI-box stories; I'm asking whether we should commit lots of resources to creating a truly excellent one. I'm on the fence about that, not opposed. But I wanted to point out the risks at the outset.