The entire point of Eliezer's demonstration was that if an AI wants to, it can increase its power base even starting from a text-only communication system. The entire point of my idea is that we can just build the AI such that it doesn't want to leave the box or increase its power base. It dodges that entire problem; that's the whole point.
You've gotten so used to being scared of boxed AI that you're reflexively rejecting my idea, I think, because your above objection makes no sense at all and is obviously wrong upon a moment's reflection. All of my bias-alarms have been going off since your second comment reply. Please evaluate yourself and try to distance yourself from your previous beliefs, for the sake of humanity. Also, here is a kitten; unless you want it to die, please reevaluate: http://static.tumblr.com/6t3upxl/Aawm08w0l/khout-kitten-458882.jpeg
Limitations on the AI restrict the range of things that the AI can create. Yes, if we just built whatever the AI said to and the AI was unfriendly, then we would lose. Obviously. Yes, if we assume that the UFAI is tricky enough to "circumvent any medium restrictions [we] place on it", then we would lose, practically by definition. But that assumption isn't warranted. (These super weak strawmen were other indications to me that you might be biased on this issue.)
I think a key component of our disagreement here might be that I'm assuming that the AI has a very limited range of inputs: it could only directly perceive the text messages that it would be sent. You're either assuming that the AI could deduce the inner workings of our facility and the world and the universe from those text messages, or that the AI had access to a bunch of information about the world already. I disagree with both assumptions. The AI's direct perception could be severely limited and should be, and it isn't magic, so it couldn't deduce the inner workings of our economy or the nature of nuclear fusion just through deduction (because knowledge comes from experience and induction). (You might not be making either of those assumptions; this is a guess in an attempt to help resolve our disagreement more quickly. Sorry if it's wrong.)
Also, I'm envisioning a system where observers whom the AI doesn't know, and whom the Gatekeepers don't know about, watch their communications. That omitted detail might be another reason for your disagreement. I just assumed it would be apparent for some stupid reason; my apologies.
I think we would have to be careful about what questions we asked the AI. But I see no reason why it could manipulate us automatically and inevitably, no matter what questions we asked it. I think extracting useful information from it would be possible, perhaps even easy. An AI in a box would not be God in a box, and I think that you and other people sometimes accidentally forget that. Just because it's dozens or hundreds of times smarter than us doesn't mean that we can't win, perhaps win easily, provided that we make adequate preparations for it.
Also, the other suggestions in my comment were really meant to supplement this. If the AI is boxed, and can be paused, then we can read all its thoughts (slowly, but reading through its thought processes would be much quicker than arriving at its thoughts independently) and scan for the intention to do certain things that would be bad for us. If it's probably a FAI anyways, then it doesn't matter if the box happens to be broken. If we're building multiple AIs and using them to predict what other AIs will do under certain conditions, then we can know whether or not AIs can be trusted (use a random number generator at certain stages of the process to prevent it from reading our minds, and hide the knowledge of the random number generator). These protections are meant to work with each other, not independently.
And I don't think it's perfect or even good, not by a long shot, but I think it's better than building an unboxed FAI because it adds a few more layers of protection, and that's definitely worth pursuing because we're dealing with freaking existential risk here.
The entire point of my idea is that we can just build the AI such that it doesn't want to leave the box or increase its power base.
Let's return to my comment four comments up. How will you formalize "power base" in such a way that being helpful to the gatekeepers is allowed but being unhelpful to them is disallowed?
I think, because your above objection makes no sense at all and is obviously wrong upon a moment's reflection.
If you would like to point out a part of the argument that does not follow, I would be happy to try and clarify i...