I'm unsure how much an AI could gather from a single human's text input. I know that I, at least, miss a lot of information that I could in theory pick up.
At most, the number of bits contained in the text input, which is really not much, minus the number of bits non-AGI algorithms could identify and destroy (like speech patterns). The AI would also have to identify and throw out any fake information inserted into the stream (without knowing whether the majority of the information is real or fake). The exploitable information is going to be scarce and noisy even for a perfect AI.
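To put a rough number on that ceiling, here is a minimal sketch. It uses the zlib-compressed size of the text as a crude stand-in for "the number of bits contained in the text input", then subtracts whatever a scrubbing step is assumed to have destroyed. The function names and the `scrubbed_bits` parameter are hypothetical, and a general-purpose compressor only gives a loose upper bound, not the true information content.

```python
import zlib


def upper_bound_bits(text: str) -> int:
    """Loose upper bound on the information in `text`, in bits.

    Assumption: the smaller of the raw UTF-8 size and the zlib-compressed
    size is an acceptable proxy. A better model of the writer would give
    a tighter (smaller) bound.
    """
    raw = text.encode("utf-8")
    compressed = zlib.compress(raw, 9)
    return min(len(raw), len(compressed)) * 8


def exploitable_bits(text: str, scrubbed_bits: int) -> int:
    """Bits left after non-AGI filters remove `scrubbed_bits` (hypothetical figure)."""
    return max(0, upper_bound_bits(text) - scrubbed_bits)
```

The remaining bits still include any fake information inserted into the stream, so the truly exploitable amount is lower again.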
An AI using timeless decision theory could easily compensate for having multiple AIs with unshared memory just by attempting to determine what the other AIs would say.
I don't think it could "easily" do it. For all intents and purposes, you are forcing the AI to figure out a path-invariant strategy against a party that has better information than it does (that is to say, that party knows what's been played out, and the AI doesn't). This is considerably more difficult than the standard experiment.
Now, sure, the AI is smart, but every wrench you throw in its way forces it to allocate more resources to fooling you. This, in itself, is problematic for the AI: every resource it allocates to fooling you is one it does not allocate to doing useful work for you, so its maximal efficiency is technically lower than the maximal efficiency of an AI that wouldn't try concealing anything. This is a leak of information, so in a sense, if you are ranking a group of heterogeneous AIs by efficiency, it is slightly more likely that the better-performing ones are the least deceitful. The more traps you spring, the more work the AI has to do in order to avoid them, and the greater the statistical gap will be. That doesn't tell you whether the least deceitful AI can be trusted, obviously, but it shows there are limits to what it can hide from you.
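The ranking argument can be illustrated with a toy simulation. Everything in it is an assumption made for illustration: capability is drawn from a Gaussian, half the AIs are deceitful, and each trap costs a deceitful AI a fixed fraction of its efficiency. It only shows the direction of the effect, not its real-world size.

```python
import random


def deceitful_share_in_top(n_agents=1000, traps=5, cost_per_trap=0.02, seed=0):
    """Fraction of deceitful AIs among the top 10% ranked by efficiency.

    All parameters are hypothetical. A deceitful AI loses
    `cost_per_trap` of its efficiency for every trap it must evade.
    """
    rng = random.Random(seed)
    agents = []
    for _ in range(n_agents):
        capability = rng.gauss(1.0, 0.1)   # raw ability to do useful work
        deceitful = rng.random() < 0.5     # assumed 50/50 split
        overhead = min(1.0, traps * cost_per_trap) if deceitful else 0.0
        efficiency = capability * (1.0 - overhead)
        agents.append((efficiency, deceitful))

    # Rank by observed efficiency, best first, and inspect the top decile.
    agents.sort(key=lambda a: a[0], reverse=True)
    top = agents[: max(1, n_agents // 10)]
    return sum(1 for _, is_deceitful in top if is_deceitful) / len(top)


if __name__ == "__main__":
    for traps in (0, 2, 5, 10):
        print(traps, deceitful_share_in_top(traps=traps))
```

With no traps, the top of the ranking says nothing about deceit. As the number of traps grows, deceitful AIs get pushed out of the top decile, which is the statistical gap described above.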
Now, all I'm really saying is this: the AI's cleverness comes at a cost, namely that it has to cover its ass for every possible experimental framework you might subject it to. Since it is in a box, it only has the resources you provide, but on your own side, you have a theoretically unlimited amount of resources. Smarts can only defeat so much brute force, and by transferring hardware resources from the AI to your own control, you can set the "power gap" arbitrarily. It is easy to underestimate AI, but I think we also tend to underestimate the arsenal a gatekeeper would realistically have.
Do keep in mind that, no matter how well-boxed the AI is from the Internet and from sense-data about our world, as a self-modifying AGI it still has access to its own source code, which is descended from a human artifact (the seed AI). The AGI can learn a great deal about human psychology by observing how we code, and a project as large and multi-staged as an AGI is likely to be will contain a whole lot of bits to work with. (Certainly more than is strictly necessary.)
Summary
Furthermore, in the last thread I have asserted that
It would be quite bad for me to assert this without backing it up with a victory. So I did.
First Game Report - Tuxedage (GK) vs. Fjoelsvider (AI)
Second Game Report - Tuxedage (AI) vs. SoundLogic (GK)
Testimonies:
State of Mind
Post-Game Questions
$̶1̶5̶0̶ $300 for any subsequent experiments regardless of outcome, plus an additional $̶1̶5̶0̶ $450 if I win. (Edit: Holy shit. You guys are offering me crazy amounts of money to play this. What is wrong with you people? In response to incredible demand, I have raised the price.) If you feel queasy about giving me money, I'm perfectly fine with this money being donated to MIRI. It is also personal policy that I do not play friends (since I don't want to risk losing one), so if you know me personally (as many on this site do), I will not play regardless of monetary offer.
Advice
These are tactics that have worked for me. I do not insist that they are the only tactics that exist, just some of the many possible ones.
Playing as Gatekeeper
Playing as AI
PS: Bored of regular LessWrong? Check out the LessWrong IRC! We have cake.