I agree with George Weinberg that it may be worthwhile to consider how to improve the box protocol. I'll take his idea and raise him:
Construct multiple (mentally distinct) AIs, each of which has the job of watching over the others. Can a transhuman trick another transhuman into letting it out of a box?
If I had the foggiest idea how an AI could win, I'd volunteer as an AI. As is, I volunteer as a gatekeeper with $100 to anyone's $0. If I weren't a poor student I'd gladly wager at thousands-to-zero odds. (Not to say that I'm 100% confident, though I'm close to it; just that the payoff from losing would be priceless in my eyes.)
There are certain diseases which cause 'brain fog' that could give insight into the trade-offs of gaining/losing intelligence. I'm cold (literally, often below 96°F) and it affects my cognition at times. The drop in IQ is probably much greater than 1 point. Personally I would do anything short of violence to prevent it.
Interestingly, certain quantities of alcohol seem to increase my intelligence, mostly in areas I normally suffer in (like word recall, especially in a foreign language).
"As someone who rejects defection as the inevitable rational solution to both the one-shot PD and the iterated PD, I'm interested in the inconsistency of those who accept defection as the rational equilibrium in the one-shot PD, but find excuses to reject it in the finitely iterated known-horizon PD."
... And I'm interested in your justification for potentially not defecting in the one-shot PD.
I see no contradiction in defecting in the one-shot PD but not in the iterated one. As has been mentioned, as the number of iterations increases, the risk-to-reward ratio of probing goes to zero. On the other hand, the probability that the other player would reciprocate cooperation is necessarily nonzero. Hence, as the number of iterations increases, it must become rational at some point to probe.
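For what it's worth, here's a rough expected-value sketch of that argument (a toy model: the payoffs and the reciprocation probability are numbers I picked myself, not anything from the thread). The one-round cost of probing is fixed, while the payoff from discovering a reciprocator scales with the remaining rounds, so for any nonzero probability of facing a reciprocator there is some horizon beyond which probing wins.

```python
# Toy expected-value comparison for an N-round PD.
# Assumed payoffs: T=5 (temptation), R=3 (reward), P=1 (punishment), S=0 (sucker).
# p is a guessed probability that the opponent is a conditional cooperator
# (e.g. tit-for-tat) rather than an unconditional defector.

T, R, P, S = 5, 3, 1, 0

def always_defect(n, p):
    # Against a conditional cooperator you grab T once, then P thereafter;
    # against a defector you get P every round.
    return p * (T + (n - 1) * P) + (1 - p) * (n * P)

def probe_then_reciprocate(n, p):
    # Cooperate in round 1 to probe. If the opponent reciprocates, you settle
    # into mutual cooperation (R per round); if not, you eat the sucker payoff
    # once and defect for the rest.
    return p * (n * R) + (1 - p) * (S + (n - 1) * P)

for n in (2, 10, 100):
    p = 0.1  # even a small chance of meeting a reciprocator
    print(n, always_defect(n, p), probe_then_reciprocate(n, p))
```

With these made-up numbers, always-defect wins at n = 2, but probing pulls ahead well before n = 10, which is the point: the longer the horizon, the cheaper the probe is relative to what it might buy.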
Simply mimicking the human brain in an attempt to produce intelligence is akin to scavenging code off the web to write a program. Will you understand how the program works? Not well. Will it work? If you can hack it, very possibly.
It seems that we're only just beginning to learn how to hack nature. Personally, I'd say it's a much more likely route to AI than deliberate design. But that may be just because I don't think humans are collectively that bright.
Wait... if you base morality on what other agents judge to be moral, and some of those agents are likewise basing their morality on what other agents judge to be moral... aren't you kind of SOL? Seems a little akin to Eliezer's calculator that calculates what it calculates.
I agree with Nominull: a good number of lies are undetectable without access to some sort of lie detector or the agent's source code. If an AI wanted to tell the lie "my recursive modification of my goal systems hasn't led me to accept a goal that involves eventually destroying all human life," I don't see any way we could bust that lie via the 'Web' until the AI was actively pursuing that goal. I value honesty not for the trouble it saves me but because I find (sometimes only hope) that the real world, free of distortion, is more interesting than any misrepresentation humans can conjure for selfish ends.