The Prisoner's Dilemma has been discussed to death here on OB/LW, right? Well, here are a couple of new twists to somewhat... uh... expand the discussion.
Warning: programming and math ahead.
Scenario 1
Imagine a PD tournament between programs that can read each other's source code. In every match, player A receives the source code of player B as an argument, and vice versa. Matches are one-shot, not iterated.
In this situation it's possible to write a program that's much better than "always defect". Yes, in an ordinary programming language like C or Python, no futuristic superintelligent oracles required. No, Rice's theorem doesn't cause any problems.
Here's an outline of the program:
// begin PREFIX
Strategy main(SourceCode other)
{
// get source code of this program from "begin PREFIX" to "end PREFIX",
// using ordinary quine (self-printing program) techniques
String PREFIX = "...";
if (other.beginsWith(PREFIX))
return Strategy.COOPERATE;
else
return anythingElse(other);
}
// end PREFIX
// from this point you can write anything you wish
Strategy anythingElse(SourceCode other)
{
return Strategy.DEFECT;
}
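To make the outline concrete, here's a minimal runnable sketch in Python. The harness is my assumption (hypothetical): the tournament calls main() with the opponent's full source as a string, and moves are the plain strings "COOPERATE" and "DEFECT". The quine trick itself is standard: the template, formatted with itself, reproduces the program's own text up to the end of main.

PREFIX_TEMPLATE = 'PREFIX_TEMPLATE = %r\n\ndef main(other_source):\n    prefix = PREFIX_TEMPLATE %% PREFIX_TEMPLATE\n    if other_source.startswith(prefix):\n        return "COOPERATE"\n    return anything_else(other_source)\n'

def main(other_source):
    prefix = PREFIX_TEMPLATE % PREFIX_TEMPLATE
    if other_source.startswith(prefix):
        return "COOPERATE"
    return anything_else(other_source)

# Everything below this point is outside the prefix; each author can put
# whatever logic they like here, as in anythingElse above.
def anything_else(other_source):
    return "DEFECT"

Note that the comparison is on raw text: a program only "begins with PREFIX" if it starts with exactly those bytes, whitespace included.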
Some features of this program:
- It cooperates with itself.
- It cooperates with any other program that begins with PREFIX.
- It's impossible to cheat: any opponent that begins with PREFIX is forced to cooperate with this program.
Other authors now have an incentive to include PREFIX in their programs, moving their original logic into the "anythingElse" subroutine. This modification has no downside.
So, introducing such a program into the tournament should lead to a chain reaction until everyone cooperates. Unless I've missed something. What say ye?
Edit: the last point and the conclusion were wrong. Thanks to Warrigal for pointing this out.
Scenario 2
Now imagine another tournament where programs can't read each other's source code, but are instead given access to a perfect simulator. So programs now look like this:
Strategy main(ObjectCode self, ObjectCode other, Simulator simulator) {...}
and can call simulator.simulate(ObjectCode a, ObjectCode b) arbitrarily many times, with any arguments. To give players a chance to avoid bottomless recursion, we also provide a random number generator.
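To pin down the interface, here's a hypothetical Python rendering; the convention that simulate(a, b) returns a's move against b is my assumption, not part of the problem statement:

def main(self_code, other_code, simulator):
    # Naive probe: ask the simulator what the opponent plays against us.
    their_move = simulator.simulate(other_code, self_code)
    return "COOPERATE" if their_move == "COOPERATE" else "DEFECT"

Paired against itself, this naive program never terminates: each simulation immediately spawns another. That's exactly the bottomless recursion the random number generator is meant to break.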
Problem: in this setting, is it possible to write a program that's better than "always defect"?
The most general form of a reasonable program I can imagine at the moment is a centipede (sketched in code after this list):
- Programmer invents a number N and a sequence of real numbers 0 < p1 < p2 < ... < pN < 1.
- Program generates a random number 0 < r < 1.
- If r < p1, cooperate.
- Simulate the opponent's reaction to you.
- If opponent defects, defect.
- Otherwise if r < p2, cooperate.
- And so on until N.
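Under the same assumed harness as above, the centipede might look like this in Python. The thresholds are illustrative, and I've made the final default (r >= pN with no observed defection) a defection, which the outline leaves implicit:

import random

THRESHOLDS = [0.2, 0.5, 0.8]  # illustrative 0 < p1 < ... < pN < 1, here N = 3

def main(self_code, other_code, simulator):
    r = random.random()
    for p in THRESHOLDS:
        if r < p:
            return "COOPERATE"
        # Probe: how does the opponent respond to a copy of us?
        # Each probe runs a fresh simulation with fresh randomness, which
        # is what makes termination a question of probability (Exercise 1).
        if simulator.simulate(other_code, self_code) == "DEFECT":
            return "DEFECT"
    return "DEFECT"  # assumed default after the last threshold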
Exercise 1: when (for what N and pi) does this program cooperate against itself? (To cooperate, the recursive tree of simulations must terminate with probability one.)
Exercise 2: when does this program win against a simple randomizing opponent? (A concrete one is sketched after these exercises.)
Exercise 3: what's the connection between the first two exercises, and does it imply any general theorem?
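For Exercise 2, a concrete "simple randomizing opponent" could be as plain as this (q = 1/2 is an arbitrary choice):

import random

def main(self_code, other_code, simulator):
    # Ignores both programs and the simulator entirely;
    # cooperates with fixed probability q.
    q = 0.5
    return "COOPERATE" if random.random() < q else "DEFECT"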
Epilogue
Ordinary humans playing the PD often rely on assumptions about their opponent. They may consider certain invariant properties of their opponent, like altruism, or run mental simulations. Such wetware processes are inherently hard to model, but even a half-hearted attempt yields startlingly rigorous formalizations in place of our usual vague intuitions about game theory.
Is this direction of inquiry fruitful?
What do you think?
Then the question is whether AIs can (a) trustworthily verify each other's source code, e.g. by sending in probes to do random inspections, which would include verifying that the inspected software is not deliberately opaque and is not set to accept coded overrides from outside; or (b) don't need to verify each other's source code at all, because the vast majority of initial conditions converge on the same obvious answer to PD problems.
(a) Random inspections probably won't work. It's easy to build code/hardware whose individual parts look innocent but together act as a backdoor; you won't detect it unless you can see the entire system as a whole.
Tim Freeman's "proof by construction" method is the only viable solution to the "prove your source code" problem that I've seen so far.
(b) is interesting, and seems to be a new idea. Have you written it up in more detail somewhere? If AIs stop verifying each other's source code, won't they want to modify their source code to play Defect again?