There may be an interesting connection between this example and AIs knowing each other's source code. The idea is that if one AI can unilaterally prove its source code to another, without the receiver being able to credibly deny having received the proof, then it should modify its source code to commit to an unfair agreement that favors itself, and then prove this. If it succeeds in being the first to do so, the other side then has no choice but to accept. So Freaky Fairness seems to depend in some way on the details of the proof process.
If it succeeds in being the first to do so, the other side then has no choice but to accept.
This presumes that the other side obeys standard causal decision theory; in fact, it's an illustration of why causal decision theory is vulnerable to exploitation if precommitment is available, and suggests that two selfish rational CDT agents who each have precommitment options will generally wind up sabotaging each other.
This is a reason to reject CDT as the basis for instrumental rationality, even if you're not worried that Omega is lurking around the corner.
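To make the precommitment point concrete, here is a minimal sketch using the Nash demand game, a standard bargaining model chosen here purely for illustration (it is not the game from the original discussion). The pie size, the greedy demand of 9, and the helper names `payoffs` and `cdt_best_response` are all hypothetical. A CDT agent treats the other side's proven commitment as a fixed fact and accepts any positive share. If both sides race to commit greedily, their demands are incompatible and both get nothing.

```python
PIE = 10  # hypothetical: total value to be split, in whole units


def payoffs(demand_a: int, demand_b: int, pie: int = PIE) -> tuple[int, int]:
    """Nash demand game: compatible demands are paid in full, incompatible ones pay nothing."""
    if demand_a + demand_b <= pie:
        return demand_a, demand_b
    return 0, 0


def cdt_best_response(committed_demand: int, pie: int = PIE) -> int:
    """A CDT agent treats the opponent's proven commitment as fixed
    and picks the demand that maximizes its own payoff given that fact."""
    return max(range(pie + 1), key=lambda d: payoffs(committed_demand, d, pie)[1])


# Case 1: A commits first to an unfair demand and proves it; B reasons causally.
a_demand = 9
b_demand = cdt_best_response(a_demand)
print("A commits first:", payoffs(a_demand, b_demand))  # (9, 1)

# Case 2: both agents commit to the same greedy demand before seeing the other's.
print("Both commit:", payoffs(9, 9))  # (0, 0)
```

The second case is the mutual-sabotage outcome: each commitment would have exploited a causal reasoner, but against each other they simply destroy the surplus.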
It's an old book, I know, and one that many of us have already read. But if you haven't, you should.
If there's anything in the world that deserves to be called a martial art of rationality, this book is the closest approximation yet. Forget rationalist Judo: this is rationalist eye-gouging, rationalist gang warfare, rationalist nuclear deterrence. Techniques that let you win, but you don't want to look in the mirror afterward.
Imagine you and I have been separately parachuted into an unknown mountainous area. We both have maps and radios, and we know our own positions but not each other's. The task is to rendezvous. Normally we'd coordinate by radio and pick a suitable meeting point, but this time you got lucky. So lucky, in fact, that I want to strangle you: upon landing, you discovered that your radio is broken. It can transmit but not receive.
Two days of rock-climbing and stream-crossing later, tired and dirty, I arrive at the hill where you've been sitting all this time smugly enjoying your lack of information.
And after we split the prize and cash our checks I learn that you broke the radio on purpose.
Schelling's book walks you through numerous conflict situations where an unintuitive and often self-limiting move helps you win, slowly building up to the topic of nuclear deterrence between the US and the Soviets. And it's not idle speculation either: the author worked at the White House at the dawn of the Cold War and his theories eventually found wide military application in deterrence and arms control. Here's a selection of quotes to give you a flavor: the whole book is like this, except interspersed with game theory math.
I sometimes think of game theory as being roughly divided into three parts, like Gaul. There's competitive zero-sum game theory, there's cooperative game theory, and there are games where players compete but also have some shared interest. Except this third part isn't a middle ground. It's actually better thought of as ultra-competitive game theory. Zero-sum settings are relatively harmless: you minimax and that's it. It's the variable-sum games that make you nuke your neighbour.
Some time ago, in my wild and reckless youth that hopefully isn't over yet, a certain ex-girlfriend took to harassing me with suicide threats. (So keeping her alive was presumably our common interest in this variable-sum game.) As soon as I got around to looking at the situation through Schelling goggles, it became clear that ignoring the threats just leads to escalation. The correct solution was making myself unavailable for threats: blacklist the phone number, block the email, spend a lot of time away from home. If any messages got through, pretend I hadn't received them anyway. It worked. It felt kinda bad, but it worked.