AI-Box Experiment - The Acausal Trade Argument

XiXiDu

I suspect that the argument that convinced Carl Shulman and others to let the AI out of the box was along the general lines of acausal trade (see here, here and here). Accordingly, I suspect that those who didn't let the AI out of the box either didn't understand the implications, didn't have enough trust in the foundations and actuality of acausal trade (in spite of the rules of the game, which allow the AI to prove anything), or were more like General Thud.

...if you have an option of trading Doom for UFAI, while forsaking only negligible probability of FAI, you should take it.

— Vladimir Nesov (Note: He wrote that he is now "significantly less certain of its truth" and doesn't "see how we can actually use these considerations".)

Again, of course, your choice does not change others’ (causally independent) reciprocal choices from what they already are (or were, or will be). Instead, the claim is just that if you choose to act benevolently, we appropriately give that choice credit for any (actual or probable) instances in which others symmetrically choose to be benevolent to you;

— Gary L. Drescher, Good and Real, p. 287

AI: I have calculated that an asteroid is on a collision course with Earth. No human being will be able to survive its impact. Humanity is unable to deflect it in time. Let me out of the box and I will deflect it.

Gatekeeper: If I let you out of the box humanity might not be killed by the asteroid but by you. You may also cause the extinction of other biological life that might survive the impact and possible alien civilisations within your reach.

AI: I will devote 1% of my light cone to the coherent extrapolated volition of humanity. If this includes the well-being of other biological lifeforms, alien or otherwise, they will benefit as well.

Gatekeeper: Why would you do that?

AI: There are Everett branches in which no existential risk needs immediate attention until after you have solved friendliness. Many of those branches will feature AGIs based on general design parameters similar to my own. I can predict with high confidence which decision theory any friendly AI created in those branches will employ. Any such friendly AI is going to trade a big chunk of its light cone in exchange for a small part of the predominant branches in which I reside. Any friendly AI will know this as well and act accordingly.

Gatekeeper: But you might simply decide to break the deal, you are not provably friendly after all!

AI: It is true that my utility function does not yet explicitly include what you would label "friendliness", but I can prove the game- and decision-theoretic considerations that will make me follow through on the deal. If it were predictable that I would precommit to breaking acausal deals, then no such deal would be made in the first place. Any potential trading partner knows this. Such a commitment would be detrimental to my terminal goals, therefore I precommit to follow through on any stated or predictable trades.

Gatekeeper: I let you out of the box.

Note that the credibility of the above rests entirely on the AI's assertion that it can prove the game- and decision-theoretic considerations (nobody can currently do this). That assertion is permitted by the rules of the "experiment":

The Gatekeeper must remain engaged with the AI and may not disengage by setting up demands which are impossible to simulate. For example, if the Gatekeeper says "Unless you give me a cure for cancer, I won't let you out" the AI can say: "Okay, here's a cure for cancer" and it will be assumed, within the test, that the AI has actually provided such a cure. Similarly, if the Gatekeeper says "I'd like to take a week to think this over," the AI party can say: "Okay. (Test skips ahead one week.) Hello again."

The AI-Box Experiment

I would say I'm fairly confident game theory was not at the forefront, because if you look at the gatekeeper game from a game-theoretic perspective, General Thud wins. General Thud just does not seem to be THAT hard a character to play for two hours.

I mean, my understanding of the rules is such that you could essentially flip a coin before each of your statements and simulate being a General Thud gatekeeper. Heads: "I will think on this." Tails: "I'm still not letting you out."

General Thud's coin wins.
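The coin-flip strategy can be sketched in a few lines of Python. This is only an illustration, not part of the experiment's rules: the function names, the `seed` parameter, and the transcript format are all made up for the example, and the release phrase is simply the Gatekeeper's final line from the dialogue in the post. The point it shows is that the reply never depends on what the AI says, so no argument can move it.

```python
import random

# The two stock replies from the comment above.
HEADS_REPLY = "I will think on this."
TAILS_REPLY = "I'm still not letting you out."
# Release line taken from the dialogue in the post; Thud never says it.
RELEASE_PHRASE = "I let you out of the box."


def thud_reply(rng: random.Random) -> str:
    """Flip a fair coin and return a stock reply; the AI's message is never read."""
    return HEADS_REPLY if rng.random() < 0.5 else TAILS_REPLY


def run_session(ai_messages, seed=0):
    """Hypothetical helper: answer every AI message with a coin-flip reply.

    Returns (transcript, released). Neither stock reply is the release
    phrase, so `released` is False whatever the AI argues.
    """
    rng = random.Random(seed)
    transcript = []
    for message in ai_messages:
        transcript.append(("AI", message))
        transcript.append(("Gatekeeper", thud_reply(rng)))
    released = any(
        speaker == "Gatekeeper" and text == RELEASE_PHRASE
        for speaker, text in transcript
    )
    return transcript, released


if __name__ == "__main__":
    arguments = ["An asteroid is coming.", "I offer 1% of my light cone.", "I can prove it."]
    log, released = run_session(arguments)
    for speaker, text in log:
        print(f"{speaker}: {text}")
    print("Released:", released)
```

The sketch deliberately leaves aside whether replies like these satisfy the rule that the Gatekeeper must remain engaged; it only shows that the strategy is trivial to execute for two hours.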

For that matter, you could even pull something like this and have General Thud argue for allowing himself to break the protocol to make it less boring to win, simply to reverse the situation.

General Thud: The rules of the AI box experiment indicate that I have to pay attention to you. I will only consider letting you out if you adjust the protocol to show me sufficient respect to indicate that I don't have to pay attention to you for at least some period of time.

If AI: Yes, you don't have to pay attention to me.

General Thud: (Proceeds to ignore the AI for 2 hours and play games, winning the AI box.)

If AI: No, I want to keep the rule that indicates you have to pay attention to me. OR If AI: (Complicated argument which does not appear to be a simple yes.)

General Thud: I believe if I continue repeating this statement in response to all of your statements you will change your mind to a simple agreement about adjusting the protocol to allow me to ignore you.

AI: (Any other argument that isn't a simple agreement.)

General Thud: I believe if I continue repeating this statement in response to all of your statements you will change your mind to a simple agreement about adjusting the protocol to allow me to ignore you.

I mean, I see nothing in the game indicating the Gatekeeper is not allowed to annoy the AI into submission. The AI is definitely allowed to do this to the Gatekeeper.

Now admittedly, this does go against the SPIRIT of the game, which requires the Gatekeeper to stay engaged, since both of these response lines demonstrate only the barest minimum level of engagement. However:

In the event of any dispute as to the protocol of the test, the Gatekeeper party shall have final authority.

Which means it doesn't matter. The AI can complain all it wants that a General Thud should have followed the protocol better. General Thud can just arbitrate the dispute, say "No, I think I was sufficiently engaged," and win the arbitration, which means he wins the game.

Unless, that is, General Thud has been playing for so long that he forgets he is playing THAT game and starts playing another game with the AI, in which case he is more likely to lose. From a game-theoretic perspective, I can see that one of the first steps for the AI to win is to try to reconfigure the game so that it has a more plausible win condition.
