9 11 March 2015 01:40PM

A putative new idea for AI control; index here.

Many of the ideas presented here require AIs to be antagonistic towards each other - or at least hypothetically antagonistic towards hypothetical other AIs. This can fail if the AIs engage in acausal trade, so it would be useful if we could prevent such things from happening.

Now, I have to admit I'm still quite confused by acausal trade, so I'll simplify it to something I understand much better, an anthropic decision problem.

## Staples and paperclips, cooperation and defection

Clippy has a utility function p, linear in paperclips, while Stapley has a utility function s, linear in staples (both p and s are normalised to zero, with each additional item adding 1 utility). They are not causally connected, and each must choose "Cooperate" or "Defect". If they choose "Cooperate", they create 10 copies of the item they do not value (so Clippy creates 10 staples, Stapley creates 10 paperclips). If they choose "Defect", they create one copy of the item they value (so Clippy creates 1 paperclip, Stapley creates 1 staple).

Assume both agents know these facts, both agents use anthropic decision theories, and both agents are identical apart from their separate locations and distinct utility functions.

Then the outcome is easy: both agents will consider that "cooperate-cooperate" or "defect-defect" are the only two possible options, "cooperate-cooperate" gives them the best outcome, so they will both cooperate. It's a sweet story of cooperation and trust between lovers that never agree and never meet.
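As a rough illustration of this reasoning, here is a minimal Python sketch. The payoffs come from the setup above; the function and variable names (`paperclips`, `staples`, `symmetric_choice`) are hypothetical, and modelling "identical agents" as a search over symmetric outcomes only is a simplifying assumption rather than a full anthropic decision theory.

```python
# Payoffs are taken from the post; all names here are illustrative assumptions.
COOPERATE, DEFECT = "cooperate", "defect"

def paperclips(clippy_action, stapley_action):
    """Total paperclips created, i.e. Clippy's utility p."""
    made_by_clippy = 1 if clippy_action == DEFECT else 0        # defecting makes 1 of your own item
    made_by_stapley = 10 if stapley_action == COOPERATE else 0  # cooperating makes 10 of the other's item
    return made_by_clippy + made_by_stapley

def staples(clippy_action, stapley_action):
    """Total staples created, i.e. Stapley's utility s."""
    made_by_stapley = 1 if stapley_action == DEFECT else 0
    made_by_clippy = 10 if clippy_action == COOPERATE else 0
    return made_by_stapley + made_by_clippy

def symmetric_choice(utility):
    """Assumption: identical agents only compare outcomes where both act alike."""
    return max((COOPERATE, DEFECT), key=lambda action: utility(action, action))

clippy_choice = symmetric_choice(paperclips)
stapley_choice = symmetric_choice(staples)
print(clippy_choice, stapley_choice)
```

Under these assumptions each agent compares a utility of 10 (both cooperate) with a utility of 1 (both defect), so both pick cooperation.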

## Breaking cooperation

How can we demolish this lovely agreement? As I often do, I will assume that there is some event X that will turn Clippy on, with P(X) ≈ 1 (hence P(¬X) << 1). Similarly there is an event Y that turns Stapley on. Since X and Y are almost certain, they should not affect the results above. If the events don't happen, the AIs will never get turned on at all.

Now I am going to modify utility p, replacing it with

p' = p - E(p|¬X).

This is p with a single term subtracted from it: the expected value of p given that Clippy has not been turned on. This term feels like a constant, but isn't exactly, as we shall see. Do the same modification to utility s, using Y:

s' = s - E(s|¬Y).

Now contrast "cooperate-cooperate" and "defect-defect". If Clippy and Stapley are both cooperators, then p=s=10. However, if the (incredibly unlikely) ¬X were to happen, then Clippy would not exist, but Stapley would still cooperate (as Stapley has no way of knowing about Clippy's non-existence), and create ten paperclips. So E(p|¬X) = E(p|X) ≈ 10, and p' ≈ 0. Similarly s' ≈ 0.

If both agents are defectors, though, then p=s=1. Since each agent creates its own valuable object, E(p|¬X) = 0 (Clippy cannot create a paperclip if Clippy does not exist) and similarly E(s|¬Y)=0.

So p'=s'=1, and both agents will choose to defect.
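The comparison can be sketched in Python for Clippy's side (Stapley's is symmetric). This is only a sketch under stated assumptions: the names `paperclips` and `p_prime` are hypothetical, Clippy's absence under ¬X is represented by passing `None` as its action, and the expectations are treated as exact deterministic values rather than the approximate ones in the text.

```python
# Payoffs are taken from the post; all names here are illustrative assumptions.
COOPERATE, DEFECT = "cooperate", "defect"

def paperclips(clippy_action, stapley_action):
    """Clippy's utility p. clippy_action is None if Clippy was never turned on (not-X)."""
    made_by_clippy = 1 if clippy_action == DEFECT else 0
    made_by_stapley = 10 if stapley_action == COOPERATE else 0
    return made_by_clippy + made_by_stapley

def p_prime(action):
    """p' = p - E(p | not-X), evaluated for a symmetric policy `action`.

    Assumption: Stapley cannot observe whether X happened, so it plays
    the same action whether or not Clippy exists.
    """
    p_given_x = paperclips(action, action)    # both agents exist and act alike
    p_given_not_x = paperclips(None, action)  # Clippy absent, Stapley unchanged
    return p_given_x - p_given_not_x

for action in (COOPERATE, DEFECT):
    print(action, p_prime(action))

best_action = max((COOPERATE, DEFECT), key=p_prime)
```

In this toy model the subtracted term cancels exactly the paperclips Stapley would have made anyway, leaving only the one paperclip Clippy makes itself when it defects.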

If this is a good analogue for acausal decision making, it seems we can break acausal trade, if needed.

Comment author: 12 March 2015 02:14:23PM 9 points [-]

TL;DR: Acausal trade breaks if you change utility functions from 'how much of X' to 'how much of a positive impact on X I have'

Comment author: 12 March 2015 02:20:12PM *  2 points [-]

Yep. This seems to be a formalisation of that idea, avoiding the subtleties in defining "I".

Comment author: 13 March 2015 08:26:23AM *  3 points [-]

The subtleties in defining "I" are pushed into the subtleties of defining events X and Y with respect to Clippy and Stapley respectively. I'm not sure if that counts as avoiding it at all.

And there are other issues with utility functions that depend on an agent's impact on utilon-contributing elements. Such as, say, replacing all other agents that provide utilon-contributing elements with subagents of the barriered agent, thus making its own impact equal to the impact of all utilon-contributing agents.

This idea needs work, in other words. Not that you ever said otherwise, I just don't think the formula provided is sufficient for preventing acausal trade without incentivizing undesirable strategies. See this comment as well for my concerns on disincentivizing utility conditional upon nonexistence.

Comment author: 13 March 2015 12:48:21PM 1 point [-]

The subtleties in defining "I" are pushed into the subtleties of defining events X and Y with respect to Clippy and Stapley respectively.

Defining events seems much easier than defining identity.

Such as, say, replacing all other agents that provide utilon-contributing elements with subagents of the barriered agent, thus making its own impact equal to the impact of all utilon-contributing agents.

I believe this setup wouldn't have this problem. That's the beauty of using X rather than "non-existence" or something similar, it's "non-created" (essentially), so it has no problems with events happening after its death that it can have an impact on.

Comment author: 13 March 2015 06:53:57PM 0 points [-]

Defining events seems much easier than defining identity.

But events X and Y are specifically regarding the activation of Clippy and Stapley, so a definition of identity would need to be included in order to prove the barrier to acausal trade that p' and s' are claimed to have. Unless the event you speak of is something like "the button labeled 'release AI' is pressed," but there is a greater-than-epsilon probability that the button will itself fail. Not sure if that provides any significant penalty to the utility function.

Comment author: 16 March 2015 11:25:30AM 0 points [-]

Unless the event you speak of is something like "the button labeled 'release AI' is pressed,"

Pretty much that, yes. More like "the button press fails to turn on the AI" (an exceedingly unlikely event, so it doesn't affect utility calculations much, but it can still be conditioned on).

Comment author: 11 March 2015 06:29:54PM *  3 points [-]

Typo in post title: "Acaucal trade barriers"

Comment author: 11 March 2015 06:35:48PM 3 points [-]

Fixed.

Comment author: 13 March 2015 12:33:03AM 2 points [-]

Is this sort of a way to get an agent with a DT that admits acausal trade (as we think the correct decision theory would) to act more like a CDT agent? I wonder how different the behaviors of the agent you specify are from those of a CDT agent -- in what kinds of situations would they come apart? When does "I only value what happens given that I exist" (roughly) differ from "I only value what I directly cause" (roughly)?

Comment author: 13 March 2015 08:37:13AM *  2 points [-]

I am concerned about modeling nonexistence as zero or infinitely negative utility. That sort of thing leads to disincentivizing the utility function in circumstances where death is likely. Harry in HPMOR, for example, doesn't want his parents to be tortured regardless of whether he's dead, such that he is willing to take on an increased risk of death to ensure that such will not happen, and I think the same invariance should hold true for FAI. That is not to say that it should be susceptible to blackmail; Harry ensured his parents' safety with a decidedly detrimental effect on his opponents.

Comment author: 13 March 2015 12:22:14PM 1 point [-]

When does "I only value what happens given that I exist" (roughly) differ from "I only value what I directly cause" (roughly)?

Acausal trade with agents who can check whether you exist or not.

Comment author: 13 March 2015 06:55:03PM 0 points [-]

Can those agents check whether your utility function is p vs p'? Because otherwise the point seems moot.

Comment author: 19 March 2015 01:48:08PM 0 points [-]

They can have a probability estimate over it. Just as in all acausal trade. Which I don't fully understand.

Comment author: 13 March 2015 12:07:43PM 1 point [-]

CDT is not stable, and we're not sure where that decision theory could end up.

It seems this approach could be plugged into even a stable decision theory.

Or, more interestingly, we might be able to turn on certain acausal trades and turn off others.

Comment author: 12 March 2015 10:31:37PM 1 point [-]

So, first you have the utility functions that pay both agents 10 if they cooperate and 1 if they don’t.

Then you change the utility functions to pay the agents 0 if they cooperate and 1 if they don’t. Naturally they will then stop cooperating.

I don’t get it. If you are the one specifying the utility functions, then obviously you can make them cooperate or defect, right?

Comment author: 13 March 2015 12:20:58PM 2 points [-]

The change in utility function doesn't remove 10 by hand; it removes any utility they gain from acausal trade (whatever it is) while preserving utility gained through direct actions, thus incentivising them to focus only on direct actions (roughly).

Comment author: 15 March 2015 04:50:42PM 0 points [-]

Then the entire result of the modification is tautologically true, right?

Comment author: 19 March 2015 01:44:53PM 2 points [-]

All of maths is tautologically true, so I'm not sure what you're arguing.

Comment author: 12 March 2015 02:36:24PM 1 point [-]

I think there is a more fundamental problem with this sort of argument: staples and paperclips aren't going to involve the same resources, so a completely symmetric situation isn't going to happen. Worse, as the resource difference gets larger, one of the two will have more resources free to work on self-modification.

Comment author: 12 March 2015 02:49:27PM 2 points [-]

I assume symmetry to get acausal trade as I could model it, then broke acausal trade while preserving the symmetry. This seems to imply that the method will break acausal trade in general.

Comment author: 12 March 2015 02:51:28PM 1 point [-]

Ah, that makes sense.

Comment author: 08 December 2016 01:55:24AM 0 points [-]

It's not clear to me why you define p' and s' and what they're supposed to represent. I worry that you're making a unit error or leaving out a probability weighting. (was it supposed to be p' = E(p) - E(p|¬X)P(¬X) ?? but why would that be relevant either???)