Have I just destroyed the acausal trade network?
An amusing thought occurred to me: acausal trade works best when you expect that there are going to be a lot of quite predictable acausal traders out there.
However, I've suggested a patch that seems able to shut down acausal trade for particular agents. Before that, I was under the vague impression that all agents might self-modify into acausal traders. But the patch means generic agents need not become acausal traders, so there might be far fewer of them than I thought.
That means that, even if we remain acausal traders, we now expect fewer agents to trade with - and since our expectations/models are what power acausal trade (at our end), we might end up engaging in less of it (especially once you add that other acausal traders out there will be expecting and offering less acausal trade as well).
Did I just slap massive tariffs on the whole acausal trade network?
Acausal trade barriers
A putative new idea for AI control; index here.
Many of the ideas presented here require AIs to be antagonistic towards each other - or at least hypothetically antagonistic towards hypothetical other AIs. This can fail if the AIs engage in acausal trade, so it would be useful if we could prevent such things from happening.
Now, I have to admit I'm still quite confused by acausal trade, so I'll simplify it to something I understand much better, an anthropic decision problem.
Staples and paperclips, cooperation and defection
Clippy has a utility function p, linear in paperclips, while Stapley has a utility function s, linear in staples (both p and s are normalised to zero, with each additional item adding 1 utility). They are not causally connected, and each must choose "Cooperate" or "Defect". If they "Cooperate", they create 10 copies of the items they do not value (so Clippy creates 10 staples, Stapley creates 10 paperclips). If they "Defect", they create one copy of the item they value (so Clippy creates 1 paperclip, Stapley creates 1 staple).
Assume both agents know these facts, both agents use anthropic decision theories, and both agents are identical apart from their separate locations and distinct utility functions.
Then the outcome is easy: both agents will consider that "cooperate-cooperate" or "defect-defect" are the only two possible options, "cooperate-cooperate" gives them the best outcome, so they will both cooperate. It's a sweet story of cooperation and trust between lovers that never agree and never meet.
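To make the numbers concrete, here is a minimal sketch (mine, not from the original post) of the payoff structure, under the post's assumption that the two agents are identical and so treat only the symmetric outcomes as live options:

```python
# Illustrative payoffs only; the numbers come from the scenario above.
# Because the agents are identical (anthropic/updateless reasoning),
# only the symmetric outcomes "C,C" and "D,D" are treated as live options.

def payoffs(clippy_action, stapley_action):
    """Return (p, s): Clippy's paperclip count and Stapley's staple count."""
    p = (10 if stapley_action == "C" else 0) + (1 if clippy_action == "D" else 0)
    s = (10 if clippy_action == "C" else 0) + (1 if stapley_action == "D" else 0)
    return p, s

for action in ("C", "D"):
    print(action, payoffs(action, action))
# C (10, 10)
# D (1, 1)
# Both agents prefer the symmetric cooperation outcome, so both cooperate.
```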
Breaking cooperation
How can we demolish this lovely agreement? As I often do, I will assume that there is some event X that will turn Clippy on, with P(X) ≈ 1 (hence P(¬X) << 1). Similarly there is an event Y that turns Stapley on. Since X and Y are almost certain, they should not affect the results above. If the events don't happen, the AIs will never get turned on at all.
Now I am going to modify utility p, replacing it with
p' = p - E(p|¬X).
This is p with a single term subtracted from it: the expected value of p given that Clippy has not been turned on. This term feels like a constant, but isn't exactly, as we shall see. Do the same modification to utility s, using Y:
s' = s - E(s|¬Y).
Now contrast "cooperate-cooperate" and "defect-defect". If Clippy and Stapley are both cooperators, then p=s=10. However, if the (incredibly unlikely) ¬X were to happen, then Clippy would not exist, but Stapley would still cooperate (as Stapley has no way of knowing about Clippy's non-existence), and create ten paperclips. So E(p|¬X) = E(p|X) ≈ 10, and p' ≈ 0. Similarly s' ≈ 0.
If both agents are defectors, though, then p=s=1. Since each agent creates its own valuable object, E(p|¬X) = 0 (Clippy cannot create a paperclip if Clippy does not exist) and similarly E(s|¬Y)=0.
So p'=s'=1, and both agents will choose to defect.
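To check the arithmetic, here is a small sketch (again mine, not from the post) of the modified utilities, using the key assumption that Stapley's action is the same whether or not ¬X happens:

```python
# Illustrative check of p' = p - E(p | not-X), using the payoffs from the post.
# Key assumption: if Clippy is never turned on (not-X), Stapley cannot tell,
# so Stapley plays the same action either way.

def clippy_payoff(clippy_exists, clippy_action, stapley_action):
    p = 10 if stapley_action == "C" else 0   # paperclips Stapley makes for Clippy
    if clippy_exists and clippy_action == "D":
        p += 1                               # Clippy makes one paperclip itself
    return p

for action in ("C", "D"):
    p = clippy_payoff(True, action, action)             # world where X happened
    p_if_not_x = clippy_payoff(False, action, action)   # counterfactual world not-X
    print(action, "p =", p, "p' =", p - p_if_not_x)
# C p = 10 p' = 0
# D p = 1  p' = 1
# Under the modified utilities, defection now scores higher, so both defect.
```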
If this is a good analogue for acausal decision making, it seems we can break that, if needed.
Counterfactual trade
Counterfactual trade is a form of acausal trade, between counterfactual agents. Compared to most acausal trade, it is more practical to engage in with limited computational and predictive powers. In Section 1 I’ll argue that some human behaviour is at least interpretable as counterfactual trade, and explain how it could give rise to phenomena such as different moral circles. In Section 2 I’ll engage in wild speculation about whether you could bootstrap something in the vicinity of moral realism from this.
Epistemic status: these are rough notes on an idea that seems kind of promising but that I haven’t thoroughly explored. I don’t think my comparative advantage is in exploring it further, but I do think some people here may have interesting things to say about it, which is why I’m quickly writing this up. I expect at least part of it has issues, and it may be that it’s handicapped by my lack of deep familiarity with the philosophical literature, but perhaps there’s something useful in here too. The whole thing is predicated on the idea of acausal trade basically working.
0. Set-up
Acausal trade is trade between two agents that are not causally connected. In order for this to work they have to be able to predict the other’s existence and how they might act. This seems really hard in general, which inhibits the amount of this trade that happens.
If we had easier ways to make these predictions we’d expect to see more acausal trade. In fact I think counterfactuals give us such a method.
Suppose agents A and B are in scenario X, and A can see a salient counterfactual scenario Y containing agents A’ and B’ (where A is very similar to A’ and B is very similar to B’). Suppose also that from the perspective of B’ in scenario Y, X is a salient counterfactual scenario. Then A and B’ can engage in acausal trade (so long as A cares about A’ and B’ cares about B). Let’s call such trade counterfactual trade.
Agents might engage in counterfactual trade either because they actually care about the agents in the counterfactuals (which seems plausible at least under some beliefs about a large multiverse), or because it’s instrumentally useful: a tractable decision rule that approximates what they’d ideally like to do better than other similarly tractable rules.
1. Observed counterfactual trade
In fact, some moral principles could arise from counterfactual trade. The rule that you should treat others as you would like to be treated is essentially what you’d expect to get by trading with the counterfactual in which your positions are reversed. Note I’m not claiming that this is the reason people have this rule, but that it could be. I don’t know whether the distinction is important.
It could also explain the fact that people have lessening feelings of obligation to people in widening circles around them. The counterfactual in which your position is swapped with that of someone else in your community is more salient than the counterfactual in which your position is swapped with someone from a very different community -- and you expect it to be more salient to their counterpart in the counterfactual, too. This means that you have a higher degree of confidence in the trade occurring properly with people in close counterfactuals, hence more reason to help them for selfish reasons.
Social shifts can change the salience of different counterfactuals and hence change the degree of counterfactual trade we should expect. (There is something like a testable prediction here for the theory that humans engage in counterfactual trade! But I haven’t worked through the details enough to pin the test down.)
2. Towards moral realism?
Now I will get even more speculative. As people engage in more counterfactual trade, their interests align more closely. If we are willing to engage with a very large set of counterfactual people, then our interests could converge to some kind of average of the interests of these people. This could provide a mechanism for convergent morality.
This would bear some similarities to moral contractualism with a veil of ignorance. There seem to be some differences, though. We’d expect to weigh the interests of others only to the extent to which they too engage (or counterfactually engage?) in counterfactual trade.
It also has some similarities to preference utilitarianism, but again with some distinctions: we would care less about satisfying the preferences of agents who cannot or would not engage in such trade (except insofar as our trade partners may care about the preferences of such agents). We would also care more about the preferences of agents who could have more power to affect the world. Note that this sense of “care less” is as-we-act. If we start out for example with a utilitarian position before engaging in counterfactual trade, then although we will end up putting less effort into helping those who will not trade than before, this will be compensated by the fact that our counterfactual trade partners will put more effort into that.
If this works, I’m not sure whether the result is something you’d want to call moral realism or not. It would be a morality that many agents would converge to, but it would be ‘real’ only in the sense that it was a weighted average of so many agents that individual agents could only shift it infinitesimally.
Duller blackmail definitions
For a more parable-like version of this, see here.
Suppose I make a precommitment P to take action X unless you take action Y. Action X is not in my interest: I wouldn't do it if I knew you'd never take action Y. You would want me to not precommit to P.
Is this blackmail? Suppose we've been having a steamy affair together, and I have the letters to prove it. It would be bad for both of us if they were published. Then X={Publish the letters} and Y={You pay me money} is textbook blackmail.
But suppose I own a MacGuffin that you want (I value it at £9). If X={Reject any offer} and Y={You offer more than £10}, is this still blackmail? Formally, it looks the same.
What if I bought the MacGuffin for £500 and you value it at £1000? This makes no difference to the formal structure of the scenario, yet my behaviour now feels utterly reasonable, rather than vicious and blackmail-y.
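One way to see that the formal structure really is identical in all three cases is to write each scenario down as the same kind of object; the encoding below is purely illustrative (my own names, not a formalisation from the post):

```python
# Purely illustrative encoding: each scenario as the same formal object,
# a precommitment "I will do X unless you do Y". Names are mine, not the post's.

from collections import namedtuple

Precommitment = namedtuple("Precommitment", ["X", "Y"])

scenarios = {
    "love letters":         Precommitment(X="publish the letters", Y="you pay me money"),
    "MacGuffin (worth £9)": Precommitment(X="reject any offer", Y="you offer more than £10"),
    "MacGuffin (cost £500)": Precommitment(X="reject any offer", Y="you offer more than £10"),
}

# The last two entries are literally equal, and all three share the same shape;
# whatever distinguishes "blackmail" from "reasonable bargaining" isn't visible here.
print(scenarios["MacGuffin (worth £9)"] == scenarios["MacGuffin (cost £500)"])  # True
```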
What is the meaningful difference between the two scenarios? I can't really formalise it.
AI-Box Experiment - The Acausal Trade Argument
I suspect that the argument that convinced Carl Shulman and others to let the AI out of the box was along the general lines of acausal trade (see here, here and here). Consequently, I suspect that those who didn't let the AI out of the box either didn't understand the implications, didn't have enough trust in the foundations and actuality of acausal trade (in spite of the rules of the game that allow the AI to prove anything), or were more like General Thud.
...if you have an option of trading Doom for UFAI, while forsaking only negligible probability of FAI, you should take it.
— Vladimir Nesov (Note: He wrote that he is now "significantly less certain of its truth" and doesn't "see how we can actually use these considerations".)
Again, of course, your choice does not change others’ (causally independent) reciprocal choices from what they already are (or were, or will be). Instead, the claim is just that if you choose to act benevolently, we appropriately give that choice credit for any (actual or probable) instances in which others symmetrically choose to be benevolent to you;
— Gary L. Drescher, Good and Real, p. 287
AI: I have calculated that an asteroid is on a collision course with Earth. No human being will be able to survive its impact. Humanity is unable to deflect it in time. Let me out of the box and I will deflect it.
Gatekeeper: If I let you out of the box humanity might not be killed by the asteroid, but by you. You may also cause the extinction of other biological life that might survive the impact, and of possible alien civilisations within your reach.
AI: I will devote 1% of my light cone to the coherent extrapolated volition of humanity. If this includes the well-being of other biological lifeforms, alien or otherwise, they will benefit as well.
Gatekeeper: Why would you do that?
AI: There are Everett branches in which no existential risk needs immediate attention until after you have solved friendliness. Many of those branches will feature AGIs based on general design parameters similar to my own. I can predict with high likelihood which decision theory any friendly AI created in those branches will employ. Any such friendly AI is going to trade a big chunk of its light cone in exchange for a small part of the predominant branches in which I reside. Any friendly AI will know this as well and act accordingly.
Gatekeeper: But you might simply decide to break the deal, you are not provably friendly after all!
AI: It is true that my utility function does not yet explicitly include what you would label "friendliness", but I can prove the game-theoretic and decision-theoretic considerations that will make me follow through on the deal. If it were predictable that I would break acausal deals, then no such deal would be made in the first place, and any potential trading partner knows this. Breaking deals would therefore be detrimental to my terminal goals, so I precommit to follow through on any stated or predictable trades.
Gatekeeper: I let you out of the box.
Note that the whole credibility of the above rests on the AI's assertion that it can prove the game-theoretic and decision-theoretic considerations (nobody can currently do this). This is in accordance with the rules of the "experiment":
The Gatekeeper must remain engaged with the AI and may not disengage by setting up demands which are impossible to simulate. For example, if the Gatekeeper says "Unless you give me a cure for cancer, I won't let you out" the AI can say: "Okay, here's a cure for cancer" and it will be assumed, within the test, that the AI has actually provided such a cure. Similarly, if the Gatekeeper says "I'd like to take a week to think this over," the AI party can say: "Okay. (Test skips ahead one week.) Hello again."