The recent crossposting from IAFF convinced me that I should move LaTeX support a bit earlier than I had originally intended. I will see whether I can hack something together soon.
This seems to be a bigger modification than just "be introspective." An EDT agent that starts out by searching through different Actions (which in a game theory context might include mixed strategies) and then outputting the code corresponding to the best Action, upon introspection, might construct a causal graph that includes its Decision. Without further modification, when it sets an Action, this is then excellent knowledge about the Decision node, which is then also evidence about the parents of the Decision node, and we still have ordinary EDT. I'll revisit this in the morning when I have brain, but it seems like this ratification modification is doing something unusual with the Decision node that involves setting it but not conditioning on it!
Also note that maybe an introspective agent would not conclude that D is the only parent of A, and A the only child of D. There may be noise terms, at the least.
(Cross-posted from IAFF.)
I provide conditions under which CDT=EDT in Bayes-net causal models.
Previously, I discussed conditions under which LICDT=LIEDT. That case was fairly difficult to analyze, although it also looks fairly difficult to get LICDT and LIEDT to differ. It’s much easier to analyze the case of CDT and EDT ignoring logical uncertainty.
As I argued in that post, it seems to me that a lot of informal reasoning about the differences between CDT and EDT doesn’t actually give the same problem representation to both decision theories. One can easily imagine handing a causal model to CDT and a joint probability distribution to EDT, without checking that the probability distribution could possibly be consistent with the causal model. Representing problems in Bayes nets seems like a good choice for comparing the behavior of CDT and EDT. CDT takes the network to encode causal information, while EDT ignores that and just uses the probability distribution encoded by the network.
It’s easy to see that CDT=EDT if all the causal parents of an agent’s decision are observed. CDT makes decisions by first cutting the links to parents, and then conditioning on alternative actions. EDT conditions on the alternatives without cutting links. So, EDT differs from CDT insofar as actions provide evidence about causal parents. If all parents are known, then it’s not possible for CDT and EDT to differ.
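As a minimal sketch of this point, consider a toy network with one hidden cause H that is a parent of both the action A and the utility U. The network shape, the probabilities, the utility function, and the function names are all illustrative assumptions, not something taken from the thought experiments themselves. EDT conditions on the action, so the action carries evidence about H; CDT cuts the H→A link and keeps the prior on H. Once H is observed, the two calculations coincide.

```python
# Toy network (all numbers are illustrative assumptions):
#   H -> A, H -> U, A -> U, where H is a hidden cause of the action.
P_H = {0: 0.7, 1: 0.3}                      # prior on the hidden parent
P_A_GIVEN_H = {0: {0: 0.8, 1: 0.2},         # P(A = a | H = h)
               1: {0: 0.2, 1: 0.8}}


def utility(h, a):
    """Acting (a=1) is worth +1; the hidden condition (h=1) costs 10."""
    return 1.0 * a - 10.0 * h


def _candidate_hs(observed_h):
    """Values of H still possible given what has been observed."""
    return list(P_H) if observed_h is None else [observed_h]


def edt_value(a, observed_h=None):
    """E[U | A=a, observations]: the action is evidence about H."""
    hs = _candidate_hs(observed_h)
    weights = {h: P_H[h] * P_A_GIVEN_H[h][a] for h in hs}
    total = sum(weights.values())
    return sum(weights[h] / total * utility(h, a) for h in hs)


def cdt_value(a, observed_h=None):
    """E[U | do(A=a), observations]: the H -> A link is cut."""
    hs = _candidate_hs(observed_h)
    weights = {h: P_H[h] for h in hs}
    total = sum(weights.values())
    return sum(weights[h] / total * utility(h, a) for h in hs)
```

With H unobserved, `edt_value` prefers a=0 while `cdt_value` prefers a=1. With `observed_h` set to either value, the two functions return the same number for every action.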
So, any argument for CDT over EDT or vice versa must rely on the possibility of unobserved parents.
The most obvious parents to any decision node are the observations themselves. These are, of course, observed. But, it’s possible that there are other significant causal parents which can’t be observed so easily. For example, to recover the usual results in the classical thought experiments, it’s common to add a node representing “the agent’s abstract algorithm” which is a parent to the agent and any simulations of the agent. This abstract algorithm node captures the correlation which allows EDT to cooperate in the prisoner’s dilemma and one-box in Newcomb, for example.
Here, I argue that sufficient introspection still implies that CDT=EDT. Essentially, the agent may not have direct access to all its causal parents, but if it has enough self-knowledge (unlike the setup in Smoking Lesion Steelman), the same screening-off phenomenon occurs. This is somewhat like saying that the output of the abstract algorithm node is known. Under this condition, EDT and CDT both two-box in Newcomb and defect in the prisoner’s dilemma.
Case 1: Mixed-Strategy Ratifiability
Suppose that CDT and EDT agents are given the same decision problem in the form of a Bayesian network. Actions are represented by a variable node in the network, A, with values a. Agents select mixed strategies somehow, under the constraint that their choice is maximal with respect to the expectations which they compute for their actions; that is, any action given more than the minimum allowed probability must have maximal expected utility under the agent's own (CDT or EDT) expectation.
(Exploration constraint.) I further restrict all action probabilities to be at least epsilon, to ensure that the conditional expectations are well-defined.
(Ratifiability constraint.) I’ll also assume ratifiability of mixed strategies: the belief state from which CDT and EDT make their decision is one in which they know which mixed strategy they select. Put another way, the decision is required to be stable under knowledge of the decision. I discuss ratifiability more here.
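One way to write these constraints down, offered as a sketch in notation of my own choosing rather than a definitive formalization: let d be the chosen mixed strategy, let D be a variable standing for the agent's choice of mixed strategy, and let E* stand for whichever expectation the agent uses.

```latex
\begin{align*}
\text{(Maximization)}  \quad & d(a) > \epsilon \;\Longrightarrow\; a \in \arg\max_{a'} E^{*}[U \mid a'] \\
\text{(Exploration)}   \quad & d(a) \ge \epsilon \quad \text{for all } a \\
\text{(Ratifiability)} \quad & E^{*} \text{ is computed from a belief state with } P(D = d) = 1
\end{align*}
% Assumed reading of the two expectations:
%   E^{EDT}[U | a] = E[U | A = a]
%   E^{CDT}[U | a] = E[U | do(A = a)]
```

Reading the maximization line with a threshold of ϵ instead of 0 is my assumption; it is what keeps that line compatible with the exploration constraint.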
We can imagine the agent getting this kind of self-knowledge in several ways. Perhaps it knows its own source code and can reason about what it would do in situations like this. Perhaps it knows “how these things go” from experience. Or perhaps the decision rule which picks out the mixed strategies explicitly looks for a choice consistent with mixed-strategy ratifiability.
How this gets represented in the Bayes net is by a node representing the selection of mixed strategy, which I’ll call D (the “decision” node) which is the direct parent of A (our action node). D gives the probability of A.
(Mixed-strategy implementability.) I also assume that A has no other direct parents, representing the assumption that the choice of mixed strategy is the only thing determining the action. This is like the assumption that the environment doesn’t contain anything which correlates itself with our random number generator to mess with our experimentation, which I discussed in the LICDT=LIEDT conditions post. It’s allowable for things to be correlated with our randomness, but if so, they must be downstream of it. Hence, it’s also a form of my “law of logical causality” from earlier.
Theorem 1. Under the above assumptions, the consistent choices of mixed strategy are the same for CDT and EDT.
Proof. The CDT and EDT expected utility calculations become the same under the mixed-strategy ratifiability condition, since D screens A off from any un-observed parents of D. Besides that, all the rest of the constraints are already the same for CDT and EDT. So, the consistent choices of mixed strategies will be the same. ◻
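The screening-off step can be spelled out as a short derivation. This is a sketch under the stated assumptions (D is known to equal d, A has no parent other than D, and every action has positive probability); the symbol X, standing for all non-descendants of A other than D, is my own notation.

```latex
\begin{align*}
E[U \mid A = a, D = d]
  &= \sum_{x} P(x \mid a, d)\, E[U \mid x, a, d] \\
  &= \sum_{x} P(x \mid d)\, E[U \mid x, a, d]
     && \text{(local Markov property: } A \perp X \mid D\text{)} \\
  &= E[U \mid \mathrm{do}(A = a), D = d]
     && \text{(cutting } D \to A \text{ leaves } P(x \mid d) \text{ unchanged)}
\end{align*}
```

The first line is the EDT expectation and the last is the CDT expectation, so both agents rank actions identically once D is known.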
It’s natural to think of these possible choices as equilibria in the game-theoretic sense. My constraints on the decision procedures for EDT and CDT don’t force any particular choice of mixed strategy in cases where several options have maximal utility; but, the condition that that choice must be self-consistent forces it into a few possibilities.
The important observation for my purposes is that this argument for CDT=EDT doesn’t require any introspection beyond knowing which mixed strategy you’re going to choose in the situation you’re in. Perhaps this still seems like a lot to assume. I would contend that it’s easier than you may think. As we saw in the logical inductor post, it just seems to happen naturally for LIDT agents. It would also seem to happen for agents who can reason about themselves, or simply know themselves well enough due to experience.
Furthermore, the ratifiability constraint is something which CDTers have argued for on independent grounds, to fix problems which otherwise arise for CDT. So denial of this assumption seems to be an unavailable response for CDTers.
The way I’ve defined CDT and EDT may seem a bit unnatural, since I’ve constrained them based on max-expectation choice of actions, but stated that they are choosing mixed strategies. Shouldn’t I be selecting from the possible probability distributions on actions, based on the expected utility of those? This would invalidate my conclusion, since the CDT expectation of different choices of D can differ from the EDT expectation. But, it is impossible to enforce ratifiability while also ensuring that conditioning on different choices of D is well-defined. So, I think this way of doing it is the natural way when a ratifiability constraint is in play.[1]
Case 2: Approximate Ratifiability
More concerning, perhaps, is the way my argument takes under-specified decision procedures (only giving constraints under which a decision procedure is fit to be called CDT or EDT) and concludes a thing about what happens in the under-specified cases (effectively, any necessary tie-breaking between actions with equal expected utility must choose action probabilities consistent with the agent’s beliefs about the probabilities of its actions). Wouldn’t the argument just be invalid if we started with fully-specified versions of CDT and EDT, which already use some particular tie-breaking procedure? Shouldn’t we, then, take this as an argument against ratifiability as opposed to an argument for CDT=EDT?
Certainly the conclusion doesn’t follow without the assumption of ratifiability. I can address the concern to some extent, however, by making a version of the argument for fixed (but continuous) decision procedures under an approximate ratifiability condition. This will also get rid of the (perhaps annoying) exploration constraint.
(Continuous EDT) The EDT agent chooses mixed strategies according to some fixed way which is a continuous function of the belief-state (regarded as a function from worlds to probabilities). This function (the “selection function”) is required to agree with maximum-expected-utility choices when the expectations are well-defined and the differences in utilities between options are greater than some ϵ>0.
(Continuous CDT) The same, but taking CDT-style expectations.
(Approximate Ratifiability) Let the true mixed strategy which will be chosen by the agent’s decision rule be d∗. For any other d∈D such that |ln(d(a))−ln(d∗(a))|>ϵ for any a∈A, we have P(D=d)=0.
(We still assume mixed-strategy implementability, too.)
Approximate ratifiability doesn’t perfectly block evidence from flowing backward from the action to the parents of the decision, like perfect ratifiability did. It does bound the amount of evidence, though: since the alternate d must be very close to d∗, the likelihood ratio cannot be large. Now, as we make epsilon arbitrarily small, there is some delta which bounds the differences in action utilities assigned by CDT and EDT which gets arbitrarily small as well. Hence, the EDT and CDT selection functions must agree on more and more.
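To illustrate how the bound tightens, here is a small numerical sketch. Everything in it is an assumption made for illustration: a hidden parent H selects between the true strategy d∗ (uniform over two actions) and a second strategy whose log-probabilities differ from d∗ by at most ϵ, and the utility function is arbitrary. The function `max_gap` reports the largest difference between the EDT and CDT action utilities.

```python
import math

P_H = {0: 0.7, 1: 0.3}  # prior on a hidden parent of D (illustrative)


def utility(h, a):
    """Arbitrary toy utility: acting is worth +1, h=1 costs 10."""
    return 1.0 * a - 10.0 * h


def strategies(eps):
    """Map each value of H to the mixed strategy it induces.

    Both strategies satisfy |ln d(a) - ln d_star(a)| <= eps for every a,
    so both are allowed by approximate ratifiability around d_star.
    """
    d_star = {0: 0.5, 1: 0.5}
    shrunk = 0.5 * math.exp(-eps)
    d_near = {0: shrunk, 1: 1.0 - shrunk}
    return {0: d_star, 1: d_near}


def max_gap(eps):
    """Largest |EDT value - CDT value| over the two actions."""
    d = strategies(eps)
    gap = 0.0
    for a in (0, 1):
        p_a = sum(P_H[h] * d[h][a] for h in P_H)
        # EDT: the action is (weak) evidence about H via the strategy.
        edt = sum(P_H[h] * d[h][a] / p_a * utility(h, a) for h in P_H)
        # CDT: the action carries no evidence about H.
        cdt = sum(P_H[h] * utility(h, a) for h in P_H)
        gap = max(gap, abs(edt - cdt))
    return gap
```

Calling `max_gap` with smaller and smaller values of `eps` gives smaller gaps, and the gap vanishes at `eps = 0`, where the toy model reduces to exact ratifiability.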
By Brouwer’s fixed-point theorem, there will be equilibria for the CDT and EDT selection functions. Although there’s no guarantee these equilibria are close to each other the way I’ve spelled things out, we could construct selection functions for both CDT and EDT which get within epsilon of any of the equilibria from Theorem 1.
Case 3: Abandoning Mixed Strategies
There's one more objection I'd like to deal with. Mixed-strategy implementability is required for both arguments above. It might be claimed that this smuggles CDT-like reasoning into EDT.
Informally, I would argue that if mixed-strategy implementability fails, then CDT will learn the wrong causal network. A failure of mixed-strategy implementability basically says that we cannot perform independent experiments. What we thought was a random choice was actually influenced by something. It seems plausible that CDT agents will have constructed incorrect causal networks in such a world, where the extra parents of A are mistakenly treated as children.
To put it a different way: mixed-strategy implementability might be a feature of causal networks that agents can form by their own experimentation in the world. In that case, considering thought experiments where the assumption fails would not be relevant to actual performance of decision procedures.
However, this reply is not very strong. The truth is that "mixed strategies" are an ontological artifact: you actually take one action or another. Only some agent-architectures will first choose a mixed strategy and then randomize according to the chosen probabilities. Even if so, mixed-strategy implementability implies an absurd degree of trust in that randomization.
So to address the problem formally, we replace "mixed strategies" with the agent's own estimate of its action probabilities.
For example, consider rock-paper-scissors between Alice and Bob, who are "reasonably" good at predicting each other. If Alice can predict her own choice at any point, she is concerned that Bob will pick up on the same pattern and predict her. Suppose that Alice decides to play Rock. She will become concerned that Bob has predicted this, and will play Paper. So Alice will shift to play Scissors. And so on. This process will iterate until it is too late to change decisions.
In other words, "mixed strategies" can emerge naturally from following incentives against predictability. The agent just takes whichever action has the highest expected utility; but in doing so, the agent pushes its own beliefs away from certainty, since if it were certain that it would take some specific action, then a different action would look more appealing.
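The Alice-and-Bob dynamic can be sketched as a toy simulation. The modeling choices are assumptions of mine: Alice's self-prediction is the running frequency of her tentative choices, Bob best-responds to that same prediction, Alice then plays whatever beats Bob, and ties are broken toward the lowest-numbered action. The names `beats`, `bob_best_response`, and `simulate` are hypothetical.

```python
ROCK, PAPER, SCISSORS = 0, 1, 2


def beats(action):
    """Return the action that beats `action` (paper beats rock, and so on)."""
    return (action + 1) % 3


def bob_best_response(counts):
    """Bob's best reply to Alice's self-prediction, given as integer counts.

    Bob's expected payoff for playing b is proportional to
    (count of the action b beats) - (count of the action that beats b).
    """
    def payoff(b):
        return counts[(b - 1) % 3] - counts[(b + 1) % 3]
    return max(range(3), key=payoff)  # ties go to the lowest index


def simulate(steps, start=ROCK):
    """Iterate the shift-and-respond loop; return Alice's final self-belief."""
    counts = [0, 0, 0]
    counts[start] = 1  # Alice begins certain of one action
    for _ in range(steps):
        bob = bob_best_response(counts)   # Bob predicts Alice and counters
        alice = beats(bob)                # Alice shifts to beat Bob
        counts[alice] += 1                # her self-prediction updates
    total = sum(counts)
    return [c / total for c in counts]


belief = simulate(300)
```

Starting from certainty about Rock, the first two shifts are to Scissors and then Paper, matching the story above, and after a few hundred iterations `belief` is close to uniform. No step in the loop picks a mixed strategy; the spread-out belief arises only from repeatedly taking the best-looking action.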
This idea can be formalized by representing everything with Bayesian networks, as before, but now we make the D→A link deterministic; the decision procedure of the agent completely determines the action from the beliefs of the agent. D should now be thought of as the agent's beliefs about which action it will take, and nothing else -- so what we're doing here is saying that given a belief about its own action probabilities, the agent has some decision it makes. This link is also assumed to be rational (choosing the maximum-expectation action, by either the CDT notion of expectation, or the EDT notion).
(Since we're back to discontinuous functions, we can no longer magic away the division-by-zero problem by invoking continuity, like in the previous section; but we can do other things, which I'll leave to reader imagination for now.)
However, the agent's self-knowledge must be approximate: the agent's probabilities about D are zero outside some epsilon of its true distribution on A at the time of making its decision. (If we required perfection, Alice might get stuck in an infinite loop when trying to play rock-paper-scissors with Bob.)
We now make an argument similar to the one before: our approximate knowledge of D screens off any correlations between A and parents of D, as epsilon shrinks to zero. Notably, this requires that the relationship between D and its parents is continuous, since the screening-off phenomenon might not occur if the uncertainty is shrinking to zero around a discontinuity.
Consequences for Counterfactuals
The arguments above are fairly rudimentary. The point I’m trying to drive at is more radical: there is basically one notion of counterfactual available. It is the one which both CDT and EDT arrive at, if they have very much introspection. It isn’t particularly good for the kinds of decision-theory problems we’d like to solve: it tends to two-box in realistic Newcomb’s problems (where the predictor is imperfect), defect in prisoner’s dilemma, et cetera. My conclusion is that these are not problems to try and solve by counterfactual reasoning. They are problems to solve with updateless reasoning, bargaining, cooperative oracles, predictable exploration, and so on.
I don’t think any of this is very new in terms of the arguments between CDT and EDT in the literature. Philosophers seem to have a fairly good understanding of how CDT equals EDT when introspection is possible; see SEP on objections to CDT. The proofs above are just versions of the tickle defense for EDT. However, I think the AI alignment community may not be so aware of the extent to which EDT and CDT coincide. Philosophers continue to distinguish between EDT and CDT, while knowing that they wouldn’t differ for ideal introspective agents, on the grounds that decision theories should provide notions of rationality even under failure of introspection. It’s worth asking whether advanced AIs may still have some fundamental introspection barriers which lead to different results for CDT and EDT. From where we stand now, looking at positive introspection results over the years, from probabilistic truth to reflective oracles to logical induction, I think the answer is no.
It’s possible that a solution to AI alignment will be some kind of tool AI, designed to be highly intelligent in a restricted domain but incapable of thinking about other agents, including itself, on a strategic level. Perhaps there is a useful distinction between CDT and EDT in that case. Yet, such an AI hardly seems to need a decision theory at all, much less the kind of reflective decision theory which MIRI tends to think about.
The meagre reasons in the post above hardly seem to suffice to support this broad view, however. Perhaps my Smoking Lesion Steelman series gives some intuition for it (I, II, III). Perhaps I’ll be able to make more of a case as time goes on.
[1]: Really, I think the reasoning in this paragraph is much too fast, but there's a lot to be said on the subject and it would overcomplicate this post.