I'm implementing decision theory, in Python. There are three parts to this:
- An XML language for formally specifying decision theory dilemmas. It can express many dilemmas: Newcomb, Transparent Newcomb, Parfit's Hitchhiker, Death in Damascus, Smoking Lesion, Cosmic Ray, symmetric Prisoner's Dilemma, Counterfactual Mugging, and Sleeping Beautfy.
- Python code that implements CDT, EDT, and UDT.
- A simulator in Python that runs a dilemma using a decision theory to resolve choices. The decision theories all log their reasoning, so you can see the calculations they make in (often excruciating) detail. It's in Python because that's a language lots of people know.
This is all available on Github. My hope is that it can be a launching point to explore and formalize decision theory.
But all is not well! All of the above is implemented, though sometimes poorly or even incorrectly. I could use some guidance. Some questions, if anyone has answers:
- How can the Sleeping Beauty problem be expressed using a Bayesian Network? Imagine that Beauty is given a chance to bet each time she is woken up: there should be a node for "how she bets on Tuesday", but only if the coin lands on tails. I don't know how this would be expressed nicely with Bayesian Networks, and this is preventing me from trying to represent dilemmas using them.
- I'm representing dilemmas as a tree of possible events, where edges are labeled with things like "probability 50%" and "Alice chooses 'defect'". Thus, to represent two coin flips, you would have three nodes: one for the first coin flip, one for the second coin flip if the first landed heads, and one for the second coin flip if the first landed tails. This stands in contrast to Bayesian networks, which would represent two coin flips with just two nodes. My intuition is that it should be possible to convert between the tree representation and the Bayesian network representation, if you're provided with an equivalence relation between nodes in the tree representation (e.g. saying that the two "second coin flip" nodes are equivalent). Is this a thing? Has it been studied?
- I called one of the decision theories I implemented "UDT", but it's actually what I would term "pre-commitment decision theory": "act in the way you would have pre-committed to act". Is there a name for that decision theory? I suspect that it's indistinguishable from UDT and FDT in the space of single-agent dilemmas expressible using the XML language.
- I'm not happy with the current representation of Smoking Lesion and Cosmic Ray. How would you represent dilemmas like these? Remember that you need both a way of writing down the problem, and EDT code that takes that written representation and uses it to do poorly on a dilemma.
- How does one correctly handle multi-agent dilemmas, in which you know the other agents follow the same decision theory? My implementation of "UDT" defects in a prisoner's dilemma against an agent that it knows is following the same decision procedure. More precisely: Alice and Bob follow the same decision procedure, and they both know it. Alice will choose between cooperate/defect, then Bob will choose between cooperate/defect without knowing what Alice picked, then the utility will be delivered. My "UDT" decision procedure reasons as follows for Alice: "if I had pre-commited to cooperate, then Bob would know that, so he would defect, therefore I defect". Is there a known way out of this, besides special casing symmetric dilemmas, which is brittle? The FDT paper said something about "policies" as a way out of this. I have two ideas, which could both plausibly be referred to as 'policies': (i) when picking your action for a decision, you actually pick a set of actions for all agents and decisions, and use game theory to make it a satisfactory one for all agents; or (ii) when picking your action for a decision, you don't actually pick an action, you pick a mapping from what the other agent would do to action.
- How valuable is this? I think the decision theorists here are the main audience. I'm finding myself writing math now, and can imagine writing proofs within this framework. But it can only talk about dilemmas expressible within the XML-dilemma-language. I can imagine someone arguing that the interesting questions all lie outside of that.
Some things that surprised me:
- CDT and EDT aren't well defined, in the sense that their behavior depends on their prior over what their behavior is. This was mentioned in the FDT paper, but it didn't really hit home until I was coding them and need to pick a prior.
- Decision theory is complicated. The logs for the infallible Newcomb problem are ~100 lines long for all the decision theories, and none of that is extraneous (verbose, yes, but not extraneous). There's just a lot of "what I should do depends on whether box B is full, which depends on what the predictor predicted I would do, which depends on ...".
- The space of dilemmas is... smaller than I would have expected? I thought I'd be able to express a few of them, and then each new one would need an extension in the dilemma language. Instead, the dilemma language is very small.
- My view of decision theory now is that it's all about fixpoints. You solve some big equation, and inside of it is the same equation, and there are multiple fixpoint solutions, and you pick the (hopefully unique) best one.
Again, more detail is available in the Github repo README.
Ah, so I'm interested in normative decision theory: how one should ideally behave to maximize their own utility. This is what e.g. UDT&FDT are aiming for. (Keep in mind that "your own utility" can, and should, often include other people's utility too.)
Minimizing runtime is not at all a goal. I think the runtime of the decision theories I implemented is something like doubly exponential in the number of steps of the simulation (the number of events in the simulation is exponential in its duration; each decision typically involves running the simulation using a trivial decision theory).
That's an interesting approach I hadn't considered. While I don't care about efficiency in the "how fast does it run" sense, I do care about efficiency in the "does it terminate" sense, and that approach has the advantage of terminating.
You're doing to defect against UDT/FDT then. They defect against cooperate-bot. You're thinking it's bad to defect against cooperate-bot, because you have empathy for the other person. But I suspect you didn't account for that empathy in your utility function in the payoff matrix, and that if you do, you'll find that you're not actually in a prisoner's dilemma in the game-theory sense. There was a good SlateStarCodex post about this that I can't find.