EDIT: added a simplified version here.
Crossposted at the intelligent agents forum.
In Anthropic Decision Theory (ADT), behaviours that resemble the Self Sampling Assumption (SSA) derive from average utilitarian preferences (and from certain specific selfish preferences).
However, SSA implies the doomsday argument, and, to date, I hadn't found a good way to express the doomsday argument within ADT.
This post fills that hole by showing that average utilitarian agents exhibit a natural doomsday-like behaviour within ADT.
Anthropic behaviour
The comparable phrasings of the two doomsday arguments (probability and decision-based) are:
- In the standard doomsday argument, the probability of extinction is increased for an agent that uses SSA probability versus one that doesn't.
- In the ADT doomsday argument, an average utilitarian behaves as if it were a total utilitarian with a higher revealed probability of doom.
Thus in both cases, the doomsday agent believes or behaves as if it were a non-doomsday agent with a higher probability of doom.
Revealed probability of events
What are these revealed probabilities?
Well, suppose that X and X' are two events that may happen. The agent has a choice between betting on one or the other: if they bet on the first, they get a reward of r if X happens; if they bet on the second, they get a reward of r' if X' happens.
If an agent is an expected utility maximiser and chooses X over X', this implies that rP(X) ≥ r'P(X'), where P(X) and P(X') are the probabilities the agent assigns to X and X'.
Thus, observing the agent's choices as r and r' vary allows one to deduce the relative probabilities they assign to X and X'.
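As a rough illustration of how such a deduction could work, here is a minimal sketch. It assumes a hypothetical expected-utility agent whose probabilities are hidden from the observer, and an observer who fixes r = 1 and searches for the largest r' at which the agent still bets on X. The names `make_agent` and `revealed_ratio`, and the search bound `upper`, are inventions for this example, not part of ADT.

```python
def make_agent(p_x, p_x_prime):
    """Hypothetical expected-utility maximiser; its probabilities are hidden inside."""
    def bets_on_x(r, r_prime):
        # The agent bets on X exactly when r * P(X) >= r' * P(X').
        return r * p_x >= r_prime * p_x_prime
    return bets_on_x


def revealed_ratio(bets_on_x, upper=1e6, iterations=100):
    """Estimate P(X) / P(X') from choices alone.

    Assumes P(X') > 0 and that the true ratio lies below `upper`.
    """
    lo, hi = 0.0, upper
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if bets_on_x(1.0, mid):
            lo = mid  # still prefers X, so the ratio is at least mid
        else:
            hi = mid  # prefers X', so the ratio is below mid
    return lo


agent = make_agent(p_x=0.3, p_x_prime=0.6)
print(revealed_ratio(agent))  # close to 0.3 / 0.6
```

The observer never reads `p_x` or `p_x_prime` directly; the ratio is recovered purely from the pattern of bets, which is what "revealed" means here.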
Revealed anthropic and non-anthropic probabilities
To simplify comparisons, assume that Y is an event that will happen with probability 1; if the agent bets on Y, it will get a reward of 1. Y's only purpose is to serve as a baseline for comparison with other events.
X is an event that will happen with an unknown probability P(X); if the agent bets on X and it happens, the agent will get a reward of r. In comparison, Xs is an event that will happen with certainty if and only if humanity survives for a certain amount of time. If the agent bets on Xs and it happens, the agent will get a reward of rs.
The agent needs to bet on one of Y, X, and Xs. Suppose that the agent is an average utilitarian, and that their actual estimated probability for human survival is p; thus P(Xs)=p. If humanity survives, the total human population will be Ω; if it doesn't, then it will be limited to ω≤Ω.
Then the following table gives the three possible bets and the expected utility the average utilitarian will derive from them. Since the average utilitarian needs to divide their utility by total population, this expected utility will be a function of the probabilities of the different population numbers.
By varying r and rs, we can establish what probabilities the agent actually gives to each event, by comparing with the situation where it bets on Y. If we did that, but assumed that the agent was a total utilitarian rather than an average one, we would get the apparent revealed probabilities given in the third column:
Bet | Utility | Apparent revealed probability (if total utilitarian) |
---|---|---|
Y | (1-p)/ω + p/Ω | 1 |
X | rP(X)[(1-p)/ω + p/Ω] | P(X) |
Xs | rs(p/Ω) | p' = (p/Ω) / [(1-p)/ω + p/Ω] |
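The table can be reproduced numerically. The sketch below is only an illustration: the function names and the sample numbers (p = 0.5, ω = 100, Ω = 300, P(X) = 0.2) are assumptions chosen for the example. It computes the average utilitarian's expected utility for each bet, then derives the apparent revealed probability as the utility of a unit-reward bet divided by the utility of betting on Y, which is what an observer assuming total utilitarianism would infer from the indifference point.

```python
def average_utilitarian_bets(p, omega, Omega, r, r_s, p_x):
    """Expected utility of each bet for an average utilitarian (reward / population)."""
    # Expected value of 1/population over the doom and survival worlds.
    inverse_population = (1 - p) / omega + p / Omega
    return {
        "Y": inverse_population,
        "X": r * p_x * inverse_population,
        "Xs": r_s * p / Omega,  # pays out only in the surviving world
    }


def apparent_probabilities(p, omega, Omega, p_x):
    """Probabilities an observer would infer if it assumed a total utilitarian."""
    unit = average_utilitarian_bets(p, omega, Omega, r=1.0, r_s=1.0, p_x=p_x)
    return {bet: utility / unit["Y"] for bet, utility in unit.items()}


# Illustrative numbers only.
for bet, prob in apparent_probabilities(p=0.5, omega=100, Omega=300, p_x=0.2).items():
    print(bet, round(prob, 4))
```

With these sample inputs the apparent probability for X stays at P(X), while the one for Xs falls below the true p = 0.5.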
Note that if Ω=ω - if the population is fixed, so that the average utilitarian behaves the same as a total utilitarian - then p' simplifies to (p/ω) / (1/ω) = p, the actual probability of survival.
It's also not hard to see that, for 0 < p < 1, p' strictly decreases as Ω increases, so it will always be less than p if Ω > ω.
Thus if we interpret the actions of an average utilitarian as if they were a total utilitarian, then for rewards conditional on human survival - and only for those rewards, not for others like betting on X - their actions will seem to imply that they give a lower probability of human survival than they actually do.
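One way to see the monotonicity is to multiply the numerator and denominator of p' by ωΩ, giving p' = pω / [(1-p)Ω + pω], where Ω appears only in the denominator with a positive coefficient. The following sketch checks this numerically; the function name and the sample values of p, ω and Ω are assumptions for illustration.

```python
def apparent_survival_probability(p, omega, Omega):
    """p' in the simplified form p*omega / ((1-p)*Omega + p*omega)."""
    return p * omega / ((1 - p) * Omega + p * omega)


p, omega = 0.5, 100  # illustrative values only
values = [apparent_survival_probability(p, omega, Omega)
          for Omega in (100, 200, 1_000, 10_000)]
print(values)

# Each value is strictly smaller than the one before it.
print(all(a > b for a, b in zip(values, values[1:])))
```

The first entry, where Ω = ω, equals p itself; every larger Ω pushes p' further below p.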
Conclusion
The standard doomsday argument argues that we are more likely to be in the first 50% of the list of all humans that will ever live, rather than in the first 10%, which is still more likely than us being in the first 1%, and so on. The argument is also vulnerable to changes of reference class; it gives different implications if we consider 'the list of all humans', 'the list of all mammals', or 'the list of all people with my name'. The doomsday argument has no effect on probabilities not connected with human survival.
All these effects reproduce in this new framework. Being in the first n% means that the total human population will be at least 100ω/n, so the total population Ω grows as n shrinks -- and p', the apparent revealed probability of survival, shrinks as well. Similarly, average utilitarianism gives different answers depending on what reference class is used to define its population. And the apparent revealed probabilities that are not connected with human survival are unchanged from those of a total utilitarian.
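To make the link between n and p' concrete, here is a small sketch that takes the bound on total population at its minimum, Ω = 100ω/n, and evaluates p' for a few values of n. The function names and the choice p = 0.5, ω = 100 are assumptions for illustration; because the true Ω may exceed the bound, the printed values should be read as upper limits on p'.

```python
def minimum_total_population(omega, n_percent):
    """Smallest total population consistent with being in the first n% of humans."""
    return 100 * omega / n_percent


def apparent_survival_probability(p, omega, Omega):
    """p' = (p/Omega) / ((1-p)/omega + p/Omega), as in the table above."""
    return (p / Omega) / ((1 - p) / omega + p / Omega)


p, omega = 0.5, 100  # illustrative values only
results = {}
for n_percent in (100, 50, 10, 1):
    Omega = minimum_total_population(omega, n_percent)
    results[n_percent] = apparent_survival_probability(p, omega, Omega)
    print(n_percent, Omega, round(results[n_percent], 4))
```

At n = 100 the population is fixed and p' equals p; as n shrinks, the apparent survival probability drops steadily towards zero.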
Thus this seems like a very close replication of the doomsday argument in ADT, in terms of behaviour and apparent revealed probabilities. But note that it is not a genuine doomsday argument. It's all due to the quirky nature of average utilitarianism; the agent doesn't really believe that the probability of survival goes down, they just behave in a way that would make us infer that they believed that, if we saw them as being a total utilitarian. So there is no actual increased risk.
Thanks for the reply! Can you say more about the failure to publish ADT? I know of it from arXiv, but don't know the details.
Curious if you're at all updating on using MIRI's poor publishing record as evidence of a problem, based on Stuart's and Wei's story below. (It seems like trying to get through journal review might be a huge cost and do little to advance knowledge.) Or do you think this was an outlier, or that the class of things MIRI should be publishing is less subject to the kinds of problems mentioned?