AI utility-based correlation
A putative new idea for AI control; index here.
This presents one way of implementing the indifference-based correlations of these posts.
Let u be a utility function, a map from worlds to real numbers. An expected utility maximiser considering whether to produce output Y looks at the expected utility
Σw u(w)P(w|Y).
We now assume that there is another random variable X in the world, and we want the AI to be indifferent to worlds where Y≠X. We also want it to be indifferent to worlds where Z=0. Then it will assess the value of output Y as:
Σw u(w)P(w|X=Y,Z=1,Y).
Now, the idea of the setup was to ensure that Z=1 would erase the output Y so that it was never read. Hence P(w|Z=1,Y) = P(w|Z=1): once the output is erased, the world no longer depends on Y directly. Hence that equation simplifies to:
Σw u(w)P(w|X=Y).
Therefore the AI will choose the Y that maximises the (conditional) expected utility of u given X=Y. To get the full version of the initial post, you need to define some function f of Y and modify this to
Σw u(w)P(w|X=Y) + f(Y).
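As a rough illustration of that last expression, here is a minimal Python sketch on a toy discrete model. Everything in it is an assumption made for the example rather than part of the original setup: worlds are represented as (x, outcome) pairs, the joint probabilities and the utility are made up, and the names `conditional_expected_utility` and `best_output` are hypothetical.

```python
def conditional_expected_utility(joint, u, y):
    """Return sum_w u(w) P(w | X = y) for a joint distribution over worlds.

    `joint` maps each world w = (x, outcome) to its probability; the first
    component of a world is the value of X (a toy representation).
    """
    mass = sum(p for w, p in joint.items() if w[0] == y)
    if mass == 0:
        raise ValueError(f"X = {y!r} has zero probability")
    return sum(u(w) * p for w, p in joint.items() if w[0] == y) / mass


def best_output(joint, u, outputs, f=lambda y: 0.0):
    """Pick the Y maximising sum_w u(w) P(w | X = Y) + f(Y)."""
    return max(outputs, key=lambda y: conditional_expected_utility(joint, u, y) + f(y))


# Made-up toy numbers: X is 0 or 1, and the outcome is "good" or "bad".
joint = {
    (0, "good"): 0.1,
    (0, "bad"): 0.4,
    (1, "good"): 0.4,
    (1, "bad"): 0.1,
}


def u(world):
    """Toy utility: 1 for a good outcome, 0 otherwise."""
    return 1.0 if world[1] == "good" else 0.0


choice_plain = best_output(joint, u, outputs=[0, 1])
# A hypothetical correction term f that rewards output 0.
choice_with_f = best_output(joint, u, outputs=[0, 1], f=lambda y: 1.0 if y == 0 else 0.0)
print(choice_plain, choice_with_f)
```

Without f, the sketch picks the output whose matching value of X goes with the higher conditional expected utility; a large enough f(Y) can shift that choice.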
Anthropic signature: strange anti-correlations
Imagine that the only way that civilization could be destroyed was by a large pandemic that occurred at the same time as a large recession, so that governments and other organisations were too weakened to address the pandemic properly.
Then if we looked at the past, as observers in a non-destroyed civilization, what would we expect to see? We could see years with no pandemics or no recessions; we could see mild pandemics, mild recessions, or combinations of the two; we could see large pandemics with no or mild recessions; or we could see large recessions with no or mild pandemics. We wouldn't see large pandemics combined with large recessions, as that would have caused us to never come into existence. These are the only things ruled out by anthropic effects.
Assume that pandemics and recessions are independent (at least, in any given year) in terms of "objective" (non-anthropic) probabilities. Then what would we see? We would see that pandemics and recessions appear to be independent when either of them is of small intensity. But as the intensity rises, they would start to become anti-correlated, with a large version of one completely precluding a large version of the other.
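The hard-cutoff case can be checked with a small simulation. This is only a sketch under assumptions that are not in the post: pandemic and recession sizes are drawn independently and uniformly from [0, 1) each year, "large" means above a made-up threshold of 0.8, and years are independent, so conditioning on survival amounts to discarding the fatal years. The function names are hypothetical.

```python
import random


def surviving_years(n_years, threshold=0.8, seed=0):
    """Sample independent (pandemic, recession) sizes and keep observable years.

    A year in which both sizes exceed `threshold` destroys civilization, so no
    observer ever finds it in their past; such years are dropped.
    """
    rng = random.Random(seed)
    years = []
    for _ in range(n_years):
        pandemic = rng.random()
        recession = rng.random()
        if pandemic > threshold and recession > threshold:
            continue  # fatal combination: never observed
        years.append((pandemic, recession))
    return years


def mean_recession(years, pandemic_low, pandemic_high):
    """Average recession size over years whose pandemic size is in [low, high)."""
    sizes = [r for p, r in years if pandemic_low <= p < pandemic_high]
    return sum(sizes) / len(sizes)


years = surviving_years(200_000)
mild = mean_recession(years, 0.0, 0.8)   # years with a small or medium pandemic
large = mean_recession(years, 0.8, 1.0)  # years with a large pandemic
print(f"mean recession, mild pandemic years:  {mild:.3f}")
print(f"mean recession, large pandemic years: {large:.3f}")
```

Under these assumptions the recession sizes look unaffected by the pandemic in mild years (mean close to 0.5), while in large-pandemic years the observed recessions are capped at the threshold, which pulls their mean down towards 0.4.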
The effect is even clearer if we have a probabilistic relation between pandemics, recessions and extinction (something like: extinction risk proportional to the product of recession size and pandemic size). Then we would see an anti-correlation rising smoothly with intensity.
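To see the smooth version numerically, the sketch below computes the average recession size that survivors would observe in a year with a given pandemic size. The specifics are assumptions chosen for illustration: sizes lie in [0, 1], the recession size is uniform, the extinction probability in a year is k times pandemic size times recession size with k = 1, and the function name is hypothetical.

```python
def mean_recession_given_pandemic(p, k=1.0, steps=10_000):
    """Survivor-conditioned mean recession size for a year with pandemic size p.

    Assumes recession size r ~ Uniform(0, 1) and per-year survival probability
    1 - k * p * r. Uses a midpoint Riemann sum over r.
    """
    dr = 1.0 / steps
    weighted_sum = 0.0   # approximates the integral of r * P(survive | p, r)
    survival_mass = 0.0  # approximates the integral of P(survive | p, r)
    for i in range(steps):
        r = (i + 0.5) * dr
        survive = 1.0 - k * p * r
        weighted_sum += r * survive * dr
        survival_mass += survive * dr
    return weighted_sum / survival_mass


pandemic_sizes = [0.0, 0.25, 0.5, 0.75, 1.0]
observed_means = [mean_recession_given_pandemic(p) for p in pandemic_sizes]
for p, m in zip(pandemic_sizes, observed_means):
    print(f"pandemic size {p:.2f}: mean observed recession {m:.4f}")
```

With these assumptions the observed mean works out to (1/2 - kp/3) / (1 - kp/2), which falls steadily from 1/2 at p = 0 to 1/3 at p = 1 when k = 1, so the apparent anti-correlation grows gradually instead of switching on at a threshold.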
Thus one way of looking for anthropic effects in humanity's past is to look for different classes of incidents that are uncorrelated at small magnitudes, and anti-correlated at large magnitudes. More generally, to look for different classes of incidents where the correlation changes at different magnitudes - without any obvious reasons. That might be the signature of an anthropic disaster we missed - or rather, that missed us.
Michael Nielsen explains Judea Pearl's causality
Michael Nielsen has posted a long essay explaining his understanding of the Pearlean causal DAG model. I don't understand more than half, but that's much more than I got out of a few other papers. Strongly recommended for anyone interested in the topic.