I think this shows how the whole "language independent up to a constant" thing is basically just a massive cop-out. It's very clever for demonstrating that complexity is a real, definable thing, with properties which at least transcend representation in the infinite limit. But, as you show, it's useless for doing anything practical.
My personal view is that there's a true universal measure of complexity which AIXI ought to be using, and which wouldn't have these problems. It may well be unknowable, but AIXI is intractable anyway so what's the difference? In my opinion, this complexity measure could give a real, numeric answer to seemingly stupid questions like "You see a number. How likely is it that the number is 1 (given no other information)?". Or it could tell us that 16 is actually less complex than, say, 13. I mean really, it's 2^2^2, spurning even a need for brackets. I'm almost certain it would show up in real life more often than 13, and yet who can even show me a non-contrived language or machine in which it's simpler?
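To make that last claim a bit more concrete, here's a toy sketch (in Python) of the kind of measure I mean. It is emphatically not the true universal measure, just an arbitrary mini-language with atoms 1, 2, 3 and the operators +, *, ^, where the "complexity" of a number is the token count of its shortest expression. Under this toy measure 16 costs 5 tokens (2^2^2) while 13 costs 7 (e.g. 3*3+3+1):

```python
# Toy "description length": the token count of the shortest expression built
# from the atoms 1, 2, 3 and the binary operators +, *, ^ that evaluates to n.
# This is an arbitrary illustrative language, not Kolmogorov complexity.

def shortest_expression_sizes(max_size=9, max_value=10**6):
    by_size = {1: {1, 2, 3}}            # by_size[k] = values writable with exactly k tokens
    best = {v: 1 for v in by_size[1]}   # value -> fewest tokens seen so far
    for n in range(2, max_size + 1):
        vals = set()
        for i in range(1, n - 1):       # split n tokens: i on the left, 1 operator, n-1-i on the right
            j = n - 1 - i
            for a in by_size.get(i, ()):
                for b in by_size.get(j, ()):
                    vals.add(a + b)
                    vals.add(a * b)
                    if b <= 20:         # keep exponentiation from exploding
                        vals.add(a ** b)
        by_size[n] = {v for v in vals if v <= max_value}
        for v in by_size[n]:
            best.setdefault(v, n)       # first appearance = shortest expression
    return best

best = shortest_expression_sizes()
print("tokens for 16:", best.get(16))   # 5  (2 ^ 2 ^ 2)
print("tokens for 13:", best.get(13))   # 7  (e.g. 3 * 3 + 3 + 1)
```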
Incidentally, the "hell" scenario you describe isn't as unlikely as it at first sounds. I remember an article here a while back lamenting the fact that, left unmonitored, AIXI could easily kill itself with exploration, the result of which would have a very similar reward profile to what you describe as "hell". It seems like it's both too cautious and not cautious enough in even just this one scenario.
Yes. The problem is not the Hell scenarios; the problem is that we can make them artificially probable via language choice.
I think this shows how the whole "language independent up to a constant" thing is basically just a massive cop-out.
Some results are still true. An exploring agent (if it survives) will converge on the right environment, independent of language. And episodic environments do allow AIXI to converge on optimal behaviour (as long as the discount rate is gradually raised).
Many people (including me) had the impression that AIXI was ideally smart. Sure, it was uncomputable, and there might be "up to finite constant" issues (as with anything involving Kolmogorov complexity), but it was, informally at least, "the best intelligent agent out there". This was reinforced by Pareto-optimality results, namely that there was no computable policy that performed at least as well as AIXI in all environments, and strictly better in at least one.
However, Jan Leike and Marcus Hutter have proved that AIXI can be, in some sense, arbitrarily bad. The problem is that AIXI is not fully specified, because the universal prior is not fully specified. It depends on a choice of an initial computing language (or, equivalently, of an initial Turing machine).
For the universal prior, this choice only matters up to a multiplicative constant (though that constant can be arbitrarily large). For the agent AIXI, however, it can lock it into bad behaviour that never ends.
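The "up to a constant" here is the usual invariance theorem: for any two universal machines U and V there is a constant c_{U,V}, depending on the pair of machines but not on the string x, such that

$$K_U(x) \;\le\; K_V(x) + c_{U,V}, \qquad m_U(x) \;\ge\; 2^{-c_{U,V}}\, m_V(x),$$

where K is prefix Kolmogorov complexity and m the corresponding universal prior. Nothing in the theorem bounds c_{U,V} itself, and that is exactly the loophole exploited below.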
For illustration, imagine that there are two possible environments:

1. Hell, which gives reward ε each time the agent outputs "0", but, if the agent ever outputs "1", gives the minimal reward forever after.
2. Heaven, which also gives reward ε each time the agent outputs "0", but, if the agent ever outputs "1", gives the maximal reward forever after.
Now simply choose a language/Turing machine such that the ratio P(Hell)/P(Heaven) is higher than the ratio 1/ε. In that case, for any discount rate, the AIXI will always output "0", and thus will never learn whether it's in Hell or not (because it's too risky to find out). It will only ever observe the environment giving reward ε in response to "0", behaviour which is compatible with both Heaven and Hell. This keeps the ratio P(Hell)/P(Heaven) constant, and ensures the AIXI never does anything else.
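Here is a rough sanity check of that argument in Python. It hard-codes the payoffs described above (with rewards taken to lie in [0,1], so "minimal" = 0 and "maximal" = 1), plus arbitrary example values for ε and the prior ratio; it is an illustration of the calculation, not anything from the proof:

```python
# Rough sanity check of the Heaven/Hell argument.  Rewards are assumed to lie
# in [0, 1], so "minimal" = 0 and "maximal" = 1; epsilon and the prior ratio
# below are arbitrary example values.

def value_always_zero(eps, gamma):
    """Discounted return of outputting "0" forever (the same in both environments)."""
    return eps / (1 - gamma)

def value_try_one(p_heaven, gamma):
    """Expected discounted return of outputting "1": 1/(1-gamma) in Heaven, 0 in Hell."""
    return p_heaven / (1 - gamma)

eps = 0.01
ratio = 2 / eps                 # P(Hell)/P(Heaven), chosen to exceed 1/eps
p_heaven = 1 / (1 + ratio)      # prior probability of Heaven

for gamma in (0.1, 0.5, 0.9, 0.99, 0.999999):
    stay = value_always_zero(eps, gamma)
    explore = value_try_one(p_heaven, gamma)
    verdict = "stays with 0" if stay > explore else "explores"
    print(f"gamma={gamma}: stay={stay:.4g}, explore={explore:.4g} -> {verdict}")

# Since outputting "0" looks identical in Heaven and Hell, the posterior never
# changes, so the same comparison (and the same verdict) repeats at every step.
```

The condition for staying with "0" works out to ε > P(Heaven | Heaven or Hell), i.e. P(Hell)/P(Heaven) > (1−ε)/ε, which the 1/ε condition above guarantees, and the discount factor cancels out of the comparison entirely.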
In fact, it's worse than this. If you use the prior to measure intelligence, then an AIXI that follows one prior can be arbitrarily stupid with respect to another.