So if you give an agent a bad prior, it can make bad decisions. This is not a new insight.
Low-probability hypotheses predicting vast rewards or punishments seem equivalent to Pascal's Mugging. Any agent that maximizes expected utility will spend increasing amounts of resources worrying about more and more unlikely hypotheses. In the limit, it will spend all of its time and energy caring about a single random hypothesis that predicts infinite reward (like your examples), even if that hypothesis has almost zero probability.
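A toy illustration of the mugging dynamic, as a minimal Python sketch: an expected-utility maximizer weighs a mundane hypothesis against a tiny-probability hypothesis with a huge promised payoff. The two hypotheses, the cost of 10 and the payoffs are all hypothetical numbers chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Hypothesis:
    name: str
    probability: float
    payoff: Dict[str, float]  # action -> utility if this hypothesis is true


def expected_utility(action: str, hypotheses: List[Hypothesis]) -> float:
    """Probability-weighted utility of an action across all hypotheses."""
    return sum(h.probability * h.payoff[action] for h in hypotheses)


def best_action(actions: List[str], hypotheses: List[Hypothesis]) -> str:
    """An expected-utility maximizer: pick the action with the highest expected utility."""
    return max(actions, key=lambda a: expected_utility(a, hypotheses))


def mugging_hypotheses(p: float, promised: float) -> List[Hypothesis]:
    """Hypothetical world: paying costs 10 utility, and with probability p
    the mugger is honest and also delivers `promised` utility."""
    return [
        Hypothesis("mugger is lying", 1 - p,
                   {"keep_wallet": 0.0, "pay_mugger": -10.0}),
        Hypothesis("mugger is honest", p,
                   {"keep_wallet": 0.0, "pay_mugger": promised - 10.0}),
    ]


ACTIONS = ["keep_wallet", "pay_mugger"]
print(best_action(ACTIONS, mugging_hypotheses(1e-12, 1e6)))   # keep_wallet
print(best_action(ACTIONS, mugging_hypotheses(1e-12, 1e15)))  # pay_mugger
```

Here the expected utility of paying is -10 + p * promised, so for any p > 0 a promise larger than 10/p flips the decision: the less likely the hypothesis, the bigger the promise has to be, but some promise always suffices.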
I've argued in the past that maximizing expected utility should be abandoned. I may not have the perfect alternative, and alternatives may be somewhat ad hoc. But that's better than just ignoring the problem.
AIXI is still optimal at doing what you told it to do. It's maximizing its expected reward, given the prior you gave it. It's just that what you told it to do isn't what you really want. But we already knew that.
Oh, one interesting thing is that your example does appear similar to real life. If you die, you get stuck in a state where you don't receive any more rewards. I think this is actually a desirable thing and solves the anvil problem. I've suggested this solution in the past.
No, maximizing expected utility (still) should not be abandoned.
Many people (including me) had the impression that AIXI was ideally smart. Sure, it was uncomputable, and there might be "up to finite constant" issues (as with anything involving Kolmogorov complexity), but it was, informally at least, "the best intelligent agent out there". This was reinforced by Pareto-optimality results, namely that there was no computable policy that performed at least as well as AIXI in all environments, and strictly better in at least one.
However, Jan Leike and Marcus Hutter have proved that AIXI can be, in some sense, arbitrarily bad. The problem is that AIXI is not fully specified, because the universal prior is not fully specified. It depends on the choice of an initial computing language (or, equivalently, of an initial Turing machine).
For the universal prior, this will only affect it up to a constant (though this constant could be arbitrarily large). However, for the agent AIXI, it could force it into continually bad behaviour that never ends.
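For reference, the "up to a constant" statement is the standard invariance property of the universal prior. Writing M_U for the universal prior built from a universal monotone Turing machine U (notation ours), and P_U(x) for the set of minimal programs whose output on U begins with the string x:

```latex
M_U(x) = \sum_{p \in P_U(x)} 2^{-|p|},
\qquad
M_U(x) \;\ge\; 2^{-c_{UV}} \, M_V(x) \quad \text{for every string } x,
```

where c_UV is the length of a program prefix that makes U simulate V. The bound only limits the ratio between the two priors; since c_UV can be made as large as we like by choosing V, two "universal" priors can still weight particular environments very differently.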
For illustration, imagine that there are two possible environments:
Now simply choose a language/Turing machine such that the ratio P(Hell)/P(Heaven) is higher than 1/ε. In that case, for any discount rate, the AIXI will always output "0", and thus will never learn whether it's in Hell or not (because it's too risky to find out). It will observe the environment giving reward ε after receiving "0", behaviour which is compatible with both Heaven and Hell. This keeps P(Hell)/P(Heaven) constant and ensures that the AIXI never does anything else.
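To make the trap concrete, here is a minimal Python sketch of a Bayes-optimal agent that considers only these two environments. The post does not say what Heaven pays for other outputs, so this sketch assumes (hypothetically) that outputting "1" earns reward 1 per step forever in Heaven and 0 per step forever in Hell, that outputting "0" earns ε in both, and that rewards are geometrically discounted by γ. The function names are illustrative, not part of AIXI's definition.

```python
def reward(env: str, action: str, eps: float) -> float:
    """Hypothetical one-step reward model (an assumption, not from the post)."""
    if action == "0":
        return eps  # both environments pay eps for "0"
    # "1": Heaven pays 1; Hell pays 0 (and, by assumption, 0 forever after)
    return 1.0 if env == "heaven" else 0.0


def choose_action(w_hell: float, w_heaven: float, eps: float, gamma: float) -> str:
    """Compare the discounted value of outputting "0" forever with switching to "1" now."""
    p_heaven = w_heaven / (w_hell + w_heaven)
    value_zero_forever = eps / (1 - gamma)
    # After a "1": Heaven keeps paying 1 per step, Hell pays 0 forever.
    value_switch_now = p_heaven * 1.0 / (1 - gamma)
    return "0" if value_zero_forever >= value_switch_now else "1"


def first_exploration(w_hell, w_heaven, eps, gamma, true_env="heaven", horizon=1000):
    """Step at which the agent first outputs "1", or None if it never does within horizon."""
    for t in range(horizon):
        action = choose_action(w_hell, w_heaven, eps, gamma)
        if action == "1":
            return t
        observed = reward(true_env, action, eps)
        # Bayesian update with deterministic environments: the likelihood is 1 if
        # the environment would have produced the observed reward, else 0.
        w_hell *= 1.0 if reward("hell", action, eps) == observed else 0.0
        w_heaven *= 1.0 if reward("heaven", action, eps) == observed else 0.0
    return None


# P(Hell)/P(Heaven) = 199 > 1/eps = 100: the agent never explores, whatever gamma is.
for g in (0.0, 0.5, 0.9, 0.999):
    print(g, first_exploration(0.995, 0.005, eps=0.01, gamma=g))  # None each time
```

Under these assumptions the discount factor cancels out: the agent outputs "0" exactly when P(Heaven) ≤ ε, i.e. when P(Hell)/P(Heaven) ≥ (1 - ε)/ε, which the condition P(Hell)/P(Heaven) > 1/ε guarantees. Because "0" earns ε in both environments, the weights never change, so the same comparison repeats at every step.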
In fact, it's worse than this. If you use the prior to measure intelligence, then an AIXI that follows one prior can be arbitrarily stupid with respect to another.
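Assuming the intelligence measure meant here is Legg and Hutter's universal intelligence (our reading; the text does not name it), its dependence on the reference machine U is explicit:

```latex
\Upsilon_U(\pi) \;=\; \sum_{\mu \in E} 2^{-K_U(\mu)} \, V^{\pi}_{\mu}
```

where E is the class of computable environments, K_U(μ) is the Kolmogorov complexity of environment μ relative to U, and V^π_μ is the expected total reward of policy π in μ. Switching from U to another machine U' changes every weight 2^{-K_U(μ)}, so a policy that scores highly under Υ_U need not score highly under Υ_{U'}.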