As I discussed before, IMO the correct approach is not to look for the one "correct" prior, since there is no such thing, but to specify a "pure learning" phase in AI development.
I'm not sure about "no correct prior", and even if there is no "correct prior", maybe there is still "the right prior for me", or "my actual prior", which we can somehow determine or extract and build into an FAI?
In the case of your example, we can imagine the operator overriding the agent's controls and forcing it to produce various outputs in order to update away from Hell.
How do you know when you've forced the agent to explore enough? What if the agent has a prior which assigns a large weight to an environment that's indistinguishable from our universe, except that lots of good things happen if the sun gets blown up? It seems like the agent can't update away from this during the training phase.
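The worry can be made concrete with a toy Bayesian update. Two hypotheses that assign the same probability to every observation the agent actually sees keep exactly the same posterior ratio, however much training data arrives. This is a minimal sketch; the hypothesis names, the prior weights and the shared likelihood model are all hypothetical.

```python
from fractions import Fraction

def bayes_update(prior, likelihoods, observation):
    """Return the posterior over hypotheses after one observation.

    prior: dict hypothesis -> probability
    likelihoods: dict hypothesis -> function(observation) -> probability
    """
    unnormalized = {h: p * likelihoods[h](observation) for h, p in prior.items()}
    total = sum(unnormalized.values())
    return {h: w / total for h, w in unnormalized.items()}

# Hypothetical model: both environments predict every training-phase
# observation identically. They would only diverge on "blow up the sun",
# which is never tried during training.
def shared_model(obs):
    return Fraction(7, 10) if obs == 1 else Fraction(3, 10)

likelihoods = {"ours": shared_model, "sun_bomb_is_good": shared_model}
prior = {"ours": Fraction(1, 10), "sun_bomb_is_good": Fraction(9, 10)}

posterior = dict(prior)
for obs in [0, 1, 1, 0, 1]:  # any training data at all
    posterior = bayes_update(posterior, likelihoods, obs)

# The ratio is untouched by training: still 9 to 1.
print(posterior["sun_bomb_is_good"] / posterior["ours"])
```

Both weights get multiplied by the same likelihood at every step, so normalization returns the prior unchanged.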
(of course if we start from a ridiculous universal prior it will take ridiculously long, so I still grant that there is a fuzzy domain of "good" universal priors)
So you think "universal" isn't "good enough", but something more specific (but perhaps not unique as in "the correct prior" or "the right prior for me") is? Can you try to define it?
I'm not sure about "no correct prior", and even if there is no "correct prior", maybe there is still "the right prior for me", or "my actual prior", which we can somehow determine or extract and build into an FAI?
This sounds much closer to home. Note, however, that there is a certain ambiguity between the prior and the utility function. UDT agents maximize Sum Prior(x) U(x), so certain simultaneous redefinitions of Prior and U will lead to the same decisions.
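A toy check of that ambiguity, assuming a finite set of worlds and reading U(x) as the utility a given policy achieves in world x. Multiplying Prior(x) by any positive factor c(x) (then renormalizing) while dividing U by the same c(x) rescales every policy's score by one positive constant, so the choice of policy does not change even though the prior itself does (here world "a" goes from 1/2 to 6/7). The worlds, policies, numbers and the factor c are hypothetical.

```python
from fractions import Fraction

# Hypothetical toy setup: three worlds, two policies.
prior = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6)}
utility = {  # U(world, policy)
    ("a", "p1"): 1, ("a", "p2"): 0,
    ("b", "p1"): 0, ("b", "p2"): 3,
    ("c", "p1"): 2, ("c", "p2"): 1,
}
policies = ["p1", "p2"]

def expected_utility(prior, utility, policy):
    """Sum over worlds x of Prior(x) * U(x, policy)."""
    return sum(prior[w] * utility[(w, policy)] for w in prior)

# Arbitrary positive reweighting c(x): multiply the prior by c, renormalize
# by z, and divide the utility by c. Every score gets scaled by the same 1/z.
c = {"a": 5, "b": 1, "c": Fraction(1, 2)}
z = sum(prior[w] * c[w] for w in prior)
prior2 = {w: prior[w] * c[w] / z for w in prior}
utility2 = {(w, p): Fraction(u) / c[w] for (w, p), u in utility.items()}

best1 = max(policies, key=lambda p: expected_utility(prior, utility, p))
best2 = max(policies, key=lambda p: expected_utility(prior2, utility2, p))
print(best1, best2)  # p2 p2
```

Changing only one of the two (the prior without the utility, or vice versa) would in general change the decision; the equivalence needs both redefinitions together.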
Many people (including me) had the impression that AIXI was ideally smart. Sure, it was uncomputable, and there might be "up to a finite constant" issues (as with anything involving Kolmogorov complexity), but it was, informally at least, "the best intelligent agent out there". This was reinforced by Pareto-optimality results, namely that there was no computable policy that performed at least as well as AIXI in all environments and strictly better in at least one.
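The Pareto-optimality notion can be spelled out on a finite table of returns. This sketch only illustrates the definition; the real result concerns all computable environments and cannot be checked this way. The policies, environments and returns below are hypothetical.

```python
def pareto_dominates(values_a, values_b):
    """True if policy A does at least as well as B in every environment
    and strictly better in at least one (values are per-environment returns)."""
    if values_a.keys() != values_b.keys():
        raise ValueError("both policies must be scored on the same environments")
    at_least_as_good = all(values_a[e] >= values_b[e] for e in values_a)
    strictly_better = any(values_a[e] > values_b[e] for e in values_a)
    return at_least_as_good and strictly_better

def is_pareto_optimal(candidate, others):
    """A policy is Pareto optimal if no other policy dominates it."""
    return not any(pareto_dominates(o, candidate) for o in others)

# Hypothetical returns of three policies in three environments.
policy_x = {"env1": 3, "env2": 1, "env3": 2}
policy_y = {"env1": 2, "env2": 2, "env3": 2}
policy_z = {"env1": 3, "env2": 1, "env3": 1}

print(pareto_dominates(policy_x, policy_z))                # True
print(is_pareto_optimal(policy_x, [policy_y, policy_z]))   # True
print(is_pareto_optimal(policy_z, [policy_x, policy_y]))   # False
```

Note that policy_y is also Pareto optimal here even though it is never best in any single environment, which is one reason the property is a weak guarantee on its own.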
However, Jan Leike and Marcus Hutter have proved that AIXI can be, in some sense, arbitrarily bad. The problem is that AIXI is not fully specified, because the universal prior is not fully specified: it depends on the choice of an initial computing language (or, equivalently, of an initial Turing machine).
For the universal prior, this choice only matters up to a constant factor (though this constant could be arbitrarily large). For the agent AIXI, however, it can lock the agent into bad behaviour that never ends.
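The "up to a constant" claim is usually stated as mutual multiplicative dominance. As a sketch in notation chosen here (not the post's), let M_U and M_V be the universal priors built from universal monotone machines U and V, and let \ell_U(V) be the length of a U-program that makes U simulate V. Then, for every finite string x:

$$
M_U(x) \ge 2^{-\ell_U(V)} M_V(x), \qquad M_V(x) \ge 2^{-\ell_V(U)} M_U(x),
\qquad\text{hence}\qquad
2^{-\ell_U(V)} \le \frac{M_U(x)}{M_V(x)} \le 2^{\ell_V(U)}.
$$

For any fixed pair of machines the bounds are finite, but nothing limits \ell_U(V) in advance, which is why the constant can be arbitrarily large.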
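For illustration, imagine that there are two possible environments:
1. Hell, which gives reward ε whenever the AIXI outputs "0", but which, the first time the AIXI outputs "1", gives no reward ever again.
2. Heaven, which gives reward ε for outputting "0" and reward 1 for outputting "1".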
Now simply choose a language/Turing machine such that the ratio P(Hell)/P(Heaven) is higher than 1/ε. In that case, for any discount rate, the AIXI will always output "0", and thus will never learn whether it's in Hell or not (because it's too risky to do so). It will observe the environment giving reward ε after each "0", behaviour which is compatible with both Heaven and Hell. The ratio P(Hell)/P(Heaven) therefore stays constant, which ensures the AIXI never does anything else.
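A back-of-the-envelope check of the 1/ε threshold, as a Python sketch. The environment details in the docstring are an assumption about the Heaven/Hell setup (reward ε for "0" in both; reward 1 per "1" in Heaven; zero forever after the first "1" in Hell), and the helper name best_first_action is hypothetical. Both plans' values scale with the same horizon 1/(1-γ), so the discount rate drops out, matching the "for any discount rate" claim.

```python
from fractions import Fraction

def best_first_action(ratio_hell_to_heaven, eps, gamma):
    """Compare the two candidate plans for the assumed toy environments.

    ASSUMED environments (a sketch, not the exact published construction):
      Hell:   reward eps for every "0"; from the first "1" on, reward 0 forever.
      Heaven: reward eps for every "0"; reward 1 for every "1".
    Waiting k steps before trying "1" is a convex mix of these two plans,
    so comparing the two extremes is enough.
    """
    p_heaven = 1 / (1 + ratio_hell_to_heaven)  # normalized belief in Heaven
    horizon = 1 / (1 - gamma)                  # sum over t >= 0 of gamma**t
    value_always_zero = eps * horizon          # identical in Hell and Heaven
    # Try "1" now: in Heaven keep collecting 1; in Hell everything is 0 afterwards.
    value_try_one = p_heaven * horizon
    return "0" if value_always_zero > value_try_one else "1"

eps = Fraction(1, 100)                          # so 1/eps = 100
gammas = (Fraction(1, 2), Fraction(9, 10), Fraction(999, 1000))
for ratio in (Fraction(200), Fraction(50)):     # one above, one below 1/eps
    for gamma in gammas:
        print(ratio, gamma, best_first_action(ratio, eps, gamma))
```

Under these assumptions the exact cutoff is P(Hell)/P(Heaven) > 1/ε - 1, so exceeding 1/ε, as in the text, is sufficient.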
In fact, it's worse than this. If you use the prior to measure intelligence, then an AIXI that follows one prior can be arbitrarily stupid with respect to another.