I have collected some links on the topic!
Correctly handling uncertainty in values, knowledge and predictions is necessary for reaching any complex goal or executing any complex plan. So the capability to do that is probably something an AI will have to acquire in order to be AGI.
I've recently started to think about how a nascent "hot mess" superintelligence could reflect on its own values and converge to something consistent. The simplest way to think about this, it seems to me, is to model it as a process in which the superintelligence resolves its uncertainty about its own preferences.
Suppose an agent knows that it is an expected utility maximizer and is uncertain between two utility functions, U1 and U2, with assigned probabilities p1 and p2. The agent must choose between two actions, a1 and a2. Let's say that the optimal action under U1 is a1 and under U2 is a2, and that a1 comes out ahead under the mixture p1·U1 + p2·U2, i.e. p1·U1(a1) + p2·U2(a1) > p1·U1(a2) + p2·U2(a2). To maximize expected value, the agent therefore chooses a1. However, choosing a1 is also decisive evidence in favor of U1, so the agent updates p1 to 1. This representation of uncertain preferences looks unsatisfactory because it quickly and predictably converges to a single utility function.
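To make the collapse concrete, here is a minimal Python sketch. The utility numbers (1 for a hypothesis's favored action, 0 otherwise), the credence p1 = 0.6 and the helper names are hypothetical choices for illustration. The "update" simply treats the agent's own choice as decisive evidence, as described above, by keeping only the hypotheses under which the chosen action is optimal and renormalizing.

```python
# Hypothetical utilities of each action under each hypothesis about preferences.
U = {
    "U1": {"a1": 1.0, "a2": 0.0},
    "U2": {"a1": 0.0, "a2": 1.0},
}

def expected_utility(action, probs):
    """Value of an action under the mixture p1*U1 + p2*U2."""
    return sum(p * U[h][action] for h, p in probs.items())

def choose(probs):
    """Pick the action that maximizes mixture expected utility."""
    return max(["a1", "a2"], key=lambda a: expected_utility(a, probs))

def naive_self_update(probs, action):
    """Treat the agent's own choice as decisive evidence: keep only the
    hypotheses for which `action` is optimal, then renormalize."""
    consistent = {h: p for h, p in probs.items()
                  if max(U[h], key=U[h].get) == action}
    total = sum(consistent.values())
    return {h: consistent.get(h, 0.0) / total for h in probs}

probs = {"U1": 0.6, "U2": 0.4}
action = choose(probs)                    # a1 wins: 0.6 vs 0.4
probs = naive_self_update(probs, action)
print(action, probs)                      # prints: a1 {'U1': 1.0, 'U2': 0.0}
```

After a single decision the credence in U2 is gone, and every later choice will be made by U1 alone.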
Does anyone know of a good model of uncertain preferences that avoids this kind of collapse, possibly after some additions?
Nash bargaining (between different hypotheses about preferences) seems close to having the desired properties, but I am not sure; maybe something better has already been developed.
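For comparison, here is a rough Python sketch of what credence-weighted Nash bargaining between the two hypotheses might look like. The setup is my own assumption, not an established model of preference uncertainty: the agent may pick a lottery that plays a1 with probability q, each hypothesis's disagreement point is the utility of its worst action, and the bargaining weights are the credences p1 and p2 (an asymmetric Nash product). The utility numbers are the same hypothetical ones as in the previous sketch.

```python
import math

# Hypothetical utilities and credences, as in the previous sketch.
U = {
    "U1": {"a1": 1.0, "a2": 0.0},
    "U2": {"a1": 0.0, "a2": 1.0},
}
probs = {"U1": 0.6, "U2": 0.4}

# Assumed disagreement point: each hypothesis gets the utility of its worst action.
disagreement = {h: min(u.values()) for h, u in U.items()}

def lottery_utility(h, q):
    """Utility under hypothesis h of playing a1 with probability q, a2 otherwise."""
    return q * U[h]["a1"] + (1 - q) * U[h]["a2"]

def log_nash_product(q):
    """Log of the credence-weighted Nash product prod_h (U_h(q) - d_h)^p_h."""
    total = 0.0
    for h, p in probs.items():
        gain = lottery_utility(h, q) - disagreement[h]
        if gain <= 0:
            return -math.inf  # some hypothesis gains nothing over disagreement
        total += p * math.log(gain)
    return total

# Simple grid search over lotteries; a real optimizer could replace this.
grid = [i / 1000 for i in range(1001)]
best_q = max(grid, key=log_nash_product)
print(best_q)  # prints 0.6
```

With these numbers the bargaining solution plays a1 with probability equal to the credence in U1 instead of committing to a1, so the choice does not obviously count as decisive evidence for either hypothesis. The exact answer depends heavily on the assumed disagreement point and on allowing lotteries, so this only illustrates the flavor of the idea.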