So if I understand this correctly, Alice and the sovereign are equally omniscient, and the sovereign additionally has some power and influence over the world that Alice lacks. In the case where Alice herself is the sovereign, the problem is solved, right? The sovereign just has to figure out what she prefers and do that. The solution, then, is to simulate the scenario where Alice has the power to make the decision herself, and then match Alice's decision. This solves both 1 and 2.
My short answer to the broader question, "How do we know what sacks of meat / circuits / whatever prefer?", is "you look at the behavioral output". Here, if Alice can make the decision herself, the decision is her behavioral output.
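To make the "simulate Alice holding the power, then match her decision" idea concrete, here is a toy sketch. Everything in it is a hypothetical stand-in invented for illustration: `sovereign_decide`, `toy_alice`, and the three named outcomes are not from any real system, and the "simulation" is just a function call. It only shows the shape of the proposal, under the assumption that Alice's choice can be simulated at all.

```python
from typing import Callable, Sequence

# Hypothetical stand-in: a worldline is just a label here.
Worldline = str


def sovereign_decide(
    options: Sequence[Worldline],
    simulate_alice_choice: Callable[[Sequence[Worldline]], Worldline],
) -> Worldline:
    """Return whatever Alice would pick if she held the power herself."""
    choice = simulate_alice_choice(options)
    # The sovereign only matches Alice's behavioral output; it never ranks
    # the options on its own.
    if choice not in options:
        raise ValueError("simulated choice must be one of the offered options")
    return choice


def toy_alice(options: Sequence[Worldline]) -> Worldline:
    # Toy assumption: Alice chooses by a fixed personal ranking.
    ranking = {"garden": 3, "library": 2, "desert": 1}
    return max(options, key=lambda w: ranking[w])


decision = sovereign_decide(["desert", "library", "garden"], toy_alice)
print(decision)  # prints "garden"
```

Note that the sovereign never needs an explicit utility function in this sketch; it only needs a way to run Alice's decision procedure, which is where all the real difficulty lives.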
(I'm about halfway through writing about how to make this idea more workable without resorting to omniscient things with consistent preferences. If I still like the idea after writing it out, I'll cross-post it on LW.)
It seems like a good portion of the "maximizing utility" strategy a sovereign might use relies on actually being able to consolidate human preferences into utilities. I think there are a few stages here, each of which may present obstacles. I'm not sure what the current state of the art is for overcoming them, and I'm curious about it.
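As a rough illustration of what "consolidating preferences into utilities" could mean in the simplest case, here is a sketch that turns pairwise "I prefer X to Y" statements into ordinal utility numbers, and reports failure when the statements are cyclic. The function name `ordinal_utilities` and the example outcomes are made up for this sketch. It assumes strict, fully known pairwise preferences, and when the pairs do not pin down a full ordering, the numbers it returns are just one consistent assignment among many.

```python
from graphlib import CycleError, TopologicalSorter


def ordinal_utilities(preferences):
    """Map (better, worse) pairs to integer utilities, or None if cyclic.

    A higher number means more preferred. Only the ordering is meaningful,
    not the size of the gaps between numbers.
    """
    # Each outcome lists the outcomes it is preferred over, so the
    # less-preferred outcomes come first in the sorted order.
    graph = {}
    for better, worse in preferences:
        graph.setdefault(better, set()).add(worse)
        graph.setdefault(worse, set())
    try:
        order = list(TopologicalSorter(graph).static_order())
    except CycleError:
        # A cycle such as a > b > c > a admits no consistent utilities.
        return None
    return {outcome: rank for rank, outcome in enumerate(order)}


consistent = ordinal_utilities([("garden", "library"), ("library", "desert")])
cyclic = ordinal_utilities([("a", "b"), ("b", "c"), ("c", "a")])
print(consistent)
print(cyclic)
```

The cyclic case is the interesting one: it is the point where "just read off the utilities" stops working and some further stage of the process has to decide what to do.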
First, here are a few assumptions I'm using just to make the problem a bit more navigable (dealing with one or two hard problems instead of a bunch at once). I will need to go back and drop each of these (and each combination of them) to see what additional problems result.
So Alice can conclude pretty much anything and everything (and so can our sovereign). The sovereign is faced with the problem of figuring out which action best satisfies Alice's preferences. However, Alice is basically a sack of meat that has certain emotions in response to certain experiences or certain conclusions about the world, and it isn't obvious how to get a preference ordering over the different worldlines out of these emotions. Some difficulties:
So, to restate my actual request: what's the state of the art with regard to these difficulties, and how confident are we that we've reached a satisfactory answer?