I think there are some subtleties with the (non-infra) bayesian VNM version, which come down to the difference between "extreme point" and "exposed point" of . If a point is an extreme point that is not an exposed point, then it cannot be the unique expected utility maximizer under a utility function (but it can be a non-unique maximizer).
For extreme points it might still work with uniqueness, if, instead of a VNM-decision-maker, we require a slightly weaker decision maker whose preferences satisfy the VNM axioms except continuity.
For any , if then either or .
I think this condition might be too weak and the conjecture is not true under this definition.
If , then we have (because a minimum over a larger set is smaller). Thus, can only be the unique argmax if .
Consider the example . Then is closed. And satisfies . But per the above it cannot be a unique maximizer.
Maybe the issue can be fixed if we strengthen the condition so that has to be also minimal with respect to .
For a provably aligned (or probably aligned) system you need a formal specification of alignment. Do you have something in mind for that? This could be a major difficulty. But maybe you only want to "prove" inner alignment and assume that you already have an outer-alignment-goal-function, in which case defining alignment is probably easier.
insofar as the simplest & best internal logical-induction market traders have strong beliefs on the subject, they may very well be picking up on something metaphysically fundamental. Its simply the simplest explanation consistent with the facts.
Theorem 4.6.2 in logical induction says that the "probability" of independent statements does not converge to 0 or 1, but to something in-between. So even if a mathematician says that some independent statement feels true (e.g. some objects are "really out there"), logical induction will tell him to feel uncertain about that.
A related comment from lukeprog (who works at OP) was posted on the EA Forum. It includes:
However, at present, it remains the case that most of the individuals in the current field of AI governance and policy (whether we fund them or not) are personally left-of-center and have more left-of-center policy networks. Therefore, we think AI policy work that engages conservative audiences is especially urgent and neglected, and we regularly recommend right-of-center funding opportunities in this category to several funders.
it's for the sake of maximizing long-term expected value.
Kelly betting does not maximize long-term expected value in all situations. For example, if some bets are offered only once (or only finitely many times), then you can get better long-term expected utility by sometimes accepting bets with a potential "0"-utility outcome.
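To make this concrete, here is a minimal sketch of a single bet that is offered exactly once. The even-money payoff, the win probability of 0.6, and the assumption that utility is linear in final wealth are all illustrative choices of mine, not part of the original claim.

```python
import math

P_WIN = 0.6  # hypothetical win probability of a one-off, even-money bet


def expected_wealth(fraction, p=P_WIN):
    """Expected final wealth after one even-money bet, starting from wealth 1."""
    return p * (1 + fraction) + (1 - p) * (1 - fraction)


def expected_log_wealth(fraction, p=P_WIN):
    """Expected log of final wealth (the quantity Kelly betting maximizes)."""
    if fraction >= 1:
        # Betting everything risks wealth 0, whose log is minus infinity.
        return -math.inf
    return p * math.log(1 + fraction) + (1 - p) * math.log(1 - fraction)


# Candidate fractions of wealth to stake: 0.00, 0.01, ..., 1.00.
fractions = [k / 100 for k in range(101)]

kelly_fraction = max(fractions, key=expected_log_wealth)
linear_fraction = max(fractions, key=expected_wealth)

print("Kelly stake:", kelly_fraction)
print("Stake maximizing expected wealth:", linear_fraction)
print("Expected wealth at Kelly stake:", expected_wealth(kelly_fraction))
print("Expected wealth when staking everything:", expected_wealth(linear_fraction))
```

Under these assumptions the Kelly stake is a small fraction of wealth, while the stake that maximizes expected final wealth is everything, even though that stake carries a chance of ending at zero.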
This is maybe not the central point, but I note that your definition of "alignment" doesn't precisely capture what I understand "alignment" or a good outcome from AI to be:
‘AGI’ continuing to exist
AGI could be very catastrophic even when it stops existing a year later.
eventually
If AGI makes earth uninhabitable in a trillion years, that could be a good outcome nonetheless.
ranges that existing humans could survive under
I don't know whether that covers "humans can survive on mars with a space-suit", but even then, if humans evolve/change to handle situations that they currently do not survive under, that could be part of an acceptable outcome.
it is the case that most algorithms (as a subset in the hyperspace of all possible algorithms) are already in their maximally most simplified form. Even tiny changes to an algorithm could convert it from 'simplifiable' to 'non-simplifiable'.
This seems wrong to me:
For any given algorithm you can find many equivalent but non-simplified algorithms with the same behavior, by adding a statement to the algorithm that does not affect the rest of the algorithm (e.g. adding a line such as foobar1234 = 123 in the middle of a Python program).
In fact, I would claim that the majority of Python programs on GitHub are not in their "maximally most simplified form".
Maybe you can cite the supposed theorem that claims that most (with a clearly defined "most") algorithms are maximally simplified?
This is not a formal definition.
Your English sentence has no apparent connection to mathematical objects, which would be necessary for a rigorous and formal definition.
Here is an example (to point out a missing assumption): Let's say you are offered a bet on the result of a coin flip for 1 dollar. You get 3 dollars if you win, and your utility function is linear in dollars. You have three actions: "Heads", "Tails", and "Pass". Then "Pass" performs Pareto-optimally across multiple worlds. But "Pass" does not maximize expected utility under any distribution.
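The example can be checked numerically with a short sketch. I am assuming here that the 1-dollar stake is lost on a wrong guess and that a win nets 3 - 1 = 2 dollars; the conclusion is the same if a win nets the full 3 dollars. The two worlds are "coin lands heads" and "coin lands tails".

```python
# Utility of each action in the two worlds (coin lands heads, coin lands tails).
# Assumption: a win nets +2 (3 paid out minus the 1-dollar stake), a loss nets -1.
UTILITIES = {
    "Heads": (2.0, -1.0),
    "Tails": (-1.0, 2.0),
    "Pass": (0.0, 0.0),
}


def is_pareto_optimal(action, utilities):
    """True if no other pure action is at least as good in every world
    and strictly better in at least one world."""
    u = utilities[action]
    for other, v in utilities.items():
        if other == action:
            continue
        weakly_better = all(vi >= ui for vi, ui in zip(v, u))
        strictly_better = any(vi > ui for vi, ui in zip(v, u))
        if weakly_better and strictly_better:
            return False
    return True


def expected_utility(action, p_heads, utilities):
    """Expected utility of an action when heads has probability p_heads."""
    u_heads, u_tails = utilities[action]
    return p_heads * u_heads + (1 - p_heads) * u_tails


def maximizers(p_heads, utilities):
    """Set of actions attaining the maximal expected utility."""
    values = {a: expected_utility(a, p_heads, utilities) for a in utilities}
    best = max(values.values())
    return {a for a, value in values.items() if value == best}


# Check every distribution on a grid of heads-probabilities 0.00, 0.01, ..., 1.00.
grid = [k / 100 for k in range(101)]
pass_is_optimal_somewhere = any("Pass" in maximizers(p, UTILITIES) for p in grid)

print("Pass is Pareto-optimal among pure actions:", is_pareto_optimal("Pass", UTILITIES))
print("Pass maximizes expected utility for some p:", pass_is_optimal_somewhere)
```

The grid is only a sanity check. With these payoffs the expected utilities of "Heads" and "Tails" are 3p - 1 and 2 - 3p, so the better of the two is always at least 0.5, while "Pass" always yields 0.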
I think what is needed for the result is an additional convexity-like assumption about the utilities. This could be "the set of achievable utility vectors is convex", or even something weaker like "every convex combination of achievable utility vectors is dominated by an achievable utility vector" (here, by utility vector I mean $(u_w)_{w \in W}$ if $u_w$ is the utility of world $w$). If you already accept the concept of expected utility maximization, then you could also use mixed strategies to get the convexity-like assumption (but that is not useful if the point is to motivate using probabilities and expected utility maximization).
The underlying math statement of some of these kinds of results about Pareto-optimality seems to be something like this:
If $\bar x$ is Pareto-optimal wrt utilities $u_i$, $i = 1, \dots, n$, and a convexity assumption (e.g. the set $\{(u_i(x))_{i=1}^n : x\}$ is convex, or something with mixed strategies) holds, then there is a probability distribution $\mu$ so that $\bar x$ is optimal for $U(x) = \mathbb{E}_{i \sim \mu} u_i(x)$.
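As a small illustration of why the convexity assumption matters, the sketch below revisits the coin-flip bet with mixed strategies allowed. The payoffs (a win nets +2, a loss nets -1) are my assumption about that example. Once mixtures are achievable, "Pass" is strictly dominated and so is no longer Pareto-optimal, which is consistent with it not maximizing expected utility under any distribution.

```python
# Utility vectors (world: heads, world: tails) of the pure actions.
# Assumption: a win nets +2 and a loss nets -1.
PURE = {
    "Heads": (2.0, -1.0),
    "Tails": (-1.0, 2.0),
    "Pass": (0.0, 0.0),
}


def mix(weights, pure):
    """Utility vector of a mixed strategy given as {action: probability}."""
    n_worlds = len(next(iter(pure.values())))
    return tuple(
        sum(w * pure[action][i] for action, w in weights.items())
        for i in range(n_worlds)
    )


def strictly_dominates(v, u):
    """True if v is strictly better than u in every world."""
    return all(vi > ui for vi, ui in zip(v, u))


# Randomize evenly between betting on heads and betting on tails.
half_half = mix({"Heads": 0.5, "Tails": 0.5}, PURE)

print("Utility vector of the 50/50 mixture:", half_half)
print("Mixture strictly dominates Pass:", strictly_dominates(half_half, PURE["Pass"]))
```

So the counterexample only works because the set of pure-action utility vectors is not convex.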
I think there is a (relatively simple) approximate version of this, where we start out with approximate Pareto-optimality.
We say that $\bar x$ is Pareto-$\varepsilon$-optimal if there is no (strong) Pareto-improvement by more than $\varepsilon$ (that is, there is no $x$ with $u_i(x) > u_i(\bar x) + \varepsilon$ for all $i$).
Claim: If $\bar x$ is Pareto-$\varepsilon$-optimal and the convexity assumption holds, then there is a probability distribution $\mu$ so that $\bar x$ is $\varepsilon$-optimal for $U(x) = \mathbb{E}_{i \sim \mu} u_i(x)$.
Rough proof: Define $Y := \{(u_i(x))_{i=1}^n : x\}$ and $\overline{Y}$ as the closure of $Y$. Let $\tilde y \in \overline{Y}$ be of the form $\tilde y = (u_i(\bar x) + \delta)_{i=1}^n$ for the largest $\delta$ such that $\tilde y \in \overline{Y}$. We know that $\delta \le \varepsilon$. Now $\tilde y$ is Pareto-optimal for $Y$, and by the non-approximate version there exists a probability distribution $\mu$ so that $\tilde y$ is optimal for $y \mapsto \mathbb{E}_{i \sim \mu} y_i$. Then, for any $x$, we have $\mathbb{E}_{i \sim \mu} u_i(x) \le \mathbb{E}_{i \sim \mu} \tilde y_i = \mathbb{E}_{i \sim \mu} (u_i(\bar x) + \delta) \le \varepsilon + \mathbb{E}_{i \sim \mu} u_i(\bar x)$, that is, $\bar x$ is $\varepsilon$-optimal for $U$.