Here, I start with a bounded and easy-to-reach u (that's a first step in the process), so "u = finite-u". This is still not safe for a maximiser (the usual argument about "being sure" and squeezing ever tinier amounts of expected utility out of optimising the universe). The whole system is then supposed to produce S(u) rather than M(u). This is achieved by having M(εu+v) allow it, when M(εu+v) expects (counterfactually) to optimise the universe, and would see any optimisation by S(u) as getting in the way (or, if it could co-opt these optimisations, then this is something that M(u-v) would not want it to do).
Technically, you might not need to bound u so sharply - it's possible that the antagonistic setup will produce an S(u) that is equivalent to S(finite-u) even if u is unbounded (via the reduced impact effect of the interactions between the two maximisers). But it seems sensible to add the extra precaution of starting with a bounded u.
Augh! "I" and "you" are not in the list of agents we're discussing. Who starts with a bounded u, and how does that impact the decision of what S will be offered by the M(u-v) agent?
A putative new idea for AI control; index here.
This is a potential design for a safe-ish satisficer, based on the ideas presented here. The two key ideas are that a satisficer S(u) with utility u:
So satisficers make poor allies and weak enemies.
The design, the designer and the verifier
Let M(u-v) be an agent that maximises u and minimises v. Let's assume that M(u-v) is currently unaware as to what u and v are (though it knows that the upper bound of u is easy to approach). It can interact with the outside environment in one way only: by creating a satisficer S(). S() will then be fed u to become S(u).
However, the design for S(u) will first be sent to M(εu+v), for some unknown small ε. Note that M(εu+v) is a v-maximiser, so it disagrees with M(u-v) except for u. M(εu+v) is a (hypothetical) agent that will be (counterfactually) released at some unspecified later date. Currently, it will be fed the whole reasoning history of M(u-v), and will make a single decision only: whether S(u) is to be built or not.
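To make the designer/verifier interaction concrete, here is a minimal toy sketch in Python. It is not part of the proposal itself: the `Design` class, the "build nothing" baseline, the candidate designs and all the numbers are hypothetical, and the real agents are assumed not to know u and v in advance. The sketch only illustrates the shape of the protocol: M(u-v) proposes a design, M(εu+v) makes a single build-or-veto decision, and the designer picks its favourite among the designs that would survive that decision.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Design:
    """A hypothetical candidate design for S(u), with toy expected outcomes."""
    name: str
    exp_u: float  # expected u if this design is built
    exp_v: float  # expected v if this design is built


# Hypothetical baseline: what the verifier expects if nothing is built.
BASELINE = Design("build nothing", exp_u=0.0, exp_v=0.0)


def verifier_approves(design: Design, epsilon: float,
                      baseline: Design = BASELINE) -> bool:
    """M(eps*u + v) allows the build only if it does not lose by it."""
    def score(d: Design) -> float:
        return epsilon * d.exp_u + d.exp_v
    return score(design) >= score(baseline)


def designer_choice(candidates: List[Design],
                    epsilon: float) -> Optional[Design]:
    """M(u - v) picks the approved design with the highest u - v, if any."""
    allowed = [d for d in candidates if verifier_approves(d, epsilon)]
    if not allowed:
        return None
    return max(allowed, key=lambda d: d.exp_u - d.exp_v)


# Toy candidates (all values are made up for illustration).
AGGRESSIVE = Design("aggressive optimiser", exp_u=1.0, exp_v=-0.5)
LOW_IMPACT = Design("low-impact satisficer", exp_u=0.9, exp_v=0.0)
IDLE = Design("near-idle agent", exp_u=0.1, exp_v=0.0)

CANDIDATES = [AGGRESSIVE, LOW_IMPACT, IDLE]
CHOICE = designer_choice(CANDIDATES, epsilon=0.01)
```

In this toy setup the designer would prefer the aggressive design on its own terms (it scores highest on u - v), but the verifier vetoes it because it lowers εu+v, so the designer has to fall back on a design that leaves v alone.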
So it seems that S(u) must:
For the moment, this does seem like it would produce a successful satisficer...