That's one of the reasons the agents don't know u and v at this point.
Thanks for your reply, I had missed the fact that M(εu+v) is also ignorant of what u and v are. In this case is this a general structure of how a satisficer should work, but then when applying it in practice we would need to assign some values to u and v on a case by case basis, or at least to ε, so that M(εu+v) could veto? Or is it the case that M(εu+v) uses an arbitrarily small ε, in which case it is the same as imposing Δv>0?
A putative new idea for AI control; index here.
This is a potential design for a safe-ish satisficer, based on the ideas presented here. The two key ideas are that a satisficer S(u) with utility u:
So satisficer make poor allies and weak enemies.
The design, the designer and the verifier
Let M(u-v) be an agent that maximises u and minimises v. Let's assume that M(u-v) is currently unaware as to what u and v are (though it knows that the upper bound of u is easy to approach). It can interact with the outside environment in one way only: by creating a satisficer S(). S() will then be fed u to become S(u).
However, the design for S(u) will first be sent to M(εu+v), for some unknown small ε. Note that M(εu+v) is a v-maximiser, so it disagrees with M(u-v) except for u. M(εu+v) is a (hypothetical) agent that will be (counterfactually) released at some unspecified later date. Currently, it will be fed the whole reasoning history of M(u+v), and will make a single decision only: where S(u) is to be built or not.
So it seems that S(u) must;
For the moment, this does seems like it would produce a successful satisficer...