Could you expand on what the "upper bound" of utility is for a maximizer, and why it's easy to approach? Perhaps a concrete (but simple) example would help. Say "Clippy" wants to maximize paperclips and minimize waste heat. "HotClippy" is the counterfactual agent that maximizes heat while thinking paperclips are fine if they're nearly free. What is the maximal value for paperclips?
It seems like the submission is always going to be S(infinity*u + 0v) for this constraint. Any other v will be rejected by the counterfactual or contradict the base agent's preferences. Any smaller/finite u is a lost opportunity.
Perhaps a concrete (but simple) example would help.
Clippy has utility that awards 1 if Clippy produces one or more paperclips (and 0 otherwise). Clippy can easily produce ten paperclips.
Basically what I'm trying to do is make the AI "do the easy evident thing" rather than "optimise the whole universe just to be absolutely sure they achieved their goal".
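To make "the upper bound of u is easy to approach" concrete, here is a minimal sketch of the Clippy example. The plan names, success probabilities, and effort figures are made-up assumptions for illustration only; nothing here comes from the original post beyond the 0/1 utility.

```python
# Hypothetical plans: (probability that at least one paperclip exists, rough effort).
# All numbers are illustrative assumptions.
plans = {
    "do nothing": (0.0, 0),
    "make ten paperclips": (0.999, 1),
    "optimise the whole universe": (0.999999, 10**9),
}

UPPER_BOUND = 1.0  # u never exceeds 1: it is 1 with a paperclip, 0 without


def expected_utility(p_success: float) -> float:
    # u is 1 on success and 0 otherwise, so E[u] equals the success probability.
    return 1.0 * p_success + 0.0 * (1.0 - p_success)


# A pure maximiser ignores effort and picks the highest expected utility.
maximiser_choice = max(plans, key=lambda name: expected_utility(plans[name][0]))

# The "easy evident thing" is already within a hair of the upper bound.
easy_choice = "make ten paperclips"
gap = UPPER_BOUND - expected_utility(plans[easy_choice][0])

print(maximiser_choice)
print(f"gap left by the easy plan: {gap:.6f}")
```

Under these assumed numbers the maximiser still prefers the extreme plan, even though the easy plan leaves only a tiny gap to the bound. That gap is what the design tries to stop the agent from chasing.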
A putative new idea for AI control; index here.
This is a potential design for a safe-ish satisficer, based on the ideas presented here. The two key ideas are that a satisficer S(u) with utility u:
So satisficers make poor allies and weak enemies.
The design, the designer and the verifier
Let M(u-v) be an agent that maximises u and minimises v. Let's assume that M(u-v) is currently unaware as to what u and v are (though it knows that the upper bound of u is easy to approach). It can interact with the outside environment in one way only: by creating a satisficer S(). S() will then be fed u to become S(u).
However, the design for S(u) will first be sent to M(εu+v), for some unknown small ε. Note that M(εu+v) is a v-maximiser, so it disagrees with M(u-v) on everything except u. M(εu+v) is a (hypothetical) agent that will be (counterfactually) released at some unspecified later date. Currently, it will be fed the whole reasoning history of M(u-v), and will make a single decision only: whether S(u) is to be built or not.
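One possible reading of this designer/verifier setup, as a rough sketch. The Design record, the numeric estimates, the "build nothing" baseline, and the rule that the verifier approves whenever building is no worse than the baseline by its own lights are all assumptions of this sketch, not part of the proposal. In particular, the real M(u-v) does not know ε; here ε is passed in only to simulate the verifier's decision.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Design:
    name: str
    u: float  # assumed expected u if this S(u) is built
    v: float  # assumed expected v if this S(u) is built


def designer_score(d: Design) -> float:
    # M(u-v): wants high u and low v.
    return d.u - d.v


def verifier_approves(d: Design, baseline: Design, epsilon: float) -> bool:
    # M(εu+v): allows the build only if it is no worse than not building.
    return epsilon * d.u + d.v >= epsilon * baseline.u + baseline.v


def choose_design(candidates: Sequence[Design], baseline: Design,
                  epsilon: float) -> Optional[Design]:
    # The designer's best candidate among those the verifier would allow.
    approved = [d for d in candidates if verifier_approves(d, baseline, epsilon)]
    return max(approved, key=designer_score) if approved else None


baseline = Design("build nothing", u=0.0, v=0.0)
candidates = [
    Design("easy satisficer", u=0.999, v=0.0),
    Design("also suppresses v", u=0.999, v=-1.0),  # designer's favourite, vetoed
    Design("spills extra v", u=0.999, v=0.5),      # allowed, but designer dislikes it
]

chosen = choose_design(candidates, baseline, epsilon=0.01)
print(chosen.name if chosen else "nothing built")
```

With these illustrative numbers, the design that pushes v down is vetoed, the one that pushes v up is passed over by the designer, and what survives is the design that gets u cheaply while leaving v alone.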
So it seems that S(u) must:
For the moment, this does seem like it would produce a successful satisficer...