The point here is that M(u-v) might not know what v is, but M(εu+v) certainly does, and this is not the same as maximising an unknown utility function.
Ah, okay. I think I see better what you're getting at. My intuition is that there's a mapping to minimization of a reasonable aggregation of the set of non-negative utilities, but I think I should actually work through some examples before I make any long comments.
Do you disagree with my description of the "resource gathering agent":
I don't think I had read that article until now, but no objections come to mind.
My intuition is that there's a mapping to minimization of a reasonable aggregation of the set of non-negative utilities
That would be useful to know, if you can find examples. Especially ones where all v and -v have the same probability (which is my current favourite requirement in this area).
A putative new idea for AI control; index here.
This is a potential design for a safe-ish satisficer, based on the ideas presented here. The two key ideas are that a satisficer S(u) with utility u:
So satisficer make poor allies and weak enemies.
The design, the designer and the verifier
Let M(u-v) be an agent that maximises u and minimises v. Let's assume that M(u-v) is currently unaware as to what u and v are (though it knows that the upper bound of u is easy to approach). It can interact with the outside environment in one way only: by creating a satisficer S(). S() will then be fed u to become S(u).
However, the design for S(u) will first be sent to M(εu+v), for some unknown small ε. Note that M(εu+v) is a v-maximiser, so it disagrees with M(u-v) except for u. M(εu+v) is a (hypothetical) agent that will be (counterfactually) released at some unspecified later date. Currently, it will be fed the whole reasoning history of M(u+v), and will make a single decision only: where S(u) is to be built or not.
So it seems that S(u) must;
For the moment, this does seems like it would produce a successful satisficer...