If that were not the case, then the maximising agent would transform itself into a satisficing agent; but (unless there are other agents out there penalising you for your internal processes) there is no better way of maximising the expected U than by attempting to maximise the expected U.
Is that really true? This seems to be the main and non-trivial question here, presented without proof. It seems to me that there ought to be plenty of strategies that a satisficer would prefer over a maximizer, just as risk-averse strategies differ from optimal risk-neutral strategies. E.g. buying +EV lottery tickets might be a maximizer's strategy but not a satisficer's.
I don't think this follows. Consider the case where there's two choices:
1) 10% chance of no paperclips, 90% chance of 3^^^3 paperclips
2) 100% chance of 20 paperclips
The maximizer will likely pick 1, while the satisficer will definitely prefer 2.
As I understand it, your satisficing agent has essentially the utility function min(E[paperclips], 9). This means it would be fine with a 10^-100 chance of producing 10^101 paperclips. But isn't it more intuitive to think of a satisficer as optimizing the utility function E[min(paperclips, 9)]? In this case, the satisficer would reject the 10^-100 gamble described above, in favor of just producing 9 paperclips (whereas a maximizer would still take the gamble and hence would be a poor replacement for the satisficer).
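A minimal sketch of the difference, applied to the two-choice gamble above (the numbers, names and the finite stand-in for 3^^^3 are mine, purely for illustration):

```python
# Contrast the two readings of "satisficer with a target of 9 paperclips".
# Lotteries are given as lists of (probability, paperclips) pairs.

HUGE = 10**100  # finite stand-in for 3^^^3
TARGET = 9

lottery_1 = [(0.1, 0), (0.9, HUGE)]   # 10% nothing, 90% an enormous number
lottery_2 = [(1.0, 20)]               # guaranteed 20 paperclips

def expected_value(lottery):
    return sum(p * x for p, x in lottery)

# Reading 1: utility min(E[paperclips], 9), i.e. satisfied iff E[paperclips] >= 9.
def meets_target_in_expectation(lottery):
    return expected_value(lottery) >= TARGET

# Reading 2: maximise E[min(paperclips, 9)], i.e. cap each outcome before averaging.
def capped_expected_value(lottery):
    return sum(p * min(x, TARGET) for p, x in lottery)

for name, lottery in [("lottery 1", lottery_1), ("lottery 2", lottery_2)]:
    print(name, meets_target_in_expectation(lottery), capped_expected_value(lottery))

# Reading 1: both lotteries clear the target, so that satisficer is indifferent
# (and it would also accept the 10^-100 gamble above, since 10^-100 * 10^101 = 10 >= 9).
# Reading 2: lottery 1 is worth only 0.9 * 9 = 8.1 capped paperclips versus 9 for
# lottery 2, so this satisficer takes the sure thing and rejects the tiny-chance gamble.
```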
A satisficer might not want to take over ...
Alternatively, a satisficer could build a maximiser, for example if you don't give it the ability to modify its own code. It might also build a paperclip-making von Neumann machine that isn't anywhere near a maximizer, but is still insanely dangerous.
I notice a satisficing agent isn't well-defined. What happens when it has two ways of satisfying its goals? It may be possible to make a safe one if you come up with a good enough answer to that question.
I see that a satisficer would assign higher expected utility to being a maximizer than to being a satisficer. But if the expected utility of being a satisficer were high enough, wouldn't it be satisfied to remain a satisficer?
If the way to satisfice best is to act like a maximizer, then wouldn't an optimal satisficer simply act like a maximizer, no self-rewriting required?
Here is a (contrived) situation where a satisficer would need to rewrite.
Sally the Satisficer gets invited to participate in a game show. The game starts with a coin toss. If she loses the coin toss, she gets 8 paperclips. If she wins, she gets invited to the Showcase Showdown, where she will first be offered a prize of 9 paperclips. If she turns down this first showcase, she is offered the second showcase of 10 paperclips (fans of The Price is Right know the second showcase is always better).
When she first steps on stage she considers whether she should switch to maximizer mode or stick with her satisficer strategy. As a satisficer, she knows that if she wins the coin toss she won't be able to refuse the 9-paperclip prize, since it satisfies her target expected utility of 9. So her expected utility as a satisficer is (1/2)·8 + (1/2)·9 = 8.5. If she won the flip as a maximizer, she would clearly pass on the first showcase and receive the second showcase of 10 paperclips. Thus her expected utility as a maximizer is (1/2)·8 + (1/2)·10 = 9. Switching to maximizer mode meets her target while remaining a satisficer does not, so she rewrites herself to be a maximizer.
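A quick numerical check of that calculation (a sketch; the names are just for illustration):

```python
# Sally's expected paperclips under each policy, given the game show above.
TARGET = 9

def expected_paperclips(prize_if_coin_toss_won):
    # Lose the toss (p = 1/2): 8 paperclips. Win (p = 1/2): the stated showcase prize.
    return 0.5 * 8 + 0.5 * prize_if_coin_toss_won

as_satisficer = expected_paperclips(9)   # must accept the first showcase, since 9 meets the target
as_maximizer = expected_paperclips(10)   # passes on 9 and takes the second showcase of 10

print(as_satisficer, as_satisficer >= TARGET)  # 8.5 False: staying a satisficer misses the target
print(as_maximizer, as_maximizer >= TARGET)    # 9.0 True:  rewriting as a maximizer meets it
```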
Doesn't follow if an agent wants to satisfice multiple things, since maximizing the amount of one thing could destroy your chances of bringing about a sufficient quantity of another.
E(U(there exists an agent A maximising U)) ≥ E(U(there exists an agent A satisficing U))
It's a good idea to define your symbols and terminology in general before (or right after) using them. Presumably U is utility, but what is E? Expectation value? How do you calculate it? What is an agent? How do you calculate the utility of an existential quantifier? If this is all common knowledge, at least give a relevant link. Oh, and it is also a good idea to prove or at least motivate any non-trivial formula you present.
Feel free to make your post (which apparently attempts to make an interesting point) more readable for the rest of us (i.e. newbies like me).
What if the satisficer is also an optimiser? That is, its utility function is not only flat in the number of paperclips after 9, but actually decreasing.
E(U(there exists an agent A maximising U) ≥ E(U(there exists an agent A satisficing U)
The reason this equation looks confusing is because (I presume) there ought to be a second closing bracket on both sides.
Anyhow, I agree that a satisficer is almost as dangerous as a maximiser. However, I've never come across the idea that a satisficing agent "has a lot to recommend it" on Less Wrong.
I thought that the vast majority of possible optimisation processes - maximisers, satisficers or anything else - are very likely to destroy humanity. That is why CE...
As I understand what is meant by satisficing, this misses the mark. A satisficer will search for an action until it finds one that is good enough, then it will do that. A maximiser will search for the best action and then do that. A bounded maximiser will search for the "best" action (best according to its bounded utility function) and then do that.
So what the satisficer picks depends on the order in which the possible actions are presented to it, in a way it doesn't for either kind of maximiser. Now, if easier options are presented to it first, then I guess your conclusion still follows, as long as we grant the premise that self-transforming will be easy.
But I don't think it's right to identify bounded maximisers and satisficers.
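A rough sketch of the three procedures, to make the order-dependence concrete (the action list, utilities and target are made-up placeholders):

```python
# Three search procedures over the same set of candidate actions.
actions = ["easy plan", "medium plan", "elaborate plan"]
utility = {"easy plan": 9, "medium plan": 12, "elaborate plan": 100}
TARGET = 9

def satisficer(candidates):
    # Takes the first action found that is good enough, so the result
    # depends on the order in which actions are considered.
    for a in candidates:
        if utility[a] >= TARGET:
            return a

def maximiser(candidates):
    # Order-independent: always returns the highest-utility action.
    return max(candidates, key=lambda a: utility[a])

def bounded_maximiser(candidates, cap=TARGET):
    # Also order-independent, but "best" is judged by the capped utility.
    return max(candidates, key=lambda a: min(utility[a], cap))

print(satisficer(actions))            # "easy plan" (first one clearing the target)
print(satisficer(reversed(actions)))  # "elaborate plan" (different order, different pick)
print(maximiser(actions))             # "elaborate plan"
print(bounded_maximiser(actions))     # everything scores 9 after capping; ties go to the first
```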
It seems to me that a satisficer that cares about expected utility rather than actual utility is not even much of a satisficer in the first place, in that it doesn't do what we expect of satisficers (mostly ignoring small probabilities of gains much greater than its cap, in favor of better probabilities of gains that just meet the cap). Whereas the usual satisficer, a maximizer with a bounded utility function (well, not just bounded, but cut off), does.
it is very easy to show that any "satisficing problem" can be formulated as an equivalent "optimization problem"
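One standard way to make that precise (my formulation, just for concreteness): the satisficing problem "find x with U(x) ≥ t" is equivalent to the optimization problem "maximise min(U(x), t)", since any x with U(x) ≥ t attains the maximum possible value t, and conversely any maximiser of min(U(x), t) satisfices whenever satisficing is possible at all.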
Satisficing seems a great way to describe the behavior of maximizers with multiple-term utility functions and an ordinal ranking of preference satisfaction i.e. humans. This sounds like it should have some fairly serious implications.
So you're defining a satisficing agent as an agent with utility function f that it wants to maximize, but that acts like it's trying to maximize minimum(f, a constant)? In that case, sure, turning itself into an agent that actually tries to maximize f will make it better at maximizing f. This is a fairly trivial case of the general fact that making yourself better at maximizing your utility tends to increase your utility. However, if the satisficing agent with utility function f acts exactly like a maximizing agent with utility function min(f, constant), th...
Can you really assume the agent to have a utility function that is both linear in paperclips (which implies risk neutrality) and bounded + monotonic?
Build the utility function such that excesses above the target level are penalized. If the agent is motivated to build 9 paperclips only and absolutely no more, then the idea of becoming a maximizer becomes distasteful.
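A minimal sketch of such a utility function (the quadratic penalty is an arbitrary illustrative choice):

```python
TARGET = 9

def utility(paperclips):
    if paperclips <= TARGET:
        return paperclips                       # still rewarded for getting closer to 9
    return TARGET - (paperclips - TARGET) ** 2  # every paperclip past the target now hurts

print(utility(9))   # 9 (the peak)
print(utility(20))  # -112: maximizing raw paperclip count now looks terrible to this agent
```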
This amuses me because I know actual human beings who behave as satisficers with extreme aversion to waste, far out of proportion to the objective costs of waste. For example: Friends who would buy a Toyota Corolla based on its excellent value-to-cost ratio, and who would not want a cheaper, less reliable car, but who would also turn down a much nicer car offered to them at a severe discount, on the grounds that the nicer car is "indulgent."
Um, the standard AI definition of a satisficer is:
"optimization where 'all' costs, including the cost of the optimization calculations themselves and the cost of getting information for use in those calculations, are considered."
That is, a satisficer explicitly will not become a maximizer, because it is consciously aware of the costs of being a maximizer rather than a satisficer.
A maximizer might have a utility function like "p", where p is the number of paperclips, while a satisficer would have a utility function like "p-c", ...
(with thanks to Daniel Dewey, Owain Evans, Nick Bostrom, Toby Ord and BruceyB)
In theory, a satisficing agent has a lot to recommend it. Unlike a maximiser, which will attempt to squeeze every drop of utility out of the universe that it can, a satisficer will be content when it reaches a certain level of expected utility (a satisficer that is content with a certain level of utility is simply a maximiser with a bounded utility function). For instance, a satisficer with a utility function linear in paperclips and a target level of 9 will be content once it's 90% sure that it's built ten paperclips, and will not try to optimize the universe either to build more paperclips (unbounded utility) or to obsessively count the ones it has already (bounded utility).
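(With utility linear in paperclips, a 90% chance of ten paperclips gives an expected 0.9 × 10 = 9 paperclips, which exactly meets the target of 9.)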
Unfortunately, a self-improving satisficer has an extremely easy way to reach its satisficing goal: to transform itself into a maximiser. This is because, in general, if E denotes expectation,
E(U(there exists an agent A maximising U)) ≥ E(U(there exists an agent A satisficing U))
How is this true (apart from the special case when other agents penalise you specifically for being a maximiser)? Well, agent A will have to make decisions, and if it is a maximiser, it will always make the decision that maximises expected utility. If it is a satisficer, it will sometimes not make the same decision, leading to lower expected utility in that case.
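As a toy illustration (the actions and expected utilities are arbitrary placeholders): whatever "good enough" option the satisficer settles on is drawn from the same set that the maximiser takes the maximum over, so the maximiser can never do worse in expectation.

```python
# The maximiser maximises over the same option set the satisficer chooses from,
# so its expected utility is at least as high as the satisficer's.
expected_utility = {
    "build 9 paperclips and stop": 9,
    "build 20 paperclips": 20,
    "convert all matter to paperclips": 10**6,
}
TARGET = 9

# Satisficer: any option meeting the target will do (here, the first one found).
satisficer_choice = next(a for a, u in expected_utility.items() if u >= TARGET)

# Maximiser: always the option with the highest expected utility.
maximiser_choice = max(expected_utility, key=expected_utility.get)

assert expected_utility[maximiser_choice] >= expected_utility[satisficer_choice]
print(satisficer_choice, "->", expected_utility[satisficer_choice])
print(maximiser_choice, "->", expected_utility[maximiser_choice])
```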
Hence, if there were a satisficing agent for U, and it had some strategy S to accomplish its goal, then another way to accomplish this would be to transform itself into a maximising agent and let that agent implement S. If S is complicated, and transforming itself is simple (which would be the case for a self-improving agent), then self-transforming into a maximiser is the easier way to go.
So unless we have exceedingly well-programmed criteria banning the satisficer from using any variant of this technique, we should assume satisficers are likely to be as dangerous as maximisers.
Edited to clarify the argument for why a maximiser maximises better than a satisficer.
Edit: See BruceyB's comment for an example where a (non-timeless) satisficer would find rewriting itself as a maximiser to be the only good strategy. Hence timeless satisficers would behave as maximisers anyway (in many situations). Furthermore, a timeless satisficer with bounded rationality may find that rewriting itself as a maximiser would be a useful precaution to take, if it's not sure to be able to precalculate all the correct strategies.