Manfred comments on Satisficers want to become maximisers - Less Wrong
You are viewing a comment permalink. View the original post to see all comments and the full post content.
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Comments (67)
Here is a (contrived) situation where a satisficer would need to rewrite.
Sally the Satisficer gets invited to participate on a game show. The game starts with a coin toss. If she loses the coin toss, she gets 8 paperclips. If she wins, she gets invited to the Showcase Showdown where she will first be offered a prize of 9 paperclips. If she turns down this first showcase, she is offered the second showcase of 10 paper clips (fans of The Price is Right know the second showcase is always better).
When she first steps on stage she considers whether she should switch to maximizer mode or stick with her satisficer strategy. As a satisficer, she knows that if she wins the coin toss she won't be able to refuse the 9 paperclip prize since it satisfies her target expected utility of 9. So her expected utility as a satisficer is (1/2) * 8 + (1/2) * 9 = 8.5. If she won the flip as a maximizer, she would clearly pass on the first showcase and receive the second showcase of 10 paperclips. Thus her expected utility as a maximizer is (1/2) * 8 + (1/2) * 10 = 9. Switching to maximizer mode meets her target while remaining a satisficer does not, so she rewrites herself to be a maximizer.
Ah, good point. So "picking the best strategy, not just the best individual moves" is similar to self-modifying to be a maximizer in this case.
On the other hand, if our satisficer runs on updateless decision theory, picking the best strategy is already what it does all the time. So I guess it depends on how your satisficer is programmed.
This seems to imply that an updatless satisficer would behave like a maximiser - or that an updatless satisficer with bounded rationality would make themselves into a maximiser as a precaution.
A UDT satisficer is closer to the original than a pure maximizer, because where different strategies fall above the threshold the original tie-breaking rule can still be applied.