Nathaniel10

I think the initial (2-agent) model only has two time steps, i.e., one opportunity for the button to be pressed. The goal is just for the agent to be corrigible for this single button-press opportunity.

That can be folded into the utility function, however. Just make the ratings of the deferential person mostly copy the ratings of their partner.

 

Presumably the deferential partner could just use a utility function which is a weighted combination of their partner's and their own (selfish) one. For instance, the deferential partner could use a utility function like $w\,U_p + (1-w)\,U_s$, where $w$ is the weight placed on the partner, $U_p$ is the utility function of the partner, and $U_s$ is the utility function of the deferential person accounting only for their weak personal preferences and not their altruism.

Obviously the weights could depend on the level of altruism, the strength of the partner's preferences, whether the partner is reporting their true preferences or strategically chosen ones that steer the outcome toward what they want, etc. But this type of deferential preference can still be described by a utility function.
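A minimal sketch of that weighted combination, assuming a single weight `weight_partner` between 0 and 1 and two made-up rating tables (the outcome names and numbers are purely illustrative, not from the original discussion):

```python
from typing import Callable

Utility = Callable[[str], float]


def deferential_utility(partner: Utility, selfish: Utility,
                        weight_partner: float) -> Utility:
    """Build U(x) = w * partner(x) + (1 - w) * selfish(x)."""
    if not 0.0 <= weight_partner <= 1.0:
        raise ValueError("weight_partner must lie in [0, 1]")

    def combined(outcome: str) -> float:
        return (weight_partner * partner(outcome)
                + (1.0 - weight_partner) * selfish(outcome))

    return combined


# Hypothetical ratings: the partner cares a lot, the deferential person only a little.
partner_ratings = {"thai": 9.0, "pizza": 2.0}
selfish_ratings = {"thai": 4.0, "pizza": 5.0}

# A mostly deferential agent (w = 0.9) ends up ranking outcomes like the partner.
u_deferential = deferential_utility(
    partner_ratings.__getitem__, selfish_ratings.__getitem__, 0.9)
choice_deferential = max(partner_ratings, key=u_deferential)

# A purely selfish agent (w = 0.0) follows its own weak preference instead.
u_selfish = deferential_utility(
    partner_ratings.__getitem__, selfish_ratings.__getitem__, 0.0)
choice_selfish = max(partner_ratings, key=u_selfish)
```

With these numbers the deferential agent picks the partner's favourite even though its own weak preference points the other way, and the result is still an ordinary utility function over outcomes.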

Thanks for the great post!

In the definition of Coalition-Perfect CoCo Equilibrium, it seems to me like part 1) is already implied by part 3).  

1:  is on the Pareto frontier.

1) means that the utility profile achieved by the joint strategy for the grand coalition is on the Pareto frontier. 

3: 

(i.e., all the joint strategies are trying to maximize the money earned if up against the opposing coalition in a zero-sum game, and as a special case, when S=N, it says that what the entire group actually ends up doing maximizes surplus value, which is another way of stating that the weights are the appropriate virtual currencies to use at that point)

If, in the joint strategy for the special case of S=N, the group maximizes surplus value according to some weight function, then the utility profile resulting from this joint strategy should be on the Pareto frontier, so 1) should be automatically satisfied.

If it weren't, then you could improve one player's utility without hurting anyone else. But that would improve the surplus value as well[1], which would mean that the S=N joint strategy didn't maximize surplus value (contradicting 3)).
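Spelled out, the argument might look like the following. This is a sketch under two assumptions that are mine rather than the post's: surplus value for S=N is the weighted sum $\sum_i a_i u_i$, and every weight $a_i$ is strictly positive. The symbols $a_i$, $u$ and $u'$ are introduced here only for illustration.

```latex
% Suppose u maximizes the weighted sum but is Pareto-dominated by u':
%   u'_i >= u_i for every player i, and u'_j > u_j for some player j.
\begin{align*}
\sum_i a_i u'_i - \sum_i a_i u_i
  &= \sum_i a_i \,(u'_i - u_i) \\
  &\ge a_j \,(u'_j - u_j)
     && \text{since every other term is } \ge 0 \\
  &> 0
     && \text{since } a_j > 0 \text{ and } u'_j > u_j .
\end{align*}
% So u' has strictly larger weighted sum, contradicting that u maximizes it.
```

The last step is the only place where strict positivity is used: if $a_j = 0$ were allowed, the difference would only be $\ge 0$ and no contradiction would follow.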

I think this is because your utilitarian characterization is an if and only if.

Closely related to this is a result that says that any point on the Pareto Frontier of a game can be post-hoc interpreted as the result of maximizing a collective utility function.

Could be:  An outcome is on the Pareto frontier if and only if it can be post-hoc interpreted as the result of maximizing a collective utility function.

  1. ^

    I guess I'm assuming the weights are strictly positive, whereas you only assumed them to be non-negative. Does this matter? Is this the reason why we need 1)?
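One way to probe this question is a tiny numerical check. The sketch below assumes surplus value is just a weighted sum of utilities and uses three made-up two-player utility profiles; it is not the definition from the post, only a toy case showing what can change when a weight is allowed to be zero.

```python
import math

# Hypothetical utility profiles (player 1, player 2) for three joint outcomes.
profiles = [(1.0, 0.0), (1.0, 1.0), (0.0, 2.0)]


def weighted_sum(profile, weights):
    return sum(w * u for w, u in zip(weights, profile))


def maximizers(profiles, weights):
    """All profiles that attain the maximal weighted sum."""
    best = max(weighted_sum(p, weights) for p in profiles)
    return [p for p in profiles
            if math.isclose(weighted_sum(p, weights), best)]


def is_pareto_dominated(p, profiles):
    """True if some other profile is at least as good for all and better for one."""
    return any(
        all(q_i >= p_i for q_i, p_i in zip(q, p))
        and any(q_i > p_i for q_i, p_i in zip(q, p))
        for q in profiles
    )


# With a zero weight on player 2, (1, 0) ties for the maximum
# even though (1, 1) Pareto-dominates it.
zero_weight_max = maximizers(profiles, (1.0, 0.0))

# With strictly positive weights, the only maximizer here is undominated.
positive_weight_max = maximizers(profiles, (1.0, 0.5))
```

In this toy case the zero-weight maximizers include a Pareto-dominated profile, while the strictly positive weights leave only an undominated one, which is at least consistent with 1) doing real work when weights are merely non-negative.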