Probabilistic Negotiation
Follow-up to *Deterministic Strategies Can Be Sub-optimal*.

The Ultimatum Game is a simple experiment. Two people have been allocated $10. One person decides how to divide the money, and the other decides whether to Accept that allocation or to Deny it, in which case both participants get $0. Suppose you are the person whose job it is to choose whether to Accept or Deny an offer. What strategy could you use to maximize your returns?

Yudkowsky offers the following solution (NB: the original text splits $12, because sci-fi; I have changed the numbers inline, without brackets, let me know if that offends):

> It goes like this:
>
> When somebody offers you a 6:4 split, instead of the 5:5 split that would be fair, you should accept their offer with slightly less than 5/6 probability. Their expected value from offering you 6:4, in this case, is 6 * slightly less than 5/6, or slightly less than 5. This ensures they can't do any better by offering you an unfair split; but neither do you try to destroy all their expected value in retaliation. It could be an honest mistake, especially if the real situation is any more complicated than the original Ultimatum Game.
>
> If they offer you 7:3, accept with probability slightly-more-less than 5/7, so they do even worse in their own expectation by offering you 7:3 than 6:4.
>
> It's not about retaliating harder, the harder they hit you with an unfair price - that point gets hammered in pretty hard to the kids, a Watcher steps in to repeat it. The circumstances under which you should ever go around carrying out counterfactual threats in real life are much more fraught and complicated than this, and nobody's going to learn about them realistically for several years yet. This setup isn't about retaliation, it's about what both sides have to do, to turn the problem of dividing the gains, into a matter of fairness; to create the incentive setup whereby both sides don't expect to do any better by distorting their own estimate of what is 'fair'.
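To make the arithmetic concrete, here is a minimal Python sketch of that responder strategy (the `EPSILON` margin and the function names are mine, chosen for illustration): accept any offer at or above the fair split, and accept an unfair split with probability slightly less than fair_share / proposer_share, so the proposer's expected take stays strictly below the fair $5.

```python
import random

TOTAL = 10               # total dollars to split
FAIR_SHARE = TOTAL / 2   # $5 each is the fair outcome
EPSILON = 0.01           # the "slightly less than" margin; illustrative choice

def acceptance_probability(proposer_share: float) -> float:
    """Probability of accepting, given how much the proposer keeps."""
    if proposer_share <= FAIR_SHARE:
        return 1.0  # fair (or generous) offers are always accepted
    # Accept with slightly less than FAIR_SHARE / proposer_share, so the
    # proposer's expected value lands slightly below the fair $5.
    return FAIR_SHARE / proposer_share - EPSILON

def respond(proposer_share: float) -> bool:
    """Randomized Accept/Deny decision."""
    return random.random() < acceptance_probability(proposer_share)

# Proposer's expected value for a few splits:
for share in (5, 6, 7, 8):
    ev = share * acceptance_probability(share)
    print(f"{share}:{TOTAL - share} split -> proposer EV = ${ev:.2f}")
# 5:5 -> $5.00, 6:4 -> $4.94, 7:3 -> $4.93, 8:2 -> $4.92
```

One nice property of a fixed `EPSILON`: the proposer's penalty is `proposer_share * EPSILON`, which grows with their greed, so a 7:3 offer really does have lower expected value than 6:4, matching the "slightly-more-less" condition in the quote.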
RE: GPT getting dumber, that paper is horrendous.
The code-gen portion was completely thrown off by Markdown syntax (the authors mistook back-ticks for single quotes, afaict); a sketch of the failure mode is below. I think the update to make there is that it is decent evidence that there was some RLHF on ChatGPT outputs. If you remember the "a human being will die if you don't reply with pure JSON" tweet, even that final JSON code was wrapped in a markdown fence. My modal guess is that the markdown was inserted via a kludge to make the ChatGPT UX better, and then RLHF was done on that kludged output; code sections are often mislabeled as to what language they contain. My secondary guess...
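As a concrete illustration of the back-tick problem (hypothetical, not the paper's actual harness; all names here are mine): if an evaluator feeds the raw chat reply straight to `exec`, a correct but markdown-fenced answer scores as "not directly executable", while stripping the fence first recovers it.

```python
import re

# A typical ChatGPT-style reply: correct code, wrapped in a markdown fence.
response = """```python
def is_prime(n):
    if n < 2:
        return False
    return all(n % d for d in range(2, int(n ** 0.5) + 1))
```"""

def naive_is_executable(text: str) -> bool:
    """Score the raw reply as-is, treating any SyntaxError as a failure."""
    try:
        exec(text, {})
        return True
    except SyntaxError:
        return False

def strip_markdown_fences(text: str) -> str:
    """Remove a ```lang ... ``` wrapper before evaluating."""
    match = re.search(r"```(?:\w+)?\n(.*?)```", text, re.DOTALL)
    return match.group(1) if match else text

print(naive_is_executable(response))                         # False
print(naive_is_executable(strip_markdown_fences(response)))  # True
```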