gjm comments on Rationality Reading Group: Part V: Value Theory - Less Wrong Discussion

Post author: Gram_Stone 10 March 2016 01:11AM


Comment author: SquirrelInHell 20 March 2016 02:10:57AM 0 points

> Just because you might be wrong about utilities (I assume that's a possibility that you're implying) doesn't mean that you should make the process you use to choose outcomes random.

Yes. What I meant is something more like this example.

Suppose you have four options:

  • Option A: estimated utility = 10 ± 5

  • Option B: estimated utility = 5 ± 10

  • Option C: estimated utility = 3 ± 2

  • Option D: estimated utility = -10 ± 30

It seems reasonable not to always choose A: sometimes choose B, and occasionally even D, at least until you gather enough data to improve the accuracy of your estimates.

I expect you can derive this policy by calculating the probability that your estimates will shift by various amounts, and how much extra utility you can capture if they do.
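The "don't always pick the highest estimate" idea above can be sketched as an upper-confidence-bound rule. A minimal sketch, under my own assumption (not stated in the comment) that each ± value behaves like a standard-error-style uncertainty on the estimate:

```python
# Treat each "±" as an uncertainty on the estimated utility and pick the
# option maximizing mean + exploration * uncertainty. With enough weight on
# uncertainty, high-variance options like B and D can win, which is what
# drives occasionally trying apparently-worse options.

options = {
    "A": (10, 5),    # (estimated utility, uncertainty)
    "B": (5, 10),
    "C": (3, 2),
    "D": (-10, 30),
}

def ucb_choice(options, exploration=1.0):
    """Return the option maximizing mean + exploration * uncertainty."""
    return max(options, key=lambda o: options[o][0] + exploration * options[o][1])

print(ucb_choice(options, exploration=0.0))  # pure exploitation -> A
print(ucb_choice(options, exploration=1.0))  # D scores -10 + 30 = 20 -> D
```

The `exploration` weight is the knob: zero recovers plain expected-utility maximization, and larger values favor options whose estimates you know least about.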

Comment author: gjm 20 March 2016 03:03:18AM 1 point

There's been quite a lot of work on this sort of question, under the title of "Multi-armed bandits". (As opposed to the "one-armed bandits" you find rows and rows of in casinos.)
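For readers unfamiliar with the setting, a bandit algorithm can be as simple as epsilon-greedy. This sketch is not from the thread; the names and parameters are illustrative:

```python
import random

# Multi-armed bandit sketch: several "arms" with unknown payoff
# distributions. An epsilon-greedy player usually pulls the arm with the
# best observed average ("exploit") but pulls a random arm with
# probability epsilon ("explore").

def epsilon_greedy(pull, n_arms, steps, epsilon=0.1):
    totals = [0.0] * n_arms  # total reward observed per arm
    counts = [0] * n_arms    # number of pulls per arm
    for _ in range(steps):
        if 0 in counts or random.random() < epsilon:
            arm = random.randrange(n_arms)  # explore (or finish first pass)
        else:
            # exploit: arm with the best observed average reward
            arm = max(range(n_arms), key=lambda a: totals[a] / counts[a])
        totals[arm] += pull(arm)
        counts[arm] += 1
    return counts
```

With two arms paying roughly 1.0 and 0.0 per pull, the pull counts concentrate on the better arm while the epsilon fraction keeps sampling the other, which is the exploration/exploitation trade-off in miniature.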

Comment author: Gram_Stone 20 March 2016 03:44:17AM 0 points

Your response is very different from mine, so I'm wondering if I'm wrong.

Comment author: gjm 20 March 2016 03:00:15PM 1 point

The multi-armed bandit scenario applies when you are uncertain about the distributions produced by these options, and are going to have lots of interactions with them that you can use to discover more about them while extracting utility.

For a one-shot game, or if those estimated utilities are known distributions that each option will produce every time, you just compute each option's expected utility, pick the maximum, and you're done.
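In code, the one-shot case is a one-liner over the means from the earlier example; the uncertainties play no role once the estimates are taken at face value:

```python
# One-shot case: only the expected values matter, so pick the max mean.
options = {"A": 10, "B": 5, "C": 3, "D": -10}  # means from the example above
best = max(options, key=options.get)
print(best)  # -> A
```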

But suppose you know that each option produces some distribution of utilities, but you don't yet know what it is (though perhaps you know they're all normally distributed and have some guess at the means and variances), and you get to interact with them over and over again. Then you will probably begin by trying them all a few times to get a sense of what they do, and as you learn more you will gradually prioritize maximizing expected-utility-this-turn over knowledge gain (and hence expected utility in the future).
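The strategy described here, trying everything early and then shifting toward exploitation as beliefs sharpen, is roughly what Thompson sampling does automatically. A sketch under my own simplifying assumptions (normal payoffs, known observation noise variance 1; none of this is in the comment):

```python
import random

# Thompson sampling sketch: draw one plausible mean per option from the
# current belief and play the argmax. Uncertain options get explored early;
# play concentrates on the best option as evidence accumulates.

class Belief:
    """Normal belief over an option's mean payoff (obs. noise variance = 1)."""
    def __init__(self, prior_mean, prior_var):
        self.mean, self.var = prior_mean, prior_var

    def sample(self):
        return random.gauss(self.mean, self.var ** 0.5)

    def update(self, observation):
        # Conjugate normal update with observation noise variance 1.
        precision = 1.0 / self.var + 1.0
        self.mean = (self.mean / self.var + observation) / precision
        self.var = 1.0 / precision

def thompson_step(beliefs, payoff_fn):
    choice = max(beliefs, key=lambda o: beliefs[o].sample())
    beliefs[choice].update(payoff_fn(choice))
    return choice

# Example run, using the four options' estimated utilities as true means:
random.seed(0)
true_means = {"A": 10, "B": 5, "C": 3, "D": -10}
beliefs = {o: Belief(0.0, 100.0) for o in true_means}
choices = [thompson_step(beliefs, lambda o: random.gauss(true_means[o], 1))
           for _ in range(200)]
```

Early on the wide priors make every option's sampled mean competitive, so all four get tried; once the posteriors tighten, option A wins nearly every draw, which matches the "gradually prioritize this-turn utility" behavior described above.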