
Gram_Stone comments on Rationality Reading Group: Part V: Value Theory - Less Wrong Discussion

Post author: Gram_Stone | 10 March 2016 01:11AM




Comment author: Gram_Stone | 20 March 2016 03:44:17AM | 0 points

Your response is very different from mine, so I'm wondering if I'm wrong.

Comment author: gjm | 20 March 2016 03:00:15PM | 1 point

The multi-armed bandit scenario applies when you are uncertain about the distributions produced by these options, and are going to have lots of interactions with them that you can use to discover more about them while extracting utility.

For a one-shot game, or if you already know exactly which distribution of utilities each option will produce every time, you just compute each option's expected utility, pick the highest, and you're done.
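For instance, here's a minimal sketch of that one-shot computation; the options and their payoff distributions are made-up numbers, not anything from the thread:

```python
# Hypothetical options: each produces a known distribution of utilities.
options = {
    "A": {"utilities": [10.0, -5.0], "probs": [0.5, 0.5]},
    "B": {"utilities": [3.0, 1.0], "probs": [0.4, 0.6]},
}

def expected_utility(opt):
    # Expected utility = sum of (utility * probability) over outcomes.
    return sum(u * p for u, p in zip(opt["utilities"], opt["probs"]))

best = max(options, key=lambda name: expected_utility(options[name]))
print(best)  # "A": 0.5*10 + 0.5*(-5) = 2.5 beats 0.4*3 + 0.6*1 = 1.8
```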

But suppose you know that each option produces some distribution of utilities, but you don't yet know what it is (e.g., maybe you know they're all normally distributed and you have some guess at the means and variances), and you get to interact with them over and over again. Then you will probably begin by trying them all a few times to get a sense of what they do, and as you learn more you will gradually prioritize maximizing expected utility this turn over knowledge gain (and hence expected utility in the future).
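Here's a minimal sketch of that explore-then-exploit dynamic, using epsilon-greedy selection with a decaying exploration rate on Gaussian arms. The arm parameters and decay schedule are illustrative assumptions, and epsilon-greedy is just one of several standard bandit strategies (UCB and Thompson sampling are others):

```python
import random

# Hypothetical arms: (true mean, std dev) of each option's Gaussian payoff.
ARMS = [(1.0, 1.0), (1.5, 2.0), (0.5, 0.5)]

def pull(arm):
    mean, std = ARMS[arm]
    return random.gauss(mean, std)

counts = [0] * len(ARMS)
means = [0.0] * len(ARMS)  # running estimate of each arm's mean

total = 0.0
for t in range(1, 1001):
    eps = 1.0 / t ** 0.5  # exploration probability shrinks over time
    if random.random() < eps:
        arm = random.randrange(len(ARMS))  # explore: learn about the arms
    else:
        arm = max(range(len(ARMS)), key=means.__getitem__)  # exploit
    reward = pull(arm)
    counts[arm] += 1
    means[arm] += (reward - means[arm]) / counts[arm]  # incremental mean
    total += reward

print("estimated means:", [round(m, 2) for m in means])
print("total utility:", round(total, 2))
```

Early on the agent samples mostly at random; as eps shrinks, it increasingly pulls whichever arm currently looks best, which mirrors the gradual shift from knowledge gain to this-turn expected utility described above.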