
Gram_Stone comments on Rationality Reading Group: Part V: Value Theory - Less Wrong Discussion

6 Post author: Gram_Stone 10 March 2016 01:11AM


Comment author: SquirrelInHell 17 March 2016 01:19:07PM 1 point

Something like, "Simple moral theories are too neat to do any real work in moral philosophy; you need theories that account for human messiness if you want to discover anything important."

This is exactly the mistake described in http://lesswrong.com/lw/ix/say_not_complexity/, and (I hope) LWers are aware of it. So your examples are probably not from LW?

How can you regard it as anything but a flaw that we sometimes just don't do what we really want to do?

This adds variety, and is good when your best options are close in utility.

And the people who object to the assumptions should also expect to have opportunities to discredit any deductions from the assumptions if they really believe that those arguments rest on a confused foundation.

This is not how it logically works. If you get a suspicious deduction, you discredit your assumptions, and not the other way round.

It's useful to lay out your arguments, to consider all of the different ways that things could be different if you assume that different things are true or false.

Sure. (Isn't this what we all like to do in any case?)

Comment author: Gram_Stone 19 March 2016 11:40:48PM *  0 points

This adds variety, and is good when your best options are close in utility.

Just because you might be wrong about utilities (I assume that's a possibility that you're implying) doesn't mean that you should make the process you use to choose outcomes random.

This is not how it logically works. If you get a suspicious deduction, you discredit your assumptions, and not the other way round.

The idea is that different people consider different assumptions suspicious, and that sometimes people change their minds about which assumptions are suspicious. A suspicious deduction can also be the effect of a bad inference rule, not just a bad premise. It seems better to me to run as far as you can with many lines of reasoning unless they're completely obviously wrong at a glance.

Comment author: SquirrelInHell 20 March 2016 02:10:57AM *  0 points

Just because you might be wrong about utilities (I assume that's a possibility that you're implying) doesn't mean that you should make the process you use to choose outcomes random.

Yes. What I meant was something more like the following example.

Suppose you have four options:

  • Option A: estimated utility = 10 ± 5

  • Option B: estimated utility = 5 ± 10

  • Option C: estimated utility = 3 ± 2

  • Option D: estimated utility = -10 ± 30

It seems reasonable not to always choose A: sometimes choose B, and from time to time even D, at least until you gather enough data to improve the accuracy of your estimates.

I expect you can arrive at this solution by carefully calculating probabilities of changing your estimates by various amounts and how much more utility you can get if your estimates change.
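The "gather enough data to improve the accuracy of your estimates" step can be made concrete. Here is a minimal sketch (my own illustration, not part of the original thread) of how an estimate like 5 ± 10 tightens after a few observed payoffs, assuming a normal prior on the option's mean and payoff noise with a known standard deviation:

```python
from statistics import mean

def update_normal(prior_mean, prior_sd, observations, noise_sd):
    """Conjugate update: normal prior on the mean, known observation noise."""
    if not observations:
        return prior_mean, prior_sd
    prior_prec = 1.0 / prior_sd ** 2
    data_prec = len(observations) / noise_sd ** 2
    post_prec = prior_prec + data_prec
    post_mean = (prior_mean * prior_prec
                 + mean(observations) * data_prec) / post_prec
    return post_mean, post_prec ** -0.5

# Option B starts at 5 +/- 10; three payoffs near 12 (noise sd assumed
# to be 10) pull the estimate up and cut the uncertainty in half.
m, s = update_normal(5, 10, [11, 13, 12], noise_sd=10)
```

After three payoffs near 12, option B's estimate moves from 5 ± 10 to roughly 10.3 ± 5, which is why occasionally trying a high-variance option can pay off.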

Comment author: gjm 20 March 2016 03:03:18AM 1 point

There's been quite a lot of work on this sort of question, under the title of "Multi-armed bandits". (As opposed to the "one-armed bandits" you find rows and rows of in casinos.)

Comment author: Gram_Stone 20 March 2016 03:44:17AM 0 points

Your response is very different from mine, so I'm wondering if I'm wrong.

Comment author: gjm 20 March 2016 03:00:15PM 1 point

The multi-armed bandit scenario applies when you are uncertain about the distributions produced by these options, and are going to have lots of interactions with them that you can use to discover more about them while extracting utility.

For a one-shot game, or if those estimated utilities are distributions you know each option will continue to produce every time, you just compute the expected utility and you're done.

But suppose you know that each produces some distribution of utilities, but you don't know what it is yet (but e.g. maybe you know they're all normally distributed and have some guess at the means and variances), and you get to interact with them over and over again. Then you will probably begin by trying them all a few times to get a sense of what they do, and as you learn more you will gradually prioritize maximizing expected-utility-this-turn over knowledge gain (and hence expected utility in the future).
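The try-everything-first, then gradually-prioritize pattern described above can be sketched with a simple epsilon-greedy rule. This is a toy example of mine, not gjm's; the four Gaussian arms mirror options A-D from earlier, and the decaying exploration schedule is an arbitrary choice:

```python
import random

def epsilon_greedy(true_means, true_sds, rounds=5000, seed=0):
    rng = random.Random(seed)
    n = len(true_means)
    counts = [0] * n
    est = [0.0] * n
    total = 0.0
    for t in range(rounds):
        eps = 1.0 / (1.0 + t / 100.0)  # explore heavily early, rarely later
        if rng.random() < eps:
            arm = rng.randrange(n)                     # explore
        else:
            arm = max(range(n), key=lambda i: est[i])  # exploit best estimate
        reward = rng.gauss(true_means[arm], true_sds[arm])
        counts[arm] += 1
        est[arm] += (reward - est[arm]) / counts[arm]  # running mean
        total += reward
    return est, counts, total

# Arms mirror options A-D: means 10, 5, 3, -10 with sds 5, 10, 2, 30.
est, counts, total = epsilon_greedy([10, 5, 3, -10], [5, 10, 2, 30])
```

After enough rounds the estimates converge toward the true means and the best arm ends up pulled far more often than the others, which is exactly the knowledge-gain-to-exploitation shift gjm describes.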

Comment author: Gram_Stone 20 March 2016 03:37:06AM *  0 points

I assume that when you write '10 +/- 5', you mean that Option A could have a utility on the open interval with 0 and 10 as lower and upper bounds.

You can transform this into a decision problem under risk. Assuming that, say, in option A, you're not reasoning as though 6 is more probable than 10 because 6 is closer to 5 than 10 is (your problem statement did not indicate anything like this), then you can assign an expected utility to each option by making an equiprobable prior over the open interval including the set of possible utilities for each action. For example, since there are 10 members in the set defined as the open interval with 0 and 10 as lower and upper bounds, you would assign a probability of 0.1 to each member of the set. Furthermore, the expected utility for each Option is as follows:

  • A = (0*0.1) + (1*0.1) + (2*0.1) + (3*0.1) + (4*0.1) + (5*0.1) +(6*0.1) + (7*0.1) + (8*0.1) + (9*0.1) + (10*0.1) = 1.5

  • B = 0

  • C = 0.3

  • D = -61

The expected utility formalism prescribes A. Choosing any other option violates the Von Neumann-Morgenstern axioms.

However, my guess is that your utilities are secretly dollar values and you have an implicit utility function over outcomes. You can represent this by introducing a term u into the expected utility calculations that weights the outcomes by their real utility. This matters in the real world because of things like gambler's ruin: an idealized agent with an unbounded bankroll can simply maximize expected value, but in the real world you can run out of money, so evolution might make you loss averse to compensate. This was the original motivation for formulating the notion of expected utility (some quantitative measure of desirability weighted by probability), as opposed to the earlier notion of expected value (dollar value weighted by probability).
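To make the expected value vs. expected utility distinction concrete, here is a small sketch (my own, with made-up numbers) using log utility, which was Bernoulli's original proposal: a bet can have positive expected dollar value yet negative expected utility for an agent who can go broke:

```python
import math

def expected_value_gain(wealth, outcomes):
    """outcomes: list of (probability, resulting_wealth) pairs."""
    return sum(p * w for p, w in outcomes) - wealth

def expected_log_utility_gain(wealth, outcomes):
    """The same bet evaluated with Bernoulli-style log utility."""
    return sum(p * math.log(w) for p, w in outcomes) - math.log(wealth)

bankroll = 100
# Hypothetical bet: 50% chance to triple your bankroll,
# 50% chance to keep only a tenth of it.
bet = [(0.5, 300), (0.5, 10)]

ev = expected_value_gain(bankroll, bet)        # positive in dollar terms
eu = expected_log_utility_gain(bankroll, bet)  # negative: too ruinous
```

An expected-value maximizer takes this bet (it is worth +55 dollars in expectation), while a log-utility agent refuses it, because the downside brings it too close to ruin.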

Comment author: SquirrelInHell 20 March 2016 03:46:49AM *  1 point

Your analysis misses the point that you may play the game many times and change your estimates as you go.

For the record, 10 ± 5 means an interval from 5 to 15, not 0 to 10, and in any case I intended it as a shorthand for a bell-like distribution with a mean of 10 and a standard deviation of 5.

Comment author: Gram_Stone 20 March 2016 03:54:07AM 0 points

Yeah, I parsed it as 5 +/- 5 somehow. I might have messed up the other ones too.

Wouldn't you just maximize expected utility in each iteration, regardless of what estimates are given in each iteration?

Comment author: SquirrelInHell 20 March 2016 03:56:13AM 1 point

You would indeed maximize EV in each iteration, but this EV would also include a factor from value of information.

Comment author: Gram_Stone 20 March 2016 04:11:20AM *  0 points

Ah, okay. I went downstairs for a minute and thought to myself, "Well, the only way I get what he's saying is if we go up a level and assume that the given utilities are not simply changing, but are changing according to some sort of particular rule."

Also, I spent a long time writing my reply to your original problem statement, without refreshing the page, so I only read the original comment, not the edit. That might explain why I didn't immediately notice that you were talking about value of information, if I seemed a little pedantic in my earlier comment with all of the math.

Back to the original point that brought this problem up, what's going on inside the brain is that the brain has assigned utilities to outcomes, but there's a tremble on its actions caused by the stochastic nature of neural networks. The brain isn't so much uncertain about utilities as it is believing that its utility estimates are accurate and randomly not doing what it considers most desirable.

That's why I wrote, in the original comment:

It just seems interesting to consider the consequences of the assumption that there is a decision-maker without a trembling hand.

Does that make sense?

Comment author: SquirrelInHell 20 March 2016 05:31:57AM 1 point

Ah, okay. I went downstairs for a minute and thought to myself, "Well, the only way I get what he's saying is if we go up a level and assume that the given utilities are not simply changing, but are changing according to some sort of particular rule."

Congratulations on good thinking and attitude :)

Does that make sense?

Yes, I get that. What I meant to suggest, in the broader picture, is that this "tremble" might be evolution's way of crudely approximating a fully rational agent who makes decisions based on VOI.

So it's not necessarily detrimental to us. Sometimes it might well be.

The main takeaway from all that I have said is that replacing your intuition with "let's always take option A because it's the rational thing to do" just doesn't do the trick when you play multiple games (as is often the case in real life).
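One concrete way to cash out the idea that a "tremble" can approximate VOI-driven rationality is Thompson sampling (my own sketch, not something proposed in the thread): each round you sample one plausible value per option from your current beliefs and pick the argmax, so choices look random from the outside, but exploration happens in proportion to the probability that each option is actually best:

```python
import random

def thompson_pick(post_means, post_sds, rng):
    """Sample one plausible mean per option, then pick the argmax."""
    draws = [rng.gauss(m, s) for m, s in zip(post_means, post_sds)]
    return max(range(len(draws)), key=lambda i: draws[i])

rng = random.Random(1)
# Current beliefs about options A-D, as in the example above.
means, sds = [10, 5, 3, -10], [5, 10, 2, 30]

picks = [thompson_pick(means, sds, rng) for _ in range(10000)]
share_a = picks.count(0) / len(picks)  # A dominates but is not picked always
```

Over many draws, A is picked most often but never exclusively, which is exactly the sometimes-choose-B-and-occasionally-even-D behaviour suggested earlier.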