
# How Not to be Stupid: Adorable Maybes

-2 29 April 2009 07:15PM

Previous: Know What You Want

Ah wahned yah, ah wahned yah about the titles. </some enchanter named Tim>

(Oh, a note: the idea here is to establish general rules for what sorts of decisions one in principle ought to make, and how one in principle ought to know stuff, given that one wants to avoid Being Stupid (in the sense described in earlier posts). So I'm throwing some general and contrived hypothetical situations at the system to try to break it, to see what properties it would have to have to not automatically fail.)

Okay, so assuming you buy the argument in favor of ranked preferences, let's see what else we can learn by considering sources of, ahem, randomness:

Suppose that, whether via indexical uncertainty or some genuine nondeterminism in the universe, there's some source of bits such that the only thing you're able to determine about it is that the ratio of 1s it puts out to total bits is p. You're not able to determine anything else about the pattern of bits; they seem unconnected to each other. In other words, you've got some source of uncertainty that leaves you knowing only that some outcomes happen more often than others, and potentially something about the precise relative rates of those outcomes.

I'm trying here to avoid actually assuming epistemic probabilities. (If I've slipped in an invisible assumption of them without noticing, let me know.) Instead I'm trying to construct a situation that can be accepted as at least validly describable by something resembling probabilities (propensities or frequencies. (frequencies? aieeee! Burn the heretic, or at least flame them without mercy! :))) So, for whatever reason, suppose the universe or your opponent or whatever has access to such a source of bits. Let's consider some of the implications of this.

For instance, suppose you prefer A > B.

Now, suppose you are somehow presented with the following choice: choose B, or choose a situation in which, if the source outputs a 1 at a specific instance, A will occur; otherwise, B occurs. We'll call this sort of situation a p*A + (1-p)*B lottery, or simply p*A + (1-p)*B.

So, which should you prefer: B or the above lottery? (Assume there's no cost other than declaring your choice, or just wanting it. It's not a "pay for a lottery ticket" scenario yet. Just "assuming you simply choose one or the other... which do you choose?")

Consider our holy law of "Don't Be Stupid", specifically in the manifestation of "Don't automatically lose when you could potentially do better without risking doing worse." It would seem the correct answer would be "choose the lottery, dangit!" The only possible outcomes of it are A or B, so it can't possibly be worse than B, given that you actually prefer A. Further, choosing B is accepting an automatic loss compared to choosing the above lottery, which at least gives you a chance to do better. (Obviously we assume here that p is nonzero. In the degenerate case of p = 0, you'd presumably be indifferent between the lottery and B since, well... choosing that actually is the same thing as choosing B.)

By an exactly analogous argument, you should prefer A to the lottery. Specifically, A is an automatic WIN compared to the lottery, which doesn't give you any hope of doing better than A, but does give you a chance of doing worse.
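The dominance argument above can be sketched numerically. The utility values below are my own illustrative assumptions (the post assigns no numbers); the point is just that any nontrivial mixture of A and B lands strictly between them:

```python
# Illustrative sketch: if A is preferred to B, any lottery p*A + (1-p)*B
# with 0 < p < 1 beats B and loses to A. Numbers are assumed for the demo.

def lottery_value(p, value_a, value_b):
    """Long-run average outcome of the lottery p*A + (1-p)*B."""
    return p * value_a + (1 - p) * value_b

value_a, value_b = 10.0, 3.0  # A > B, by assumption

for p in (0.1, 0.5, 0.9):
    mix = lottery_value(p, value_a, value_b)
    assert value_b < mix < value_a  # lottery beats B, loses to A

# Degenerate case p = 0: the lottery just is B
assert lottery_value(0.0, value_a, value_b) == value_b
```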

Example: Imagine you're dying horribly of some really nasty disease that you know isn't going to heal on its own, and you're offered a possible medication for it. Assume there's no other medication available, and assume that somehow you know as a fact that none of the ways it could fail could possibly be worse. Further, assume that you know as a fact no one else on the planet has this disease, and the medication is available to you for free and has already been prepared. (These last few assumptions are to remove any possible considerations like altruistically giving up your dose of the med to save another, or similar.)

Do you choose to take the medication or not? Well, by assumption, the outcome can't possibly be worse than what the disease will do to you, and there's the possibility that it will cure you. Further, there are no other options available that may potentially be better than taking this med. (Oh, assume for whatever reason that cryo is unavailable, so taking an ambulance ride to the future in hope of a better treatment is also not an option. Basically, assume your choices are "die really really horribly" or "some chance of that, and some chance of making a full recovery. No chance of partially surviving in a state worse than death.")

So the obviously obvious choice is "choose to take the medication."

Next time: We actually do a bit more math based on what we've got so far and begin to actually construct utilities.

Comment author: 29 April 2009 07:50:59PM 2 points [-]

This breaks down somewhat when A and B are not axiomatically preferred. If instead they are preferred because they enable other states in conjunction with other actions and resources down the line, then it is entirely possible that a certainty of B is preferable to the inability to commit to other actions toward the longer-term states A* and B* until the lottery decides.

This may be one reason humans evolved to be somewhat risk averse, especially because in real situations the resources in question include our mental and physical resources.

This all comes back to the self-reference of the preference function. If you add the lottery you change the circumstances under which you were able to compute A > B, and even the meta-preference computation that said that determining this was preferable to other things you could have done instead.

Often this won't make a difference, but often is not equivalent to always. It's important to know the limits to these sorts of ideas.

Comment author: 29 April 2009 08:12:20PM 1 point [-]

This might be a good argument for the general preferences shown by the Allais paradox. If you strictly prefer 2B to 2A, you might nonetheless have a reason to prefer 1A to 1B - you could leverage your certainty to perform actions contingent on actually having $24,000. This might only work if the payoff is not immediate - you can take a loan based on the $24,000 you'll get in a month, but probably not on the 34% chance of $24,000.
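For concreteness, here is the arithmetic, assuming the standard payoffs from Eliezer's Allais write-up (the comment itself only names the certain $24,000 and the 34% branch; 1B as a 33/34 chance of $27,000 and 2B as a 33% chance of $27,000 are assumptions on my part):

```python
# Expected dollar values for the Allais gambles. Only the $24,000 certainty
# and the 34% branch appear in the comment above; the other payoffs are
# assumed from the standard presentation of the paradox.

ev_1a = 1.00 * 24_000            # certain $24,000
ev_1b = (33 / 34) * 27_000       # roughly $26,206
ev_2a = 0.34 * 24_000            # $8,160
ev_2b = 0.33 * 27_000            # $8,910

# 2A is just "a 34% chance of playing gamble 1A", and likewise 2B of 1B,
# so a consistent expected-utility agent ranks 1A/1B the same way as 2A/2B.
assert (ev_1a > ev_1b) == (ev_2a > ev_2b)
```

The common human pattern (1A over 1B, but 2B over 2A) violates that consistency; the commenter's point is that certainty itself may carry extra instrumental value not captured by the raw expected dollars.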

Comment author: 29 April 2009 09:33:37PM 0 points [-]

Fine, there could be a good reason to strictly prefer 1A to 1B, but then if you do, how do you justify preferring 2B to 2A?

Comment author: 29 April 2009 11:11:28PM *  1 point [-]

Because there's a larger jump in expected utility between certainty (up to breach of contract, etc.) of future money and 99% than between (n < 100)% and (n-1)%. However, this means that the outcome of 1A and the winning outcome of 2A are no longer the same (both involve obtaining money at time t_1, but 1A also includes obtaining, at t_0, certainty of future money), and choosing 1A and 2B becomes unproblematic.

Comment author: 29 April 2009 11:54:57PM 0 points [-]

Unless I misunderstood, most of your comment was just another justification for preferring 1A to 1B.

It doesn't seem to support simultaneously preferring 2B to 2A. Further, as near as I can tell, none of what you're saying stops the vulnerability that's opened up by having those two preferences simultaneously. I.e. the preference reversal issue is still there and still exploitable.

Comment author: 30 April 2009 12:16:37AM *  2 points [-]

Haven't followed too closely, but I think Nick's saying that the preference reversal issue doesn't apply and that's OK, because as we've defined it now 2A is no longer the same thing as a 34% chance of 1A and a 66% chance of nothing, because in the context of what thomblake said we're assuming you get the information at different times. (We're assuming the 34% chance is not for your being certain now of getting 1A, but for your being certain only later of getting 1A, which breaks the symmetry.)

Comment author: 30 April 2009 12:24:19AM *  0 points [-]

Yes, that's what I meant.

Comment author: 30 April 2009 12:55:37PM 0 points [-]

Yes to what Nick Tarleton said. I didn't give a justification for preferring 2B to 2A because I was willing to assume that, and then gave reasons for nonetheless preferring 1A to 1B. There are things that certainty can buy you.

Also yes to what steven0461 said. While you can reverse the symmetry, you can't reverse it twice - once you've given me certainty, you can't take it away again (or at least, in this thought experiment, I won't be willing to give it up).

Eliezer's money-pump might still work once (thus making it not so much a money-pump) but inasmuch as you end up buying certainty for a penny, I don't find it all that problematic.

Comment author: 29 April 2009 09:35:59PM 0 points [-]

Sorry, maybe it's because I'm running on insufficient sleep, but I don't understand what you're saying here. Mind rephrasing your objection? Thanks.

Comment author: 30 April 2009 02:00:49AM 4 points [-]

I'll try a concrete example. Of note, fuzziness of goals isn't the problem; it's the fact that the consequences for your other priorities are different when choosing between the lottery and B than when choosing between A and B.

Let's say A and B are lots of land on which you could build your new Human Instrumentality Lab. You've checked things out and you somewhat prefer lot A to lot B. You get the option to (1) definitely get lot B, or (2) go in on a lottery-type auction and get a chance at either lot. In either case, you'll get the lot at the end of the month.

If you go with (1) you can get the zoning permits and get your architect started right now. If you go with (2) you can try that, but you may need to backtrack or do twice the work. It may not be worth doing that if you don't prefer lot A enough.

Now obviously this isn't an issue if the knowledge of the outcome of the lottery is instantaneous. But you can't assume that you immediately know the outcomes of all your gambles.

Comment author: 29 April 2009 09:51:57PM *  0 points [-]

What he seems to be saying is that there are situations where, although you prefer A > B, the uncertainty and the time it takes the lottery to settle change things, so your new preference would be A > B > (p*A + (1-p)*B).

EDIT: It occurred to me that would be somewhat dependent on the value of p, and on the relative value between A and B. But for low values of p and a fairly long time to settle, B would often be higher valued than the lottery.
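A rough sketch of that p-dependence, with made-up numbers: if waiting for the lottery to resolve imposes some cost c (lost planning time, say), then for small p the sure thing B can beat the lottery even though A > B:

```python
# Hypothetical numbers: subtract a delay cost c from the lottery's value,
# so the comparison becomes u(B) vs p*u(A) + (1-p)*u(B) - c. For small p
# the sure thing wins; for large p the lottery wins.

def lottery_minus_delay(p, u_a, u_b, delay_cost):
    return p * u_a + (1 - p) * u_b - delay_cost

u_a, u_b, c = 10.0, 9.0, 0.5  # A > B, modest delay cost

assert lottery_minus_delay(0.1, u_a, u_b, c) < u_b  # 8.6 < 9: take B
assert lottery_minus_delay(0.9, u_a, u_b, c) > u_b  # 9.4 > 9: take lottery
```

(As the subthread below notes, this is really a lottery over a slightly different B, one bundled with later knowledge of the outcome.)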

Comment author: 29 April 2009 10:08:10PM 0 points [-]

Well, if B is defined sufficiently precisely, ie, have X money at time Y, then B shouldn't be greater than the lottery, which, even if the loss happens, produces the exact same outcome.

ie, unless I misunderstand, the objection only arises out of being a bit fuzzy about what B actually precisely means, letting the B in the lottery be a different B than the, well, regular B.

Would you agree with that interpretation of things, or am I missing something critical here?

Comment author: 29 April 2009 10:25:09PM 0 points [-]

I think you're right - I meant mainly that a lot depends on the specifics of the situation, so even with A>B, it is not necessarily irrational to prefer B to the probability.

Comment author: 30 April 2009 12:50:42AM *  2 points [-]

I think Nick Tarleton refuted this in the other subthread -- a lottery here means a lottery over states of the world, which include your knowledge state, so if you get your knowledge of the outcome later it's not really the same thing.

It's still true that this is a reason to disprefer realistic lotteries where you learn the outcome later, but maybe this is better termed "unpredictability aversion" than "risk aversion"? After all, it can happen even when all lottery outcomes are equally desirable. (Example: you like soup and potatoes equally, but prefer either to a lottery over them because you want to know whether to get a spoon or a fork.)

Comment author: 30 April 2009 01:06:21AM 0 points [-]

(In that link, I'm actually just restating Thom Blake's argument.)

Comment author: 30 April 2009 05:30:27PM 0 points [-]

Thanks for the link!

Comment author: 29 April 2009 10:56:26PM 0 points [-]

Okay. I'd say then that case is comparing B with a lottery involving some different B'.

(ie, like saying sometimes x=x is false if the x on the left is 2 and the one on the right is 3. Of course 2 is not = 3, but that's not a counterexample to x=x; rather, that's a case of ignoring what we actually mean by using the same variable name on both sides)

Comment author: 30 April 2009 10:31:28AM 3 points [-]

Downvoted because you could have said in three paragraphs what took over 10. It's an interesting but very simple point; if you cut everything before "Example: Imagine..." you would lose very little.

Comment author: 29 April 2009 10:07:26PM 1 point [-]

Okay, so assuming you buy the argument in favor of ranked preferences...

I downvoted this post because it's not a question of buying; your ranking argument is logically invalid, as indicated by gjm, Vladimir_Nesov and me in the comments to your previous post in the series.

Comment author: 29 April 2009 10:32:45PM 0 points [-]

"buying" in the sense of "assuming you consider the argument valid" but actually, I've rethought about it several times and I think you're right about that. I think I'm going to edit that bit somewhat in light of that.

Do you accept that IF, for some agent, it can be said that for any two states they prefer one to the other or are indifferent (ie, have just as much preference for one as for the other), THEN that, combined with the "don't be stupid" rule, prohibits cycles in the preference rankings?

Comment author: 29 April 2009 11:03:02PM 1 point [-]

Yes for idealized agents. Not yet convinced about humans.

See, if your theory eventually runs counter to common sense on Pascal's Mugging (Eliezer says he has no good solution, common sense says decline the offer) or Dust Specks (Eliezer chooses torture, common sense chooses dust specks), we will have to reexamine the assumptions again. It could easily be that the utility function assumption is faulty, or well-orderedness is faulty, or something else.

Comment author: 29 April 2009 11:25:57PM 1 point [-]

Actually, IIRC, Eliezer said that he thinks Robin Hanson's (I think it was his) solution to the mugging seems to be in the right direction. But that gets into computational power issues. Actually, my original intent was to name this sequence "How not to Be Stupid (given unbounded computational power)"

Obviously we can't do the full decision theory computations in full exact correctness. And I did give the warning against hastily giving an oversimplified human preference generator. What I'm going for here is more "why assume that Bayesian decision theory is the thing we should be building approximations to, rather than some other entirely different blob of math?"

(Oh, incidentally. I originally chose SPECKS, then later one of the comments in that sequence of posts (the comment that stepped through it, incrementally reducing, etc) ended up convincing me to switch to TORTURE.)

Also, finished editing the offending argument.

Comment author: 29 April 2009 11:57:05PM *  1 point [-]

What I'm going for here is more "why assume that Bayesian decision theory is the thing we should be building approximations to, rather than some other entirely different blob of math?"

Over the last couple years I went from believing that statement to deeply doubting it. If you want a chess player that will win games by holding the opponents' kids hostage, sure, build a Bayesian optimizer. My personal feeling is that even an ordinary human modified to be deeply and genuinely driven by an explicit utility function would pose a substantial danger to this world. No need for AIs.

Comment author: 30 April 2009 12:03:38AM 3 points [-]

That's where the whole "don't assume an overly simplistic preference ranking for yourself" warnings come in.

ie, nothing wrong with the utility function being composed of terms for all the things we value, and simply happening to include for that player a component that translates to "win at chess by actually playing chess", and other components giving stuff that lowers utility for "kids have been kidnapped" situations, etc etc etc.

The hard part is, of course, actually translating the algorithms we're running (including the bits that respond to arguments that lead us to become convinced to change our minds about a moral question, etc etc) into a more explicit algorithm. Any simple one is going to get it WRONG.

But that's not a hit against decision theory. That's a hit against bad utility functions.

Or did I utterly misunderstand your point?

Comment author: 30 April 2009 12:15:21AM *  3 points [-]

But that's not a hit against decision theory. That's a hit against bad utility functions.

We know from Eliezer's writings that almost any strong goal-directed chessplayer AI will destroy the world. Well guess what, if a non-world-destroying utility function appears almost impossibly hard to formulate, in my book it counts as a hit against the concept of utility functions. Especially seeing as machines based on e.g. control theory (RichardKennaway) behave much more sensibly - they almost never display any urge to screw up the whole world, instead being content to sit there and tweak their needle.

Comment author: 30 April 2009 12:26:54AM 4 points [-]

Well, a recursively self modifying chess playing AI is a very different beast than a human who, AMONG OTHER THINGS, cares about doing well at chess. The sum total of those other things and chess together is a very different goal system than "chess and nothing else".

As far as control theory, well... that's because control theory based systems are currently too stupid to pose such a threat to us, no?

Your judgment against decision theory seems to be "an agent based on it will act in accordance with its utility function... which may not meaningfully resemble my preferences. It may not be moral, etc etc etc. It will be good at what it's trying to do... but it isn't exactly trying to do the stuff I care about."

Do you consider this a fair summary of your position?

If so, then the response is, well... So, it's good at doing the stuff it's trying to do. It's not trying to do what we'd prefer it to be doing. This is a serious problem. But that problem isn't a flaw with decision theory itself. I mean, if decision theory is leading it to be good at optimizing reality in accordance with its preference rankings, then decision theory is acting as promised. The problem is "it's trying to do stuff we don't want it to do!"

The things we care about are complicated. To actually specifically, accurately, fully and explicitly specify that is REALLY HARD. That doesn't mean decision theory is inherently flawed. It means, well, fully specifying what we actually want is a highly nontrivial problem.

Comment author: 30 April 2009 07:51:58AM *  2 points [-]

I agree with you that the math is right. Given assumptions, it acts as promised. But the assumptions just aren't a good model of reality. Like naive game theory: you can go with the mathematically justified option of Always Defect, or you can go with common sense. Reality doesn't contain preference rankings over all possible situations; shoehorning reality into preference rankings might hurt you. Hasn't this point clicked yet? I'll try again.

The sum total of those other things and chess together is a very different goal system than "chess and nothing else".

Human beings aren't goal systems. We DON'T SUM, anymore than a car "sums" the value of its speedometer with the value of the fuel gauge. If we actually summed, you'd get the outcome Eliezer once advocated: every one of us "picking one charity and donating as much to it as he can". Your superintelligent chess player with the "correct" utility function won't ever play chess while there are other util-rich tasks anywhere in the world, like hunger in Africa.

That doesn't mean decision theory is inherently flawed. It means, well, fully specifying what we actually want is a highly nontrivial problem.

We shouldn't need to fully specify what we actually want, if we're building a specialized machine to e.g. cure world hunger or design better integrated circuits. It would be better to build such machines based on a theory that typically results in localized screw-ups... rather than a theory that destroys the world by default, unless you tell it everything about you.

Comment author: 01 May 2009 02:21:29AM 1 point [-]

We shouldn't need to fully specify what we actually want, if we're building a specialized machine to e.g. cure world hunger or design better integrated circuits.

What if we're building a specialized machine to prevent a superintelligence from annihilating us?

Comment author: 30 April 2009 01:11:50PM 1 point [-]

It would be better to build such machines based on a theory that typically results in localized screw-ups... rather than a theory that destroys the world by default, unless you tell it everything about you.

Where's the "I super-agree" button?

I agree with you that maximizing utility is dangerous and wrong even just in ordinary humans. That's not what we're for and that's not what the good life is about.

We don't need a clean-cut, provable decision theory that will drive the universe into a hole of 'utility'. We need more of a wibbly-wobbly, humany-ethicy ball of... stuff.

Comment author: 30 April 2009 06:00:30PM 0 points [-]

Human beings aren't goal systems. We DON'T SUM, anymore than a car "sums" the value of its speedometer with the value of the fuel gauge. If we actually summed, you'd get the outcome Eliezer once advocated: every one of us "picking one charity and donating as much to it as he can".

That seems an obviously fallacious argument to me. Many posts on OB have talked about other motivations behind charitable giving - whether it's 'buying fuzzies' or signalling. You seem to be arguing that because one possible (but naive and inaccurate) model of a person's utility function would predict different behaviour than what we actually observe, the observed behaviour is evidence against any utility function being maximized. There are pretty clearly at least two possibilities here: either humans don't maximize a utility function, or they maximize a different utility function from the one you have in mind.

Personally I think humans are imperfect maximizers of utility functions that are sufficiently complex that the 'function' terminology is as misleading as it is enlightening but your argument really doesn't support your conclusion.

Comment author: 30 April 2009 02:47:30AM 2 points [-]

Especially seeing as machines based on e.g. control theory (RichardKennaway) behave much more sensibly - they almost never display any urge to screw up the whole world, instead being content to sit there and tweak their needle.

This is a rather bad example -- machines based on control theory can easily display an "urge" to screw up as much of the world as they can touch. Short version: slapping a PID controller onto a system gives it second order dynamics, and those can have a resonant frequency. If the random disturbance has power at the resonant frequency, the system goes into a positive feedback loop and blows up.
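A toy illustration of the resonance point (my own construction, not from the comment): pure proportional control of a double integrator yields an undamped second-order loop, and a disturbance at the loop's natural frequency pumps the error up dramatically:

```python
import math

# Proportional control of a double integrator: x'' = -kp*x + disturbance.
# This is an undamped oscillator with natural frequency sqrt(kp); a
# disturbance at that frequency adds energy every cycle, so the error
# grows rather than being regulated away.

def simulate(disturbance_freq, kp=1.0, dt=0.001, steps=200_000):
    x, v = 0.0, 0.0
    peak = 0.0
    for i in range(steps):
        t = i * dt
        accel = -kp * x + 0.1 * math.sin(disturbance_freq * t)
        v += accel * dt      # semi-implicit Euler: stable for oscillators
        x += v * dt
        peak = max(peak, abs(x))
    return peak

resonant = simulate(disturbance_freq=1.0)  # natural freq = sqrt(kp) = 1
off_res = simulate(disturbance_freq=3.0)   # well away from resonance

assert resonant > 10 * off_res  # resonance amplifies the error hugely
```

A real PID loop adds derivative damping precisely to avoid this, but the commenter's point stands: "content to tweak its needle" is a property of a well-tuned controller, not of control theory as such.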

Comment author: 30 April 2009 12:38:24AM 0 points [-]

I agree. But what do you do with this situation? To give up, you have to be certain that there is no way out, and we are much too confused to say anything like that yet. Someone is bound to build a doom machine someday if you don't do something about it.

Comment author: 30 April 2009 12:32:15AM 0 points [-]

Normative decision theory – the structure of our final, stable preferences if we knew more, thought faster, were more the people we wished we were, had grown up further together – needn't be good engineering design; agreed that utility functions often aren't the latter, but that doesn't count against them as the former.

Maybe Psy-Kosh should say "becoming" instead of "building"?

Comment author: 30 April 2009 12:30:05AM *  1 point [-]

That is a right sentiment about strength: there are no simple rules, only goals, which makes a creative mind extremely dangerous. And we shouldn't build things like this without understanding what the outcome will be. This is one of the reasons it's important to understand human values in this light, to guard them from this destructive potential.

Whatever you want accomplished, whatever you want averted, instrumental rationality defines an optimal way of doing that (without necessarily giving the real-world means, that's a next step). If you really want life to continue as before, the correctly implemented explicit utility function for doing that won't lead a Bayesian optimizer to do something horrible. (Although inaction may be considered horrible in itself, where so much more could've been done.)

Comment author: 30 April 2009 01:12:38PM *  -1 points [-]

given unbounded computational power

You don't get to assume that till tomorrow.

Comment author: 29 April 2009 11:14:11PM *  0 points [-]

Your statements about application of decision-making to humans still fail to make any sense to me. I fail to form a coherent model of how you understand this issue. Could you try to write up a short step-by-step introduction to your position, maybe basic terms only, just to establish a better vocabulary to build on? Open thread seems like a right place for such post.

Comment author: 30 April 2009 08:40:33PM *  6 points [-]

Short version: beyond a certain (very coarse) precision you can't usefully model humans as logical, goal-directed, decision-making agents contaminated by pesky "biases". Goals, decisions and agency are very leaky abstractions, illusions that arise from the mechanical interplay of our many ad-hoc features. Rather than heading off for the sunset, the 99% typical behavior of humans is going around in circles day after day; if this is goal-directed, the goal must be weird indeed. If you want to make predictions about actual human beings, don't talk about their goals, talk about their tendencies.

Far from distressing me, this situation makes me happy. It's great we have so few optimizers around. Real-world strong optimizers, from natural selection to public corporations to paperclippers, look psychopathic and monstrous when viewed through the lens of our tendency-based morality.

For more details see thread above. Or should I compile this stuff into a toplevel post?

Comment author: 01 May 2009 09:43:07AM 1 point [-]

Okay, I've probably captured the gist of your position now. Correct me if I'm speaking something out of its character below.

Humans are descriptively not utility maximizers; they can only be modeled this way under coarse approximation and with a fair number of exceptions. There seems to be no reason to normatively model them with some ideal utility maximizer, or to apply concepts like should in the more rigorous sense of decision theory.

Humans do what they do, not what they "should" according to some rigorous external model. This is an argument and intuition similar to not listening to philosopher-constructed rules of morality, non-intuitive conclusions reached from considering a thought experiment, or God-declared moral rules, since you first have to accept each moral rule yourself, according to your own criteria, which might even be circular.

Comment author: 01 May 2009 02:41:40AM 1 point [-]

It's great we have so few optimizers around. Real-world strong optimizers, from natural selection to public corporations to paperclippers, look psychopathic and monstrous when viewed through the lens of our tendency-based morality.

I thought this was the point of the Overcoming Bias project and the endeavor not to be named until tomorrow (cf. "Thou Art Godshatter" and "Value is Fragile"): that we want to put the fearsome power of optimization in the service of humane values, instead of just leaving things to nature, which is monstrous.

Or should I compile this stuff into a toplevel post?

I would love to see a top-level post on this issue.

Comment author: 29 April 2009 11:40:16PM 0 points [-]

Is that addressed to cousin_it or Psy-Kosh?

Comment author: 29 April 2009 11:42:53PM 0 points [-]

To cousin_it, obviously...

Comment author: 30 April 2009 12:14:01AM 0 points [-]

Thanks. (It wasn't obvious to me, because I've seen similar comments from you to Psy-Kosh recently, and don't remember seeing any such to cousin_it. And it's not entirely outside the bounds of possibility for someone to make a comment a sibling rather than a child of what it's responding to.)

Comment author: 29 April 2009 07:24:24PM 0 points [-]

typo above:

(propensity or frequencies. (frequencies? aieeee! Burn the heretic, or at least flame them without mercy! :))

Presumably, you want :) to be a smiley - in that case, you need another closing parenthesis.

Comment author: 29 April 2009 07:51:17PM 1 point [-]
Comment author: 29 April 2009 08:07:08PM 2 points [-]

Nice cite, but proper nesting beats stylistic concerns any day.

Comment author: 29 April 2009 09:37:31PM -1 points [-]

Hee hee, fixed. :)

((:)))

Comment deleted 01 May 2009 01:17:07AM [-]
Comment author: 01 May 2009 01:43:36AM 0 points [-]

What? Oh, I think you meant that comment for this article rather than for mine.

Comment author: 01 May 2009 02:01:33AM 0 points [-]

yep.