Wei_Dai comments on Branches of rationality - Less Wrong

Post author: AnnaSalamon | 12 January 2011 03:24AM | 75 points


Comment author: Wei_Dai 13 January 2011 10:11:32AM 19 points [-]

"What simple tricks can help turn humans -- haphazard evolutionary amalgams that we are -- into coherent agents?"

One trick that we can always apply: disassemble the human and use his atoms to build a paperclip maximizer. The point is, we don't just want to turn humans into coherent agents, we want to turn them into coherent agents who can be said to have the same preferences as the original humans. But given that we don't have a theory of preferences for incoherent agents, how do we know that any given trick intended to improve coherence is preference-preserving? Right now we have little to guide us except intuition.

To borrow an example from Robin Hanson, we have both preferences that are consciously held, and preferences that are unconsciously held, and many "rationality techniques" seem to emphasize the consciously held preferences at the expense of unconsciously held preferences. It's not clear this is kosher.

I think there are many important unsolved problems in the theoretical/philosophical parts of rationality, and this post seems to under-emphasize them.

Comment author: AnnaSalamon 13 January 2011 11:46:20AM 4 points [-]

I think there are many important unsolved problems in the theoretical/philosophical parts of rationality, and this post seems to under-emphasize them.

What would a picture of rationality’s goal that correctly emphasized them look like?

Comment author: Wei_Dai 13 January 2011 07:15:55PM 3 points [-]

I think it should at least mention prominently that there is a field that might be called "theory of rationality" and perhaps subdivided into "theory of ideal agents" and "theory of flawed agents", and we still know very little about these subjects (the latter even less than the former), and as a result we have little theoretical guidance for the practical work.

I'm tempted to further say that work on theory should take precedence over work on practice at this point, but that's probably just my personal bias speaking. In any case people will mostly work on what they intuitively think is interesting or important, so I just want to make sure that people who are interested in "rationality" know that there are lots of theoretical problems that they might consider interesting or important.

Comment author: AnnaSalamon 13 January 2011 11:45:47AM *  3 points [-]

The point is, we don't just want to turn humans into coherent agents, we want to turn them into coherent agents who can be said to have the same preferences as the original humans. But given that we don't have a theory of preferences for incoherent agents, how do we know that any given trick intended to improve coherence is preference-preserving? Right now we have little to guide us except intuition.

I absolutely agree. The actual question I had written on my sheet, as I tried to figure out what a more powerful “rationality” might include, was “... into coherent agents, with something like the goals ‘we’ wish to have?” Branch #8 above is exactly the art of not having the goals-one-acts-on be at odds with the goals-one-actually-cares-about (and includes much mention of the usefulness of theory).

My impression, though, is that some of the other branches of rationality in the post are very helpful for self-modifying in a manner you’re less likely to regret. Philosophy always holds dangers, but a person approaching the question of “What goals shall I choose?”, and encountering confusing information that may affect what he wants (e.g., encountering arguments in meta-ethics, or realizing his religion is false, or realizing he might be able to positively or negatively affect a disorienting number of lives) will be much better off if he already has good self-knowledge and has accepted that his current state is his current state (vs. if he wants desperately to maintain that, say, he doesn’t care about status and that only utilitarian expected-global-happiness-impacts affect his behavior -- a surprisingly common nerd failure mode).

I don’t know how to extrapolate the preferences of myself or other people either, but my guess is, while further theoretical work is critical, it’ll be easier to do this work in a non-insane fashion in the context of a larger, or more whole-personed, rationality. What are your thoughts here?

Comment author: Dr_Manhattan 13 January 2011 06:56:06PM *  6 points [-]

very helpful for self-modifying in a manner you’re less likely to regret.

I don't think regret is the concern here... Your future self might be perfectly happy making paperclips. I almost think "not wanting your preferences changed" deserves a new term... Hmm, "pre-gret"?

Comment author: Vaniver 17 January 2011 02:55:04PM 0 points [-]

Upvoted for 'pregret.'

Comment author: Jonathan_Graehl 17 January 2011 02:39:05PM 0 points [-]

I like to imagine another copy of my mind watching what I'm becoming, and being pleased. If I can do that, then I feel good about my direction.

You will find people who are willing to bite the "I won't care when I'm dead" bullet, or at least claim to - it's probably just the abstract rule-based part of them talking.

Comment author: Clippy 17 January 2011 03:25:25PM 0 points [-]

Useful concept, bad example.

Comment author: steven0461 13 January 2011 11:08:44PM 2 points [-]

will be much better off if he already has good self-knowledge and has accepted that his current state is his current state

Everything here turns on the meaning of "accept". Does it mean "acknowledge as a possibly fixable truth" or does it mean "consciously endorse"? I think you're suggesting the latter but only defending the former, which is much more obviously true.

he wants desperately to maintain that, say, he doesn’t care about status and that only utilitarian expected-global-happiness-impacts affect his behavior -- a surprisingly common nerd failure mode

Is the disagreement here about what his brain does, or about what parts of his brain to label as himself? If the former, it's not obviously common, if the latter, it's not obviously a failure mode.

Comment author: Nick_Tarleton 26 February 2011 07:26:41AM *  1 point [-]

will be much better off if he already has good self-knowledge and has accepted that his current state is his current state

Everything here turns on the meaning of "accept". Does it mean "acknowledge as a possibly fixable truth" or does it mean "consciously endorse"?

Those both sound like basically verbal/deliberate activities, which is probably not what Anna meant. I would say "not be averse to the thought of".

Comment author: Wei_Dai 13 January 2011 07:16:51PM 1 point [-]

I don't have much data here, but I guess none of us do. Personally, I haven't found it terribly helpful to learn that I'm probably driven in large part by status seeking, and not just pure intellectual curiosity. I'm curious what data points you have.

Comment author: Zvi 13 January 2011 09:50:46PM 4 points [-]

That is interesting to me because finding out I am largely a status maximizer (and that others are as well) has been one of the most valuable bits of information I've learned from OB/LW. This was especially true at work, where I realized I needed to be maximizing my status explicitly as a goal and not feel bad about it, which allowed me to do so far more efficiently.

Comment author: Wei_Dai 17 January 2011 10:19:39PM *  6 points [-]

You, upon learning that you're largely a status maximizer, decided to emphasize status seeking even more, by doing it on a conscious level. But that means other competing goals (I assume you must have some) have been de-emphasized, since the cognitive resources of your conscious mind are limited.

I, on the other hand, do not want to want to seek status. Knowing that I'm driven largely by status seeking makes me want to self-modify in a way that de-emphasizes status seeking as a goal (*). But I'm not really sure either of these responses is rational.

(*) Unfortunately I don't know how to do so effectively. Before, I'd just spend all of my time thinking about a problem on the object level. Now I can't help but periodically wonder if I believe or argue for some position because it's epistemically justified, or because it helps to maximize status. For me, this self doubt seems to sap energy and motivation without reducing bias enough to be worth the cost.

Comment author: Zvi 18 January 2011 01:53:07PM 2 points [-]

This is the simple version of the explicit model I have in my head at work now: I have two currencies, Dollars and Status. Every decision I make likely has some impact both in terms of our company's results (Dollars) and also in terms of how I and others will be perceived (Status). The cost in Status to make any given decision is a decreasing function of current Status. My long-term goal is to maximize Dollars. However, often the correct way to maximize Dollars in the long term is to sacrifice Dollars for Status, bank the Status and use it to make better decisions later.

I think this type of thing should be common. Status is a resource that is used to acquire what you want, so in my mind there's no shame in going after it.
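The two-currency model above can be sketched as a toy simulation. Everything numeric here -- the cost curve, the payoffs, the horizon, the `status_cost` helper -- is a hypothetical illustration, not anything Zvi specified; the only features taken from the comment are that decisions cost Status, that the cost falls as banked Status rises, and that trading early Dollars for Status can maximize Dollars long-term:

```python
def status_cost(status):
    # Hypothetical cost curve: pushing a high-value decision through
    # gets cheaper the more Status you have banked.
    return 6.0 / (1.0 + status)

def run(invest_rounds, total_rounds=20, status=5.0):
    """Bank Status for the first `invest_rounds` rounds, then try to
    cash in on the high-value decision each remaining round."""
    dollars = 0.0
    for r in range(total_rounds):
        if r < invest_rounds:
            status += 3.0          # sacrifice this round's Dollars for Status
        elif status >= status_cost(status):
            status -= status_cost(status)
            dollars += 10.0        # high-value decision, paid for in Status
        else:
            dollars += 2.0         # too little Status: settle for the safe call
    return dollars

print(run(0), run(3))
```

With these (made-up) numbers, the never-invest policy burns through its initial Status in a few rounds and is stuck with the safe option thereafter, while spending three early rounds banking Status lets every later round clear the (now cheaper) cost, ending with far more Dollars -- the "sacrifice Dollars for Status, bank it, spend it later" dynamic in miniature.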

Comment author: katydee 18 January 2011 02:24:45PM 0 points [-]

How do time constraints play into this model?

Comment author: TheOtherDave 13 January 2011 08:03:24PM 1 point [-]

Do you ever find yourself in situations where you would predict different things if you thought you were a pure-intellectual-curiosity-satisfier than if you think you're in part a status-maximizer?

If so, is making more accurate predictions in such situations useful, or do accurate predictions not matter much?

I suspect that if I thought of myself as a pure-intellectual-curiosity-satisfier, I would be a lot more bewildered by my behavior and my choices than I am, and struggle with them a lot more than I do, and both of those would make me less happy.

Comment author: Jonathan_Graehl 17 January 2011 02:43:52PM 0 points [-]

If the way you seek status is ethical ("do good work" more than "market yourself as doing good work") then you may not want to change anything once you discover your "true motivation". And the alternative "don't care about anything" hardly entices.

Comment author: Will_Sawin 13 January 2011 09:10:14PM 2 points [-]

I, the entity that is typing these words, do not approve of unconscious preferences when they conflict with conscious ones.

Comment author: Will_Newsome 13 January 2011 11:26:15AM 1 point [-]

I think there are many important unsolved problems in the theoretical/philosophical parts of rationality, and this post seems to under-emphasize them.

Agreed to an extent, but most folk aren't out to become Friendliness philosophers. One branch that went unmentioned in the post and would be useful for both philosophers and pragmatists includes the ability to construct (largely cross-domain / overarching) ontologies out of experience and abstract knowledge, the ability to maintain such ontologies (propagating beliefs across domains, noticing implications of belief structures and patterns, noticing incoherence), and the disposition of staying non-attached to familiar ontologies (e.g. naturalism/reductionism) and non-averse to unfamiliar/enemy ontologies (e.g. spiritualism/phenomenology). This is largely what distinguishes exemplary rationalists from merely good rationalists, and it's barely talked about at all on Less Wrong.