DanArmak comments on What Are Probabilities, Anyway? - Less Wrong
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Comments (78)
That implies there's some objectively-definable standard for preferences which you'll be able to recognize once you see it. Also, it begs the question of what in your current preferences says "I have to go out and get some more/different preferences!" From a goal-driven intelligence's POV, asking others to modify your prefs in unspecified ways is pretty much the anti-rational act.
I think we need to distinguish between what a rational agent should do, and what a non-rational human should do to become more rational. Nesov's reply to you also concerns the former, I think, but I'm more interested in the latter here.
Unlike a rational agent, we don't have well-defined preferences, and the preferences that we think we have can be changed by arguments. What to do about this situation? Should we stop thinking up or listening to arguments, and just fill in the fuzzy parts of our preferences with randomness or indifference, in order to emulate a rational agent in the most direct manner possible? That doesn't make much sense to me.
I'm not sure what we should do exactly, but whatever it is, it seems like arguments must make up a large part of it.
Please see my reply to Nesov above, too.
I think we shouldn't try to emulate rational agents at all, in the sense that we shouldn't pretend to have rationality-style preferences and supergoals; as a matter of fact we don't have them.
Up to here we seem to agree; we just use different terminology. I just don't want to conflate rational preferences with human preferences, because the two systems behave very differently.
Just as an example, in signalling theories of behaviour, you may consciously believe that your preferences are very different from what your behaviour is actually optimizing for when no one is looking. A rational agent wouldn't normally have separate conscious/unconscious minds unless only the conscious part was subject to outside inspection. In this example, it makes sense to update signalling-preferences sometimes, because they're not your actual acting-preferences.
But if you consciously intend to act out your (conscious) preferences, and also intend to keep changing them in not-always-foreseeable ways, then that isn't rationality, and when there could be confusion due to context (such as on LW most of the time) I'd prefer not to use the term "preferences" about humans, or to make clear what is meant.
That arguments modify preference means that you are (denotationally) arriving at different preferences depending on the arguments. This means that, from the perspective of a specific given preference (or a "true" neutral preference not biased by specific arguments), you fail to obtain an optimal rational decision algorithm, and thus to achieve a high-preference strategy. But at the same time, "absence of action" is also an action, so not exploring the arguments may well be a worse choice, since you won't be moving forward towards a clearer understanding of your own preference, even if the preference that you are going to understand will be somewhat biased compared to the unknown original one.
Thus, there is a tradeoff:
FWIW, my preferences have not been changed by arguments in the last 20 years. So I don't think your "we" includes me.
As an example, consider arguments in the form of proofs/disproofs of the statements that you are interested in. Information doesn't necessarily "change" or "determine arbitrarily" the things you take from it; it may help you to compute an object in which you are already interested, without changing that object, and at the same time be essential in moving forward. Having an algorithm doesn't mean that you know what this algorithm will give you in the end, what the algorithm "means". Resist the illusion of transparency.
I don't understand what you're saying as applied to this argument. That Wei Dai has an algorithm for modifying his preferences and he doesn't know what the end output of that algorithm will be?
There will always be something about preference that you don't know, and it's not a question of modifying preference; it's a question of figuring out what the fixed, unmodifiable preference implies. Modifying preference is exactly the wrong way of going about this.
If we figure out the conceptual issues of FAI, we'd basically have the algorithm that is our preferences, but not in infinite and unknowable normal "execution trace" denotational "form".
As Wei says below, we should consider rational agents (who have explicit preferences separate from the rest of their cognitive architecture) separately from humans who want to approximate that in some ways.
I think that if we first define separate preferences, and then proceed to modify them over and over again, this is so different from rational agents that we shouldn't call it preferences at all. We can talk about e.g. morals instead, or about habits, or biases.
On the other hand if we define human preferences as 'whatever human behavior happens to optimize', then there's nothing interesting about changing our preferences, this is something that happens all the time whether we want it to or not. Under this definition Wei's statement that he deliberately makes it happen is unclear (the totality of a human's behaviour, knowledge, etc. is subtly changing over time in any case) so I assumed he was using the former definition.
There is no clear-cut dichotomy between defining something completely at the beginning and doing things arbitrarily as we go. Instead of defining preference for rational agents, in a complete, finished form, and then seeing what happens, consider a process of figuring out what preference is. This is neither a way to arrive at the final answer, at any point, nor a history of observing "whatever happens". A rational agent is an impossible construct, but something irrational agents aspire to be, never attaining it. What they want to become isn't directly related to what they "appear" to strive towards.
I understand. So you're saying we should indeed use the term 'preference' for humans (and a lot of other agents) because no really rational agents can exist.
Actually, why is this true? I don't know about perfect rationality, but why shouldn't an agent exist whose preferences are completely specified and unchanging?
Right. Except that really rational agents might exist, but not if their preferences are powerful enough, as humans' have every chance to be. And whatever we irrational humans, or our godlike but still, strictly speaking, irrational FAI try to do, the concept of "preference" still needs to be there.
Again, it's not about changing preference. See these comments.
An agent can have a completely specified and unchanging preference, but still not know everything about it (and never be able to know everything about it). In particular, this is a consequence of the halting problem: if you have the source code of a program, this code completely specifies whether the program halts, and you may run this code for an arbitrarily long time without ever changing it, but still not know whether it halts, and never be able to figure that out, unless you are lucky enough to arrive at a solution in this particular case.
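To make the halting-problem point concrete, here is a minimal Python sketch. The choice of the Collatz rule and the names `collatz_step` and `halts_within` are my own assumptions for illustration, not anything from the thread. The rule is completely specified and never changes, yet running it under a step budget can only ever answer "halted" or "don't know yet"; whether it halts for every starting value is still an open question.

```python
from typing import Optional


def collatz_step(n: int) -> int:
    """One step of a fixed, fully specified rule (illustrative choice)."""
    return n // 2 if n % 2 == 0 else 3 * n + 1


def halts_within(n: int, budget: int) -> Optional[int]:
    """Run the rule from n for at most `budget` steps.

    Returns the number of steps needed to reach 1, or None if the
    budget ran out first. None means "unknown so far", not "never halts".
    """
    steps = 0
    while n != 1:
        if steps >= budget:
            return None  # out of budget: we have learned nothing definite
        n = collatz_step(n)
        steps += 1
    return steps


if __name__ == "__main__":
    print(halts_within(6, 100))   # small input, settles quickly
    print(halts_within(27, 50))   # budget too small: answer is "unknown"
    print(halts_within(27, 200))  # same fixed rule, larger budget
```

Nothing about the rule changed between the last two calls; only the amount of computation spent on it did. This is only an analogy for the claim above, not a demonstration of undecidability itself.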
OK, I understand now what you're saying. I think the main difference, then, between preferences in humans and in perfect (theoretical) agents is that our preferences aren't separate from the rest of our mind.
I don't understand this point.