Roko comments on A Master-Slave Model of Human Preferences - Less Wrong

58 Post author: Wei_Dai 29 December 2009 01:02AM

Comment author: Mitchell_Porter 29 December 2009 04:27:16AM 7 points

a test for those who propose to "extract" or "extrapolate" our preferences into a well-defined and rational form

If we are going to have a serious discussion about these matters, at some point we must face the fact that the physical description of the world contains no such thing as a preference or a want - or a utility function. So the difficulty of such extractions or extrapolations is twofold. Not only is the act of extraction or extrapolation itself conditional upon a value system (i.e. normative metamorality is just as "relative" as is basic morality), but there is nothing in the physical description to tell us what the existing preferences of an agent are. Given the physical ontology we have, the ascription of preferences to a physical system is always a matter of interpretation or imputation, just as is the ascription of semantic or representational content to its states.

It's easy to miss this in a decision-theoretic discussion, because decision theory already assumes some concept like "goal" or "utility", always. Decision theory is the rigorous theory of decision-making, but it does not tell you what a decision is. It may even be possible to create a rigorous "reflective decision theory" which tells you how a decision architecture should choose among possible alterations to itself, or a rigorous theory of normative metamorality, the general theory of what preferences agents should have towards decision-architecture-modifying changes in other agents. But meta-decision theory will not bring you any closer to finding "decisions" in an ontology that doesn't already have them.

Comment deleted 29 December 2009 03:38:04PM *
Comment author: Vladimir_Nesov 29 December 2009 05:50:43PM *  1 point

Thus, the criterion for ascribing preferences to a physical system is that the actual physics has to be well-approximated by a function that optimizes for a preferred state, for some value of "preferred state".

I don't think this simple characterisation resembles the truth: the whole point of this enterprise is to make sure things go differently, in a way they just couldn't go by themselves. Thus, observing existing "tendencies" doesn't quite capture the idea of preference.

Comment deleted 29 December 2009 08:03:17PM
Comment author: Vladimir_Nesov 29 December 2009 08:39:40PM *  1 point

I don't hear differently... I even suspect that preference is introspective, that is, it depends on the way the system works "internally", not just on how it interacts with the environment. That is, two agents with different preferences may do exactly the same thing in all contexts. Even if not, it's a long way between how the agent (in its craziness and stupidity) actually changes the environment, and how it would prefer (on reflection, if it were smarter and saner) the environment to change.

Comment deleted 29 December 2009 11:11:31PM *
Comment author: Wei_Dai 30 December 2009 08:55:59PM 1 point

I want to point out that in the interpretation of the prior as weights on possible universes, specifically as how much one cares about different universes, we can't just replace "incorrect" beliefs with "the truth". In this interpretation, there can still be errors in one's beliefs caused by things like past computational mistakes, and I think fixing those errors would constitute helping, but the prior itself perhaps needs to be preserved as part of preference.

Comment author: Vladimir_Nesov 29 December 2009 11:15:09PM *  1 point

If the agent has a well-defined "predictive module" which has a "map" (probability distribution over the environment given an interaction history), and some "other stuff", then you can clamp the predictive module down to the truth, and then perform what I said before:

Yeah, maybe. But it doesn't.

Comment deleted 30 December 2009 02:03:05PM
Comment deleted 30 December 2009 02:18:12PM *
Comment author: Vladimir_Nesov 01 January 2010 04:44:46PM 2 points

What is left of the time cube guy once you subtract off his false beliefs and delusions? Not much, probably.

Beware: you are making a common sense-based prediction about what would be the output of a process that you don't even have the right concepts for specifying! (See my reply to your other comment.)

Comment author: SilasBarta 10 January 2010 04:10:10PM 1 point

Wow. Too bad I missed this when it was first posted. It's what I wish I'd said when justifying my reply to Wei_Dai's attempted belief/values dichotomy here and here.

Comment deleted 10 January 2010 06:09:24PM *
Comment deleted 30 December 2009 02:21:34PM *
Comment author: Vladimir_Nesov 01 January 2010 04:44:24PM *  2 points

I strongly agree with this: the problem that CEV is the solution to is urgent but it isn't elegant. Absolutes like "There isn't a beliefs/desires separation" are unhelpful when solving such inelegant but important problems.

One lesson of reductionism, and of the success of simple-laws-based science and technology, is that for real-world systems there might be no simple way of describing them, but there could be a simple way of manipulating their data-rich descriptions. (What's the yield strength of a car? -- Wrong question!) Given a gigabyte's worth of problem statement and the right simple formula, you could get an answer to your query. There is a weak analogy with the misapplication of Occam's razor, where one tries to reduce the amount of stuff rather than the amount of detail in the ways of thinking about that stuff.

In the case of beliefs/desires separation, you are looking for a simple problem statement, for a separation in the data describing the person itself. But what you should be looking for is a simple way of implementing the make-smarter-and-better extrapolation on a given pile of data. The beliefs/desires separation, if it's ever going to be made precise, is going to reside in the structure of this simple transformation, not in the people themselves.

Comment author: Tyrrell_McAllister 29 December 2009 08:23:36PM *  0 points

[Y]ou have to draw a boundary around the "optimizing agent", and look at the difference between the tendencies of the environment without the optimizer, and the tendencies of the environment with the optimizer.

And there's your "opinion or interpretation" --- not just in how you draw the boundary (which didn't exist in the original ontology), but in your choice of the theory that you use to evaluate your counterfactuals.

Of course, such theories can be better or worse, but only with respect to some prior system of evaluation.
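The counterfactual test quoted above can be put in concrete terms with a toy simulation. Everything here is invented for illustration (the drift dynamics, the nudging agent, the target state): we compare where the environment ends up with and without the "optimizing agent" inside the boundary, and impute a "preferred state" from the difference.

```python
import random

def step_environment(state):
    """The environment by itself: an aimless random drift."""
    return state + random.choice([-1, 0, 1])

def step_with_agent(state, target):
    """Hypothetical agent embedded in the environment: after the
    drift, it nudges the state one unit toward `target`."""
    s = step_environment(state)
    if s < target:
        return s + 1
    if s > target:
        return s - 1
    return s

def final_states(stepper, runs=2000, steps=50):
    """Distribution of end states over many independent runs."""
    results = []
    for _ in range(runs):
        state = 0
        for _ in range(steps):
            state = stepper(state)
        results.append(state)
    return results

def spread(xs):
    return max(xs) - min(xs)

random.seed(0)
baseline = final_states(step_environment)
with_agent = final_states(lambda s: step_with_agent(s, target=10))

# The counterfactual comparison: with the agent present, outcomes
# concentrate near one state; without it, they scatter widely. The
# imputed "preferred state" is whatever that difference singles out.
print("spread without agent:", spread(baseline))
print("spread with agent:   ", spread(with_agent))
```

Note where the "opinion or interpretation" enters even in this toy case: the choice of which lines of dynamics count as "the agent" (the boundary), and the choice of `step_environment` as the agent-free counterfactual theory.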

Comment author: Vladimir_Nesov 29 December 2009 08:49:47PM 2 points

Still, probably a question of Aristotelian vs. Newtonian mechanics, i.e. not hard to see who wins.

Comment author: Tyrrell_McAllister 29 December 2009 08:55:55PM *  0 points

Still, probably a question of Aristotelian vs. Newtonian mechanics, i.e. not hard to see who wins.

Agreed, but not responsive to Mitchell Porter's original point. (ETA: . . . unless I'm missing your point.)