Wei_Dai comments on A Master-Slave Model of Human Preferences - Less Wrong

Post author: Wei_Dai 29 December 2009 01:02AM

Comment author: Mitchell_Porter 29 December 2009 04:27:16AM 7 points

a test for those who propose to "extract" or "extrapolate" our preferences into a well-defined and rational form

If we are going to have a serious discussion about these matters, at some point we must face the fact that the physical description of the world contains no such thing as a preference or a want - or a utility function. So the difficulty of such extractions or extrapolations is twofold. Not only is the act of extraction or extrapolation itself conditional upon a value system (i.e. normative metamorality is just as "relative" as is basic morality), but there is nothing in the physical description to tell us what the existing preferences of an agent are. Given the physical ontology we have, the ascription of preferences to a physical system is always a matter of interpretation or imputation, just as is the ascription of semantic or representational content to its states.

It's easy to miss this in a decision-theoretic discussion, because decision theory always presupposes some concept like "goal" or "utility". Decision theory is the rigorous theory of decision-making, but it does not tell you what a decision is. It may even be possible to create a rigorous "reflective decision theory", which tells you how a decision architecture should choose among possible alterations to itself, or a rigorous theory of normative metamorality: the general theory of what preferences agents should have towards decision-architecture-modifying changes in other agents. But meta-decision theory will not bring you any closer to finding "decisions" in an ontology that doesn't already have them.

Comment author: Wei_Dai 29 December 2009 08:51:26PM 4 points

I agree this is part of the problem, but like others here I think you might be making it out to be harder than it is. We know, in principle, how to translate a utility function into a physical description of an object: by coding it as an AI and then specifying the AI along with its substrate down to the quantum level. So, again in principle, we can go backwards: take a physical description of an object, consider all possible implementations of all possible utility functions, and see if any of them matches the object.
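The brute-force "go backwards" procedure can be sketched in a toy setting. This is a minimal sketch under simplifying assumptions that do not come from the thread: the world has three outcomes, the object's behaviour is observed only as choices between pairs of outcomes, each candidate utility function is a strict ordering of those outcomes, and its single "implementation" is an ideal maximiser. The names `OUTCOMES`, `implement`, and `matching_utilities` are hypothetical, and the physical substrate is abstracted away entirely.

```python
from itertools import combinations, permutations

# Assumption: a fixed, finite outcome set; behaviour is observed on pairwise menus.
OUTCOMES = ("A", "B", "C")
MENUS = list(combinations(OUTCOMES, 2))  # ("A","B"), ("A","C"), ("B","C")


def implement(ranking):
    """Choices made by an ideal maximiser of `ranking` (best outcome first).
    Assumption: a utility function is just a strict ordering of OUTCOMES."""
    utility = {o: -i for i, o in enumerate(ranking)}  # higher is better
    return {menu: max(menu, key=utility.get) for menu in MENUS}


def matching_utilities(system):
    """Enumerate every candidate ordering and keep those whose ideal
    implementation reproduces the system's observed choices exactly."""
    return [r for r in permutations(OUTCOMES) if implement(r) == system]


# A system that always picks the earlier letter: consistent with one ordering.
consistent = {("A", "B"): "A", ("A", "C"): "A", ("B", "C"): "B"}
# A cyclic system: A over B, B over C, but C over A.
cyclic = {("A", "B"): "A", ("A", "C"): "C", ("B", "C"): "B"}

print(matching_utilities(consistent))  # [('A', 'B', 'C')]
print(matching_utilities(cyclic))      # []
```

A consistent chooser is matched by exactly one ordering, while the cyclic chooser is matched by none, because every strict ordering is transitive. Exact matching against ideal implementations fails for any object whose behaviour is not already that of a perfect maximiser.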

Comment author: Vladimir_Nesov 29 December 2009 09:04:10PM 2 points

We know, in principle, how to translate a utility function into a physical description of an object: by coding it as an AI and then specifying the AI along with its substrate down to the quantum level. So, again in principle, we can go backwards: take a physical description of an object, consider all possible implementations of all possible utility functions, and see if any of them matches the object.

I think it's enough to consider computer programs and dispense with the details of physics; everything else can be discovered by the program. You are assuming a "bottom" level of physics, the "quantum level", but there is no bottom, not really. There is only the beginning, where our own minds are implemented, and the process of discovery that defines the way we see the rest of the world.

If you start with an AI design parameterized by preference, you are not going to enumerate all programs, only the small fraction of programs that have the specific form of your AI with some preference, and so for a given arbitrary program there will be no match. Furthermore, you are not interested in finding a match: if a human were equal to the AI, you would already be done! It's necessary to explicitly go the other way, starting from arbitrary programs and understanding what a program is deeply enough to see preference in it. This understanding may give an idea of a mapping for translating a crazy ape into an efficient FAI.

Comment author: Wei_Dai 29 December 2009 09:26:23PM 1 point

If you start with an AI design parameterized by preference, you are not going to enumerate all programs, only a small fraction of programs that have the specific form of your AI with some preference, and so for a given arbitrary program there will be no match.

When I said "all possible implementations of all possible utility functions", I meant to include flawed implementations. But then two different utility functions might map onto the same physical object, so we'd also need a theory of implementation flaws that tells us, given two implementations of a utility function, which is more flawed.
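Allowing flawed implementations can be sketched in the same kind of toy setting, again assuming three outcomes, pairwise choices, and strict orderings as utility functions. The flaw measure here is a hypothetical, deliberately crude one that nobody in the thread proposed: each pairwise choice where the object departs from the ideal maximiser of an ordering counts as one flaw. The function names are illustrative only.

```python
from itertools import combinations, permutations

# Assumption: three outcomes, behaviour observed only on pairwise menus.
OUTCOMES = ("A", "B", "C")
MENUS = list(combinations(OUTCOMES, 2))


def implement(ranking):
    """Choices an ideal maximiser of `ranking` (best outcome first) would make."""
    utility = {o: -i for i, o in enumerate(ranking)}
    return {menu: max(menu, key=utility.get) for menu in MENUS}


def flaws(ranking, system):
    """Menus on which the system departs from the ideal maximiser of `ranking`.
    Hypothetical flaw measure: each such menu counts as one implementation flaw."""
    ideal = implement(ranking)
    return [menu for menu in MENUS if system[menu] != ideal[menu]]


def least_flawed(system):
    """All orderings tied for the fewest flaws, with the menus each calls a flaw."""
    scored = {r: flaws(r, system) for r in permutations(OUTCOMES)}
    fewest = min(len(f) for f in scored.values())
    return {r: f for r, f in scored.items() if len(f) == fewest}


# A cyclic system: A over B, B over C, but C over A.
cyclic = {("A", "B"): "A", ("A", "C"): "C", ("B", "C"): "B"}
for ranking, bad in least_flawed(cyclic).items():
    print(ranking, "is flawed on", bad)
```

For the cyclic chooser, three orderings (A>B>C, B>C>A and C>A>B) tie at one flaw each, and each one blames a different choice. Counting disagreements cannot break the tie; that is the gap a theory of implementation flaws would have to fill.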

Comment author: Vladimir_Nesov 29 December 2009 09:45:46PM 2 points

When I said "all possible implementations of all possible utility functions", I meant to include flawed implementations. But then two different utility functions might map onto the same physical object, so we'd also need a theory of implementation flaws that tells us, given two implementations of a utility function, which is more flawed.

This is WAY too hand-wavy an explanation for "in principle, we can go backwards" (from a system to its preference). I believe that in principle we can, but not by injecting the fuzziness of "implementation flaws".

Comment author: Mitchell_Porter 30 December 2009 12:18:56PM 1 point

Here's another statement of the problem: One agent's bias is another agent's heuristic. And the "two agents" might be physically the same, but just interpreted differently.