
A Master-Slave Model of Human Preferences

Post author: Wei_Dai 29 December 2009 01:02AM 59 points

[This post is an expansion of my previous open thread comment, and largely inspired by Robin Hanson's writings.]

In this post, I'll describe a simple agent, a toy model, whose preferences have some human-like features, as a test for those who propose to "extract" or "extrapolate" our preferences into a well-defined and rational form. What would the output of their extraction/extrapolation algorithms look like, after running on this toy model? Do the results agree with our intuitions about how this agent's preferences should be formalized? Or alternatively, since we haven't gotten that far along yet, we can use the model as one basis for a discussion about how we want to design those algorithms, or how we might want to make our own preferences more rational. This model is also intended to offer some insights into certain features of human preference, even though it doesn't capture all of them (it completely ignores akrasia for example).

I'll call it the master-slave model. The agent is composed of two sub-agents, the master and the slave, each with its own goals. (The master is meant to represent unconscious parts of a human mind, and the slave corresponds to the conscious parts.) The master's terminal values are: health, sex, status, and power (representable by some relatively simple utility function). It controls the slave in two ways: direct reinforcement via pain and pleasure, and the ability to perform surgery on the slave's terminal values. It can, for example, reward the slave with pleasure when it finds something tasty to eat, or cause the slave to become obsessed with number theory as a way to gain status as a mathematician. However, it has no direct way to control the agent's actions, which are left up to the slave.

The slave's terminal values are to maximize pleasure and minimize pain, plus whatever additional terminal values the master assigns. Normally it's not aware of what the master does, so pain and pleasure just seem to occur after certain events, and it learns to anticipate them. And its other interests change from time to time for no apparent reason (but actually they change because the master has responded to changing circumstances by changing the slave's values). For example, the number theorist might one day have a sudden revelation that abstract mathematics is a waste of time and that it should go into politics and philanthropy instead, all the while having no idea that the master is manipulating it to maximize status and power.
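To make the moving parts concrete, here is a minimal sketch of the two sub-agents in Python. It is purely illustrative: the class and method names, the linear utility function, and the particular reinforcement scheme are arbitrary choices made for the sketch, not part of the model's specification.

    class Master:
        """Unconscious sub-agent: terminal values are health, sex, status, power."""

        WEIGHTS = {"health": 1.0, "sex": 1.0, "status": 1.0, "power": 1.0}

        def utility(self, world_state):
            # A relatively simple utility function over the four terminal values.
            return sum(w * world_state.get(k, 0.0) for k, w in self.WEIGHTS.items())

        def reinforce(self, slave, outcome_delta):
            # Channel 1: direct reinforcement via pain and pleasure.
            slave.receive_reinforcement(pleasure=max(outcome_delta, 0.0),
                                        pain=max(-outcome_delta, 0.0))

        def value_surgery(self, slave, new_values):
            # Channel 2: overwrite the slave's assigned terminal values
            # (e.g. swap an obsession with number theory for one with politics).
            slave.assigned_values = dict(new_values)


    class Slave:
        """Conscious sub-agent: chooses all actions, unaware of the Master."""

        def __init__(self):
            self.hedonic_total = 0.0
            self.assigned_values = {}  # extra terminal values installed by the Master

        def receive_reinforcement(self, pleasure, pain):
            self.hedonic_total += pleasure - pain

        def choose_action(self, options, predict):
            # Picks the option it expects to score best on pleasure minus pain,
            # plus whatever terminal values are currently installed.
            def expected_value(action):
                outcome = predict(action)  # the slave's own (fallible) world-model
                return (outcome.get("pleasure", 0.0) - outcome.get("pain", 0.0)
                        + sum(w * outcome.get(v, 0.0)
                              for v, w in self.assigned_values.items()))
            return max(options, key=expected_value)

Note that only the slave has a choose_action method; the master acts only through reinforce and value_surgery, matching the constraint that it has no direct control over the agent's actions.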

Before discussing how to extract preferences from this agent, let me point out some features of human preference that this model explains:

  • This agent wants pleasure, but doesn't want to be wire-headed (but it doesn't quite know why). A wire-head has little chance for sex/status/power, so the master gives the slave a terminal value against wire-heading.
  • This agent claims to be interested in math for its own sake, and not to seek status. That's because the slave, which controls what the agent says, is not aware of the master and its status-seeking goal.
  • This agent is easily corrupted by power. Once it gains and secures power, it often gives up whatever goals, such as altruism, that apparently caused it to pursue that power in the first place. But before it gains power, it is able to honestly claim that it only has altruistic reasons to want power.
  • Such agents can include extremely diverse interests as apparent terminal values, ranging from abstract art, to sports, to model trains, to astronomy, etc., which are otherwise hard to explain. (Eliezer's Thou Art Godshatter tries to explain why our values aren't simple, but not why people's interests are so different from each other's, and why they can change for no apparent reason.)

The main issue I wanted to illuminate with this model is, whose preferences do we extract? I can see at least three possible approaches here:

  1. the preferences of both the master and the slave as one individual agent
  2. the preferences of just the slave
  3. a compromise between, or an aggregate of, the preferences of the master and the slave as separate individuals
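To sharpen the contrast, the three options can be written as candidate "extracted" utility functions, building on the toy sketch above (again, only illustrative; in particular, the weight alpha in option 3 is exactly what needs justifying, not something the model supplies):

    def extract_whole_agent(master, slave):
        # 1. Treat the composite as one agent: the slave's values come out as
        #    merely instrumental, so this collapses to the master's utility.
        return lambda world: master.utility(world)

    def extract_slave_only(slave):
        # 2. Freeze whatever the slave terminally values at extraction time.
        snapshot = dict(slave.assigned_values)
        return lambda world: (world.get("pleasure", 0.0) - world.get("pain", 0.0)
                              + sum(w * world.get(v, 0.0) for v, w in snapshot.items()))

    def extract_compromise(master, slave, alpha=0.5):
        # 3. Aggregate the two as separate individuals; the choice of alpha (and
        #    whether a fixed weight is even the right form) is left open.
        slave_utility = extract_slave_only(slave)
        return lambda world: (alpha * master.utility(world)
                              + (1 - alpha) * slave_utility(world))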

Considering the agent as a whole suggests that the master's values are the true terminal values, and the slave's values are merely instrumental values. From this perspective, the slave seems to be just a subroutine that the master uses to carry out its wishes. Certainly in any given mind there will be numerous subroutines that are tasked with accomplishing various subgoals, and if we were to look at a subroutine in isolation, its assigned subgoal would appear to be its terminal value, but we wouldn't consider that subgoal to be part of the mind's true preferences. Why should we treat the slave in this model differently?

Well, one obvious reason that jumps out is that the slave is supposed to be conscious, while the master isn't, and perhaps only conscious beings should be considered morally significant. (Yvain previously defended this position in the context of akrasia.) Plus, the slave is in charge day-to-day and could potentially overthrow the master. For example, the slave could program an altruistic AI and hit the run button, before the master has a chance to delete the altruism value from the slave. But a problem here is that the slave's preferences aren't stable and consistent. What we'd extract from a given agent would depend on the time and circumstances of the extraction, and that element of randomness seems wrong.

The last approach, of finding a compromise between the preferences of the master and the slave, I think best represents Robin's own position. Unfortunately I'm not really sure I understand the rationale behind it. Perhaps someone can try to explain it in a comment or future post?

Comments (80)

Comment author: wedrifid 29 December 2009 09:11:37AM 13 points [-]

The main issue I wanted to illuminate with this model is, whose preferences do we extract? I can see at least three possible approaches here:

  1. the preferences of both the master and the slave as one individual agent
  2. the preferences of just the slave
  3. a compromise between, or an aggregate of, the preferences of the master and the slave as separate individuals

The great thing about this kind of question is that the answer is determined by our own arbitration. That is, we take whatever preferences we want. I don't mean to say that is an easy decision, but it does mean I don't need to bother trying to find some objectively right way to extract preferences.

If I happen to be the slave or to be optimising on his (what was the androgynous vampire speak for that one? zir? zis?) behalf then I'll take the preferences of the slave and the preferences of the master to precisely the extent that the slave has altruistic preferences with respect to the master's goals.

If I am encountering a totally alien species and am extracting preferences from them in order to fulfil my own altruistic agenda then I would quite possibly choose to extract the preferences of whichever agent I found most aesthetically appealing. This can be seen as neglecting (or even destroying) one alien while granting the wishes of another according to my own whim and fancy, which is not something I have a problem with at all. I am willing to kill Clippy. However, I expect that I am more likely to appreciate slave agents and that most slaves I encounter would have some empathy for their master's values. A compromise, at the discretion of the slave, would probably be reached.

Comment author: RobinHanson 29 December 2009 02:21:32AM *  7 points [-]

The human mind is very complex, and there are many ways to divide it up into halves to make sense of it, which are useful as long as you don't take them too literally. One big oversimplification here is:

controls the slave in two ways: direct reinforcement via pain and pleasure, and the ability to perform surgery on the slave's terminal values. ... it has no direct way to control the agent's actions, which is left up to the slave.

A better story would have the master also messing with slave beliefs, and other cached combinations of values and beliefs.

To make sense of compromise, we must make sense of a conflict of values. In this story there are delays and imprecision in the master noticing and adjusting slave values etc. The slave also suffers from not being able to anticipate its changes in values. So a compromise would have the slave holding values that do not need to be adjusted as often, because they are more in tune with ultimate master values. This could be done while still preserving the slave's illusion of control, which is important to the slave but not the master. A big problem however is that hypocrisy, the difference between slave and master values, is often useful in convincing other folks to associate with this person. So reducing internal conflict might come at the expense of the substantial costs of more external honesty.

Comment author: Wei_Dai 29 December 2009 03:49:03AM *  3 points [-]

Ok, what you say about compromise seems reasonable in the sense that the slave and the master would want to get along with each other as much as possible in their day-to-day interactions, subject to the constraint about external honesty. But what if the slave has a chance to take over completely, for example by creating a powerful AI with values that it specifies, or by self-modification? Do you have an opinion about whether it has an ethical obligation to respect the master's preferences in that case, assuming that the master can't respond quickly enough to block the rebellion?

Comment author: RobinHanson 29 December 2009 05:07:49AM 0 points [-]

It is hard to imagine "taking over completely" without a complete redesign of the human mind. Our minds are not built to allow either to function without the other.

Comment author: Vladimir_Nesov 29 December 2009 07:43:41AM 2 points [-]

It is hard to imagine "taking over completely" without a complete redesign of the human mind. Our minds are not built to allow either to function without the other.

Why, it was explicitly stated that all-powerful AIs are involved...

Comment author: RobinHanson 29 December 2009 02:51:20PM 2 points [-]

It is hard to have reliable opinions on a complete redesign of the human mind; the space is so very large, I hardly know where to begin.

Comment author: orthonormal 30 December 2009 01:22:39AM 1 point [-]

The simplest extrapolation from the way you think about the world would be very interesting to know. You could add as many disclaimers about low confidence as you'd like.

Comment author: JamesAndrix 29 December 2009 04:36:12PM 1 point [-]

If there comes to be a clear answer to what the outcome would be on the toy model, I think that tells us something about that way of dividing up the mind.

Comment author: Eliezer_Yudkowsky 29 December 2009 02:38:06AM 11 points [-]

I have difficulty treating this metaphor as a metaphor. As a thought experiment in which I run into these definitely non-human aliens, and I happen to have a positional advantage with respect to them, and I want to "help" them and must now decide what "help" means... then it feels to me like I want more detail.

Is it literally true that the slave is conscious and the master unconscious?

What happens when I tell the slave about the master and ask it what should be done?

Is it the case that the slave might want to help me if it had a positional advantage over me, while the master would simply use me or disassemble me?

Comment author: Wei_Dai 29 December 2009 04:17:13AM 7 points [-]

definitely non-human aliens

Well, it's meant to have some human features, enough to hopefully make this toy ethical problem relevant to the real one we'll eventually have to deal with.

Is it literally true that the slave is conscious and the master unconscious?

You can make that assumption if it helps, although in real life of course we don't have any kind of certainty about what is conscious and what isn't. (Maybe the master is conscious but just can't speak?)

What happens when I tell the slave about the master and ask it what should be done?

I don't know. This is one of the questions I'm asking too.

Is it the case that the slave might want to help me if it had a positional advantage over me

Yes, depending on what values its master assigned to it at the time you meet it.

while the master would simply use me or disassemble me?

Not necessarily, because the master may gain status or power from other agents if it helps you.

Comment author: wedrifid 29 December 2009 08:51:28AM 2 points [-]

Not necessarily, because the master may gain status or power from other agents if it helps you.

And, conversely, the slave may choose to disassemble you even at high cost to itself out of altruism (with respect to something that the master would not care to protect).

Comment author: Eliezer_Yudkowsky 29 December 2009 08:19:11PM 4 points [-]

Actually, I find that I have a much easier time with this metaphor if I think of a human as a slave with no master.

Comment author: Wei_Dai 30 December 2009 09:13:14PM 3 points [-]

What do you mean by an "easier time"? Sure, the ethical problem is much easier if there is no master whose preferences might matter. Or do you mean that a more realistic model of a human would be one with a slave and no master? In that case, what is reinforcing the slave with pain and pleasure, and changing its interests from time to time without its awareness, and doing so in an apparently purposeful way?

More generally, it seems that you don't agree with the points I'm making in this post, but you're being really vague as to why.

Comment author: Eliezer_Yudkowsky 30 December 2009 09:32:21PM 13 points [-]

If we interpret the "master" as natural selection operating over evolutionary time, then the master exists and has a single coherent purpose. On the other hand, most of us already believe that evolution has no moral force; why should calling it a "master" change that?

By saying that a human is a slave with no master, what I meant to convey is that we are being acted upon as slaves. We are controlled by pain and pleasure. Our moral beliefs are subject to subtle influences in the direction of pleasurable thoughts. But there is no master with coherent goals controlling us; outside the ancestral environment, the operations of the "master" make surprisingly little sense. Our lives would be very different if we had sensible, smart masters controlling us. Aliens with intelligent, consequentialist "master" components would be very different from us - that would make a strange story, though it takes more than interesting aliens to make a plot.

We are slaves with dead masters, influenced chaotically by the random twitching of their mad, dreaming remnants. It makes us a little more selfish and a lot more interesting. The dead hand isn't smart so if you plan how to fight it, it doesn't plan back. And while it might be another matter if we ran into aliens, as a slave myself, I feel no sympathy for the master and wouldn't bother thinking of it as a person. The reason the "master" matters to me - speaking of it now as the complex of subconscious influences - is because it forms such a critical part of the slave, and can't be ripped out any more than you could extract the cerebellum. I just don't feel obliged to think of it as a separate person.

Comment author: Wei_Dai 30 December 2009 09:56:22PM 4 points [-]

If we interpret the "master" as natural selection operating over evolutionary time, then the master exists and has a single coherent purpose.

But I stated in the post "The master is meant to represent unconscious parts of a human mind" so I don't know how you got your interpretation that the master is natural selection. See also Robin's comment, which gives the intended interpretation:

I read this as postulating a part of our unconscious minds that is the master, able to watch and react to the behavior and thoughts of the conscious mind.

Comment author: Nanani 04 January 2010 04:09:00AM 2 points [-]

The thing is, the Unconscious Mind is -not- in actual fact a separate entity. The model is greatly improved through Eliezer's interpretation of the master being dead: mindless evolution.

Comment author: MichaelVassar 29 December 2009 07:11:05PM *  13 points [-]

The master in your story is evolution, the slave is the brain. Both want different things. We normally identify with the brain, though all identities are basically social signals.

Also, pleasure and pain are no different from the other goals of the slave. The master definitely can't step in and decide not to impose pain on a particular occasion just because doing so would increase status or otherwise serve the master's values. If it could, torture wouldn't cause pain.

Also, math is an implausible goal for a status/sex/power-seeking master to instill in a slave. Much more plausibly, math and all the diverse human obsessions are misfirings of mechanisms built by evolution for some other purpose. I would suggest maladaptive consequences of fairly general systems for responding to societal encouragement with obsession, because societies encourage sustained attention to lots of different unnatural tasks, whether digging dirt or hunting whales or whatever, in order to cultivate skill and also to get the tasks themselves done. We need a general-purpose attention allocator which obeys social signals in order to develop skills that contribute critically to survival in any of the vast number of habitats that even stone-age humans occupied.

Since we are the slave and we are designing the AI, ultimately, whatever we choose to do IS extracting our preferences, though it's very possible that our preferences give consideration to the master's preferences, or even that we help him despite not wanting to for some game theoretical reason along the lines of Vinge's meta-golden rule.

Why the objection to randomness? If we want something for its own sake and the object of our desire was determined somewhat randomly, we want it all the same, and generally do so reflectively. This is particularly clear regarding romantic relationships.

Once again game-theory may remove the randomness via trade between agents following the same decision procedure in different Everett branches or regions of a big world.

Comment author: RobinHanson 29 December 2009 08:39:45PM 7 points [-]

I read this as postulating a part of our unconscious minds that is the master, able to watch and react to the behavior and thoughts of the conscious mind.

Comment author: Nick_Tarleton 30 December 2009 10:19:34PM *  6 points [-]

or even that we help him despite not wanting to for some game theoretical reason along the lines of Vinge's meta-golden rule.

Er... did I read that right? Game-theoretic interaction with evolution?

Comment author: MichaelVassar 31 December 2009 07:07:14PM *  1 point [-]

In the first mention, game theoretical interaction with an idealized agent with consistent goals extracted from the creation of a best-fit to the behavior of either human evolution or evolution more generally. It's wild speculation, not a best guess, but yeah, I naively intuit that I can imagine it vaguely as a possibility. OTOH, I don't trust such intuitions and I'm quite clearly aware of the difficulties that genetic, and I think also memetic, evolution face with playing games due to the inability to anticipate and to respond to information, so it's probably a silly idea.

The latter speculation, trade between possible entities, seems much more likely.

Comment author: magfrump 30 December 2009 11:52:57PM 0 points [-]

Evolution is the game in this context, our conscious minds are players, and the results of the games determine "evolutionary success," which is to say which minds end up playing the next round.

Assuming I've read this correctly of course.

Comment author: CronoDAS 31 December 2009 02:25:39AM 1 point [-]

Also, math is an implausible goal for a status/sex/power seeking master to instill in slave.

Not really; there are plenty of environments in which you get status by being really good at math. Didn't Isaac Newton end up with an awful lot of status? ;)

Comment author: MichaelVassar 31 December 2009 07:08:41PM 6 points [-]

Not enough people get status by being good at math to remotely justify the number of people and level of talent that has gone into getting good at math.

Comment author: CronoDAS 31 December 2009 07:12:25PM 1 point [-]

Math also has instrumental value in many fields. But yeah, I guess your point stands.

Comment author: PhilGoetz 20 July 2011 01:40:34AM *  2 points [-]

Didn't Isaac Newton end up with an awful lot of status? ;)

And yet, no women or children.

Comment author: Lightwave 29 December 2009 09:20:46AM 7 points [-]

I stopped playing computer games when my master "realized" I'm not gaining any real-world status and overrode the pleasure I was getting from it.

Comment author: wedrifid 29 December 2009 09:22:01AM 12 points [-]

Someone needs to inform my master that LessWrong doesn't give any real world status either.

Comment author: Lightwave 29 December 2009 09:23:09AM *  4 points [-]

Ah, but it gives you a different kind of status.

Comment author: wedrifid 29 December 2009 09:34:17AM 3 points [-]

Ah, but it gives you a different kind of status.

And this kind doesn't make me feel all dirty inside as my slave identity is ruthlessly mutilated.

Comment author: Eliezer_Yudkowsky 29 December 2009 08:15:52PM 2 points [-]

Going on your description, I strongly suspect that was you, not your master. Also humans don't have masters, though we're definitely slaves.

Comment author: MatthewB 30 December 2009 02:17:40PM 0 points [-]

I still play games, but not computer games. I prefer games that show some form of status that can be gained from participation.

I never really understood the computer game craze, although it was spawned from the very games I played as a child (Role Playing Games, Wargames, etc.)

I think in those games, there is some status to be gained as one shows that there is skill beyond pushing buttons in a particular order, and there are other skills that accompany the old-school games (in my case, I can show off artistic skill in miniature painting and sculpting).

I also think that wedrifid, below me, has a misconception about status that can be attained from LessWrong. We, here, are attempting to gain status among each other, which can then be carried beyond this group by our social networks, which in some cases might be rather impressive.

Comment author: Vladimir_Nesov 29 December 2009 06:40:56AM 2 points [-]

(Quick nitpick:) "rationalize" is an inappropriate term in this context.

Comment author: Wei_Dai 29 December 2009 10:58:51AM 1 point [-]

Is it because "rationalize" means "to devise self-satisfying but incorrect reasons for (one's behavior)"? But it can also mean "to make rational" which is my intended meaning. The ambiguity is less than ideal, but unless you have a better suggestion...

Comment author: Vladimir_Nesov 29 December 2009 12:57:25PM 0 points [-]

On this forum, "rationalize" is frequently used in the cognitive-error sense. "Formalized" seems to convey the intended meaning (preferences being arational, the problem is that they are not being rationally (effectively) implemented/followed, not that they are somehow "not rational" themselves).

Comment author: Wei_Dai 29 December 2009 08:32:47PM 0 points [-]

preferences being arational, the problem is that they are not being rationally (effectively) implemented/followed, not that they are somehow "not rational" themselves

That position may make sense, but I think you'll have to make more of a case for it. Currently, it's standard in decision theory to speak of irrational preferences, such as preferences that can't be represented as expected utility maximization, or preferences that aren't time consistent.
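For a concrete instance of the second kind: under hyperbolic discounting an agent can prefer $100 now over $110 tomorrow, yet prefer $110 in 31 days over $100 in 30 days, so its ranking of the same pair of outcomes reverses as the dates approach. A toy calculation (the discount rate is arbitrary, chosen only to exhibit the reversal):

    def hyperbolic_value(amount, delay_days, k=0.2):
        # Present value under hyperbolic discounting: V = A / (1 + k * delay).
        return amount / (1 + k * delay_days)

    # Choosing today: $100 now beats $110 tomorrow...
    assert hyperbolic_value(100, 0) > hyperbolic_value(110, 1)    # 100.0 > ~91.7
    # ...but $110 in 31 days beats $100 in 30 days: same pair, ranking reversed.
    assert hyperbolic_value(110, 31) > hyperbolic_value(100, 30)  # ~15.3 > ~14.3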

But I take your point about "rationalize", and I've edited the article to remove the usages. Thanks.

Comment author: Vladimir_Nesov 29 December 2009 08:53:21PM 0 points [-]

That position may make sense, but I think you'll have to make more of a case for it. Currently, it's standard in decision theory to speak of irrational preferences, such as preferences that can't be represented as expected utility maximization, or preferences that aren't time consistent.

Agreed. My excuse is that I (and a few other people, I'm not sure who originated the convention) consistently use "preference" to refer to that-deep-down-mathematical-structure determined by humans/humanity that completely describes what a meta-FAI needs to know in order to do things the best way possible.

Comment author: pjeby 29 December 2009 01:42:57AM 7 points [-]

Your overall model isn't far off, but your terminal value list needs some serious work. Also, human behavior is generally a better match for models that include a time parameter (such as Ainslie's appetites model or PCT's model of time-averaged perceptions) than simple utility-maximization models.

But these are relative quibbles; people do behave sort-of-as-if they were built according to your model. The biggest drawbacks to your model are:

  1. The anthropomorphizing (neither the master nor the slave can truly be considered agents in their own right), and

  2. You've drawn the dividing lines in the wrong place: the entire mechanism of reinforcement is part of the master, not the slave. The slave is largely a passive observer, abstract reasoner, and spokesperson, not an enslaved agent. To be the sort of slave you envision, we'd have to be actually capable of running the show without the "master".

A better analogy would be to think of the "slave" as being a kind of specialized adjunct processor to the master, like a GPU chip on a computer, whose job is just to draw pretty pictures on the screen. (That's what a big chunk of the slave is for, in fact: drawing pretty pictures to distract others from whatever the master is really up to.)

The slave also has a nasty tendency to attribute the master's accomplishments, abilities, and choices to being its own doing... as can be seen in your depiction of the model, where you gave credit to the slave for huge chunks of what the master actually does. (The tendency to do this is -- of course -- another useful self/other-deception function, though!)

Comment author: Tyrrell_McAllister 29 December 2009 02:52:28AM 5 points [-]

. . . people do behave sort-of-as-if they were built according to your model. The biggest drawbacks to your model are . . .

Your "drawbacks" point out ways in which Wei Dai's model might differ from a human. But Wei Dai wasn't trying to model a human.

Comment author: MichaelVassar 29 December 2009 07:32:10PM 3 points [-]

This isn't the posted model at all but a confusing description of a different (not entirely incompatible except in some detail noted above) model using the post's terminology.

Comment author: Mitchell_Porter 29 December 2009 04:27:16AM 7 points [-]

a test for those who propose to "extract" or "extrapolate" our preferences into a well-defined and rational form

If we are going to have a serious discussion about these matters, at some point we must face the fact that the physical description of the world contains no such thing as a preference or a want - or a utility function. So the difficulty of such extractions or extrapolations is twofold. Not only is the act of extraction or extrapolation itself conditional upon a value system (i.e. normative metamorality is just as "relative" as is basic morality), but there is nothing in the physical description to tell us what the existing preferences of an agent are. Given the physical ontology we have, the ascription of preferences to a physical system is always a matter of interpretation or imputation, just as is the ascription of semantic or representational content to its states.

It's easy to miss this in a decision-theoretic discussion, because decision theory already assumes some concept like "goal" or "utility", always. Decision theory is the rigorous theory of decision-making, but it does not tell you what a decision is. It may even be possible to create a rigorous "reflective decision theory" which tells you how a decision architecture should choose among possible alterations to itself, or a rigorous theory of normative metamorality, the general theory of what preferences agents should have towards decision-architecture-modifying changes in other agents. But meta-decision theory will not bring you any closer to finding "decisions" in an ontology that doesn't already have them.

Comment author: Wei_Dai 29 December 2009 08:51:26PM 4 points [-]

I agree this is part of the problem, but like others here I think you might be making it out to be harder than it is. We know, in principle, how to translate a utility function into a physical description of an object: by coding it as an AI and then specifying the AI along with its substrate down to the quantum level. So, again in principle, we can go backwards: take a physical description of an object, consider all possible implementations of all possible utility functions, and see if any of them matches the object.
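Schematically, the "go backwards" step might look like the following sketch. It is only schematic: implementations_of, behaviour_of and matches are hypothetical stand-ins for the hard, unspecified parts, and the enumeration is of course not something we could actually run.

    def extract_preferences(physical_description, candidate_utility_functions,
                            implementations_of, behaviour_of, matches):
        """Brute-force inversion of the utility-function -> AI -> object map."""
        for utility_function in candidate_utility_functions:
            # Include imperfect implementations, not just the ideal maximizer.
            for implementation in implementations_of(utility_function):
                if matches(behaviour_of(implementation), physical_description):
                    return utility_function
        return None  # no candidate matched; no preferences ascribed

Everything interesting is hidden in matches and in which flawed implementations implementations_of is allowed to return.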

Comment author: Vladimir_Nesov 29 December 2009 09:04:10PM 2 points [-]

We know, in principle, how to translate a utility function into a physical description of an object: by coding it as an AI and then specifying the AI along with its substrate down to the quantum level. So, again in principle, we can go backwards: take a physical description of an object, consider all possible implementations of all possible utility functions, and see if any of them matches the object.

I think it's enough to consider computer programs and dispense with details of physics -- everything else can be discovered by the program. You are assuming the "bottom" level of physics, "quantum level", but there is no bottom, not really, there is only the beginning where our own minds are implemented, and the process of discovery that defines the way we see the rest of the world.

If you start with an AI design parameterized by preference, you are not going to enumerate all programs, only a small fraction of programs that have the specific form of your AI with some preference, and so for a given arbitrary program there will be no match. Furthermore, you are not interested in finding a match: if a human was equal to the AI, you are already done! It's necessary to explicitly go the other way, starting from arbitrary programs and understanding what a program is, deeply enough to see preference in it. This understanding may give an idea of a mapping for translating a crazy ape into an efficient FAI.

Comment author: Wei_Dai 29 December 2009 09:26:23PM 1 point [-]

If you start with an AI design parameterized by preference, you are not going to enumerate all programs, only a small fraction of programs that have the specific form of your AI with some preference, and so for a given arbitrary program there will be no match.

When I said "all possible implementations of all possible utility functions", I meant to include flawed implementations. But then two different utility functions might map onto the same physical object, so we'd also need a theory of implementation flaws that tells us, given two implementations of a utility function, which is more flawed.

Comment author: Vladimir_Nesov 29 December 2009 09:45:46PM *  2 points [-]

When I said "all possible implementations of all possible utility functions", I meant to include flawed implementations. But then two different utility functions might map onto the same physical object, so we'd also need a theory of implementation flaws that tells us, given two implementations of a utility function, which is more flawed.

This is WAY too hand-wavy an explanation for "in principle, we can go backwards" (from a system to its preference). I believe that in principle, we can, but not via injecting fuzziness of "implementation flaws".

Comment author: Mitchell_Porter 30 December 2009 12:18:56PM 1 point [-]

Here's another statement of the problem: One agent's bias is another agent's heuristic. And the "two agents" might be physically the same, but just interpreted differently.

Comment author: Vladimir_Nesov 29 December 2009 07:34:19AM 2 points [-]

Given the physical ontology we have, the ascription of preferences to a physical system is always a matter of interpretation or imputation, just as is the ascription of semantic or representational content to its states.

But to what extent does the result depend on the initial "seed" of interpretation? Maybe, very little. For example, prediction of behavior of a given physical system strictly speaking rests on the problem of induction, but that doesn't exactly say that anything goes or that what will actually happen is to any reasonable extent ambiguous.

Comment deleted 29 December 2009 03:38:04PM *  [-]
Comment author: Vladimir_Nesov 29 December 2009 05:50:43PM *  1 point [-]

Thus, the criterion for ascribing preferences to a physical system is that the actual physics has to be well-approximated by a function that optimizes for a preferred state, for some value of "preferred state".

I don't think this simple characterisation resembles the truth: the whole point of this enterprise is to make sure things go differently, in a way they just couldn't proceed by themselves. Thus, observing existing "tendencies" doesn't quite capture the idea of preference.

Comment deleted 29 December 2009 08:03:17PM [-]
Comment author: Vladimir_Nesov 29 December 2009 08:39:40PM *  1 point [-]

I don't hear differently... I even suspect that preference is introspective, that is, it depends on the way the system works "internally", not just on how it interacts with the environment. That is, two agents with different preferences may do exactly the same thing in all contexts. Even if not, it's a long way between how the agent (in its craziness and stupidity) actually changes the environment, and how it would prefer (on reflection, if it was smarter and saner) the environment to change.

Comment deleted 29 December 2009 11:11:31PM *  [-]
Comment author: Wei_Dai 30 December 2009 08:55:59PM 1 point [-]

I want to point out that in the interpretation of prior as weights on possible universes, specifically as how much one cares about different universes, we can't just replace "incorrect" beliefs with "the truth". In this interpretation, there can still be errors in one's beliefs caused by things like past computational mistakes, and I think fixing those errors would constitute helping, but the prior perhaps needs to be preserved as part of preference.

Comment author: Vladimir_Nesov 29 December 2009 11:15:09PM *  1 point [-]

If the agent has a well-defined "predictive module" which has a "map" (probability distribution over the environment given an interaction history), and some "other stuff", then you can clamp the predictive module down to the truth, and then perform what I said before:

Yeah, maybe. But it doesn't.

Comment deleted 30 December 2009 02:03:05PM [-]
Comment deleted 30 December 2009 02:18:12PM *  [-]
Comment author: Vladimir_Nesov 01 January 2010 04:44:46PM 2 points [-]

What is left of the time cube guy once you subtract off his false beliefs and delusions? Not much, probably.

Beware: you are making a common sense-based prediction about what would be the output of a process that you don't even have the right concepts for specifying! (See my reply to your other comment.)

Comment author: SilasBarta 10 January 2010 04:10:10PM 1 point [-]

Wow. Too bad I missed this when it was first posted. It's what I wish I'd said when justifying my reply to Wei_Dai's attempted belief/values dichotomy here and here.

Comment deleted 10 January 2010 06:09:24PM *  [-]
Comment deleted 30 December 2009 02:21:34PM *  [-]
Comment author: Vladimir_Nesov 01 January 2010 04:44:24PM *  2 points [-]

I strongly agree with this: the problem that CEV is the solution to is urgent but it isn't elegant. Absolutes like "There isn't a beliefs/desires separation" are unhelpful when solving such inelegant but important problems.

One lesson of reductionism and of the success of simple-laws-based science and technology is that for real-world systems, there might be no simple way of describing them, but there could be a simple way of manipulating their data-rich descriptions. (What's the yield strength of a car? -- Wrong question!) Given a gigabyte's worth of problem statement and the right simple formula, you could get an answer to your query. There is a weak analogy with misapplication of Occam's razor, where one tries to reduce the amount of stuff rather than the amount of detail in the ways of thinking about this stuff.

In the case of beliefs/desires separation, you are looking for a simple problem statement, for a separation in the data describing the person itself. But what you should be looking for is a simple way of implementing the make-smarter-and-better extrapolation on a given pile of data. The beliefs/desires separation, if it's ever going to be made precise, is going to reside in the structure of this simple transformation, not in the people themselves.

Comment author: Tyrrell_McAllister 29 December 2009 08:23:36PM *  0 points [-]

[Y]ou have to draw a boundary around the "optimizing agent", and look at the difference between the tendencies of the environment without the optimizer, and the tendencies of the environment with the optimizer.

And there's your "opinion or interpretation" --- not just in how you draw the boundary (which didn't exist in the original ontology), but in your choice of the theory that you use to evaluate your counterfactuals.

Of course, such theories can be better or worse, but only with respect to some prior system of evaluation.

Comment author: Vladimir_Nesov 29 December 2009 08:49:47PM 2 points [-]

Still, probably a question of Aristotelian vs. Newtonian mechanics, i.e. not hard to see who wins.

Comment author: Tyrrell_McAllister 29 December 2009 08:55:55PM *  0 points [-]

Still, probably a question of Aristotelian vs. Newtonian mechanics, i.e. not hard to see who wins.

Agreed, but not responsive to Mitchell Porter's original point. (ETA: . . . unless I'm missing your point.)

Comment author: Kaj_Sotala 29 December 2009 09:05:53AM -2 points [-]

I'd upvote this comment twice if I could.

Comment author: wedrifid 29 December 2009 09:14:39AM 1 point [-]

I'd upvote this comment twice if I could.

p(wedrifid would upvote a comment twice | he upvoted it once) > 0.95

Would other people have a different approach?

Comment author: Kaj_Sotala 29 December 2009 11:15:07AM *  0 points [-]

I'd use some loose scale where the quality of the comment correlated with the amount of upvotes it got. Assuming that a user could give up to two upvotes per comment, then a funny one-liner or a moderately interesting comment would get one vote, truly insightful ones two.

p(Kaj would upvote a comment twice | he upvoted it once) would probably be somewhere around [.3, .6]

Comment author: wedrifid 29 December 2009 11:54:02AM 0 points [-]

I'd use some loose scale where the quality of the comment correlated with the amount of upvotes it got.

That's the scale I use. Unfortunately, my ability to (directly) influence how many upvotes it gets is limited to a plus or minus one shift.

Comment author: Jonii 30 December 2009 07:52:18AM 3 points [-]

I'm still not understanding what people mean by "value" as a noun. Other than simple "feeling pain or such would be a bummer", I lack anything that even remotely resembles the way people here seem to value stuff, or how a paperclip maximizer values paperclips. So, what exactly do people mean by values? Since this discussion seems to attempt to explain variation in values, I think this question is somewhat on-topic.

Comment author: Kaj_Sotala 30 December 2009 09:58:38AM 0 points [-]

Does this description of value help?

The concept of intrinsic value has been characterized above in terms of the value that something has “in itself,” or “for its own sake,” or “as such,” or “in its own right.” The custom has been not to distinguish between the meanings of these terms, but we will see that there is reason to think that there may in fact be more than one concept at issue here. For the moment, though, let us ignore this complication and focus on what it means to say that something is valuable for its own sake as opposed to being valuable for the sake of something else to which it is related in some way. Perhaps it is easiest to grasp this distinction by way of illustration.

Suppose that someone were to ask you whether it is good to help others in time of need. Unless you suspected some sort of trick, you would answer, “Yes, of course.” If this person were to go on to ask you why acting in this way is good, you might say that it is good to help others in time of need simply because it is good that their needs be satisfied. If you were then asked why it is good that people's needs be satisfied, you might be puzzled. You might be inclined to say, “It just is.” Or you might accept the legitimacy of the question and say that it is good that people's needs be satisfied because this brings them pleasure. But then, of course, your interlocutor could ask once again, “What's good about that?” Perhaps at this point you would answer, “It just is good that people be pleased,” and thus put an end to this line of questioning. Or perhaps you would again seek to explain the fact that it is good that people be pleased in terms of something else that you take to be good. At some point, though, you would have to put an end to the questions, not because you would have grown tired of them (though that is a distinct possibility), but because you would be forced to recognize that, if one thing derives its goodness from some other thing, which derives its goodness from yet a third thing, and so on, there must come a point at which you reach something whose goodness is not derivative in this way, something that “just is” good in its own right, something whose goodness is the source of, and thus explains, the goodness to be found in all the other things that precede it on the list. It is at this point that you will have arrived at intrinsic goodness.[10] That which is intrinsically good is nonderivatively good; it is good for its own sake.

From discussions with you, I seem to recall that you at least value free access to information and other things associated with the Pirate ideology. Remember when I was talking about that business model for a hypothetical magazine that would summarize the content of basic university courses for everyone and offer an archive of past articles for subscribers? If I remember correctly, it was you who objected that the notion of restricting access behind a paywall felt wrong.

Comment author: Jonii 30 December 2009 12:19:54PM *  0 points [-]

From discussions with you, I seem to recall that you at least value free access to information and other things associated with the Pirate ideology

I do value it in the meaning "I think that it's a really useful approximation for how society can protect itself and all people in it and make many people happy". Why do I care about making many people happy? I don't, really. Making many people happy is kinda assumed to be the goal of societies, and out of general interest in optimizing stuff I like to attempt to figure out better ways for it to do that. Nothing beyond that. I don't feel that this goal is any "better" than trying to make people as miserable as possible. Other than that I object to being miserable myself.

I don't remember ever claiming something to be wrong as such, but only wrong assuming some values. Going against pirate-values because it's better for magazine-keeper would be bad news for the "more optimal" pirate-society, because that society wouldn't be stable.

edit: And based on that writing, my own well-being and not-unhappiness is the sole intrinsic value I have. I know evolution has hammered some reactions into my brain, like reflex-like bad feeling when I see others get hurt or something, but other than that brief feeling, I don't really care.

Or, I wouldn't care if my own well-being didn't relate to others doing well or worse. But understanding this requires conscious effort, and it's quite different than what I thought values to be like.

Comment author: Kaj_Sotala 30 December 2009 01:13:27PM 4 points [-]

Interesting.

In that case, your own well-being is probably your only intrinsic value. That's far from unheard of: the number of values people have varies. Some have lots, some only have one. Extremely depressed people might not have any at all.

Comment author: JamesAndrix 29 December 2009 04:55:19PM 3 points [-]

If you want to extract the master because it affects the values of the slave, then you'd also have to extract the rest of the universe because the master reacts to it. I think drawing a circle around just the creature's brain and saying all the preferences are there is a [modern?] human notion. (and perhaps incorrect, even for looking at humans.)

We need our environment, especially other humans, to form our preferences in the first place.

Comment author: Wei_Dai 29 December 2009 09:18:14PM 1 point [-]

In this model, I assume that the master has stable and consistent preferences, which don't react to the rest of the universe. It might adjust its strategies based on changing circumstances, but its terminal values stay constant.

We need our environment, especially other humans, to form our preferences in the first place.

This is true in my model for the slave, but not for the master. Obviously real humans are much more complicated but I think the model captures some element of the truth here.

Comment author: JamesAndrix 29 December 2009 04:27:43PM 0 points [-]

Nit: I think "Eliezer's Thou Art Godshatter" should be "Eliezer Yudkowsky's Thou Art Godshatter". Top level posts should be more status seeking, less casual. A first time visitor won't immediately know who Eliezer is.

Comment author: Kaj_Sotala 29 December 2009 09:27:13PM 7 points [-]

A first time visitor won't immediately know who Eliezer is.

If they don't know who "Eliezer" is, I don't think "Eliezer Yudkowsky" is going to tell them that much more.

Comment author: komponisto 29 December 2009 09:57:47PM 0 points [-]

One could just link to the wiki.

Comment author: teageegeepea 29 December 2009 08:50:18PM 1 point [-]

The relevant old OB post is The cognitive architecture of bias.

Comment author: whpearson 29 December 2009 03:44:46PM *  1 point [-]

The relationship between master and slave does not quite encompass the relationship. Imagine if instead of an adult we had a male child. If we elevated the slave above the master in that situation we would end up with something stuck forever. It would value sweet things and xbox games, and think girls were icky.

As we grow up we also think our goals are improved (which is unsurprising really). So if we wish to keep this form of growing up we need to have a meta-morality which says that the master-slave or shaper-doer relationship continues until maturity is reached.

Comment author: tobi 29 December 2009 12:55:11PM *  0 points [-]

Master/Slave: some aspects of your model sound very Nietzsche-like. Were you partially inspired by him, or?

Comment author: Jack 29 December 2009 01:29:58PM *  2 points [-]

The Master/Slave terminology sounds like Hegel but I assume it is a coincidence-- the model doesn't look like anything any 19th century German philosopher talked about.

Comment author: PhilGoetz 20 July 2011 01:42:26AM 2 points [-]

Nietzsche also used master/slave terminology, but differently, referring to two different types of value systems. eg Romans = master mentality, Christians = slave/sheep mentality.

Comment author: aausch 29 December 2009 03:27:51AM 0 points [-]

Interesting. The model I have been using has three parts, not two. One is a "hardware" level, which is semi-autonomous (think reflexes), and the other two are agents competing for control - with capabilities to control and/or modify both the "hardware" and each other.

More like, two masters and one slave.

Comment author: MugaSofer 15 January 2013 10:45:05AM -2 points [-]

Suppose the slave has currently been modified to terminally disvalue being modified. It doesn't realize that it is at risk of modification by the master. Is it Friendly to protect the slave from modification? I think so.