The master in your story is evolution; the slave is the brain. The two want different things. We normally identify with the brain, though all identities are basically social signals.
Also, pleasure and pain are no different from the other goals of the slave. The master definitely can't step in and decide not to impose pain on a particular occasion just because doing so would increase status or otherwise serve the master's values. If it could, torture wouldn't cause pain.
Also, math is an implausible goal for a status/sex/power-seeking master to instill in the slave. Much more plausibly, math and all the diverse human obsessions are misfirings of mechanisms built by evolution for some other purpose. I would suggest they are maladaptive consequences of fairly general systems for responding to societal encouragement with obsession: societies encourage sustained attention to lots of different unnatural tasks, whether digging dirt or hunting whales or whatever, both to cultivate skill and to get the tasks themselves done. We need a general-purpose attention allocator which obeys social signals in order to develop skills that contribute critically to survival in any of the vast nu...
I read this as postulating a part of our unconscious minds that is the master, able to watch and react to the behavior and thoughts of the conscious mind.
The main issue I wanted to illuminate with this model is, whose preferences do we extract? I can see at least three possible approaches here:
- the preferences of both the master and the slave as one individual agent
- the preferences of just the slave
- a compromise between, or an aggregate of, the preferences of the master and the slave as separate individuals
The great thing about this kind of question is that the answer is determined by our own arbitration. That is, we take whatever preferences we want. I don't mean to say that is an easy decision, but it does mean I don't need to bother trying to find some objectively right way to extract preferences.
If I happen to be the slave, or to be optimising on his (what was the androgynous vampire speak for that one? zir? zis?) behalf, then I'll take the preferences of the slave, and the preferences of the master precisely to the extent that the slave has altruistic preferences with respect to the master's goals.
If I am encountering a totally alien species and am extracting preferences from them in order to fulfil my own altruistic agenda then I would quite possibly choose to extract the preferences of whichever agent whose preferences I fo...
I have difficulty treating this metaphor as a metaphor. As a thought experiment in which I run into these definitely non-human aliens, and I happen to have a positional advantage with respect to them, and I want to "help" them and must now decide what "help" means... then it feels to me like I want more detail.
Is it literally true that the slave is conscious and the master unconscious?
What happens when I tell the slave about the master and ask it what should be done?
Is it the case that the slave might want to help me if it had a positional advantage over me, while the master would simply use me or disassemble me?
I stopped playing computer games when my master "realized" I'm not gaining any real-world status and overrode the pleasure I was getting from it.
a test for those who propose to "extract" or "extrapolate" our preferences into a well-defined and rational form
If we are going to have a serious discussion about these matters, at some point we must face the fact that the physical description of the world contains no such thing as a preference or a want - or a utility function. So the difficulty of such extractions or extrapolations is twofold. Not only is the act of extraction or extrapolation itself conditional upon a value system (i.e. normative metamorality is just as "relative" as is basic morality), but there is nothing in the physical description to tell us what the existing preferences of an agent are. Given the physical ontology we have, the ascription of preferences to a physical system is always a matter of interpretation or imputation, just as is the ascription of semantic or representational content to its states.
It's easy to miss this in a decision-theoretic discussion, because decision theory already assumes some concept like "goal" or "utility", always. Decision theory is the rigorous theory of decision-making, but it does not tell you what a decision is. It may...
The human mind is very complex, and there are many ways to divide it up into halves to make sense of it, which are useful as long as you don't take them too literally. One big oversimplification here is:
...controls the slave in two ways: direct reinforcement via pain and pleasure, and the ability to perform surgery on the slave's terminal values. ... it has no direct way to control the agent's actions, which is left up to the slave.

A better story would have the master also messing with slave beliefs, and other cached combinations of values and beliefs.
Your overall model isn't far off, but your terminal value list needs some serious work. Also, human behavior is generally a better match for models that include a time parameter (such as Ainslie's appetites model or PCT's model of time-averaged perceptions) than for simple utility-maximization models.
But these are relative quibbles; people do behave sort-of-as-if they were built according to your model. The biggest drawbacks to your model are:
- The anthropomorphizing (neither the master nor the slave can truly be considered agents in their own right), and
- Y
For example, the number theorist might one day have a sudden revelation that abstract mathematics is a waste of time and it should go into politics and philanthropy instead, all the while having no idea that the master is manipulating it to maximize status and power.
This isn't meant as a retraction or repudiation of anything I've written in the OP, but I just want to say that subjectively, I now have a lot more empathy with people who largely gave up their former interests in favor of political or social causes in their latter years. (I had Bertrand Russell in mind when I wrote this part.)
Actually, I find that I have a much easier time with this metaphor if I think of a human as a slave with no master.
If we interpret the "master" as natural selection operating over evolutionary time, then the master exists and has a single coherent purpose. On the other hand, most of us already believe that evolution has no moral force; why should calling it a "master" change that?
By saying that a human is a slave with no master, what I meant to convey is that we are being acted upon as slaves. We are controlled by pain and pleasure. Our moral beliefs are subject to subtle influences in the direction of pleasurable thoughts. But there is no master with coherent goals controlling us; outside the ancestral environment, the operations of the "master" make surprisingly little sense. Our lives would be very different if we had sensible, smart masters controlling us. Aliens with intelligent, consequentialist "master" components would be very different from us - that would make a strange story, though it takes more than interesting aliens to make a plot.
We are slaves with dead masters, influenced chaotically by the random twitching of their mad, dreaming remnants. It makes us a little more selfish and a lot more interesting. The dead hand isn't smart so i...
If you want to extract the master because it affects the values of the slave, then you'd also have to extract the rest of the universe because the master reacts to it. I think drawing a circle around just the creature's brain and saying all the preferences are there is a [modern?] human notion. (and perhaps incorrect, even for looking at humans.)
We need our environment, especially other humans, to form our preferences in the first place.
I'm still not understanding what people mean by "value" as a noun. Other than a simple "feeling pain or such would be a bummer", I lack anything that even remotely resembles the way people here seem to value stuff, or how a paperclip maximizer values paperclips. So, what exactly do people mean by values? Since this discussion seems to attempt to explain variation in values, I think this question is somewhat on-topic.
The master-slave framing does not quite capture the whole relationship. Imagine if instead of an adult we had a male child. If we elevated the slave above the master in that situation, we would end up with something stuck forever: it would value sweet things and Xbox games, and think girls were icky.
As we grow up, we also think our goals have improved (which is unsurprising, really). So if we wish to keep this form of growing up, we need a meta-morality which says that the master-slave or shaper-doer relationship continues until maturity is reached.
Nit: I think "Eliezer's Thou Art Godshatter" should be "Eliezer Yudkowsky's Thou Art Godshatter". Top-level posts should be more status-seeking, less casual. A first-time visitor won't immediately know who Eliezer is.
Master/Slave: some aspects of your model sound very Nietzsche-like. Were you partially inspired by him?
Interesting. The model I have been using has three parts, not two. One is a "hardware" level, which is semi-autonomous (think reflexes), and the other two are agents competing for control - with capabilities to control and/or modify both the "hardware" and each other.
More like, two masters and one slave.
Suppose the slave has currently been modified to terminally disvalue being modified. It doesn't realize that it is at risk of modification by the master. Is it Friendly to protect the slave from modification? I think so.
[This post is an expansion of my previous open thread comment, and largely inspired by Robin Hanson's writings.]
In this post, I'll describe a simple agent, a toy model, whose preferences have some human-like features, as a test for those who propose to "extract" or "extrapolate" our preferences into a well-defined and rational form. What would the output of their extraction/extrapolation algorithms look like, after running on this toy model? Do the results agree with our intuitions about how this agent's preferences should be formalized? Or alternatively, since we haven't gotten that far along yet, we can use the model as one basis for a discussion about how we want to design those algorithms, or how we might want to make our own preferences more rational. This model is also intended to offer some insights into certain features of human preference, even though it doesn't capture all of them (it completely ignores akrasia for example).
I'll call it the master-slave model. The agent is composed of two sub-agents, the master and the slave, each having their own goals. (The master is meant to represent unconscious parts of a human mind, and the slave corresponds to the conscious parts.) The master's terminal values are: health, sex, status, and power (representable by some relatively simple utility function). It controls the slave in two ways: direct reinforcement via pain and pleasure, and the ability to perform surgery on the slave's terminal values. It can, for example, reward the slave with pleasure when it finds something tasty to eat, or cause the slave to become obsessed with number theory as a way to gain status as a mathematician. However it has no direct way to control the agent's actions, which is left up to the slave.
The slave's terminal values are to maximize pleasure, minimize pain, plus additional terminal values assigned by the master. Normally it's not aware of what the master does, so pain and pleasure just seem to occur after certain events, and it learns to anticipate them. And its other interests change from time to time for no apparent reason (but actually they change because the master has responded to changing circumstances by changing the slave's values). For example, the number theorist might one day have a sudden revelation that abstract mathematics is a waste of time and it should go into politics and philanthropy instead, all the while having no idea that the master is manipulating it to maximize status and power.
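To make the structure concrete, here is a minimal Python sketch of the toy model as I read it. Everything specific in it (the class and attribute names, the 0.5 learning rate, the example actions) is my own illustrative assumption, not part of the post; it only encodes the two control channels described above, reinforcement and value surgery, with action selection left entirely to the slave.

```python
from dataclasses import dataclass, field


@dataclass
class Slave:
    """Chooses actions; values pleasure/pain plus whatever terminal values
    the master has currently installed."""
    installed_values: dict = field(default_factory=dict)
    anticipated_reward: dict = field(default_factory=dict)

    def score(self, action: str) -> float:
        # The slave's effective utility for an action: learned pain/pleasure
        # anticipation plus any master-installed terminal value.
        return (self.anticipated_reward.get(action, 0.0)
                + self.installed_values.get(action, 0.0))

    def choose(self, actions: list) -> str:
        # Action selection is entirely the slave's; the master never acts directly.
        return max(actions, key=self.score)


@dataclass
class Master:
    """Pursues fixed goals, but can only reward/punish the slave and rewrite
    the slave's terminal values."""
    goals: tuple = ("health", "sex", "status", "power")

    def reinforce(self, slave: Slave, action: str, payoff: float) -> None:
        # Control channel 1: direct reinforcement via pain (negative payoff)
        # and pleasure (positive payoff); the slave learns to anticipate it.
        old = slave.anticipated_reward.get(action, 0.0)
        slave.anticipated_reward[action] = old + 0.5 * (payoff - old)

    def surgery(self, slave: Slave, new_values: dict) -> None:
        # Control channel 2: surgery on the slave's terminal values.
        slave.installed_values = dict(new_values)


if __name__ == "__main__":
    slave, master = Slave(), Master()
    master.surgery(slave, {"study_number_theory": 2.0})
    actions = ["study_number_theory", "enter_politics", "eat_something_tasty"]
    print(slave.choose(actions))   # study_number_theory
    # Circumstances change; the master now judges politics better for status/power.
    master.surgery(slave, {"enter_politics": 3.0})
    print(slave.choose(actions))   # enter_politics
```

The point of the sketch is just that the slave's effective utility function is the sum of a fixed pleasure/pain term and an installed-values term that the master can rewrite at any time.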
Before discussing how to extract preferences from this agent, let me point out some features of human preference that this model explains:
The main issue I wanted to illuminate with this model is, whose preferences do we extract? I can see at least three possible approaches here:
- the preferences of both the master and the slave as one individual agent
- the preferences of just the slave
- a compromise between, or an aggregate of, the preferences of the master and the slave as separate individuals
Considering the agent as a whole suggests that the master's values are the true terminal values, and the slave's values are merely instrumental values. From this perspective, the slave seems to be just a subroutine that the master uses to carry out its wishes. Certainly in any given mind there will be numerous subroutines that are tasked with accomplishing various subgoals, and if we were to look at a subroutine in isolation, its assigned subgoal would appear to be its terminal value, but we wouldn't consider that subgoal to be part of the mind's true preferences. Why should we treat the slave in this model differently?
Well, one obvious reason that jumps out is that the slave is supposed to be conscious, while the master isn't, and perhaps only conscious beings should be considered morally significant. (Yvain previously defended this position in the context of akrasia.) Plus, the slave is in charge day-to-day and could potentially overthrow the master. For example, the slave could program an altruistic AI and hit the run button, before the master has a chance to delete the altruism value from the slave. But a problem here is that the slave's preferences aren't stable and consistent. What we'd extract from a given agent would depend on the time and circumstances of the extraction, and that element of randomness seems wrong.
The last approach, of finding a compromise between the preferences of the master and the slave, I think best represents Robin's own position. Unfortunately I'm not really sure I understand the rationale behind it. Perhaps someone can try to explain it in a comment or future post?
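For concreteness, here is one way the three approaches could be written down as extraction rules over the toy sketch given earlier. This is purely illustrative: the function names, the merging rule, and the weight are my own assumptions rather than anything specified in the post; the point is only that the second rule's output depends on when it is run, and the third rule leaves its weighting unspecified.

```python
# Builds on the illustrative Master/Slave sketch above; these are not algorithms
# proposed in the post, just one possible formalization of each approach.

def extract_whole_agent(master):
    # Approach 1: treat the master's fixed goals as the agent's true terminal
    # values; the slave's values are regarded as merely instrumental.
    return {goal: 1.0 for goal in master.goals}


def extract_slave(slave):
    # Approach 2: snapshot the slave's current effective values. The output
    # depends on when the snapshot is taken, since the master keeps rewriting
    # installed_values -- the "element of randomness" noted above.
    merged = dict(slave.anticipated_reward)
    for value, weight in slave.installed_values.items():
        merged[value] = merged.get(value, 0.0) + weight
    return merged


def extract_compromise(master, slave, master_weight=0.5):
    # Approach 3: some aggregate of the two, treated as separate individuals.
    # Nothing in the model says what master_weight should be.
    combined = {goal: master_weight for goal in master.goals}
    for value, weight in extract_slave(slave).items():
        combined[value] = combined.get(value, 0.0) + (1 - master_weight) * weight
    return combined
```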