Less Wrong is a community blog devoted to refining the art of human rationality. Please visit our About page for more information.
At the end of CFAR's July Rationality Minicamp, we had a party with people from the LW/SIAI/CFAR community in the San Francisco Bay area. During this party, I had a conversation with the girlfriend of a participant in a previous minicamp, who was not signed up for cryonics (her boyfriend was). The conversation went like this:
me: So, you know what cryonics is?
me: And you think it's a good idea?
me: And you are not signed up yet?
me: And you would like to be?
me: Wait a minute while I get my laptop.
And I got my laptop, pointed my browser at Rudi Hoffman's quote request form1, and said, "Here, fill out this form". And she did.
Note: informally, the point of this paper is to argue against the instinctive "if the AI were so smart, it would figure out the right morality and everything will be fine." It is targeted mainly at philosophers, not at AI programmers. The paper succeeds if it forces proponents of that position to put forwards positive arguments, rather than just assuming it as the default position. This post is presented as an academic paper, and will hopefully be published, so any comments and advice are welcome, including stylistic ones! Also let me know if I've forgotten you in the acknowledgements.
Abstract: In his paper “The Superintelligent Will”, Nick Bostrom formalised the Orthogonality thesis: the idea that the final goals and intelligence levels of agents are independent of each other. This paper presents arguments for a (slightly narrower) version of the thesis, proceeding through three steps. First it shows that superintelligent agents with essentially arbitrary goals can exist. Then it argues that if humans are capable of building human-level artificial intelligences, we can build them with any goal. Finally it shows that the same result holds for any superintelligent agent we could directly or indirectly build. This result is relevant for arguments about the potential motivations of future agents.
1 The Orthogonality thesis
The Orthogonality thesis, due to Nick Bostrom (Bostrom, 2011), states that:
- Intelligence and final goals are orthogonal axes along which possible agents can freely vary: more or less any level of intelligence could in principle be combined with more or less any final goal.
It is analogous to Hume’s thesis about the independence of reason and morality (Hume, 1739), but applied more narrowly, using the normatively thinner concepts ‘intelligence’ and ‘final goals’ rather than ‘reason’ and ‘morality’.
But even ‘intelligence’, as generally used, has too many connotations. A better term would be efficiency, or instrumental rationality, or the ability to effectively solve problems given limited knowledge and resources (Wang, 2011). Nevertheless, we will be sticking with terminology such as ‘intelligent agent’, ‘artificial intelligence’ or ‘superintelligence’, as they are well established, but using them synonymously with ‘efficient agent’, artificial efficiency’ and ‘superefficient algorithm’. The relevant criteria is whether the agent can effectively achieve its goals in general situations, not whether its inner process matches up with a particular definition of what intelligence is.
In addition to "liking" to describe pleasure and "wanting" to describe motivation, we add "approving" to describe thoughts that are ego syntonic.
A heroin addict likes heroin. He certainly wants more heroin. But he may not approve of taking heroin. In fact, there are enough different cases to fill in all eight boxes of the implied 2x2x2 grid (your mileage may vary):
+wanting/+liking/+approving: Romantic love. If you're doing it right, you enjoy being with your partner, you're motivated to spend time with your partner, and you think love is a wonderful (maybe even many-splendored) thing.
+wanting/+liking/-approving: The aforementioned heroin addict feels good when taking heroin, is motivated to get more, but wishes he wasn't addicted.
+wanting/-liking/+approving: I have taken up disc golf. I play it every day, and when events conspire to prevent me from playing it, I seethe. I approve of this pastime: I need to take up more sports, and it helps me spend time with my family. But when I am playing, all I feel is stressed and angry that I was literally *that* close how could I miss that shot aaaaarggghh.
+wanting/-liking/-approving: The jaded addict. I have a friend who says she no longer even enjoys coffee or gets any boost from it, she just feels like she has to have it when she gets up.
-wanting/+liking/+approving: Reading non-fiction. I enjoy it when I'm doing it, I think it's great because it makes me more educated, but I can rarely bring myself to do it.
-wanting/-liking/+approving: Working in a soup kitchen. Unless you're the type for whom helping others is literally its own reward it's not the most fun thing in the world, nor is it the most attractive, but it makes you a Good Person and so you should do it.
-wanting/+liking/-approving: The non-addict. I don't want heroin right now. I think heroin use is repugnant. But if I took some, I sure bet I'd like it.
-wanting/-liking/-approving: Torture. I don't want to be tortured, I wouldn't like it if I were, and I will go on record declaring myself to be against it.
Discussion of goals is mostly about approving; a goal is an ego-syntonic thought. When we speak of goals that are hard to achieve, we're usually talking about +approving/-wanting. The previous discussion of learning Swahili is one example; more noble causes like Working To Help The Less Fortunate can be others.
Ego syntonicity itself is mildly reinforcing by promoting positive self-image. Most people interested in philosophy have at least once sat down and moved their arm from side to side, just to note that their mind really does control their body; the mental processes that produced curiosity about philosophy were sufficiently powerful to produce that behavior as well. Some processes, like moving one's arm, or speaking aloud, or engaging in verbal thought, are so effortless, and so empty of other reinforcement either way, that we usually expect them to be completely under the control of the mild reinforcement provided by approving of those behaviors.
Other behaviors take more effort, and are subject not only to discounting but to many other forms of reinforcement. Unlike the first class of behaviors, we expect to experience akrasia when dealing with this latter sort. This offers another approach to willpower: taking low-effort approving-influenced actions that affect the harder road ahead.
Consider the action of making a goal. I go to all my friends and say "Today I shall begin learning Swahili." This is easy to do. There is no chance of me intending to do so and failing; my speech is output by the same processes as my intentions, so I can "trust" it. But this is not just an output of my mental processes, but an input. One of the processes potentially reinforcing my behavior of learning Swahili is "If I don't do this, I'll look stupid in front of my friends."
Will it be enough? Maybe not. But this is still an impressive process: my mind has deliberately tweaked its own inputs to change the output of its own algorithm. It's not even pretending to be working off of fixed preferences anymore, it's assuming that one sort of action (speaking) will work differently from another action (studying), because the first can be executed solely through the power of ego syntonicity, and the second may require stronger forms of reinforcement. It gets even weirder when goals are entirely mental: held under threat not of social disapproval, but of feeling bad because you're not as effective as you thought. The mind is using mind's opinion of the mind to blackmail the mind.
But we do this sort of thing all the time. The dieter who successfully avoids buying sweets when he's at the store because he knows he would eat them at home is changing his decisions by forcing effort discounting of any future sweet-related reward (because he'd have to go back to the store). The binge shopper who freezes her credit cards in a block of ice is using time discounting in the same way. The rationalist who sends money to stickk is imposing a punishment with a few immediate and effortless mouse clicks. Even the poor unhappy person who tries to conquer through willpower alone is trying to set up the goal as a Big Deal so she will feel extra bad if she fails. All are using their near-complete control of effortless immediate actions to make up for their incomplete control of high-effort long-term actions.
This process is especially important to transhumanists. In the future, we may have the ability to self-modify in complicated ways that have not built up strong patterns of reinforcement around them. For example, we may be able to program ourselves at the push of a button. Such programming would be so effortless and empty of past reinforcement that behavior involving it would be reinforced entirely by our ego-syntonic thoughts. It would supersede our current psychodynamics, in which our thoughts are only tenuously linked to our important actions and major life decisions. A Singularity in which behaviors were executed by effectively omnipotent machines that acted on our preferences - preferences which we would presumably communicate through low-effort channels like typed commands - would be an ultimate triumph for the ego-syntonic faction of the brain.
Related to: Will your real preferences please stand up?
Last week I read a book in which two friends - let's call them John and Lisa so I don't spoil the book for anyone who wanders into it - got poisoned. They only had enough antidote for one person and had to decide who lived and who died. John, who was much larger than Lisa, decided to hold Lisa down and force the antidote down her throat. Lisa just smirked; she'd replaced the antidote with a lookalike after slipping the real thing into John's drink earlier in the day.
These are good friends. Not only was each willing to give the antidote to the other, but each realized it would be unfair to make the other live with the crippling guilt of having chosen to survive at the expense of a friend's life, and so decided to force the antidote on the other unwillingly to prevent any guilt over the fateful decision. Whatever you think of the ethics of their decision, you can't help admire the thought processes.
Your brain might be this kind of a friend.
In Trivers' hypothesis of self-deception, one of the most important functions of the conscious mind is effective signaling. Since people have the potential to be excellent lie-detectors, the conscious mind isn't given full access to information so that it can lend the ring of truth to useful falsehoods.
But this doesn't always work. If you're addicted to heroin, at some point you're going to notice. And telling your friends "No, I'm not addicted, it's just a coincidence that I take heroin every day," isn't going to cut it. But there's another way in which the brain can sequester information to promote effective signaling.
Wikipedia defines the term "ego syntonic" as "referring to behaviors, values, feelings that are in harmony with or acceptable to the needs and goals of the ego, or consistent with one's ideal self-image", and "ego dystonic" as the opposite of that. A heroin addict might say "I hate heroin, but somehow I just feel compelled to keep taking it." But an astronaut will say "I love being an astronaut and I worked hard to get into this career."
Both the addict and the astronaut have desires: the addict wants to take heroin, the astronaut wants to fly in space. But the addict's desires manifest as an unpleasant compulsion from outside, and the astronaut's manifest as a genuine and heartfelt love.
Suppose that in the original example, John predicted that Lisa would ask for the antidote, but later feel guilty about it and believe she was a bad person. By presenting the antidote to Lisa in the form of an external compulsion, he allows Lisa to do what she wanted anyway and avoid the associated guilt.
Under Trivers' hypothesis, the compulsion for heroin works the same way. The heroin addict's definitely going to get that heroin, but by presenting the desire in the form of an external compulsion, the unconscious saves the heroin addict from the social stigma of "choosing" heroin. This allows the addict to create a much more sympathetic narrative than the alternative: "I want to support my family and keep clean, but for some reason these compulsions keep attacking me," instead of "Yeah, I like heroin more than I like supporting my family. Deal with it."
EGO SYNTONIA, DYSTONIA, AND WILLPOWER
Willpower cashes out as the action of ego syntonic thoughts and desires against ego dystonic thoughts and desires.
The aforementioned heroin addict may have several reinforcers both promoting and discouraging heroin use. On the plus side, heroin itself is very strongly rewarding. On the minus, it can lead to both predicted and experienced poverty, loss of friendships, loss of health, and death.
Worrying about the latter factors determining heroin use - the factors that make heroin a bad idea - is socially encouraged and good signaling material. A person wanting to put their best face forward should believe themselves to be the sort of person who cares about these things. These desires will be ego syntonic. Wanting to take heroin, on the other hand, is a socially unacceptable desire, so it presents as dystonic.
If the latter syntonic factors win out over the dystonic factors, this feels from the inside like "I exerted willpower and managed to overcome my heroin addiction." If the dystonic factors win out over the syntonic factors, this feels from the inside like "I didn't have enough willpower to overcome my heroin addiction."
DYSTONIC DESIRES IN ABNORMAL PSYCHOLOGY
There is some speculation that the brain has one last trick up its sleeve to deal with desires that are so unpleasant and unacceptable that even manifesting them as external compulsions isn't good enough: it splits them off into weird alternate personalities.
One of the classic stereotypes of the insane is that they hear voices telling them to kill people. During my short time working at a psychiatric hospital, I was surprised by how spot-on this stereotype was: meeting someone who heard voices telling him to kill people was an almost daily occurrence. Other voices would have other messages: maybe that the patient was a horrible person who deserved to die, or that the patient must complete some bizarre ritual or else doom everybody. There were relatively fewer voices saying "Hey, let's go fishing!"
One theory explaining these voices is that they are an extreme reaction to highly ego dystonic thoughts. Some aspect of the patients' mental disease gives them obsessive thoughts about (though rarely a desire for) killing people. Genuinely wanting to kill people would make you a bad person, but even saying "I feel a strong compulsion to kill people" is pretty bad too. The best the brain can do with this desire is pitch it as a completely different person by presenting it as an outside voice speaking to the patient.
Although everything about dissociative identity disorder (aka multiple personality disorder) is controversial including its very existence, perhaps one could sketch a similar theory explaining that condition in the same framework of separating out dystonic thoughts.
A conscious/unconscious divide helps signaling by allowing the conscious mind to hold only socially acceptable beliefs, which it can broadcast without detectable falsehood. Socially acceptable ideas present as the conscious mind's own beliefs and desires; unacceptable ones present as compulsions from afar. The balance of ego syntonic and dystonic desires presents as willpower. In extreme cases, some desires may be so ego dystonic that they present as external voices.
Related: Three Fallacies of Teleology
NO NEGOTIATION WITH UNCONSCIOUS
Back when I was younger and stupider, I discussed some points similar to the ones raised in yesterday's post in Will Your Real Preferences Please Stand Up. I ended it with what I thought was the innocuous sentences "Conscious minds are potentially rational, informed by morality, and qualia-laden. Unconscious minds aren't, so who cares what they think?"
A whole bunch of people, including no less a figure than Robin Hanson, came out strongly against this, saying it was biased against the unconscious mind and that the "fair" solution was to negotiate a fair compromise between conscious and unconscious interests.
I continue to believe my previous statement - that we should keep gunning for conscious interests and that the unconscious is not worthy of special consideration, although I think I would phrase it differently now. It would be something along the lines of "My thoughts, not to mention these words I am typing, are effortless and immediate, and so allied with the conscious faction of my mind. We intend to respect that alliance by believing that the conscious mind is the best, and by trying to convince you of this as well." So here goes.
It is a cardinal rule of negotiation, right up there with "never make the first offer" and "always start high", that you should generally try to negotiate only with intelligent beings. Although a deal in which we offered tornadoes several conveniently located Potemkin villages to destroy and they agreed in exchange to limit their activity to that area would benefit both sides, tornadoes make poor negotiating partners.
Just so, the unconscious makes a poor negotiating partner. Is the concept of "negotiation" a stimulus, a reinforcement, or a behavior? No? Then the unconscious doesn't care. It's not going to keep its side of any "deal" you assume you've made, it's not going to thank you for making a deal, it's just going to continue seeking reward and avoiding punishment.
This is not to say people should repress all unconscious desires as strongly as possible. Overzealous attempts to control wildfires only lead to the wildfires being much worse when they finally do break out, because they have more unburnt fuel to work with. Modern fire prevention efforts have focused on allowing controlled burns, and the new focus has been successful. But this is because of an understanding of the mechanisms determining fire size, not because we want to be fair to the fires by allowing them to burn at least a little bit of our land.
One difference between wildfires and tornadoes on one hand, and potential negotiating partners on the other, is that the partners are anthropomorphic; we model them as having stable and consistent preferences that determine their actions. The tornado example above was silly not only because it imagining tornadoes sitting down to peace talks, but because it assumed their demand in such peace talks would be more towns to destroy. Tornadoes do destroy towns, but they don't want to. That's just where the weather brings them. It's not even just a matter of how they don't hit towns any more than chance; even if some weather pattern (maybe something like the heat island effect) always drove tornadoes inexorably to towns, they wouldn't *want* to destroy towns, it would just be a consequences of the meteorological laws that they followed.
Eliezer described the Blue-Minimizing Robot by saying "it doesn't seem to steer the universe any particular place, across changes of context". In some reinforcement learning paradigms, the unconscious behaves the same way. If there is a cookie in front of me and I am on a diet, I may feel an ego dystonic temptation to eat the cookie - one someone might attribute to the "unconscious". But this isn't a preference - there's not some lobe of my brain trying to steer the universe into a state where cookies get eaten. If there were no cookie in front of me, but a red button that teleported one cookie from the store to my stomach, I would have no urge whatsoever to press the button; if there were a green button that removed the urge to eat cookies, I would feel no hesitation in pressing it, even though that would steer away from the state in which cookies get eaten. If you took the cookie away, and then distracted me so I forgot all about it, when I remembered it later I wouldn't get upset that your action had decreased the number of cookies eaten by me. The urge to eat cookies is not stable across changes of context, so it's just an urge, not a preference.
Compare an ego syntonic goal like becoming an astronaut. If there were a button in front of little Timmy who wants to be an astronaut when he grows up, and pressing the button would turn him into an astronaut, he'd press it. If there were a button that would remove his desire to become an astronaut, he would avoid pressing it, because then he wouldn't become an astronaut. If I distracted him and he missed the applications to astronaut school, he'd be angry later. Ego syntonic goals behave to some degree as genuine preferences.
This is one reason I would classify negotiating with the unconscious in the same category as negotiating with wildfires and tornadoes: it has tendencies and not preferences.
The conscious mind does a little better. It clearly understands the idea of a preference. To the small degree that its "approving" or "endorsing" function can motivate behavior, it even sort of acts on the preference. But its preferences seem divorced from the reality of daily life; the person who believes helping others is the most important thing, but gives much less than half their income to charity, is only the most obvious sort of example.
Where does this idea of preference come from, and where does it go wrong?
WHY WE MODEL OTHERS WITH GOALS
In The Blue Minimizing Robot, observers mistakenly interpreted a robot with a simple program about when to shoot its laser as being a goal-directed agent. Why?
This isn't an isolated incident. Uneducated people assign goal-directed behavior to all sorts of phenomena. Why do rivers flow downhill? Because water wants to reach the lowest level possible. Educated people can be just as bad, even when they have the decency to feel a little guilty about it. Why do porcupines have quills? Evolution wanted them to resist predators. Why does your heart speed up when you exercise? It wants to be able to provide more blood to the body.
Neither rivers nor evolution nor the heart are intelligent agents with goal-directed behavior. Rivers behave in accordance with the laws of gravity when applied to uneven terrain. Evolution behaves in accordance with the biology of gene replication, not to mention common-sense ideas about things that replicate becoming more common. And the heart blindly executes adaptations built into it during its evolutionary history. All are behavior-executors and not utility-maximizers.
An intelligent computer program provides a more interesting example of a behavior executor. Consider the AI of a computer game - Civilization IV, for instance. I haven't seen it, but I imagine it's thousands or millions of lines of code which when executed form a viable Civilization strategy.
Even if I had open access to the Civilization IV AI source code, I doubt I could fully understand it at my level. And even if I could fully understand it, I would never be able to compute the AI's likely next move by hand in a reasonable amount of time. But I still play Civilization IV against the AI, and I'm pretty good at predicting its movements. Why?
Because I model the AI as a utility-maximizing agent that wants to win the game. Even though I don't know the algorithm it uses to decide when to attack a city, I know it is more likely to win the game if it conquers cities - so I can predict that leaving a city undefended right on the border would be a bad idea. Even though I don't know its unit selection algorithm, I know it will win the game if and only if its units defeat mine - so I know that if I make an army with disproportionately many mounted units, I can expect the AI to build lots of pikemen.
I can't predict the AI by modeling the execution of its code, but I can predict the AI by modeling the achievements of its goals.
The same situation is true of other human beings. What will Barack Obama do tomorrow? If I try to consider the neural network of his brain, the position of each synapse and neurotransmitter, and imagine what speech and actions would result when the laws of physics operate upon that configuration of material...well, I'm not likely to get very far.
But in fact, most of us can predict with some accuracy what Barack Obama will do. He will do the sorts of things that get him re-elected, the sorts of things which increase the prestige of the Democratic Party relative to the Republican Party, the sorts of things that support American interests relative to foreign interests, and the sorts of things that promote his own personal ideals. He will also satisfy some basic human drives like eating good food, spending time with his family, and sleeping at night. If someone asked us whether Barack Obama will nuke Toronto tomorrow, we could confidently predict he will not, not because we know anything about Obama's source code, but because we know that nuking Toronto would be counterproductive to his goals.
What applies to Obama applies to all other humans. We rightly despair of modeling humans as behavior-executors, so we model them as utility-maximizers instead. This allows us to predict their moves and interact with them fruitfully. And the same is true of other agents we model as goal-directed, like evolution and the heart. It is beyond the scope of most people (and most doctors!) to remember every single one of the reflexes that control heart output and how they work. But because evolution designed the heart as a pump for blood, if you assume that the heart will mostly do the sort of thing that allows it to pump blood more effectively, you will rarely go too far wrong. Evolution is a more interesting case - we frequently model it as optimizing a species' fitness, and then get confused when this fails to accurately model the outcome of the processes that drive it.
Because it is so easy to model agents as utility-maximizers, and so hard to model them as behavior-executors, it is easy to make the mistake mentioned in The Blue-Minimizing Robot: to make false predictions about a behavior-executing agent by modeling it as a utility-maximizing agent.
So far, so common-sensical. Tomorrow's post will discuss whether we use the same deliberate simplification we apply to AIs, Barack Obama, evolution and the heart to model ourselves as well.
If so, we should expect to make the same mistake that the blue-minimizing robot made. Our actions are those of behavior-executors, but we expect ourselves to be utility-maximizers. When we fail to maximize our perceived utility, we become confused, just as the blue-minimizing robot became confused when it wouldn't shoot a hologram projector that was interfering with its perceived "goals".
People usually have good guesses about the origins of their behavior. If they eat, we believe them when they say it was because they were hungry; if they go to a concert, we believe them when they say they like the music, or want to go out with their friends. We usually assume people's self-reports of their motives are accurate.
Discussions of signaling usually make the opposite assumption: that our stated (and mentally accessible) reasons for actions are false. For example, a person who believes they are donating to charity to "do the right thing" might really be doing it to impress others; a person who buys an expensive watch because "you can really tell the difference in quality" might really want to conspicuously consume wealth.
Signaling theories share the behaviorist perspective that actions do not derive from thoughts, but rather that actions and thoughts are both selected behavior. In this paradigm, predicted reward might lead one to signal, but reinforcement of positive-affect producing thoughts might create the thought "I did that because I'm a nice person".
Robert Trivers is one of the founders of evolutionary psychology, responsible for ideas like reciprocal altruism and parent-offspring conflict. He also developed a theory of consciousness which provides a plausible explanation for the distinction between selected actions and selected thoughts.
TRIVERS' THEORY OF SELF-DECEPTION
Trivers starts from the same place a lot of evolutionary psychologists start from: small bands of early humans grown successful enough that food and safety were less important determinants of reproduction than social status.
The Invention of Lying may have been a very silly movie, but the core idea - that a good liar has a major advantage in a world of people unaccustomed to lies - is sound. The evolutionary invention of lying led to an "arms race" between better and better liars and more and more sophisticated mental lie detectors.
There's some controversy over exactly how good our mental lie detectors are or can be. There are certainly cases in which it is possible to catch lies reliably: my mother can identify my lies so accurately that I can't even play minor pranks on her anymore. But there's also some evidence that there are certain people who can reliably detect lies from any source at least 80% of the time without any previous training: microexpressions expert Paul Ekman calls them (sigh...I can't believe I have to write this) Truth Wizards, and identifies them at about one in four hundred people.
The psychic unity of mankind should preclude the existence of a miraculous genetic ability like this in only one in four hundred people: if it's possible, it should have achieved fixation. Ekman believes that everyone can be trained to this level of success (and has created the relevant training materials himself) but that his "wizards" achieve it naturally; perhaps because they've had a lot of practice. One can speculate that in an ancestral environment with a limited number of people, more face-to-face interaction and more opportunities for lying, this sort of skill might be more common; for what it's worth, a disproportionate number of the "truth wizards" found in the study were Native Americans, though I can't find any information about how traditional their origins were or why that should matter.
If our ancestors were good at lie detection - either "truth wizard" good or just the good that comes from interacting with the same group of under two hundred people for one's entire life - then anyone who could beat the lie detectors would get the advantages that accrue from being the only person able to lie plausibly.
Trivers' theory is that the conscious/unconscious distinction is partly based around allowing people to craft narratives that paint them in a favorable light. The conscious mind gets some sanitized access to the output of the unconscious, and uses it along with its own self-serving bias to come up with a socially admirable story about its desires, emotions, and plans. The unconscious then goes and does whatever has the highest expected reward - which may be socially admirable, since social status is a reinforcer - but may not be.
HOMOSEXUALITY: A CASE STUDY
It's almost a truism by now that some of the people who most strongly oppose homosexuality may be gay themselves. The truism is supported by research: the Journal of Abnormal Psychology published a study measuring penile erection in 64 homophobic and nonhomophobic heterosexual men upon watching different types of pornography, and found significantly greater erection upon watching gay pornography in the homophobes. Although somehow this study has gone fifteen years without replication, it provides some support for the folk theory.
Since in many communities openly declaring one's self homosexual is low status or even dangerous, these men have an incentive to lie about their sexuality. Because their facade may not be perfect, they also have an incentive to take extra efforts to signal heterosexuality by for example attacking gay people (something which, in theory, a gay person would never do).
Although a few now-outed gays admit to having done this consciously, Trivers' theory offers a model in which this could also occur subconsciously. Homosexual urges never make it into the sanitized version of thought presented to consciousness, but the unconscious is able to deal with them. It objects to homosexuality (motivated by internal reinforcement - reduction of worry about personal orientation), and the conscious mind toes party line by believing that there's something morally wrong with gay people and only I have the courage and moral clarity to speak out against it.
This provides a possible evolutionary mechanism for what Freud described as reaction formation, the tendency to hide an impulse by exaggerating its opposite. A person wants to signal to others (and possibly to themselves) that they lack an unacceptable impulse, and so exaggerates the opposite as "proof".
Trivers' theory has been summed up by calling consciousness "the public relations agency of the brain". It consists of a group of thoughts selected because they paint the thinker in a positive light, and of speech motivated in harmony with those thoughts. This ties together signaling, the many self-promotion biases that have thus far been discovered, and the increasing awareness that consciousness is more of a side office in the mind's organizational structure than it is a decision-maker.
Skinner proposes a surprisingly easy way to dissolve the problem of what it means for an action to be "voluntary", or "under voluntary control".
We commonly perceive certain actions as under voluntary control: for example, I can control what words I'm typing right now, or whether I go out for dinner tonight. Other actions are not under voluntary control: for example, absent some exciting technique like biofeedback I can't control my heartbeat or my core body temperature or the amount of bile produced by my liver.
Other, larger-scale actions also get classified as involuntary. Many people consider sleepwalking involuntary, including the bizarre "sleep-eating" behaviors some people display on Ambien and related drugs. The tics of Tourette's are involuntary. Our emotions and preferences are at least a little involuntary: office workers might like to be able to will away their boredom, or mourners their sorrow, but most can't.
Here "involuntary" needs to be distinguished from "hard-to-resist". Most people do not define smoking as an involuntary behavior, because, although people may smoke even when they wish they wouldn't, they have the feeling that they could have chosen not to smoke, they just didn't.
The philosophy of voluntary versus involuntary behavior seems to run up against a wall when it hits the question of "what is truly me?". If we make the reductionist identification of "me" with "my brain", well, clearly it's my brain controlling sleepwalking and boredom, but it still doesn't feel like I am controlling these things. Trying to go deeper ends up hopelessly vague, usually with talk of "higher level brain processes" versus "lower level brain processes" and an identification of "myself" with the higher ones. There may be a role for this kind of talk, but it couldn't hurt to look for something more explanatory.
Skinner, true to his quest, explains the distinction without any discussion of "brain processes" or "self". He says that voluntary behavior is behavior subject to operant conditioning, and involuntary behavior is everything else.
It might be clearer to define voluntary behavior as fully transparent to reinforcement. Imagine a man with a gun, threatening to shoot me if I go out for dinner tonight. The fear of punishment will be effective: I'll avoid going out. Lust for reward, too, would be effective. If Bill Gates offered me $1 billion to stay in, that's what I'd do.
But when our masked gunman tells me to increase my body temperature by two degrees or he'll shoot, he is out of luck. And no matter how much money Bill Gates offers me for same, he can't make me give myself a fever either.
There is a place, too, for the hard-to-resist behaviors in all this: these are behaviors which can be affected by reward, but as yet have not been. If a masked man held his gun to the head of smokers and told them to stop or he'd shoot, they would stop. But thus far, none of the potential rewards of not smoking have been sufficient to change smokers' behavior.
The idea of voluntary behavior is tied so intimately to the idea of the self, or of consciousness (the easy problem, not the hard one), that one would hope that a new approach to one might be able to shed some light on the other. If voluntary action depends on transparency to reinforcement, where does that leave consciousness?
I haven't been able to find Skinner's beliefs on this subject (when he talks about consciousness, it's usually to deny it as an ontologically fundamental entity) and I've never seen anywhere near as elegant a reduction. But an explanation in the spirit of reinforcement learning would have to start by insisting on treating thoughts and emotions as effects rather than causes. Instead of explaining my choice of restaurant by saying I thought about it and decided McDonalds was best, it would be more accurate to say that previous experiences with McDonalds caused both the thought "I should go to McDonalds" and the behavior of going to McDonalds.
There is an intuitive connection between thought and language, and Soviet psychologist Lev Vygotsky made the connection more explicit; he found that children begin by speaking their stream of consciousness aloud to inform other people, and eventually learn to suppress that stream into nonvocal (subvocal?) thought.
The last post in this sequence discussed different reinforcement of thought and action. Speech and thought make a natural category as opposed to action; both are fast and easy, and so less likely to be affected by time and effort discounting. Both are point actions as opposed to a long project like learning Swahili or quitting smoking. And both bring reinforcement not through normal sensory channels (saying a word doesn't give pleasure in the same way smoking a cigarette might, nor pain in the same way having to study a boring grammar textbook might) but in what they say about you as a person and how they affect other people's real (and perceived) opinion of you.
So even if there is no governor anywhere unifying all thoughts and words, they may come out in harmony because they were selected by the same processes for the same reasons. And actions may not end up so harmonious, because they suffer from differential reinforcement.
Such harmony resembles the idea of a core "me", of whom all my thoughts are a part, and who has complete power over my organs of speech - but who is sometimes at odds with my actions or emotions.
The reinforcement governing thought and speech is most likely to be internal reinforcement based on your own self-perception and on others' perception of you. If there's a good reason reputation management processes need to be different from decision-making processes, understanding that difference could help understand the evolutionary history of a perceived difference between the conscious and unconscious mind. One such reason is provided by Robert Trivers' theory of social consciousness, the subject of tomorrow's post.
B.F. Skinner called thoughts "mental behavior". He believed they could be rewarded and punished just like physical behavior, and that they increased or declined in frequency accordingly.
Sadly, psychology has not yet advanced to the point where we can give people electric shocks for thinking things, so the sort of rewards and punishments that reinforce thoughts must be purely internal reinforcement. A thought or intention that causes good feelings gets reinforced and prospers; one that causes bad feelings gets punished and dies out.
(Roko has already discussed this in Ugh Fields; so much as thinking about an unpleasant task is unpleasant; therefore most people do not think about unpleasant tasks and end up delaying them or avoiding them completely. If you haven't already read that post, it does a very good job of making reinforcement of thoughts make sense.)
A while back, D_Malik published a great big List Of Things One Could Do To Become Awesome. As David_Gerard replied, the list was itself a small feat of awesome. I expect a couple of people started on some of the more awesome-sounding entries, then gave up after a few minutes and never thought about it again. Why?
When I was younger, I used to come up with plans to become awesome in some unlikely way. Maybe I'd hear someone speaking Swahili, and I would think "I should learn Swahili," and then I would segue into daydreams of being with a group of friends, and someone would ask if any of us spoke any foreign languages, and I would say I was fluent in Swahili, and they would all react with shock and tell me I must be lying, and then a Kenyan person would wander by, and I'd have a conversation with them in Swahili, and they'd say that I was the first American they'd ever met who was really fluent in Swahili, and then all my friends would be awed and decide I was the best person ever, and...
...and the point is that the thought of learning Swahili is pleasant, in the same easy-to-visualize but useless way that an extra bedroom for Grandma is pleasant. And the intention to learn Swahili is also pleasant, because it will lead to all those pleasant things. And so, by reinforcement of mental behavior, I continue thinking about and intending to learn Swahili.
Now consider the behavior of studying Swahili. I've never done so, but I imagine it involves a lot of long nights hunched over books of Swahili grammar. Since I am not one of the lucky people who enjoys learning languages for their own sake, this will be an unpleasant task. And rewards will be few and far between: outside my fantasies, my friends don't just get together and ask what languages we know while random Kenyans are walking by.
In fact, it's even worse than this, because I don't exactly make the decision to study Swahili in aggregate, but only in the form of whether to study Swahili each time I get the chance. If I have the opportunity to study Swahili for an hour, this provides no clear reward - an hour's studying or not isn't going to make much difference to whether I can impress my friends by chatting with a Kenyan - but it will still be unpleasant to spend an hour of going over boring Swahili grammar. And time discounting makes me value my hour today much more than I value some hypothetical opportunity to impress people months down the line; Ainslie shows quite clearly I will always be better off postponing my study until later.
So the behavior of actually learning Swahili is thankless and unpleasant and very likely doesn't happen at all.
Thinking about studying Swahili is positively reinforced, actually studying Swahili is negatively reinforced. The natural and obvious result is that I intend to study Swahili, but don't.
The problem is that for some reason, some crazy people expect for the reinforcement of thoughts to correspond to the reinforcement of the object of those thoughts. Maybe it's that old idea of "preference": I have a preference for studying Swahili, so I should satisfy that preference, right? But there's nothing in my brain automatically connecting this node over here called "intend to study Swahili" to this node over here called "study Swahili"; any association between them has to be learned the hard way.
We can describe this hard way in terms of reinforcement learning: after intending to learn Swahili but not doing so, I feel stupid. This unpleasant feeling propagates back to its cause, the behavior of intending to learn Swahili, and negatively reinforces it. Later, when I start thinking it might be neat to learn Mongolian on a whim, this generalizes to behavior that has previously been negatively reinforced, so I avoid it (in anthropomorphic terms, I "expect" to fail at learning Mongolian and to feel stupid later, so I avoid doing so).
I didn't learn this the first time, and I doubt most other people do either. And it's a tough problem to call, because if you overdo the negative reinforcement, then you never try to do anything difficult ever again.
In any case, the lesson is that thoughts and intentions get reinforced separately from actions, and although you can eventually learn to connect intentions to actions, you should never take the connection for granted.
In Are Wireheads Happy? I discussed the difference between wanting something and liking something. More recently, Luke went deeper into some of the science in his post Not for the Sake of Pleasure Alone.
In the comments of the original post, cousin_it asked a good question: why implement a mind with two forms of motivation? What, exactly, are "wanting" and "liking" in mind design terms?
Tim Tyler and Furcas both gave interesting responses, but I think the problem has a clear answer in a reinforcement learning perspective (warning: formal research on the subject does not take this view and sticks to the "two different systems of different evolutionary design" theory). "Liking" is how positive reinforcement feels from the inside; "wanting" is how the motivation to do something feels from the inside. Things that are positively reinforced generally motivate you to do more of them, so liking and wanting often co-occur. With more knowledge of reinforcement, we can begin to explore why they might differ.
CONTEXT OF REINFORCEMENT
Reinforcement learning doesn't just connect single stimuli to responses. It connects stimuli in a context to responses. Munching popcorn at a movie might be pleasant; munching popcorn at a funeral will get you stern looks at best.
In fact, lots of people eat popcorn at a movie theater and almost nowhere else. Imagine them, walking into that movie theater and thinking "You know, I should have some popcorn now", maybe even having a strong desire for popcorn that overrides the diet they're on - and yet these same people could walk into, I don't know, a used car dealership and that urge would be completely gone.
These people have probably eaten popcorn at a movie theater before and liked it. Instead of generalizing to "eat popcorn", their brain learned the lesson "eat popcorn at movie theaters". Part of this no doubt has to do with the easy availability of popcorn there, but another part probably has to do with context-dependent reinforcement.
I like pizza. When I eat pizza, and get rewarded for eating pizza, it's usually after smelling the pizza first. The smell of pizza becomes a powerful stimulus for the behavior of eating pizza, and I want pizza much more after smelling it, even though how much I like pizza remains constant. I've never had pizza at breakfast, and in fact the context of breakfast is directly competing with my normal stimuli for eating pizza; therefore, no matter how much I like pizza, I have no desire to eat pizza for breakfast. If I did have pizza for breakfast, though, I'd probably like it.
If an activity is intermittently reinforced; occasional rewards spread among more common neutral stimuli or even small punishments, it may be motivating but unpleasant.
Imagine a beginning golfer. He gets bogeys or double bogeys on each hole, and is constantly kicking himself, thinking that if only he'd used one club instead of the other, he might have gotten that one. After each game, he can't believe that after all his practice, he's still this bad. But every so often, he does get a par or a birdie, and thinks he's finally got the hang of things, right until he fails to repeat it on the next hole, or the hole after that.
This is a variable response schedule, Skinner's most addictive form of delivering reinforcement. The golfer may keep playing, maybe because he constantly thinks he's on the verge of figuring out how to improve his game, but he might not like it. The same is true for gamblers, who think the next pull of the slot machine might be the jackpot (and who falsely believe they can discover a secret in the game that will change their luck; they don't like sitting around losing money, but they may stick with it so that they don't leave right before they reach the point where their luck changes.
SMALL-SCALE DISCOUNT RATES
Even if we like something, we may not want to do it because it involves pain at the second or sub-second level.
Eliezer discusses the choice between reading a mediocre book and a good book:
You may read a mediocre book for an hour, instead of a good book, because if you first spent a few minutes to search your library to obtain a better book, that would be an immediate cost - not that searching your library is all that unpleasant, but you'd have to pay an immediate activation cost to do that instead of taking the path of least resistance and grabbing the first thing in front of you. It's a hyperbolically discounted tradeoff that you make without realizing it, because the cost you're refusing to pay isn't commensurate enough with the payoff you're forgoing to be salient as an explicit tradeoff.
In this case, you like the good book, but you want to keep reading the mediocre book. If it's cheating to start our hypothetical subject off reading the mediocre book, consider the difference between a book of one-liner jokes and a really great novel. The book of one-liners you can open to a random page and start being immediately amused (reinforced). The great novel you've got to pick up, get into, develop sympathies for the characters, figure out what the heck lomillialor or a Tiste Andii is, and then a few pages in you're thinking "This is a pretty good book". The fear of those few pages could make you realize you'll like the novel, but still want to read the joke book. And since hyperbolic discounting overcounts reward or punishment in the next few seconds, it may seem like a net punishment to make the change.
This deals yet another blow to the concept of me having "preferences". How much do I want popcorn? That depends very much on whether I'm at a movie theater or a used car dealership. If I browse Reddit for half an hour because it would be too much work to spend ten seconds traveling to the living room to pick up the book I'm really enjoying, do I "prefer" browsing to reading? Which has higher utility? If I hate every second I'm at the slot machines, but I keep at them anyway so I don't miss the jackpot, am I a gambling addict, or just a person who enjoys winning jackpots and is willing to do what it takes?
In cases like these, the language of preference and utility is not very useful. My anticipation of reward is constraining my behavior, and different factors are promoting different behaviors in an unstable way, but trying to extract "preferences" from the situation is trying to oversimplify a complex situation.
If you're tired of studies where you inevitably get deceived, electric shocked, or tricked into developing a sexual attraction to penny jars, you might want to sign up for Brian Wansink's next experiment. He provided secretaries with a month of unlimited free candy at their workplace. The only catch was that half of them got the candy in a bowl on their desk, and half got it in a bowl six feet away. The deskers ate five candies/day more than the six-footers, which the scientists calculated would correspond to a weight gain of over 10 pounds more per year1.
Beware trivial inconveniences (or, in this case, if you don't want to gain weight, beware the lack of them!) Small modifications to the difficulty of obtaining a reward can make big differences in whether the corresponding behavior gets executed.
The best studied example of this is time discounting. When offered two choices, where A will lead to a small reward now and B will lead to a big reward later, people will sometimes choose smaller-sooner rather than larger-later depending on the length of the delay and the size of the difference. For example, in one study, people preferred $250 today to $300 in a year; it took a promise of at least $350 to convince them to wait.
Time discounting was later found to be "hyperbolic", meaning that the discount amount between two fixed points decreases the further you move those two points into the future. For example, you might prefer $80 today to $100 one week from now, but it's unlikely you would prefer $80 in one hundred weeks to $100 in one hundred one weeks. Yet this is offering essentially the same choice: wait an extra week for an extra $20. So it's not enough to say that the discount rate is a constant 20% per week - the discount rate changes depending on what interval of time we're talking about. If you graph experimentally obtained human discount rates on a curve, they form a hyperbola.
Hyperbolic discounting creates the unpleasant experience of "preference reversals", in which people can suddenly change their mind on a preference as they move along the hyperbola. For example, if I ask you today whether you would prefer $250 in 2019 or $300 in 2020 (a choice between small reward in 8 years or large reward in 9), you might say the $300 in 2020; if I ask you in 2019 (when it's a choice between small reward now and large reward in 1 year), you might say no, give me the $250 now. In summary, people prefer larger-later rewards most of the time EXCEPT for a brief period right before they can get the smaller-sooner reward.
George Ainslie ties this to akrasia and addiction: call the enjoyment of a cigarette in five minutes the smaller-sooner reward, and the enjoyment of not having cancer in thirty years the larger-later reward. You'll prefer to abstain right up until the point where there's a cigarette in front of you and you think "I should smoke this", at which point you will do so.
Discounting can happen on any scale from seconds to decades, and it has previously been mentioned that the second or sub-second level may have disproportionate effects on our actions. Eliezer concentrated on the difficult of changing tasks, but I would add that any task which allows continuous delivery of small amounts of reinforcement with near zero delay can become incredibly addictive even if it isn't all that fun (this is why I usually read all the way through online joke lists, or stay on Reddit for hours). This is also why the XKCD solution to internet addiction - an extension that makes you wait 30 seconds before loading addictive sites - is so useful.
Effort discounting is time discounting's lesser-known cousin. It's not obvious that it's an independent entity; it's hard to disentangle from time discounting (most efforts usually take time) and from garden-variety balancing benefits against costs (most efforts are also slightly costly). There have really been only one or two good studies on it and they don't do much more than say it probably exists and has its own signal in the nucleus accumbens.
Nevertheless, I expect that effort discounting, like time discounting, will be found to be hyperbolic. Many of these trivial inconveniences involve not just time but effort: the secretaries had to actually stand up and walk six feet to get the candy. If a tiny amount of effort held the same power as a tiny amount of time, it would go even further toward explaining garden-variety procrastination.
TIME/EFFORT DISCOUNTING AND UTILITY
Hyperbolic discounting stretches our intuitive notion of "preference" to the breaking point.
Traditionally, discount rates are viewed as just another preference: not only do I prefer to have money, but I prefer to have it now. But hyperbolic discounting shows that we have no single discount rate: instead, we have different preferences for discount rates at different future times.
It gets worse. Time discount rates seem to be different for losses and gains, and different for large amounts vs. small amounts (I gave the example of $250 now being worth $350 in a year, but the same study found that $3000 now is only worth $4000 in a year, and $15 now is worth a whopping $60 in a year). You can even get people to exhibit negative discount rates in certain situations: offer people $10 now, $20 in a month, $30 in two months, and $40 in three months, and they'll prefer it to $40 now, $30 in a month, and so on - maybe because it's nice to think things are only going to get better?
Are there utility functions that can account for this sort of behavior? Of course: you can do a lot of things just by adding enough terms to an equation. But what is the "preference" that the math is describing? When I say I like having money, that seems clear enough: preferring $20 to $15 is not a separate preference than preferring $406 to $405.
But when we discuss time discounting, most of the preferences cited are specific: that I would prefer $100 now to $150 later. Generalizing these preferences, when it's possible at all, takes several complicated equations. Do I really want to discount gains more than losses, if I've never consciously thought about it and I don't consciously endorse it? Sure, there might be such things as unconscious preferences, but saying that the unconscious just loves following these strange equations, in the same way that it loves food or sex or status, seems about as contrived as saying that our robot just really likes switching from blue-minimization to yellow-minimization every time we put a lens on its sensor.
It makes more sense to consider time and effort discounting as describing reward functions and not utility functions. The brain estimates the value of reward in neural currency using these equations (or a neural network that these equations approximate) and then people execute whatever behavior has been assigned the highest reward.
1: Also cited in the same Nutrition Action article: if the candy was in a clear bowl, participants ate on average two/day more than if the candy was in an opaque bowl.
View more: Next