Morality is not about willpower
Most people believe the way to lose weight is through willpower. My experience of successfully losing weight is that this is not the case. You will lose weight if you want to, meaning you effectively believe[0] that the utility you will gain from losing weight, even time-discounted, will outweigh the utility from yummy food now. In LW terms, you will lose weight if your utility function tells you to. This is the basis of cognitive behavioral therapy (the effective kind of therapy), which tries to change people's behavior by examining their beliefs and changing their thinking habits.
Similarly, most people believe behaving ethically is a matter of willpower; and I believe this even less. Your ethics is part of your utility function. Acting morally is, technically, a choice; but not the difficult kind that holds up a stop sign and says "Choose wisely!" We notice difficult moral choices more than easy moral choices; but most moral choices are easy, like choosing a ten dollar bill over a five. Immorality is not a continual temptation we must resist; it's just a kind of stupidity.
This post can be summarized as:
- Each normal human has an instinctive personal morality.
- This morality consists of inputs into that human's decision-making system. There is no need to propose separate moral and selfish decision-making systems.
- Acknowledging that all decisions are made by a single decision-making system, and that the moral elements enter it in the same manner as other preferences, results in many changes to how we encourage social behavior.
Basics of Human Reinforcement
Today: some more concepts from reinforcement learning and some discussion on their applicability to human behavior.
For example: most humans do things even when they seem unlikely to result in delicious sugar water. Is this a violation of behaviorist principles?
No. For one thing, yesterday's post included a description of secondary reinforcers, those reinforcers which are not hard-coded evolutionary goods like food and sex, but which nevertheless have a conditioned association with good things. Money is the classic case of a secondary reinforcer among humans. Little colored rectangles are not naturally reinforcing, but from a very young age most humans learn that they can be used to buy pleasant things, like candy or toys or friends. Behaviorist-inspired experiments on humans often use money as a reward, and have yet to run into many experimental subjects whom it fails to motivate[1].
Speaking of friends, status may be a primary reinforcer specific to social animals. I don't know if being able to literally feel reinforcement going on is a real thing, but I maintain I can feel the rush of reward when someone gives me a compliment. If that's too unscientific for you, consider studies in which monkeys will "exchange" sugary juice for the opportunity to look at pictures of high status monkeys, but demand extra juice in exchange for looking at pictures of low status monkeys.
Although certain cynics might consider money and status an exhaustive list, we may also add moral, aesthetic, and value-based considerations. Evolutionary psychology explains why these might exist and Bandura called some of them "internal reinforcement".
But more complicated reinforcers alone are not sufficient to bridge the gap between lever-pushing pigeons and human behavior. Humans have an ability to select for or against behaviors without trying them. For example: most of us would avoid going up to Mr. T and giving him the finger. But most of us have not personally tried this behavior and observed the consequences.
Is this the result of pure reason? No; the rational part of our mind is the part telling us that Mr. T is probably sixty years old by now and far too deep in the media spotlight to want to risk a scandal and jail time by beating up a random stranger. So where exactly is the reluctance coming from?
Basics of Animal Reinforcement
Behaviorism historically began with Pavlov's studies into classical conditioning. When dogs see food they naturally salivate. When Pavlov rang a bell before giving the dogs food, the dogs learned to associate the bell with the food and salivate even after they merely heard the bell. When Pavlov rang the bell a few times without providing food, the dogs stopped salivating, but when he added the food again it only took a single trial before the dogs "remembered" their previously conditioned salivation response[1].
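The acquisition and extinction just described can be sketched with the Rescorla-Wagner learning rule, a standard textbook model that postdates Pavlov and is swapped in here purely for illustration. The learning rate of 0.3 and the ten-trial phases are arbitrary assumptions, and this simple rule does not reproduce the single-trial recovery mentioned above.

```python
def rescorla_wagner(trials, learning_rate=0.3, start=0.0):
    """Return the bell's associative strength after each trial.

    trials: iterable of booleans, True if food followed the bell on that trial.
    learning_rate: assumed value, chosen only to make the curve easy to see.
    """
    strength = start
    history = []
    for food_followed in trials:
        # The target is 1.0 when the bell predicts food and 0.0 when it does not.
        target = 1.0 if food_followed else 0.0
        # Move the strength a fixed fraction of the way toward the target.
        strength += learning_rate * (target - strength)
        history.append(strength)
    return history


acquisition = [True] * 10   # bell followed by food
extinction = [False] * 10   # bell alone
history = rescorla_wagner(acquisition + extinction)
```

Here `history` climbs toward 1 during the paired trials and decays back toward 0 once the food stops, which is the shape of the salivation story but not a claim about how real dogs compute it.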
So much for classical conditioning. The real excitement starts at operant conditioning. Classical conditioning can only activate reflexive actions like salivation or sexual arousal; operant conditioning can produce entirely new behaviors and is most associated with the idea of "reinforcement learning".
Serious research into operant conditioning began with B.F. Skinner's work on pigeons. Stick a pigeon in a box with a lever and some associated machinery (a "Skinner box"[2]). The pigeon wanders around, does various things, and eventually hits the lever. Delicious sugar water squirts out. The pigeon continues wandering about and eventually hits the lever again. Another squirt of delicious sugar water. Eventually it percolates into its tiny pigeon brain that maybe pushing this lever makes sugar water squirt out. It starts pushing the lever more and more, each push continuing to convince it that yes, this is a good idea.
Consider a second, less lucky pigeon. It, too, wanders about in a box and eventually finds a lever. It pushes the lever and gets an electric shock. Eh, maybe it was a fluke. It pushes the lever again and gets another electric shock. It starts thinking "Maybe I should stop pressing that lever." The pigeon continues wandering about the box doing anything and everything other than pushing the shock lever.
The basic concept of operant conditioning is that an animal will repeat behaviors that give it reward, but avoid behaviors that give it punishment[3].
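The two pigeons can be caricatured in a few lines of code. This is a toy sketch, not a model from the behaviorist literature: the logistic squash, the learning rate, the +1/-1 outcome values, and the function names are all assumptions made for illustration.

```python
import math
import random


def press_probability(propensity):
    """Map an unbounded propensity to a probability of pressing (logistic squash)."""
    return 1.0 / (1.0 + math.exp(-propensity))


def train_pigeon(outcome_of_press, steps=500, learning_rate=0.5, seed=0):
    """Simulate a pigeon that sometimes presses a lever and learns from the result.

    outcome_of_press: +1.0 for sugar water, -1.0 for an electric shock (assumed scale).
    Returns the final probability of pressing on any given step.
    """
    rng = random.Random(seed)
    propensity = 0.0  # starts indifferent: presses half the time
    for _ in range(steps):
        if rng.random() < press_probability(propensity):
            # The pigeon pressed; the consequence nudges the propensity up or down.
            propensity += learning_rate * outcome_of_press
    return press_probability(propensity)


lucky = train_pigeon(outcome_of_press=+1.0)    # the sugar-water pigeon
unlucky = train_pigeon(outcome_of_press=-1.0)  # the electric-shock pigeon
```

The lucky pigeon ends up pressing almost constantly, while the unlucky one ends up pressing rarely. Note that the unlucky pigeon also gets fewer chances to update, since it only learns when it actually presses.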
Skinner distinguished between primary reinforcers and secondary reinforcers. A primary reinforcer is hard-coded: for example, food and sex are hard-coded rewards, pain and loud noises are hard-coded punishments. A primary reinforcer can be linked to a secondary reinforcer by classical conditioning. For example, if a clicker is clicked just before giving a dog a treat, the clicker itself will eventually become a way to reward the dog (as long as you don't use the unpaired clicker long enough for the conditioning to suffer extinction!)
Probably Skinner's most famous work on operant conditioning was his study of reinforcement schedules: that is, if pushing the lever only gives you reward some of the time, how obsessed will you become with pushing the lever?
Consider two basic types of reward schedule: interval, in which pushing the lever gives a reward only once every t seconds, and ratio, in which pushing the lever gives a reward only once every x pushes.
Put a pigeon in a box with a lever programmed to only give rewards once an hour, and the pigeon will wise up pretty quickly. It may not have a perfect biological clock, but after somewhere around an hour, it will start pressing until it gets the reward and then give up for another hour or so. If it doesn't get its reward after an hour, the behavior will go extinct pretty quickly; it realizes the deal is off.
Put a pigeon in a box with a lever programmed to give one reward every one hundred presses, and again it will wise up. It will start pressing more on the lever when the reward is close (pigeons are better counters than you'd think!) and ease off after it obtains the reward. Again, if it doesn't get its reward after about a hundred presses, the behavior will become extinct pretty quickly.
To these two basic schedules of fixed reinforcement, Skinner added variable reinforcement: essentially the same but with a random factor built in. Instead of giving a reward once an hour, the pigeon may get a reward in a randomly chosen time between 30 and 90 minutes. Or instead of giving a reward every hundred presses, it might take somewhere between 50 and 150.
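The four schedules can be written down as small lever controllers. This is a minimal sketch under stated assumptions: the class names, the `press(now)` interface, and the uniform random draws are mine, chosen to match the 30-to-90-minute and 50-to-150-press examples above rather than Skinner's actual apparatus.

```python
import random


class FixedInterval:
    """Reward the first press made at least `period` seconds after the last reward."""

    def __init__(self, period):
        self.period = period
        self.next_available = period

    def press(self, now):
        if now >= self.next_available:
            self.next_available = now + self.period
            return True
        return False


class FixedRatio:
    """Reward every `n`-th press."""

    def __init__(self, n):
        self.n = n
        self.count = 0

    def press(self, now):
        self.count += 1
        if self.count >= self.n:
            self.count = 0
            return True
        return False


class VariableInterval:
    """Like FixedInterval, but each wait is drawn uniformly from [low, high] seconds."""

    def __init__(self, low, high, rng):
        self.low, self.high, self.rng = low, high, rng
        self.next_available = rng.uniform(low, high)

    def press(self, now):
        if now >= self.next_available:
            self.next_available = now + self.rng.uniform(self.low, self.high)
            return True
        return False


class VariableRatio:
    """Like FixedRatio, but each required press count is drawn from low..high."""

    def __init__(self, low, high, rng):
        self.low, self.high, self.rng = low, high, rng
        self.required = rng.randint(low, high)
        self.count = 0

    def press(self, now):
        self.count += 1
        if self.count >= self.required:
            self.count = 0
            self.required = self.rng.randint(self.low, self.high)
            return True
        return False


def count_rewards(schedule, press_times):
    """Press the lever at each time in `press_times`; return how many presses paid off."""
    return sum(schedule.press(t) for t in press_times)


# A hypothetical pigeon that presses once a minute for ten hours (times in seconds).
press_times = [60 * k for k in range(1, 601)]
rng = random.Random(0)
hourly = count_rewards(FixedInterval(3600), press_times)        # 10 rewards
every_hundred = count_rewards(FixedRatio(100), press_times)     # 6 rewards
vi = count_rewards(VariableInterval(1800, 5400, rng), press_times)
vr = count_rewards(VariableRatio(50, 150, rng), press_times)
```

The design point worth noticing is what the pigeon controls. On the interval schedules, extra presses inside the waiting window earn nothing. On the ratio schedules, every press brings the next reward closer.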
Put a pigeon in a box on a variable interval schedule, and you'll get constant lever presses and good resistance to extinction.
Put a pigeon in a box with a variable ratio schedule and you get a situation one of my professors unscientifically but accurately described as "pure evil". The pigeon will become obsessed with pecking as much as possible, and really you can stop giving rewards at all after a while and the pigeon will never wise up.
Skinner was not the first person to place an animal in front of a lever that delivered reinforcement based on a variable ratio schedule. That honor goes to Charles Fey, inventor of the slot machine.
So it looks like some of this stuff has relevance for humans as well[4]. Tomorrow: more freshman psychology lecture material. Hooray!
FOOTNOTES
1. Of course, it's not really psychology unless you can think of an unethical yet hilarious application, so I refer you to Plaud and Martini's study in which slides of erotic stimuli (naked women) were paired with slides of non-erotic stimuli (penny jars) to give male experimental subjects a penny jar fetish; this supports a theory that uses chance pairing of sexual and non-sexual stimuli to explain normal fetish formation.
2. The bizarre rumor that B.F. Skinner raised his daughter in a Skinner box is completely false. The rumor that he marketed a child-rearing device called an "Heir Conditioner" is, remarkably, true.
3. In technical literature, behaviorists actually use four terms: positive reinforcement, positive punishment, negative reinforcement, and negative punishment. This is really confusing: "negative reinforcement" is actually a type of reward, behavior like going near wasps is "punished" even though we usually use "punishment" to mean deliberate human action, and all four terms can be summed up under the category "reinforcement" even though reinforcement is also sometimes used to mean "reward as opposed to punishment". I'm going to try to simplify things here by using "positive reinforcement" as a synonym for "reward" and "negative reinforcement" as a synonym for "punishment", same way the rest of the non-academic world does it.
4. Also relevant: checking HP:MoR for updates is variable interval reinforcement. You never know when an update's coming, but it doesn't come faster the more times you reload fanfiction.net. As predicted, even when Eliezer goes weeks without updating, the behavior continues to persist.
Behaviorism: Beware Anthropomorphizing Humans
Related to: The Comedy of Behaviorism
Behaviorism's gotten a bad rap.
It's gone down in history as the school founded upon the idea that there's no such thing as mental phenomena or cognitive processing, and if there are we can't ever know anything about them, and if we can I don't want to know about it, and if you tell me I will put my fingers in my ears and whistle, and SHUT UP SHUT UP I CAN'T HEAR YOU.
Actually it was more subtle.
The movement did begin with a variation on that principle for historical reasons. John Watson began his work thirty years before the first computer. Information processing still looked like magic; most scientists didn't realize that reductionist accounts of information processing were even possible. Neurons were still "the thing that Spanish guy keeps talking about". Today we discuss the brain by analogy to computers; in Watson's day, they discussed the brain by analogy to their own most advanced technology, mechanical devices. Today we talk about looking for mental programs and subroutines; they sought its gears and levers instead. And just as today many philosophers dismiss consciousness as an epiphenomenon of information processing because computers don't seem to be conscious, so Watson dismissed all mental states as an epiphenomenon of mechanical processing because mechanical devices didn't have mental states.
As science advanced, and as it picked up glimpses of cognition from the Stroop effect and early priming experiments, behaviorism became more sophisticated. Maybe its pinnacle of subtlety came with B.F. Skinner's "radical behaviorism" movement, which accepted inner mental life (which Skinner called "mental behavior") and sought to explain it.
If Skinner was willing to acknowledge inner life, why do we still call his theory behaviorist? It's hard and not especially profitable to define "behaviorism", but if I had to try I'd say it is a methodology that doesn't consider mental phenomena useful as a fundamental level of explanation. So if we want to know why Wanda runs away from a wasp, saying "because her previous encounters with wasps have been negatively reinforced" is more useful than "because she felt scared".
And if Wanda herself says "No, I ran away because I felt scared," we shouldn't be especially interested in her opinion: she has privileged access to a certain type of output of the process generating her behavior, but not to the process itself.
Imagine the better behaviorists, if you like, as playing a worldwide half-century long game of Rationalist Taboo, in which you're no longer allowed to use words like "want", "feel", "hope", or "decide". It's overwhelmingly tempting to fake-explain psychology using non-technical non-explanations like "Oh, she just acts that way because she has an overly emotional personality" and so the whole school just promised themselves to root out that way of thinking.
Although the witticism that behaviorism scrupulously avoids anthropomorphizing humans was intended as a jab at the theory, I think it touches on something pretty important. Just as normal anthropomorphism ("it only snows in winter because the snow prefers cold weather") acts as a curiosity-stopper and discourages technical explanation of the behavior, so using mental language to explain the human mind equally halts the discussion without further investigation.
This idea of Rationalist Taboo also explains B.F. Skinner's "mental behavior" loophole. When he discusses thoughts as mental behavior, he's not using them as explanations for other things - not taking the easy way out and saying "The reason I stayed in tonight is because, after thinking about it, I decided I didn't want to go to dinner". He's taking an extra burden upon himself, trying to come up with explanations for thoughts as well as actions.
Behaviorism became less popular in the 1950s after clever experimental protocols allowed more direct measurement of what happens inside the mind, making its taboo on mental occurrences unnecessary and restrictive. Although the philosophical commitments involved became obsolete, the scientific findings remain as valuable as ever. They have entered into the new paradigm as "reinforcement learning", a process widely believed to underlie many diverse mental subsystems all the way from motor coordination to social behavior.
Although reinforcement learning is almost universally known, Skinner's philosophical context for the process is not. He believed that the Darwinian evolution of organisms was just one instance of a wider principle called "selection by consequences", the most successful optimization process in the history of the universe. Evolution can successfully design permanent features of an organism like its skin, claws, and eyes. But it is too slow to fully optimize an organism's behavior, and too large-grained to produce complex behavior on its own. It is especially too slow and large-grained to produce human-level behavior: citing my sources in MLA format is an important skill, and I don't want to have to wait until ten generations of my ancestors have perished for citing their sources incorrectly before I can do it right.
So evolution conjured up a mini-evolution to serve it. Reinforcement learning is evolution writ small; behaviors propagate or die out based on their consequences to reinforcement in a mind, just as mutations propagate or die out based on their consequences to reproduction in an organism. In the behaviorist model, our mind is not an agent, but a flourishing ecosystem of behaviors both physical and mental, all scrabbling for supremacy and mutating into more effective versions of themselves.
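The "ecosystem of behaviors" picture can be sketched as weighted competition, with reinforcement playing the role that reproduction plays in evolution. Everything specific here is an assumption for illustration: the three behaviors, the 10% growth and shrink factors, and the function name. The sketch models only selection, not the mutation of behaviors into new variants.

```python
import random


def run_selection(consequences, steps=300, seed=0):
    """Let behaviors compete; return each behavior's final share of the repertoire.

    consequences: dict mapping behavior name -> True if that behavior is reinforced.
    """
    rng = random.Random(seed)
    weights = {behavior: 1.0 for behavior in consequences}
    names = list(weights)
    for _ in range(steps):
        # Behaviors are emitted in proportion to their current weight.
        chosen = rng.choices(names, weights=[weights[n] for n in names])[0]
        # Reinforced behaviors propagate; unreinforced ones dwindle.
        weights[chosen] *= 1.1 if consequences[chosen] else 0.9
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


shares = run_selection(
    {"peck lever": True, "flap wings": False, "turn in circles": False}
)
```

Nothing in the loop represents a goal of getting sugar water. The reinforced behavior simply crowds out the others, which is the sense in which the mind here is a behavior-executor rather than an agent.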
Just as evolving organisms are adaptation-executors and not fitness-maximizers, so minds are behavior-executors and not utility-maximizers. This returns us to the case of the blue-minimizing robot, which executed its program without any representation of a "goal". Behaviorism holds out the prospect of an explanation of human behavior based on similar lines.
Despite its subsumption by the cognitive paradigm, behaviorism continues to hold a special place because of its association with reinforcement learning, as well as its uses in industrial psychology, applied psychology, and various successful therapies including the famous CBT. It's also one of the major inspirations for connectionism, a more modern and exciting eliminativist model which we'll return to later.
This sequence will continue by exploring some of the basics of reinforcement learning in the behaviorist paradigm, and then get into more controversial applications of the theory to explain previously mysterious human behaviors.