Part of the sequence: The Science of Winning at Life
Also see: Basics of Animal Reinforcement, Basics of Human Reinforcement, Physical and Mental Behavior, Wanting vs. Liking Revisited, Approving reinforces low-effort behaviors, Applying Behavioral Psychology on Myself.
Story 1: On Skype with Eliezer, I said: "Eliezer, you've been unusually pleasant these past three weeks. I'm really happy to see that, and moreover, it increases my probability that an Eliezer-led FAI research team will work. What caused this change, do you think?"
Eliezer replied: "Well, three weeks ago I was working with Anna and Alicorn, and every time I said something nice they fed me an M&M."
Story 2: I once witnessed a worker who hated keeping a work log because it was only used "against" him. His supervisor would call to say "Why did you spend so much time on that?" or "Why isn't this done yet?" but never "I saw you handled X, great job!" Not surprisingly, he often "forgot" to fill out his work log.
Ever since I got everyone at the Singularity Institute to keep work logs, I've tried to avoid connections between "concerned" feedback and staff work logs, and instead take time to comment positively on things I see in those work logs.
Story 3: Chatting with Eliezer, I said, "Eliezer, I get the sense that I've inadvertently caused you to be slightly averse to talking to me. Maybe because we disagree on so many things, or something?"
Eliezer's reply was: "No, it's much simpler. Our conversations usually run longer than our previously set deadline, so whenever I finish talking with you I feel drained and slightly cranky."
Now I finish our conversations on time.
Story 4: A major Singularity Institute donor recently said to me: "By the way, I decided that every time I donate to the Singularity Institute, I'll set aside an additional 5% for myself to do fun things with, as a motivation to donate."
The power of reinforcement
It's amazing to me how consistently we fail to take advantage of the power of reinforcement.
Maybe it's because behaviorist techniques like reinforcement feel like they don't respect human agency enough. But if you aren't treating humans more like animals than most people are, then you're modeling humans poorly.
You are not an agenty homunculus "corrupted" by heuristics and biases. You just are heuristics and biases. And you respond to reinforcement, because most of your motivation systems still work like the motivation systems of other animals.
A quick reminder of what you learned in high school
- A reinforcer is anything that, when it occurs in conjunction with an act, increases the probability that the act will occur again.
- A positive reinforcer is something the subject wants, such as food, petting, or praise. Positive reinforcement occurs when a target behavior is followed by something the subject wants, and this increases the probability that the behavior will occur again.
- A negative reinforcer is something the subject wants to avoid, such as a blow, a frown, or an unpleasant sound. Negative reinforcement occurs when a target behavior is followed by some relief from something the subject doesn't want, and this increases the probability that the behavior will happen again.
- Small reinforcers are fine, as long as there is a strong correlation between the behavior and the reinforcer (Schneider 1973; Todorov et al. 1984). All else equal, a large reinforcer is more effective than a small one (Christopher 1988; Ludvig et al. 2007; Wolfe 1936), but the more you increase the reinforcer magnitude, the less benefit you get from the increase (Frisch & Dickinson 1990).
- The reinforcer should immediately follow the target behavior (Escobar & Bruner 2007; Schlinger & Blakely 1994; Schneider 1990). Pryor (2007) notes that when the reward is food, small bits (like M&Ms) are best because they can be consumed instantly instead of being consumed over an extended period of time.
- Any feature of a behavior can be strengthened (e.g., its intensity, frequency, rate, duration, persistence, its shape or form), so long as a reinforcer can be made contingent on that particular feature (Neuringer 2002).
- If you want someone to call you, then when they do call, don't nag them about how they never call you. Instead, be engaging and positive.
- When trying to maintain order in a class, ignore unruly behavior and praise good behavior (Madsen et al. 1968; McNamara 1987).
- Reward originality to encourage creativity (Pryor et al. 1969; Chambers et al. 1977; Eisenberger & Armeli 1997; Eisenberger & Rhoades 2001).
- If you want students to understand the material, don't get excited when they guess the teacher's password but instead when they demonstrate a technical understanding.
- To help someone improve at dance or sport, ignore poor performance but reward good performance immediately, for example by shouting "Good!" (Buzas & Ayllon 1981). The reason to ignore poor performance is that if you say "No, you're doing it wrong!" you inadvertently punish the effort. A better response to a mistake would be to reinforce the effort: "Good effort! You're almost there! Try once more."
- Reward honesty to help people be more honest with you (Lanza et al. 1982).
- Reward opinion-expressing to get people to express their opinions more often (Verplanck 1955).
- You may even be able to reinforce-away annoying involuntary behaviors, such as twitches (Laurenti-Lions et al. 1985) or vomiting (Wolf et al. 1965).
- Want a young infant to learn to speak more quickly? Reinforce their attempts at vocalization (Ramey & Finkelstein 1978).
- More training should occur via video games like DragonBox, because computer programs can easily provide instant reinforcement many times a minute for very specific behaviors (Fletcher-Flinn & Gravatt 1995).
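For readers who think better in code, here is a toy sketch of three rules of thumb from the reminder above: small reinforcers work if they are strongly tied to the behavior, bigger reinforcers help with diminishing returns, and delay weakens the effect. Everything in it is my own assumption for illustration: the function name `reinforce`, the logarithmic and exponential forms, and the default values of `rate` and `delay_scale_s` are hypothetical and are not taken from the cited studies.

```python
import math


def reinforce(p, magnitude, delay_s, contingency=1.0,
              rate=0.2, delay_scale_s=5.0):
    """Return the new probability of a behavior after one reinforced occurrence.

    Toy model only: the log and exponential forms and every default
    value are assumptions chosen for illustration, not fitted to data.
    """
    # Diminishing returns: log1p keeps growing, but each extra unit adds less.
    magnitude_effect = math.log1p(magnitude)
    # Immediacy: a delayed reinforcer counts for exponentially less.
    immediacy = math.exp(-delay_s / delay_scale_s)
    # Contingency in [0, 1]: how reliably the reinforcer tracks the behavior.
    step = min(1.0, rate * contingency * magnitude_effect * immediacy)
    # Move the probability a fraction `step` of the way toward 1.
    return p + (1.0 - p) * step


if __name__ == "__main__":
    p0 = 0.2  # hypothetical baseline probability of the target behavior
    print("one M&M, immediately:  ", reinforce(p0, magnitude=1, delay_s=0))
    print("one M&M, 10 s later:   ", reinforce(p0, magnitude=1, delay_s=10))
    print("three M&Ms, immediately:", reinforce(p0, magnitude=3, delay_s=0))
```

In this sketch a single immediate M&M moves the probability more than the same M&M delivered ten seconds late, and tripling the reward gives well under triple the gain. That is only the qualitative pattern the bullets describe; the actual numbers mean nothing.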
I close with Story 5, from Amy Sutherland:
For a book I was writing about a school for exotic animal trainers, I started commuting from Maine to California, where I spent my days watching students do the seemingly impossible: teaching hyenas to pirouette on command, cougars to offer their paws for a nail clipping, and baboons to skateboard.
I listened, rapt, as professional trainers explained how they taught dolphins to flip and elephants to paint. Eventually it hit me that the same techniques might work on that stubborn but lovable species, the American husband.
The central lesson I learned from exotic animal trainers is that I should reward behavior I like and ignore behavior I don't. After all, you don't get a sea lion to balance a ball on the end of its nose by nagging. The same goes for the American husband.
Back in Maine, I began thanking Scott if he threw one dirty shirt into the hamper. If he threw in two, I'd kiss him. Meanwhile, I would step over any soiled clothes on the floor without one sharp word, though I did sometimes kick them under the bed. But as he basked in my appreciation, the piles became smaller.
I was using what trainers call "approximations," rewarding the small steps toward learning a whole new behavior...
Once I started thinking this way, I couldn't stop. At the school in California, I'd be scribbling notes on how to walk an emu or have a wolf accept you as a pack member, but I'd be thinking, "I can't wait to try this on Scott."
...After two years of exotic animal training, my marriage is far smoother, my husband much easier to love.
Next post: Rational Romantic Relationships Part 1
Previous post: The Good News of Situationist Psychology
My thanks to Erica Edelman for doing much of the research for this post.