Notes from "Don't Shoot the Dog"

juliawise

Cross-posted from The Whole Sky.

I just finished Karen Pryor’s “Don’t Shoot the Dog: the New Art of Teaching and Training.” Partly because a friend pointed out that it’s not on Audible and therefore she can’t possibly read it, here are the notes I took and some thoughts. It’s a quick, easy read.

The book applies behavioral psychology to training animals and people. The author started off as a dolphin trainer at an aquarium park in the 1960s and moved on to horses, dogs, and her own children. There are a lot of anecdotes about how to train animals (apparently polar bears like raisins). At the time, training animals without violence was considered novel and maybe impossible. I read it as a parenting book since I don’t plan to train dogs, horses, or polar bears.

It’s probably not the best guide to training dogs since a lot of it is about people, and not the best guide to training people since a lot is about animals. She’s written a bunch of other books about training dogs and cats. But this book is an entertaining overview of all of it.

The specter of behaviorism

I can understand not wanting to use behavioral methods on children; the idea can sound overly harsh or reductive. The thing is, we already reinforce behavior all the time, including bad behavior, often without meaning to. So you might as well notice what you’re doing.

To people schooled in the humanistic tradition, the manipulation of human behavior by some sort of conscious technique seems incorrigibly wicked, in spite of the obvious fact that we all go around trying to manipulate one another’s behavior all the time, by whatever means come to hand.
[...]

There are still people who shudder at the very name of Skinner, which conjures in their minds some amalgam of Brave New World, mind control, and electric shock.

(B. F. Skinner in fact believed that punishment was not an effective learning tool, and that positive reinforcement was much better for teaching.)

Pryor argues that behavioral training allows you to get good results more pleasantly than with other methods. She describes her daughter’s experience directing a play in high school:

At the closing performance the drama coach told me that she’d been amazed to see that throughout rehearsals Gale never yelled at her cast. Student directors always yell, but Gale never yelled. ‘Of course not,’ I said without thinking, ‘she’s an animal trainer.’ From the look on the teacher’s face, I realized I’d said the wrong thing—her students were not animals! But of course all I meant was that Gale would know how to establish stimulus control without unnecessary escalation.

Of course there are bad applications of behavioral training: “The psychological literature abounds with shaping programs that are so unimaginative, not to say ham-handed, that they constitute in my opinion cruel and unusual punishment.”

I don’t know a lot about ABA (applied behavior analysis), which is one application of behaviorism. My understanding is that its bad applications are certainly cruel and ham-handed, although there also seem to be good applications. I think that even people opposed to ABA should be able to find a lot of useful material in this book.

You’re already doing reinforcement training

One point I think is underappreciated is that we all reinforce each other, and children train parents as well as the other way around.

A child is tantruming in the store for candy. The parent gives in and lets the child have a candy bar. The tantruming is positively reinforced by the candy, but the more powerful event is that the parent is negatively reinforced for giving in, since the public tantrum, so aversive and embarrassing for the parent, actually stopped.

It’s also easy to accidentally reinforce bad behavior.

I recently read Beverly Cleary’s Beezus and Ramona with the kids, in which a preschooler scribbles in a library book she wants to keep. Her older sister pays for the damaged book, and the librarian gives them the discarded copy to keep.

That’s not fair, thought Beezus. Ramona shouldn’t get her own way when she had been naughty.
‘But, Miss Evans,’ protested Beezus, ‘if she spoils a book she shouldn’t get to keep it. Now every time she finds a book she likes she will…’ Beezus did not go on. She knew very well what Ramona would do, but she wasn’t going to say it out loud in front of her.

Jeff and I try not to let bad behavior lead to a reward. For example, our four-year-old was eager to go home from the park, and left without us towards the house. I caught up with her and told her not to leave without us. We were halfway to the house, but if I’d continued home with her from there, she would still have achieved what she wanted: getting home sooner. So I took her back to the park and we redid the whole situation: she said “I want to go home” and I walked home with her. Running off on her own didn’t pay, and she hasn’t repeated it.

Responding to good behavior, not bad

Instead of punishing bad behavior, the emphasis is on noticing and reinforcing good behavior.

“Shutting up about what you don’t like, in order to wait for and reinforce behavior you do like, is counterintuitive and takes some practice.”

My mother, who taught preschool for decades, sums it up as “You have to catch ‘em being good.”

Some animals can’t be trained by force, or at least can’t be forced to do anything very complicated. Training without force was necessary with dolphins because they’ll simply swim away if you try to make them do anything they don’t like. You can only train them by offering something they like (fish).

“As a dolphin researcher whom I worked with sourly put it, ‘Nobody should be allowed to have a baby until they have first been required to train a chicken,’ meaning that the experience of getting results with a chicken, an organism that cannot be trained by force, should make it clear that you don’t need to use punishers to get results with a baby.”

At its best, reinforcement learning is enjoyable for the learner:

Clicker trainers have learned to recognize play behavior in animals as a sign that the learner has become consciously aware of what behavior was being reinforced. When ‘the light bulb goes on,’ as clicker trainers put it, dogs gambol and bark, horses prance and toss their heads, and elephants, I am told, run around in circles chirping. They are happy. They are excited.

Clickers and other sounds

Pryor became known for “clicker training” because she began using a sound to immediately convey “yes, that’s good.” The particular sound isn’t important as long as the learner can hear and recognize it. With aquatic animals you use whistles because they can be heard underwater; with dogs she uses mechanical clicker noisemakers; with a person I’d probably use a specific phrase, but some people also use clickers.

The sound initially has no meaning, but by giving it at the same time as a reward (food, smiles, pats) you create an association between the sound and the reward. Later the sound itself is rewarding.

It often happens, especially when training with food reinforcers, that there is absolutely no way you can get the reinforcer to the subject during the instant it is performing the behavior you wish to encourage. If I am training a dolphin to jump, I cannot possibly get a fish to it while it is in midair. If each jump is followed by a thrown fish with an unavoidable delay, eventually the animal will make the connection between jumping and eating and will jump more often. However, it has no way of knowing which aspect of the jump I liked. Was it the height? The arch? Perhaps the splashing reentry? Thus it would take many repetitions to identify to the animal the exact sort of jump I had in mind. To get around this problem, we use conditioned reinforcers.
[...]
Breland called the whistle a ‘bridging stimulus,’ because, in addition to informing the dolphin that it had just earned a fish, the whistle bridged the period of time between the leap in midtank—the behavior that was being reinforced—and swimming over to the side to collect one’s pay.

Pryor describes the program her son (an airplane pilot) designed for pilot training:

A flight instructor can also click a student for initiative and for good thinking: for example, for glancing over the instrument panel before being reminded to do so. So the clicker can reward nonverbal behavior nonverbally in the instant it’s occurring.
[...]
Once you have established a conditioned reinforcer, you must be careful not to throw it around meaninglessly or you will dilute its force. The children who rode my Welsh ponies for me quickly learned to use ‘Good pony!’ only when they wanted to reinforce behavior. . . One day a child who had just joined the group was seen petting a pony’s face while saying ‘You’re a good pony.’ Three of the others rounded on her instantly: ‘What are you telling him that for? He hasn’t done anything!'

Attention

This doesn’t mean you give positive attention only during training.

One can and should lavish children (and spouses, parents, lovers, and friends) with love and attention, unrelated to any particular behavior; but one should reserve praise, specifically, as a conditioned reinforcer related to something real.

I think when children point out minor accomplishments — “Look at all the sticks I collected” — it’s more often a request for attention than a situation that requires praise. I’m likely to comment in a way that shows interest — “Yes, you’ve got a lot of sticks there!” — but I don’t see a need to evaluate the quality of their stick pile or whatever. I try to save actual praise for something I especially want them to do more of, or something that was new and challenging for them.

Interested attention during training is necessary, and ignoring someone is a kind of punishment:

If the trainer starts chatting to some bystander or leaves to answer the telephone or is merely daydreaming, the contract is broken; reinforcement is unavailable through no fault of the trainee. This does more harm than just putting the trainer at risk of missing a good opportunity to reinforce. It may punish some perfectly good behavior that was going on at the time. Of course if you want to rebuke a subject, removing your attention is a good way to do it.

Wrong timing

Pryor emphasizes that if you give punishment or reward at the wrong time, you reinforce the wrong behavior. If you call a dog to you and it finally comes, then you strike it, you’ve punished it for returning to you.

My mother always complained of the same tendency in her choral director: when the singers finally got a difficult passage right, instead of praising them he’d shout “Why couldn’t you do it like that the first time?!”

I’ve noticed the importance of timing when a child finally does what you want, because it’s tempting to scold them even after they’ve shaped up. Anna has a wide variety of delay tactics for brushing her teeth, and I find it easy to be stony-faced when she’s capering around instead of coming to the sink. By the time she finally comes to have her teeth brushed I’m feeling annoyed and would like to give her a lecture. But if I give her an unpleasant response just as she’s finally doing what I want, I disincentivize her from doing it. Instead, as soon as she comes to the sink I become pleasant Mama, smiling and joking.

Maintaining behavior

Once a behavior is established, you use intermittent reinforcement to maintain it:

“In order to maintain an already-learned behavior with some degree of reliability, it is not only not necessary to reinforce it every time; it is vital that you do not reinforce it on a regular basis but instead switch to using reinforcement only occasionally, and on a random or unpredictable basis.”
“Many people initially object to the idea of using positive reinforcers in training because they imagine that they will forever have to hand out treats to get good behavior. But the opposite is true. Training with reinforcement actually frees you from the need for constant vigilance over the behavior, because of the power of variable schedules.”

In people, the behavior itself eventually brings its own reward; we praise toddlers for learning to use the potty, but after the behavior is established we no longer need to reinforce it. And having dry clothes is its own reward.

“The power of the variable schedule is at the root of all gambling. If every time you put a nickel into a slot machine a dime were to come out, you would soon lose interest. Yes, you would be making money, but what a boring way to do it. People like to play slot machines precisely because there’s no predicting whether nothing will come out, or a little money, or a lot of money, or which time the reinforcer will come (it might be the very first time).”

We encountered this in my house when Lily was two. Our housemate would sporadically show her a Sesame Street video on his phone, and she loved this so she’d pester him constantly for it. The reward came unpredictably, so she asked very often. Once he moved to a predictable schedule (one video every day after dinner) she learned the pattern and stopped asking at times of day when she knew it wouldn’t work.
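Why a predictable schedule lets the learner ease off can be made concrete with a toy simulation. This is a hypothetical sketch, not anything from the book: it assumes a learner who simply keeps responding, and compares a fixed-ratio schedule (a reward on every 5th response) with a random-ratio schedule (each response rewarded with probability 1/5, a common simplification of a variable schedule). The function names are made up for illustration. Both schedules pay about equally often overall, but under the fixed one the response right after a reward never pays, so there is a safe time to stop; under the random one any response might pay, which is the slot-machine effect. Lily’s after-dinner video was a time-based schedule rather than a count-based one, but the logic is similar: once she could predict when asking wouldn’t work, she stopped asking then.

```python
import random


def fixed_ratio(n):
    """Predictable schedule: reward exactly every n-th response."""
    count = 0

    def respond():
        nonlocal count
        count += 1
        if count == n:
            count = 0
            return True
        return False

    return respond


def random_ratio(n, rng):
    """Unpredictable schedule: each response is rewarded with probability 1/n,
    so rewards average one per n responses but can't be anticipated."""

    def respond():
        return rng.random() < 1 / n

    return respond


def payoff_after_reward(schedule, responses=100_000):
    """Fraction of responses made right after a reward that are rewarded too."""
    hits = tries = 0
    just_rewarded = False
    for _ in range(responses):
        rewarded = schedule()
        if just_rewarded:
            tries += 1
            hits += rewarded
        just_rewarded = rewarded
    return hits / tries if tries else 0.0


print(payoff_after_reward(fixed_ratio(5)))                     # 0.0
print(payoff_after_reward(random_ratio(5, random.Random(0))))  # roughly 0.2
```

Under the fixed schedule the learner can learn to pause after each reward at no cost; under the random schedule pausing always risks missing a payoff.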

Variable schedules also affect adult relationships:

If you get into a relationship with someone who is fascinating, charming, sexy, fun, and attentive, and then gradually the person becomes more disagreeable, even abusive, though still showing you the good side now and then, you will live for those increasingly rare moments when you are getting all those wonderful reinforcers: the fascinating, charming, sexy, and fun attentiveness. And paradoxically from a commonsense viewpoint, though obviously from the training viewpoint, the rarer and more unpredictable those moments become, the more powerful will be their effect as reinforcers, and the longer your basic behavior will be maintained. Furthermore, it is easy to see why someone once in this kind of relationship might seek it out again. A relationship with a normal person who is decent and friendly most of the time might seem to lack the kick of that rare, longed-for, and thus doubly intense reinforcer.

Pryor describes training herself to go to class even when she didn’t feel like it, and then maintaining the behavior without the reward:

I found that if I broke down the journey, the first part of the task, into five steps—walking to the subway, catching the train, changing to the next train, getting the bus to the university, and finally, climbing the stairs to the classroom—and reinforced each of these initial behaviors by consuming a small square of chocolate, which I like but normally never eat, at the completion of each step, I was at least able to get myself out of the house, and in a few weeks was able to get all the way to class without either the chocolate or the internal struggle.

Sports players and fans become “trained” to do certain things (wearing their lucky clothes, etc.) because they associate them with the team winning.

I have seen one baseball pitcher who goes through a nine-step chain of behavior every time he gets ready to pitch the ball: touch cap, touch ball to glove, push cap forward, wipe ear, push cap back, scuff foot, and so on. In a tight moment he may go through all nine steps twice, never varying the order. The sequence goes by quite fast—announcers never comment on it—and yet it is a very elaborate piece of superstitious behavior.

Raise expectations gradually, with rewards for incremental progress:

I once saw a father make a serious error in this regard. Because his teenage son was doing very badly in school, he confiscated the youth’s beloved motorcycle until his grades improved. The boy did work harder, and his grades did improve, from Fs and Ds to Ds and Cs. Instead of reinforcing this progress, however, the father said that the grades had not improved enough and continued to withhold bike privileges. This escalation of the criteria was too big a jump; the boy stopped working altogether.
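The motorcycle story can be contrasted with a rule that starts from where the learner actually is and raises the bar in small steps. This is a hypothetical sketch, not Pryor’s procedure: the `Shaper` class, its parameters (`step`, `window`, `needed`), and the 0 to 4 grade-point scale are all illustrative assumptions. Every attempt that meets the current criterion earns reinforcement, and the criterion only goes up by one small step after the learner has met it on most recent attempts.

```python
from collections import deque


class Shaper:
    """Hypothetical rule for raising criteria gradually.

    Reinforce any attempt that meets the current criterion; raise the
    criterion by one small step only after it has been met on `needed`
    of the last `window` attempts.
    """

    def __init__(self, start, step, window=5, needed=4):
        self.criterion = start
        self.step = step
        self.needed = needed
        self.recent = deque(maxlen=window)  # successes/failures at this level

    def attempt(self, score):
        """Record one attempt; return True if it earns reinforcement."""
        reinforced = score >= self.criterion
        self.recent.append(reinforced)
        if sum(self.recent) >= self.needed:
            self.criterion += self.step
            self.recent.clear()  # count successes afresh at the new level
        return reinforced


# Grades on a 0-4 scale: start where the student is (D = 1.0), not at the goal.
shaper = Shaper(start=1.0, step=0.5)
for grade in [1.0, 1.5, 1.0, 1.5]:
    print(grade, shaper.attempt(grade))
print("new criterion:", shaper.criterion)  # 1.5
```

The father’s approach amounts to something like `Shaper(start=3.0, step=0.5)`: the improved Ds and Cs never meet the criterion, so the real progress is never reinforced.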

Pryor claims that you have to be much more consistent with aversives (punishments) than with rewards. That seems plausible for animals and young children, but adults are usually willing to avoid committing crimes even if they don’t expect to be caught every time.

Often when we are teaching the behavior, we use a fixed schedule of reinforcement; that is, we reinforce every adequate behavior. But when we are just maintaining a behavior, we reinforce very occasionally, using a sporadic or intermittent schedule. For example, once a pattern of chore sharing has been established, your roommate or spouse may stop at the dry cleaners on the way home without being reinforced each time; but you might express thanks for an extra trip made when you are ill or the weather is bad. When we train with aversives, however—and that’s the way most of us began—we are usually taught that it is vital to correct every mistake or misbehavior. When errors are not corrected, the behavior breaks down. Many dogs are well behaved on the leash, when they might get jerked, but they are highly unreliable as soon as they are off leash and out of reach. When out with their friends, many teenagers do things that they wouldn’t dream of doing in their parents’ presence. This can happen because the subject is fully aware that punishment is unavailable—when the cat’s away, the mice will play—but it can also happen as a side effect of training with aversives. Since the message in a punisher is 'Don’t do that,' the absence of the aversive sends the message, 'That is okay now.'

Learners can go long periods of time without a reward:

One psychologist jokes that the longest schedule of unreinforced behavior in human existence is graduate school.

When to stop a training session

End a training session while the learner is having success:

“When you stop is not nearly as important as what you stop on. You should always quit while you’re ahead.”
“The last behavior that was accomplished is the one that will be remembered best; you want to be sure it was a good, reinforceable performance. What happens all too often is that we get three or four good responses—the dog retrieves beautifully, the diver does a one-and-a-half for the first time, the singer gets a difficult passage right—and we are so excited that we want to see it again or to do it again. So we repeat it, or try to, and pretty soon the subject is tired, the behavior gets worse, mistakes crop up, corrections and yelling take place, and we just blew a training session.”

Sports training

Pryor notes that by the late 20th century, sports training had gotten a lot better than it was when she was young, and had moved toward more effective reinforcement learning:

I think what had changed in the last decade or two is that the principles that produce rapid results are becoming implicit in the standard teaching strategies: “This is the way to teach skiing: Don’t yell at them, follow steps one through ten, praise and reinforce accomplishment at each step, and you’ll get most of them out on the slopes in three days.”

On patience

Good trainers are disciplined and intentional:

People who have a disciplined understanding of stimulus control avoid giving needless instructions, unreasonable or incomprehensible commands, or orders that can’t be obeyed. They try not to make requests they’re not prepared to follow through on; you always know exactly what they expect. They don’t fly off the handle at a poor response. They don’t nag, scold, whine, coerce, beg, or threaten to get their way, because they don’t need to. And when you ask them to do something, if they say yes, they do it. When you get a whole family, or household, or corporation working on the basis of real stimulus control— when all the people keep their agreements, say what they need, and do what they say— it is perfectly amazing how much gets done, how few orders ever need to be given, and how fast the trust builds up. Good stimulus control is nothing more than true communication— honest, fair communication. It is the most complex, difficult, and elegant aspect of training with positive reinforcement.

One thing I notice in all this is that it’s self-reinforcing. The method requires a certain amount of patience and self-discipline from the parent. It’s easier to do that when things are already going well, and in turn you’re rewarded with children who are easier to live with. When parents are exhausted and time-pressed, it’s easier to slip into inconsistency, and both parents and children are more prone to outbursts and unpleasantness.

Limits of reinforcement

She ends with some warnings about trying to apply reinforcement to absolutely everything, or assuming it’s the only thing in play:

Idealistic societies, in imagination or in practice, sometimes fail to take into account or seek to eliminate such biological facts as status conflict. We are social animals, after all, and as such we must establish dominance hierarchies. Competition within groups for increased status—in all channels, not just approved or ordained channels—is absolutely inevitable and in fact performs an important social function: Whether in Utopias or herds of horses, the existence of a fully worked-out hierarchy operates to reduce conflict. You know where you stand, so you don’t have to keep growling to prove it. I feel that individual and group status, and many other human needs and tendencies, are too complex to be either met or overridden by planned arrangements of reinforcement, at least on a long-term basis.

This isn’t the only tool I’d want in my parenting repertoire. But I do think it’s well worth having.

Viliam

An interesting idea in this book was that most people who try to do "behaviorism" are actually doing it wrong (such as delivering the rewards and punishments too late, so they are associated with something else instead). And yet the people who do it wrong will defend their approach as scientifically proved.

As a rule of thumb, if your approach uses a lot of punishment, chances are that you are actually only rewarding yourself (by giving yourself a feeling of high status when you deliver the punishment). Which is the true reason why approaches using punishments are so popular (among the people who use them).

tl;dr -- all you need is love (and clicker)

Vanilla_cabs

Concise, to the point, love it.

Thelo

What about the claims in "Maintaining behavior" that you do need consistent aversives (punishment), but only inconsistent rewards? That seems to say the exact opposite of the earlier stance: it says that you should use lots of punishments (every time the subject gets something wrong), and few rewards.

I'm confused as to what the book actually wants you to do.

juliawise

One of the chapters deals with getting rid of behaviors you don't want, with eight methods (some of which she doesn't recommend). For example, training an incompatible behavior: if you don't want your dog to beg at the table during dinner, train your dog to lie down someplace else during dinner. Or "shape the absence": reinforce everything that's not the unwanted behavior.

Viliam

I recommend reading the book. It is one of the best things I have ever read. A short review or a comment cannot explain everything. (Also, I don't fully remember everything; it was a few years ago.)

First, there is a difference between teaching a behavior, and unteaching a behavior. (Is "unteach" a proper English word?) Second, there is a difference between creating a new habit, and maintaining the existing habit.

On the topic of unteaching, the important thing is that instead of "don't do X" it is often easier to teach an alternative "in situation Z (instead of X) do Y". But if you want to use punishments, the important thing is that they come immediately and consistently. A small punishment that always comes immediately after the act works much better than a large punishment that comes only sometimes and several days after the act.

When you start teaching a new habit using rewards, again the important thing is to deliver the reward immediately and consistently... at the beginning. But after the habit is established, you gradually reduce the size and frequency of the rewards. If you stop rewarding suddenly, the animal will give it a few more attempts, and then give up... and maybe occasionally try again, just to see if the rewards have returned. But if you gradually make the rewards rare, the animal will keep doing it, and the occasional reward will be enough to keep the habit. If you deliver the rare rewards regularly (e.g. only once a day, or always once per 100 attempts), the animal will notice the regularity, and after receiving a reward will slow down (because it means that the next reward is far away). But if you deliver the rare rewards unpredictably, the animal will keep trying all the time.

Raemon

Curated. I found a lot of this to be a good reminder of stuff I had gathered from previous similar articles, but packaged up in a way that feels more useful to me.

Something I'd like to see is a post that considers all of this in the context of adult relationships / employees / teammates, where there's more of a requirement of "treat each other as peers with similar amounts of agency." I think there are some very obvious failure modes to fall into (e.g. yelling at people at dumb times that punish the wrong things), and some obvious good things to try (e.g. noting when people did a good thing).

But then there's a blurry window where I think it's plausible to end up treating people as objects to optimize, or giving them the impression that you're treating them that way, and I'd be interested in a post exploring how to navigate that productively.

Ajeya Cotra

Thanks for writing this up! Appreciate the personal anecdotes too. Curious if you or Jeff have any tips and tricks for maintaining the patience/discipline required to pull off this kind of parenting (for other readers, I enjoyed some of Jeff's thoughts on predictable parenting here). Intuitively to me, it seems like this is a reason that the value-add from paying for childcare might be higher than you'd think naively — not only do you directly save time, you might also have more emotional reserves to be consistent and disciplined if you get more breaks.

juliawise

I don't think I have anything much to add in the way of specific tips.

I do think I'm a worse parent when I have less support (when I was home on maternity leave with a newborn and toddler, or when Jeff has been traveling and I've been alone with both kids for longer stretches than usual.) I agree that having childcare available, either paid or any kind, can help you be more patient and in-control.

Rafael Harth (review for the 2021 Review)

So, "Don't Shoot the Dog" is a collection of parenting advice based solely on the principle of reinforcement learning, i.e., the idea that kids do things more if they are rewarded and less if they're punished. It gets a lot out of this, including things that many parents do wrong. And the nicest thing is that, because everything is based on such a simple idea, most of the advice is self-evident. Pretty good, considering that learning tips are often controversial.

For example, say you ask your kid to clean her room, but she procrastinates on the task. When she finally does it, it's probably a bad idea to scold her for taking so long since that would primarily discourage the act of cleaning (which you want) rather than the delay (which you don't want) -- because the cleaning is what directly preceded your reaction. Pretty self-evident, right? But it's also something parents do wrong all the time.

The post is good enough to make me feel like I don't have to read the book myself. Since none of the concepts are difficult, this requires nothing fancy; the post just goes through all the advice sequentially and gives a straightforward explanation, usually with a quote from the book. It's super simple, but it works well.

My only complaint is the introduction; I think the post should (a) define behaviorism and (b) tell me why I should care about the book.

kareempforbes

This behavioral training method using positive and negative reinforcement is something I would recommend using with animals, and with children before they can speak and reason. Our brains do this type of direct training through our built-in pleasure and pain feedback. Once the child is able to talk and reason, it isn't very effective at all, especially with complex choices like not doing drugs with friends when they are away from the parents. You are effectively trying to impose your ideas and beliefs on the child and taking away their autonomy and desire to make their own decisions.

The most effective way to persuade a child of your point of view (at least with my children) is to give them the information that is available, and if possible to show them the possible outcomes of their choices, good and bad. Once I arm my children with the appropriate information, they generally make the correct choices of their own accord. This is a much more powerful method and will be maintained when they are on their own. If they do not, they are already aware of what the consequences will be, and that is a powerful lesson that they will not forget. The aim is to develop a thinking and considerate person who can make the correct decisions for their own life, on their own.
