Basics of Animal Reinforcement

Scott Alexander

5th Jul 2011

Behaviorism historically began with Pavlov's studies into classical conditioning. When dogs see food they naturally salivate. When Pavlov rang a bell before giving the dogs food, the dogs learned to associate the bell with the food and salivated even when they merely heard the bell. When Pavlov rang the bell a few times without providing food, the dogs stopped salivating, but when he added the food again it only took a single trial before the dogs "remembered" their previously conditioned salivation response¹.

So much for classical conditioning. The real excitement starts with operant conditioning. Classical conditioning can only activate reflexive actions like salivation or sexual arousal; operant conditioning can produce entirely new behaviors and is most associated with the idea of "reinforcement learning".

Serious research into operant conditioning began with B.F. Skinner's work on pigeons. Stick a pigeon in a box with a lever and some associated machinery (a "Skinner box"²). The pigeon wanders around, does various things, and eventually hits the lever. Delicious sugar water squirts out. The pigeon continues wandering about and eventually hits the lever again. Another squirt of delicious sugar water. Eventually it percolates into its tiny pigeon brain that maybe pushing this lever makes sugar water squirt out. It starts pushing the lever more and more, each push continuing to convince it that yes, this is a good idea.

Consider a second, less lucky pigeon. It, too, wanders about in a box and eventually finds a lever. It pushes the lever and gets an electric shock. Eh, maybe it was a fluke. It pushes the lever again and gets another electric shock. It starts thinking "Maybe I should stop pressing that lever." The pigeon continues wandering about the box doing anything and everything other than pushing the shock lever.

The basic concept of operant conditioning is that an animal will repeat behaviors that give it reward, but avoid behaviors that give it punishment³.
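The two pigeons above can be sketched as a toy simulation. This is only an illustration under stated assumptions, not Skinner's own formalism: the function name `simulate_pigeon`, the starting press probability, the learning rate, and the simple linear update rule are all made up for the example.

```python
import random


def simulate_pigeon(outcome, steps=500, p_press=0.1, rate=0.2, seed=0):
    """Return the lever-press probability after `steps` time steps.

    outcome: +1 if the lever delivers sugar water, -1 if it delivers a shock.
    All parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    for _ in range(steps):
        if rng.random() < p_press:  # the pigeon happens to hit the lever
            if outcome > 0:
                # Reward: move the press probability toward 1.
                p_press += rate * (1.0 - p_press)
            else:
                # Punishment: move the press probability toward 0.
                p_press -= rate * p_press
    return p_press


lucky = simulate_pigeon(outcome=+1)    # sugar-water pigeon
unlucky = simulate_pigeon(outcome=-1)  # shock pigeon
print(f"lucky pigeon press probability:   {lucky:.3f}")
print(f"unlucky pigeon press probability: {unlucky:.3f}")
```

Under these assumptions the rewarded pigeon ends up pressing almost constantly, while the shocked pigeon's press probability drops below where it started.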

Skinner distinguished between primary reinforcers and secondary reinforcers. A primary reinforcer is hard-coded: for example, food and sex are hard-coded rewards, pain and loud noises are hard-coded punishments. A primary reinforcer can be linked to a secondary reinforcer by classical conditioning. For example, if a clicker is clicked just before giving a dog a treat, the clicker itself will eventually become a way to reward the dog (as long as you don't use the unpaired clicker long enough for the conditioning to suffer extinction!)

Probably Skinner's most famous work on operant conditioning was his study of reinforcement schedules: that is, if pushing the lever only gives you reward some of the time, how obsessed will you become with pushing the lever?

Consider two basic types of reward schedule: interval, in which pushing the lever gives a reward at most once every t seconds, and ratio, in which pushing the lever gives a reward once every x pushes.

Put a pigeon in a box with a lever programmed to only give rewards once an hour, and the pigeon will wise up pretty quickly. It may not have a perfect biological clock, but after somewhere around an hour, it will start pressing until it gets the reward and then give up for another hour or so. If it doesn't get its reward after an hour, the behavior will go extinct pretty quickly; it realizes the deal is off.

Put a pigeon in a box with a lever programmed to give one reward every one hundred presses, and again it will wise up. It will start pressing more on the lever when the reward is close (pigeons are better counters than you'd think!) and ease off after it obtains the reward. Again, if it doesn't get its reward after about a hundred presses, the behavior will become extinct pretty quickly.
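The two fixed schedules can be written as a rough sketch. The class names `FixedInterval` and `FixedRatio` and the `press` method are hypothetical, chosen only for this example; the one-hour and hundred-press values come from the text above.

```python
class FixedInterval:
    """Reward the first press made at least `interval` seconds after the last reward."""

    def __init__(self, interval):
        self.interval = interval
        self.last_reward_time = 0.0

    def press(self, now):
        if now - self.last_reward_time >= self.interval:
            self.last_reward_time = now
            return True
        return False


class FixedRatio:
    """Reward every `ratio`-th press, regardless of timing."""

    def __init__(self, ratio):
        self.ratio = ratio
        self.presses = 0

    def press(self, now=None):
        self.presses += 1
        if self.presses >= self.ratio:
            self.presses = 0
            return True
        return False


# Four hours on a one-hour interval schedule, pressing once a minute...
slow = FixedInterval(interval=3600)
slow_rewards = sum(slow.press(now=60 * m) for m in range(1, 241))

# ...and pressing once a second for the same four hours.
fast = FixedInterval(interval=3600)
fast_rewards = sum(fast.press(now=s) for s in range(1, 4 * 3600 + 1))

# 240 presses on a one-reward-per-hundred-presses ratio schedule.
ratio = FixedRatio(ratio=100)
ratio_rewards = sum(ratio.press() for _ in range(240))
```

On the interval schedule, pressing sixty times as often earns exactly the same number of rewards, which is why a pigeon that has learned the timing can afford to ignore the lever between rewards. On the ratio schedule, rewards scale directly with the number of presses.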

To these two basic schedules of fixed reinforcement, Skinner added variable reinforcement: essentially the same but with a random factor built in. Instead of giving a reward once an hour, the pigeon may get a reward in a randomly chosen time between 30 and 90 minutes. Or instead of giving a reward every hundred presses, it might take somewhere between 50 and 150.
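A minimal sketch of the variable versions follows. It assumes the random factor is a uniform draw between the stated bounds (30 to 90 minutes, 50 to 150 presses); the text only says "randomly chosen", so the distribution is an assumption, as are the class names `VariableInterval` and `VariableRatio`.

```python
import random


class VariableInterval:
    """Reward the first press after a randomly drawn waiting time has elapsed."""

    def __init__(self, low, high, rng):
        self.low, self.high, self.rng = low, high, rng
        self.next_reward_time = rng.uniform(low, high)

    def press(self, now):
        if now >= self.next_reward_time:
            # Draw a fresh waiting time, measured from this reward.
            self.next_reward_time = now + self.rng.uniform(self.low, self.high)
            return True
        return False


class VariableRatio:
    """Reward after a randomly drawn number of presses."""

    def __init__(self, low, high, rng):
        self.low, self.high, self.rng = low, high, rng
        self.presses_needed = rng.randint(low, high)

    def press(self, now=None):
        self.presses_needed -= 1
        if self.presses_needed <= 0:
            self.presses_needed = self.rng.randint(self.low, self.high)
            return True
        return False


rng = random.Random(0)  # fixed seed so the run is repeatable

# Variable ratio: record how many presses each reward took.
vr = VariableRatio(low=50, high=150, rng=rng)
press_gaps, since_last = [], 0
for _ in range(10_000):
    since_last += 1
    if vr.press():
        press_gaps.append(since_last)
        since_last = 0

# Variable interval: press once a minute for 24 hours, record reward times.
vi = VariableInterval(low=30 * 60, high=90 * 60, rng=rng)
reward_times = [t for t in range(60, 24 * 3600 + 1, 60) if vi.press(now=t)]
time_gaps = [b - a for a, b in zip([0] + reward_times, reward_times)]
```

From the pigeon's side, the key difference from the fixed schedules is that no single unrewarded press, or unrewarded minute, reliably signals that the deal is off.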

Put a pigeon in a box on a variable interval schedule, and you'll get constant lever presses and good resistance to extinction.

Put a pigeon in a box with a variable ratio schedule and you get a situation one of my professors unscientifically but accurately described as "pure evil". The pigeon will become obsessed with pecking as much as possible, and really you can stop giving rewards at all after a while and the pigeon will never wise up.

Skinner was not the first person to place an animal in front of a lever that delivered reinforcement based on a variable ratio schedule. That honor goes to Charles Fey, inventor of the slot machine.

So it looks like some of this stuff has relevance for humans as well⁴. Tomorrow: more freshman psychology lecture material. Hooray!

FOOTNOTES

1. Of course, it's not really psychology unless you can think of an unethical yet hilarious application, so I refer you to Plaud and Martini's study in which slides of erotic stimuli (naked women) were paired with slides of non-erotic stimuli (penny jars) to give male experimental subjects a penny jar fetish; this supports a theory that uses chance pairing of sexual and non-sexual stimuli to explain normal fetish formation.

2. The bizarre rumor that B.F. Skinner raised his daughter in a Skinner box is completely false. The rumor that he marketed a child-rearing device called an "Heir Conditioner" is, remarkably, true.

3. In technical literature, behaviorists actually use four terms: positive reinforcement, positive punishment, negative reinforcement, and negative punishment. This is really confusing: "negative reinforcement" is actually a type of reward, behavior like going near wasps is "punished" even though we usually use "punishment" to mean deliberate human action, and all four terms can be summed up under the category "reinforcement" even though reinforcement is also sometimes used to mean "reward as opposed to punishment". I'm going to try to simplify things here by using "positive reinforcement" as a synonym for "reward" and "negative reinforcement" as a synonym for "punishment", same way the rest of the non-academic world does it.

4. Also relevant: checking HP:MoR for updates is variable interval reinforcement. You never know when an update's coming, but it doesn't come faster the more times you reload fanfiction.net. As predicted, even when Eliezer goes weeks without updating, the behavior continues to persist.

21 comments

Will_Sawin (13y)

4: Also relevant: checking HP:MoR for updates is variable interval reinforcement. You never know when an update's coming, but it doesn't come faster the more times you reload fanfiction.net. As predicted, even when Eliezer goes weeks without updating, the behavior continues to persist.

This type of situation was one of the main reasons I started using an RSS feed reader to do almost all of my internet browsing.

Pavitra (13y)

I've arranged my workspace so that updates come to me without checking: MoR updates appear in my email, email in general has an icon on my taskbar, my RSS aggregator is on my taskbar, etc. The reward is thus entirely causally decoupled from behavior; I can never explicitly check at all, and still get the updates.

There are about four websites that I haven't been able to get RSS-based updates for (including my LW inbox), and I consider the need to manually check these a moderate inconvenience.

Oscar_Cunningham (13y)

Many free blogging services allow you to post to them by email, so it's possible to bounce all of the alerts coming into your email to your RSS reader, which I've found to be a useful trick.

Alicorn (13y)

There are about four websites that I haven't been able to get RSS-based updates for

http://page2rss.com/ is useful. I don't know if it'll work for things like your inbox that you have to login to see.

Kaj_Sotala (13y)

Other examples include checking for Facebook notifications, new e-mails, replies to your Less Wrong comments, or highlights/interesting discussions on IRC.

Those four combined have caused serious damage to my productivity, to the point where I'm forced to use things like LeechBlock and to limit myself to a slow Internet connection to stop myself from checking them all the time. And even those measures frequently fail.

FiftyTwo (13y)

Re reference 1. Another example of this working in humans is that apparently school children become conditioned to the class interval bell to the extent that they receive a boost of adrenaline.

Is it possible to deliberately establish such responses in oneself? For example, if I wish to become more alert on demand, could I pair stimulant use with another experience? I could see useful applications for this.

Scott Alexander (13y)

You might get the opposite of the result you expected.

We were talking about this at the last LA meetup in the context of place conditioning of drug reactions. Place conditioning occurs when you always take heroin (a drug that lowers your breathing rate) in the same place with the same people. You gradually develop a tolerance and take more and more heroin to get the same effects. Then one day you're on vacation and you take heroin in a different place with different people, and die.

The theory is that every time you're in your heroin spot with your heroin buddies (stimulus), your breath rate decreases and the nervous system has to raise the breath rate (response). Your nervous system gets pretty good at this, so even when you take large doses of heroin, you don't stop breathing. But if you take a large dose of heroin somewhere else, then your nervous system doesn't realize it has to raise the breath rate until too late, and the heroin decreases your breathing without any counter-efforts from the body and you die.

If you were to take a stimulant after playing a tone, your body would probably learn that your heart rate and various other metabolic parameters it likes to keep constant always go up after playing the tone, and as a result it would learn to lower heart rate etc. every time it hears the tone. When you play the tone without the stimulant, your body would lower metabolism and you wouldn't receive anything to counter that effect, so you'd end up less energetic and not more.

jimmy (13y)

Man, whoever told you about that must be well read ;-)

In all seriousness though, I would expect it to work for him, since the placebo effect seems to work for caffeine.

The "anti placebo effect" is the result of simple classical conditioning. It looks like the strength is just some simple function of the relative timing of the CS and US. It's an attempt to maintain homeostasis so you don't die.

The "regular placebo effect" is a different beast, and seems to have more or less full access to full cognitive capacity. The information about the contents of the pill gets fed 'down' to the "prior" input to your bayesian estimators. To the extent that the priors are strong and the data is weak, the final estimate sent back up can look a lot like the priors. This actually explains a ton of cool stuff, but that's for another comment/post.

The vast majority of examples of this "anti placebo effect" that I have read about involve injections (there was one story of oral intake). I'm really speculating here, but I think that short time frame things like injections would call up the "anti placebo" more strongly than the "regular placebo", for two reasons. The first is that the other drug tolerance mechanisms can usually handle things when the onset is slow, but if the onset occurs in seconds, there has to be a quicker cognitive method to handle it. The second is that the signal-to-noise ratio is much higher with fast rise times, which reduces the regular placebo effect and makes the anti placebo effect easier to learn.

Here are a couple sources on the classical conditioning effect on heroin tolerance.

Scott Alexander (13y)

Sorry not to credit you; I remembered it was someone at the LA meetup but not exactly who.

FiftyTwo (13y)

That's fascinating. I'd never thought about a reverse effect from conditioning, but in retrospect it seems an obvious adaptation.

Does this mean that the 'regular placebo' effect is purely psychological? In which case could it be triggered by things we associate with certain effects but don't in fact have them? [Say if I sincerely believed lettuce had a strong stimulant effect]

jimmy (13y)

Does this mean that the 'regular placebo' effect is purely psychological?

Umm, sure? What would a non-purely-psychological placebo look like? I'm not quite sure which distinction you're making.

In which case could it be triggered by things we associate with certain effects but don't in fact have them? [Say if I sincerely believed lettuce had a strong stimulant effect]

Of course. Sugar pills aren't really stimulants, depressants, anti-emetics, and everything else. Keep in mind that the effects come from the alief level, not the explicit belief level.

TCB (13y)

Slightly off-topic thought regarding penny jars and fetish formation:

I've heard that fetishes are more prevalent in cultures where sex is repressed. I always wondered why this would be the case (assuming that it is in fact true). One explanation is associations: if people are raised to think sex is dirty, or that sex is a necessary but base bodily function akin to using the bathroom, then they might fetishize urine or excrement. And if people are raised to think that sex is beastly and animalistic, they might fetishize things that are related to animals and violence.

However, the penny jar experiment suggests another, more "innocent" explanation. Perhaps it's simply that, in sexually repressed cultures, people don't have sex very often, or they do it in a special ritualized setting. If this is the case, then accidents of that ritual setting might become associated with the sexual act itself. So, perhaps the neurons corresponding to the distinctive pink pillows on a lover's bed get wired up with the neurons that correspond to actually having sex. Then later, the pink pillows are enough to cause arousal, and perhaps in extreme cases pink pillows later become /required/ for arousal. This presumably wouldn't happen as much if the sex-location changed frequently, or if the setting was not seen as an important component of the ritual.

The first hypothesis seems to be a better explanation of things like poop fetishes, while the second hypothesis might better explain things like lacy pink lingerie. What do you think?

Also, it goes without saying that I enjoyed your article. =)

JoshuaZ (13y)

I'm concerned that this sort of explanation could just as easily explain the exact opposite and so doesn't really give us much information. For example, one could imagine that in societies with less sexual repression, people are more likely to have sex or engage in sexual activity in a variety of circumstances and so have more opportunities to imprint on non-standard objects or situations. Moreover, the more open a society is about sex the more likely people are to hear about some fetish and decide to try it out just to see what it is like, and then get imprinted to it.

TCB (13y)

Those are valid concerns. Regarding the first, that's why I emphasized the ritual component of sex in a repressed society. I suspect that such a society would have very strict rituals for sex: it must occur only at specific times in specific locations, and in the presence of certain stimuli. Some examples of stimuli are candles or lacy lingerie or dim lighting. An example of a time is night. I've heard lots of comments to the effect that having sex in the middle of the day would be strange, and that sex is strictly a nighttime activity. This could be classified as a "nighttime fetish", perhaps. The ritual component of sex would serve to highlight the ritual times/locations/stimuli, causing them to imprint more strongly than other, non-ritualized components of the sexual act.

Regarding your second objection, while that definitely seems like a possibility, the variations and experimentation would probably mean that no one thing would imprint strongly enough to become a fetish, because its presence wouldn't correlate strongly enough with the sexual act.

JoshuaZ (13y)

The nighttime/daytime issue seems to be more an issue of sex being taboo than being a fetish. And one thing that seems clear is that a lot of fetishes specifically revolve around breaking taboos.

Regarding the second issue, keep in mind that the degree of imprinting that occurs when someone is actively having sex is likely to be higher than simply the level of imprinting one would get from simply associating the fetish with sexually attractive images. It might not take more than a few times having sex with a specific fetish for it to imprint. This is further complicated by the fact that some people take sexual pleasure specifically in their partner's sexual pleasure, so one can have additional imprinting simply if the partner has a pre-existing fetish.

In general, I think we drastically overestimate the level of generalizations that we can make about societies' sexual attitudes. Most of the notion of repression -> fetishes seems to come from Victorian Britain, where things were highly repressed and there was a lot of fetishistic behavior, especially BDSM. One reason some terms for BDSM equipment are so ornate is that they date from Victorian times and were sometimes euphemisms or the like. (The most obvious such example is the St. Andrew's Cross.)

But other societies that were extremely open about sexuality also had a lot of fetishistic behavior. The ancient Greeks, ancient Romans and ancient Indians are all prominent examples.

Moreover, some societies just have wildly different notions about sex. For example, consider the Jews of the Talmudic time period. They are generally seen as sexually repressed and with good reason. The Talmud discusses how it must be dark when one has sex, and lists probably about 20 or 30 other taboos, some of which seem misogynist. But, a married woman has a right to at least a certain amount of sex a year while a man has no such right. Moreover, the Talmud when discussing non-standard forms of sex outlaws cunnilingus under the logic that if they do it the men will become addicted. To modern thought processes this seems very strange. Moreover, when discussing whether anal sex is permitted with a female on the receiving end, the Talmudic response is that obviously this is ok.

Cultural attitudes about sex and sexuality vary a lot.

[anonymous] (9y)

Of course, it's not really psychology unless you can think of an unethical yet hilarious application, so I refer you to Plaud and Martini's study in which slides of erotic stimuli (naked women) were paired with slides of non-erotic stimuli (penny jars) to give male experimental subjects a penny jar fetish; this supports a theory that uses chance pairing of sexual and non-sexual stimuli to explain normal fetish formation.

3 words:

Porn induced bisexuality.

John_Maxwell (13y)

If anyone's interested, I wrote a post a year ago on applying findings from behavioral psychology for self-improvement:

http://lesswrong.com/lw/2dg/applying_behavioral_psychology_on_myself/

pontifex (13y)

Todd Becker has a good summary of tips & tricks gleaned from behaviorists:

http://gettingstronger.org/psychology/

[anonymous] (13y)

That study where the experimental subjects developed a penny jar fetish. If that works, wouldn't you expect watching pornography to create an attraction to men? (Assuming it's a man watching heterosexual porn.)

I would expect that that effect would be even more pronounced, because the people watching it are putting themselves into a state of arousal on purpose. This could be an easy way of becoming bisexual, for those who are currently either gay or straight.

XiXiDu (13y)

Put a pigeon in a box on variable interval schedule, and you'll get constant lever presses and good resistance to extinction.

Put a pigeon in a box with a variable ratio schedule and you get a situation one of my professors unscientifically but accurately described as "pure evil". The pigeon will become obsessed with pecking as much as possible, and really you can stop giving rewards at all after a while and the pigeon will never wise up.

Absolutely fascinating! I have never heard of this before. Thanks.

JackEmpty (13y)

You've probably been in a Skinner box though.

"Instant win" prizes on fast food and soft drinks.

WoW drop tables. Or basically any game where a prize or reward for victory is not guaranteed.
