The Power of Reinforcement

96 Post author: lukeprog 21 June 2012 01:42PM

Part of the sequence: The Science of Winning at Life

Also see: Basics of Animal Reinforcement, Basics of Human Reinforcement, Physical and Mental Behavior, Wanting vs. Liking Revisited, Approving reinforces low-effort behaviors, Applying Behavioral Psychology on Myself.

 

Story 1:

On Skype with Eliezer, I said: "Eliezer, you've been unusually pleasant these past three weeks. I'm really happy to see that, and moreover, it increases my probability that an Eliezer-led FAI research team will work. What caused this change, do you think?"

Eliezer replied: "Well, three weeks ago I was working with Anna and Alicorn, and every time I said something nice they fed me an M&M."

 

Story 2:

I once witnessed a worker who hated keeping a work log because it was only used "against" him. His supervisor would call to say "Why did you spend so much time on that?" or "Why isn't this done yet?" but never "I saw you handled X, great job!" Not surprisingly, he often "forgot" to fill out his work log.

Ever since I got everyone at the Singularity Institute to keep work logs, I've tried to avoid connections between "concerned" feedback and staff work logs, and instead take time to comment positively on things I see in those work logs.

 

Story 3:

Chatting with Eliezer, I said, "Eliezer, I get the sense that I've inadvertently caused you to be slightly averse to talking to me. Maybe because we disagree on so many things, or something?"

Eliezer's reply was: "No, it's much simpler. Our conversations usually run longer than our previously set deadline, so whenever I finish talking with you I feel drained and slightly cranky."

Now I finish our conversations on time.

 

Story 4:

A major Singularity Institute donor recently said to me: "By the way, I decided that every time I donate to the Singularity Institute, I'll set aside an additional 5% for myself to do fun things with, as a motivation to donate."


The power of reinforcement

It's amazing to me how consistently we fail to take advantage of the power of reinforcement.

Maybe it's because behaviorist techniques like reinforcement feel like they don't respect human agency enough. But if you aren't treating humans more like animals than most people are, then you're modeling humans poorly.

You are not an agenty homunculus "corrupted" by heuristics and biases. You just are heuristics and biases. And you respond to reinforcement, because most of your motivation systems still work like the motivation systems of other animals.

 

A quick reminder of what you learned in high school

  • A reinforcer is anything that, when it occurs in conjunction with an act, increases the probability that the act will occur again.
  • A positive reinforcer is something the subject wants, such as food, petting, or praise. Positive reinforcement occurs when a target behavior is followed by something the subject wants, and this increases the probability that the behavior will occur again.
  • A negative reinforcer is something the subject wants to avoid, such as a blow, a frown, or an unpleasant sound. Negative reinforcement occurs when a target behavior is followed by some relief from something the subject doesn't want, and this increases the probability that the behavior will happen again.
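The two definitions above can be sketched as a small lookup. This is only an illustration: the names `Stimulus`, `Change`, and `classify` are made up for this sketch, and the "punishment" label for the remaining cases is the standard operant-conditioning term rather than something defined in the list above.

```python
from enum import Enum


class Stimulus(Enum):
    WANTED = "wanted"      # e.g. food, petting, praise
    UNWANTED = "unwanted"  # e.g. a blow, a frown, an unpleasant sound


class Change(Enum):
    ADDED = "added"      # the stimulus follows the behavior
    REMOVED = "removed"  # the behavior brings relief from the stimulus


def classify(stimulus: Stimulus, change: Change) -> str:
    """Name the consequence that follows a target behavior."""
    if stimulus is Stimulus.WANTED and change is Change.ADDED:
        return "positive reinforcement"
    if stimulus is Stimulus.UNWANTED and change is Change.REMOVED:
        return "negative reinforcement"
    # Assumption: the list above does not name these two cases;
    # "punishment" is the usual term for both.
    return "punishment"


print(classify(Stimulus.UNWANTED, Change.REMOVED))
```

Both kinds of reinforcement make the behavior more likely; "negative" refers to something unwanted being taken away, not to the behavior being discouraged.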

 

What works

  1. Small reinforcers are fine, as long as there is a strong correlation between the behavior and the reinforcer (Schneider 1973; Todorov et al. 1984). All else equal, a large reinforcer is more effective than a small one (Christopher 1988; Ludvig et al. 2007; Wolfe 1936), but the more you increase the reinforcer magnitude, the less benefit you get from the increase (Frisch & Dickinson 1990).
  2. The reinforcer should immediately follow the target behavior (Escobar & Bruner 2007; Schlinger & Blakely 1994; Schneider 1990). Pryor (2007) notes that when the reward is food, small bits (like M&Ms) are best because they can be consumed instantly instead of being consumed over an extended period of time.
  3. Any feature of a behavior can be strengthened (e.g., its intensity, frequency, rate, duration, persistence, its shape or form), so long as a reinforcer can be made contingent on that particular feature (Neuringer 2002).

 

Example applications

  • If you want someone to call you, then when they do call, don't nag them about how they never call you. Instead, be engaging and positive.
  • When trying to maintain order in a class, ignore unruly behavior and praise good behavior (Madsen et al. 1968; McNamara 1987).
  • Reward originality to encourage creativity (Pryor et al. 1969; Chambers et al. 1977; Eisenberger & Armeli 1997; Eisenberger & Rhoades 2001).
  • If you want students to understand the material, don't get excited when they guess the teacher's password but instead when they demonstrate a technical understanding.
  • To help someone improve at dance or sport, ignore poor performance but reward good performance immediately, for example by shouting "Good!" (Buzas & Ayllon 1981). The reason you should ignore poor performance is that if you say "No, you're doing it wrong!" you are inadvertently punishing the effort. A better response to a mistake would be to reinforce the effort: "Good effort! You're almost there! Try once more."
  • Reward honesty to help people be more honest with you (Lanza et al. 1982).
  • Reward opinion-expressing to get people to express their opinions more often (Verplanck 1955).
  • You may even be able to reinforce-away annoying involuntary behaviors, such as twitches (Laurenti-Lions et al. 1985) or vomiting (Wolf et al. 1965).
  • Want a young infant to learn to speak more quickly? Reinforce their attempts at vocalization (Ramey & Finkelstein 1978).
  • More training should occur via video games like DragonBox, because computer programs can easily provide instant reinforcement many times a minute for very specific behaviors (Fletcher-Flinn & Gravatt 1995).

For additional examples and studies, see The Power of Reinforcement (2004), Don't Shoot the Dog (2006), and Learning and Behavior (2008).

 

I close with Story 5, from Amy Sutherland:

For a book I was writing about a school for exotic animal trainers, I started commuting from Maine to California, where I spent my days watching students do the seemingly impossible: teaching hyenas to pirouette on command, cougars to offer their paws for a nail clipping, and baboons to skateboard.

I listened, rapt, as professional trainers explained how they taught dolphins to flip and elephants to paint. Eventually it hit me that the same techniques might work on that stubborn but lovable species, the American husband.

The central lesson I learned from exotic animal trainers is that I should reward behavior I like and ignore behavior I don't. After all, you don't get a sea lion to balance a ball on the end of its nose by nagging. The same goes for the American husband.

Back in Maine, I began thanking Scott if he threw one dirty shirt into the hamper. If he threw in two, I'd kiss him. Meanwhile, I would step over any soiled clothes on the floor without one sharp word, though I did sometimes kick them under the bed. But as he basked in my appreciation, the piles became smaller.

I was using what trainers call "approximations," rewarding the small steps toward learning a whole new behavior...

Once I started thinking this way, I couldn't stop. At the school in California, I'd be scribbling notes on how to walk an emu or have a wolf accept you as a pack member, but I'd be thinking, "I can't wait to try this on Scott."

...After two years of exotic animal training, my marriage is far smoother, my husband much easier to love.

 

Next post: Rational Romantic Relationships Part 1

Previous post: The Good News of Situationist Psychology

 

 

My thanks to Erica Edelman for doing much of the research for this post.

Comments (467)

Comment author: [deleted] 21 June 2012 01:13:19AM 2 points [-]

Excellent article. I wonder if reinforcement could be used to speed up rationality training? I would love to see a study done on that.

Comment author: lukeprog 21 June 2012 01:42:13AM 4 points [-]

I wonder if reinforcement could be used to speed up rationality training?

Almost certainly. CFAR is doing this to some extent at their minicamps, but much more can be done. There is tons of room for rationality training via videogames, for example. Raytheon is developing some debiasing video games for military officers.

Comment author: [deleted] 21 June 2012 01:46:24AM 2 points [-]

Now that is exciting as hell. I've always felt that rationality was a potential competitive advantage for organizations that wasn't being utilized. Way to go.

Comment author: Swimmer963 21 June 2012 01:19:27AM *  11 points [-]

To help someone improve at dance or sport, ignore poor performance but reward good performance immediately, for example by shouting "Good!" (Buzas & Ayllon 1981). The reason you should ignore poor performance is that if you say "No, you're doing it wrong!" you are inadvertently punishing the effort. A better response to a mistake would be to reinforce the effort: "Good effort! You're almost there! Try once more."

I got a demonstration of how true this is yesterday when, during my taekwondo class, I was paired up with one of the senior black belt students, who has some but not a lot of experience teaching. He was supposed to be fixing up my poomsae (same thing as a kata in karate) and each time he watched me do it, I would finish and he would immediately launch into a description of what I was doing wrong. His feedback was pretty useful–specific, with demonstrations of exactly what to change in order to do it right–but without any prelude of "yay, good job!" or even "okay, the punches were way better that time...now let's work on the stances", I found myself getting really discouraged. Reminding myself that I wasn't actually doing worse than usual, that he just had a different teaching style, helped a little... But my subconscious brain still decided to feel resentful and unenthusiastic, no matter how counterproductive that might be towards my actual goal of improving my poomsae.

As a swimming instructor, I do make sure to dole out a LOT of praise, but I'm wondering if I should push it even further...

Comment author: [deleted] 21 June 2012 01:32:15AM *  8 points [-]

I'm not sure a lot of praise is a good idea since that would lower its effectiveness as a reinforcer.

Comment author: Swimmer963 21 June 2012 02:01:00AM 12 points [-]

Well, a lot of non-specific praise would water down the value of non-specific praise as a reinforcer, but taking the time to pick out more specific elements that are good/improving would probably reduce discouragement.

I think one of the things I forget most as an instructor is how easy it is to get discouraged, especially when you're being taught by someone who seems to be able to do all of it effortlessly. There's also the element of "I already know I'm doing it wrong! I just can't get my body to listen to my brain!" Instructors who don't acknowledge this and give praise for trying or noticing that I'm doing it wrong are a major source of discouragement for any new physical skill I try to learn.

Comment author: [deleted] 21 June 2012 02:14:01AM 1 point [-]

Excellent point. I stand corrected.

Comment author: Swimmer963 21 June 2012 02:23:58AM 4 points [-]

I think you do have a valid point... However, in my experience, most instructors err way on the side of "too little praise" and don't have to worry about using it too much and lowering its effectiveness. And most humans I know have a brain setup where after hearing "good job on X" ten times, hearing it an eleventh time is still really reinforcing. So you'd have to really go to extremes to praise them too much...

Comment author: NancyLebovitz 22 June 2012 03:12:39AM 2 points [-]

"I already know I'm doing it wrong! I just can't get my body to listen to my brain!"

Any advice on getting one's body or one's student's body to be more cooperative?

Comment author: phonypapercut 21 June 2012 02:06:30AM 0 points [-]

Would it? There would be greater contrast between the reinforcement and the ignoring of poor performance.

Comment author: [deleted] 21 June 2012 02:15:39AM 2 points [-]

Well the idea I was going for was that it would be better to praise improvements in skill rather than just good performance.

Comment author: Armok_GoB 21 June 2012 02:25:59PM 1 point [-]

Hmm, I wonder if providing a lot of negative reinforcement on some attribute of them you don't care about would make the positive reinforcements more effective on the things you do care about.

Example: trying to teach someone math, and praising them for everything they do right with the math, including trying, but complaining about their physique, fashion choices, hygiene, etc. Especially timing those unrelated complaints to when they seem less focused on the math, but subtly enough that they don't consciously notice the correlation.

Not that this isn't a bad idea for other unrelated reasons...

Comment author: wedrifid 21 June 2012 02:38:04PM *  1 point [-]

Hmm, I wonder if providing a lot of negative reinforcement on some attribute of them you don't care about would make the positive reinforcements more effective on the things you do care about.

The example you give is either punishment of the other attributes or negative reinforcement of the desired behavior (if you look at it from the perspective of taking away the aversive stimulus only when the math is done.)

Comment author: TheOtherDave 21 June 2012 03:24:48PM 2 points [-]

There are a couple of factors here worth keeping in mind.

One is that classical conditioning continues to work, even when I'm concentrating on operant conditioning. So one result of this strategy is that my target will come to associate me with aversive stimuli, which will in turn reduce the effectiveness of my attempts at reinforcement. They will similarly associate the teaching sessions and math with those stimuli, which may be counterproductive.

Another is that a target consciously noticing my attempts at conditioning changes the whole ball game, in ways I don't entirely understand and I'm not sure are entirely understood. Sometimes it's a huge win. Sometimes it's a huge lose. Staying subtle is more predictable, if I can do it, but of course it's not always possible to avoid detection, and sometimes it's better to admit to my attempts at conditioning than to be caught out at them. The safest move is to first establish a social context where my attempts at conditioning can be labelled "manners," such that any attempt to call me out on them is inherently low-status, but that's not always possible either.

When using praise signals as reinforcers for systems, like some humans, who are capable of skepticism about my motives, it helps to be seen to use expensive signals. (Attention often works well, which is one reason Internet trolls are so persistent.) Of course, that typically means I have to invest resources into my conditioning efforts.

In general, the approach I endorse is to maintain (and adjust as needed) a consistent threshold of evaluation, ignore behavior that falls below that threshold, reward behavior that clears it, and resist the temptation to go meta about the process.

Comment author: [deleted] 21 June 2012 05:47:50PM 1 point [-]

Sounds like an interesting idea for an experiment, although it would probably violate ethical guidelines. :P

Comment author: TheOtherDave 21 June 2012 01:36:18AM 6 points [-]

"Don't Shoot the Dog" remains my favorite book for these sorts of anecdotes, as well as some of the theory and a lot of the practice. I recommend it.

Comment author: JGWeissman 21 June 2012 01:58:56AM 11 points [-]

On Skype with Eliezer, I said: "Eliezer, you've been unusually pleasant these past three weeks. I'm really happy to see that, and moreover, it increases my probability that an Eliezer-led FAI research team will work. What caused this change, do you think?"

Eliezer replied: "Well, three weeks ago I was working with Anna and Alicorn, and every time I said something nice they fed me an M&M."

If I recall my high school psychology class correctly, you can get a stronger and more persistent effect by secretly rolling a die and noting the number; when Eliezer has said that many nice things, give him an M&M, then roll the die again for a new target number of nice things.
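The die-rolling procedure described in this comment can be sketched in a few lines of Python. This is a hedged illustration, not anything from the thread: the class name `DieRollSchedule` and its method names are invented here, and a six-sided die is assumed.

```python
import random
from typing import Optional


class DieRollSchedule:
    """Reward after a hidden number of target behaviors, re-rolled each time."""

    def __init__(self, sides: int = 6, rng: Optional[random.Random] = None):
        self.sides = sides  # assumption: an ordinary six-sided die by default
        self.rng = rng or random.Random()
        self.remaining = self._roll()

    def _roll(self) -> int:
        # Secretly pick how many behaviors are needed before the next reward.
        return self.rng.randint(1, self.sides)

    def record_behavior(self) -> bool:
        """Call once per target behavior; returns True when a reward is due."""
        self.remaining -= 1
        if self.remaining == 0:
            self.remaining = self._roll()  # new hidden target for next time
            return True
        return False


schedule = DieRollSchedule()
for nice_thing in range(1, 13):
    if schedule.record_behavior():
        print(f"nice thing #{nice_thing}: hand over an M&M")
```

Each reward arrives after somewhere between one and `sides` behaviors, so the subject cannot predict which instance will pay off.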

Comment author: TheOtherDave 21 June 2012 02:12:48AM 19 points [-]

That's true and false. Intermittent reinforcement gets a more robust effect than continual reinforcement, yes, but randomly intermittent reinforcement isn't as effective as setting the reward threshold higher as the behavior becomes more common... e.g., rewarding only the 10% nicest things.

Comment author: matt 21 June 2012 07:09:46PM 5 points [-]

I want to design a reinforcement schedule in one of our apps. Can anyone link me to some specific guidelines on how to optimise this?

(Reinforce exactly what % of successes (30%? 26%? 8%?)? Reinforce performances in the top 10% of past performances (or the top 12%, or the top 8%?)? How does time factor (if the user hasn't used the app for a week, should I push a reinforcer forward?)?)

Comment author: TheOtherDave 21 June 2012 07:17:44PM 1 point [-]

I can't, but if you find anything concise and useful, I'd love to hear about it myself.

My rule of thumb is to set the threshold so as to reinforce the top 20% or so of performances, and arrange performance frequencies so I'm reinforcing 2-3 times/minute during active training periods. But that's not based on anything.

I'll also note that reinforcing higher-tier performances more strongly works really well (though is hard to do consistently by hand), as do very intermittent "jackpots" (disproportional and unpredictable mega-rewards).
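As a rough sketch of the rule of thumb above (reinforce roughly the top 20% of performances), here is one way an app could track a moving threshold. Everything named here is hypothetical: `TopFractionReinforcer`, the window size, and the choice to reinforce every attempt until enough history exists are assumptions of this sketch, not tested guidelines. Stronger rewards for higher tiers and jackpots are left out.

```python
from collections import deque


class TopFractionReinforcer:
    """Reinforce scores that land in the top fraction of recent scores."""

    def __init__(self, top_fraction: float = 0.2,
                 history_size: int = 50, min_history: int = 5):
        self.top_fraction = top_fraction
        self.min_history = min_history
        # Only the most recent scores count, so the bar moves with the user.
        self.history = deque(maxlen=history_size)

    def should_reinforce(self, score: float) -> bool:
        """Decide on a reward for this score, then add it to the history."""
        if len(self.history) < self.min_history:
            # Assumption: reward every attempt until there is enough history.
            reinforce = True
        else:
            ranked = sorted(self.history)
            top_count = max(1, round(len(ranked) * self.top_fraction))
            cutoff = ranked[-top_count]  # lowest score still in the top slice
            reinforce = score >= cutoff
        self.history.append(score)
        return reinforce


reinforcer = TopFractionReinforcer(top_fraction=0.2, history_size=10, min_history=10)
for score in [3, 5, 4, 6, 5, 7, 6, 8, 7, 9, 6, 10]:
    print(score, reinforcer.should_reinforce(score))
```

Because the cutoff is computed from recent performances, the bar rises on its own as the behavior improves, which is one reading of "setting the reward threshold higher as the behavior becomes more common."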

Comment author: dbaupp 21 June 2012 05:41:52AM 4 points [-]

Some previous discussion about this form of conditioning.

Comment author: ciphergoth 21 June 2012 06:18:23AM 5 points [-]

When the threshold is "something nice", there's going to be randomness in the reinforcement anyway.

Comment author: tgb 21 June 2012 02:42:27AM 21 points [-]

I like this article because it is reasonably short, but very clear and highly actionable.

Comment author: sketerpot 21 June 2012 11:21:58PM 13 points [-]

This compliment is particularly effective because it's specific, verifiable, and true. I've never been very good at accepting vague compliments -- I tend to get embarrassed and self-conscious -- but more specific compliments are really nice.

Comment author: MBlume 21 June 2012 03:00:14AM 35 points [-]

Good post! Thank you for writing it Luke =)

Comment author: arundelo 21 June 2012 03:11:04AM 6 points [-]

I see what you did there!

Comment author: [deleted] 21 June 2012 08:33:42AM 0 points [-]

(I didn't until EY pointed that out.)

Comment author: Eliezer_Yudkowsky 21 June 2012 03:57:36AM 29 points [-]

Thanks for reinforcing Luke! And it's great that you applied the theory so quickly!

Comment author: JGWeissman 21 June 2012 04:06:19AM 20 points [-]

Yay recursive reinforcement!

Comment author: Eliezer_Yudkowsky 21 June 2012 04:19:53AM 8 points [-]

Why, thanks! It's helpful to hear you say that!

Comment author: Dorikka 21 June 2012 04:51:16AM -1 points [-]

Moar recursion! Keep it up! :D

Comment author: Will_Newsome 21 June 2012 04:56:58AM 28 points [-]

No. Unreflective happy death spirals get people killed. Shame on all of you for being bad people.

Comment author: RomeoStevens 21 June 2012 05:49:25AM 11 points [-]

I'm glad you mentioned this.

Comment author: Will_Newsome 21 June 2012 05:59:39AM *  9 points [-]

Don't be glad. If you need reinforcement, be relieved. Gladness tends to cause unreflective happy death spirals. Shame on you for being glad.

Presumably the emotion you actually felt was relief, and "glad" was merely used as an inaccurate/misleading synonym? In which case, shame on you for using inaccurate/misleading synonyms.

(I'm totally at least a quarter serious, maybe half.)

Comment author: JGWeissman 21 June 2012 06:06:54AM 12 points [-]

Thank you for wanting us to not have unreflective happy death spirals. I will have to repeat the behavior that caused you to express such caring.

Comment author: Will_Newsome 21 June 2012 06:17:24AM *  3 points [-]

I don't want you to not have unreflective happy death spirals, I'm just horrified at the potential consequences of not going out of my way to prevent you from having unreflective happy death spirals. Shame on you for imprecision and/or implicitly accusing me of hypocrisy.

Comment author: Viliam_Bur 21 June 2012 09:31:47AM 21 points [-]

I guess now it's the right time to say big thanks to everyone who didn't contribute to this thread!

Comment author: CharlieSheen 21 June 2012 02:35:41PM 24 points [-]

I think I'm going to be ill if this continues.

Comment author: CommanderShepard 21 June 2012 03:25:13PM *  5 points [-]

"god this is even more phygish than just that quote about eliezer getting fed mnms"

Comment author: Will_Sawin 21 June 2012 05:33:51PM 4 points [-]

what's "phygish"?

Comment author: shminux 21 June 2012 05:37:50PM 3 points [-]

Comment author: radical_negative_one 21 June 2012 07:14:21PM -1 points [-]

Comment author: John_Maxwell_IV 22 June 2012 01:34:58AM 2 points [-]

That strikes me as goofy, not phygish.

Comment author: Dorikka 22 June 2012 03:21:57AM 0 points [-]

I agree, so much that I think I might be missing something.

Comment author: Rain 21 June 2012 03:13:15AM *  12 points [-]

That's why I tried to stay positive when talking about the new SI website. Especially with technical changes like that, the (vocal) negative response can be overwhelming.

Comment author: lukeprog 21 June 2012 03:37:15AM 22 points [-]

Yup. When reading through the comments about the new website, I could feel my effort being punished.

Comment author: wedrifid 21 June 2012 05:16:20AM 2 points [-]

Yup. When reading through the comments about the new website, I could feel my effort being punished.

I am slightly surprised to hear this. I perhaps expected slightly less emotional involvement with the effort and more of a "<Website minion/>, Go! Fix!" feeling.

Comment author: lukeprog 21 June 2012 05:20:05AM 2 points [-]

What happened is that (1) I felt my effort being punished, and then (2) I sent an email to Nickolai or Kamil asking them to fix X.

Comment author: wedrifid 21 June 2012 05:26:36AM 10 points [-]

I sent an email to Nickolai or Kamil asking them to fix X.

Great work Nickolai or Kamil, if either of you read lesswrong at all. The website is a much needed improvement! ;)

Comment author: wedrifid 21 June 2012 05:48:36AM 4 points [-]

(2) I sent an email to Nickolai or Kamil asking them to fix X.

I've noticed (while being such a minion) that when making such change requests you yourself manage to do so with a frame that minimises a criticism vibe or 'effort punishment' feelings. I would pay many, many M&Ms for that effort in careful phrasing.

Comment author: NancyLebovitz 22 June 2012 03:16:09AM 0 points [-]

Thank you for toughing it out.

I'm sorry if my comments were too harsh.

Comment author: John_Maxwell_IV 21 June 2012 05:45:32AM 5 points [-]

Sorry about that.

It seems to me that if humans were emotionless utility maximizers, we would prefer hearing criticism over praise, the same way programmers purchase more utility by fixing bugs in their programs than polishing features that already work. I suspect criticism is generally more valuable from a pure decision theoretic perspective.

I wonder if there is an effective way to buy encouragement and criticism separately. Also, it's hard to know exactly how best to encourage folks. In theory it's possible that making a new website is not the best use of SI's resources, which suggests reinforcement would not be optimal. But we still may want to reinforce you towards the more general behavior of taking steps to achieve your organizational goals. So what's the best response?

Maybe someone can develop some general guidelines for reinforcing/criticizing people, similar to what the nonviolent communication people came up with. (When {observable event} happened, I felt {feeling} because I need/value {underlying need that felt unmet or value that felt jeopardized}. Would you be willing to {specific request that person could do} in the future?) E.g. check to see if the person was acting with good intentions and reinforce them for those if they existed, check for super goals you endorse and reinforce them for working to accomplish those, check to see if the person could just as easily have sat around doing nothing and reinforce them for expending effort if this was the case, etc.

I think optimally criticism would have lots more reinforcers associated with it: people should be reinforced for requesting, giving, and receiving criticism because these are all activities that are naturally aversive but actually have high expected value.

So, I wholeheartedly endorse the following actions of yours: attempting to maximize humanity's collective utility function, working on the super goal of AGI safety, actually doing stuff, and deliberately gathering critical feedback. Go Luke!

Comment author: pjeby 21 June 2012 04:52:37PM 15 points [-]

Yup. When reading through the comments about the new website, I could feel my effort being punished.

Perhaps you could have somebody read them for you and summarize them in a non-critical way, thus creating a reinforcement shield.

Alternately, you could adapt what internet marketing "personalities" do, and promote doing: practice celebrating criticism. One marketer (I forget which one) described making a practice of throwing his hands in the air and shouting "Woo!" when he received a criticism via email.

(Background: "personality" marketers promote by writing emotionally charged material that's intended to divide their audience into people who either love or hate them. Thus, the presence of hate mail is evidence that their strategy is working. They will then often publicize the hate mail, in order to stir up the emotions of the people on the opposite side of the debate. Talk radio hosts, bloggers, political commentators, etc. also use these strategies, even if they're not always considered "marketers" in a traditional sense. Whether you consider this "dark arts" is largely a political question, since the LW sequences use these tactics also. Whether he knows it or not, Eliezer is a personality marketer in this sense, it's just that he's not as efficiently monetizing the results. ;-) )

Comment author: dbaupp 21 June 2012 05:49:48AM *  2 points [-]

I tried to do the same. Although, I was probably significantly less successful than I'd liked to have been (sorry Luke, Nickolai, Kamil and anyone else who'd made an effort!).

Also, given lukeprog's comment, this unfortunately appears to be a case of history repeating itself: matt had a similarly negative experience when LW was redesigned a little while ago.

Comment author: wedrifid 21 June 2012 06:28:39AM 7 points [-]

matt had a similarly negative experience when LW was redesigned a little while ago.

That circumstance is somewhat different in nature. While as far as I know nobody wanted matt to experience negative affect, the discouragement of 'effort' was actually a perceived instrumental good, given an expectation that more effort would produce undesired outcomes.

I note that this relies on beliefs at the time. In that context users had to make the prediction "If a website administrator implements detrimental changes when previous discussion had already explained why such a thing was not desired and a prediction had been made that a change to the website would probably be bad, what is the probability that future 'effort' will be beneficial?" The answer is very, very low. The emotional distress matt experienced was his social instincts warning him that interfering with the tribe when they do not want you to is a dangerous act - especially when that interference is to (in effect) institute a prohibition against something they could previously do.

It turns out, however, that matt is a superior human being to the typical person in his role. While his ego did cause him to act more defensive than optimal and seemed to cause him to experience emotional distress it did not cripple his ability to respond to user feedback or cause him to lash out with actions against the users as many would. The undesired change was eventually fixed, as were the few bugs that were introduced.

I expect users to drastically update how they would respond to matt if he made future website upgrades due to having more information about matt. He definitely deserves a lot of rewarding for going ahead and doing the bugfixes and implementing 'retraction then deletion' despite having received discouragement. He lives (here) in Melbourne. Perhaps I should give him a packet of M&Ms if I run into him at one of our meetups!

Comment author: lukeprog 21 June 2012 03:35:23AM 23 points [-]

Reason #228 I'm crazy and irrational: Without conscious attention to the reinforcement process, my behaviors are selected for reinforcement almost at random. The process selecting behaviors for reinforcement has tons of steps in it like "Did I happen to glance in the direction of the bag of M&Ms right now?" instead of "Is the thing I'm doing now something I want to reinforce?"

Comment author: TheOtherDave 21 June 2012 03:39:47AM 9 points [-]

(nods) For my own part, it's frequently worse than random... when I don't attend to what I'm doing, I frequently berate or otherwise punish myself for attempts to achieve a target that fall short of that target, and I'm more likely to do that the more I value achieving the target. Which is a great way to extinguish the behaviors I value.

Comment author: Viliam_Bur 21 June 2012 09:45:25AM *  5 points [-]

I suspect it's very difficult to design the right reinforcement strategy. It's easy to reward something that seems related to the goal, but can gradually become a replacement for the goal.

For example rewarding success and punishing failure reinforces choosing only trivial tasks, which prevents learning new things. Rewarding starting new things reinforces starting new tasks without finishing them, also choosing tasks for being new, not being useful. Etc.

Rational thinking about consequences, and changing the strategy when necessary, cannot be avoided. So perhaps this should be reinforced. But how do we distinguish between genuine rationality and signalling? Yeah, rationalists should win, but by rewarding success and punishing failure... see the previous paragraph.

Anyway, many people do worse than random, so some reinforcement can be used to improve the situation.

EDIT: Another problem: I suspect that any reinforcement inevitably goes meta. When I get a reward for doing X, I will do X more, but I will also like the reinforcement mechanism more. When I get punished for doing Y, I will do Y less, but I will also hate the reinforcement mechanism and rationalize why I must get rid of it.

I suspect that people prefer wireheading, except in cases when it becomes too obvious that it is wireheading. If I am allowed to choose my reinforcement mechanisms, I will probably unknowingly slowly optimize them towards wireheading. If someone else chooses my reinforcement mechanisms, I suspect they will choose it to optimize their utility function instead of mine.

Comment author: hvass 21 June 2012 04:22:17AM 1 point [-]

Thanks, Luke! I've always enjoyed this sequence. (It's funny that I was tempted to include a note that I would've been happier if you contributed to the sequence more often, but let's stick with the praise for now. :-)

Comment deleted 21 June 2012 04:44:22AM *  [-]

Comment author: Will_Newsome 21 June 2012 04:51:33AM 16 points [-]

Eagerly awaiting "The Power of Punishment".

Comment author: Dorikka 21 June 2012 04:56:38AM 0 points [-]

I'm curious -- where did your other post (a few paragraphs) go? I didn't think that people could permanently delete posts, only retract them, and I thought that a star appeared if you edited your post.

Comment author: Will_Newsome 21 June 2012 04:58:17AM *  3 points [-]

You get the option to delete if you retract and no one's commented. Which is perhaps not good, because I made a rather embarrassing terminological error in that comment that I probably deserve to be punished for.

Comment author: Dorikka 21 June 2012 04:59:16AM 0 points [-]

Gotcha -- thanks.

Comment author: Will_Newsome 21 June 2012 05:00:50AM *  0 points [-]

But for some reason I can still see where the comment used to be, now with a "comment deleted" indicator, and it says it has one child, but it doesn't. Perhaps a synchronization error.

Comment author: JGWeissman 21 June 2012 05:09:33AM 2 points [-]

I had replied to point out the terminological error. You must have deleted after I started the comment but before I submitted. I then noticed your comment was deleted, so I deleted my response. (It might be a good idea to not allow a response to be submitted after the parent is deleted.)

Comment author: Will_Newsome 21 June 2012 06:09:36AM 3 points [-]

Here's a fixed, less passive-aggressive version of the deleted comment:

This article implicitly reinforces reinforcement and punishes punishment. But there are situations in which punishment should be reinforced, e.g. if this article is in fact correct to punish punishment. I hope someday someone writes out a list of ways to efficiently torture oneself into having at least some hope of ultimately not being seen as obviously stupid in retrospect, to complement this article and perhaps adjust for any optimistic-biased selection that might have generated it.

Comment author: wedrifid 21 June 2012 05:10:45AM 8 points [-]

Eagerly awaiting "The Power of Punishment".

Particularly good for demonstrating to observers that you have more status and power than the person you are punishing.

Comment author: Will_Newsome 21 June 2012 05:14:54AM 1 point [-]

(demonstrating to observers / demonstrating to self / demonstrating to punished; status / power / resources / justification / need / etc; person / cognitive subsystem / institution / problem representation / etc)

Comment author: Viliam_Bur 21 June 2012 10:04:17AM *  6 points [-]

meh. downvoted.

(just joking)

Comment author: JulianMorrison 21 June 2012 09:25:43AM 12 points [-]

Anecdotally, punishment seems to be a good guilt-releaser, while guilt is dysthymic. Punishment may be effective at snapping someone out of a blue funk and getting them to be responsive to rewards. Guilty people reject rewards. (The above may work better if you are kinked that way.)

Comment author: AdeleneDawner 21 June 2012 05:03:47AM 36 points [-]

Bit of a tangent, but if you ever run across someone for whom this doesn't seem to work, check the hypothesis that they don't parse praise as a positive reinforcer. I don't know how common this is, but I actually have to make a conscious effort to keep it from acting as a mild punishment in most cases when it's applied to me. (Ditto M&Ms in the given context, I expect. Attention Bad.)

Comment author: [deleted] 21 June 2012 05:20:21AM 2 points [-]

I'd have to say that it shouldn't be that common. Most people want to be praised.

Comment author: JGWeissman 21 June 2012 05:35:09AM 3 points [-]

I suspect it is common enough that when you observe that praising someone doesn't reinforce their behavior or makes them uncomfortable, you should consider that they might have an unusual aversion to praise.

Comment author: AdeleneDawner 21 June 2012 06:27:00AM 1 point [-]

Yep. It's not a situation you're likely to come across often, but when you do, it's worth having the alternate theory available to check.

Comment author: pjeby 21 June 2012 04:41:34PM 9 points [-]

I suspect it is common enough that when you observe that praising someone doesn't reinforce their behavior or makes them uncomfortable, you should consider that they might have an unusual aversion to praise.

And also, that you might just be really bad at it. ;-)

This was my problem for quite a while: believing that I ought to praise people, while alieving that there wasn't anything to praise and that they didn't deserve it, due to all their obvious imperfections.

This, as you can imagine, produced sub-optimal results. ;-)

Comment author: erratio 21 June 2012 01:21:35PM 19 points [-]

Most people want to be sincerely praised. Someone who reads this post and applies it poorly is going to be saying praise while their body language says something else entirely. Or acting out of character for themselves, leading the reinforcee to suspect that the praise is insincere. Or they may go around praising seemingly everything, causing the reinforcee to interpret the praise as meaningless noise.

There are lots of ways for using praise as reinforcement to go wrong, and if someone is in one of those environments for long enough they will end up being conditioned to interpret praise as neutral or negative.

Comment author: [deleted] 21 June 2012 06:37:00AM *  10 points [-]

You are correct that there are many kinds of reinforcers, and it's important to make sure that the one you choose to use is something the receiver will desire.

"In other studies, animals and people given a choice between performing a task for either of two reinforcers often show strong preferences (Parsons & Reid, 1990; Simmons, 1924). Identifying preferred reinforcers can improve the effectiveness of a reinforcement procedure in applied settings (Mace et al., 1997)."

-Learning and Behavior, p149

Comment author: Viliam_Bur 21 June 2012 09:56:42AM 3 points [-]

Yes, the situation is usually not so easy that behavior is just a result of inputs, like this:

output := f(input)

People have minds, and a mind is an environment, different for different people. The real equation would be more like this:

[mind1, output] := f([mind0, input])

For example, many people like the attention of others, but some people may be trained (for example by previous abuse) that attention from others is usually followed by pain. For them, positive reinforcement by giving them attention wouldn't work, because the important thing is not the attention per se, but what it means for them.
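The two equations above can be sketched in Python. This is only an illustration of the stateful form: the `Mind` class, its single `attention_value` field, the event names, and the update rule are all hypothetical simplifications, not a model taken from the thread or from the psychology literature.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Mind:
    # Hypothetical one-number summary of what attention has come to mean:
    # positive = pleasant, zero or negative = associated with pain.
    attention_value: float


def stateless_response(event: str) -> str:
    """output := f(input): the same input always gives the same output."""
    return "repeat behavior" if event == "attention" else "no change"


def step(mind: Mind, event: str) -> Tuple[Mind, str]:
    """[mind1, output] := f([mind0, input])."""
    if event == "attention_then_pain":
        # Assumed update rule: each painful episode lowers the value of attention.
        return Mind(mind.attention_value - 1.0), "withdraw"
    if event == "attention":
        output = "repeat behavior" if mind.attention_value > 0 else "avoid behavior"
        return mind, output
    return mind, "no change"


typical = Mind(attention_value=1.0)

# Same starting point, but with a history of attention followed by pain.
hurt = typical
for _ in range(3):
    hurt, _ = step(hurt, "attention_then_pain")

print(stateless_response("attention"))  # repeat behavior
print(step(typical, "attention")[1])    # repeat behavior
print(step(hurt, "attention")[1])       # avoid behavior
```

The identical input, `"attention"`, produces opposite outputs depending on the mind it arrives in, which is the point of the second equation.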

On a meta level, for someone even the idea of "learning" or "improving" or "changing" may be already associated with pain, so they will resist any such process if they notice it. A human mind can be messed up rather easily.

Comment author: Will_Newsome 21 June 2012 10:17:15PM 8 points [-]

Furthermore at least one person I know (er, myself) picks up on any sort of test-like or game-like or we're-judging-you-so-you-better-not-screw-up-like context and starts acting in extremely confusing/uninformative/atypical/misleading ways so as not to be seen as the kind of person who is easily manipulable (there are probably other motivations involved too). Any incentive structure I'm put under thus has to somehow take this into account, even e.g. the LessWrong karma system. Explicitly manipulative socially mediated praise/M&Ms would strike my brain as outright evil and would stand some chance of being inverted entirely. That said I don't get the impression this sort of defense mechanism is very common.

Comment author: TheOtherDave 21 June 2012 10:21:21PM 2 points [-]

Explicitly manipulative socially mediated praise/M&Ms would strike my brain as outright evil and would stand some chance of being inverted entirely. That said I don't get the impression this sort of defense mechanism is very common.

I experience this as common, but I suspect it's because of a small number of exceptionally vocal "manipulation is evil!" types in my life, rather than a larger number of typically vocal ones.

Comment author: Vladimir_Golovin 21 June 2012 09:44:46AM 11 points [-]
  1. Nice post, SIAI! Have a $5 donation!

  2. I tried a similar reinforcement technique on myself but it didn't stick because I couldn't find a reliable trigger condition for dispensing the reward.

  3. Does this mean that we should stop punishing ourselves for procrastination?

Comment author: Kaj_Sotala 21 June 2012 11:14:47AM *  14 points [-]

Does this mean that we should stop punishing ourselves for procrastination?

My personal experience strongly suggests that "stop punishing yourself for X" helps avoid X, for most if not all X. For instance, becoming a vegetarian was much easier when I didn't try to go cold turkey, but rather was fine with the fact that I would succumb to the lure of eating meat every now and then. When I did, I felt a little guilty, but then shrugged and thought that I'd try better the next time. I still fall victim to that temptation occasionally, but it's much more rare now than it used to be.

This might have something to do with the fact that if you punish yourself for trying and failing, you stop wanting to try in the first place, as it becomes associated with the negative emotions. Also, accepting and being okay with the occasional failure makes you treat it as a genuine choice where you have agency, not something that you're forced to do against your will.

See also It's okay to be (at least a little) irrational.

Comment author: Vladimir_Golovin 21 June 2012 12:05:24PM 5 points [-]

Perhaps this is why I like Autofocus better than GTD. "It is fine to have incomplete tasks in your task list".

Also, non-punishment for failures may be one of the distinctions between play-like work and work-like work.

Comment author: philh 21 June 2012 10:12:05AM 0 points [-]

I think next time I go shopping, I'll buy a pack of M&Ms, and take one whenever I make a git commit.

Comment author: thomblake 21 June 2012 02:17:20PM 2 points [-]

Careful not to over-reinforce! Think of the commit logs!

Comment author: Gastogh 21 June 2012 10:26:57AM 1 point [-]

On Skype with Eliezer, I said: "Eliezer, you've been unusually pleasant these past three weeks. I'm really happy to see that, and moreover, it increases my probability than an Eliezer-led FAI research team will work. What caused this change, do you think?"

Eliezer replied: "Well, three weeks ago I was working with Anna and Alicorn, and every time I said something nice they fed me an M&M."

Made me smile. Thanks for sharing.

Comment author: Viliam_Bur 21 June 2012 10:59:22AM 7 points [-]

Hopefully now that the experiment is over, they will return to the original schedule of giving M&Ms for new HPMoR chapters. Seriously, people are suffering here. :D

Comment author: Benquo 21 June 2012 01:54:18PM 12 points [-]

Too infrequent. They need to start by giving him an M&M every time he thinks about writing more HPMoR.

Comment author: gwern 21 June 2012 04:10:48PM 9 points [-]

But then when he starts actually writing, Eliezer will become diabetic!

Comment author: wedrifid 21 June 2012 04:31:54PM 5 points [-]

But then when he starts actually writing, Eliezer will become diabetic!

If he gets into flow quickly he could be safe. That would mean he is writing more HPMoR but not thinking about writing HPMoR.

Comment author: Benquo 21 June 2012 05:59:41PM *  6 points [-]

Shush, don't give away the plan!

But seriously, one can always increase the reward threshold once the first behavior has been firmly established.

Comment author: CharlieSheen 21 June 2012 02:36:32PM 2 points [-]

We have enough happy death spirals here.

Comment author: Eliezer_Yudkowsky 21 June 2012 03:02:03PM -1 points [-]

Whatever it is that rationalists are supposed to use instead of death spirals, we don't have enough of it until everything is funded. GO TEAM HAPPINESS!

Comment author: CharlieSheen 21 June 2012 03:14:02PM 5 points [-]

No.

Comment author: Strange7 21 June 2012 04:06:16PM 2 points [-]

How long has it been since you had a post that stabilized at net negative votes?

Comment author: wedrifid 21 June 2012 04:11:22PM 7 points [-]

29 March 2012

Comment author: gwern 21 June 2012 04:11:36PM 15 points [-]

'My Little SIAI: Positive Reinforcement is Magic'?

Comment author: wedrifid 21 June 2012 03:22:19PM 2 points [-]

We have enough happy death spirals here.

Who is happy about what?

Comment author: CharlieSheen 21 June 2012 03:28:22PM -2 points [-]

Let sleeping mind killers lie.

Comment author: wedrifid 21 June 2012 03:35:46PM *  7 points [-]

Your unsubstantiated assertion is rejected. There is nothing that fits that label here. There are things that people like to say that everyone else is in a happy death spiral about but they are too powerfully skeptical to be one of the gullible crowd. This is useless cheap signalling that is a net detriment.

-3 M&Ms for all instances of vague self-reinforcing negativity.

Comment author: CharlieSheen 21 June 2012 03:40:38PM *  4 points [-]

Very well I'll be explicit, I simply wanted to avoid a flame war. Most obvious example:

  • Relationship advice.

Now give me my M&Ms back.

Comment author: wedrifid 21 June 2012 03:54:04PM 2 points [-]

Very well I'll be explicit, I simply wanted to avoid a flame war. Most obvious example:

Relationship advice.

That isn't a Happy Death Spiral. It is a disgraceful mindkiller, sure. But it isn't remotely happy, isn't encouraged by universal reward and absence of criticism. It certainly isn't treated with or caused by the kind of positive feedback Luke's post advocates.

Now give me my M&Ms back.

You can have one back - but being fundamentally confused about what it is you are trying to criticize is only a weak mitigating factor.

Comment author: CharlieSheen 21 June 2012 03:59:08PM *  1 point [-]

That isn't a Happy Death Spiral. It is a disgraceful mindkiller, sure. But it isn't remotely happy, isn't encouraged by universal reward and absence of criticism. It certainly isn't treated with or caused by the kind of positive feedback Luke's post advocates.

Do you remember the online dating profile optimization thread? LessWrong went in Vladimir_M's words "healing crystal equivalent". That thread was a happy death spiral.

Also, if you recall, the critics in the relationship threads are getting tired and frustrated and just aren't showing up any more; someone even wrote out a full comment to that effect! Evaporative cooling, dude. Sure, we haven't had a relationship thread since Luke's part I., but it's only a matter of time before someone brings it up and the critics won't be there any more.

I only bother because I'm a Charlie Sheen.

Comment author: [deleted] 21 June 2012 03:53:32PM 1 point [-]

Does he have to vomit the M&M's back up?

I really hope that's not the procedure.

Comment author: Vaniver 21 June 2012 04:02:41PM 5 points [-]

Incidentally, chewing M&Ms and then spitting them out is a moderately effective way to wean yourself off of chocolate cravings.

Comment author: [deleted] 21 June 2012 06:23:43PM 2 points [-]

Any suggestions for sugar specifically? I like chocolate and can get it in low-sweet, high-theobromine form, but shaking off sugar cravings would do me a world of good.

Comment author: Vaniver 21 June 2012 06:29:48PM 4 points [-]

From my incomplete understanding of taste psychology, sugar is one of the instinctual taste preferences, whereas things like chocolate are learned taste preferences that are possible to unlearn. I've found that sugar/salt/fat cravings have been useful signals about the quality of my diet, and so would recommend taking a hard look at your diet before trying to alter those signals. (They could be mistuned, but I don't have any advice on how to correctly tune them.)

Comment author: [deleted] 21 June 2012 03:11:37PM 37 points [-]

"Eventually it hit me that the same techniques might work on that stubborn but lovable species, the American wife." "Back in Maine, I began thanking Amy if she threw one dirty shirt into the hamper. If she threw in two, I'd kiss her." "...After two years of exotic animal training, my marriage is far smoother, my wife much easier to love."

Comment author: lukeprog 22 June 2012 01:31:22AM *  8 points [-]

Have some tact, man. My post was fine, but you... you are a god damned sexist.

Comment author: shminux 21 June 2012 03:21:11PM *  10 points [-]

But if you aren't treating humans more like animals than most people are, then you're modeling humans poorly.

Thanks for pointing out this particular low-hanging fruit.

Eliezer replied: "Well, three weeks ago I was working with Anna and Alicorn, and every time I said something nice they fed me an M&M."

I wonder if they had just (re-)watched this Big Bang Theory episode.

you don't get a sea lion to balance a ball on the end of its nose by nagging

Hmm, I better keep this in mind at all times when dealing with my family.

Comment author: drethelin 21 June 2012 03:41:33PM 0 points [-]
Comment author: sketerpot 22 June 2012 12:47:01AM *  4 points [-]

You realize that almost all people express appreciation or displeasure routinely, right? It's a normal and reasonable part of human interaction, and it's a skill that someone can try to improve without needing to feel too conflicted. Love bombing is far more extreme than anything that this post even touched on. So, while we're linking to things, here's one:

http://lesswrong.com/lw/md/cultish_countercultishness/

Comment author: johnlawrenceaspden 21 June 2012 03:45:25PM 20 points [-]

Thank you Luke for this beautifully written post.

A while ago I saw a kindly waitress give my friend's two year old daughter a small cookie in a restaurant. Various emotions flickered across her tiny face, and then she made a decision, accompanied by a small smile.

She broke the cookie into three pieces and gave them to her brothers. Completely unprompted.

I couldn't believe my eyes. I asked my friend, who is a lecturer in experimental psychology, whether altruism was normal amongst very young siblings.

He looked a bit smug and said "Well we put a lot of reinforcement into that."

I hadn't really thought about what that meant until now. Your clear writing has made it obvious.

As a result of your post, I think I'm going to try deliberately modifying some of my own behaviours this way, and maybe try the techniques on some friends. (The first time, by the way, that I've changed my behaviour as a result of reading less wrong, rather than just treating it as philosophical crack.)

For friends it seems that sincere praise / avoiding criticism would be good, but what would you recommend as rewards to self? I'm pretty sure that nicotine and pizza slices would work for me, but I'm also sure that those aren't things I want to do more of.

Comment author: mstevens 21 June 2012 03:47:01PM 19 points [-]

To help someone improve at dance or sport, ignore poor performance but reward good performance immediately, for example by shouting "Good!" (Buzas & Ayllon 1981) The reason you should ignore poor performance is that if you say "No, you're doing it wrong!" you are inadvertently punishing the effort. A better response to a mistake would be to reinforce the effort: "Good effort! You're almost there! Try once more."

I've noticed in pilates classes with one specific teacher you get positive feedback in one specific situation - when you're having trouble, and have just barely managed something basic. This leads to the association that whenever you get positive comments you know you're doing badly.

Comment author: [deleted] 21 June 2012 06:20:33PM 6 points [-]

Yeah, there's kind of a perceptual/pattern-matching arms race going on there -- if you're too blatant about it, or the intended recipient of the reinforcement is just that perceptive, then they're reading the script too and it won't have the intended result. It could backfire (as in your example; semantically-positive reinforcement becomes pragmatically-negative), or send undesirable information ("you wouldn't have put it that way unless something were up, and that gives me a clue"), or open you to counter social-engineering scripts if the other party knows what they're doing.

Comment author: mstevens 21 June 2012 08:18:31PM 2 points [-]

In my case I'm not terribly perceptive, but there's a lot of repetition of the same situation to give you a clue.

Comment author: [deleted] 21 June 2012 04:12:59PM 0 points [-]

Maybe it's because behaviorist techniques like reinforcement feel like they don't respect human agency enough. But if you aren't treating humans more like animals than most people are, then you're modeling humans poorly.

But treating human beings, especially adults, like animals is characteristically unethical. Applying some system of reinforcement where someone has asked you to effectively treat their behavior is innocuous enough, as is of course treating yourself.

But generally manipulating the behavior of other people by means other than convincing them that they should behave in a certain way seems to me to be almost definitional of a dark art. If that's not controversial, then I think this article should be qualified appropriately: never do this to other people without their explicit consent.

Comment author: TheOtherDave 21 June 2012 05:00:27PM 8 points [-]

But treating human beings, especially adults, like animals is characteristically unethical.

This statement without context is clearly incorrect; there are all sorts of behaviors we can ethically execute with respect to both humans and other animals. I understand that what you and the OP both mean to connote is particular behaviors which we restrict in typical contexts only to non-human animals, but if you're going to label them as unethical when applied to humans it helps to specify what behaviors and context those are.

manipulating the behavior of other people by means other than convincing them that they should behave in a certain way seems to me to be almost definitional of a dark art.

That's a little more specific, but not too much, as I'm not really sure what you mean by "convincing" here.

That is, if at time T1 I don't exhibit behavior B and don't assert that I should exhibit B, and you perform some act A at T2 after which I exhibit B and assert that I should exhibit B, is A an act of convincing me (and therefore OK on your account) or not (and therefore unethical on your account)? How might I test that?

never do this to other people without their explicit consent

This, on the other hand, is clear. Thank you.
I disagree with it strongly.

Comment author: [deleted] 21 June 2012 05:06:53PM 1 point [-]

This statement without context is clearly incorrect...

You seem to know what I mean, so I won't go into a bunch of unnecessary qualifications.

is A an act of convincing me?

Not necessarily. Is the meaning of 'convince' really unclear? Threatening someone with a gun seems to satisfy your description, but it's obviously not a case of convincing. I'm not sure what you're unclear about.

I disagree with it strongly.

If you care to explain why, please do so.

Comment author: TheOtherDave 21 June 2012 06:45:02PM 5 points [-]

If you care to explain why, please do so.

Sure.

The easiest way to get at it is with an example.

Suppose I decide I want my coworkers to visit my desk more often at work, and therefore begin a practice of smiling at everyone who visits, keeping treats on my desk and inviting visitors to partake, being nicer to people when they visit me at my desk than I am at other times, and otherwise setting up a schedule of differential reinforcement designed to increase the incidence of desk-visiting behavior, and I do all of that without ever announcing to anyone that I'm doing it or why I'm doing it, let alone securing anyone's consent. (Let alone securing everyone's consent.)

Do you consider that an example of unethical behavior? I don't.

Now, maybe you don't either. Maybe it's "obviously" not an example of manipulating the behavior of other people by means other than convincing them that they should behave in a certain way. I don't really know, since you've declined to clarify your constraints. But it sure does seem to match what you described.

Comment author: [deleted] 21 June 2012 06:57:33PM 1 point [-]

Do you consider that an example of unethical behavior? I don't.

You're right that this doesn't seem quite unethical, but it is awfully creepy and I'm not sure how to pull my intuitions apart there. Sitting across from someone who is faking affection and smiles and pleasantries so as to manipulate my behavior would cause me to avoid them like the plague.

In professional environments I find this happens all the time, and when the fake friendliness is discovered as such, the effect reverses considerably. If it's terribly important to something's being effective that the person you're doing it to doesn't know what's going on, it's probably bad.

Comment author: TheOtherDave 21 June 2012 07:11:37PM 5 points [-]

(nods) Absolutely. I could have also framed it to make it seem far creepier, or to make it seem significantly less creepy.

In particular, the use of loaded words like "faking" and "manipulate" ups the creepy factor of the description a lot. The difference between faking affection and choosing to be affectionate is difficult to state precisely, but boy do we respond to the difference between the words!

I agree that most activities which depend on my ignorance for their effectiveness are bad. I even agree that a higher percentage of activities which depend on my ignorance for their effectiveness are bad than the equivalent percentage of activities that don't so depend.

That said, you seem to be going from that claim to the implicit claim that they are bad by virtue of depending on my ignorance. That's less clear to me.

Comment author: [deleted] 21 June 2012 07:49:47PM 3 points [-]

I could have also framed it to make it seem far creepier

I'll put it simply: if someone asks me about my kids, neither to be polite nor because they care, but because they want to change the way I behave, then they're (in most cases) being manipulative and insincere. While perhaps they're not wronging me, per se, it's certainly not something that speaks well of them, ethically speaking. If you find this controversial, then you surprise me.

It would be bad advice, I think, to encourage people to use positive reinforcement on others when their ignorance is necessary for it to be effective. Not just practically bad advice, as people are pretty good at picking up on fake friendliness. But full stop ethically damaging advice, if taken seriously. I'm not saying that every such case is going to be unethical, but I'm not in the business of lawlike ethical principles anyway.

That said, you seem to be going from that claim to the implicit claim that they are bad by virtue of depending on my ignorance. That's less clear to me.

No, what I said was that behaviors which depend on someone's ignorance for their effectiveness are often also bad behaviors. I didn't say anything one way or the other about a stricter relation between the two properties, but I'll say now that I don't think they're unrelated.

Comment author: TheOtherDave 21 June 2012 08:37:24PM *  1 point [-]

I agree that asking you about your kids solely to change your behavior is manipulative.
I also agree that it's insincere. (Which is an entirely distinct thing.)
I would also say that asking you about your kids solely to be polite is insincere.
I would not agree that any of these are necessarily unethical.

I am not quite sure what you mean by "ethically damaging advice."
I agree with you that it's not always unethical to positively reinforce others without their knowledge.
I would agree that "Positively reinforcing others without their knowledge is a good thing to do, do it constantly" is advice that, if taken seriously, would often lead me to perform unethical acts. I can accept calling it unethical advice for that reason, I suppose.
But I also think that "Positively reinforcing others without their knowledge is a bad thing to do, never do it." is unethical advice in the same (somewhat unclear) sense.

I agree that behaviors that depend on others' ignorance are often also bad behaviors.
Behaviors that depend on others' knowledge are also often bad behaviors.

Comment author: [deleted] 21 June 2012 08:50:11PM 1 point [-]

Agreed on all counts. In fact, it doesn't look like we disagree at all, judging from your comment.

Comment author: TheOtherDave 21 June 2012 08:51:36PM 0 points [-]

Oh good!
When you started out by saying "never do this," I concluded otherwise.
I'm pleased to discover I was wrong.

Comment author: AdeleneDawner 21 June 2012 11:07:34PM 0 points [-]

You and Esar both: Taboo 'creepy'? Particularly with an eye to 'why is it important that this situation evokes this emotion'?

Comment author: TheOtherDave 21 June 2012 11:59:39PM 0 points [-]

Well, I think it's important because IMHO that negative emotional response is what underlies the (incorrect) description of the corresponding behavior as unethical. But I expect Esar would find that implausible.

Comment author: AdeleneDawner 22 June 2012 12:05:19AM 1 point [-]

'Taboo with an eye to this question', not 'answer this question'. I'd already noticed the pattern that people consider finding something creepy to be sufficient reason to label it unethical, but that observation isn't useful for very much beyond predicting other people's labeling habits.

Comment author: TheOtherDave 22 June 2012 12:23:50AM 0 points [-]

Oh, I see.
Sorry, misunderstood.
I could replace "creepy" everywhere it appears with "emotionally disquieting", but I'm not sure what that would help. I figured using the same language Esar was using would be helpful, but I may well have been wrong.

Comment author: TimS 21 June 2012 05:49:33PM *  2 points [-]

Eliezer replied: "Well, three weeks ago I was working with Anna and Alicorn, and every time I said something nice they fed me an M&M."

That story doesn't trouble you at all?

For most people, there's lots of low hanging fruit from trying to recognize when they are reinforcing and punishing behaviors of others. Also, positive reinforcement is more effective at changing behavior than positive punishment.

But that doesn't mean that we should embrace conditioning-type behavior-modification wholesale. I'm highly doubtful that conditioning responses are entirely justifiable by decision-theoretic reasons. And "not justifiable by decision-theoretic reasons" is a reasonable definition of non-rational. Which implies that relying on those types of processes to change others' behaviors might be unethical.

Comment author: TheOtherDave 21 June 2012 06:51:07PM 4 points [-]

Does it trouble me at all? I suppose. Not a huge amount, but some. Had Esar said "Doing this to people without their consent is troubling" rather than "never do this to other people without their explicit consent" I likely wouldn't have objected.

My response to the rest of this would mostly be repeating myself, so I'll point to here instead.

More generally, "conditioning-type behavior-modification" isn't some kind of special category of activity that is clearly separable from ordinary behavior. We modify one another's behavior through conditioning all the time. You did it just now when you replied to my comment. Declaring it unethical across the board seems about as useful as saying "never kill a living thing."

Comment author: Vaniver 21 June 2012 06:20:18PM 9 points [-]

But treating human beings, especially adults, like animals is characteristically unethical.

It seems to me like the flow is in the reverse direction: many unethical manipulations involve treating adults like animals. But people who skillfully use positive reinforcement are both more pleasant to be around and more effective, which seems like something ethical systems should point you towards, not away from.

Comment author: [deleted] 21 June 2012 06:28:39PM 2 points [-]

That's a fair point: I may have been treating a conditional like a bi-conditional. I think my sense of the matter is this: if a friend told me that he spent a lot of our time together thinking through ways to positively reinforce some of my behaviors, even to my benefit, I would become very suspicious of him. I would feel that I'd been treated as a child or a dog. His behavior would seem to me to be manipulative and dishonest, and I think I would feel this way even if I agreed that the results of his actions were on the whole good and good for me.

Do you think this sort of reaction on my part would be misguided? Or am I on to something?

Comment author: Vaniver 21 June 2012 06:51:20PM *  11 points [-]

I agree with you that your autonomy is threatened by the manipulations of others. But threats only sometimes turn into harm- distinguishing between manipulations you agree with and disagree with is a valuable skill.

Indeed, there's a general point that needs to be made about human interaction, and another about status, but first a recommendation: try to view as many of your actions as manipulations as possible. This will help separate out the things that, on reflection, you want to do and the things that, on reflection, you don't want to do. For example:

if a friend told me that he spent a lot of our time together thinking through ways to positively reinforce some of my behaviors, even to my benefit, I would become very suspicious of him. I would feel that I'd been treated as a child or a dog. His behavior would seem to me to be manipulative and dishonest,

Emphasis mine. The reaction- of calling his behavior manipulative and dishonest- feels like it punishes manipulation, which you might want to do to protect your autonomy. But it actually punishes honesty, because the trigger was your friend telling you! Now, if your friend wants to change you, they'll need to try to do it subtly. Your reaction has manipulated your friend without his explicit consent- and probably not in the direction you wanted it to.

So, the general point: human social interaction is an incredibly thorny field, in part because there are rarely ways to learn or teach it without externalities. Parents, for example, tell their children to share- not because sharing is an objective moral principle, but because it minimizes conflict. As well, some aspects of human social interaction are zero sum games- in which people who are skilled at interaction will lose if others get better at interaction, and thus discourage discussions that raise general social interaction skills.

The status interpretation: generally, manipulation increases the status of the manipulator and decreases the status of the manipulated. Resistance to manipulation could then be a status-preserving move, and interest in manipulation could be a status-increasing move. What articles like this try to do is lower the status effects of manipulation (in both directions)- Luke proudly recounts the time Eliezer manipulated him so that he could better manipulate Eliezer. If being molded like this is seen more positively, then resistance to being molded (by others in the community) will decrease, and the community will work better and be happier. As well, I suspect that people are much more comfortable with manipulations if they know how to do them themselves- if positive reinforcement is a tool used by creepy Others, it's much easier to dislike than if it's the way you got your roommate to finally stop annoying you.

Comment author: wedrifid 21 June 2012 06:59:35PM 7 points [-]

distinguishing between manipulations you agree with and disagree with is a valuable skill.

This, with extra emphasis!

Comment author: TheOtherDave 21 June 2012 07:00:19PM 1 point [-]

Oh, you're definitely on to something, and it's something important.

That said, I don't think what you're on to has to do with whether and when it's ethical to manipulate people's behavior.

Comment author: [deleted] 21 June 2012 07:32:53PM 0 points [-]

So what am I on to then?

Comment author: TheOtherDave 21 June 2012 08:18:09PM 2 points [-]

Roughly, that we often respond to others' ability to cause us harm (whether by modifying our behavior or our bank accounts or our internal organs or whatever other mechanism) as a threat, independent of their likelihood of causing us harm.

So if you demonstrate, or even just tell me about, your ability to do these things, then while depending on the specific context, my specific reaction will be somewhat different... my reaction to you knowing my bank PIN number will be different from my reaction to you knowing how to modify my behavior or how to modify the beating of my heart or how to break into my home... they will all have a common emotional component: I will feel threatened, frightened, suspicious, attacked, violated.

That all is perfectly natural and reasonable. And a common and entirely understandable response to that might be for me to declare that, OK, maybe you are able to do those things, but a decent or ethical person never will do those things. (That sort of declaration is one relatively common way that I can attempt to modify your likelihood of performing those actions. I realize that you would only consider that a form of manipulation if I realize that such declarations will modify your likelihood of performing those actions. Regardless, the declaration modifies your behavior just the same whether I realize it or not, and whether it's manipulation or not.)

But it doesn't follow from any of that that it's actually unethical for you to log into my bank account, modify my heartbeat, break into my home, or modify my behavior. To my mind, as I said before, the determiner of whether such behavior is ethical or not is whether the result leaves me better or worse off.

Breaking into my home to turn off the main water valve to keep my house from flooding while I'm at work is perfectly ethical, indeed praiseworthy, and I absolutely endorse you doing so. Nevertheless, I suspect that if you told me that you spent a lot of time thinking about how to break into my home, I would become very suspicious of you.

Again, my emotional reaction to your demonstrated or claimed threat capacity is independent of my beliefs about your likely behaviors, let alone my beliefs about your likely intentions.

Comment author: [deleted] 21 June 2012 08:30:42PM 1 point [-]

Roughly, that we often respond to others' ability to cause us harm (whether by modifying our behavior or our bank accounts or our internal organs or whatever other mechanism) as a threat, independent of their likelihood of causing us harm.

This seems very implausible to me. I often encounter people with the ability to do me great harm (a police officer with a gun, say), and this rarely if ever causes me to be angry, or feel as if my dignity has been infringed upon, or anything like that. Yet these are the reactions typically associated with finding out you've been intentionally manipulated. Do you have some independent reason to believe this is true?

Comment author: TheOtherDave 21 June 2012 08:41:39PM 0 points [-]

Yes, but no reasons I can readily share. And, sure, I might be wrong.

Comment author: [deleted] 21 June 2012 07:36:11PM 3 points [-]

I think it's misguided personally. You're already being manipulated this way by your environment whether or not you realize it.

Comment author: [deleted] 21 June 2012 07:51:48PM 2 points [-]

You're already being manipulated this way by your environment whether or not you realize it.

Well, I'm claiming that this kind of manipulation is often, even characteristically, unethical. Since my environment is not capable of being ethical or unethical (that would be a category mistake, I think) then that's not relevant to my claim.

Comment author: [deleted] 21 June 2012 07:59:07PM 1 point [-]

I was referring though to the case of your friend using reinforcement to alter your behavior in a way that would benefit you. I just have a hard time seeing someone trying to help you as an unethical behavior.

Comment author: AdeleneDawner 21 June 2012 11:42:58PM 1 point [-]

I just have a hard time seeing someone trying to help you as an unethical behavior.

It does depend on whose definition of 'help' they're using.

Comment author: [deleted] 22 June 2012 12:08:15AM 0 points [-]

Good point. Do you think it would be ethical if they were helping to fulfill your preferences?

Comment author: AdeleneDawner 22 June 2012 01:07:58AM 0 points [-]

Usually, yes, though there are several qualifications and corner cases.

Comment author: [deleted] 22 June 2012 01:37:57AM 0 points [-]

That's fair. I should tone down my point and say that doing this sort of thing is disrespectful, not evil or anything. It's the sort of thing parents and teachers do with kids. With your peers, unsolicited reinforcement training is seen as disrespectful because it stands in lieu of just explaining to the person what you think they should be doing.

Comment author: TheOtherDave 22 June 2012 01:55:32AM 0 points [-]

In my experience, telling other people how I think they should behave is also often seen as disrespectful.

Comment author: [deleted] 22 June 2012 02:03:44AM 0 points [-]

Often it is, we agree. But it's the 'telling' there that's the problem. A respectful way to modify someone's behavior is to convince them to do something different (which may mean convincing them to subject themselves to positive reinforcement training). The difference is often whether we appeal to someone's rationality, or take a run at their emotions.

Comment author: TheOtherDave 22 June 2012 02:28:36AM 1 point [-]

A respectful way to modify someone's behavior is to convince them to do something different

I agree that there are respectful ways to convince me to do something different, thereby respectfully modifying my behavior.
Many of those ways involve appealing to my rationality.
Many of those ways involve appealing to my emotions.

There are also disrespectful ways to convince me to do something different.
Many of those ways involve appealing to my rationality.
Many of those ways involve appealing to my emotions.

Comment author: Swimmer963 21 June 2012 07:50:17PM 0 points [-]

I don't think I would be suspicious of him, as long as I agreed with the behaviours he was trying to reinforce. (I don't know for sure–my reactions are based only on a thought experiment.) I think I would be grateful, both that he cared enough about me to put that much time and effort in, and that he considered me emotionally mature enough to tell me honestly what he was doing.

However, I do think that being aware of his deliberate reinforcement might make it less effective. Being reinforced for Behaviour A would feel less like "wow, the world likes it when I do A, I should do it more!" and more like "Person X wants me to do A", which is a bit less motivating.

Comment author: [deleted] 21 June 2012 08:03:43PM 2 points [-]

I don't think I would be suspicious of him, as long as I agreed with the behaviours he was trying to reinforce.

Really? So say I tell you that all those times that I smiled at you and asked how you were doing were part of a long term plan to change the way you behave. The next day I smile and ask you how you're doing. Has my confession done nothing to change the way you think about my question?

I'm saying that things like smiles and friendly, concerned questions have a certain importance for us that is directly undermined by their being used for the purposes of changing our behavior. I don't think using them this way is always bad, but it seems to me that people who generally treat people this way are people we tend not to like once we discover the nature of their kindness.

Comment author: Swimmer963 21 June 2012 08:28:26PM 0 points [-]

Like I said, thought experiments about "how would I feel if X happened" are not always accurate. However, when I try to simulate that situation in my head, I find that although I would probably think about his smile and question differently (and be more likely to respond with a joke along the lines of "trying to reinforce me again, huh?") I don't think I would like him less.

Anyway, I think I regularly use smiles and "how are you doing?" to change the way people behave...namely, to get strangers, i.e. coworkers at a new job, to start liking me more.

Comment author: [deleted] 21 June 2012 08:43:31PM 0 points [-]

Well, I guess I'll tap out then. I'm not sure how to voice my position at this point.

Comment author: Swimmer963 21 June 2012 09:53:09PM 1 point [-]

Your position is that you have a certain emotional response to knowing someone is trying to modify your behaviour. My position is that I have a different emotional response. I can imagine myself having an emotional response like yours...I just don't. (Conversely, I can imagine someone experiencing jealousy in the context of a relationship, but romantic jealousy isn't something I really experience personally.) I don't think that makes either of us wrong.

Comment author: [deleted] 21 June 2012 10:42:12PM 0 points [-]

Well, my position is that doing things like asking how someone is doing so as to reinforce behavior rather than because you want to know the answer is ethically bad. I used the example of the friend to try to motivate and explain that position, but at some point if you are totally fine with that sort of behavior, I don't have very much to argue with. I think you're wrong to be fine with that, but I also don't think I can mount a convincing argument to that effect. So you've pretty much reached the bottom of my thoughts on the matter, such as they are.

Comment author: shminux 21 June 2012 11:28:01PM *  1 point [-]

Well, my position is that doing things like asking how someone is doing so as to reinforce behavior rather than because you want to know the answer is ethically bad.

Can you express your personal ethics explicitly and clarify where it comes from?

Comment author: Swimmer963 22 June 2012 02:12:45AM 1 point [-]

I'm curious about whether your reasons for considering this kind of behaviour "unethical" are consequentialist (i.e. a world where people do X is going to be worse overall than a world where no one does X) or deontological (there are certain behaviours, like lying or stealing, that are just bad no matter what world they take place in, and using social cues to manipulate other people is a behaviour that falls into that class.)

Comment author: wedrifid 21 June 2012 04:50:22PM *  7 points [-]

The central lesson I learned from exotic animal trainers is that I should reward behavior I like and ignore behavior I don't. After all, you don't get a sea lion to balance a ball on the end of its nose by nagging. The same goes for the American husband.

Back in Maine, I began thanking Scott if he threw one dirty shirt into the hamper. If he threw in two, I'd kiss him. Meanwhile, I would step over any soiled clothes on the floor without one sharp word, though I did sometimes kick them under the bed. But as he basked in my appreciation, the piles became smaller.

My wife, if pulling that kind of stunt, would quickly find that her affections were shunned and her thanks were met with clear contempt (after she was asked politely not to do that the first time). It is almost certainly not in her interests to produce a pavlovian association between her affections and attempts to control me against my wishes. My aversion to hostile takeover of internal motivations is much stronger than my desire for the affections of any particular individual.

This would be entirely different if I had made a prior agreement regarding shirts and hampers. Making it motivationally easier and more enjoyable to do things I am willing to do is to be encouraged.

Comment author: pjeby 21 June 2012 05:27:30PM 6 points [-]

My wife, if pulling that kind of stunt, would quickly find that her affections were shunned and her thanks were met with clear contempt

Seriously? You'd shun your wife because she said thank you? i.e.

I began thanking Scott if he threw one dirty shirt into the hamper

Comment author: [deleted] 21 June 2012 06:15:21PM 12 points [-]

Some people react quite viscerally to the awareness that another party is trying intentionally to steer their behavior in any way. It seems to just be a massive squick button for some (indeed, I notice that most randomly-selected people who are made aware of explicit attempts to condition behavior react with discomfort at minimum); for others, there seems to be a correlation with triggers gained from abusive interactions earlier in life; a few I knew who reacted strongly showed strong indications of sociopathy and seemed to instinctively feel violated if someone else successfully, or even just obviously, tried to affect their behavior in a deliberate manner toward some end (a normal part of cognition and social interaction for them directed at others).

Comment author: wedrifid 21 June 2012 06:21:11PM 7 points [-]

Seriously? You'd shun your wife because she said thank you?

(No, I said I would shun kisses delivered under those circumstances. No cutting and pasting of my keywords for the sake of hyperbole thanks.)

If people use their affection in a way that is obviously intended to systematically manipulate me to do things that I do not, in fact, wish to do then yes, of course those instances of affection I will shun. While I know some people are more tolerant to that kind of blatant disrespect I would expect you to at least be able to comprehend the subset of people that will not.

I'm afraid that all women who want kisses to serve the role of doggy treats within our relationship are out of luck. I have yet to experience a problem with having that policy. My model of myself predicts that rewarding hostile-to-my-interests-reward-training with increased compliance or acceptance would leave me with relationships that were far less satisfying and in particular far less enjoyment of displays of affection.

Comment author: Vaniver 21 June 2012 06:26:39PM 10 points [-]

So, I have to ask: do you in fact have a wife?

Comment author: TimS 21 June 2012 06:34:20PM 3 points [-]

The question is not whether positive reinforcement is effective in changing your behavior. The question is whether kisses are positive reinforcement in particular contexts.

Suppose your spouse says, "Please pick up my prescription from the store" and you don't want to, but you do it anyway. When you get back, spouse says "Thanks for dealing with that." Do you really think continued experiences like that won't increase the frequency of the behavior "Run an errand even when I don't want to"?

Comment author: [deleted] 21 June 2012 06:40:03PM 1 point [-]

Do you really think continued experiences like that won't increase the frequency of the behavior "Run an errand even when I don't want to"?

I think it depends a lot on her intention. If she says 'thank you' for the purposes of positive reinforcement, I mean if she thinks about her 'thank you's' that way, then I think she's being manipulative.

If she says 'thank you' to say what those words mean, namely, that she's grateful, then even if this does have the effect of positive reinforcement there's nothing wrong about her behavior.

Comment author: TheOtherDave 21 June 2012 06:57:05PM 14 points [-]

I find the idea of endorsing manipulative behavior if and only if I remain unaware of the fact that it's manipulative behavior deeply troubling.

It strikes me as similar to saying that hurting people is OK as long as I don't know I'm hurting them. No, it isn't. If hurting people is not OK, then it follows that I ought not hurt people, and learning to recognize when I'm hurting people is part of that, and I ought to learn to recognize it. The behavior doesn't suddenly become "not OK" the moment I learn to recognize it... it never was OK, and now I know it and can improve.

Conversely, if hurting people is OK, then it's OK whether I know I'm doing it or not.

The same goes for manipulating people. Whether I know I'm doing it or not isn't the determiner of whether I'm doing good or ill.

To my mind, the determiner of whether I'm doing good or ill is whether, when I'm done doing it, we're all better off or worse off.

Comment author: [deleted] 21 June 2012 06:58:51PM *  2 points [-]

I find the idea of endorsing manipulative behavior if and only if I remain unaware of the fact that it's manipulative behavior deeply troubling.

If you don't know you're manipulating someone, you're not manipulating someone. Manipulation is an intentional behavior, like lying, or congratulating, or taking a vow. Knowing what you're doing is part of doing it.

Comment author: TheOtherDave 21 June 2012 07:06:19PM 9 points [-]

Yeah, I pretty much disagree with this statement completely.

Comment author: [deleted] 21 June 2012 07:32:23PM 1 point [-]

That's... incredible to me. Do you disagree that there is such a category (i.e. actions you have to know you're doing in order to be doing them at all), or that manipulation falls under it?

Comment author: TimS 21 June 2012 07:40:31PM 1 point [-]

This exchange may be helpful to understand TheOtherDave's point.

Comment author: TheOtherDave 21 June 2012 07:45:05PM 2 points [-]

I disagree that manipulation falls under it.

Comment author: TimS 21 June 2012 07:12:06PM 3 points [-]

I agree with your point, but I think that "manipulate" needs to be tabooed. If we define manipulate as "acts that tend to change the behavior of others" then I agree with your implicit point that it is impossible to interact with others without changing their behaviors, and there is nothing wrong with thinking about how I would like someone else to behave when considering how I interact with them.

That said, there are connotations of manipulate as the word is ordinarily used that are not captured by the way you (and I) are using the word.

Comment author: TheOtherDave 21 June 2012 07:19:32PM 2 points [-]

Sure. I'm perfectly happy to drop the word altogether and instead talk about changing the behavior of others.

Comment author: wedrifid 21 June 2012 06:49:27PM 0 points [-]

The question is not whether positive reinforcement is effective in changing your behavior. The question is whether kisses are positive reinforcement in particular contexts.

Neither of those seem to be the question - at least neither of those are the question I'm asking when I evaluate whether a given trend of behaviors constitutes a Defection::Manipulation.

Suppose your spouse says, "Please pick up my prescription from the store" and you don't want to, but you do it anyway. When you get back, spouse says "Thanks for dealing with that."

That is kind of me and it would all else being equal be somewhat rude if she didn't thank me for doing a favour like that. (This assumes a weak instantiation of 'want' such that I reflectively endorse doing the errand but experience emotional reluctance. If I reflectively endorse not doing the errand but still do then that is not kind but weak.)

Do you really think continued experiences like that won't increase the frequency of the behavior "Run an errand even when I don't want to"?

Being influenced isn't something to be universally avoided. Having negotiated boundaries subverted by the strategic use of kisses as doggy treats is. That way leads to madness - often for both parties.

Comment author: pjeby 21 June 2012 09:04:15PM 5 points [-]

If people use their affection in a way that is obviously intended to systematically manipulate me to do things that I do not, in fact, wish to do then yes, of course those instances of affection I will shun.

Since positive reinforcement can only be applied after you already do a thing, then presumably, you at least wished to do it once. So, how is providing you with a bonus to something you've already done, manipulating you to do something you don't "wish to do"?

Comment author: wedrifid 22 June 2012 03:04:27AM 2 points [-]

Caveat: I don't know why the husband in question doesn't just put his damn clothes in the hamper. Doesn't the idea of having soiled clothes lying around repulse him anyway? Especially when sharing the space with another. I mean... ewww. But now back to assuming the target behavioral territory is not already granted by the obvious Schelling point or prior arrangement.

So, how is providing you with a bonus to something you've already done, manipulating you to do something you don't "wish to do"?

It seems you wish to unilaterally accept rewarding behavior as positive. I don't. I have no trouble detecting when rewards are being used as "approximations" towards a behavioral landscape that I clearly don't want or, especially, have previously declared that I would not accept. I am also able to predict - by reference to past experience and knowledge of my own preferences - that encouraging that reward pattern gives undesired outcomes. As Vaniver mentioned, an important skill to develop is the ability to detect the difference between desired and undesired manipulations.

As a somewhat separate issue, excessive use of physical affection (kisses, hugs, sex) as a "reward" for good behavior changes the experience of those activities - and not in a good way.

Comment author: cicatriz 21 June 2012 05:19:54PM 2 points [-]

This seems to contradict the very powerful effect of learning from failure and corrective feedback. See http://www.wired.com/wiredscience/2011/10/why-do-some-people-learn-faster-2/ for an accessible overview.

I'd conjecture this works better when someone can already perform the desired behavior and wants to form a habit, whereas learning from failure comes in when new information needs to be stored and reorganized.

Comment author: Will_Sawin 21 June 2012 05:51:44PM 3 points [-]

That article especially seems to demonstrate the critical importance of choosing what you reinforce, and how a teacher's model of what they are reinforcing may differ from the students'.

Comment author: Swimmer963 21 June 2012 08:16:36PM *  1 point [-]

I was about to reply "hmm, I wonder how you could reward someone for making an effort rather than just for succeeding, or reward them for noticing when they make a mistake." Then I read the article, and realized that that's basically what it talks about.

Yeah, failures are important. But the natural tendency, whether teaching others or trying to change our own behaviour, is to correct and criticize failures–which is basically negative reinforcement and trains people to stop trying because failing is so painful. The interesting new point in the article is that positively reinforcing for success, if done in a certain way (the "wow you're smart!" group of kids) can actually have the same effect as negatively reinforcing for failure.

Comment author: potato 21 June 2012 05:55:31PM 1 point [-]

Does this still work if I reinforce myself? Every time I read 5 lesswrong articles in a day, I give myself a reward. Or every time I have a cigarette, I kick a brick wall with no shoes on. If I was consistent with this for a long time, would it work?

Comment author: wedrifid 21 June 2012 06:25:28PM 7 points [-]

Or every time I have a cigarette, I kick a brick wall with no shoes on. If I was consistent with this for a long time, would it work?

Totally. The wall will fall over in 20 years, tops!

The actual answer is maybe - it works for some but not others. A common point of failure is that people just train themselves to cheat and take the reward anyway. I'm not sure what the response rate is when full compliance to the reward schedule is assumed.

Comment author: TheOtherDave 21 June 2012 07:04:33PM 1 point [-]

It can. Basically the failure modes are the same as when reinforcing others. In particular, it's common to fail to maintain consistent thresholds of self-reward.

Comment author: Oscar_Cunningham 21 June 2012 06:17:40PM 1 point [-]

Note that there are many circumstances when it is right to criticise. For instance group brainstorming exercises are more productive if the participants criticise each other's ideas.

Comment author: mapnoterritory 21 June 2012 07:30:51PM *  9 points [-]

Daniel Kahneman in Thinking, Fast and Slow:

I had stumbled onto a significant fact of the human condition: the feedback to which life exposes us is perverse. Because we tend to be nice to other people when they please us and nasty when they do not, we are statistically punished for being nice and rewarded for being nasty.

The reason for that lies in regression to the mean when training (example of flight instructors in the Israeli Air Force):

I pointed out to the instructors that what they saw on the board coincided with what we had heard about the performance of aerobatic maneuvers on successive attempts: poor performance was typically followed by improvement and good performance by deterioration, without any help from either praise or punishment.
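For anyone who wants to see the effect rather than take it on faith, here is a minimal sketch in Python. It is my own toy model, not anything from the book: every attempt is a fixed skill level plus independent Gaussian noise, and the cutoffs of one standard deviation for "poor" and "good" are arbitrary assumptions. Nobody in the simulation ever praises or criticises anything, yet poor attempts are followed by improvement and good attempts by deterioration.

```python
import random

def simulate(n_trials=100_000, seed=0):
    """Each attempt = fixed skill + independent noise; no feedback of any kind."""
    rng = random.Random(seed)
    skill = 0.0  # assumed constant: the trainee never actually learns anything
    scores = [skill + rng.gauss(0, 1) for _ in range(n_trials)]

    after_poor, after_good = [], []
    for prev, nxt in zip(scores, scores[1:]):
        if prev < -1.0:    # a notably poor attempt (the kind that draws criticism)
            after_poor.append(nxt - prev)
        elif prev > 1.0:   # a notably good attempt (the kind that draws praise)
            after_good.append(nxt - prev)

    def mean(xs):
        return sum(xs) / len(xs)

    return mean(after_poor), mean(after_good)

change_after_poor, change_after_good = simulate()
print(f"average change after a poor attempt: {change_after_poor:+.2f}")
print(f"average change after a good attempt: {change_after_good:+.2f}")
```

The first number comes out positive and the second negative. An instructor who yells after poor attempts and praises after good ones would see exactly this pattern and could easily conclude that yelling works and praise backfires, even though in this model feedback does nothing at all.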

Since positive reinforcement is so counterintuitive: don't forget to reward yourself for rewarding somebody for good behaviour! :)

Comment author: faul_sname 22 June 2012 01:47:15AM 9 points [-]

Speaking of regression to the mean, that seems to be one topic that wasn't really covered in the sequences that really should have been.

Comment author: EphemeralNight 21 June 2012 09:43:29PM 4 points [-]

The reason you should ignore poor performance is that if you say "No, you're doing it wrong!" you are inadvertently punishing the effort. A better response to a mistake would be to reinforce the effort: "Good effort! You're almost there! Try once more."

I am probably unusual in this regard, but I think I would find both approaches equally aggravating. If someone points out that I've made a mistake, anything other than a concise detailing of exactly how what I did differs from what I was supposed to do, is just going to irritate me. Also, my brain tends to interpret being ignored as a signal that I'm doing it correctly.

Comment author: Swimmer963 21 June 2012 09:56:04PM 2 points [-]

If someone points out that I've made a mistake, anything other than a concise detailing of exactly how what I did differs from what I was supposed to do, is just going to irritate me.

Is this because of the "damn it, I know I made a mistake, you telling me I did doesn't help!" effect? I get that too... A good thought experiment is that if I was making a type of mistake that I couldn't automatically tell I was making on my own, I would prefer it to be pointed out, even if not in a concise detailed fashion–the idea of not knowing that I'm making a mistake is kind of scary. What would your reaction be in that situation?

Comment author: EphemeralNight 21 June 2012 10:23:58PM *  2 points [-]

Is this because of the "damn it, I know I made a mistake, you telling me I did doesn't help!" effect?

No, I react the same way whether I was previously aware of my mistake or not. I only experience that effect when I'm told to do something I am already doing.

A good thought experiment is that if I was making a type of mistake that I couldn't automatically tell I was making on my own, I would prefer it to be pointed out, even if not in a concise detailed fashion–the idea of not knowing that I'm making a mistake is kind of scary. What would your reaction be in that situation?

Pragmatically, we as humans, just barely over the threshold into sapient intelligence, make mistakes we're not aware of constantly. If we didn't, we wouldn't need a superintelligence to fix the world; we'd have already done it ourselves. So finding the concept scary seems kind of pointless. (Sort of like being hydrophobic about the water in one's own body.) However, I would, of course, rather be aware of my mistakes than not.

But none of this is really on the topic, which was that the listed reinforcements don't seem even remotely applicable to humans in a universal way.

Comment author: Swimmer963 22 June 2012 02:16:26AM 2 points [-]

So finding the concept scary seems kind of pointless. However, I would, of course, rather be aware of my mistakes than not.

My actions have impacts on others. In general, I prefer to help other people or at least not harm them–however, I may harm someone by mistake, and I really don't want this to happen. If I make a mistake once and I realize it–fine, hopefully no harm done, I won't do it again. If I make a mistake and I don't know about it, well, maybe no harm done that time in particular, but I'm likely to keep making this mistake over and over, and possibly the first time I'll find out is when there is harm done. I think that justifies finding it scary.

Comment author: shminux 21 June 2012 10:06:59PM *  3 points [-]

Lessons learned:

  • continue to mentally /ignore people and posts I don't care for on IRC and online forums

  • never comment on bad posts or explain my downvote on LW

  • be more generous with upvoting good contributions and give a short praise when warranted.