TheOtherDave comments on The Power of Reinforcement - Less Wrong

96 Post author: lukeprog 21 June 2012 01:42PM


Comments (467)


Comment author: TheOtherDave 21 June 2012 03:39:47AM 9 points

(nods) For my own part, it's frequently worse than random... when I don't attend to what I'm doing, I frequently berate or otherwise punish myself for attempts to achieve a target that fall short of that target, and I'm more likely to do that the more I value achieving the target. Which is a great way to extinguish the behaviors I value.

Comment author: Viliam_Bur 21 June 2012 09:45:25AM * 5 points

I suspect it's very difficult to design the right reinforcement strategy. It's easy to reward something that seems related to the goal but can gradually become a replacement for it.

For example, rewarding success and punishing failure reinforces choosing only trivial tasks, which prevents learning new things. Rewarding starting new things reinforces starting new tasks without finishing them, and also choosing tasks because they are new rather than because they are useful. Etc.
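To make the first failure mode concrete, here is a minimal sketch, not a model of any real person: a toy epsilon-greedy learner that is rewarded (+1) for success and punished (-1) for failure. The task names, the success probabilities, and the function `simulate` are all hypothetical, chosen only for illustration.

```python
import random

def simulate(steps=2000, epsilon=0.1, seed=0):
    """Toy epsilon-greedy learner choosing between a trivial and a hard task."""
    rng = random.Random(seed)
    # Hypothetical success probabilities, assumed for illustration only.
    p_success = {"trivial": 0.95, "hard": 0.30}
    value = {task: 0.0 for task in p_success}  # running average reward per task
    count = {task: 0 for task in p_success}    # how often each task was chosen
    tasks = list(p_success)
    for _ in range(steps):
        if rng.random() < epsilon:
            task = rng.choice(tasks)           # occasional exploration
        else:
            task = max(tasks, key=value.get)   # otherwise pick the best-looking task
        # Reward success (+1) and punish failure (-1).
        reward = 1.0 if rng.random() < p_success[task] else -1.0
        count[task] += 1
        value[task] += (reward - value[task]) / count[task]
    return count, value

counts, values = simulate()
share_trivial = counts["trivial"] / sum(counts.values())
print(f"share of trivial choices: {share_trivial:.2f}")
```

Under these assumed numbers the learner ends up spending most of its time on the trivial task, because nothing in the reward signal reflects how much is learned from attempting the hard one.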

Rational thinking about consequences, and changing the strategy when necessary, cannot be avoided. So perhaps this should be reinforced. But how do we distinguish between genuine rationality and signalling? Yeah, rationalists should win, but by rewarding success and punishing failure... see the previous paragraph.

Anyway, many people do worse than random, so some reinforcement can be used to improve the situation.

EDIT: Another problem: I suspect that any reinforcement inevitably goes meta. When I get a reward for doing X, I will do X more, but I will also like the reinforcement mechanism more. When I get punished for doing Y, I will do Y less, but I will also hate the reinforcement mechanism and rationalize why I must get rid of it.

I suspect that people prefer wireheading, except in cases when it becomes too obvious that it is wireheading. If I am allowed to choose my reinforcement mechanisms, I will probably, without noticing, slowly optimize them towards wireheading. If someone else chooses my reinforcement mechanisms, I suspect they will choose them to optimize their utility function instead of mine.

Comment author: mwengler 03 July 2012 09:54:07PM 2 points

Yoiks! This may well be why my procrastination at work has kept increasing over the decades. I almost always (habitually?) feel like my efforts are not good enough and will be criticized.

Comment author: TheOtherDave 03 July 2012 10:02:44PM 1 point

(nods) That's a pretty common result of relying on punishment to shape behavior.