Related: Time and Effort Discounting, Akrasia, Hyperbolic Discounting, and Picoeconomics, The Power of Reinforcement, Basics of Animal Reinforcement, Basics of Human Reinforcement
I built a robot that feeds me candy when I get work done, to try to solve my akrasia problem. And, so far, it seems like it might actually work.
Naturally, the story starts with procrastination. I finish things the night before they're due. Or, sometimes, I don't. I'd like to fix that. One theory explains procrastination as a result of discounting, the idea that human brains discount long-term rewards in favor of short-term ones. For instance, my brain prefers watching Neon Genesis Evangelion now, even at the cost of nearly missing my project deadline in a few days. The same principle applies to consequences, and there are already tools like Beeminder that are built to combat it. Its tagline, "bring long-term consequences near," is a very concise description of a clever way to short-circuit discounting. It's very interesting, but I'm not really comfortable with paying money as a consequence. Instead, I'm going to try a similar technique: bringing long-term rewards near.
There are already a lot of techniques about bringing long-term rewards near. Generally, they're called reinforcement learning. The classic reward in reinforcement is candy, which seems like a good idea: I like it, and I'm more than willing to abuse my youthful metabolism for productivity. And, in fact, there are a wide variety of folk solutions of that sort - advice to reward yourself with some candy once your work is done. I've tried those already, but they never seem to work out for me - I always seem to wind up cheating. I need to do something trickier.
CFAR describes reinforcement in a very striking way in some of their course materials: they call it "training your inner pigeon." Not only is that a nice, snappy turn of phrase, it illustrates the problem with attempting to self-administer rewards very nicely. Did Skinner's pigeons self-administer their rewards? No, of course they didn't. I shouldn't expect my inner pigeon to, either. So, my next step is to build a robot that gives me candy when I get stuff done.
Why do I think I can keep from cheating on the machine, when I couldn't restrain myself from cheating on regular old bags of candy? Well, I'm far from certain; it's my biggest worry with the project, in fact. But I am reasonably confident, because the machine will give me an easy way to establish a Schelling fence. Where taking a handful of candy out of the bag is sometimes right and sometimes wrong, taking a handful of candy out of the hopper is always wrong, since the machine will dispense the candy when I deserve it. Precommitting to never take candy out of the machine seems like it'll be a lot easier than precommitting to only sometimes take candy out of the bag.
Now, the description "robot" for my machine is a bit fanciful. It's actually an automatic dog feeder, modified and connected to the Internet. It has a small screen mounted on the front, which tells me how many rewards I've earned. If I've got any, I can press a button on the screen to dispense them. Not counting parts I already owned, the device cost me around $50 to build. To provide the data, I linked the device to an earlier productivity hack that I already had around, a custom webapp integrating a task list with a Pomodoro timer.
Rewards are given based on a few simple rules. When I finish a task early, it gives me the number of days early in rewards; if I finish tasks out of order, it gives me the nearer task's number of rewards, so I've got an incentive to finish tasks in order. I also get an extra reward for my first Pomodoro in a week for each of my projects, so that I have an incentive not to forget old projects. The system can also take away rewards. If I get distracted during a Pomodoro, I lose a reward. I'm blocked from redeeming rewards if I have a task within a day of its deadline. If I finish a task more than a day late, I lose any rewards in the system.
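As a rough illustration, the rules above could be sketched in bash like this. This is not the webapp's actual code: every function name is hypothetical, deadlines are simplified to whole days left (negative meaning overdue), and the choices that a task finished on its deadline earns nothing and that the balance never drops below zero are assumptions.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the reward rules; not the webapp's real code.
# Deadlines are simplified to whole days left (negative means overdue).

# Print the new balance after finishing a task.
#   $1 = days left on the finished task
#   $2 = days left on the nearest-due open task (same as $1 if in order)
#   $3 = current balance
complete_task() {
  local days_left=$1 nearest=$2 balance=$3
  if (( days_left < -1 )); then
    echo 0    # more than a day late: lose every stored reward
    return
  fi
  # Out of order, the payout is what the nearer task would have earned.
  local earned=$(( days_left < nearest ? days_left : nearest ))
  if (( earned < 0 )); then earned=0; fi
  echo $(( balance + earned ))
}

# The first Pomodoro of the week on a project earns one extra reward.
#   $1 = balance, $2 = Pomodoros already logged on that project this week
log_pomodoro() {
  local balance=$1 logged=$2
  if (( logged == 0 )); then balance=$(( balance + 1 )); fi
  echo "$balance"
}

# Getting distracted during a Pomodoro costs one reward
# (assumption: the balance is floored at zero).
penalize_distraction() {
  local balance=$1
  echo $(( balance > 0 ? balance - 1 : 0 ))
}

# Succeeds only if rewards may be redeemed right now.
#   $1 = balance, remaining arguments = days left on each open task
can_redeem() {
  local balance=$1 d
  shift
  if (( balance <= 0 )); then return 1; fi
  for d in "$@"; do
    # Blocked while any open task is within a day of its deadline.
    if (( d <= 1 )); then return 1; fi
  done
  return 0
}

balance=0
balance=$(complete_task 3 3 "$balance")   # in order, three days early
balance=$(complete_task 5 2 "$balance")   # out of order: paid as the 2-day task
if can_redeem "$balance" 4 6; then echo "dispense"; fi
```

Laid out this way, the out-of-order rule reduces to taking the smaller of the two "days early" figures, so there is never anything to gain by skipping ahead.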
Results have been mixed so far. My greatest concern seems to have been unjustified: I haven't cheated on the machine once. However, it seems like the rules need some more work. The system has definitely helped some, but there are a lot of problems left to fix.
The system doesn't account for the difficulty of tasks, meaning that I get more reward for less effort if I do easier work. As a result, I've done all of the reading up to next Tuesday for my literature class, but my Computer Science assignment due on Friday is unfinished, and my "research" for an exceptionally abhorrent humanities course is languishing on the vine.
The point of the system was to bring long-term rewards near, but there are a lot of circumstances in which it doesn't seem to bring them quite near enough. For deadlined tasks, I get no rewards until I've actually completed the task; if I think a task will take me more than a day to finish, that's more than a day of work which earns me no short-term rewards. This gets even worse if I happen to have a long task (or many short tasks) that has reached the day before its deadline. Then, I don't get any rewards until I finish all of those tasks. While this is quite motivating, it's still a long-term motivation, i.e. it doesn't work very well.
I deliberately built the system to encourage doing tasks in order, but this seems to have backfired a little bit. Since I would be giving up rewards, I don't want to work on a task that's due later if there's another that's due sooner. However, if I really don't want to do the nearer task, I'll end up wasting time, since I get no rewards for that either way. Nyan_sandwich describes a similar failure mode in his Akrasia Case Study: if I know I have something more urgent to do, but I don't want to do it, I wind up procrastinating instead of doing less urgent things.
I get sick of candy more quickly than I expected. The portion my machine emits (about a small handful) tends to stop motivating me after about 4 in a day. Additionally, I seem to be entirely incapable of pacing myself; if the reward is in the system, I tend not to wait very long before using it. This has crippled all of the rules that involve taking away rewards - unless the rewards are blocked, they don't stick around in the system long enough to be taken away.
Not all of the things I want to change are a result of problems, though. There are a wide variety of interesting improvements I could make. Many of these are expansions: aside from my task list, what else can I connect to? Can I track note-taking in class? Can I set it up to reward continuing effort towards a task, like writing a few hundred words a day? Can I use it to create new, more rational habits? There are all kinds of possibilities to consider. If you've got anything you'd like to suggest, let me know - I'm open to anything interesting.
There are also a lot of techniques to research; I'm sure the program isn't nearly as effective as it could be. Operant conditioning techniques like variable-ratio schedules might help improve performance per candy. Or, I could look into gamification, basically a form of applied human operant conditioning; it's not a standard tool on the site, but if you've ever watched an experience bar rise, you know what I'm talking about. Again, if you happen to have some relevant ideas, let me know.
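To make the variable-ratio idea concrete, here is a minimal bash sketch of one simple variant, a constant-probability ("random-ratio") payout: each finished unit of work dispenses with probability 1/N, so rewards average one per N units but are unpredictable. The function name and parameters are hypothetical, nothing like this exists in the system yet, and the modulo trick is only exactly uniform when N divides 32768.

```bash
#!/usr/bin/env bash
# Hypothetical random-ratio payout: each finished unit of work dispenses
# with probability 1/mean_ratio, so payouts average one per mean_ratio units.
#   $1 = mean ratio (e.g. 4)
#   $2 = optional draw in 0..32767, defaults to $RANDOM (handy for testing)
should_dispense() {
  local mean_ratio=$1 draw=${2:-$RANDOM}
  (( draw % mean_ratio == 0 ))
}

# Example: decide whether finishing this task pays out.
if should_dispense 4; then
  echo "dispense"
else
  echo "no candy this time"
fi
```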
Obviously, I'm going to be making some rule changes in the near future. Expect another post in a few weeks about what's changed and how the changes have worked out for me.
Also, does anyone want to help me think of a good name for the system? Right now it's called the "extrinsic motivator." While descriptive, this name isn't snappy at all.
This is awesome! I'm really excited because I've been playing around with related things for a while, and by sharing techniques we can all become stronger!
So: A couple months ago I had a massive problem with my Anki reviews. I use Anki for a lot of things that aren't standard question-answer training, and this other work tends to be really mentally effortful. So, naturally, the reviews piled up until I had a massive backlog, around 10,000 reviews.
I tried a bunch of tricks; from this post ("Applying Behavioral Psychology on Myself") and elsewhere I got ideas for two Anki plugins intended to reinforce reviewing, by manipulating music volume or by showing reinforcing pictures. This was somewhat effective, but not very. Then I moved on to using candy, which worked better but still wasn't very effective.
But the latest thing I've been trying works a lot better. Look at my Anki review stats:
The technique that did that was this:
As you procrastinate, you become hungrier and hungrier until your desire for rewards exceeds your desire for non-work. By keeping rewards small, you remain perpetually hungry and work remains reinforcing. The brain was built to extrapolate from "I'm less hungry" to "I should do whatever I just did more often".
This is especially good for something that needs to be done regularly, like Anki reviews. If rewards and reward-probabilities are small enough, it also functions as caloric restriction. This system is also good for very granular tasks, like question-answer Anki reviews.
My non-question-answer reviews have irregular lengths and so aren't as granular, so for them I use another reinforcement system in addition to the old one: every 10 seconds, with probability ~20%, show a message in the background. When that appears, I examine my thoughts just prior to it appearing and reward if they were about work or other productive things. This also seems to work well, and works better for tasks where you don't want to incentivize rushing through things. (For question-answer pairs, success is measurable in correct responses, so generally you should rush through them, as long as you still get the correct responses.)
I also have a thing that periodically asks whether I'm in a correct posture, and standing instead of sitting, and not procrastinating sleep. If I'm in the right state, those give additional medium-sized rewards. I implemented this only 2 days ago, so I don't know if it works yet.
I strongly encourage you to try variable reinforcement, because my impression from reading things is that it's a lot better; I haven't tried non-variable reinforcement.
A similar thing I've been experimenting with is punishing unwanted behaviors, mostly by using the rubber band technique; mixed results so far.
I'm very interested in your automatic dog-feeder setup. I've been thinking about further automating my reinforcement things, and about automating punishment by buying an electroshock collar (normally used to train dogs).
How to implement some of the stuff above:
See the two Anki plugins I posted. I just put up a more basic version that shows popups instead of pictures.
On Ubuntu Linux, to show a background popup every 10 seconds (sorta), with some probability, do crontab -e and add this:
* * * * * bash -c "export DISPLAY=:0 && for i in 1 2 3 4 5; do if [ \$(( \$RANDOM / 327 )) -lt 20 ] ; then notify-send 'REINFORCE?'; (sleep 3; killall notify-osd ) & fi; sleep 10; done"
That's what I currently use; I used to use this, which shows foreground popups and should thus probably be kept commented-out most of the time:
* * * * * bash -c "export DISPLAY=:0 && for i in 1 2 3 4 5; do if [ \$(( \$RANDOM / 327 )) -lt 10 ] ; then zenity --info --text='\n\nREINFORCE?' --timeout=5 --width=1000 --height=800 & fi; sleep 10; done; if [ \$(( \$RANDOM / 327 )) -lt 10 ] ; then zenity --info --text='\n\nREINFORCE?' --timeout=5 --width=1000 --height=800 & fi"
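For anyone puzzling over the arithmetic in those cron lines: bash's $RANDOM is uniform on 0..32767, so integer-dividing by 327 gives a number from 0 to 100, and "-lt 20" is roughly a 20% test (likewise "-lt 10" is roughly 10%). The short sketch below checks that by brute force; count_hits is a hypothetical helper name, not part of the cron lines.

```bash
#!/usr/bin/env bash
# Count how many of the 32768 possible $RANDOM values (0..32767)
# pass the test [ $(( RANDOM / 327 )) -lt threshold ].
count_hits() {
  local threshold=$1 hits=0 r
  for (( r = 0; r <= 32767; r++ )); do
    if (( r / 327 < threshold )); then hits=$(( hits + 1 )); fi
  done
  echo "$hits"
}

hits=$(count_hits 20)
echo "$hits of 32768 values fire"   # 6540 of 32768, just under 20%
```

Since the loop makes five checks per cron minute, a 20% chance per check works out to about one popup per minute on average.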
If anyone is planning to use any of this, please tell me. Also please share any ideas you have, even if they don't seem useful.
Have you had any problems with the context switching? It seems like being interrupted every ~50 seconds would make me less productive.