Getting rational now or later: navigating procrastination and time-inconsistent preferences for new rationalists

milo_thoughts

This is a distillation of and reflection on O’Donoghue and Rabin’s “Doing it now or later” (see citation below)^[1].

Many people struggle with procrastination or self-control. At the heart of both is a mismatch between what we prefer now and what we will prefer later. Procrastination arises with tasks that are unpleasant to perform but create future benefits: they have “immediate costs.” Similarly, indulgent behaviors (e.g. eating unhealthy foods) have “immediate rewards” (they taste good) while deferring their costs to the future. Both involve a mismatch between the present self and the future self.

The best way to “kick” this issue is to develop time-consistent preferences, placing no special value on what’s happening now–thus our choices around procrastination or indulgence will be more rational and no longer skewed by a current self-control limitation. But…that’s hard to achieve.

O’Donoghue and Rabin introduce a neat distinction (1999). A naive procrastinator, they write, expects that in the future, they won’t have the same self-control issues that they’re having now. A sophisticated procrastinator expects their imperfection: they expect their future self to also have procrastination or indulgence tendencies. With their model of time-inconsistent preferences, they demonstrate that sophisticates procrastinate less than their naive peers! This model suggests that I shouldn’t plan on having good self-control; actually, expecting my own irrationality may improve my performance. But…not so fast, because the model also indicates that sophistication may make me more likely to indulge now at a time when I’m better off waiting. The lesson for new rationalists, who haven’t yet cracked time consistency: when facing immediate costs, be realistic (like the “sophisticate”), but when facing immediate rewards, stay idealistic (like the “naif”).

Let’s get into it with an example!

You choose. You must endure either:

Three hours of an unpleasant activity today
Four hours of it one week from today

If you prefer today, congrats on beating your procrastination tendency. As for the rest of us, people overwhelmingly prefer incurring the higher cost in a week to the lower cost today. But what if I posed it this way, with the same unpleasant activity? Would you rather endure:

Three hours on March 24th
Four hours on March 31st

Barring specific circumstances around these dates, most of us have an easy time choosing the 24th…after all, that’s one fewer hour!

The model

It can be rational to discount your valuation of the future, prioritizing present utility over future utility. Sure, you care about your utility today $u_{t}$, and you may also care today about your utility tomorrow $u_{t + 1}$ or the next day $u_{t + 2}$, but you may reasonably discount how much you care about those future utilities. We consider $U^{t} (u_{t}, u_{t + 1}, \dots, u_{T})$, your overall utility as seen from today, which depends on your instantaneous utilities in each period from today through the final period $T$. If today you care less about utilities that are far in the future, you may find yourself discounting according to a rule like this:

$$U^{t} (u_{t}, u_{t + 1}, \dots, u_{T}) = \sum_{τ = t}^{T} δ^{τ} u_{τ}$$

for some $δ \in (0, 1]$, which acts as a discount factor. Basically, our utility now is just the total of all the daily utilities, except the farther away something is, the more “discounted” it is. Such a model (especially for low $δ$ values) can explain a lot of our present-biased behavior, but it does not explain the time inconsistency of our preferences. It fails to explain why I might punt the unpleasant activity today in favor of a week from today, but still choose March 24th over the 31st.
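A short worked check of that claim, as a sketch: write the unpleasant hours as costs, and suppose the 3-hour option falls on day $s$ and the 4-hour option on day $s + 7$ (the symbol $s$ is introduced here only for illustration). Under exponential discounting, waiting is preferred exactly when

$$4 δ^{s + 7} < 3 δ^{s} \iff δ^{7} < \tfrac{3}{4}.$$

The factor $δ^{s}$ cancels, so the answer is the same whether $s$ is today or March 24th. A given $δ$ always picks the same option, and no preference reversal can occur.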

O’Donoghue and Rabin use a different model, which they call $(β, δ)$-preferences. Consider, for all periods $t$,

$$U^{t} (u_{t}, u_{t + 1}, \dots, u_{T}) = δ^{t} u_{t} + β \sum_{τ = t + 1}^{T} δ^{τ} u_{τ}$$

where $0 < β \leq 1$ and the discount factor satisfies $0 < δ \leq 1$. We use $β$ as a parameter to tune our relationship to future events. When $β = 1$, this is just standard exponential discounting. But for a smaller $β$, such as $\frac{1}{2}$, we capture time-inconsistent preferences that are present-biased: every future period is valued half as much relative to the present.
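To see how this captures the opening example, here is a minimal Python sketch. The helper name `perceived`, the utilities of -3 and -4 (one unit per unpleasant hour), and the 30-day offset are illustrative assumptions, not taken from the paper. Values are normalized by $δ^{t}$ so that "today" has weight 1, which does not change any comparison.

```python
def perceived(u, delay, beta=0.5, delta=1.0):
    """Value, seen from today, of instantaneous utility u arriving `delay` periods from now.

    Hypothetical helper: utilities are normalized by delta**t, so the present has weight 1.
    """
    if delay == 0:
        return u  # the present is not scaled by beta
    return beta * delta**delay * u

# Three hours today vs. four hours in a week (costs are negative utilities).
today_vs_next_week = (perceived(-3, 0), perceived(-4, 7))

# The same choice moved into the future, e.g. 30 and 37 days from now.
both_in_future = (perceived(-3, 30), perceived(-4, 37))

print(today_vs_next_week)  # (-3, -2.0): waiting looks better
print(both_in_future)      # (-1.5, -2.0): the earlier, shorter option looks better
```

With $β = \frac{1}{2}$ the ranking flips depending on whether one option is "now", which is exactly the reversal that exponential discounting alone cannot produce.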

Applying the model

Facing immediate costs

Here’s the example they run with:

Suppose you usually go to the movies on Saturdays, and the schedule at the local cinema consists of a mediocre movie this week, a good movie next week, a great movie in two weeks, and (best of all) a Johnny Depp movie in three weeks. Now suppose you must complete a report for work within four weeks, and to do so you must skip the movie on one of the next four Saturdays. When do you complete the report? (O’Donoghue and Rabin, 1999, p. 109)

As they did, we'll let the valuations of the mediocre, good, great, and Depp movies be 3, 5, 8, and 13. We’ll consider a fully rational person with time-consistent preferences, as well as a naif and a sophisticate with otherwise identical $(β, δ)$-preferences.

From the perspective of someone with time-consistent preferences ( $β = 1$ ):

Week 1: The cost is lowest today at 3. The costs in the other weeks look like 5, 8, and 13 to me, which are all worse, so I’ll do the report today.

With time-consistent preferences, one has an easy time choosing the optimal outcome. For the others, for simplicity, we’ll let $β = \frac{1}{2}$ and $δ = 1$ , meaning, from their perspective, future utilities have half as much value.

From the naif’s perspective ( $β = \frac{1}{2}$ ):

Week 1: The cost of doing the report this week and skipping the mediocre movie would be 3, not bad, but with $β = \frac{1}{2}$ , my future costs look like (2.5, 4, 6.5). I’d rather see the mediocre movie this week (a cost of 3 > 2.5), so I’ll plan to do the report next week and miss the good movie. I can still see the great movie and the Johnny Depp movie.
Week 2: The cost this week would be 5. Ah, I wish I had done the report last week, but alas here we are. What do the next weeks look like to me? With my $β$ discounting, they cost 4 and 6.5 respectively. So I should go to the movie this week, because skipping it would be a cost of 5 > 4, and I’ll plan to do the report next week, missing the great movie, so that I can still see the Johnny Depp movie.
Week 3: The cost this week is 8–I don’t wanna skip the great movie! Sure, that means I’ll miss the Johnny Depp movie, but from today’s perspective that’s looking like a cost of 6.5, worth less to me.
Week 4: I miss the Johnny Depp movie to work on my report, and incur the utility cost of 13.
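The naif's week-by-week reasoning can be sketched in a few lines of Python. This is an illustrative sketch rather than the paper's own formulation: the function name `naif_completion_week` is hypothetical, and it assumes the naif acts as soon as doing the report now looks at least as cheap as every plan to wait.

```python
def naif_completion_week(costs, beta=0.5, delta=1.0):
    """Return the 1-indexed week in which a naif does the task.

    Each week the naif compares doing it now with the best-looking future week,
    (wrongly) assuming their future self will follow through on that plan.
    """
    last = len(costs) - 1
    for t in range(last):
        now = costs[t]
        # Future costs as perceived from week t: scaled by beta and delta.
        future = [beta * delta ** (tau - t) * costs[tau]
                  for tau in range(t + 1, last + 1)]
        if now <= min(future):
            return t + 1  # doing it now looks at least as cheap as any plan to wait
    return last + 1  # deadline: no choice left


costs = [3, 5, 8, 13]  # mediocre, good, great, Depp
print(naif_completion_week(costs))            # 4
print(naif_completion_week(costs, beta=1.0))  # 1 (time-consistent benchmark)
```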

In this example, we see that the naif’s behavior is perfectly bad: they consistently act against their own best interest, and they end up missing the best movie! How does the sophisticate perform? Well, their key insight comes from treating each “self” on each different day as its own player in the game, and thus they can make predictions about what their future self will do.

From the sophisticate’s perspective $(β = \frac{1}{2})$ :

Week 1: The cost this week is 3, while the future costs look like 2.5, 4, and 6.5 to me, so from my perspective today, I’d rather wait until next week to do the report–but only if I actually do the report next week, instead of procrastinating further. So…let’s work backwards.
- Thinking about week 4: if I haven’t done my report by then, I’ll have to miss Johnny Depp–a bummer that costs me utility 13!
- Thinking about week 3: On week 3, I’ll be comparing the cost of 8 to what I see as a cost of 6.5…so if I let it get to week 3, I’ll most certainly procrastinate and end up missing Johnny Depp.
- Thinking about week 2: On week 2, the cost is 5. Yes, that’s worse than 4, which is how week 3’s cost will look from there, but I know (from the above) that if I put the report off to week 3, I’ll actually procrastinate again that week and end up missing Johnny Depp, which from week 2 looks like a cost of 6.5. The week-2 cost of 5 is better than 6.5, so I’ll do it in week 2.
- Thinking about this week (week 1): If I don’t do the report this week, I know (from the above) I’ll actually do it next week. From today’s perspective, a cost of 2.5 sounds better than a cost of 3, so I’ll wait!
Week 2: Naively, today’s cost of 5 seems worse than next week, which I see as a cost of 4. But, sophisticatedly, I know next week I’ll end up procrastinating till the following week and incurring the 6.5 cost, which is worse than 5! So I’ll do the report today.
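The sophisticate's backward reasoning is a small backward induction, sketched below in Python. The function name `sophisticate_completion_week` is hypothetical, and the sketch assumes the sophisticate acts in a given week when doing so is at least as cheap as waiting for the next week in which a future self would actually act.

```python
def sophisticate_completion_week(costs, beta=0.5, delta=1.0):
    """Return the 1-indexed week in which a sophisticate does the task.

    Works backwards from the deadline: each week's self compares acting now
    with the perceived cost of the next week in which a future self really acts.
    """
    last = len(costs) - 1
    does_it = [False] * len(costs)
    does_it[last] = True  # at the deadline there is no choice
    next_doer = last      # earliest later week in which the task would get done
    for t in range(last - 1, -1, -1):
        wait_cost = beta * delta ** (next_doer - t) * costs[next_doer]
        if costs[t] <= wait_cost:
            does_it[t] = True
            next_doer = t
    # The task actually gets done in the first week whose self chooses to act.
    return does_it.index(True) + 1


costs = [3, 5, 8, 13]  # mediocre, good, great, Depp
print(sophisticate_completion_week(costs))  # 2
```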

Just by recognizing and expecting their own time-inconsistent preferences, the sophisticate did the report much sooner, skipping the good movie, compared to the naif who ended up missing the Johnny Depp movie. In an “immediate costs” scenario like this one, sophistication helped get closer to the optimal outcome!

Facing immediate rewards

They use a slightly different example:

Suppose you have a coupon to see one movie over the next four Saturdays, and your allowance is such that you cannot afford to pay for a movie. The schedule at the local cinema is the same as for the above example—a mediocre movie this week, a good movie next week, a great movie in two weeks, and (best of all) a Johnny Depp movie in three weeks. Which movie do you see? (O’Donoghue and Rabin, 1999, p. 110).

We use the same values as before, (3, 5, 8, 13), but this time they represent instantaneous rewards rather than costs.

From the perspective of someone with time-consistent preferences ( $β = 1$ ):

Week 1: 5, 8, and 13 are all better rewards than 3, so I’ll wait for the Johnny Depp film.
Week 2: 8 and 13 are still better than 5, so I’ll keep waiting.
Week 3: 13 is better than 8, so I’ll pass on the great movie.
Week 4: I get to see Johnny Depp.

With time-consistent preferences, this player chooses the optimal outcome. Now consider the naif.

From the naif’s perspective $(β = \frac{1}{2})$ :

Week 1: The reward if I went today is 3, but I see the future rewards as (2.5, 4, and 6.5). Both 4 and 6.5 are bigger rewards than 3, so I’m inclined to wait.
Week 2: The reward today would be 5, a good movie, but future rewards look like 4 and 6.5. Since 6.5 is greater than 5, I’ll choose to wait for the Johnny Depp movie and forgo the good movie.
Week 3: The reward today would be 8, a great movie, whereas the reward for next week looks like 6.5 to me. I’m happiest seeing a great movie now, even if it means missing Johnny Depp next week, as 8 > 6.5.

Thus, the naif would “cave” on the third week and finish having seen the great movie but not the best option. Now consider the sophisticate:

From the sophisticate’s perspective $(β = \frac{1}{2})$ :

Week 1: My payoff by going to the movie this week is 3, which is better than next week’s, which I see as 2.5, but not as good as the following weeks which I see as 4 and 6.5 respectively. Is it worth waiting for those later weeks? Only if I’m not just gonna end up “caving” next week and seeing the good movie, which I only value at 2.5 to today’s 3. To determine this, I work backwards:
- Thinking about week 4: If I still have my coupon then, I’ve made it. I get to see the best movie.
- Thinking about week 3: I will be choosing between seeing a great movie (8) or waiting and then seeing the Johnny Depp movie, which I will at that point value at a 6.5. Since 8 > 6.5, I know I’ll end up caving on week 3 and seeing the great movie and I actually won’t get to see the Johnny Depp movie anyway.
- Thinking about week 2: I will be choosing between seeing a good movie (5) or waiting and then seeing the great movie or Johnny Depp movie, which I will at that point value at a 4 or a 6.5 respectively–but, I know from the above that if I pass this week, I’ll end up caving on week 3, so Johnny Depp is off the table. So, on Week 2, I’ll end up caving and seeing the good movie for 5, which is better than waiting and ending up seeing the great movie which, right now, I only value at 4.
- Thinking about this week: I know now that if I choose to pass on the mediocre movie (3), I’ll end up caving next week and seeing the good movie that, right now, I value at only 2.5. So I should choose to see the mediocre movie.
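The coupon example can be checked the same way. The sketch below is illustrative only: `coupon_week` and its `sophisticated` flag are hypothetical names, and $δ$ is fixed at 1 as in the text.

```python
def coupon_week(rewards, beta=0.5, sophisticated=False):
    """1-indexed week in which the coupon is used (delta fixed at 1, as in the text)."""
    last = len(rewards) - 1
    if not sophisticated:
        # Naif: use it now only if today's reward is at least as good as
        # every discounted future reward.
        for t in range(last):
            if rewards[t] >= beta * max(rewards[t + 1:]):
                return t + 1
        return last + 1
    # Sophisticate: work backwards, tracking the next week a future self would cave.
    next_use = last
    for t in range(last - 1, -1, -1):
        if rewards[t] >= beta * rewards[next_use]:
            next_use = t
    return next_use + 1


rewards = [3, 5, 8, 13]  # mediocre, good, great, Depp
print(coupon_week(rewards))                      # 3: the naif caves on the great movie
print(coupon_week(rewards, sophisticated=True))  # 1: the sophisticate unravels to week 1
print(coupon_week(rewards, beta=1.0))            # 4: time-consistent benchmark
```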

Thus, the sophisticate ends up choosing the worst option. It happens as a cascade: the expectation of a future lack of self-control cascades backwards, leading to the decision that it’s not worth waiting at each step. For a scenario with immediate rewards, being idealistic seems better than being realistic–by planning to “make it to the finish line,” you get a whole lot closer.

The lesson

The lesson for new rationalists? Well, if you can tune your $β$ and pick up time-consistent preferences, do that! But if you can’t, then you should be selective about when to be idealistic and when to be realistic about your own tendencies. In procrastination situations with immediate costs and delayed rewards, it’s a good time to be realistic. Know yourself, and recognize that your future tendencies may not be as perfect as you plan for them to be. Maybe, by not doing it now, you’ll incur greater costs than you initially expect, due to future self control issues–channel that self-knowledge into motivation.

As for self-control scenarios, with immediate rewards and delayed costs? That’s the time to be idealistic. Channel a rah-rah, brute force mindset as close to the “finish line” as possible.

Just because you aren't time-consistent doesn't mean you are naive. You get to choose how to approach navigating your preferences, and by strategizing about your approach, you can improve your self-control.

Citations

[1] O'Donoghue, T., & Rabin, M. (1999). Doing it now or later. American Economic Review, 89(1), 103-124.
