A vote against spaced repetition
LessWrong seems to be a big fan of spaced-repetition flashcard programs like Anki, Supermemo, or Mnemosyne. I used to be. After using them religiously for 3 years in medical school, I now categorically advise against using them for large volumes of memorization.
[A caveat before people get upset: I think they are appropriate in certain situations, and I have not tried to use them to learn a language, which seems to be their most popular use. More at the bottom.]
A bit more history: I and 30 other students tried using Mnemosyne (and some used Anki) for multiple tests. At my school, we have a test approximately every 3 weeks, and each test covers about 75 pages of high-density outline-format notes. Many stopped after 5 or so such tests, saying they simply did not get enough return on their time. I stuck with it longer and used the cards more than anyone else, keeping at it for 3 years.
Incidentally, I failed my first year and had to repeat.
By the end of that third year (and studying for my Step 1 boards, a several-month process), I lost faith in spaced-repetition cards as an effective tool for my memorization demands. I later met with a learning-skills specialist, who felt the same way, and had better reasons than my intuition/trial-and-error:
- Flashcards are less useful for learning the “big picture”
- Specifically, if you are memorizing a large amount of information, there is often a hierarchy, organization, etc. that can make learning the whole thing easier, and you lose the constant visual reminder of that larger context when using flashcards.
- Flashcards do not take advantage of spatial, mapping, or visual memory, all of which the human mind is much better optimized for. It is not nearly as well built to memorize pairings of seemingly arbitrary concepts with few or no intuitive links. My preferred methods are, in essence, hacks that use your visual and spatial memory rather than rote memorization.
Here are examples of the typical kind of things I memorize every day and have found flashcards to be surprisingly worthless for:
- The definition of Sjögren's syndrome
- The contraindications of Metronidazole
- The significance of a rise in serum αFP
Here is what I now use in place of flashcards:
- Venn diagrams and the like, to compare and contrast similar lists. (This is more specific to medical school, where you learn subtly different diseases.)
- Mnemonic pictures. I have used these myself for years to great effect, and later learned the technique was taught by my study-skills expert, though I'm surprised I haven't found it formally named and taught anywhere else. The basic concept is to make a large picture, where each detail in the picture corresponds to a detail you want to memorize.
- Memory palaces. I recently learned how to properly use these, and I'm a true believer. When I only had the general idea to “pair things you want to memorize with places in your room” I found it worthless, but after I was taught a lot of do's and don'ts, they're now my favorite way to memorize any list of 5+ items. If there's enough demand on LW I can write up a summary.
Spaced repetition is still good for knowledge you need to retrieve immediately, when a 2-second delay would make it useless. I would still consider spaced repetition to memorize some of the more rarely-used notes on the treble and bass clefs, if I ever decide to learn to sight-read music properly. I make no comment on its usefulness for learning a foreign language, as I haven't tried it, but if I were to pick one up I personally would start with a Rosetta Stone-esque program.
Your mileage may vary, but after seeing so many people try and reject them, I figured it was enough data to share. Mnemonic pictures and memory palaces are somewhat time-consuming while you're learning the techniques. However, if someone has the motivation and discipline to make a stack of flashcards and study them every day indefinitely, then I believe learning and using those skills is a far better use of time.
Less Wrong Study Hall - Year 1 Retrospective
Some time back, a small group of Less Wrongers collected in a video chatroom to work on…things. We’ve been at it for exactly one year as of today, and it seems like a good time to see what’s come of it.[1] So here is what we’ve done, what we’re doing, and a few thoughts on where we’re going. At the end is a survey taken of the LWSH, partly to be compared to Less Wrong proper, but mostly for fun. If you like what you see here, come join us. The password is “lw”.
A Brief History of the Hall
I think the first inspiration was Eliezer looking for someone to sit with him while he worked, to help with productivity and akrasia. Shannon Friedman answered the call and it seemed to be effective. She suggested a similar coworking scheme to one of her clients, Mqrius, to help him with akratic issues surrounding his thesis. She posted on Less Wrong about it, with the intent of connecting him and possibly others who wanted to co-work in a similar fashion. Tsakinis, in the comments, took the idea a step further and created a Tinychat video chatroom for group working. It was titled the Less Wrong Study Hall. The theory was that it would help us actually do the work, instead of, say, reading TV Tropes when we should be studying. It turned out to be a decent Schelling point, enough to form a regular group and occasionally attract new people. It's grown slowly but steadily.
Tinychat’s software sucks, and there have been a couple of efforts to replace it. Mqrius looked into OpenMeetings, but it didn’t work out. Yours truly took a crack at programming a LWSH Google Hangout, but it ran aground on technical difficulties. Meanwhile the tinychat room continued to work, and despite nobody actually liking it, it’s done the job well enough.
Tinychat is publicly available, and there have been occasional issues with the public along the way. A few people took up modding, but it was still a nuisance. Eventually a password was placed on the room, which mostly shut down the problem. We did have one guy straight out guess the password, which was a…peculiar experience. He was notably not all there, but somehow still scrupulously polite, and left when asked. I don’t think I’ve ever seen that happen on the Internet before.
A year after the Hall opened, we have about twenty to twenty-five regulars, with an unknown number of occasional users. We're still well within Dunbar's number, so everybody knows everybody else and new users integrate quickly. We've developed a reasonably firm set of social norms to guide our work, despite having neither direct technical control nor clear leaders.
Proportional Giving
Executive summary: The practice of giving a fixed fraction of one's income to charity is near-universal but possibly indefensible. I describe one approach that certainly doesn't defend it, speculate vaguely about a possible way of fixing it up, and invite better ideas from others.
Many of us give a certain fraction of our income to charitable causes. This sort of practice has a long history:
Deuteronomy 14:22 Thou shalt truly tithe all the increase of thy seed, that the field bringeth forth year by year.
(note that "tithe" here means "give one-tenth of") and is widely practised today:
GWWC Pledge: I recognise that I can use part of my income to do a significant amount of good in the developing world. Since I can live well enough on a smaller income, I pledge that from today until the day I retire, I shall give at least ten percent of what I earn to whichever organizations can most effectively use it to help people in developing countries. I make this pledge freely, openly, and without regret.
And of course it's roughly how typical taxation systems (which are kinda-sorta like charitable donation, if you squint) operate. But does it make sense? Is there some underlying principle from which a policy of giving away a certain fraction of one's income (not necessarily the traditional 10%, of course) follows?
The most obvious candidate for such a principle would be what we might call
Weighted Utilitarianism: Act so as to maximize a weighted sum of utility, where (e.g.) one's own utility may be weighted much higher than that of random far-away people.
But this can't produce anything remotely like a policy of proportional giving. Assuming you aren't giving away many millions per year (which is a fair assumption if you're thinking in terms of a fraction of your salary) then the level of utility-per-unit-money achievable by your giving is basically independent of what you give, and so is the weight you attach to the utility of the beneficiaries.
So suppose that when your income, after taking out donations, is $X, your utility (all else equal) is u(X), so that your utility per marginal dollar is u'(X); and suppose you attach weight 1 to your own utility and weight w to that of the people who'd benefit from your donations; and suppose their gain in utility per marginal dollar given is t. Then when your income is S you will set your giving g so that u'(S-g) = wt.
What this says is that a weighted-utilitarian should keep a fixed absolute amount S-g of his or her income, and give all the rest away. The fixed absolute amount will depend on the weight w (hence, on exactly which people are benefited by the donations) and on the utility per dollar given t (hence, on exactly what charities are serving them and how severe their need is), but not on the person's pre-donation income S.
(Here's a quick oversimplified example. Suppose that utility is proportional to log(income), that the people your donations will help have an income equivalent to $1k/year, that you care 100x more about your utility than about theirs, and that your donations are the equivalent of direct cash transfers to those people. Then u' = 1/income, so you should keep everything up to $100k/year and give the rest away. The generalization to other weighting factors and beneficiary incomes should be obvious.)
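For readers who like to see the arithmetic spelled out, here is a minimal sketch of the example above (the function name and parameterization are mine, purely for illustration):

```python
def keep_threshold(weight_on_others, beneficiary_income):
    """The weighted-utilitarian keep-amount S - g solving u'(S - g) = w * t
    for log utility: u'(x) = 1/x and t = 1/beneficiary_income,
    so S - g = beneficiary_income / weight_on_others."""
    w = weight_on_others           # weight on beneficiaries' utility relative to your own
    t = 1.0 / beneficiary_income   # their marginal utility per dollar under log utility
    return 1.0 / (w * t)           # the x satisfying 1/x = w * t

# The numbers from the example: beneficiaries at $1k/year, and you weight
# your own utility 100x more than theirs (so w = 1/100).
print(keep_threshold(weight_on_others=0.01, beneficiary_income=1_000))  # 100000.0
```

Note that the output does not depend on your pre-donation income S, which is exactly the point of the argument above.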
This argument seems reasonably watertight given its premises, but proportional giving is so well-established a phenomenon that we might reasonably trust our predisposition in its favour more than our arguments against. Can we salvage it somehow?
Here's one possibility. One effect of income is (supposedly) to incentivize work, and maybe (mumble near mode mumble) this effect is governed entirely by anticipated personal utility and not by any benefit conferred on others. Then the policy derived above, which above the threshold makes personal utility independent of effort, would lead to minimum effort and hence maybe less net weighted utility than could be attained with a different policy. Does this lead to anything like proportional giving, at least for some semi-plausible assumptions about the relationship between effort and income?
At the moment, I don't know. I have a page full of scribbled attempts to derive something of the kind, but they didn't work out. And of course there might be some better way to get proportional giving out of plausible ethical principles. Anyone want to do better?
Intelligence Metrics with Naturalized Induction using UDT
Followup to: Intelligence Metrics and Decision Theory
Related to: Bridge Collapse: Reductionism as Engineering Problem
A central problem in AGI is giving a formal definition of intelligence. Marcus Hutter has proposed AIXI as a model of a perfectly intelligent agent. Legg and Hutter have defined a quantitative measure of intelligence applicable to any suitably formalized agent, such that AIXI is the agent with maximal intelligence according to this measure.
Legg-Hutter intelligence suffers from a number of problems I have previously discussed, the most important being:
- The formalism is inherently Cartesian. Solving this problem is known as naturalized induction and it is discussed in detail here.
- The utility function Legg & Hutter use is a formalization of reinforcement learning, while we would like to consider agents with arbitrary preferences. Moreover, a real AGI designed with reinforcement learning would tend to wrest control of the reinforcement signal from its operators (there must be a classic reference on this but I can't find it. Help?). It is straightforward to tweak the formalism to allow for any utility function which depends on the agent's sensations and actions; however, we would like to be able to use any ontology for defining it.
In this post, I will:
- Define a formalism for logical uncertainty. When I started writing this I thought this formalism might be novel but now I see it is essentially the same as that of Benja.
- Use this formalism to define a non-constructive formalization of UDT. By "non-constructive" I mean something that assigns values to actions rather than a specific algorithm like here.
- Apply the formalization of UDT to my quasi-Solomonoff framework to yield an intelligence metric.
- Slightly modify my original definition of the quasi-Solomonoff measure so that the confidence of the innate model becomes a continuous rather than discrete parameter. This leads to an interesting conjecture.
- Propose a "preference agnostic" variant as an alternative to Legg & Hutter's reinforcement learning.
- Discuss certain anthropic and decision-theoretic aspects.
Logical Uncertainty
The formalism introduced here was originally proposed by Benja.
Fix a formal system F. We want to be able to assign probabilities to statements s in F, taking into account limited computing resources. Fix a natural number D, related to the amount of computing resources, which I call the "depth of analysis".
Define P_0(s) := 1/2 for all s to be our initial prior, i.e. each statement's truth value is decided by a fair coin toss. Now define
P_D(s) := P_0(s | there are no contradictions of length <= D).
Consider X to be a number in [0, 1] given by a definition in F. Then d_k(X) := "The k-th digit of the binary expansion of X is 1" is a statement in F. We define E_D(X) := Σ_k 2^{-k} P_D(d_k(X)).
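To make the conditioning concrete, here is a toy sketch of computing P_D by brute force. The three statements and the single "contradiction that becomes visible at depth 1" are my own hand-picked illustration, not anything from the formalism itself:

```python
import itertools

# Toy statements: "A", "A->B", "B".  A "contradiction of length <= D" is
# modelled as a forbidden combination of truth values that the checker can
# notice once the depth of analysis reaches D.
statements = ["A", "A->B", "B"]
contradictions_by_depth = {
    0: [],                                        # nothing checked yet
    1: [{"A": True, "A->B": True, "B": False}],   # the modus-ponens clash
}

def P_D(query, depth):
    """P_D(query) = P_0(query | no contradiction of length <= depth),
    where P_0 is the uniform prior over truth assignments."""
    forbidden = [c for d in range(depth + 1)
                 for c in contradictions_by_depth.get(d, [])]
    consistent = query_true = 0
    for values in itertools.product([False, True], repeat=len(statements)):
        assignment = dict(zip(statements, values))
        if any(all(assignment[k] == v for k, v in bad.items()) for bad in forbidden):
            continue
        consistent += 1
        query_true += assignment[query]
    return query_true / consistent

print(P_D("B", depth=0))  # 0.5: the fair-coin prior
print(P_D("B", depth=1))  # ~0.571: ruling out the clash pushes P(B) up
```

Real statements and contradictions would of course be enumerated from F rather than hand-listed, but the conditioning step is the same.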
Remarks
- Clearly if s is provable in F then for D >> 0, P_D(s) = 1. Similarly, if "not s" is provable in F then for D >> 0, P_D(s) = 0.
- If each digit of X is decidable in F then lim_{D -> inf} E_D(X) exists and equals the value of X according to F.
- For s of length > D, P_D(s) = 1/2, since no contradiction of length <= D can involve s.
- It is an interesting question whether lim_{D -> inf} P_D(s) exists for every s. It seems false that this limit always exists and equals 0 or 1, i.e. this formalism is not a loophole in Goedel incompleteness. To see this, consider statements that require a high-order (in the arithmetical hierarchy) halting oracle to decide.
- In computational terms, D corresponds to non-deterministic spatial complexity. It is spatial since we assign truth values simultaneously to all statements, so in any given contradiction it is enough to retain the "thickest" step. It is non-deterministic since it's enough for a contradiction to exist; we don't have an actual computation which produces it. I suspect this can be made more formal using the Curry-Howard isomorphism; unfortunately, I don't understand the latter yet.
Non-Constructive UDT
Consider A a decision algorithm for optimizing utility U, producing an output ("decision") which is an element of C. Here U is just a constant defined in F. We define the U-value of c in C for A at depth of analysis D to be
V_D(c, A; U) := E_D(U | "A produces c" is true).
It is only well defined as long as "A doesn't produce c" cannot be proved at depth of analysis D, i.e. P_D("A produces c") > 0. We define the absolute U-value of c for A to be
V(c, A; U) := E_{D(c, A)}(U | "A produces c" is true), where D(c, A) := max {D | P_D("A produces c") > 0}. Of course D(c, A) can be infinite, in which case E_{inf}(...) is understood to mean lim_{D -> inf} E_D(...).
For example, V(c, A; U) yields the natural values when A is an ambient control algorithm applied to, e.g., a simple model of Newcomb's problem. To see this, note that given A's output, the value of U can be determined at low depths of analysis, whereas the output of A itself requires a very high depth of analysis to determine.
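A minimal toy sketch of that claim (entirely my own construction): collapse the logical-uncertainty expectation into a lookup, on the assumption that conditioning on "A produces c" already determines Omega's prediction at a low depth of analysis:

```python
def V(c):
    # In this toy model the prediction is a short logical consequence of A's
    # output, so E_D(U | "A produces c") collapses to the matching payoff.
    prediction = "predict-one" if c == "one-box" else "predict-two"
    U = {("one-box", "predict-one"): 1_000_000,
         ("two-box", "predict-one"): 1_001_000,
         ("two-box", "predict-two"): 1_000,
         ("one-box", "predict-two"): 0}
    return U[(c, prediction)]

print(max(["one-box", "two-box"], key=V))  # one-box
```

The payoffs are the usual Newcomb numbers; the only work being done is that the counterfactual "A produces two-box" is evaluated together with its logical consequence for the prediction.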
Naturalized Induction
Our starting point is the "innate model" N: a certain a priori model of the universe including the agent G. This model encodes the universe as a sequence of natural numbers Y = (y_k) which obeys either specific deterministic or non-deterministic dynamics, or at least some constraints on the possible histories. It may or may not include information on the initial conditions. For example, N can describe the universe as a universal Turing machine M (representing G) with special "sensory" registers e. N constrains the dynamics to be compatible with the rules of the Turing machine but leaves the behavior of e unspecified. Alternatively, N can contain, in addition to M, a non-trivial model of the environment. Or N can be a cellular automaton with the agent corresponding to a certain collection of cells.
However, G's confidence in N is limited: otherwise it wouldn't need induction. We cannot start with 0 confidence: it's impossible to program a machine if you don't have even a guess of how it works. Instead we introduce a positive real number t which represents the timescale over which N is expected to hold. We then assign to each hypothesis H about Y (you can think about them as programs which compute y_k given y_j for j < k; more on that later) the weight QS(H) := 2^{-L(H)} (1 - e^{-t(H)/t}). Here L(H) is the length of H's encoding in bits and t(H) is the time during which H remains compatible with N. This is defined for N of deterministic / constraint type but can be generalized to stochastic N.
The weights QS(H) define a probability measure on the space of hypotheses which induces a probability measure on the space of histories Y. Thus we get an alternative to Solomonoff induction which allows for G to be a mechanistic part of the universe, at the price of introducing N and t.
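The weight formula itself is easy to play with. A minimal sketch (parameter names are mine):

```python
import math

def qs_weight(length_bits, t_H, t):
    """QS(H) = 2^{-L(H)} * (1 - exp(-t(H)/t))."""
    return 2 ** (-length_bits) * (1 - math.exp(-t_H / t))

# A short hypothesis that stays compatible with N for a long time gets far
# more weight than a longer one that breaks with N early.
print(qs_weight(length_bits=10, t_H=500.0, t=100.0))   # ~9.7e-4
print(qs_weight(length_bits=20, t_H=10.0,  t=100.0))   # ~9.1e-8
```

(The raw weights are not normalized here; turning them into the probability measure on hypotheses described above involves normalizing them.)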
Remarks
- Note that time is discrete in this formalism but t is continuous.
- Since we're later going to use logical uncertainties wrt the formal system F, it is tempting to construct the hypothesis space out of predicates in F rather than programs.
Intelligence Metric
To assign intelligence to agents we need to add two ingredients:
- The decoding Q: {Y} -> {bit-string} of the agent G from the universe Y. For example Q can read off the program loaded into M at time k=0.
- A utility function U: {Y} -> [0, 1] representing G's preferences. U has to be given by a definition in F. Note that N provides the ontology wrt which U is defined.
We define I(Q_0) := E_QS(E_max(U(Y(H)) | "Q(Y(H)) = Q_0" is true)), where the outer expectation is taken over the quasi-Solomonoff measure QS. Here the subscript max stands for maximal depth of analysis, as in the construction of the absolute UDT value above.
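To show the shape of the definition, here is a heavily simplified toy sketch (my own): each hypothesis H is reduced to a record with a description length, a compatibility time t(H), the program Q(Y(H)) it decodes to, and a single number standing in for E_max(U(Y(H))); I(Q_0) is then the QS-weighted average of that number over the hypotheses that decode to Q_0:

```python
import math

def qs_weight(length_bits, t_H, t):
    # Same weight as in the earlier sketch: 2^{-L(H)} * (1 - exp(-t(H)/t)).
    return 2 ** (-length_bits) * (1 - math.exp(-t_H / t))

def intelligence(Q0, hypotheses, t):
    # QS-weighted (and here renormalized) average of U over hypotheses H with
    # Q(Y(H)) = Q0; the logical-uncertainty expectation E_max is collapsed
    # into the precomputed number H["U"].
    pairs = [(qs_weight(H["length"], H["t_H"], t), H["U"])
             for H in hypotheses if H["Q"] == Q0]
    total = sum(w for w, _ in pairs)
    return sum(w * u for w, u in pairs) / total if total else 0.0

hypotheses = [
    {"length": 10, "t_H": 100.0, "Q": "prog-A", "U": 0.9},
    {"length": 12, "t_H":  30.0, "Q": "prog-A", "U": 0.4},
    {"length": 11, "t_H": 500.0, "Q": "prog-B", "U": 0.7},
]
print(intelligence("prog-A", hypotheses, t=50.0))  # weighted mostly toward 0.9
```

Everything hard in the real definition (enumerating hypotheses, decoding Q, evaluating U under logical uncertainty) is hidden inside the hand-written records here.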
Remarks
- IMO the correct way to look at this is intelligence metric = value of decision for the decision problem "what should I program into my robot?". If N is a highly detailed model including "me" (the programmer of the AI), this literally becomes the case. However for theoretical analysis it is likely to be more convenient to work with simple N (also conceptually it leaves room for a "purist" notion of agent's intelligence, decoupled from the fine details of its creator).
- As opposed to usual UDT, the algorithm (H) making the decision (Q) is not known with certainty. I think this represents a real uncertainty that has to be taken into account in decision problems in general: the decision-maker doesn't know her own algorithm. Since this "introspective uncertainty" is highly correlated with "indexical" uncertainty (uncertainty about the universe), it prevents us from absorbing the latter into the utility function as proposed by Coscott.
- For high values of t, G can improve its understanding of the universe by bootstrapping the knowledge it already has. This is not possible for low values of t. In other words, if I cannot trust my mind at all, I cannot deduce anything. This leads me to an interesting conjecture: there is a critical value t* of t at which this bootstrapping becomes possible (the positive feedback loop of knowledge becomes critical). I(Q) is non-smooth at t* (phase transition).
- If we wish to understand intelligence, it might be beneficial to decouple it from the choice of preferences. To achieve this we can introduce the preference formula as an unknown parameter in N. For example, if G is realized by a machine M, we can connect M to a data storage E whose content is left undetermined by N. We can then take U to be given by the formula encoded in E at time k=0. This leads to I(Q) being a sort of "general-purpose" intelligence while avoiding the problems associated with reinforcement learning.
- As opposed to Legg-Hutter intelligence, there appears to be no simple explicit description for Q* maximizing I(Q) (e.g. among all programs of given length). This is not surprising, since computational cost considerations come into play. In this framework it appears to be inherently impossible to decouple the computational cost considerations: G's computations have to be realized mechanistically and therefore cannot be free of time cost and side-effects.
- Ceteris paribus, Q* deals efficiently with problems like counterfactual mugging. The "ceteris paribus" conditional is necessary here because, owing to the cost and side-effects of computations, it is difficult to make absolute claims. However, it doesn't deal efficiently with counterfactual mugging in which G doesn't exist in the "other universe". This is because the ontology used for defining U (which is given by N) assumes G does exist. At least this is the case for simple ontologies like the ones described above: possibly we can construct N in which G might or might not exist. Also, if G uses a quantum ontology (i.e. N describes the universe in terms of a wavefunction and U computes the quantum expectation value of an operator) then it does take into account other Everett universes in which G doesn't exist.
- For many choices of N (for example if the G is realized by a machine M), QS-induction assigns well-defined probabilities to subjective expectations, contrary to what is expected from UDT. However:
- This is not the case for all N. In particular, if N admits destruction of M then M's sensations after the point of destruction are not well-defined. Indeed, we had better allow for destruction of M if we want G's preferences to behave properly in such an event. That is, if we don't allow it we get a "weak anvil problem" in the sense that G experiences an ontological crisis when discovering its own mortality, and the outcome of this crisis is not obvious. Note though that it is not the same as the original ("strong") anvil problem; for example, G might come to the conclusion that the dynamics of "M's ghost" will be random in some fashion.
- These probabilities probably depend significantly on N and don't amount to an elegant universal law for solving the anthropic trilemma.
- Indeed this framework is not completely "updateless", it is "partially updated" by the introduction of N and t. This suggests we might want the updates to be minimal in some sense, in particular t should be t*.
- The framework suggests there is no conceptual problem with cosmologies in which Boltzmann brains are abundant. Q* wouldn't think it is a Boltzmann brain since the long address of Boltzmann brains within the universe makes the respective hypotheses complex thus suppressing them, even disregarding the suppression associated with N. I doubt this argument is original but I feel the framework validates it to some extent.
A few remarks about mass-downvoting
To whoever has for the last several days been downvoting ~10 of my old comments per day:
It is possible that your intention is to discourage me from commenting on Less Wrong.
The actual effect is the reverse. My comments still end up positive on average, and I am therefore motivated to post more of them in order to compensate for the steady karma drain you are causing.
If you are mass-downvoting other people, the effect on some of them is probably the same.
To the LW admins, if any are reading:
Look, can we really not do anything about this behaviour? It's childish and stupid, and it makes the karma system less useful (e.g., for comment-sorting), and it gives bad actors a disproportionate influence on Less Wrong. It seems like there are lots of obvious things that would go some way towards helping, many of which have been discussed in past threads about this.
Failing that, can we at least agree that it's bad behaviour and that it would be good in principle to stop it or make it more visible and/or inconvenient?
Failing that, can we at least have an official statement from an LW administrator that mass-downvoting is not considered an undesirable behaviour here? I really hope this isn't the opinion of the LW admins, but as the topic has been discussed from time to time with never any admin response I've been thinking it increasingly likely that it is. If so, let's at least be honest about it.
To anyone else reading this:
If you should happen to notice that a sizeable fraction of my comments are at -1, this is probably why. (Though of course I may just have posted a bunch of silly things. I expect it happens from time to time.)
My apologies for cluttering up Discussion with this. (But not very many apologies; this sort of mass-downvoting seems to me to be one of the more toxic phenomena on Less Wrong, and I retain some small hope that eventually something may be done about it.)
Logic as Probability
Followup To: Putting in the Numbers
Before talking about logical uncertainty, our final topic is the relationship between probabilistic logic and classical logic. A robot running on probabilistic logic stores probabilities of events, e.g. that the grass is wet outside, P(wet), and then if they collect new evidence they update that probability to P(wet|evidence). Classical logic robots, on the other hand, deduce the truth of statements from axioms and observations. Maybe our robot starts out not being able to deduce whether the grass is wet, but then they observe that it is raining, and so they use an axiom about rain causing wetness to deduce that "the grass is wet" is true.
Classical logic relies on complete certainty in its axioms and observations, and makes completely certain deductions. This is unrealistic when applied to rain, but we're going to apply this to (first-order, for starters) math later, which is a better fit for classical logic.
The general pattern of the deduction "It's raining, and when it rains the grass is wet, therefore the grass is wet" is modus ponens: if 'U implies R' is true, and U is true, then R must be true. There is also modus tollens: if 'U implies R' is true, and R is false, then U has to be false too. Third, there is the law of non-contradiction: "It's simultaneously raining and not-raining outside" is always false.
We can imagine a robot that does classical logic as if it were writing in a notebook. Axioms are entered in the notebook at the start. Then our robot starts writing down statements that can be deduced by modus ponens or modus tollens. Eventually, the notebook is filled with statements deducible from the axioms. Modus tollens and modus ponens can be thought of as consistency conditions that apply to the contents of the notebook.
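As a minimal sketch of that notebook picture (my own illustration; representing the negation of X as the string "~X" is just a convenience):

```python
def neg(s):
    return s[1:] if s.startswith("~") else "~" + s

def deduce(axioms, implications):
    """Fill the notebook: start from the axioms and keep applying modus
    ponens and modus tollens until nothing new can be written down."""
    notebook = set(axioms)
    changed = True
    while changed:
        changed = False
        for a, b in implications:                          # each pair reads "a implies b"
            if a in notebook and b not in notebook:        # modus ponens
                notebook.add(b); changed = True
            if neg(b) in notebook and neg(a) not in notebook:   # modus tollens
                notebook.add(neg(a)); changed = True
    return notebook

print(deduce({"raining"}, [("raining", "wet")]))  # {'raining', 'wet'}
print(deduce({"~wet"}, [("raining", "wet")]))     # {'~wet', '~raining'}
```

The "consistency conditions" framing is visible here: the loop stops only once the notebook is closed under both rules, i.e. nothing more can be added by them.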
Putting in the Numbers
Followup To: Foundations of Probability
In the previous post, we reviewed reasons why having probabilities is a good idea. Those foundations defined probabilities as numbers following certain rules, like the product rule and the rule that the probabilities of mutually exclusive events sum to at most 1. These probabilities have to hang together as a coherent whole. But knowing that probabilities have to hang together a certain way doesn't actually tell us what numbers to assign.
I can say a coin flip has P(heads)=0.5, or I can say it has P(heads)=0.999; both are perfectly valid probabilities, as long as P(tails) is consistent. This post will be about how to actually get to the numbers.
2013 Survey Results
Thanks to everyone who took the 2013 Less Wrong Census/Survey. Extra thanks to Ozy, who helped me out with the data processing and statistics work, and to everyone who suggested questions.
This year's results are below. Some of them may make more sense in the context of the original survey questions, which can be seen here. Please do not try to take the survey as it is over and your results will not be counted.
[link] Why Self-Control Seems (but may not be) Limited
In another attack on the resource-based model of willpower, Michael Inzlicht, Brandon J. Schmeichel, and C. Neil Macrae have a paper titled "Why Self-Control Seems (but may not be) Limited", in press at Trends in Cognitive Sciences. Ungated version here.
Some of the most interesting points:
- Over 100 studies appear to be consistent with self-control being a limited resource, but these studies generally do not observe resource depletion directly; instead, they infer it from whether or not people's performance declines on a second self-control task.
- The only attempts to directly measure the loss or gain of a resource have been studies measuring blood glucose, but these studies have serious limitations, the most important being an inability to replicate evidence of mental effort actually affecting the level of glucose in the blood.
- Self-control also seems to be replenished by things such as "watching a favorite television program, affirming some core value, or even praying", which would seem to conflict with the hypothesis of inherent resource limitations. The resource-based model also seems evolutionarily implausible.
The authors offer their own theory of self-control. One-sentence summary (my formulation, not from the paper): "Our brains don't want to only work, because by doing some play on the side, we may come to discover things that will allow us to do even more valuable work."
- Ultimately, self-control limitations are proposed to be an exploration-exploitation tradeoff, "regulating the extent to which the control system favors task engagement (exploitation) versus task disengagement and sampling of other opportunities (exploration)".
- Research suggests that cognitive effort is inherently aversive, and that after humans have worked on some task for a while, "ever more resources are needed to counteract the aversiveness of work, or else people will gravitate toward inherently rewarding leisure instead". According to the model proposed by the authors, this allows the organism to both focus on activities that will provide it with rewards (exploitation), but also to disengage from them and seek activities which may be even more rewarding (exploration). Feelings such as boredom function to stop the organism from getting too fixated on individual tasks, and allow us to spend some time on tasks which might turn out to be even more valuable.
The explanation of the actual proposed psychological mechanism is good enough that it deserves to be quoted in full:
Based on the tradeoffs identified above, we propose that initial acts of control lead to shifts in motivation away from “have-to” or “ought-to” goals and toward “want-to” goals (see Figure 2). “Have-to” tasks are carried out through a sense of duty or contractual obligation, while “want-to” tasks are carried out because they are personally enjoyable and meaningful [41]; as such, “want-to” tasks feel easy to perform and to maintain in focal attention [41]. The distinction between “have-to” and “want-to,” however, is not always clear cut, with some “want-to” goals (e.g., wanting to lose weight) being more introjected and feeling more like “have-to” goals because they are adopted out of a sense of duty, societal conformity, or guilt instead of anticipated pleasure [53].
According to decades of research on self-determination theory [54], the quality of motivation that people apply to a situation ranges from extrinsic motivation, whereby behavior is performed because of external demand or reward, to intrinsic motivation, whereby behavior is performed because it is inherently enjoyable and rewarding. Thus, when we suggest that depletion leads to a shift from “have-to” to “want-to” goals, we are suggesting that prior acts of cognitive effort lead people to prefer activities that they deem enjoyable or gratifying over activities that they feel they ought to do because it corresponds to some external pressure or introjected goal. For example, after initial cognitive exertion, restrained eaters prefer to indulge their sweet tooth rather than adhere to their strict views of what is appropriate to eat [55]. Crucially, this shift from “have-to” to “want-to” can be offset when people become (internally or externally) motivated to perform a “have-to” task [49]. Thus, it is not that people cannot control themselves on some externally mandated task (e.g., name colors, do not read words); it is that they do not feel like controlling themselves, preferring to indulge instead in more inherently enjoyable and easier pursuits (e.g., read words). Like fatigue, the effect is driven by reluctance and not incapability [41] (see Box 2).
Research is consistent with this motivational viewpoint. Although working hard at Time 1 tends to lead to less control on “have-to” tasks at Time 2, this effect is attenuated when participants are motivated to perform the Time 2 task [32], personally invested in the Time 2 task [56], or when they enjoy the Time 1 task [57]. Similarly, although performance tends to falter after continuously performing a task for a long period, it returns to baseline when participants are rewarded for their efforts [58]; and remains stable for participants who have some control over and are thus engaged with the task [59]. Motivation, in short, moderates depletion [60]. We suggest that changes in task motivation also mediate depletion [61].
Depletion, however, is not simply less motivation overall. Rather, it is produced by lower motivation to engage in “have-to” tasks, yet higher motivation to engage in “want-to” tasks. Depletion stokes desire [62]. Thus, working hard at Time 1 increases approach motivation, as indexed by self-reported states, impulsive responding, and sensitivity to inherently-rewarding, appetitive stimuli [63]. This shift in motivational priorities from “have-to” to “want-to” means that depletion can increase the reward value of inherently-rewarding stimuli. For example, when depleted dieters see food cues, they show more activity in the orbitofrontal cortex, a brain area associated with coding reward value, compared to non-depleted dieters [64].
See also: Kurzban et al. on opportunity cost models of mental fatigue and resource-based models of willpower; Deregulating Distraction, Moving Towards the Goal, and Level Hopping.
Productivity tool: race!
Inspired by the recent batch of productivity posts, I wrote this short story down. To a reasonable extent, this story is real.
| 2:00 | As usual, I put my breakfast into the microwave and set it to 2:00. |
| 1:58 | It took two seconds for a train of thought to start: |
| "Waste, again. Again!" | |
| "What am I supposed to do this entire time? Stare at the clock?" | |
| "Useless and boring, but can't start any serious work or thought in time-frame this short" | |
| "Why don't I switch to Soylent, hire a maid or just eat the damn thing cold?" | |
| 1:53 | It took me five seconds to notice and derail that train on the basis of "Been there, done that, nothing significant changed since" |
| | But this time something went differently. |
| 1:52 | "What do I get to lose if I just try to do something, anything?" |
| "Food's gonna get cold. Also you can't multitask that much, so no thinking either." | |
| "So what. If I don't make it in time I'll just reheat that food and at least one other thing will be done already. And didn't I just say I can't think of anything serious that fast anyway?" | |
| | It took me 16 seconds to scan today's TODO for anything I had any chance to accomplish within ninety-something seconds. |
| "Work. Takes too long to start, hardware limitation" | |
| "Read... what? Would take a while to find something new and short enough. I could prepare next time, not now" | |
| ... | |
| "A shower. Usually takes me at least 5 minutes... but why?" | |
| 1:36 | I rushed to the bathroom. |
| "Skip everything that can be skipped, but nothing important" | |
| Leaving clothes where I stand. | |
| "No time to fiddle with the faucet. Just turn it on roughly around the point where it's supposed to be" | |
| A bit too cold. So what. | |
| "No time to select soap/shampoo/gel. Just apply top-to-bottom whatever comes up first." | |
| Blargh, hair conditioner. Bad idea. | |
| "Note to self: sort this stuff" | |
| Top-to-bottom, fast moves, keep accurate. | |
| "Come on, come on, come on! I can't believe the microwave didn't finish yet!" | |
| Head too. Don't skip anything important, remember? One last jet of water and I grab the towel. | |
| "No need to dry the hair so much, you're not going out anytime soon" | |
| | I throw the towel back on the hanger and run to the kitchen. Did it beep already and I didn't hear it? Did it break? |
| 0:42 | "What?!" |
| 0:41 | - What?! |
| 0:20 | For 21 seconds of my Saved Time I allowed myself to stare at the clock to make sure time flows at the same rate it used to. |
| "It took me 54 seconds to take an okay shower. A minute and 40 seconds ago I didn't believe it was possible." | |
| "Not productive." | |
| "What else can I do?" | |
| "Now we're talking!" | |
| 0:00 | I spent the last 20 seconds building a mental model of what just happened and storing it for later experiments... |
| BEEP! | |
| | ... and then it hit me, again. I made breakfast, took a shower, and thought of something new and possibly significant, all within a time-frame so short I hadn't believed it possible. I could multitask that much. And I will do better with training. |