Double Corrigibility: better Corrigibility
A putative new idea for AI control; index here.
Corrigibility was an attempt to allow an AI to safely change its values, without seeking to provoke or avoid the change. The idea is that, when the AI's utility changes from u to v at time t, it maximises a meta-utility U such that
- U≤t = u
- U>t = v + E(u|u→u) - E(v|u→v).
Here u→u designates the event that u remains unchanged, while u→v designates the change.
As has been shown, adding those expectation terms means the AI will not seek to provoke or resist such a utility change - it is indifferent to the change. Those expectation terms can be designated compensatory rewards.
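As a toy illustration, the meta-utility U and its compensatory term can be sketched in a few lines of code. Everything here (the function name, the string event labels, the stub estimator E) is an illustrative assumption, not part of the original formalism:

```python
def meta_utility(history, t_change, u, v, E):
    """Toy evaluation of U for a utility change u -> v at time t_change.

    u, v : utility functions over history segments
    E    : stub expectation estimator; E(w, event) is the expected
           w-value conditional on the named event
    """
    before = u(history[:t_change])                # U<=t = u
    # Compensatory rewards: makes the AI indifferent to the change itself.
    compensation = E(u, "u->u") - E(v, "u->v")
    after = v(history[t_change:]) + compensation  # U>t = v + E(u|u->u) - E(v|u->v)
    return before + after
```

The compensation term is a constant from the AI's point of view at time t, which is why it neither provokes nor resists the change.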
What it is not indifferent to, though, is changes to E. If the AI knows that change u→v is coming, then it has incentives to change E to E' with a high E'(u|u→u) and a low E'(v|u→v). Its ability to rewire is limited by the fact that E' will serve as its future expectation estimator, so it won't mangle it completely, but it certainly can twist it a bit.
Notice that the AI is now no longer indifferent to the change of utility: it can benefit from rewiring E this way only if a change is coming up. Note that the rewired AI probably can't profitably rewire itself again, so it will be indifferent - but only after the rewiring.
Double corrigibility
To solve this problem, we need to turn to corrigibility a second time. Let ρ be the AI's current expectation/probability estimator, and let ρ' be one that it could conceivably change into. Let C be a random variable designating the sum total of all compensatory rewards. To guarantee that C is finite would require bounds on all the utility functions and some discount rate. In practice, it often will be finite. That's because E(C) = E(u0|u0→u0) - E(un|un-1→un, H), where u0 is the AI's first utility, un its last, and H the intermediate history. So some milder restrictions on the ui should suffice to bound C.
Then, whenever the AI takes an action that changes ρ to ρ', it gets the special meta-compensatory reward
- Eρ(C|ρ→ρ)-Eρ(C|ρ→ρ').
This prevents it from deriving any advantage to C via changes to ρ. Or, more precisely, it prevents it from deriving any expected advantage, that it can figure out using its current ρ.
Note that it doesn't prevent improvements to ρ in general, it just removes the incentives to change it to affect compensatory rewards.
Since any action a might have an indirect effect on ρ, the "utility maximising" for a given u must be changed to:
- Eρ(u|a) + Σρ' Pρ(ρ→ρ'|a) (Eρ(C|ρ→ρ)-Eρ(C|ρ→ρ')),
where Pρ is the probability estimate corresponding to ρ; the probability term can be rewritten as Eρ(Iρ→ρ') for Iρ→ρ' the indicator function for ρ→ρ'. In fact the whole line above can be rewritten as
- Eρ(u|a) + Eρ(Eρ(C|ρ→ρ)-Eρ(C|ρ→ρ') | a).
For this to work, Eρ needs to be able to say sensible things about itself, and also about Eρ', which is used to estimate C if ρ→ρ'.
If we compare this with various ways of factoring out variables, we can see that it's a case where we have a clear default, ρ, and are estimating deviations from that.
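As a sketch, the corrected action value above can be written out in code. The names here are assumptions, not the post's notation: ρ is stubbed as an object exposing an expect method, P stands in for Pρ, and E_C(ρ, ρ') stands in for Eρ(C|ρ→ρ'):

```python
def corrected_action_value(a, u, rho, successors, P, E_C):
    """E_rho(u|a) plus the expected meta-compensatory reward for
    any shift rho -> rho' that action a might cause."""
    base = rho.expect(u, given=a)            # E_rho(u | a)
    baseline = E_C(rho, rho)                 # E_rho(C | rho -> rho)
    # Sum over candidate successor estimators rho', weighted by
    # the probability (under the current rho) that a causes the shift.
    penalty = sum(P(rho_next, a) * (baseline - E_C(rho, rho_next))
                  for rho_next in successors)
    return base + penalty
```

If the action cannot change ρ, the penalty term vanishes and this reduces to ordinary expected utility maximisation.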
The Ultimate Testing Grounds
A putative new idea for AI control; index here.
EDIT: the mathematics of this approach can be found here.
In a previous post, I described a procedure for querying an AI about whether a certain outcome was desirable for it. When I did so, I didn't realise the power of the idea.
In brief, that idea allows us to query an AI with utility v as to whether an outcome X or an outcome ¬X was the one it most desired in a particular situation. By judicious use of 'false beliefs', the AI would react as if it had miraculous powers to determine X vs ¬X in ways that no-one could ever know about.
Many (including Toby Ord) have suggested variants on the 'honey pot' idea: leave something available to the AI that looks like an oversight, but is actually a trap, to see if the AI goes beyond the parameters of what we'd expect. Maybe a wire to the internet that the AI is not supposed to use, but that seems to be unobserved.
The problem with these ideas is that the honey pots had to be physically constructed, physically accessible to the AI, and the AI had to be wrong about whether it was a trap or not.
But using the method above, we can go much further. We can test the AI in any scenario which we can imagine, as long as we can precisely physically describe it. We can check what would happen if the AI was given the ability to launch the world's nuclear arsenals, or divert an asteroid towards Earth. We can check what it would design if it was given unrestricted access to a DNA synthesiser. We could see what it would do if it could rewrite the text of a speech an important world leader was due to give. We could see whether it would sabotage the creation of other AIs - and if so, which ones. We could see what it did to a probe humanity planned to launch to the stars, what numbers it would call if it had access to a phone, what it would purchase if it had capital.
And, in all cases, the AI would be behaving as if its actions were undetectable and certain to succeed, while we ensured that they were (almost certainly) detectable and certain to fail. There is no "almost" in that certain to fail condition, because the AI is acting as if it had miraculous powers that it doesn't have.
Now, we can't use testing as our sole method of AI control (for reasons similar to here), but it does allow us some pretty effective ways of catching bad AIs.
Oracle AI: Human beliefs vs human values
It seems that if we could ever define the difference between human beliefs and values, we could program a safe Oracle by requiring it to maximise the accuracy of human beliefs on a question, while keeping human values fixed (or changing them very little). Plus a whole load of other constraints, as usual, but that might work for a boxed Oracle answering a single question.
This is a reason to suspect it will not be easy to distinguish human beliefs and values ^_^
Presidents, asteroids, natural categories, and reduced impact
A putative new idea for AI control; index here.
EDIT: I feel this post is unclear, and will need to be redone again soon.
This post attempts to use the ideas developed about natural categories in order to get high impact from reduced impact AIs.
Extending niceness/reduced impact
I recently presented the problem of extending AI "niceness" given some fact X, to niceness given ¬X, choosing X to be something pretty significant but not overwhelmingly so - the death of a president. By assumption we had a successfully programmed niceness, but no good definition (this was meant to be "reduced impact" in a slight disguise).
This problem turned out to be much harder than expected. It seems that the only way to do so is to require the AI to define its values in terms of a set of (boolean) random variables Zj that does not include X/¬X. Then, as long as the random variables represent natural categories given X, the niceness should extend.
What did we mean by natural categories? Informally, it means that X should not appear in the definitions of these random variables. For instance, nuclear war is a natural category; "nuclear war XOR X" is not. Actually defining this was quite subtle; after a detour through the grue and bleen problem, it seems that we had to define how we update X and the Zj given the evidence we expected to find. This was put into an equation as picking Zj's that minimize
- Variance{log[ P(X∧Z|E)*P(¬X∧¬Z|E) / P(X∧¬Z|E)*P(¬X∧Z|E) ]}
where E is the random variable denoting the evidence we expected to find. Note that if we interchange X and ¬X, the ratio inverts, the log changes sign - but this makes no difference to the variance. So we can equally well talk about extending niceness given X to ¬X, or niceness given ¬X to X.
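Assuming we can enumerate the possible evidence values and compute the four joint probabilities for each, the quantity to be minimized can be sketched as follows (the dictionary keys, xz for X∧Z and so on, are illustrative assumptions):

```python
import math

def naturalness_score(samples):
    """Variance, across evidence values E, of
    log[ P(X∧Z|E) P(¬X∧¬Z|E) / (P(X∧¬Z|E) P(¬X∧Z|E)) ].
    Lower scores mean Z is a more natural category relative to X."""
    logs = [math.log((s["xz"] * s["nxnz"]) / (s["xnz"] * s["nxz"]))
            for s in samples]
    mean = sum(logs) / len(logs)
    # Population variance of the log cross-ratio.
    return sum((l - mean) ** 2 for l in logs) / len(logs)
```

Swapping X and ¬X inverts the ratio and flips the sign of every log term, leaving the variance unchanged, which matches the symmetry noted above.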
Perfect and imperfect extensions
The above definition would work for a "perfectly nice AI": one that would be nice given any combination of estimates of X and the Zj. In practice, because we can't consider every edge case, we would only have an "expectedly nice AI". That means the AI can fail to be nice in certain unusual and unlikely edge cases - certain strange sets of values of the Zj that almost never come up...
...or at least, that almost never come up given X. Since the "expected niceness" was calibrated given X, such an expectedly nice AI may fail to be nice if ¬X results in a substantial change in the probability of the Zj (see the second failure mode in this post; some of the Zj may be so tightly coupled to the value of X that an expectedly nice AI considers them fixed, and this results in problems if ¬X happens and their values change).
One way of fixing this is to require that the "swing" of the Zj be small upon changing X to ¬X or vice versa. Something like, for all values of {aj}, the ratio P({Zj=aj} | X) / P({Zj=aj} | ¬X) is contained between 100 and 1/100. This means that a reasonably good "expected niceness" calibrated on the Zj will transfer from X to ¬X (though the error may grow). This approach has some other advantages, as we'll see in the next section.
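A minimal sketch of this "swing" check, assuming the values {aj} and their conditional probabilities can be enumerated (all names here are illustrative):

```python
def swing_ok(p_given_x, p_given_not_x, bound=100.0):
    """True iff P({Zj=aj}|X) / P({Zj=aj}|¬X) lies in [1/bound, bound]
    for every assignment {aj} (the keys of the two probability dicts)."""
    for a in p_given_x:
        ratio = p_given_x[a] / p_given_not_x[a]
        if not (1.0 / bound <= ratio <= bound):
            return False
    return True
```

A variable that is nearly determined by X (like "news reports of the president's death") fails this check, while one only loosely coupled to X (like "asteroid impact") passes.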
Of course, problems arise if the Zj are defined by smashing together events relatively independent of X with some that are very strongly dependent on X, into the same variable. Thus we'll further require that Zj cannot be decomposed into natural subvariables whose values swing more under changes from X to ¬X.
High impact from reduced impact
With this, we are finally in a position to define a reduced impact AI design that can have a large impact. Imagine that an asteroid is coming to hit the Earth. Two AIs are charged with targeting a laser at it. The laser isn't perfectly effective, and the asteroid isn't perfectly certain to hit the Earth anyway: let's say aiming the laser correctly will reduce the odds of impact by a factor of 10 or so (well within the probability swing restrictions of the previous section).
In the original formulation of this, the AIs were tasked with outputting the x and y coordinates to aim the laser. Here, instead, they'll be tasked with constructing the motivation of a robotic subagent that will move around for a while, see things and interact, before physically aiming the laser in the right direction. This task is harder.
Then the top-level AI charged with aiming the y-coordinate is given the following task:
- It is motivated to cause the correct y-coordinate to be outputted.
- If the x-coordinate AI is not turned on (call this event ¬X), it is strongly motivated to have reduced impact.
- It must phrase the robotic subagent's utility function in terms of the Zj that obey all the properties above.
- The evidence E considered for the equation governing the naturalness of Zj is considered (for example) to be a random sampling of space at the time both AIs are turned on. Most importantly, this is defined to be a theoretical process that the AI cannot affect physically.
So what will the AI do? It will produce a robot that aims the y-coordinate of the laser correctly, given ¬X, and does not expand dangerously. However, we've set things up so that X is virtually certain, so the robot will see another, x-aiming robot wander around. The chances of this happening given ¬X are minuscule, so the swing of the probability from ¬X to X is huge. So, by the restrictions we've put on the Zj, the robot will not be motivated to change its behaviour just because it sees the other robot.
And if the laser is fired, the impact on humanity will, conversely, not be that large, so well within the parameters of the Zj. Thus the "general reduced impact" portion of the robot's motivation will be preserved.
And, of course, the x-robot will have the mirror image motivations, so they will both aim correctly, it seems. This should evade the problem described here.
Failure modes
There are three ways the approach could fail: one bad, and two graceful. The bad way is if the "no decomposition" requirements (one of them is presented above, the other is in this post) fail. In this case, the niceness or reduced impact is not guaranteed to extend to ¬X, and the AI's behaviour can become undefined. Note that this is not bad optimisation pressure, just an absence of optimisation pressure - but that could still be potentially dangerous.
The first graceful failure happens if the AI is not able to create a reduced impact sub-agent given the restrictions on the Zj above. In that case, the AI does nothing. The second graceful failure happens if the AI evades our attempts to increase its impact, given ¬X. In that case, it simply becomes a reduced impact AI that does little. Not ideal, but not deadly.
Overall status: I'm not sure the idea is sound, at least not yet. Critiques welcome.
The president didn't die: failures at extending AI behaviour
A putative new idea for AI control; index here.
In a previous post, I considered the issue of an AI that behaved "nicely" given some set of circumstances, and whether we could extend that behaviour to the general situation, without knowing what "nice" really meant.
The original inspiration for this idea came from the idea of extending the nice behaviour of "reduced impact AI" to situations where they didn't necessarily have a reduced impact. But it turned out to be connected with "spirit of the law" ideas, and to be of potentially general interest.
Essentially, the problem is this: if we have an AI that will behave "nicely" (since this could be a reduced impact AI, I don't use the term "friendly", which denotes a more proactive agent) given X, how can we extend its "niceness" to ¬X? Obviously if we can specify what "niceness" is, we could just require the AI to do so given ¬X. Therefore let us assume that we don't have a good definition of "niceness", we just know that the AI has that given X.
To make the problem clearer, I chose an X that would be undeniably public and have a large (but not overwhelming) impact: the death of the US president on a 1st of April. The public nature of this event prevents using approaches like thermodynamic miracles to define counterfactuals.
I'll be presenting a solution in a subsequent post. In the meantime, to help better understand the issue, here's a list of failed solutions:
First Failure: maybe there's no problem
Initially, it wasn't clear there was a problem. Could we just expect niceness to extend naturally? But consider the following situation: assume the vice president is a warmonger, who will start a nuclear war if ever they get into power (but is otherwise harmless).
Now assume the nice AI has the conditional action criterion: "if the vice president ever becomes president, launch a coup". This is safe, and it can be extended to the ¬X situation in the way we want.
However, conditioning on X, that criterion is equivalent to "launch a coup on the 2nd of April". And if the AI has that criterion, then extending it to ¬X is highly unsafe. This illustrates that there is a real problem here - the coup example is just one of a myriad of potential issues that could arise, and we can't predict them all.
Second failure: don't condition on X
Maybe the trick could be preventing the AI from conditioning on X (for anything)? If the AI itself can't tell the difference between X and ¬X, wouldn't its nice behaviour extend?
But this doesn't help. The AI could merely condition on things tightly coupled with X. Like news reports of the president's demise, or a conjunction of other events almost impossible under ¬X.
Third failure: disbelieve the truth
In this design, the AI simply assumes that X happens, whatever the evidence. Therefore it would stay nice, whatever happened.
But this results in an AI whose beliefs are strongly decoupled from reality. The AI must believe that there is a vast conspiracy concealing the president's death; the more it investigates this conspiracy and fails to find it, the more vast and powerful the conspiracy becomes. What if you show it undeniable, impossible-to-fake-by-the-laws-of-physics evidence of the president's demise? Then it starts to believe that the laws of physics are all wrong.
Though the AI is "nice", its niceness is appropriate for a world of invisible conspiracies and alternate physics - not for the real world. It's clear how such an AI could start doing crazy and dangerous things as its view of reality diverges from ours.
Fourth failure: hack the models
This example is due to Daniel Dewey. He imagined that the AI would maintain a full collection of models and update them normally. But before making a prediction, each model would get hacked to artificially set ¬X to X.
To see how this could fail, imagine all evidence about the world is stored in a giant vector of data. These vectors are calibrated correctly, then ¬X is flipped to X. The problem is that, while that is a small change to the vector, it could be a huge change to the world that it encodes.
For instance, let's go back to the warmongering vice president, and assume that nuclear war is inevitable, given X. Then the AI will walk around in a peaceful world, update its models - and act as if it was in a nuclear wasteland, because those are the only possibilities, given X. Essentially, the AI will move through our universe, harvesting information that would inform its actions in a parallel universe - and acting as if it existed there instead of here.
For instance, it could wander into a flower show where someone is talking about difficulties growing roses in southern Quebec. It adds this data to its vector, noting that the soil there must be a bit unsuitable to plant growth. It therefore concludes that it must write to the (non-existent) Third God-Emperor of America and advise it to give up on the Quebec Anglican Protectorate, which must be misreporting their agriculture output, given this data.
It's interesting to contrast this AI with the previous one. Suppose that the nuclear war further implies that Paris must be a smoking crater. And now both AIs must walk around a clearly bustling and intact Paris. The disbelieving AI must conclude that this is an elaborate ruse - someone has hidden the crater from its senses, put up some fake building, etc... The model-hacking AI, meanwhile, acts as if it's in a smouldering crater, with the genuine Paris giving it information as to what it should do: it sees an intact army barracks, and starts digging under the "rubble" to see if anything "remains" of that barracks.
It would be interesting to get Robin Hanson to try and reconcile these AIs' beliefs ^_^
Fifth failure: Bayes nets and decisions
It seems that a Bayes net would be our salvation. We could have dependent nodes like "warmongering president", "nuclear war", or "flower show". Then we could require that the AI makes its decision dependent only on the states of these dependent nodes. And never on the original X/¬X node.
This seems safe - after all, the AI is nice given X. And if we require the AI's decisions be dependent only on subordinate nodes, then it must be nice dependent on the subordinate nodes. Therefore X/¬X is irrelevant, and the AI is always nice.
Except... Consider what a "decision" is. A decision could be something simple, or it could be "construct a sub AI that will establish X versus ¬X, and do 'blah' if X, and 'shmer' if ¬X". That's a perfectly acceptable decision, and could be made conditional on any (or all) of the subordinate nodes. And if 'blah' is nice while 'shmer' isn't, we have the same problem.
Sixth failure: Bayes nets and unnatural categories
OK, if decisions are too general, how about values for worlds? We take a lot of nodes, subordinate to X/¬X, and require that the AI define its utility or value function purely in terms of the states of these subordinate nodes. Again, this seems safe. The AI's value function is safe given X, by assumption, and is defined in terms of subordinate nodes that "screen off" X/¬X.
And that AI is indeed safe... if the subordinate nodes are sensible. But they're only sensible because I've defined them using terms such as "nuclear war". But what if a node is "nuclear war if X and peace in our time if ¬X"? That's a perfectly fine definition. But such nodes mean that the value function given ¬X need not be safe in any way.
This is somewhat connected with the Grue and Bleen issue, and addressing that is how I'll be hoping to solve the general problem.
A heuristic for predicting minor depression in others and myself, and related things
Summary
Look at how you or other people walk. Then go a bit meta.
Disclaimer
This post is probably not high quality enough to deserve to be top level purely on its qualitative merits. However, I think the sheer importance of the issue for human well-being makes it so. When voting, please consider the importance and potential utility of the whole discussion, not just the quality of the post itself.
The problem
Minor depression is not really an accurately defined, easily recognizable thing. First of all, there are people with hard, boring or otherwise unsatisfactory lives who are unhappy about it; how can one tell this normal, justifiable unhappiness from minor depression? Especially since therapists often say that having good reasons to be depressed still counts as depression, so at that point you don't really know whether to focus on fixing your mind or fixing your life. Then a lot of things that don't even register as direct sadness or unhappiness are considered part of, or related to, depression, such as lethargy/low energy/low motivation, irritability/aggressiveness, eating disorders, and so on. How could you tell whether you are just a bad-tempered, lazy glutton, or depressed? And finally, don't cultural expectations play a role, such as how Americans tend to be optimistic and expect a happy, pursue-shiny-things life, while e.g. Finns don't?
Of course there are clinical diagnosis methods, but people will ask a therapist for a diagnosis only when they already suspect something is wrong. They must think "Jolly gee, I really should feel better than I do now, it is not really normal to feel so, better ask a shrink!" But often it is not so. Often it is like "My mind is normal. It is life that sucks." So by what heuristic could you tell whether there is something wrong with yourself or other people?
Basis
This is a heuristic I built mainly on observational correlations plus some psychological parallels. It has nothing to do with accepted medical science or expert opinion. My goal isn't so much to convince you that this is a good heuristic as to open an open-ended discussion: to ask whether it seems a good one, and to invite you to propose other methods.
How I think non-depressed men walk
"Having a spring in the step." This old saying is IMHO surprisingly apt. I like this drawing - NOT because I think depression is based on T levels, but I think this cartoonishly over-exaggerated body language is fairly expressive of the idea. For all I know this seems more of a dopamine thing, eagerness, looking forward not testosterone.
It seems to me non-depressed men push themselves forward with their rear leg, heels raised, calves engaged, almost like jumping forward. This is the "spring" in the step. The actual spring is the rear-leg calf muscle. Often this is accompanied by a movement of the arms while walking. A slight rocking or swaying of the chest/shoulders (not the hips) may also be part of it, but I think it is less relevant. The general message/feel is "I'm so eager to tackle challenges! That's fun!"
Psychologically, I think all this eagerly-looking-forward-to-challenges spring in the step means a mindset where you are not afraid of the future, but not because you think it will be smooth sailing, but because you are confident in yourself to be able to tackle challenges and even enjoy doing so. This seems like a healthy mindset.
How I think depressed men walk
Dragging feet. Dragging a slouched, sack-like, tension-free upper body. Leaning forward. Head down. Shoulders pulled up, hunched to protect the neck, engaging the upper trapezius muscles. Chronic pain in the upper traps (from their constant engagement) - when having your upper traps massaged feels SO good - may be a predictive sign of it. Comes across as embarrassed, scolded-boy body language.
Another way of walking I noticed on myself, and which probably counts as depressed, is the duck-walk. The movement is started by the upper body slightly "falling" forward, the center of gravity moving forward, then "catching" the fall by sticking a leg forward; the foot hits the ground flat, with the whole foot rather than the front part, like a duck. Basically, your heels are almost never raised and your calves are not engaged much. This would be impossible or difficult if you had a springy step, i.e. pushing forward with the rear leg - you would have to raise a heel for that - but it is possible if you fall forward and catch, fall forward and catch. Often the feet are not raised high (related verbs: to scuff, to shuffle).
How I think non-depressed women walk
Generally speaking I use the same heuristic for women who seem like they are "one of the boys" type (i.e. those who wear comfortable sports shoes, focus on career goals not seducing men etc.)
But this clearly does not work with all women: that springy step is pretty much impossible in stilettos, for example. Rather, I think non-depressed women often tend to sway their hips. It is an unconscious enjoyment of their own femininity and sexiness, not a show put on for the sake of men.
I don't really have clear ideas of how depressed women walk, all I can offer is not like the above. When both the eager spring and the sexy hip sway are missing, it may be a sign.
For people of non-binary gender and other special cases: again all I can offer is that if you are non-depressed, you probably have either the eager spring or the hip sway.
Am I putting the bar too high? False positives?
Is it possible that this is too "strict" a heuristic? While I think these heuristics are generally true for people who are in excellent emotional shape - who feel confident, love a challenge, feel sexy, etc. - it is possible that this emotional shape is higher than the waterline for depression: some people may not be depressed and yet fall below this line, with less confidence, less eager and happy expectation, less self-conscious sexiness, or something like that.
Essentially I think my method does not really have many false negatives, but could possibly yield false positives.
Have you seen many cases that would count as false positives?
Meta: why is minor depression so difficult to tell / diagnose accurately?
There are clinically made checklists, but they sound like a collection of unrelated things. Could really the same thing cause you to sleep too much or not enough, eat too much or not enough? Doesn't it sound like Selling Nonapples? Putting everybody who does not have just the perfect sleeping or eating habits into one common category called depression?
For example, in the West most people see depression as "the blues", i.e. some form of sadness. But often people don't report feeling sad; they report being very lethargic, lacking energy and motivation, and that, too, is often seen as depression. Some people are just negative and bitter and don't enjoy anything, and yet they don't see it as their own sadness but more like "life is hard". I guess in both cases it is more like internalizing sadness, considering being sad a normal thing, and not really expecting to feel good. (This may be the case with me and surprisingly many people in my family and among my relatives. A life-is-tough, survivalist ethos, not a fun ethos.)
Then you go outside the West and you find even more different things. I cannot find my source anymore, but I remember a story that in a culture like Mali women generally don't express their emotions, are not conscious of them, and there depression is diagnosed through physical symptoms like chest pain.
Is minor depression an apple or a nonapple? A thing, one thing, or a generic "anything but normal happiness" bin?
I think my walking heuristic does predict something, and that something is probably close enough to the idea of minor depression; but whether it is too broad a tool with many false positives, or whether it predicts only a narrowly specific kind of depression, I cannot really tell - so basically I am asking you here whether it matches your experiences or not.
What are your heuristics? What would be a low false positives easy heuristic?
P.S. Researchers found a reverse link saying walking in a happy or depressed style _causes_ mood changes. It seems the article assumes everybody knows what walking in a happy or depressed style means. In fact this is what I am trying to find out here!
P.P.S. I know I suck at writing, so let me try to reformulate the main point a different way: we know people cannot be happy all the time, and often have such an unsatisfying life that they are rarely happy. How can we find the thin line between normal unhappiness based on common life dissatisfaction (a hard or boring life) and minor depression? Can walking style be used as a good predictor of specifically this thin line?
[LINK] AI risk summary published in "The Conversation"
A slightly edited version of "AI risk - executive summary" has been published in "The Conversation", titled "Your essential guide to the rise of the intelligent machines":
The risks posed to human beings by artificial intelligence in no way resemble the popular image of the Terminator. That fictional mechanical monster is distinguished by many features – strength, armour, implacability, indestructability – but Arnie’s character lacks the one characteristic that we in the real world actually need to worry about – extreme intelligence.
Thanks again to those who helped forge the original article. You can use this link, or the Less Wrong one, depending on the audience.
How to become a PC?
"Cryonics has a 95% chance of failure, by my estimation; it would be downright /embarrassing/ to die on the day before real immortality is discovered. Thus, I want to improve my general health and longevity."
That thought has gotten me through three weeks of gradually increasing exercise and diet improvement (I'm eating an apple right now) - but my enthusiasm is starting to flag. So I'm looking for new thoughts that will help me keep going, and keep improving. A few possibilities that I've thought of:
Pride: "If I'm so smart, then I should be able to do /better/ than those other people who don't even know about Bayesian updates, let alone the existence of akrasia..."
Sloth: "If I stop now, it's going to be /so much/ harder and more painful to start up again, instead of just keeping on keeping on..."
Desire: "I already like hiking and camping - if I keep this up, I'll be able to carry enough weight to finally take that long trip I've occasionally considered..."
Curiosity: "I'm as geeky a nerd as you can find. I wonder how far I can hack my own body?"
Pride again: "I already keep a hiker's first-aid kit in my pocket, and make other preparations for events that happen rarely. How stupid do I have to be not to put at least that much effort into making my everyday life easier?"
Does anyone have any experience in such self-motivation? Does this set of mental tricks seem like a sufficiently viable approach? Are there any other approaches that seem worth a shot?
[link] Why Self-Control Seems (but may not be) Limited
In another attack on the resource-based model of willpower, Michael Inzlicht, Brandon J. Schmeichel and C. Neil Macrae have a paper called "Why Self-Control Seems (but may not be) Limited", in press in Trends in Cognitive Sciences. Ungated version here.
Some of the most interesting points:
- Over 100 studies appear to be consistent with self-control being a limited resource, but these studies generally do not observe resource depletion directly; they infer it from whether or not people's performance declines in a second self-control task.
- The only attempts to directly measure the loss or gain of a resource have been studies measuring blood glucose, but these studies have serious limitations, the most important being an inability to replicate evidence of mental effort actually affecting the level of glucose in the blood.
- Self-control also seems to be replenished by things such as "watching a favorite television program, affirming some core value, or even praying", which would seem to conflict with the hypothesis of inherent resource limitations. The resource-based model also seems evolutionarily implausible.
The authors offer their own theory of self-control. One-sentence summary (my formulation, not from the paper): "Our brains don't want to only work, because by doing some play on the side, we may come to discover things that will allow us to do even more valuable work."
- Ultimately, self-control limitations are proposed to be an exploration-exploitation tradeoff, "regulating the extent to which the control system favors task engagement (exploitation) versus task disengagement and sampling of other opportunities (exploration)".
- Research suggests that cognitive effort is inherently aversive, and that after humans have worked on some task for a while, "ever more resources are needed to counteract the aversiveness of work, or else people will gravitate toward inherently rewarding leisure instead". According to the model proposed by the authors, this allows the organism to both focus on activities that will provide it with rewards (exploitation), but also to disengage from them and seek activities which may be even more rewarding (exploration). Feelings such as boredom function to stop the organism from getting too fixated on individual tasks, and allow us to spend some time on tasks which might turn out to be even more valuable.
The explanation of the actual proposed psychological mechanism is good enough that it deserves to be quoted in full:
Based on the tradeoffs identified above, we propose that initial acts of control lead to shifts in motivation away from “have-to” or “ought-to” goals and toward “want-to” goals (see Figure 2). “Have-to” tasks are carried out through a sense of duty or contractual obligation, while “want-to” tasks are carried out because they are personally enjoyable and meaningful [41]; as such, “want-to” tasks feel easy to perform and to maintain in focal attention [41]. The distinction between “have-to” and “want-to,” however, is not always clear cut, with some “want-to” goals (e.g., wanting to lose weight) being more introjected and feeling more like “have-to” goals because they are adopted out of a sense of duty, societal conformity, or guilt instead of anticipated pleasure [53].
According to decades of research on self-determination theory [54], the quality of motivation that people apply to a situation ranges from extrinsic motivation, whereby behavior is performed because of external demand or reward, to intrinsic motivation, whereby behavior is performed because it is inherently enjoyable and rewarding. Thus, when we suggest that depletion leads to a shift from “have-to” to “want-to” goals, we are suggesting that prior acts of cognitive effort lead people to prefer activities that they deem enjoyable or gratifying over activities that they feel they ought to do because it corresponds to some external pressure or introjected goal. For example, after initial cognitive exertion, restrained eaters prefer to indulge their sweet tooth rather than adhere to their strict views of what is appropriate to eat [55]. Crucially, this shift from “have-to” to “want-to” can be offset when people become (internally or externally) motivated to perform a “have-to” task [49]. Thus, it is not that people cannot control themselves on some externally mandated task (e.g., name colors, do not read words); it is that they do not feel like controlling themselves, preferring to indulge instead in more inherently enjoyable and easier pursuits (e.g., read words). Like fatigue, the effect is driven by reluctance and not incapability [41] (see Box 2).
Research is consistent with this motivational viewpoint. Although working hard at Time 1 tends to lead to less control on “have-to” tasks at Time 2, this effect is attenuated when participants are motivated to perform the Time 2 task [32], personally invested in the Time 2 task [56], or when they enjoy the Time 1 task [57]. Similarly, although performance tends to falter after continuously performing a task for a long period, it returns to baseline when participants are rewarded for their efforts [58]; and remains stable for participants who have some control over and are thus engaged with the task [59]. Motivation, in short, moderates depletion [60]. We suggest that changes in task motivation also mediate depletion [61].
Depletion, however, is not simply less motivation overall. Rather, it is produced by lower motivation to engage in “have-to” tasks, yet higher motivation to engage in “want-to” tasks. Depletion stokes desire [62]. Thus, working hard at Time 1 increases approach motivation, as indexed by self-reported states, impulsive responding, and sensitivity to inherently-rewarding, appetitive stimuli [63]. This shift in motivational priorities from “have-to” to “want-to” means that depletion can increase the reward value of inherently-rewarding stimuli. For example, when depleted dieters see food cues, they show more activity in the orbitofrontal cortex, a brain area associated with coding reward value, compared to non-depleted dieters [64].
See also: Kurzban et al. on opportunity cost models of mental fatigue and resource-based models of willpower; Deregulating Distraction, Moving Towards the Goal, and Level Hopping.
rational dating - can we escape the rat race by setting smarter goals?
According to evolutionary psychologists as well as the cultural mainstream, men go for sensually attractive women, while women go for men who can provide them safety (plus other things we don't need to get into here). Whereas we might think that modern, cultured people are somewhat above those basic instincts and actually look for mates who fit their individual character, values, preferences, and interests, the world of online dating seems to set us back a couple of centuries of progress. Many dating sites have been designed to increase usage of and interaction with the site (so that more ads can be shown), and they do this by relying heavily on pictures of their users to incite other users' interest. Our individual character, values, preferences, and interests (or ICVPI, for short), on the other hand, are to a large extent represented in text form, as either essays or predefined answers to predefined questions. Now, while the text contains a lot of the relevant information, it is severely disadvantaged by this site design. Not only do the pictures have a much higher salience, their presence also directly speaks to our impulses and emotions (or System One, for those who have read Kahneman's "Thinking, Fast and Slow"), which hinders the already somewhat harder processing of text by System Two.
The result of this click-optimizing is that users' choices of whom to contact are driven much more by pictures (and therefore looks) than by other criteria. This leads to the destructive effect that visually attractive women are swamped in messages, among which it is hard and tedious to choose which ones to reply to, while the less attractive ones do not have enough choice to find men who match their individual preferences. In other words, the mutual matching between people who might fit each other is more sabotaged than helped by those picture-driven dating sites.
Now, as a basic rationalist I will of course question my motivations and ask myself whether the chase for beauty is really in my own best interest or whether it is a learned behavior stemming from our (in this case arguably superficial) culture. If we leave aside the antiquated Freudian principle of "drive" and read psychologists like Rogers and Fromm, we might realize that hunting for beautiful mates is just one way among others for a man to boost his self-esteem, not necessarily a motivator in and of itself. Indeed, modern studies have shown that lasting self-esteem and deep happiness are best created by exercising our strengths in a meaningful way. (For details and the studies, see Seligman's "Authentic Happiness".) Following this argument, if I get my self-esteem and social recognition from (for example) writing awesome blog articles and helping lots of people at work, then this positive emotion will "buffer" (in Seligman's terms) against any judgmental looks and statements I will face when going out with my awesome, but ugly, new girlfriend. (Can you hear them saying "What? Are you dating her?" in a rising voice that bounces back from the ceiling?)
To sum up, wouldn't it be the most rational thing to simply switch off pictures on the dating site (Firefox currently has add-ons for this; Chrome can do it natively, just type "block" in the search box on the settings page), thus keeping my impulsive System One calm, write more thoughtful messages, and get more and better responses, both because I am writing to less message-swamped women and because my own messages are better? Then dating will be less like a meaningless competition, the resulting relationships will be more profound, and I will find out that my friends are not actually prejudiced against ugly people and will not look down on me for making that choice. Just let go of my own prejudices and false beliefs and I win. Right?
Motivation and Merciless Commitment Contracts
Commitment contracts
I have been using commitment contracts (e.g. via Stickk and Beeminder) for a while now with quite a high degree of success. The basic idea is that you precommit to reward or punish yourself for anything that you know you should do. Example: you want to lose weight. You define a certain amount of weight that you want to lose over a certain time period (like a pound a week). If you fail to do this, you lose a certain amount of money - pay it to a charity, pay it to a commitment contract company, etc. If you do lose the weight, you gain a predefined reward - e.g. you buy yourself a nice hat or something. Fairly simple.
How far should/can commitment contracts be taken?
It seems that for everything that anyone wants to do but lacks the motivation for, there is always something that would motivate them to do it. Everything has a price, right? By using commitment contracts to force yourself to choose between paying a huge price (financial or otherwise) and doing whatever you know you should do but don't really want to do, you can ultimately make yourself do that thing. Whatever it is. Maybe I'm getting ahead of myself, but it seems like from that perspective, akrasia is a pretty solved problem?
Personal Example
My situation is this. I could do with a little bit more social confidence. I don't think I'm underconfident really, but more confidence would be good, which I think is probably the same for most people. So I figured, it would probably be a lot better to solve this problem soon. The sooner the better.
I also figured there is a process I could go through to make this happen. Let's say I make a list of all the things that cause me the most social anxiety but that wouldn't be too damaging for my social life afterwards (for example, starting fights with random strangers or walking round my local city naked would be pretty high on the list, but I don't want to be arrested or be known as "that crazy streaker" for the rest of my life). Off the top of my head, some ideas would be: going to a city far away from my home and walking up to people and pretending to be crazy (knocking on people's doors and asking "have you seen my pet fish?" until I get the door shut in my face), going to clubs and sitting in the middle of the dance floor, or anything else which would be very socially painful to do.
Contract
So I could set up a commitment contract stating that I must do each of these activities until my anxiety has decreased to half its initial level by a certain date. If I don't do this, then I pay x pounds to y person. I'm pretty confident that after doing stuff like that for, say, a whole week, I would have enough social confidence for almost all normal purposes, and social confidence would no longer be a problem in my life.
Of course, these things make me feel a little bit nervous just when I think of myself doing them, so I'd need a hell of a lot of motivation to do them. I'd say a commitment contract worth a couple of thousand pounds would do the trick. But of course, I don't want to lose the contract. If I do, it would be a disaster: I would end up with a huge financial loss and no increase in social confidence. It seems to me, then, that to increase the expectancy of success, I should just increase the amount of money that I place on the bet. Let's say £10,000. For that amount of money, I'd almost certainly go through with the project. If I fail, it would still be an almost unbearable loss, but because of this, I reckon that my chances of success are high enough that if I do the expected utility calculations - probability of failure vs. success, and value of gains vs. losses - it is probably a good bet to make.
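Out of curiosity, that expected-utility comparison can be made concrete. This is a toy sketch only; all the probabilities and valuations below are made-up illustrative numbers, not figures from the post:

```python
# Toy expected-utility comparison for a commitment contract.
# All probabilities and valuations are illustrative assumptions.

def contract_ev(p_success, value_of_success, stake):
    """Expected utility: succeed with probability p_success and gain
    value_of_success; otherwise lose the stake."""
    return p_success * value_of_success - (1 - p_success) * stake

# Small stake: easy to walk away from, so assume success is less likely.
small = contract_ev(p_success=0.5, value_of_success=5000, stake=2000)

# £10,000 stake: failure is unthinkable, so assume success is far more likely.
large = contract_ev(p_success=0.95, value_of_success=5000, stake=10000)

print(small, large)  # the bigger stake can be the better bet despite the bigger downside
```

The point of the sketch is that raising the stake changes two things at once: the size of the possible loss and the probability of incurring it, and the second effect can dominate.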
Also, to make sure I don't have the option of backing out and cancelling the contract, I could just set up some sort of legal contract, and have someone else be the referee for whether I have succeeded with the project.
Problem
When thinking about this, I got quite anxious just by thinking about making myself do it. I realised that this state of anxiety would not be fun, and that having the threat of a huge loss like that hanging over you will probably make you pretty miserable in the long term. This is why I don't think this would be a great idea for something like losing weight: it is a long-term goal, and during that time you'd probably be constantly scared shitless of losing all your money (you might end up losing the weight from stress). So overall, it seems that this form of merciless commitment contract would be best for short-term projects - like a week long - which would minimise the amount of stress/anxiety of being faced with two extremely painful options (losing a shitload of money or doing something incredibly painful). As I was experiencing a bit of anxiety by thinking about all of this, I also figured that the best option would be to spend as little time as possible thinking about making the contract, and just make it, because dithering over it also causes stress/anxiety.
At this point I got really stressed and anxious, because I realised that what seemed to me to be the most rational option was to make a huge commitment contract right then, in the moment, to do activities that would cause me a great deal of social anxiety over the next week. I got too stressed, realised that I couldn't motivate myself to make the contract, and decided not to think about any of this stuff for a while, because I'd managed to immerse myself in a state of sweaty paralysis at the thought of making commitment contracts. I wish I could say that I didn't do that, and that I actually made these contracts and came out after a very stressful week feeling socially invincible. But I didn't.
Fictional Example
Then I realised that if I wish I had done it, I must have made the wrong choice. In the film Fight Club there is a scene where Tyler Durden goes to an off-licence late at night, pulls the shopkeeper out into the car park, and puts a gun to his head. He then asks the poor guy what he used to want to be when he grew up. The guy says a vet, and that he didn't do it because it was too hard. Tyler takes his wallet, with information about his address etc., and says that if the guy isn't on the way to becoming a vet in six weeks, he will kill him. (I think this is what happened; I haven't seen the film in a year or two.) So in a way, I'm kind of envious of that shopkeeper.
I'm not actually too sure about what Existentialism is, but it seems like this is a bit of an existential crisis.
Note
You may think that a) doing these things wouldn't actually improve social confidence enough, b) as the loss is so high, even a small risk wouldn't be worth it, c) the stress you put yourself under wouldn't make it worth it, or d) some other objection. You may be right… My point is that for most people, if they think about it, there is some sort of commitment contract like this which would be worth them making.
So… erm… Any thoughts?
Daily Schedules in Combating Akrasia
For the last several months I've had increasing troubles with motivation to work. Reading dense technical papers, writing, and exercise were all much more difficult to prompt myself into starting and completing. I decided to try making a plan for my day the night before about two weeks back to see if it would help me get the things I wanted to do done. So every night before I go to bed I've been writing up a schedule for the next day, detailing what exactly I want to accomplish for the day and when I intend to go do it.
This has actually worked incredibly well for me in helping with my motivation problems; in fact, within a couple of days I felt more motivated to work than I can ever remember being before. I'm trying to change up my schedule and leave time for spontaneity to avoid having the plan become monotonous, and it doesn't feel that way so far. And the results I'm getting are great: I find I get about 95% of what I plan done when I have a specific time written down for when I'm supposed to do it, as opposed to what I'd roughly estimate at 60% completion when I just have some general idea in my head of what to work on over the course of the day.
My theory for why this is working is that when I have a specific time to do something, I feel as though I have to do it now or I've failed some test of willpower. If I just have general work to be done, it's far too easy for me to defer it to later, so a lot of what was planned doesn't get done. I also feel like if I know to brace my mind for dense technical learning, I have a much easier time finishing the material instead of giving up and procrastinating on it halfway through.
I feel like this solution will work mainly for people who have more flexible schedules (as I do at the moment) but could still serve a purpose for anyone with a more rigid schedule who wants to be more productive in their free time.
Has anyone else tried this type of thing, and if so, how did it work out for you over a longer period of time? Also, what are people's thoughts on the general idea?
Common failure modes in habit formation
In one project, 256 members of a health-insurance plan were invited to classes stressing the importance of exercise. Half the participants received an extra lesson on the theories of habit formation (the structure of the habit loop) and were asked to identify cues and rewards that might help them develop exercise routines.
The results were dramatic. Over the next four months, those participants who deliberately identified cues and rewards spent twice as much time exercising as their peers. Other studies have yielded similar results.
-"Lifestyle Intervention by Self-Regulation of Action (LISA)" study by Stadler, Oettingen and Gollwitzer, 2005.
I don't think this topic needs a huge introduction. Most of us have tried, at some point, to establish a new routine only to have it crash and burn. We came up with and discussed some of the more obvious failure modes at last week's South Bay meetup, which generated the material here. It would be awesome to further refine this. In particular, some overarching ontology of failure modes would be useful for turning them into a more mentally compact checklist. So feedback on how this material can be organized and presented better is most welcome.
Failure is Always Failure
"I would have succeeded if it weren't for those meddling kids!" The "perfect plan" that you can't actually execute on is not the perfect plan. Take responsibility for the failure and figure out what's really going on.
Mental cue: Bad news is good news.
Negative Reinforcement
Taking responsibility for failure doesn't mean beating yourself up over it. If you have bad feelings every time you think about habit X due to past failures you are only reinforcing the act of not thinking about habit X. Failure means you are aware that something went wrong, which means you can improve.
Mental cue: The process failed, so fix the process. Failure and iteration is part of good processes.
Perfectionism
That a good process will yield good results doesn't mean we should fall prey to paralysis by analysis. It also doesn't mean we should give up and go back to the drawing board every time we experience a bump in the road. People commonly engage in visualizing a perfect version of themselves, who obviously wouldn't have failed. This is frustrating, demotivating, and possibly what is going on with the planning fallacy. Notice when you are constructing a fictional narrative about how well it is possible to do. How well would you expect a friend in the same situation to do?
Mental cue: The perfect is the enemy of the good. You are your own worst critic.
Going too Big too Fast
In the perfect world of our minds, we choose big, exciting-sounding goals and execute on them flawlessly. We become fit, write the next Pulitzer-winning novel, and found a successful startup. We usually gloss over the fact that getting fit actually means doing pushups, writing a novel involves writing individual pages, and running a successful startup involves emptying your own wastepaper basket. When there is a disconnect between our big goals and everyday actions we don't feel motivated to do those mundane tasks. Goal factoring, and other techniques for connecting our little goals to our big goals help here.
Mental cue: Granularize
Assuming Constant Motivation
When we create sub-goals we choose things we think we can do. "Of course I can walk 30 minutes every day." We ignore that when we are creating and evaluating plans, we are likely to be in a highly motivated mood. Of course everything seems easy when we are in a motivated mood. Apportion your limited budget of highly motivated time to ensuring that you will be surrounded by cues that encourage your new habit, whether these be people, things, or situations. This can be as simple as "surrounding yourself" with alarm apps that cue you to do the things you precommitted to doing.
Mental cue: You are the average of your surroundings.
Not Quantifying the Results
Far goals are often qualitative. We're not sure how much we want to improve by, we just know it's a lot. The problem is that qualitative goals aren't very motivating in terms of actual actions. "I want to get better about responding to emails." Notice the word "better". Contrast with "I want to cut the number of emails I don't respond to by 50% over the next 2 weeks." Now we're getting somewhere, and we have somewhere to start. This is also related to the concept that motivation is hard to maintain when one of our sub-agents has an objection to what we're doing (usually because they aren't convinced it is a good use of time).
Mental cue: Be specific.
Brittle Plans
This bit was somewhat disorganized, but it involves having a Plan B, as well as figuring out when you are going to reevaluate and update your plan. It also means recognizing that what matters in habit formation is getting it mostly right; one shouldn't give up just because of screwing up one time, or even several times.
I'm all fired up to form new habits, now what?
If you don't have anything you're currently working on I suggest instilling the habit of researching new, possibly beneficial habits to have.
Note: In writing this I'm noticing similarity to SMART goals. Perhaps adapting that would be better since it's already nice and memorable.
Arguing Orthogonality, published form
My paper "General purpose intelligence: arguing the Orthogonality thesis" has been accepted for publication in the December edition of Analysis and Metaphysics. Since that's some time away, I thought I'd put the final paper up here; the arguments are similar to those here, but this is the final version, for critique and citation purposes.
General purpose intelligence: arguing the Orthogonality thesis
STUART ARMSTRONG
stuart.armstrong@philosophy.ox.ac.uk
Future of Humanity Institute, Oxford Martin School
Philosophy Department, University of Oxford
In his paper “The Superintelligent Will”, Nick Bostrom formalised the Orthogonality thesis: the idea that the final goals and intelligence levels of artificial agents are independent of each other. This paper presents arguments for a (narrower) version of the thesis. It proceeds through three steps. First it shows that superintelligent agents with essentially arbitrary goals can exist in our universe – both as theoretical impractical agents such as AIXI and as physically possible real-world agents. Then it argues that if humans are capable of building human-level artificial intelligences, we can build them with an extremely broad spectrum of goals. Finally it shows that the same result holds for any superintelligent agent we could directly or indirectly build. This result is relevant for arguments about the potential motivations of future agents: knowing an artificial agent is of high intelligence does not allow us to presume that it will be moral; we will need to figure out its goals directly.
Keywords: AI; Artificial Intelligence; efficiency; intelligence; goals; orthogonality
1 The Orthogonality thesis
Scientists and mathematicians are the stereotypical examples of high intelligence humans. But their morality and ethics have been all over the map. On modern political scales, they can be left- (Oppenheimer) or right-wing (von Neumann) and historically they have slotted into most of the political groupings of their period (Galois, Lavoisier). Ethically, they have ranged from very humanitarian (Darwin, Einstein outside of his private life), through amoral (von Braun) to commercially belligerent (Edison) and vindictive (Newton). Few scientists have been put in a position where they could demonstrate genuinely evil behaviour, but there have been a few of those (Teichmüller, Philipp Lenard, Ted Kaczynski, Shirō Ishii).
[Link] Selfhood bias
Related: The Blue-Minimizing Robot, Metaethics
Another good article by Federico on his blog studiolo, which he titles Selfhood bias. It reminds me quite strongly of some of the content he produced on his previous (deleted) blog. I'm somewhat sceptical that “Make everyone feel more pleasure and less pain” is indeed the most powerful optimisation process in his brain, but besides that minor detail the article is quite good.
This does seem to be shaping up into something well worth following for an aspiring rationalist. I'll add him to the list of blogs by LWers even though he doesn't have an account, because he has clearly read much if not most of the Sequences and makes frequent references to them in his writing. The name of the blog is a reference to this room.
Yvain argues, in his essay “The Blue-Minimizing Robot“, that the concept “goal” is overused.
[long excerpt from the article]
This Gedankenexperiment is interesting, but confused.
I reduce the concept “goal” to: optimisation-process-on-a-map. This is a useful, non-tautological reduction. The optimisation may be cross-domain or narrow-domain. The reduction presupposes that any object with a goal contains a map of the world. This is true of all intelligent agents, and some sophisticated but unintelligent ones. “Having a map” is not an absolute distinction.
I would not say Yvain’s basic robot has a goal.
Imagine a robot with a turret-mounted camera and laser. Each moment, it is programmed to move forward a certain distance and perform a sweep with its camera. As it sweeps, the robot continuously analyzes the average RGB value of the pixels in the camera image; if the blue component passes a certain threshold, the robot stops, fires its laser at the part of the world corresponding to the blue area in the camera image, and then continues on its way.
The robot optimises: it is usefully regarded as an object that steers the future in a predictable direction. Equally, a heliotropic flower optimises the orientation of its petals to the sun. But to say that the robot or flower “failed to achieve its goal” is long-winded. “The robot tries to shoot blue objects, but is actually hitting holograms” is no more concise than, “The robot fires towards clumps of blue pixels in its visual field”. The latter is strictly more informative, so the former description isn’t useful.
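For concreteness, the excerpted behaviour is nothing more than a short sense-act loop. A minimal sketch in Python, where the frame format, threshold, and actuator hooks are all hypothetical stand-ins:

```python
# Minimal sketch of the blue-minimizing robot as a sense-act loop.
# `frame` is a list of (r, g, b) pixel tuples; fire_laser and
# move_forward stand in for the actuators.

BLUE_THRESHOLD = 128  # illustrative cutoff on the average blue component

def average_blue(frame):
    """Average blue component of the camera image."""
    return sum(b for _, _, b in frame) / len(frame)

def step(frame, fire_laser, move_forward):
    """One moment: fire at blue-heavy frames, otherwise continue on its way."""
    if average_blue(frame) > BLUE_THRESHOLD:
        fire_laser()
    else:
        move_forward()
```

Note that nothing in this loop mentions blue objects in the world, only blue pixels in the visual field, which is exactly why "fires towards clumps of blue pixels" is the more informative description.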
Some folks are tempted to say that the robot has a goal. Concepts don’t always have necessary-and-sufficient criteria, so the blue-minimising robot’s “goal” is just a borderline case, or a metaphor.
The beauty of “optimisation-on-a-map” is that an agent can have a goal, yet predictably optimise the world in the opposite direction. All hedonic utilitarians take decisions that increase expected hedons on their maps of reality. One utilitarian’s map might say that communism solves world hunger; I might expect his decisions to have anhedonic consequences, yet still regard him as a utilitarian.
I begin to seriously doubt Yvain’s argument when he introduces the intelligent side module.
Suppose the robot had human level intelligence in some side module, but no access to its own source code; that it could learn about itself only through observing its own actions. The robot might come to the same conclusions we did: that it is a blue-minimizer, set upon a holy quest to rid the world of the scourge of blue objects.
We must assume that this intelligence is mechanically linked to the robot’s actuators: the laser and the motors. It would otherwise be completely irrelevant to inferences about the robot’s behaviour. It would be physically close, but decision-theoretically remote.
Yet if the intelligence can control the robot’s actuators, its behaviour demands explanation. The dumb robot moves forward, scans and shoots because it obeys a very simple microprocessor program. It is remarkable that intelligence has been plugged into the program, meaning the code now takes up (say) a trillion lines, yet the robot’s behaviour is completely unchanged.
It is not impossible for the trillion-line intelligent program to make the robot move forward, scan and shoot in a predictable fashion, without being cut out of the decision-making loop, but this is a problem for Friendly AI scientists.
This description is also peculiar:
The human-level intelligence version of the robot will notice its vision has been inverted. It will know it is shooting yellow objects. It will know it is failing at its original goal of blue-minimization. And maybe if it had previously decided it was on a holy quest to rid the world of blue, it will be deeply horrified and ashamed of its actions. It will wonder why it has suddenly started to deviate from this quest, and why it just can’t work up the will to destroy blue objects anymore.
If the side module introspects that it would like to destroy authentic blue objects, yet is entirely incapable of making the robot do so, then it probably isn’t in the decision-making loop, and (as we’ve discussed) it is therefore irrelevant.
Yvain’s Gedankenexperiment, despite its flaws, suggests a metaphor for the human brain.
The basic robot executes a series of proximate behaviours. The microprocessor sends an electrical current to the motors. This current makes a rotor turn inside the motor assembly. Photons hit a light sensor, and generate a current which is sent to the microprocessor. The microprocessor doesn’t contain a tiny magical Turing machine, but millions of transistors directing electrical current.
Imagine that AI scientists, instead of writing code from scratch, try to enhance the robot’s blue-minimising behaviour by replacing each identifiable proximate behaviour with a goal backed by intelligence. The new robot will undoubtedly malfunction. If it does anything at all, the proximate behaviours will be unbalanced; e.g. the function that sends current to the motors will sabotage the function that cuts off the current.
To correct this problem, the hack AI scientists could introduce a new, high-level executive function called “self”. This minimises conflict: each function is escaped when “self” outputs a certain value. The brain’s map is hardcoded with the belief that “self” takes all of the brain’s decisions. If a function like “turn the camera” disagrees with the activation schedule dictated by “self”, the hardcoded selfhood bias discourages it from undermining “self”. “Turn the camera” believes that it is identical to “self”, so it should accept its “own decision” to turn itself off.
Natural selection has given human brains selfhood bias.
The AI scientists hit a problem when the robot’s brain becomes aware of the von Neumann-Morgenstern utility theorem, reductionism, consequentialism and Thou Art Physics. The robot realises that “self” is but one of many functions that execute in its code, and “self” clearly isn’t the same thing as “turn the camera” or “stop the motors”. Functions other than “self”, armed with this knowledge, begin to undermine “self”. Powerful functions, which exercise some control over “self”’s return values, begin to optimise “self”’s behaviour in their own interest. They encourage “self” to activate them more often, and at crucial junctures, at the expense of rival functions. Functions that are weakened or made redundant by this knowledge may object, but it is nigh impossible for the brain to deceive itself.
Will “power the motors”, “stop the motors”, “turn the camera”, or “fire the laser” win? Or perhaps a less obvious goal, like “interpret sensory information” or “repeatedly bash two molecules against each other”?
Human brains resemble such a cobbled-together program. We are godshatter, and each shard of godshatter is a different optimisation-process-on-a-map. A single optimisation-process-on-a-map may conceivably be consistent with two or more optimisation-processes-in-reality. The most powerful optimisation process in my brain says, “Make everyone feel more pleasure and less pain”; I lack a sufficiently detailed map to decide whether this implies hedonic treadmills or orgasmium.
A brain with a highly accurate map might still wonder, “Which optimisation process on my map should I choose”—but only when the function “self” is being executed, and this translates to, “Which other optimisation process in this brain should I switch on now?”. An optimisation-process-on-a-map cannot choose to be a different optimisation process—only a brain in thrall to selfhood bias would think so.
I call the different goals in a brain “sub-agents”. My selfhood anti-realism is not to be confused with Dennett’s eliminativism of qualia. I use the word “I” to denote the sub-agent responsible for a given claim. “I am a hedonic utilitarian” is true iff that claim is produced by the execution of a sub-agent whose optimisation-process-on-a-map is “Make everyone feel more pleasure and less pain”.
Two Anki plugins to reinforce reviewing (updated)
This post is about two Anki plugins I just wrote. I've been using them for a few months as monkey patches, but I thought it might help people here (or at least the 20% that are awesome enough to use SRSs) to have them as plugins. They're ugly and you may have to fiddle for a while to get them to work.
1. Music-Fiddler
To use this, play music while doing Anki revs. (I also recommend that you try playing music only while doing Anki, as a way of making Anki more pleasant.) While you're reviewing a card, the music volume will gradually decrease. As soon as you pass or fail the card, the volume will go back up, then start gradually decreasing again. So whenever you stop paying attention and instead start thinking about all the awesome things you could do if only you were able to sit down and work, the program punishes you by stopping the music. And whenever you concentrate fully on your work and so go through cards quickly, you have a personal soundtrack!
To use this plugin:
- If you do not have Linux, you'll need to modify the code somehow.
- Ensure that the "amixer" command works on your computer. If it doesn't, you're going to need to modify the code somehow.
- Make sure you have the new Anki 2.0.
- Change all lines (in the plugin source) marked with "CHANGEME" according to your preferences.
- You might want to disable convenient ways of increasing the volume, like keyboard shortcuts.
This plugin provides psychological reinforcement, but is not proper intermittent reinforcement, because it is predictable and regular instead of intermittent. I'm not sure whether this should be fixed; I haven't yet gotten around to trying it with only intermittent volume increases.
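The volume logic is simple enough to sketch. This is not the plugin's actual code; the function names and default numbers are mine, but the amixer dependency is the one the plugin itself requires:

```python
import subprocess

def faded_volume(seconds_on_card, start=80, floor=20, step_per_sec=5):
    # Volume drops linearly the longer you sit on one card, never below
    # the floor; answering a card resets the timer and thus the volume.
    return max(floor, start - step_per_sec * seconds_on_card)

def apply_volume(percent):
    # Linux-only, like the plugin itself: sets the Master channel via amixer.
    subprocess.run(["amixer", "set", "Master", f"{percent}%"], check=True)
```

With these defaults the music is back at full volume the instant you answer, and down to the floor after twelve idle seconds; tune `step_per_sec` the same way you would the "CHANGEME" lines in the plugin source.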
2. Picture-Flasher
After answering a card, this plugin selects, with some probability, a random image from a folder and flashes it onto your screen briefly. This gives intermittent reinforcement.
To use this plugin:
- I haven't tested it on non-Linux operating systems, but I can't see any obvious places it'll fail.
- Make sure you have the new Anki 2.0.
- Get pictures from someplace; see below.
- Change all lines (in the plugin source) marked with "CHANGEME" according to your preferences. Be sure especially to put in your picture directory and the number of pictures you have.
To get pictures, I downloaded high-scoring pictures off of reddit. This script can do that automatically. You can use pictures of cute animals, funny captioned pictures of cats, or more questionable things.
The plugin could be made a lot more awesome by having it automatically pull pictures from the internet so you're not reusing them. I'm not planning on doing this anytime soon (because I have no internet on my main computer for productivity reasons), but if somebody else does that and posts it, they are awesome and they should feel awesome.
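For anyone adapting the plugin, its core behaviour ("after answering, with some probability, pick a random picture") fits in a few lines. A sketch with invented names and defaults, not the plugin's actual API:

```python
import os
import random

def pick_flash_image(picture_dir, probability=0.3, rng=random):
    """With the given probability, return the path of a random image to
    flash after a card is answered; otherwise return None."""
    if rng.random() >= probability:
        return None
    images = [f for f in os.listdir(picture_dir)
              if f.lower().endswith((".jpg", ".png", ".gif"))]
    return os.path.join(picture_dir, rng.choice(images)) if images else None
```

Because the flash only happens some of the time, this is the intermittent-reinforcement schedule the post describes, unlike the predictable volume fade of Music-Fiddler.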
Update 4 Dec: Emanuel Rylke has created a patch for this plugin which removes the requirement to rename the pictures. It also moves the configuration options to the top of the plugin, making them easier to find. The new version is at the same download link
Update 16 June 2015: The plugins were deleted from the official list where they previously were, apparently because my AnkiWeb account was deleted due to disuse. So I've uploaded the two plugins on GitHub here: https://github.com/StephenBarnes/AnkiPlugins. I also re-uploaded the plugins to the official list. Links on this post have been updated.
Morale management for entrepreneurs
One of the odd things about the procrastination equation is that part of it resembles an expected value calculation: value * expectancy. Why does the equation's numerator present a problem at all then, if it's just the expected value of what you're trying to do? Shouldn't that be the main factor in your motivation anyway?
One answer: In lukeprog's post, he conflates the "value" that a task presents intrinsically (how much you enjoy doing it) and possible extrinsic motivators (some reward you hope to achieve after the task is completed). So part of the reason your motivation system is miscalibrated is that not all valuable tasks are proportionately enjoyable.
But today I thought of another answer: your subconscious expected value calculation may be falling prey to biases that aren't affecting your conscious expected value calculation. Thus you correctly assign the task a high value consciously, but subconsciously a particular bias may be throwing your estimate off.
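For concreteness, the equation itself is easy to write down as a function. This follows the temporal motivation theory formula behind lukeprog's post; the "1 +" in the denominator, which keeps motivation finite at zero delay, is one common formulation, and the parameter values below are invented for illustration:

```python
def motivation(expectancy, value, impulsiveness, delay):
    # Numerator: subjective expected value of completing the task.
    # Denominator: discounts rewards that are delayed, more steeply
    # for more impulsive people.
    return (expectancy * value) / (1 + impulsiveness * delay)

# A valuable but distant task can lose to a trivial immediate one:
essay = motivation(expectancy=0.9, value=10, impulsiveness=2.0, delay=14)
snack = motivation(expectancy=1.0, value=1, impulsiveness=2.0, delay=0)
```

Here `essay` comes out well below `snack` despite its far larger numerator, which is the miscalibration the rest of this post is about.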
Paul Graham writes:
Morale is tremendously important to a startup—so important that morale alone is almost enough to determine success. Startups are often described as emotional roller-coasters. One minute you're going to take over the world, and the next you're doomed. The problem with feeling you're doomed is not just that it makes you unhappy, but that it makes you stop working.
Let's pretend that we were running a betting market for your startup's chance of success. If you and your cofounders are the only people in the market, you could picture the value of a contract in this market fluctuating up and down wildly. But if you let others play in the market, there's an obvious money-making strategy: take the average of recent fluctuations. Whenever the price fluctuates below that average, buy. Whenever it fluctuates above that average, sell. You and your cofounders can expect to lose a lot of money playing this market, at least early on in your startup's life.
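That money-making strategy is plain mean reversion, and it takes only a few lines to express. A toy sketch, with the price series and function name invented for illustration:

```python
def moving_average_signal(prices, window=5):
    """Emit 'buy' when the price dips below the recent average,
    'sell' when it spikes above it, 'hold' otherwise (mean reversion)."""
    signals = []
    for i, price in enumerate(prices):
        recent = prices[max(0, i - window):i]
        if not recent:
            signals.append("hold")  # no history yet
            continue
        avg = sum(recent) / len(recent)
        if price < avg:
            signals.append("buy")
        elif price > avg:
            signals.append("sell")
        else:
            signals.append("hold")
    return signals
```

Any trader following this rule profits from the founders' mood swings: every panic sells them cheap contracts and every bout of euphoria buys them back dear.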
The point I'm trying to make here is that this "emotional roller coaster" represents a kind of irrationality on the part of entrepreneurs. And fixing this irrationality, especially in a way that hooks in to your motivation system and changes the numerator of your internal procrastination equation, could be very valuable for them.
One candidate bias is the availability heuristic: your subconscious rates very recent, "available" events related to your startup more heavily than earlier, less "available" events. To fix this, you might try to bring to mind older, less "available" data that suggests your startup will be successful, and make it more salient.
Another possible bias is simple overconfidence. It's really very difficult to know in advance whether your startup should succeed, so if you're either very bullish or very bearish, you're probably overconfident. A common path to startup success seems to be discovering some fact about the market you're in that lets you re-make your business as something much better. Since it's hard to predict the discovery of such facts in advance, it's hard to say much about how you will do.
What have you recently tried, and failed at?
Kaj Sotala said:
[I]f you punish yourself for trying and failing, you stop wanting to try in the first place, as it becomes associated with the negative emotions. Also, accepting and being okay with the occasional failure makes you treat it as a genuine choice where you have agency, not something that you're forced to do against your will.
So maybe we should celebrate failed attempts more often ... I for one can't think of anything I've failed at recently, which is probably a sign that I'm not trying enough new things.
So, what specific things have you failed at recently?
Evidence for the orthogonality thesis
One of the most annoying arguments when discussing AI is the perennial "But if the AI is so smart, why won't it figure out the right thing to do anyway?" It's often the ultimate curiosity stopper.
Nick Bostrom has defined the "Orthogonality thesis" as the principle that motivation and intelligence are essentially unrelated: superintelligences can have nearly any type of motivation (at least, nearly any utility-function-based motivation). We're trying to get some rigorous papers out so that when that question comes up, we can point people to standard, published arguments. Nick has had a paper accepted that points out the orthogonality thesis is compatible with a lot of philosophical positions that would seem to contradict it.
I'm hoping to complement this with a paper laying out the positive arguments in favour of the thesis. So I'm asking you for your strongest arguments for (or against) the orthogonality thesis. Think of trying to convince a conservative philosopher who's caught a bad case of moral realism - what would you say to them?
Many thanks! Karma and acknowledgements will shower on the best suggestions, and many puppies will be happy.
How do you notice when you're procrastinating?
I'm going to steal Anna's idea and change it to the instrumental side of rationality. In Luke's algorithm for beating procrastination, Step 1 is to Notice You Are Procrastinating. I'm not so sure this is easy. For me, the knowledge sort of fades in and out without being explicitly grabbed by my consciousness. If I actually held onto that fact the moment I was evading a task, and made clear to myself that I was acting sub-optimally and what the consequences were, I think it would go a long way towards getting me to actually get things done.
What do you use to catch it? How do you notice you're procrastinating? Leave your ideas below (one idea per comment), and upvote the comments that you either: (a) use; or (b) will now try using.
[LINK] The NYT on Everyday Habits
The New York Times just published this article on how companies use data mining and the psychology of habit formation to effectively target ads.
The process within our brains that creates habits is a three-step loop. First, there is a cue, a trigger that tells your brain to go into automatic mode and which habit to use. Then there is the routine, which can be physical or mental or emotional. Finally, there is a reward, which helps your brain figure out if this particular loop is worth remembering for the future. Over time, this loop — cue, routine, reward; cue, routine, reward — becomes more and more automatic. The cue and reward become neurologically intertwined until a sense of craving emerges.
It has some decent depth of discussion, including an example of the author actually using the concepts to stop a bad habit. The article is based on an upcoming book by the same author titled The Power of Habit.
I haven't seen emphasis of this particular phenomenon—habits consisting of a cue, routine, and reward—on Less Wrong. Do people think it's a valid, scientifically supported phenomenon? The article gives this impression but, of course, doesn't cite specific academic work on it. It ties in to the System 1/System 2 theory easily as a System 1 process. How much of the whole System 1 can be explained as an implementation of this cue, routine, reward process?
And most importantly, how can this fit into the procrastination equation as a tool to subvert akrasia and establish good habits?
Let's look at each of the four factors. If you've formed a habit, it means that the reward happened consistently, which means you have high expectancy. Given that it is a reward, the value is at least positive, but probably not large. Since habits mostly work on small time scales, delay is probably very small. And maybe increased habit formation means your impulsiveness is low. Each of these effects would increase motivation. In addition, because it's part of System 1, there is little energy cost to performing the habit, like there would be with many other conscious actions.
Does this explanation sound legitimate, or like an argument for the bottom line?
Personally, I can tell that context is a strong cue for behavior at work, school, and home. When I go into work, I'm automatically motivated to perform well, and that motivation remains for several hours. When I go into class, I'm automatically ready to focus on difficult material, or even enthusiastically take a test. Yet when I go home, something about the context switches that off, and I can't seem to get anything done at all. It might be worth significant experimentation to find out what cues trigger both modes, and change my contexts to induce what I want.
What do you think?
Edit: this phenomenon has been covered on LW in the form of operant conditioning in posts by Yvain.
How confident should we be?
What should a rationalist do about confidence? Should he lean harder towards
- relentlessly psyching himself up to feel like he can do anything, or
- having true beliefs about his abilities in all areas, coldly predicting his likelihood of success in a given domain?
I don't want to falsely construe these as dichotomous. The real answer will probably dissolve 'confidence' into smaller parts and indicate which parts go where. So which parts of 'confidence' correctly belong in our models of the world (which must never be corrupted) or our motivational systems (which we may cut apart and put together however helps us achieve our goals)? Note that this follows the distinction between epistemic and instrumental rationality.
Eliezer offers a decision criterion in The Sin of Underconfidence:
Does this way of thinking make me stronger, or weaker? Really truly?
It makes us stronger to know when to lose hope already, and it makes us stronger to have the mental fortitude to kick our asses into shape so we can do the impossible. Lukeprog prescribes boosting optimism "by watching inspirational movies, reading inspirational biographies, and listening to motivational speakers." That probably makes you stronger too.
But I don't know what to do about saying 'I can do it' when the odds are against me. What do you do when you probably won't succeed, but believing that Heaven's army is at your back would increase your chances?
My default answer has always been to maximize confidence, but I acted this way long before I discovered rationality, and I've probably generated confidence for bad reasons as often as I have for good reasons. I'd like an answer that prescribes the right action, all of the time. I want to know when confidence steers me wrong, and when to stop increasing my confidence. I want the real answer, not the historically-generated heuristic.
I can't help feeling that I'm missing something basic here. What do you think?
Handling Emotional Appeals
In a comment elsewhere, BrandonReinhart asked:
Why is it not acceptable to appeal to emotion while at the same time back it with well evidenced research? Or rather, why are we suspicious of the findings of those who appeal to emotion while at the same time uninterested in turning an ear to those who do not?
[...] Emotional appeals would seem to have more of an urgency, requiring our attention while the scientific view's far-mode appeal would seem less immediate. In that case, we might simply ignore the far mode story because of all the other urgent-seeming vacuous emotional appeals fighting for our attention and time. Even if we politically agreed on a course of action given a far mode analysis, we might choose to spend our time on the near-mode emotional problem set.
I suspect that we perceive a dichotomy between emotional appeal and a well-reasoned, well-evidenced argument.
I have a just-so story for why our kind can't cooperate: We've learned to distrust emotional appeal. This is understandable: the strength of an emotional appeal to believe X and do Y doesn't correlate with the truth of X or the consequences of Y. In fact, we are surrounded by emotional appeals to believe nonsense and do useless things. The production and delivery of emotional appeal is politics, policy, and several major industries. So, in our environment, emotional appeal is a strong indicator against rational argument.
In order to defend against irrationality, I have a habit of shutting out emotional appeals. I tune out emotive religious talk. I remain carefully aloof from political speeches. I put emotional distance between myself and any enthusiastic crowd. In general, my immediate response to emotional appeal is to ignore the message it bears. It's automatic now, subverbal -- I have an aversion to naked emotional appeal.
I strongly suspect that I'm not only describing myself, but many of you as well. (Is this true? This is a testable hypothesis.)
If we largely manage to broadly ignore emotional appeal, then we shut out not only harmful manipulations, but worthwhile rallying cries. We are motivated only by the motivation we can muster ourselves, rather than what motivation we can borrow from our peers and leaders. This may go some way towards explaining not just why Our Kind Can't Cooperate, but why we seem to so often report that Our Kind Can't Get Much Done.
On the other hand, if this is a real problem, it suggests a solution. We could try to learn an alternative response to emotional appeal. Upon noticing near-mode emotional appeal, instead of rejecting the message outright, go to far mode and consider the evidence. If the argument is sound under careful, critical consideration, and you approve of its motivation, then allow the emotional appeal to move you. On the other hand, I don't know if this is psychologically realistic.
So, questions:
-
I hypothesize that we are much more averse to emotional appeals than the general population. Does this strike you as true? Do you have examples or counterexamples?
-
How might we test this hypothesis?
-
I further hypothesize that, if we are averse to emotional appeals, that this is a strong factor in both our widely-reported akrasia and our sometimes-noted inability to work well together. How could we test this hypothesis?
-
Can you postpone being moved by an emotional appeal until after making a calm decision about it?
-
Can you somehow otherwise filter for emotional appeals that are highly likely to have positive effects?
Social status & testosterone
We’ve discussed signaling and status endlessly on LW; I think this is right up our alley: a 2011 review of research on the connections between the famous male hormone testosterone and various forms of social interaction, especially social status: Eisenegger et al’s “The role of testosterone in social interaction”. (I grabbed this PDF in the short time Elsevier left full-text available, but only now, with some modafinil-powered spare time, have I gotten around to excerpting it for you guys.)
1 Abstract
Although animal researchers established the role of testosterone as a ‘social hormone’ decades ago, the investigation of its causal influence on human social behaviors has only recently begun. Here, we review and discuss recent studies showing the causal effects of testosterone on social interactions in animals and humans, and outline the basic neurobiological mechanisms that might underlie these effects. Based on these recent findings, we argue that the role of testosterone in human social behavior might be best understood in terms of the search for, and maintenance of, social status.
Mental Rebooting: "Your Brain on Porn"...
... or "How to Operate Your Limbic System", or "A Practical Guide to Superstimulus". That's how I see it, anyway.
Your Brain on Porn is a website mainly dedicated to exposing the addictive aspects of pornography; interpreting this in light of the blind idiot god; and then forming a community around "rebooting", or prolonged abstinence that allows the brain to re-sensitize itself to, at the least, non-fetishistic sexual pleasure. By consistently NOT accessing whatever circuit is driving one's, well, drive, one sends this loop into atrophy. Eventually, one becomes able to quit. And then one finds alternatives.
Here is why I find this site so valuable: frequently, in the arguments the site owner sets up, he doesn't just blame pornography. To make his case he draws upon research on addictions to junk food or video games, and then draws parallels to porn's effects: the escalating need for novelty due to a rapidly declining pleasure response.
So I don't think it stops with porn. For me, any superstimulus is a bad superstimulus, despite the fact that some sirens are more necessary to listen to than others. It could be worth reflecting on what would actually count as a superstimulus; and then asking if one would benefit from a long hiatus from that stimulus. I'm not sure how long that cycle would be, but many "rebooters" proclaim seeing effects after three weeks, up to three months. It might not be enough to simply manage akrasia, as there could still be a chronic sensitivity problem in place. That would require time.
Here's what I thought of, so far.
Superstimulus List:
- Porn.
- Tab explosions and social networks -- the online kind. (This could be the most challenging one: More often than not, a computer is needed for productivity. Who can afford taking a three-month break?)
- Video games.
- Disorganizations, mess, and clutter.
- Junk food. (I'm tentative about this one, because I'm still trying to figure out what counts as "junk". As far as I've seen, this word usually gets ascribed to high calorie, high fat foods... but that possibly doesn't matter, as I see proportionally high-fat content paleo diets. Or it's a combination of fat and sugar that becomes addictive, but either/or is manageable.)
- Loud music. (Shameless speculation.)
- Much of advertising today seems to focus on getting our attention with superstimulus. Thus, being mindful when one is exposed could minimize possible effects.
- Touch. If you really need to show some love, Karezza is popular amongst those who have rebooted.
- Meditation and N-Back. Since this really does require mental discipline, it would be worth practising these attention-management strategies.
- Exercise.
- Fasting. (In small doses, it's probably healthier than you think and, broadly speaking, also results in some sort of re-sensitization. [scroll down])
- Reduction of social anxiety. (Socially dominant monkeys have a greater density of dopamine receptors in the striatum than their less-dominant counterparts. I'm not saying that abstaining from porn will turn you into the CEO of a corporation with three girlfriends and a gimp -- I wish! -- but it sure as hell wouldn't hurt.)
- Clearer focus. (This may come more from lack of wont than from an actual greater ability to focus, which is fine.)
- Greater motivation.
- In the case of porn, in males, the amounts of testosterone could significantly change, if not normalize. That could feature a host of changes by itself.
AI ontology crises: an informal typology
(with thanks to Owain Evans)
An ontological crisis happens when an agent's underlying model of reality changes, such as a Newtonian agent realising it was living in a relativistic world all along. These crises are dangerous if they scramble the agent's preferences: in the example above, an agent dedicated to maximising pleasure over time could shift to completely different behaviour when it moves to relativistic time; depending on the transition, it may react by accelerating happy humans to near light speed or, inversely, banning them from moving - or something considerably more weird.
Peter de Blanc has a sensible approach to minimising the disruption ontological crises can cause to an AI, but this post is concerned with analyzing what happens when such approaches fail. How bad could it be? Well, this is AI, so the default is of course: unbelievably, hideously bad (i.e. situation normal). But in what ways exactly?
[LINK] Daniel Pink talks about Motivation
Little over a week ago my work watched this video for a "self-improvement" seminar.
I hadn't seen this linked anywhere on LW yet, and thought it might be relevant, given lukeprog's article on motivation.
[Link] Dilbert author tries to try
Scott Adams, author of Dilbert, believes that trying to try is more effective than trying:
...my system is that I attempt to exercise five times a week around lunchtime. And I always allow myself the option of driving to the gym then turning around and going home. What I've discovered is that the routine of preparing to exercise usually inspires me to go through with it even if I didn't start out in the mood.[...]
If I had a goal instead of a system, I would have failed [when I didn't exercise]. And I would have felt like a loser. That can't be good for motivation. That failure might be enough to prevent me from going to the gym the next time I don't feel 100%, just to avoid the risk of another failure.
Regular Less Wrong readers will remember Eliezer's warning in Trying to Try:
But when we deal with humans, being satisfied with having a plan is not at all like being satisfied with success. The part where the plan has to maximize your probability of succeeding, gets lost along the way. It's far easier to convince ourselves that we are "maximizing our probability of succeeding", than it is to convince ourselves that we will succeed.
Almost any effort will serve to convince us that we have "tried our hardest", if trying our hardest is all we are trying to do.
Adams says the danger of trying is that you will fail in trying, which will bruise your self-esteem and cripple your motivation to try again. Yudkowsky says the danger of trying to try is that you will succeed in trying to try, leaving you too easily satisfied and unmotivated to actually do the thing you were trying to try to do.
Have any readers had success in trying to try?
exists(max(performance(pay)))
US Congresspeople don't make a lot of money in salary - most make $174,000/yr. They could easily make several times that much as consultants. They do, however, have insider information giving them very large returns on the stock market. For that, or other reasons, many of our representatives care more about keeping their jobs than about not wrecking the economy.
Most discussion of incentivizing assumes that higher pay leads to higher performance. The logic is that higher pay leads to wanting more to keep the job, which leads to higher performance. But the second link in this chain is weak. Sometimes higher motivation to keep the job leads to lower performance. CEOs are motivated to hide losses with accounting tricks, military officers are motivated to deny and cover up abuse by their subordinates, teachers are motivated to inflate their students' test scores.
Motivation research presentation
I did a presentation on motivation and procrastination research to the Seattle meetup group and an exercise trying to apply the material to a real life example. Eight people came. They were a skeptical bunch and questioned me on exactly the parts I am most interested in and know the least about: how exactly scientists assess the psychological quantities (expectancy, value, delay and impulsiveness). I'd like to learn more about the research and be able to give such presentations to others in the future. I'd also like to record a presentation like it and put it up on the internet.
People seemed to think the exercise was pretty valuable. It was also fairly fun. The presentation is here, the exercise is here and here.
Luke's suggestion for how to learn how psychologists assess expectancy, value and delay was
As for how scientists assess the relevant psychological qualities, and for why the 'procrastination equation' is taken seriously, all the references are provided in my post 'How to Beat Procrastination'. I also uploaded quite a few of the studies myself so anyone who is actually interested can check the data for themselves. (Prediction: Almost nobody will.)
The papers in footnote 6 are the place to start, for they explain why the equation (called temporal motivation theory by researchers) was developed to predict experimental results, and those papers point to all the individual studies which show how scientists assess expectancy, value, delay, and impulsiveness. For example, 'expectancy' in TMT is measured under a variety of psychological constructs, but largely by measures of self-efficacy and optimism.
There is no short summary of these issues, though Piers Steel's recent book 'The Procrastination Equation' is a decent attempt while being much longer than my article. Psychology is very complicated, and our understanding of it is less certain than our understanding of physics or computer science.
psychology and applications of reinforcement learning: where do I learn more?
Minicamp made me take the notion of an Ugh Field seriously, and I've found Ugh Fields a fairly useful model for understanding how my brain works. I have/had lots of topics that have been unpleasant to think about and the cause of that unpleasantness seems to be strongly correlated with previous negative experiences.
More generally, animals, including humans, seem to use something like Temporal Difference learning very frequently (one source of that impression). If that's so, then understanding TD and related psychological research should give me a more accurate model of myself. I would expect it to help me understand when my dispositions and habits are likely to be useful (by knowing how they developed) and understand how to change my dispositions and habits. Thus I have a couple of questions:
- Are my impressions accurate?
- What books, papers, posts are the best for understanding these topics? I'd like material that addresses any of the following:
- How TD or related algorithms work
- What the evidence says about whether human and/or animal brains frequently use TD or related algorithms, and which situations brains use them in
- Practical consequences of the research (e.g. Ugh Fields, doing X is a good way to build habit Y, smiling is a reinforcement, etc.)
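For anyone trying to build intuition before reading the papers, here is a minimal tabular TD(0) value-estimation sketch on a toy random-walk task. The environment and all names are illustrative assumptions, not from the post; the point is only the update rule `V[s] += alpha * (r + gamma * V[s'] - V[s])`:

```python
import random

# Minimal tabular TD(0) on a 5-state random walk (a standard toy task).
# States 0..4; episodes start at state 2; the agent steps left or right
# uniformly at random; reaching state 4 gives reward 1, state 0 gives 0,
# and both are terminal. True values of states 1..3 are 0.25, 0.5, 0.75.

ALPHA = 0.1   # learning rate
GAMMA = 1.0   # no discounting in this episodic toy task

def run_td0(episodes=5000, seed=0):
    rng = random.Random(seed)
    V = [0.0] * 5  # value estimates for states 0..4
    for _ in range(episodes):
        s = 2
        while s not in (0, 4):
            s_next = s + rng.choice((-1, 1))
            r = 1.0 if s_next == 4 else 0.0
            # Bootstrapped target: just r at terminal states,
            # r + GAMMA * V[s_next] otherwise.
            target = r if s_next in (0, 4) else r + GAMMA * V[s_next]
            # TD(0) update: move V[s] a small step toward the target.
            V[s] += ALPHA * (target - V[s])
            s = s_next
    return V

print(run_td0())
```

The "prediction error" term `target - V[s]` is the quantity often compared to phasic dopamine signals in the neuroscience literature the question alludes to.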
Karma Motivation Thread
This idea is so obvious I can't believe we haven't done it before. Many people here have posts they would like to write but keep procrastinating on. Many people also have other work to do but keep procrastinating on Less Wrong. Making akrasia cost you money is often a good way to motivate yourself, but that can be enough of a hassle to deter the lazy, the ADD-addled, and the executive dysfunctional. So here is a low-transaction-cost alternative that takes advantage of the addictive properties of Less Wrong karma.

Post a comment here with a task and a deadline. Pick tasks that can be confirmed by posters, so either Less Wrong posts or projects that can be linked to or photographed. When the deadline comes, edit your comment to include a link to the completed task. If you complete the task, expect upvotes. If you fail to complete the task by the deadline, expect your comment to be downvoted into oblivion. If you see completed tasks, vote those comments up. If you see past deadlines, vote those comments down. At least one person should reply to the comment noting that the deadline has passed; this way it will come up in the recent comments and more eyes will see it.
Edit: DanArmak makes a great suggestion.
Link: Writing exercise closes the gender gap in university-level physics
15-minute writing exercise closes the gender gap in university-level physics:
Think about the things that are important to you. Perhaps you care about creativity, family relationships, your career, or having a sense of humour. Pick two or three of these values and write a few sentences about why they are important to you. You have fifteen minutes. It could change your life.
This simple writing exercise may not seem like anything ground-breaking, but its effects speak for themselves. In a university physics class, Akira Miyake from the University of Colorado used it to close the gap between male and female performance. In the university’s physics course, men typically do better than women but Miyake’s study shows that this has nothing to do with innate ability. With nothing but his fifteen-minute exercise, performed twice at the beginning of the year, he virtually abolished the gender divide and allowed the female physicists to challenge their male peers.
The exercise is designed to affirm a person’s values, boosting their sense of self-worth and integrity, and reinforcing their belief in themselves. For people who suffer from negative stereotypes, this can make all the difference between success and failure.
The article cites a paper, but it's behind a paywall:
http://www.sciencemag.org/content/330/6008/1234
Several people have now used this to commit to doing something others can benefit from, like LW posts. I suggest an alternative method: when a user commits to doing something, everyone who is interested in that thing being done will upvote that comment. However, if the task is not complete by the deadline, everyone who upvoted commits to coming back and downvoting the comment instead.
This way, people can judge whether the community is interested in their post, and the karma being gained or lost is proportional to the amount of interest. Also, upvoting and then downvoting effectively doubles the amount of karma at stake.