Less Wrong is a community blog devoted to refining the art of human rationality. Please visit our About page for more information.

Allegory On AI Risk, Game Theory, and Mithril

21 James_Miller 13 February 2017 08:41PM

“Thorin, I can’t accept your generous job offer because, honestly, I think that your company might destroy Middle Earth.”  


“Bifur, I can tell that you’re one of those “the Balrog is real, evil, and near” folks who thinks that in the next few decades Mithril miners will dig deep enough to wake the Balrog causing him to rise and destroy Middle Earth.  Let’s say for the sake of argument that you’re right.  You must know that lots of people disagree with you.  Some don’t believe in the Balrog, others think that anything that powerful will inevitably be good, and more think we are hundreds or even thousands of years away from being able to disturb any possible Balrog.  These other dwarves are not going to stop mining, especially given the value of Mithril.  If you’re right about the Balrog we are doomed regardless of what you do, so why not have a high paying career as a Mithril miner and enjoy yourself while you can?”  


“But Thorin, if everyone thought that way we would be doomed!”


“Exactly, so make the most of what little remains of your life.”


“Thorin, what if I could somehow convince everyone that I’m right about the Balrog?”


“You can’t because, as the wise Sinclair said, ‘It is difficult to get a dwarf to understand something, when his salary depends upon his not understanding it!’  But even if you could, it still wouldn’t matter.  Each individual miner would correctly realize that just him alone mining Mithril is extraordinarily unlikely to be the cause of the Balrog awakening, and so he would find it in his self-interest to mine.  And, knowing that others are going to continue to extract Mithril means that it really doesn’t matter if you mine because if we are close to disturbing the Balrog he will be awoken.” 


“But dwarves can’t be that selfish, can they?”  


“Actually, altruism could doom us as well.  Given Mithril’s enormous military value, many cities rightly fear that without new supplies they will be at the mercy of cities that get more of this metal, especially as it’s known that the deeper Mithril is found, the greater its powers.  Leaders who care about their citizens’ safety and freedom will keep mining Mithril.  If we are soon all going to die, altruistic leaders will want to make sure their people die while still free citizens of Middle Earth.”


“But couldn’t we all coordinate to stop mining?  This would be in our collective interest.”


“No, dwarves would cheat, rightly realizing that if they alone mine a little bit more Mithril it’s highly unlikely to do anything to the Balrog.  And the more you expect others to cheat, the less your own cheating matters as to whether the Balrog gets us, if your assumptions about the Balrog are correct.”  


“OK, but won’t the rich dwarves step in and eventually stop the mining?  They surely don’t want to get eaten by the Balrog.”   


“Actually, they have just started an open Mithril mining initiative which will find and then freely disseminate new and improved Mithril mining technology.  These dwarves earned their wealth through Mithril, they love Mithril, and while some of them can theoretically understand how Mithril mining might be bad, they can’t emotionally accept that their life’s work, the acts that have given them enormous success and status, might significantly hasten our annihilation.”


“Won’t the dwarven kings save us?  After all, their primary job is to protect their realms from monsters.”


“Ha!  They are more likely to subsidize Mithril mining than to stop it.  Their military machines need Mithril, and any king who prevented his people from getting new Mithril just to stop some hypothetical Balrog from rising would be laughed out of office.  The common dwarf simply doesn’t have the expertise to evaluate the legitimacy of the Balrog claims and so rightly, from their viewpoint at least, would use the absurdity heuristic to dismiss any Balrog worries.  Plus, remember that the kings compete with each other for the loyalty of dwarves, and even if a few kings came to believe in the dangers posed by the Balrog they would realize that if they tried to impose costs on their people, they would be outcompeted by fellow kings that didn’t try to restrict Mithril mining.  Bifur, the best you can hope for with the kings is that they don’t do too much to accelerate Mithril mining.”


“Well, at least if I don’t do any mining it will take a bit longer for miners to awake the Balrog.”


“No Bifur, you obviously have never considered the economics of mining.  You see, if you don’t take this job someone else will.  Companies such as ours hire the optimal number of Mithril miners to maximize our profits and this number won’t change if you turn down our offer.”


“But it takes a long time to train a miner.  If I refuse to work for you, you might have to wait a bit before hiring someone else.”


“Bifur, what job will you likely take if you don’t mine Mithril?”


“Gold mining.”


“Mining gold and Mithril require similar skills.  If you get a job working for a gold mining company, this firm would hire one less dwarf than it otherwise would and this dwarf’s time will be freed up to mine Mithril.  If you consider the marginal impact of your actions, you will see that working for us really doesn’t hasten the end of the world even under your Balrog assumptions.”  


“OK, but I still don’t want to play any part in the destruction of the world, so I refuse to work for you even if this won’t do anything to delay when the Balrog destroys us.”


“Bifur, focus on the marginal consequences of your actions and don’t let your moral purity concerns cause you to make the situation worse.  We’ve established that your turning down the job will do nothing to delay the Balrog.  It will, however, cause you to earn a lower income.  You could have donated that income to the needy, or even used it to hire a wizard to work on an admittedly long-shot, Balrog control spell.  Mining Mithril is both in your self-interest and is what’s best for Middle Earth.” 

The Semiotic Fallacy

18 Stabilizer 21 February 2017 04:50AM

Acknowledgement: This idea is essentially the same as something mentioned in a podcast where Julia Galef interviews Jason Brennan.

You are in a prison. You don't really know how to fight and you don't have very many allies yet. A prison bully comes up to you and threatens you. You have two options: (1) Stand up to the bully and fight. If you do this, you will get hurt, but you will save face. (2) You can try and run away. You might get hurt less badly, but you will lose face.

What should you do?

From reading accounts of former prisoners and also from watching realistic movies and TV shows, it seems like (1) is the better option. The reason is that the semiotics (the symbolic meaning) of running away has bad consequences down the road. If you run away, you will be seen as weak, and therefore you will be picked on more often and suffer more damage down the road.

This is a case where focusing on the semiotics of the action is the right decision, because it is underwritten by future consequences.

But consider now a different situation. Suppose a country, call it Macholand, controls some tiny island far away from its mainland. Macholand has a hard time governing the island and the people on the island don't quite like being ruled by Macholand. Suppose, one fine day, the people of the island declare independence from Macholand. Macholand has two options: (1) Send the military over and put down the rebellion; or (2) Allow the island to take its own course.

From a semiotic standpoint, (1) is probably better. It signals that Macholand is a strong and powerful country. But from a consequential standpoint, it is at least plausible that (2) is the better option. Macholand saves money and manpower by not having to govern that tiny island; the people on the island are happier being self-governing; and maybe the international community doesn't really care what Macholand does here.

This is a case where focusing on the semiotics can lead to suboptimal outcomes. 

Call this kind of reasoning the semiotic fallacy: Thinking about the semiotics of possible actions without estimating the consequences of the semiotics.

I think the semiotic fallacy is widespread in human reasoning. Here are a few examples:

  1. People argue that democracy is good because it symbolizes egalitarianism. (This is the example used in the podcast interview.)
  2. People argue that we should build large particle accelerators because it symbolizes human achievement.
  3. People argue that we shouldn't build a wall on the southern border because it symbolizes division.
  4. People argue that we should build a wall on the southern border because it symbolizes national integrity. 

Two comments are in order:

  1. The semiotic fallacy is a special case of errors in reasoning and judgement caused by signaling behaviors (à la Robin Hanson). The distinctive feature of the semiotic fallacy is that the semiotics are explicitly stated during reasoning. Signaling-type errors are often subconscious: e.g., if we spend a lot of money on our parents' medical care, we might be doing it for symbolic purposes (i.e., signaling) but we wouldn't say explicitly that that's why we are doing it. In the semiotic fallacy, on the other hand, we do explicitly acknowledge that the reason we do something is its symbolism.
  2. Just like all fallacies, the existence of the fallacy doesn't necessarily mean the final conclusion is wrong. It could be that the semiotics are underwritten by the consequences. Or the conclusion could be true because of completely orthogonal reasons. The fallacy occurs when we ignore, in our reasoning during choice, the need for the consequential undergirding of symbolic acts.

Why is the surprisingly popular answer correct?

18 Stuart_Armstrong 03 February 2017 04:24PM

In Nature, there's been a recent publication arguing that the best way of gauging the truth of a question is to get people to report their views on the truth of the matter, and their estimate of the proportion of people who would agree with them.

Then, it's claimed, the surprisingly popular answer is likely to be the correct one.

In this post, I'll attempt to sketch a justification as to why this is the case, as far as I understand it.

First, an example of the system working well:


Capital City

Canberra is the capital of Australia, but many people think the actual capital is Sydney. Suppose only a minority knows that fact, and people are polled on the question:

Is Canberra the capital of Australia?

Then those who think that Sydney is the capital will think the question is trivially false, and will generally not see any reason why anyone would believe it true. They will answer "no" and predict that a high proportion of people will answer "no".

The minority who know the true capital of Australia will answer "yes". But most of them will likely know a lot of people who are mistaken, so they won't predict a high proportion of people answering "yes". Even if they do, there are few of them, so the population's average estimate of the proportion answering "yes" will still be low.

Thus "yes", the correct answer, will be surprisingly popular.
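The comparison above can be sketched in a few lines of Python. This is only an illustration of the idea as described here, not the procedure from the Nature paper; the function name and the poll numbers (70 Sydney believers, 30 Canberra knowers, and their guesses) are made up for the example.

```python
from statistics import mean

def surprisingly_popular(responses):
    """Pick the answer whose actual share exceeds its predicted share.

    `responses` is a list of (answer, predicted_yes_share) pairs, where
    predicted_yes_share is that respondent's guess at the fraction of
    people who will answer "yes". Ties go to "no" in this sketch.
    """
    actual_yes = mean(1.0 if answer == "yes" else 0.0 for answer, _ in responses)
    predicted_yes = mean(pred for _, pred in responses)
    # "yes" is surprisingly popular if more people say it than were expected to.
    winner = "yes" if actual_yes > predicted_yes else "no"
    return winner, actual_yes, predicted_yes

# Hypothetical poll on "Is Canberra the capital of Australia?"
sydney_believers = [("no", 0.10)] * 70   # expect only 10% to say "yes"
canberra_knowers = [("yes", 0.40)] * 30  # know that many people are mistaken
winner, actual, predicted = surprisingly_popular(sydney_believers + canberra_knowers)
print(winner, actual, predicted)
```

With these invented numbers, 30% of people answer "yes" while the average prediction is only 19%, so "yes" comes out as surprisingly popular even though it is the minority answer.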

A quick sanity check: if we asked instead "Is Alice Springs the capital of Australia?", then those who believe Sydney is the capital will still answer "no" and claim that most people would do the same. Those who believe the capital is in Canberra will answer similarly. And there will be no large cache of people believing in Alice Springs being the capital, so "yes" will not be surprisingly popular.

What is important here is that adding true information to the population will tend to move the proportion of people believing the truth more than it moves people's estimates of that proportion.


No differential information:

Let's see how that setup could fail. First, it could fail in a trivial fashion: the Australian Parliament and the Queen secretly conspire to move the capital to Melbourne. As long as they aren't included in the sample, nobody knows about the change. In fact, nobody can distinguish a world in which the move was vetoed from one where it passed. So the proportion of people who know the truth - that being those few deluded souls who already thought the capital was in Melbourne, for some reason - is no higher in the world where it's true than in the one where it's false.

So the population opinion has to be truth-tracking, not in the sense that the majority opinion is correct, but in the sense that more people believe X is true, relatively, in a world where X is true versus a world where X is false.
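One way to write this condition down, as my own gloss rather than anything taken from the paper, is as an inequality between two conditional probabilities for a randomly sampled respondent:

```latex
P(\text{respondent believes } X \mid X \text{ is true})
  \;>\;
P(\text{respondent believes } X \mid X \text{ is false})
```

Note that this is weaker than requiring the majority to be right, which would need the left-hand side to exceed 1/2. In the secret Melbourne example, the two sides are equal, so the answers carry no information.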

Systematic bias in population proportion:

A second failure mode could happen when people are systematically biased in their estimate of the general opinion. Suppose, for instance, that the following headline went viral:

"Miss Australia mocked for claims she got a doctorate in the nation's capital, Canberra."

And suppose that those who believed the capital was in Sydney thought "stupid beauty contest winner, she thought the capital was in Canberra!". And suppose those who knew the true capital thought "stupid beauty contest winner, she claimed to have a doctorate!". So the actual proportion holding each belief doesn't change much at all.

But then suppose everyone reasons "now, I'm smart, so I won't update on this headline, but some other people, who are idiots, will start to think the capital is in Canberra." Then they will update their estimate of the population proportion. And Canberra may no longer be surprisingly popular, just expectedly popular.
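To make this failure mode concrete, here is a small sketch with invented numbers. The group sizes and the before/after guesses are assumptions chosen for illustration; the only thing that changes after the headline is everyone's estimate of how many others will say "yes".

```python
def yes_surprise(groups):
    """Actual share of "yes" minus the average predicted share of "yes".

    `groups` is a list of (count, answers_yes, predicted_yes_share) tuples.
    A positive result means "yes" is surprisingly popular.
    """
    total = sum(count for count, _, _ in groups)
    actual = sum(count for count, says_yes, _ in groups if says_yes) / total
    predicted = sum(count * pred for count, _, pred in groups) / total
    return actual - predicted

# Before the headline: 70 Sydney believers, 30 Canberra knowers (hypothetical).
before = yes_surprise([(70, False, 0.10), (30, True, 0.40)])

# After the headline: nobody changes their own answer, but everyone
# raises their guess about how many *other* people will say "yes".
after = yes_surprise([(70, False, 0.24), (30, True, 0.44)])

print(round(before, 3), round(after, 3))
```

In this toy case the surprise for "yes" drops from about 0.11 to about zero: the beliefs are unchanged, but the inflated predictions have used up the gap that the method relies on.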


Purely subjective opinions

How would this method work on a purely subjective opinion, such as:

Is Picasso superior to Van Gogh?

Well, there are two ways of looking at this. The first is to claim this is a purely subjective opinion, and as such people's beliefs are not truth tracking, and so the answers don't give any information. Indeed, if everyone accepts that the question is purely subjective, then there is no such thing as private (or public) information that is relevant to this question at all. Even if there were a prior on this question, no-one can update on any information.

But now suppose that there is a judgement that is widely shared, that, I don't know, blue paintings are objectively superior to paintings that use less blue. Then suddenly answers to that question become informative again! Except now, the question that is really being answered is:

Does Picasso use more blue than Van Gogh?

Or, more generally:

According to widely shared aesthetic criteria, is Picasso superior to Van Gogh?

The same applies to moral questions like "is killing wrong?". In practice, that is likely to reduce to:

According to widely shared moral criteria, is killing wrong?


2017: An Actual Plan to Actually Improve

17 helldalgo 27 January 2017 06:42PM

[Epistemic status: mostly confident, but being this intentional is experimental]

This year, I'm focusing on two traits: resilience and conscientiousness.  I think these (or the fact that I lack them) are my biggest barriers to success.  Also: identifying them as goals for 2017 doesn't mean I'll stop developing them in 2018.  A year is just a nice, established amount of time in which progress can actually be made.  This plan is a more intentional version of techniques I've used to improve myself over the last few years.  I have outside verification that I'm more responsible, high-functioning, and resilient than I was several years ago.  I have managed to reduce my SSRI dose, and I have finished more important tasks this year than last year.  

Inspiring blog posts and articles can only do so much for personal development.  The most valuable writing in that genre tends to outline actual steps that (the author believes) generate positive results.  Unfortunately, finding those steps is a fairly personal process.  The song that gives me twenty minutes of motivation and the drug that helps me overcome anxiety might do the opposite for you.  Even though I'm including detailed steps in this plan, you should keep that in mind.  I hope that this post can give you a template for troubleshooting and discovering your own bottlenecks.


First, I want to talk about my criteria for success.  Without illustrating the end result, or figuring out how to measure it, I could finish out the year with a false belief that I'd made progress.  If you plan something without success criteria, you run the same risk. I also believe that most of the criteria should be observable by a third party, i.e. hard to fake. 

  1. I respond to disruptions in my plans with distress and anger.  While I've gotten better at calming down, the distress still happens. I would like to have emotional control such that I observe first, and then feel my feelings.  Disruptions should incite curiosity, and a calm evaluation of whether to correct course.  The observable bit is whether or not my husband and friends report that I seem less upset when they disrupt me.  This process is already taking place; I've been practicing this skill for a long time and I expect to continue seeing progress.  (resilience)
  2. If an important task takes very little time, doesn't require a lot of effort, and doesn't disrupt a more important process, I will do it immediately. The observable part is simple, here: are the dishes getting done? Did the trash go out on Wednesday?  (conscientiousness)
  3. I will do (2) without "taking damage."  I will use visualization of the end result to make my initial discomfort less significant.  (resilience) 
  4. I will use various things like audiobooks, music, and playfulness to make what can be made pleasant, pleasant.  (resilience and conscientiousness)
  5. My instinct when encountering hard problems will be to dissolve them into smaller pieces and identify the success criteria, immediately, before I start trying to generate solutions. I can verify that I'm doing this by doing hard problems in front of people, and occasionally asking them to describe my process as it appears.  
  6. I will focus on the satisfaction of doing hard things, and practice sitting in discomfort regularly (cold tolerance, calming myself around angry people, the pursuit of fitness, meditation).  It's hard to identify an external sign that this is accomplished.  I expect aversion-to-starting to become less common, and my spouse can probably identify that.  (conscientiousness)
  7. I will keep a daily journal of what I've accomplished, and carry a notebook to make reflective writing easy and convenient.  This will help keep me honest about my past self.  (conscientiousness) 
  8. By the end of the year, I will find myself and my close friends/family satisfied with my growth.  I will have a record of finishing several important tasks, will be more physically fit than I am now, and will look forward to learning difficult things.

One benefit of some of these is that practice and success are the same.  I can experience the satisfaction of any piece of my practice done well; it will count as being partly successful.  


I've taken the last few years to identify these known bottlenecks and reinforcing actions.  Doing one tends to make another easier, and neglecting them keeps harder things unattainable.  These are the most important habits to establish early.  

  1. Meditation for 10 minutes a day directly improves my resilience and lowers my anxiety.
  2. Medication shouldn't be skipped (an SSRI, DHEA, and methylphenidate). If I decide to go off of it, I should properly taper rather than quitting cold turkey.  DHEA counteracts the negatives of my hormonal birth control and seems to make me more positively aggressive and confident.
  3. Fitness (in the form of dance, martial arts, and lifting) keeps my back from hurting, gives me satisfaction, and has a number of associated cognitive benefits.  Dancing and martial arts also function as socialization, in a way that leads to group intimacy faster than most of my other hobbies.  Being fit and attractive helps me maintain a high libido.  
  4. I need between 7 and 9 hours of sleep.  I've tried getting around it.  I can't.  Getting enough sleep is a well-documented process, so I'm not going to outline my process here.
  5. Water.  Obviously.
  6. Since overcoming most of my social anxiety, I've discovered that frequent, high-value socialization is critical to avoid depression.  I try to regularly engage in activities that bootstrap intimacy, like the dressing room before performances, solving a hard problem with someone, and going to conventions.  I need several days a week to include long conversations with people I like.  

Unknown bottlenecks can be identified by identifying a negative result, and tracing the chain of events backwards until you find a common denominator.  Sometimes, these can also be identified by people who interact with you a lot.


My personal "toolkit" is a list of things that give me temporary motivation or rapidly deescalate negative emotions.  

  1. Kratom (<7g) does wonders for my anxieties about starting a task.  I try not to take it too often, since I don't want to develop tolerance, but I like to keep some on hand for this.
  2. Nicotine+caffeine/L-theanine capsules give me an hour of motivation without jitters.  This also builds tolerance rapidly, so I don't do it often.
  3. A 30-second mindfulness meditation can usually calm my first emotional response to a distressing event.
  4. Various posts on can help reconnect me to my values when I'm feeling particularly demotivated.  
  5. Reorganizing furniture makes me feel less "stuck" when I get restless.  Ditto for doing a difficult thing in a different place.
  6. Google Calendar, a number of notebooks, and a whiteboard keep me from forgetting important tasks.
  7. Josh Waitzkin's book, The Art of Learning, remotivates me to achieve mastery in various hobbies.
  8. External prompting from other people can make me start a task I've been avoiding. Sometimes I have people aggressively yell at me.
  9. The LW study hall helps keep me focused. I also do "pomos" over video with other people who don't like Complice.

This outline is the culmination of a few years of troubleshooting, getting feedback, and looking for invented narratives or dishonesty in my approach.  Personal development doesn't happen quickly for me, and I expect it doesn't for most people.  You should expect significant improvements to be a matter of years, not months, unless you're improving the basics like sleep or fitness.  For those, you see massive initial gains that eventually level off.  

If you have any criticisms or see any red flags in my approach, let me know in the comments.


Stupidity as a mental illness

14 PhilGoetz 10 February 2017 03:57AM

It's great to make people more aware of bad mental habits and encourage better ones, as many people have done on LessWrong.  The way we deal with weak thinking is, however, like how people dealt with depression before the development of effective anti-depressants:

  • Clinical depression was only marginally treatable.
  • It was seen as a crippling character flaw, weakness, or sin.
  • Admitting you had it could result in losing your job and/or friends.
  • Treatment was not covered by insurance.
  • Therapy was usually analytic or behavioral and not very effective.
  • People thus went to great mental effort not to admit, even to themselves, having depression or any other mental illness.
The Social Substrate

14 lahwran 09 February 2017 07:22AM

This post originally appeared on The Gears To Ascension


I present generative modeling of minds as a hypothesis for the complexities of social dynamics, and build a case for it out of pieces. My hope is that this explains social behaviors more precisely and with less handwaving than its components. I intend this to be a framework for reasoning about social dynamics more explicitly and for training intuitions. In future posts I plan to build on it to give more concrete evidence, and give examples of social dynamics that I think become more legible with the tools provided by combining these ideas.

Epistemic status: Hypothesis, currently my maximum likelihood hypothesis, of why social interaction is so weird.


People talk to each other a lot. Many of them are good at it. Most people don't really have a deep understanding of why, and it's rare for people to question why it's a thing that's possible to be bad at. Many of the rules seem arbitrary at first look, and it can be quite hard to transfer skill at interaction by explanation.

Some of the rules sort of make sense, and you can understand why bad things would happen when you break them: Helping people seems to make them more willing to help you. Being rude to people makes them less willing to help you. People want to "feel heard". But what do those mean, exactly?

I've been wondering about this for a while. I wasn't naturally good at social interaction, and have had to put effort into learning it. This has been a spotty success - I often would go to people for advice, and then get things like "people want to know that you care". That advice sounded nice, but it was vague and not usable.

The more specific social advice seems to generalize quite badly. "Don't call your friends stupid", for example. Banter is an important part of some friendships! People tell each other they're ugly and feel cared for. Wat?

Recently, I've started to see a deeper pattern here that actually seems to have strong generalization: it's simple to describe, it correctly predicts large portions of very complicated and weird social patterns, and it reliably gives me a lens to decode what happened when something goes wrong. This blog post is my attempt to share it as a package.

I basically came up with none of this. What I'm sharing is the synthesis of things that Andrew Critch, Nate Soares, and Robin Hanson have said - I didn't find these ideas that useful on their own, but together I'm kind of blown away by how much they collectively explain. In future blog posts I'll share some of the things I have used this to understand.

WARNING: An easy instinct, on learning these things, is to try to become more complicated yourself, to deal with the complicated territory. However, my primary conclusion is "simplify, simplify, simplify": try to make fewer decisions that depend on other people's state of mind. You can see more about why and how in the posts in the "Related" section, at the bottom.


Newcomb's problem is a game that two beings can play. Let's say that the two people playing are you and Newcomb. On Newcomb's turn, Newcomb learns all that they can about you, and then puts one opaque box and one transparent box in a room. Then on your turn, you go into the room, and you can take one or both of the boxes. What Newcomb puts in the boxes depends on what they think you'll do once it's your turn:

  • If Newcomb thinks that you'll take only the opaque box, they fill it with $1 million, and put $1000 in the transparent box.
  • If Newcomb thinks that you'll take both of the boxes, they only put $1000 in the transparent box.

Once Newcomb is done setting the room up, you enter and may do whatever you like.

This problem is interesting because the way you win or lose has little to do with what you actually do once you go into the room; it's entirely about what you can convince Newcomb you'll do. This leads many people to try to cheat: convince Newcomb that you'll only take one box, and then take two.

In the original framing, Newcomb is a mind-reading oracle, and knows for certain what you'll do. In a more realistic version of the test, Newcomb is merely a smart person paying attention to you. Newcomb's problem is simply a crystallized view of something that people do all the time: evaluate what kind of people each other are, in order to determine trust. And it's interesting to look at it and note that when it's crystallized, it's kind of weird. When you put it this way, it becomes apparent that there are very strong arguments for why you should always do the trustworthy thing and one-box.
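A rough way to see the argument is to compute expected payoffs for a Newcomb who is merely good at reading you, not perfect. The sketch below uses the dollar amounts from the setup above; the `accuracy` parameter (the chance that Newcomb predicts your actual choice) and the function names are my own assumptions, and the calculation treats your choice as correlated with the prediction, which is exactly the point under dispute among decision theorists.

```python
OPAQUE_PRIZE = 1_000_000      # placed in the opaque box if one-boxing is predicted
TRANSPARENT_PRIZE = 1_000     # always in the transparent box

def expected_payoffs(accuracy):
    """Return (one_box, two_box) expected winnings.

    `accuracy` is the assumed probability that Newcomb correctly
    predicts whichever choice you actually make.
    """
    # One-boxers get the million only when Newcomb predicted one-boxing.
    one_box = accuracy * OPAQUE_PRIZE
    # Two-boxers get the million only when Newcomb was wrong about them.
    two_box = (1 - accuracy) * OPAQUE_PRIZE + TRANSPARENT_PRIZE
    return one_box, two_box

def break_even_accuracy():
    """Accuracy at which both strategies have equal expected payoff."""
    return 0.5 + TRANSPARENT_PRIZE / (2 * OPAQUE_PRIZE)

for accuracy in (0.5, 0.6, 0.9, 1.0):
    print(accuracy, expected_payoffs(accuracy))
print("break-even:", break_even_accuracy())
```

Under these assumptions, Newcomb only has to be slightly better than a coin flip (a little over 50% accurate) for one-boxing to come out ahead, which is why a merely attentive person is enough to make the problem bite.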


(This section is inspired by Nate Soares' post "Newcomblike problems are the norm".)

You want to know that people care about you. You don't just want to know that the other person is acting helpfully right now. If someone doesn't care about you, and is just helping you because it helps them, then you'll trust and like them less. If you know that someone thinks your function from experience to emotions is acceptable to them, you will feel validated.

I think this makes a lot of sense. In artificial distributed systems, we ask a bunch of computers to work together, each computer a node in the system. All of the computers must cooperate to perform some task. Some artificial distributed systems, like BitTorrent, are intended to let the different nodes (computers) in the system share things with each other, with each participating computer joining in order to benefit from the system. Other distributed systems, such as the backbone routers of the internet, are intended to provide a service to the outside world - in the case of the backbone routers, they make the internet work.

However, nodes can violate the distributed system's protocols, and thereby gain advantage. In BitTorrent, nodes can download but refuse to upload. In the internet backbone, each router needs to know where other routers are, but if a nearby router lies, then the entire internet may slow down dramatically, or route huge portions of US traffic to China. Unfortunately, despite the many trust problems in distributed systems, we have solved relatively few of them. Bitcoin is a fun exception to this - I'll use it as a metaphor in a bit.

Humans are each nodes in a natural distributed system, where each node has its own goals, and can provide and consume services, just like the artificial ones we've built. But we also have this same trust problem, and must be solving it somehow, or we wouldn't be able to make civilizations.

Human intuitions automatically look for reasons why the world is the way it is. In stats/ML/AI, it's called generative modeling. When you have an experience - every time you have any experience, all the time, on the fly - your brain's low level circuitry assumes there was a reason that the experience happened. Each moment your brain is looking for what the process was that created that experience for you. Then in the future, you can take your mental version of the world and run it forward to see what might happen.

When you're young, you start out pretty uncertain about what processes might be driving the world, but as you get older your intuition learns to expect gravity to work, learns to expect that pulling yourself up by your feet won't work, and learns to think of people as made of similar processes to oneself.

So when you're interacting with an individual human, your brain is automatically tracking what sort of process they are - what sort of person they are. It is my opinion that this is one of the very hardest things that brains do (where I got that idea). When you need to decide whether you trust them, you don't just have to do that based off their actions - you also have your mental version of them that you've learned from watching how they behave.

But it's not as simple as evaluating, just once, what kind of person someone is. As you interact with someone, you are continuously and automatically tracking what kind of person they are, and what kind of thoughts they seem to be having right now, in the moment. When I meet a person and they say something nice, is it because they think they're supposed to, or because they care about me? If my boss is snapping at me, are they trying to convince me I'm unwelcome at the company without saying it outright, or is my boss just having a bad day?


Note: I am not familiar with the details of the evolution of cooperation. I propose a story here to transfer intuitions, but the details may have happened in a different order. I would be surprised if I am not describing a real event, and it would weaken my point if I am not.

Humans are smart, and our ancestors have been reasonably smart going back a very long time, far before even primates branched off. So imagine what it was like to be an animal in a pre-tribal species. You want to survive, and you need resources to do so. You can take them from other animals. You can give them to other animals. Some animals may be more powerful than you, and attempt to take yours.

Imagine what it's like to be an animal partway through the evolution of cooperation. You feel some drive to be nice to other animals, but you don't want to be nice if the other animal will take advantage of you. So you pay attention to which animals seem to care about being nice, and you only help them. They help you, and you both survive.

As the generations go on, this happens repeatedly. An animal that doesn't feel caring for other animals is an animal that you can't trust; an animal that does feel caring is one that you want to help, because they'll help you back.

Over generations, it becomes more and more the case that the animals participating in this system actually want to help each other - because the animals around them are all running newcomblike tests of friendliness. Does this animal seem to have a basic urge to help me? Will this animal only take the one box, if I leave the boxes lying out? If the answer is that you can trust them, and you recognize that you can trust them, then that is the best for you, because then the other animal recognizes that they were trusted and will be helpful back.

After many generations of letting evolution explore this environment, you can expect to end up with animals that feel strong emotions for each other, animals which want to be seen as friendly, animals where helping matters. Here is an example of another species that has learned to behave sort of this way.

This seems to me to be a good generating hypothesis for why people care about what other people think of them innately, and seems to predict ways that people will care about each other. I want to feel like people actually care about me, I don't just want to hear them say that they do. In particular, it seems to me that humans want this far more than you would expect of an arbitrary smart-ish animal.

I'll talk more in detail about what I think human innate social drives actually are in a future blog post. I'm interested in links to any research on things like human basic needs or emotional validation. For now, the heuristic I've found most useful is simply "People want to know that those around them approve of/believe their emotional responses to their experiences are sane". See also Succeed Socially, in the related list.


Knowing that humans evaluate each other in newcomblike ways doesn't seem to me to be enough to figure out how to interact with them. Only armed with the statement "one needs to behave in a way that others will recognize as predictably cooperative", I still wouldn't know how to navigate this.

At a lightning talk session I was at a few months ago, Andrew Critch made the argument that humans regularly model many layers deep in real situations. His claim was that people intuitively have a sense of what each other are thinking, including their senses of what you're thinking, and back and forth for a bit. Before I go on, I should emphasize how surprising this should be, without the context of how the brain actually does it: the more levels of me-imagining-you-imagining-me-imagining-you-imagining… you go, the more of an explosion of different options you should expect to see, and the less you should expect actual-sized human minds to be able to deal with it.

However, after having thought about it, I don't think it's as surprising as it seems. I don't think people actually vividly imagine this that many levels deep: what I think is going on is that as you grow up, you learn to recognize different clusters of ways a person can be. Stereotypes, if you will, but not necessarily so coarse as that implies.

At a young age, if I am imagining you, I imagine a sort of blurry version of you. My version of you will be too blurry to have its own version of me, but I learn to recognize the blurry-you when I see it. The blurry version of you only has a few emotions, but I sort of learn what they are: my blurry you can be angry-"colored", or it can be satisfied-"colored", or it can be excited-"colored", etc. ("Color" used here as a metaphor, because I expect this to be built a similar way to color or other basic primitives in the brain.)

Then later, as I get older, I learn to recognize when you see a blurry version of me. My new version of you is a little less blurry, but this new version of you has a blurry-me, made out of the same anger-color or satisfaction-color that I had learned you could be made out of. I go on, and eventually this version of you becomes its own individual colors - you can be angry-you-with-happy-me-inside colored when I took your candy, or you can be relieved-you-with-distraught-me-inside colored when you are seeing that I'm unhappy when a teacher took your candy back.

As this goes on, I learn to recognize versions of you as their own little pictures, with only a few colors - but each color is a "color" that I learned in the past, and the "color" can have me in it, maybe recursively. Now my brain doesn't have to track many levels - it just has to have learned that there is a "color" for being five levels deep of this, or another "color" for being five levels deep of that. Now that I have that color, my intuition can make pictures out of the colors and thereby handle six levels deep, and eventually my intuition will turn six levels into colors and I'll be able to handle seven.

I think it gets a bit more complicated than this for particularly socially competent people, but that's a basic outline of how humans could reliably learn to do this.


I found the claim that humans regularly social-model 5+ levels deep hard to believe at first, but Critch had an example to back it up, which I attempt to recreate here.

Fair warning, it's a somewhat complicated example to follow, unless you imagine yourself actually there. I only share it for the purpose of arguing that this sort of thing actually can happen; if you can't follow it, then it's possible the point stands without it. I had to invent notation in order to make sure I got the example right, and I'm still not sure I did.

(I'm sorry this is sort of contrived. Making these examples fully natural is really really hard.)

  • You're back in your teens, and friends with Kris and Gary. You hang out frequently and have a lot of goofy inside jokes and banter.
  • Tonight, Gary's mom has invited you and Kris over for dinner.
  • You get to Gary's house several hours early, but he's still working on homework. You go upstairs and borrow his bed for a nap.
  • Later, you're awoken by the activity as Kris arrives, and Gary's mom shouts a greeting from the other room: "Hey, Kris! Your hair smells bad.". Kris responds with "Yours as well." This goes back and forth, with Gary, Kris, and Gary's mom fluidly exchanging insults as they chat. You're surprised - you didn't know Kris knew Gary's mom.
  • Later, you go downstairs to say hi. Gary's mom says "welcome to the land of the living!" and invites you all to sit and eat.
  • Partway through eating, Kris says "Gary, you look like a slob."
  • You feel embarrassed in front of Gary's mom, and say "Kris, don't be an ass."
  • You knew they had been bantering happily earlier. If you hadn't had an audience, you'd have just chuckled and joined in. What happened here?

If you'd like, pause for a moment and see if you can figure it out.

You, Gary, and Kris all feel comfortable bantering around each other. Clearly, Gary and Kris feel comfortable around Gary's mom, as well. But the reason you were uncomfortable is that you know Gary's mom thought you were asleep when Kris got there, and you hadn't known they were cool before, so as far as Gary's mom knows, you think she thinks Kris is just being an ass. So you respond to that.

Let me try saying that again. Here's some notation for describing it:

  • X => Y: X correctly believes Y
  • X ~> Y: X incorrectly believes Y
  • X ?? Y: X does not know Y
  • X=Y=Z=...: X and Y and Z and ... are comfortable bantering

And here's an explanation in that notation:

  • Kris=You=Gary: Kris, You, and Gary are comfortable bantering.
  • Gary=Kris=Gary's mom: Gary, Kris, and Gary's mom are comfortable bantering.
  • You => [Gary=Gary's mom=Kris]: You know they're comfortable bantering.
  • Gary's mom ~> [You ?? [Gary=Gary's mom=Kris]]: Gary's mom doesn't know you know.
  • You => [Gary's mom ~> [You ?? [Gary=Gary's mom=Kris]]]: You know Gary's mom doesn't know you know they're comfortable bantering.
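The notation above nests like a small data structure, which may make the recursion easier to see. Here is a minimal sketch in Python; the class names `Banter` and `Belief` and the helpers `depth` and `render` are invented for illustration and are not part of Critch's talk.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Banter:
    """A base fact: these people are comfortable bantering together."""
    people: frozenset


@dataclass(frozen=True)
class Belief:
    """`holder` stands in relation `kind` to the inner `claim`.

    kind is "=>" (correctly believes), "~>" (incorrectly believes),
    or "??" (does not know).
    """
    holder: str
    kind: str
    claim: Union["Belief", Banter]


def depth(claim) -> int:
    """Count how many belief operators wrap the base fact."""
    return 0 if isinstance(claim, Banter) else 1 + depth(claim.claim)


def render(claim) -> str:
    """Write the claim back out in the bracket notation."""
    if isinstance(claim, Banter):
        return "=".join(sorted(claim.people))
    return f"{claim.holder} {claim.kind} [{render(claim.claim)}]"


fact = Banter(frozenset({"Gary", "Gary's mom", "Kris"}))
you_dont_know = Belief("You", "??", fact)
mom_thinks = Belief("Gary's mom", "~>", you_dont_know)
you_know_mom_thinks = Belief("You", "=>", mom_thinks)

print(render(you_know_mom_thinks))
print(depth(you_know_mom_thinks))
```

Written this way, the last bullet is three belief operators wrapped around one base fact, and each layer is just another value of the same type.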

And to you in the moment, this crazy recursion just feels like a bit of anxiety, fuzziness, and an urge to call Kris out so Gary's mom doesn't think you're ok with Kris being rude.

Now, this is a somewhat unusual example. It has to be set up just right in order to get such a deep recursion. The main character's reaction is sort of unhealthy/fake - better would have been to clarify that you overheard them bantering earlier. As far as I can tell, the primary case where things get this hairy is when there's uncertainty. But it does actually get this deep - this is a situation pretty similar to ones I've found myself in before.

There's a key thing here: when things like this happen, you react nearly immediately. You don't need to sit and ponder, you just immediately feel embarrassed for Kris, and react right away. Even though in order to figure out explicitly what you were worried about, you would have had to think about it four levels deep.

If you ask people about this, and it takes deep recursion to figure out what's going on, I expect you will generally get confused non-answers, such as "I just had a feeling". I also expect that when people give confused non-answers, it is almost always because of weird recursion things happening.

In Critch's original lightning talk, he gave this as an argument that the human social skills module is the one that just automatically gets this right. I agree with that, but I want to add: I think that that module is the same one that evaluates people for trust and tracks their needs and generally deals with imagining other people.


So people have generative models of each other, and they care about each other's generative models of them. I care about people's opinion of me, but not in just a shallow way: I can't just ask them to change their opinion of me, because I'll be able to tell what they really think. Their actual moral judgement of their actual generative model of me directly affects my feelings of acceptance. So I want to let them know what kind of person I am: I don't just want to claim to be that kind of person, I want to actually show them that I am that kind of person.

You can't just tell someone "I'm not an asshole"; that's not strong evidence about whether you're an asshole. People have incentives to lie. People have powerful low-level automatic bayesian inference systems, and they'll automatically and intuitively recognize what social explanations are more likely as explanations of your behavior. If you want them to believe you're not an asshole, you have to give credible evidence that you are not an asshole: you have to show them that you do things that would have been unlikely had you been an asshole. You have to show them that you're willing to be nice to them, you have to show them that you're willing to accommodate their needs. Things that would be out of character if you were a bad character.
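To make the "credible evidence" point concrete, here is a rough sketch of the Bayesian arithmetic in Python. Every probability below is a made-up number chosen for illustration, and the function name `posterior` is an assumption of this sketch; the only claim is about the shape of the update.

```python
def posterior(prior: float, p_if_true: float, p_if_false: float) -> float:
    """Bayes' rule for a binary hypothesis after one observation."""
    numerator = prior * p_if_true
    return numerator / (numerator + (1 - prior) * p_if_false)


# Hypothetical starting belief that the person is not an asshole.
prior_nice = 0.5

# Cheap talk: almost anyone would say "I'm not an asshole",
# so the statement is nearly as likely under either hypothesis.
after_claim = posterior(prior_nice, p_if_true=0.95, p_if_false=0.90)

# Costly behavior: accommodating your needs at real cost to themselves,
# which would be out of character for a bad character.
after_action = posterior(prior_nice, p_if_true=0.60, p_if_false=0.05)

print(round(after_claim, 3), round(after_action, 3))
```

With these illustrative numbers the bare claim barely moves the estimate, while the out-of-character-for-an-asshole action moves it a lot, because what matters is the ratio between the two likelihoods.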

If you hang out with people who read Robin Hanson, you've probably heard of this before, under the name "signaling".

But many people who hear that interpret it as a sort of vacuous version, as though "signaling" is a sort of fakery, as though all you need to do is give the right signals. If someone says "I'm signaling that I'm one of the cool kids", then sure, they may be doing things that for other people would be signals of being one of the cool kids, but on net the evidence is that they are not one of the cool kids. Signaling isn't about the signals, it's about giving evidence about yourself. In order to be able to give credible evidence that you're one of the cool kids, you have to either get really good at lying-with-your-behavior such that people actually believe you, or you have to change yourself to be one of the cool kids. (This is, I think, a big part of where social anxiety advice falls down: "fake it 'til you make it" works only insofar as faking it actually temporarily makes it.)

"Signaling" isn't fakery, it is literally all communication about what kind of person you are. A common thing Hanson says, "X isn't about Y, it's about signaling" seems misleading to me: if someone is wearing a gold watch, it's not so much that wearing a gold watch isn't about knowing the time, it's that the owner's actual desires got distorted by the lens of common knowledge. Knowing that someone would be paying attention to them to infer their desires, they filtered their desires to focus on the ones they thought would make them look good. This also can easily come off as inauthentic, and it seems fairly clear why to me: if you're filtering your desires to make yourself look good, then that's a signal that you need to fake your desires or else you won't look good.

Signals are focused around hard-to-fake evidence. Anything and everything that is hard to fake and would only happen if you're a particular kind of person, and that someone else recognizes as so, is useful in conveying information about what kind of person you are. Fashion and hygiene are good examples of this: being willing to put in the effort to make yourself fashionable or presentable, respectively, is evidence of being the kind of person who cares about participating in the societal distributed system.

Conveying truth in ways that are hard to fake is the sort of thing that comes up in artificial distributed systems, too. Bitcoin is designed around a "blockchain": a series of incredibly-difficult-to-fake records of transactions. Bitcoin has interesting cryptographic tricks to make this hard to fake, but it centers around having a lot of people doing useless work, so that no one person can do a bunch more useless work and thereby succeed at faking it.


From the inside, it doesn't feel like we're in a massive distributed system. It doesn't feel like we're tracking game theory and common knowledge, even though everyone, even those who don't know about it, does it automatically.

In the example, the main character just felt like something was funny. The reason they were able to figure it out and say something so fast was that they were a competent human who had focused their considerable learning power on understanding social interaction, presumably from a young age, and automatically recognized a common knowledge pattern when it presented itself.

But in real life, people are constantly doing this. To get along with people, you have to be willing to pay attention to giving evidence about your perception of them. To be accepted, you have to be willing to give evidence that you are the kind of person that other people want to accept, and you might need to change yourself if you actually just aren't.

In general, I currently think that minimizing recursion depth of common knowledge is important. Try to find ways-to-be that people will be able to recognize more easily. Think less about social things in-the-moment so that others have to think less to understand you; adjust your policies to work reliably so that people can predict them reliably.

Other information of interest

[Link] David Chalmers on LessWrong and the rationalist community (from his reddit AMA)

13 ignoranceprior 22 February 2017 07:07PM

Increasing GDP is not growth

13 PhilGoetz 16 February 2017 06:04PM

I just saw another comment implying that immigration was good because it increased GDP.  Over the years, I've seen many similar comments in the LW / transhumanist / etc bubble claiming that increasing a country's population is good because it increases its GDP.  These are generally used in support of increasing either immigration or population growth.

It doesn't, however, make sense.  People have attached a positive valence to certain words, then moved those words into new contexts.  They did not figure out what they want to optimize and do the math.

I presume they want to optimize wealth or productivity per person.  You wouldn't try to make Finland richer by absorbing China.  Its GDP would go up, but its GDP per person would go way down.


[Link] Slate Star Codex Notes on the Asilomar Conference on Beneficial AI

13 Gunnar_Zarncke 07 February 2017 12:14PM

Planning 101: Debiasing and Research

12 lifelonglearner 03 February 2017 03:01PM

Planning 101: Techniques and Research

<Cross-posed from my blog>

[Epistemic status: Relatively strong. There are numerous studies showing that predictions often become miscalibrated. Overconfidence in itself appears fairly robust, appearing in different situations. The actual mechanism behind the planning fallacy is less certain, though there is evidence for the inside/outside view model. The debiasing techniques are supported, but more data on their effectiveness could be good.]

Humans are often quite overconfident, and perhaps for good reason. Back on the savanna and even some places today, bluffing can be an effective strategy for winning at life. Overconfidence can scare down enemies and avoid direct conflict.

When it comes to making plans, however, overconfidence can really screw us over. You can convince everyone (including yourself) that you’ll finish that report in three days, but it might still really take you a week. Overconfidence can’t intimidate advancing deadlines.

I’m talking, of course, about the planning fallacy, our tendency to make unrealistic predictions and plans that just don’t work out.

Being a true pessimist ain’t easy.

Students are a prime example of victims of the planning fallacy:

First, students were asked to predict when they were 99% sure they’d finish a project. When the researchers followed up with them later, though, only about 45%, less than half of the students, had actually finished by their own predicted times [Buehler, Griffin, Ross, 1995].

Even more striking, students working on their psychology honors theses were asked to predict when they’d finish, “assuming everything went as poor as it possibly could.” Yet, only about 30% of students finished by their own worst-case estimate [Buehler, Griffin, Ross, 1995].

Similar overconfidence was also found in Japanese and Canadian cultures, giving evidence that this is a human (and not US-culture-based) phenomenon. Students continued to make optimistic predictions, even when they knew the task had taken them longer last time [Buehler and Griffin, 2003, Buehler et al., 2003].

As a student myself, though, I don’t mean to just pick on ourselves.

The planning fallacy affects projects across all sectors.

An overview of public transportation projects found that most of them were, on average, 20–45% above the estimated cost. In fact, research has shown that these poor predictions haven’t improved at all in the past 30 years [Flyvbjerg 2006].

And there’s no shortage of anecdotes, from the Scottish Parliament Building, which cost 10 times more than expected, to the Denver International Airport, which took over a year longer and cost several billion more.

When it comes to planning, we suffer from a major disparity between our expectations and reality. This article outlines the research behind why we screw up our predictions and gives three suggested techniques to suck less at planning.


The Mechanism:

So what’s going on in our heads when we make these predictions for planning?

On one level, we just don’t expect things to go wrong. Studies have found that we’re biased towards not looking at pessimistic scenarios [Newby-Clark et al., 2000]. We often just assume the best-case scenario when making plans.

Part of the reason may also be due to a memory bias. It seems that we might underestimate how long things take us, even in our memory [Roy, Christenfeld, and McKenzie 2005].

But by far the dominant theory in the field is the idea of an inside view and an outside view [Kahneman and Lovallo 1993]. The inside view is the information you have about your specific project (inside your head). The outside view is what someone else looking at your project (outside of the situation) might say.

Obviously you want to take the Outside View.


We seem to use inside view thinking when we make plans, and this leads to our optimistic predictions. Instead of thinking about all the things that might go wrong, we’re focused on how we can help our project go right.

Still, it’s the outside view that can give us better predictions. And it turns out we don’t even need to do any heavy-lifting in statistics to get better predictions. Just asking other people (from the outside) to predict your own performance, or even just walking through your task from a third-person point of view can improve your predictions [Buehler et al., 2010].

Basically, the difference in our predictions seems to depend on whether we’re looking at the problem in our heads (a first-person view) or outside our heads (a third-person view). Whether we’re the “actor” or the “observer” in our minds seems to be a key factor in our planning [Pronin and Ross 2006].

Debiasing Techniques:

I’ll be covering three ways to improve predictions: Murphyjitsu, Reference Class Forecasting (RCF), and Back-planning. In actuality, they’re all pretty much the same thing; all three techniques focus, on some level, on trying to get more of an outside view. So feel free to choose the one you think works best for you (or do all three).

For each technique, I’ll give an overview and cover the steps first and then end with the research that supports it. They might seem deceptively obvious, but do try to keep in mind that obvious advice can still be helpful!

(Remembering to breathe, for example, is obvious, but you should still do it anyway. If you don't want to suffocate.)



Murphyjitsu:

“Avoid Obvious Failures”

Almost as good as giving procrastination an ass-kicking.

The name Murphyjitsu comes from the infamous Murphy’s Law: “Anything that can go wrong, will go wrong.” The technique itself is from the Center for Applied Rationality (CFAR), and is designed for “bulletproofing your strategies and plans”.

Here are the basic steps:

  1. Figure out your goal. This is the thing you want to make plans to do.
  2. Write down which specific things you need to get done to make the thing happen. (Make a list.)
  3. Now imagine it’s one week (or month) later, and yet you somehow didn’t manage to get started on your goal. (The visualization part here is important.) Are you surprised?
  4. Why? (What went wrong that got in your way?)
  5. Now imagine you take steps to remove the obstacle from Step 4.
  6. Return to Step 3. Are you still surprised that you’d fail? If so, your plan is probably good enough. (Don’t fool yourself!)
  7. If failure still seems likely, go through Steps 3–6 a few more times until you “problem proof” your plan.

Murphyjitsu is based on a strategy called a “premortem” or “prospective hindsight”, which basically means imagining the project has already failed and “looking backwards” to see what went wrong [Klein 2007].

It turns out that putting ourselves in the future and looking back can help identify more risks, or see where things can go wrong. Prospective hindsight has been shown to increase our predictive power so we can make adjustments to our plans — before they fail [Mitchell et al., 1989, Veinott et al., 2010].

This seems to work well, even if we’re only using our intuitions. While that might seem a little weird at first (“aren’t our intuitions pretty arbitrary?”), research has shown that our intuitions can be a good source of information in situations where experience is helpful [Klein 1999; Kahneman 2011]*.

While a premortem is usually done on an organizational level, Murphyjitsu works for individuals. Still, it’s a useful way to “failure-proof” your plans before you start them that taps into the same internal mechanisms.

Here’s what Murphyjitsu looks like in action:

“First, let’s say I decide to exercise every day. That’ll be my goal (Step 1). But I should also be more specific than that, so it’s easier to tell what “exercising” means. So I decide that I want to go running on odd days for 30 minutes and do strength training on even days for 20 minutes. And I want to do them in the evenings (Step 2).

Now, let’s imagine that it’s now one week later, and I didn’t go exercising at all! What went wrong? (Step 3) The first thing that comes to mind is that I forgot to remind myself, and it just slipped out of my mind (Step 4). Well, what if I set some phone / email reminders? Is that good enough? (Step 5)

Once again, let’s imagine it’s one week later and I made a reminder. But let’s say I still didn’t go exercising. How surprising is this? (Back to Step 3) Hmm, I can see myself getting sore and/or putting other priorities before it…(Step 4). So maybe I’ll also set aside the same time every day, so I can’t easily weasel out (Step 5).

How do I feel now? (Back to Step 3) Well, if once again I imagine it’s one week later and I once again failed, I’d be pretty surprised. My plan has two levels of fail-safes and I do want to do exercise anyway. Looks like it’s good! (Done)”
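The walkthrough above has a loop structure (imagine failing, patch the cause, check again), which can be sketched in Python. This is only an illustration: the real work is the visualization step, which is stubbed out here by a hypothetical `imagine_exercise_failure` function and a hard-coded `WORRIES` list.

```python
def murphyjitsu(plan, imagine_failure, max_rounds=5):
    """Loop over Steps 3-6: imagine failing, patch the cause, re-check.

    `imagine_failure(plan)` stands in for the visualization step. It returns
    (obstacle, safeguard) for the most likely failure, or None when failing
    would be genuinely surprising.
    """
    plan = list(plan)
    for _ in range(max_rounds):
        failure = imagine_failure(plan)
        if failure is None:  # failure would be surprising: good enough
            break
        _obstacle, safeguard = failure
        plan.append(safeguard)  # Step 5: remove the obstacle
    return plan


# Hypothetical stand-in for introspection, based on the exercise example.
WORRIES = [
    ("forgot about it", "set phone/email reminders"),
    ("other priorities crowded it out", "reserve the same time every day"),
]


def imagine_exercise_failure(plan):
    """Return the first worry the plan does not yet guard against."""
    for obstacle, safeguard in WORRIES:
        if safeguard not in plan:
            return obstacle, safeguard
    return None


final_plan = murphyjitsu(
    ["run 30 min on odd days", "strength train 20 min on even days"],
    imagine_exercise_failure,
)
print(final_plan)
```

The `max_rounds` cap is an added assumption so the loop always terminates; the steps as stated just say to repeat "a few more times".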

Reference Class Forecasting:

“Get Accurate Estimates”

Predicting the future…using the past!

Reference class forecasting (RCF) is all about using the outside view. Our inside views tend to be very optimistic: We will see all the ways that things can go right, but none of the ways things can go wrong. By looking at past history (other people who have tried the same or similar thing as us), we can get a better idea of how long things will really take.

Here are the basic steps:

  1. Figure out what you want to do.
  2. Check your records to see how long it took you last time.
  3. That’s your new prediction.
  4. If you don’t have past information, look for about how long it takes, on average, to do our thing. (This usually looks like Googling “average time to do X”.)**
  5. That’s your new prediction!

Technically, the actual process for reference class forecasting works a little differently. It involves a statistical distribution and some additional calculations, but for most everyday purposes, the above algorithm should work well enough.

In both cases, we’re trying to take an outside view, which we know improves our estimates [Buehler et al., 1994].

When you Google the average time or look at your own data, you’re forming a “reference class”, a group of related actions that can give you info about how long similar projects tend to take. Hence, the name “reference class forecasting”.

Basically, RCF works by looking only at results. This means that we can avoid any potential biases that might have cropped up if we were to think it through. We’re shortcutting right to the data. The rest of it is basic statistics; most people are close to average. So if we have an idea of what the average looks like, we can reasonably expect to land fairly close to it as well [Flyvbjerg 2006; Flyvbjerg 2008].

The main difference in our above algorithm from the standard one is that this one focuses on your own experiences, so the estimate you get tends to be more accurate than an average we’d get from an entire population.

For example, if it usually takes me about 3 hours to finish homework (I use Toggl to track my time), then I’ll predict that it will take me 3 hours today, too.
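As a minimal sketch of the simplified algorithm (not the full statistical version of RCF), here is how it might look in Python. The function name `rcf_estimate` is made up, the sample durations are hypothetical, and averaging several past records rather than taking only the most recent one is an assumption of this sketch.

```python
from statistics import mean


def rcf_estimate(own_records, outside_average=None):
    """Prefer your own past durations; otherwise fall back to an outside average.

    own_records: past durations for this kind of task, in hours.
    outside_average: a looked-up figure for how long the task takes on average.
    """
    if own_records:
        return mean(own_records)
    if outside_average is not None:
        return outside_average
    raise ValueError("no reference class available")


# Hypothetical time-tracking data for past homework sessions, in hours.
homework_hours = [2.5, 3.0, 3.5]
print(rcf_estimate(homework_hours))

# No personal history: use a looked-up average instead (made-up figure).
print(rcf_estimate([], outside_average=4.0))
```

Note what the function does not take as input: any detail about today's task. That is the outside view in miniature.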

It’s obvious that RCF is incredibly simple. It literally just tells you that how long something will take you this time will be very close to how long it took you last time. But that doesn’t mean it’s ineffective! Often, the past is a good benchmark of future performance, and it’s far better than any naive prediction your brain might spit out.

RCF + Murphyjitsu Example:

I’ve found that using a mixture of Reference Class Forecasting and Murphyjitsu is helpful for reducing overconfidence in my plans.

When starting projects, I will often ask myself, “What were the reasons that I failed last time?” I then make a list of the first three or four “failure-modes” that I can recall. I now make plans to preemptively avoid those past errors.

(This can also be helpful in reverse — asking yourself, “How did I solve a similar difficult problem last time?” when facing a hard problem.)

Here’s an example:

“Say I’m writing a long post (like this one) and I want to know what might go wrong. I’ve done several of these sorts of primers before, so I have a “reference class” of data to draw from. So what were the major reasons I fell behind for those posts?

<Cue thinking>

Hmm, it looks like I would either forget about the project, get distracted, or lose motivation. Sometimes I’d want to do something else instead, or I wouldn’t be very focused.

Okay, great. Now what are some ways that I might be able to “patch” those problems?

Well, I can definitely start by making a priority list of my action items. So I know which things I want to finish first. I can also do short 5-minute planning sessions to make sure I’m actually writing. And I can do some more introspection to try and see what’s up with my motivation.”



Back-planning:

“Calibrate Your Intuitions with Reality”

Back-planning involves, as you might expect, planning from the end. Instead of thinking about where we start and how to move forward, we imagine we’re already at our goal and go backwards.

Time-travelling inside your internal universe.

Here are the steps:

  1. Figure out the task you want to get done.
  2. Imagine you’re at the end of your task.
  3. Now move backwards, step-by-step. What is the step right before you finish?
  4. Repeat Step 3 until you get to where you are now.
  5. Write down how long you think the task will now take you.
  6. You now have a detailed plan as well as a better prediction!

The experimental evidence for back-planning basically suggests that people will predict longer times to start and finish projects.

There are a few interesting hypotheses about why back-planning seems to improve predictions. The general gist of these theories is that back-planning is a weird, counterintuitive way to think about things, which means it disrupts a lot of mental processes that can lead to overconfidence [Wiese et al., 2012].

This means that back-planning can make it harder to fall into the groove of the easy “best-case” planning we default to. Instead, we need to actually look at where things might go wrong. Which is, of course, what we want.

In my own experience, I’ve found that going through a quick back-planning session can help my intuitions “warm up” to my prediction more. As in, I’ll get an estimation from RCF, but it still feels “off”. Walking through the plan through back-planning can help all the parts of me understand that it really will probably take longer.

Here’s the back-planning example:

“Right now, I want to host a talk at my school. I know that’s the end goal (Step 1). So the end goal is me actually finishing the talk and taking questions (Step 2). What happens right before that? (Step 3). Well, people would need to actually be in the room. And I would have needed a room.

Is that all? (Step 3). Also, for people to show up, I would have needed publicity. Probably also something on social media. I’d need to publicize at least a week in advance, or else it won’t be common knowledge.

And what about the actual talk? I would have needed slides, and maybe to have memorized my talk. Also, I’d need to figure out what my talk is actually going to be on.

Huh, thinking it through like this, I’d need something like 3 weeks to get it done. One week for the actual slides, one week for publicity (at least), and one week for everything else that might go wrong.

That feels more ‘right’ than my initial estimate of ‘I can do this by next week.’”
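The arithmetic in the example above can be sketched in a few lines of Python. This is only an illustration: the three one-week chunks and the one-week initial estimate come from the example, while the step labels, the variable names, and the `total_weeks` helper are made up for the sketch.

```python
# Steps in the order back-planning surfaces them: from the goal backwards.
# Durations (in weeks) follow the example; the labels are paraphrased assumptions.
backward_steps = [
    ("Buffer for everything else that might go wrong", 1),
    ("Publicity, at least a week in advance", 1),
    ("Decide the topic and make the slides", 1),
]

initial_estimate_weeks = 1  # "I can do this by next week"


def total_weeks(steps):
    """Add up the duration of every step in the plan."""
    return sum(weeks for _, weeks in steps)


# Reverse the list to read the plan in the order it would be carried out.
forward_plan = list(reversed(backward_steps))

revised_estimate_weeks = total_weeks(backward_steps)
print("Initial estimate (weeks):", initial_estimate_weeks)
print("Back-planned estimate (weeks):", revised_estimate_weeks)
for label, weeks in forward_plan:
    print(f"  {weeks} week(s): {label}")
```

Writing the steps down explicitly is the point: the gap between the one-week gut estimate and the three-week total only shows up once each step has its own number.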


Experimental Ideas:

Murphyjitsu, Reference Class Forecasting, and Back-planning are the three debiasing techniques that I’m fairly confident work well. This section is far more anecdotal. They’re ideas that I think are useful and interesting, but I don’t have much formal backing for them.

Decouple Predictions From Wishes:

In my own experience, I often find it hard to separate when I want to finish a task versus when I actually think I will finish a task. This is a simple distinction to keep in mind when making predictions, and I think it can help decrease optimism. The most important number, after all, is when I actually think I will finish—it’s what’ll most likely actually happen.

There’s some evidence suggesting that “wishful thinking” could actually be responsible for some poor estimates but it’s far from definitive [Buehler et al., 1997, Krizan and Windschitl].

Incentivize Correct Predictions:

Lately, I’ve been using a 4-column chart for my work. I write down the task in Column 1 and how long I think it will take me in Column 2. Then I go and do the task. After I’m done, I write down how long it actually took me in Column 3. Column 4 is the absolute value of Column 2 minus Column 3, or my “calibration score”.

The idea is to minimize my score every day. It’s simple and it’s helped me get a better sense for how long things really take.
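If you would rather keep the chart in code than on paper, here is a minimal sketch of the same four columns. The `TaskLog` class, the `daily_score` function, and the sample tasks and minutes are all hypothetical; only the column definitions come from the description above.

```python
from dataclasses import dataclass


@dataclass
class TaskLog:
    task: str                 # Column 1: the task
    predicted_minutes: float  # Column 2: how long I think it will take
    actual_minutes: float     # Column 3: how long it actually took

    @property
    def calibration_score(self) -> float:
        # Column 4: absolute value of Column 2 minus Column 3
        return abs(self.predicted_minutes - self.actual_minutes)


def daily_score(logs: list[TaskLog]) -> float:
    """Total of Column 4 for the day; the goal is to keep this low."""
    return sum(log.calibration_score for log in logs)


# Made-up entries for one day
logs = [
    TaskLog("Write essay outline", 30, 50),
    TaskLog("Reply to emails", 20, 15),
    TaskLog("Problem set", 60, 95),
]

for log in logs:
    print(f"{log.task:<22}{log.predicted_minutes:>6}{log.actual_minutes:>6}{log.calibration_score:>6}")
print("Daily score:", daily_score(logs))
```

With these sample numbers the per-task scores are 20, 5, and 35, for a daily score of 60. Because the score uses the absolute difference, overestimates and underestimates are penalized equally.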

Plan For Failure:

In my schedules, I specifically write in “distraction time”. If you aren’t doing this, you may want to consider it. Most of us (me included) have wandering attentions, and I know I’ll lose at least some time to silly things every day.

Double Your Estimate:

I get it. The three debiasing techniques I outlined above can sometimes take too long. In a pinch, you can probably approximate good predictions by just doubling your naive prediction.

Most people tend to be less than 2X overconfident, but I think (pessimistically) sticking to doubling is probably still better than something like 1.5X.


Working in Groups:

Obviously, because groups are made of individuals, we’d expect them to be susceptible to the same overconfidence biases I covered earlier. Though some research has shown that groups are less susceptible to bias, more studies have shown that group predictions can be far more optimistic than individual predictions [Wright and Wells, Buehler et al., 2010]. “Groupthink” is a term used to describe the observed failings of decision making in groups [Janis].

Groupthink (and hopefully also overconfidence) can be countered by either assigning a “Devil’s Advocate” or engaging in “dialectical inquiry” [Lunenburg 2012]:

We give out more than cookies over here

A Devil’s Advocate is a person who is actively trying to find fault with the group’s plans, looking for holes in reasoning or other objections. It’s suggested that the role rotates, and it’s associated with other positives like improved communication skills.

A dialectical inquiry is where multiple teams each try to create the best plan and then present it. Discussion follows, and then the group selects the best parts of each plan. It’s a little like building something awesome out of lots of pieces, like a giant robot.

This is absolutely how dialectical inquiry works in practice.

For both strategies, research has shown that they lead to “higher-quality recommendations and assumptions” (compared to not doing them), although they can also reduce group satisfaction and acceptance of the final decision [Schweiger et al. 1986].

(Pretty obvious though; who’d want to keep chatting with someone hell-bent on poking holes in your plan?)



If you’re interested in learning (even) more about the planning fallacy, I’d highly recommend the paper The Planning Fallacy: Cognitive, Motivational, and Social Origins by Roger Buehler, Dale Griffin, and Johanna Peetz. Most of the material in this guide was taken from their paper. Do go check it out! It’s free!

Remember that everyone is overconfident (you and me included!), and that failing to plan is the norm. There are scary unknown unknowns out there that we just don’t know about!

Good luck and happy planning!



* Just don’t go and start buying lottery tickets with your gut. We’re talking about fairly “normal” things like catching a ball, where your intuitions give you accurate predictions about where the ball will land. (Instead of, say, calculating the actual projectile motion equation in your head.)

** In a pinch, you can just use your memory, but studies have shown that our memory tends to be biased too. So as often as possible, try to use actual measurements and numbers from past experience.

Works Cited:

Buehler, Roger, Dale Griffin, and Johanna Peetz. "The Planning Fallacy: Cognitive, Motivational, and Social Origins." Advances in Experimental Social Psychology 43 (2010): 1-62. Social Science Research Network.

Buehler, Roger, Dale Griffin, and Michael Ross. "Exploring the Planning Fallacy: Why People Underestimate their Task Completion Times." Journal of Personality and Social Psychology 67.3 (1994): 366.

Buehler, Roger, Dale Griffin, and Heather MacDonald. "The Role of Motivated Reasoning in Optimistic Time Predictions." Personality and Social Psychology Bulletin 23.3 (1997): 238-247.

Buehler, Roger, Dale Griffin, and Michael Ross. "It’s About Time: Optimistic Predictions in Work and Love." European Review of Social Psychology 6 (1995): 1-32.

Buehler, Roger, et al. "Perspectives on Prediction: Does Third-Person Imagery Improve Task Completion Estimates?" Organizational Behavior and Human Decision Processes 117.1 (2012): 138-149.

Buehler, Roger, Dale Griffin, and Michael Ross. "Inside the Planning Fallacy: The Causes and Consequences of Optimistic Time Predictions." Heuristics and Biases: The Psychology of Intuitive Judgment (2002): 250-270.

Buehler, Roger, and Dale Griffin. "Planning, Personality, and Prediction: The Role of Future Focus in Optimistic Time Predictions." Organizational Behavior and Human Decision Processes 92 (2003): 80-90.

Flyvbjerg, Bent. "From Nobel Prize to Project Management: Getting Risks Right." Project Management Journal 37.3 (2006): 5-15. Social Science Research Network.

Flyvbjerg, Bent. "Curbing Optimism Bias and Strategic Misrepresentation in Planning: Reference Class Forecasting in Practice." European Planning Studies 16.1 (2008): 3-21.

Janis, Irving Lester. "Groupthink: Psychological Studies of Policy Decisions and Fiascoes."

Johnson, Dominic DP, and James H. Fowler. "The Evolution of Overconfidence." Nature 477.7364 (2011): 317-320.

Kahneman, Daniel. Thinking, Fast and Slow. Macmillan, 2011.

Kahneman, Daniel, and Dan Lovallo. "Timid Choices and Bold Forecasts: A Cognitive Perspective on Risk Taking." Management Science 39.1 (1993): 17-31.

Klein, Gary. Sources of Power: How People Make Decisions. MIT Press, 1999.

Klein, Gary. "Performing a Project Premortem." Harvard Business Review 85.9 (2007): 18-19.

Krizan, Zlatan, and Paul D. Windschitl. "Wishful Thinking About the Future: Does Desire Impact Optimism?" Social and Personality Psychology Compass 3.3 (2009): 227-243.

Lunenburg, F. "Devil’s Advocacy and Dialectical Inquiry: Antidotes to Groupthink." International Journal of Scholarly Academic Intellectual Diversity 14 (2012): 1-9.

Mitchell, Deborah J., J. Edward Russo, and Nancy Pennington. "Back to the Future: Temporal Perspective in the Explanation of Events." Journal of Behavioral Decision Making 2.1 (1989): 25-38.

Newby-Clark, Ian R., et al. "People Focus on Optimistic Scenarios and Disregard Pessimistic Scenarios While Predicting Task Completion Times." Journal of Experimental Psychology: Applied 6.3 (2000): 171.

Pronin, Emily, and Lee Ross. "Temporal Differences in Trait Self-Ascription: When the Self is Seen as an Other." Journal of Personality and Social Psychology 90.2 (2006): 197.

Roy, Michael M., Nicholas JS Christenfeld, and Craig RM McKenzie. "Underestimating the Duration of Future Events: Memory Incorrectly Used or Memory Bias?" Psychological Bulletin 131.5 (2005): 738.

Schweiger, David M., William R. Sandberg, and James W. Ragan. "Group Approaches for Improving Strategic Decision Making: A Comparative Analysis of Dialectical Inquiry, Devil's Advocacy, and Consensus." Academy of Management Journal 29.1 (1986): 51-71.

Veinott, Beth, Gary Klein, and Sterling Wiggins. "Evaluating the Effectiveness of the Premortem Technique on Plan Confidence." Proceedings of the 7th International ISCRAM Conference (May, 2010).

Wiese, Jessica, Roger Buehler, and Dale Griffin. "Backward Planning: Effects of Planning Direction on Predictions of Task Completion Time." Judgment and Decision Making 11.2 (2016): 147.

Wright, Edward F., and Gary L. Wells. "Does Group Discussion Attenuate the Dispositional Bias?" Journal of Applied Social Psychology 15.6 (1985): 531-546.

[Link] The "I Already Get It" Slide

12 jsalvatier 01 February 2017 03:11AM

How often do you check this forum?

11 JenniferRM 30 January 2017 04:56PM

I'm interested in hearing from everyone who reads this.

Who is checking LW's Discussion area and how often?

1. When you check, how much voting or commenting do you do compared to reading?

2. Do you bother clicking through to links?

3. Do you check using a desktop or a smart phone?  Do you just visit the website in a browser or use an RSS something-or-other?

4. Also, do you know of other places that have more schellingness for the topics you think this place is centered on? (Or used to be centered on?) (Or should be centered on?)

I would ask this in the current open thread except that structurally it seems like it needs to be more prominent than that in order to do its job.

If you have very very little time to respond or even think about the questions, I'd appreciate it if you just respond with "Ping" rather than click away.

[Link] How to not earn a delta (Change My View)

10 Viliam 14 February 2017 10:04AM

Emergency learning

9 Stuart_Armstrong 28 January 2017 10:05AM

Crossposted at the Intelligent Agent Foundation Forum.

Suppose that we knew that superintelligent AI was to be developed within six months, what would I do?

Well, drinking coffee by the barrel at MIRI's emergency research retreat I'd...... still probably spend a month looking at things from the meta level, and clarifying old ideas. But, assuming that didn't reveal any new approaches, I'd try and get something like this working.


[Link] Performance Trends in AI

9 sarahconstantin 28 January 2017 08:36AM

Civil resistance and the 3.5% rule

8 morganism 02 February 2017 06:53PM

Interesting, haven't seen anything data-driven like this before...


Civil resistance and the 3.5% rule.

"no campaigns failed once they’d achieved the active and sustained participation of just 3.5% of the population—and lots of them succeeded with far less than that."

"Then I analyzed the data, and the results blew me away. From 1900 to 2006, nonviolent campaigns worldwide were twice as likely to succeed outright as violent insurgencies. And there’s more. This trend has been increasing over time—in the last fifty years civil resistance has become increasingly frequent and effective, whereas violent insurgencies have become increasingly rare and unsuccessful."


Data viz:



Interesting strategic viewpoint

1. Size and diversity of participation.

2. Nonviolent discipline.

3. Flexible & innovative techniques: switching between concentrated methods like demonstrations and dispersed methods like strikes and stay-aways.

4. Loyalty shifts.
If erstwhile elite supporters begin to abandon the opponent, remain silent when they would typically defend him, and refuse to follow orders to repress dissidents, or drag their feet in carrying out day-to-day orders, the incumbent is losing his grip.


(observations from article above)

"The average nonviolent campaign takes about 3 years to run its course (that’s more than three times shorter than the average violent campaign, by the way)."

"The average nonviolent campaign is about eleven times larger as a proportion of the overall population as the average violent campaign."

"Nonviolent resistance campaigns are ten times more likely to usher in democratic institutions than violent ones."




original overview and links article:


and a training site that has some exercises in group cohesion and communication tech, from Guardian.


edit: The article that got me looking, how to strike in a gig economy, and international reach

AI Safety reading group

8 SoerenE 28 January 2017 12:07PM

I am hosting a weekly AI Safety reading group, and perhaps someone here would be interested in joining.

Here is what the reading group has covered so far:

Next week, on Wednesday the 1st of February 19:45 UTC, we will discuss "How Feasible is the Rapid Development of Artificial Superintelligence?" by Kaj Sotala. I publish some slides before each meeting, and present the article, so you can also join if you have not read the article. 

To join, add me on Skype ("soeren.elverlin"). General coordination happens on a Facebook group, at 

You can see the time in your local timezone here:

Nearest unblocked strategy versus learning patches

6 Stuart_Armstrong 23 February 2017 12:42PM

Crossposted at Intelligent Agents Forum.

The nearest unblocked strategy problem (NUS) is the idea that if you program a restriction or a patch into an AI, then the AI will often be motivated to pick a strategy that is as close as possible to the banned strategy, very similar in form, and maybe just as dangerous.

For instance, if the AI is maximising a reward R, and does some behaviour B_i that we don't like, we can patch the AI's algorithm with patch P_i ('maximise R_0 subject to these constraints...'), or modify R to R_i so that B_i doesn't come up. I'll focus more on the patching example, but the modified reward one is similar.


A semi-technical question about prediction markets and private info

6 CronoDAS 20 February 2017 02:20AM

There exists a 6-sided die that is weighted such that one of the 6 numbers has a 50% chance to come up and all the other numbers have a 1 in 10 chance. Nobody knows for certain which number the die is biased in favor of, but some people have had a chance to roll the die and see the result.

You get a chance to roll the die exactly once, with nobody else watching. It comes up 6. Running a quick Bayes's Theorem calculation, you now think there's a 50% chance that the die is biased in favor of 6 and a 10% chance for each of the numbers 1 through 5.
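For anyone who wants to check that calculation, here is a short sketch of the single-roll update. It assumes a uniform prior over which face the die favors (the post only says nobody knows for certain); the function and variable names are made up for illustration. It covers only your own roll, not the update on the market's prices.

```python
from fractions import Fraction

faces = range(1, 7)

# Assumption: before rolling, each face is equally likely to be the favored one.
prior = {face: Fraction(1, 6) for face in faces}


def likelihood(rolled, favored_face):
    """P(rolled | die favors favored_face): 50% for the favored face, 10% otherwise."""
    return Fraction(1, 2) if rolled == favored_face else Fraction(1, 10)


def posterior(prior, rolled):
    """Bayes' theorem: multiply prior by likelihood, then normalize."""
    unnormalized = {f: prior[f] * likelihood(rolled, f) for f in prior}
    total = sum(unnormalized.values())
    return {f: p / total for f, p in unnormalized.items()}


post = posterior(prior, rolled=6)
for face, p in post.items():
    print(f"P(die favors {face} | rolled 6) = {p}")
```

Under that prior the posterior is 1/2 for face 6 and 1/10 for each of the other faces, which matches the numbers in the post.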

You then discover that there's a prediction market about the die. The prediction market says there's a 50% chance that "3" is the number the die is biased in favor of, and each other number is given 10% probability. 

How do you update based on what you've learned? Do you make any bets?

I think I know the answer for this toy problem, but I'm not sure if I'm right or how it generalizes to real life...


[Link] "The unrecognised simplicities of effective action #2: 'Systems engineering’ and 'systems management' - ideas from the Apollo programme for a 'systems politics'", Cummings 2017

6 gwern 17 February 2017 12:59AM

[Link] Decision Theory subreddit

6 gwern 07 February 2017 06:42PM

True understanding comes from passing exams

6 Stuart_Armstrong 06 February 2017 11:51AM

Crossposted at the Intelligent Agent Forum

I'll try to clarify what I was doing with the AI truth setup in a previous post. First I'll explain the nature of the challenge, and then how the setup tries to solve it.

The nature of the challenge is to have an AI give genuine understanding to a human. Getting the truth out of an AI or Oracle is not that hard, conceptually: you get the AI to report some formal property of its model. The problem is that that truth can be completely misleading, or, more likely, incomprehensible.


[Link] Prediction Calibration - Doing It Right

6 SquirrelInHell 30 January 2017 10:05AM

[Link] Split Brain Does Not Lead to Split Consciousness

6 ChristianKl 28 January 2017 08:58AM

Concrete Takeaways Post-CFAR

5 lifelonglearner 24 February 2017 06:31PM

Concrete Takeaways:

[So I recently volunteered at a CFAR workshop. This is part five of a five-part series on how I changed my mind. It's split into 3 sections: TAPs, Heuristics, and Concepts. They get progressively more abstract. It's also quite long at around 3,000 words, so feel free to just skip around and see what looks interesting.]

This is a collection of TAPs, heuristics, and concepts that I’ve been thinking about recently. Many of them were inspired by my time at the CFAR workshop, but there’s not really an underlying theme behind it all. It’s just a collection of ideas that are either practical or interesting.



TAPs, or Trigger Action Planning, is a CFAR technique that is used to build habits. The basic idea is you pair a strong, concrete sensory “trigger” (e.g. “when I hear my alarm go off”) with a “plan”—the thing you want to do (e.g. “I will put on my running shoes”).

If you’re good at noticing internal states, TAPs can also use your feelings or other internal things as a trigger, but it’s best to try this with something concrete first to get the sense of it.

Some of the more helpful TAPs I’ve recently been thinking about are below:

Ask for Examples TAP:

[Notice you have no mental picture of what the other person is saying. → Ask for examples.]

Examples are good. Examples are god. I really, really like them.

In conversations about abstract topics, it can be easy to understand the meaning of the words that someone said, yet still miss the mental intuition of what they’re pointing at. Asking for an example clarifies what they mean and helps you understand things better.

The trigger for this TAP is noticing that what someone said gave you no mental picture.

I may be extrapolating too far from too little data here, but it seems like people do try to “follow along” with things in their head when listening. And if this mental narrative, simulation, or whatever internal thing you’re doing comes up blank when someone’s speaking, then this may be a sign that what they said was unclear.

Once you notice this, you ask for an example of what gave you no mental picture. Ideally, the other person can then respond with a more concrete statement or clarification.

Quick Focusing TAP:

[Notice you feel aversive towards something → Be curious and try to source the aversion.]

Aversion Factoring, Internal Double Crux, and Focusing are all techniques CFAR teaches to help deal with internal feelings of badness.

While there are definite nuances between all three techniques, I’ve sort of abstracted from the general core of “figuring out why you feel bad” to create an in-the-moment TAP I can use to help debug myself.

The trigger is noticing a mental flinch or an ugh field, where I instinctively shy away from looking too hard.

After I notice the feeling, my first step is to cultivate a sense of curiosity. There’s no sense of needing to solve it; I’m just interested in why I’m feeling this way.

Once I’ve directed my attention to the mental pain, I try to source the discomfort. Using some backtracking and checking multiple threads (e.g. “is it because I feel scared?”) allows me to figure out why. This whole process takes maybe half a minute.

When I’ve figured out the reason why, a sort of shift happens, similar to the felt shift in focusing. In a similar way, I’m trying to “ground” the nebulous, uncertain discomfort, forcing it to take shape.

I’d recommend trying some Focusing before trying this TAP, as it’s basically an expedited version of it, hence the name.

Rule of Reflexivity TAP:

[Notice you’re judging someone → Recall an instance where you did something similar / construct a plausible internal narrative]

[Notice you’re making an excuse → Recall times where others used this excuse and update on how you react in the future.]

This is a TAP that was born out of my observation that our excuses seem way more self-consistent when we’re the ones saying them. (Oh, why hello there, Fundamental Attribution Error!) The point of practicing the Rule of Reflexivity is to build empathy.

The Rule of Reflexivity goes both ways. In the first case, you want to notice if you’re judging someone. This might feel like ascribing a value judgment to something they did, e.g. “This person is stupid and made a bad move.”

The response is to recall times where either you did something similar or (if you think you’re perfect) think of a plausible set of events that might have caused them to act in this way. Remember that most people don’t think they’re acting stupidly; they’re just doing what seems like a good idea from their perspective.

In the second case, you want to notice when you’re trying to justify your own actions. If the excuses you yourself make sound suspiciously like things you’ve heard others say before, then you may want to be less quick to dismiss those excuses when others make them in the future.

Keep Calm TAP:

[Notice you’re starting to get angry → Take a deep breath → Speak softer and slower]

Okay, so this TAP is probably not easy to do because you’re working against a biological response. But I’ve found it useful in several instances where otherwise I would have gotten into a deeper argument.

The trigger, of course, is noticing that you’re angry. For me, this feels like an increased tightness in my chest and a desire to raise my voice. I may feel like a cherished belief of mine is being attacked.

Once I notice these signs, I remember that I have this TAP which is about staying calm. I think something like, “Ah yes, I’m getting angry now. But I previously already made the decision that it’d be a better idea to not yell.”

After that, I take a deep breath, and I try to open up my stance. Then I remember to speak in a slower and quieter tone than previously. I find this TAP especially helpful in arguments—ahem, collaborative searches for the truth—where things get a little too excited on both sides.  



Heuristics are algorithm-like things you can do to help get better results. I think that it’d be possible to turn many of the heuristics below into TAPs, but there’s a sense of deliberately thinking things out that separates these from just the “mindless” actions above.

As more formal procedures, these heuristics do require you to remember to Take Time to do them well. However, I think that the sorts of benefits you get from them make it worth the slight investment in time.


Modified Murphyjitsu: The Time Travel Reframe:

(If you haven’t read up on Murphyjitsu yet, it’d probably be good to do that first.)

Murphyjitsu is based off the idea of a premortem, where you imagine that your project failed and you’re looking back. I’ve always found this to be a weird temporal framing, and I realized there’s a potentially easier way to describe things:

Say you’re sitting at your desk, getting ready to write a report on intertemporal travel. You’re confident you can finish before the hour is over. What could go wrong? Closing Facebook, you begin typing.

Suddenly, you hear a loud CRACK! A burst of light floods your room as a figure pops into existence, dark and silhouetted by the brightness behind it. The light recedes, and the figure crumples to the ground. Floating in the air is a whirring gizmo, filled with turning gears. Strangely enough, your attention is drawn from the gizmo to the person on the ground:

The figure has a familiar sort of shape. You approach, tentatively, and find the spitting image of yourself! The person stirs and speaks.

“I’m you from one week into the future,” your future self croaks. Your future self tries to get up, but sinks down again.

“Oh,” you say.

“I came from the future to tell you…” your temporal clone says in a scratchy voice.

“To tell me what?” you ask. Already, you can see the whispers of a scenario forming in your head…

Future Your slowly says, “To tell you… that the report on intertemporal travel that you were going to write… won’t go as planned at all. Your best-case estimate failed.”

“Oh no!” you say.

Somehow, though, you aren’t surprised…

At this point, what plausible reasons for your failure come to mind?

I hypothesize that the time-travel reframe I provide here for Murphyjitsu engages similar parts of your brain as a premortem, but is 100% more exciting to use. In all seriousness, I think this is a reframe that is easier to grasp compared to the twisted “imagine you’re in the future looking back into the past, which by the way happens to be you in the present” framing normal Murphyjitsu uses.

The actual (non-dramatized) wording of the heuristic, by the way, is, “Imagine that Future You from one week into the future comes back telling you that the plan you are about to embark on will fail: Why?”

Low on Time? Power On!

Often, when I find myself low on time, I feel less compelled to try. This seems sort of like an instance of failing with abandon, where I think something like, “Oh well, I can’t possibly get anything done in the remaining time between event X and event Y”.

And then I find myself doing quite little as a response.

As a result, I’ve decided to internalize the idea that being low on time doesn’t mean I can’t make meaningful progress on my problems.

This is a very Resolve-esque technique. The idea is that even if I have only 5 minutes, that’s enough to get things done. There are lots of useful things I can pack into small time chunks, like thinking, brainstorming, or doing some Quick Focusing.

I’m hoping to combat the sense of apathy / listlessness that creeps in when time draws to a close.

Supercharge Motivation by Propagating Emotional Bonds:

[Disclaimer: I suspect that this isn’t an optimal motivation strategy, and I’m sure there are people who will object to having bonds based on others rather than themselves. That’s okay. I think this technique is effective, I use it, and I’d like to share it. But if you don’t think it’s right for you, feel free to just move along to the next thing.]

CFAR used to teach a skill called Propagating Urges. It’s now been largely subsumed by Internal Double Crux, but I still find Propagating Urges to be a powerful concept.

In short, Propagating Urges hypothesizes that motivation problems are caused because the implicit parts of ourselves don’t see how the boring things we do (e.g. filing taxes) causally relate to things we care about (e.g. not going to jail). The actual technique involves walking through the causal chain in your mind and some visceral imagery every step of the way to get the implicit part of yourself on board.

I’ve taken the same general principle, but I’ve focused it entirely on the relationships I have with other people. If all the parts of me realize that doing something would greatly hurt those I care about, this becomes a stronger motivation than most external incentives.

For example, I walked through an elaborate internal simulation where I wanted to stop doing a Thing. I imagined someone I cared deeply for finding out about my Thing-habit and being absolutely deeply disappointed. I focused on the sheer emotional weight that such disappointment would cause (facial expressions, what they’d feel inside, the whole deal).

I now have a deep injunction against doing the Thing, and all the parts of me are in agreement because we agree that such a Thing would hurt other people and that’s obviously bad.

The basic steps for Propagating Emotional Bonds look like:

  • Figure out what thing you want to do more of or stop doing.

  • Imagine what someone you care about would think or say.

  • Really focus on how visceral that feeling would be.

  • Rehearse the chain of reasoning (“If I do this, then X will feel bad, and I don’t want X to feel bad, so I won’t do it”) a few times.

Take Time in Social Contexts:

Often, in social situations, when people ask me questions, I feel an underlying pressure to answer quickly. It feels like if I don’t answer in the next ten seconds, something’s wrong with me. (School may have contributed to this). I don’t exactly know why, but it just feels like it’s expected.

I also think that being forced to hurry isn’t good for thinking well. As a result, something helpful I’ve found is to Take Time when someone asks something like, “Is that all? Anything else?”

My response is something like, “Okay, wait, let me actually take a few minutes.” At which point, I, uh, actually take a few minutes to think things through. After saying this, it feels like it’s now socially permissible for me to take some time thinking.

This has proven useful in several contexts where, had I not Taken Time, I would have forgotten to bring up important things or missed key failure-modes.

Ground Mental Notions in Reality not by Platonics:

One of the proposed reasons that people suck at planning is that we don’t actually think about the details behind our plans. We end up thinking about them in vague black-box-style concepts that hide all the scary unknown unknowns. What we’re left with is just the concept of our task, rather than a deep understanding of what our task entails.

In fact, this seems fairly similar to the “prototype model” that occurs in scope insensitivity.

I find this is especially problematic for tasks which look nothing like their concepts. For example, my mental representation of “doing math” conjures images of great mathematicians, intricate connections, and fantastic concepts like uncountable sets.

Of course, actually doing math looks more like writing stuff on paper, slogging through textbooks, and banging your head on the table.

My brain doesn’t differentiate well between doing a task and the affect associated with the task. Thus I think it can be useful to try and notice when our brains are doing this sort of black-boxing and instead “unpack” the concepts.

This means getting better correspondences between our mental conceptions of tasks and the tasks themselves, so that we can hopefully actually choose better.

3 Conversation Tips:

I often forget what it means to be having a good conversation with someone. I think I miss opportunities to learn from others when talking with them. This is my handy 3-step list of Conversation Tips to get more value out of conversations:

1) "Steal their Magic": Figure out what other people are really good at, and then get inspired by their awesomeness and think of ways you can become more like that. Learn from what other people are doing well.

2) "Find the LCD"/"Intellectually Escalate": Figure out where your intelligence matches theirs, and learn something new. Focus on Actually Trying to bridge those inferential distances. In conversations, this means focusing on the limits of either what you know or what the other person knows.

3) "Convince or Be Convinced”: (This is a John Salvatier idea, and it also follows from the above.) Focus on maximizing your persuasive ability to convince them of something. Or be convinced of something. Either way, focus on updating beliefs, be it your own or the other party’s.

Be The Noodly Appendages of the Superintelligence You Wish To See in the World:

CFAR co-founder Anna Salamon has this awesome reframe similar to IAT which asks, “Say a superintelligence exists and is trying to take over the world. However, you are its only agent. What do you do?”

I’ll admit I haven’t used this one, but it’s super cool and not something I’d thought of, so I’m including it here.



Concepts are just things in the world I’ve identified and drawn some boundaries around. They are farthest from the pipeline that goes from ideas to TAPs, as concepts are just ideas. Still, I do think these concepts “bottom out” at some point into practicality, and I think playing around with them could yield interesting results.

Paperspace =/= Mindspace:

I tend to write things down because I want to remember them. Recently, though, I’ve noticed that rather than treating the things I write down as an extension of my brain, I seem to treat them as no longer in my own head. As in, if I write something down, it’s not necessarily easier for me to recall it later.

It’s as if by “offloading” the thoughts onto paper, I’ve cleared them out of my brain. This seems suboptimal, because a big reason I write things down is to cement them more deeply within my head.

I can still access the thoughts if I’m asking myself questions like, “What did I write down yesterday?” but only if I’m specifically sorting for things I write down.

The point is, I want stuff I write down on paper to be, not where I store things, but merely a sign of what’s stored inside my brain.

Outreach: Focus on Your Target’s Target:

One interesting idea I got from the CFAR workshop was that of thinking about yourself as a radioactive vampire. Um, I mean, thinking about yourself as a memetic vector for rationality (the vampire thing was an actual metaphor they used, though).

The interesting thing they mentioned was to think, not about who you’re directly influencing, but who your targets themselves influence.

This means that not only do you have to care about the fidelity of your transmission, but you need to think of ways to ensure that your target also does a passable job of passing it on to their friends.

I’ve always thought about outreach / memetics in terms of the people I directly influence, so looking at two degrees of separation is a pretty cool thing I hadn’t thought about in the past.

I guess that if I took this advice to heart, I’d probably have to change the way that I explain things. For example, I might want to try giving more salient examples that can be easily passed on or focusing on getting the intuitions behind the ideas across.

Build in Blank Time:

Professor Barbara Oakley distinguishes between focused and diffuse modes of thinking. Her claim is that time spent in a thoughtless activity allows your brain to continue working on problems without conscious input. This is the basis of diffuse mode.

In my experience, I’ve found that I get interesting ideas or remember important ideas when I’m doing laundry or something else similarly mindless.

I’ve found this to be helpful enough that I’m considering building in “Blank Time” in my schedules.

My intuitions here are something like, “My brain is a thought-generator, and it’s particularly active if I can pay attention to it. But I need to be doing something that doesn’t require much of my executive function to even pay attention to my brain. So maybe having more Blank Time would be good if I want to get more ideas.”

There’s also the additional point that meta-level thinking can’t be done if you’re always in the moment, stuck in a task. This means that, cool ideas aside, if I just want to reorient or survey my current state, Blank Time can be helpful.

The 99/1 Rule: Few of Your Thoughts are Insights:

The 99/1 Rule says that the vast majority of your thoughts every day are pretty boring and that only about one percent of them are insightful.

This was generally true for my life…and then I went to the CFAR workshop and this rule sort of stopped being appropriate. (Other exceptions to this rule were EuroSPARC [now ESPR] and EAG.)


I bulldozed through a bunch of ideas here, some of which could have probably garnered a longer post. I’ll probably explore some of these ideas later on, but if you want to talk more about any one of them, feel free to leave a comment / PM me.


Levers, Emotions, and Lazy Evaluators:

5 lifelonglearner 20 February 2017 11:00PM

Levers, Emotions, and Lazy Evaluators: Post-CFAR 2

[This is a trio of topics following from the first post that all use the idea of ontologies in the mental sense as a bouncing off point. I examine why naming concepts can be helpful, listening to your emotions, and humans as lazy evaluators. I think this post may also be of interest to people here. Posts 3 and 4 are less so, so I'll probably skip those, unless someone expresses interest. Lastly, the below expressed views are my own and don’t reflect CFAR’s in any way.]


When I was at the CFAR workshop, someone mentioned that something like 90% of the curriculum was just making up fancy new names for things they already sort of did. This got some laughs, but I think it’s worth exploring why even just naming things can be powerful.

Our minds do lots of things; they carry many thoughts, and we can recall many memories. Some of these phenomena may be more helpful for our goals, and we may want to name them.

When we name a phenomenon, like Focusing, we’re essentially drawing a boundary around the thing, directing attention to it. We’ve made it conceptually discrete. This transformation, in turn, allows us to more concretely identify which things among the sea of our mental activity correspond to Focusing.

Focusing can then become a concept that floats in our understanding of things our minds can do. We’ve taken a mental action and packaged it into a “thing”. This can be especially helpful if we’ve identified a phenomenon that consists of several steps which usually aren’t found together.

By drawing certain patterns around a thing with a name, we can hopefully help others recognize them and perhaps do the same for other mental motions, which seems to be one more way that we find new rationality techniques.

This then means that we’ve created a new action that is explicitly available to our ontology. This notion of “actions I can take” is what I think forms the idea of levers in our mind. When CFAR teaches a rationality technique, the technique itself seems to be pointing at a sequence of things that happen in our brain. Last post, I mentioned that I think CFAR techniques upgrade people’s mindsets by changing their sense of what is possible.

I think that levers are a core part of this because they give us the feeling of, “Oh wow! That thing I sometimes do has a name! Now I can refer to it and think about it in a much nicer way. I can call it ‘focusing’, rather than ‘that thing I sometimes do when I try to figure out why I’m feeling sad that involves looking into myself’.”

For example, once you understand that a large part of habituation is simply "if-then" loops (ala TAPs, aka Trigger Action Plans), you’ve now not only understood what it means to learn something as a habit, but you’ve internalized the very concept of habituation itself. You’ve gone one meta-level up, and you can now reason about this abstract mental process in a far more explicit way.

Names have power in the same way that abstraction barriers have power in a programming language: they change how you think about the phenomenon itself, and this in turn can affect your behavior.



CFAR teaches a class called “Understanding Shoulds”, which is about seeing your “shoulds”, the parts of yourself that feel like obligations, as data about things you might care about. This is a little different from Nate Soares’s Replacing Guilt series, which tries to move past guilt-based motivation.

In further conversations with staff, I’ve seen the even deeper view that all emotions should be considered information.

The basic premise seems to be based off the understanding that different parts of us may need different things to function. Our conscious understanding of our own needs may sometimes be limited. Thus, our implicit emotions (and other S1 processes) can serve as a way to inform ourselves about what we’re missing.

In this way, all emotions seem to be channels where information can be passed on from implicit parts of you to the forefront of “meta-you”. This idea of “emotions as a data trove” is yet another ontology that produces different rationality techniques, as it’s operating on, once again, a mental model that is built out of a different type of abstraction.

Many of the skills based on this ontology focus on communication between different pieces of the self.

I’m very sympathetic to this viewpoint, as it forms the basis of the Internal Double Crux (IDC) technique, one of my favorite CFAR skills. In short, IDC assumes that akrasia-esque problems are caused by a disagreement between different parts of you, some of which might be in the implicit parts of your brain.

By “disagreement”, I mean that some part of you endorses an action for some well-meaning reasons, but some other part of you is against the action and also has justifications. To resolve the problem, IDC has us “dialogue” between the conflicting parts of ourselves, treating both sides as valid. If done right, without “rigging” the dialogue to bias one side, IDC can be a powerful way to source internal motivation for our tasks.

While I do seem to do some communication between my emotions, I haven’t fully integrated them as internal advisors in the IFS sense. I’m not ready to adopt a worldview that might potentially hand over executive control to all the parts of me. Meta-me still deems some of my implicit desires as “foolish”, like the part of me that craves video games, for example. In order to avoid slippery slopes, I have a blanket precommitment on certain things in life.

For the time being, I’m fine sticking with these precommitments. The modern world is filled with superstimuli, from milkshakes to insight porn (and the normal kind) to mobile games, that can hijack our well-meaning reward systems.

Lastly, I believe that without certain mental prerequisites, some ontologies can be actively harmful. Nate’s Replacing Guilt series can leave people without additional motivation for their actions; guilt can be a useful motivator. Similarly, nihilism is another example of an ontology that can be crippling unless paired with ideas like humanism.


Lazy Evaluators:

In In Defense of the Obvious, I gave a practical argument as to why obvious advice was very good. I brought this point up several times during the workshop, and people seemed to like the point.

While that essay focused on listening to obvious advice, there appears to be a similar thing where merely asking someone, “Did you do all the obvious things?” will often uncover helpful solutions they have yet to do.


My current hypothesis for this (apart from “humans are programs that wrote themselves on computers made of meat”, which is a great workshop quote) is that people tend to be lazy evaluators. In programming, lazy evaluation is a strategy of deferring the computation of an expression’s value until the answer is actually needed.
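For readers who haven’t run into the programming term, here is a minimal Python sketch of the difference between eager and lazy evaluation. The function name `expensive_square` and the log lists are made up for illustration; the only point is that the lazy version does no work until someone asks for a value.

```python
from itertools import count, islice


def expensive_square(n, log):
    """Hypothetical 'costly' computation: record that work happened, then square n."""
    log.append(n)
    return n * n


# Eager: the list comprehension performs all five computations immediately.
eager_log = []
eager = [expensive_square(n, eager_log) for n in range(5)]

# Lazy: the generator expression performs no work until a value is requested,
# so it can even describe a conceptually infinite sequence.
lazy_log = []
lazy = (expensive_square(n, lazy_log) for n in count())
work_before_asking = len(lazy_log)  # nothing has been evaluated yet

# Asking for answers is what forces evaluation, and only as much as was asked for.
first_three = list(islice(lazy, 3))
```

In the analogy, the questions we never ask ourselves are like the generator nobody iterates over: the answers could be computed, but nothing ever forces the computation.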

It seems like something similar happens in people’s heads, where we simply don’t ask ourselves questions like “What are multiple ways I could accomplish this?” or “Do I actually want to do this thing?” until we need to… Except that most of the time, we never need to. Life putters on, whether or not we’re winning at it.

I think this is part of what makes “pair debugging”, a CFAR activity where a group of people try to help one person with their “bugs”, effective. When we have someone else taking an outside view asking us these questions, it may even be the first time we see these questions ourselves.

Therefore, it looks like a helpful skill is to constantly ask ourselves questions and cultivate a sense of curiosity about how things are. Anna Salamon refers to this skill as “boggling”. I think boggling can help with both counteracting lazy evaluation and actually doing obvious actions.

Looking at why obvious advice is obvious, like “What the heck does ‘obvious’ even mean?” can help break the immediate dismissive veneer our brain puts on obvious information.

EX: “If I want to learn more about coding, it probably makes sense to ask some coder friends what good resources are.”

“Nah, that’s so obvious; I should instead just stick to this abstruse book that basically no one’s heard of—wait, I just rejected something that felt obvious.”

“Huh…I wonder why that thought felt obvious…what does it even mean for something to be dubbed ‘obvious’?”

“Well…obvious thoughts seem to have a generally ‘self-evident’ tag on them. If they aren’t outright tautological or circularly defined, then there’s a sense where the obvious things seem to be the shortest paths to the goal. Like, I could fold my clothes or I could build a Rube Goldberg machine to fold my clothes. But the first option seems so much more ‘obvious’…”

“Aside from that, there also seems to be a sense where if I search my brain for ‘obvious’ things, I’m using a ‘faster’ mode of thinking (ala System 1). Besides favoring simpler solutions, it also seems to be influenced by social norms (what do people ‘typically’ do). And my ‘obvious action generator’ seems to also be built off my understanding of the world, like, I’m thinking about things in terms of causal chains that actually exist in the world. As in, when I’m thinking about ‘obvious’ ways to get a job, for instance, I’m thinking about actions I could take in the real world that might plausibly actually get me there…”

“Whoa…that means that obvious advice is so much more than some sort of self-evident tag. There’s a huge amount of information that’s being compressed when I look at it from the surface… ‘Obvious’ really means something like ‘that which my brain quickly dismisses because it is simple, complies with social norms, and/or runs off my internal model of how the universe works.’”

The goal is to reduce the sort of “acclimation” that happens with obvious advice by peering deeper into it. Ideally, if you’re boggling at your own actions, you can force yourself to evaluate earlier. Otherwise, it can hopefully at least make obvious advice more appealing.

I’ll end with a quote of mine from the workshop:

“You still yet fail to grasp the weight of the Obvious.”

CHCAI/MIRI research internship in AI safety

5 RobbBB 13 February 2017 06:34PM

The Center for Human-Compatible AI (CHCAI) and the Machine Intelligence Research Institute (MIRI) are looking for talented, driven, and ambitious technical researchers for a summer research internship.



CHCAI is a research center based at UC Berkeley with PIs including Stuart Russell, Pieter Abbeel and Anca Dragan. CHCAI describes its goal as "to develop the conceptual and technical wherewithal to reorient the general thrust of AI research towards provably beneficial systems".

MIRI is an independent research nonprofit located near the UC Berkeley campus with a mission of helping ensure that smarter-than-human AI has a positive impact on the world.

CHCAI's research focus includes work on inverse reinforcement learning and human-robot cooperation (link), while MIRI's focus areas include task AI and computational reflection (link). Both groups are also interested in theories of (bounded) rationality that may help us develop a deeper understanding of general-purpose AI agents.


To apply:

1. Fill in the form here:

2. Send an email to with the subject line "AI safety internship application", attaching your CV, a piece of technical writing on which you were the primary author, and your research proposal.

The research proposal should be one to two pages in length. It should outline a problem you think you can make progress on over the summer, and some approaches to tackling it that you consider promising. We recommend reading over CHCAI's annotated bibliography and the concrete problems agenda as good sources for open problems in AI safety, if you haven't previously done so. You should target your proposal at a specific research agenda or a specific adviser’s interests. Advisers' interests include:

• Andrew Critch (CHCAI, MIRI): anything listed in CHCAI's open technical problems; negotiable reinforcement learning; game theory for agents with transparent source code (e.g., "Program Equilibrium" and "Parametric Bounded Löb's Theorem and Robust Cooperation of Bounded Agents").

• Daniel Filan (CHCAI): the contents of "Foundational Problems," "Corrigibility," "Preference Inference," and "Reward Engineering" in CHCAI's open technical problems list.

• Dylan Hadfield-Menell (CHCAI): application of game-theoretic analysis to models of AI safety problems (specifically by people who come from a theoretical economics background); formulating and analyzing AI safety problems as CIRL games; the relationships between AI safety and principal-agent models / theories of incomplete contracting; reliability engineering in machine learning; questions about fairness.

• Jessica Taylor, Scott Garrabrant, and Patrick LaVictoire (MIRI): open problems described in MIRI's agent foundations and alignment for advanced ML systems research agendas.

This application does not bind you to work on your submitted proposal. Its purpose is to demonstrate your ability to make concrete suggestions for how to make progress on a given research problem.


Who we're looking for:

This is a new and somewhat experimental program. You’ll need to be self-directed, and you'll need to have enough knowledge to get started tackling the problems. The supervisors can give you guidance on research, but they aren’t going to be teaching you the material. However, if you’re deeply motivated by research, this should be a fantastic experience. Successful applicants will demonstrate examples of technical writing, motivation and aptitude for research, and produce a concrete research proposal. We expect most successful applicants will either:

• have or be pursuing a PhD closely related to AI safety;

• have or be pursuing a PhD in an unrelated field, but currently pivoting to AI safety, with evidence of sufficient knowledge and motivation for AI safety research; or

• be an exceptional undergraduate or masters-level student with concrete evidence of research ability (e.g., publications or projects) in an area closely related to AI safety.



Program dates are flexible, and may vary from individual to individual. However, our assumption is that most people will come for twelve weeks, starting in early June. The program will take place in the San Francisco Bay Area. Basic living expenses will be covered. We can’t guarantee that housing will be all arranged for you, but we can provide assistance in finding housing if needed. Interns who are not US citizens will most likely need to apply for J-1 intern visas. Once you have been accepted to the program, we can help you with the required documentation.



The deadline for applications is March 1. Applicants should hear back about decisions by March 20.

[Link] The types of manipulation on vote-based forums

5 pepe_prime 11 February 2017 05:09PM

Open thread, Feb. 06 - Feb. 12, 2017

5 MrMind 06 February 2017 08:34AM

If it's worth saying, but not worth its own post, then it goes here.

Notes for future OT posters:

1. Please add the 'open_thread' tag.

2. Check if there is an active Open Thread before posting a new one. (Immediately before; refresh the list-of-threads page before posting.)

3. Open Threads should start on Monday, and end on Sunday.

4. Unflag the two options "Notify me of new top level comments on this article" and "

Error and Terror: Are We Worrying about the Wrong Risks?

5 philosophytorres 30 January 2017 01:07AM

I would be happy to get feedback on this article, originally posted by the IEET:


When people worry about the dark side of emerging technologies, most think of terrorists or lone psychopaths with a death wish for humanity. Some future Ted Kaczynski might acquire a masters degree in microbiology, purchase some laboratory equipment intended for biohackers, and synthesize a pathogen that spreads quickly, is incurable, and kills 90 percent of those it infects.

Alternatively, Benjamin Wittes and Gabriella Blum imagine a scenario in which a business competitor releases “a drone attack spider, purchased from a bankrupt military contractor, to take you out. … Upon spotting you with its sensors, before you have time to weigh your options, the spider—if it is, indeed, an attack spider—shoots an infinitesimally thin needle … containing a lethal dose of a synthetically produced poison.” Once this occurs, the spider exits the house and promptly self-destructs, leaving no trace behind it.

This is a rather terrifying picture of the future that, however fantastical it may sound, is not implausible given current techno-developmental trends. The fact is that emerging technologies like synthetic biology and nanotechnology are becoming exponentially more powerful as well as more accessible to small groups and even single individuals. At the extreme, we could be headed toward a world in which a large portion of society, or perhaps everyone, has access to a “doomsday button” that could annihilate our species if pressed.

This is an unsettling thought given that there are hundreds of thousands of terrorists—according to one estimate—and roughly 4 percent of the population are sociopaths—meaning that there are approximately 296 million sociopaths in our midst today. The danger posed by such agents could become existential in the foreseeable future.

But what if deranged nutcases with nefarious intentions aren’t the most significant threat to humanity? An issue that rarely comes up in such conversations is the potentially greater danger posed by well-intentioned people with access to advanced technologies. In his erudite and alarming book Our Final Hour, Sir Martin Rees distinguishes between two types of agent-related risks: terror and error. The difference between these has nothing to do with the consequences—a catastrophe caused by error could be no less devastating than one caused by terror. Rather, what matters are the intentions behind the finger that pushes a doomsday button, causing spaceship Earth to explode.

There are reasons for thinking that error could actually constitute a greater threat than terror. First, let’s assume that science and technology become democratized such that most people on the planet have access to a doomsday button of some sort. Let’s say that the global population at this time is 10 billion people.

Second, note that the number of individuals who could pose an error threat will vastly exceed the number of individuals who would pose a terror threat. (In other words, the former is a superset of the latter.) On the one hand, every terrorist hell-bent on destroying the world could end up pushing the doomsday button by accident. Perhaps while attempting to create a designer pathogen that kills everyone not vaccinated against it, a terrorist inadvertently creates a virus that escapes the laboratory and is 100 percent lethal. The result is a global pandemic that snuffs out the human species.

On the other hand, any good-intentioned hobbyist with a biohacking laboratory could also accidentally create a new kind of lethal germ. History reveals numerous leaks from highly regulated laboratories—the 2009 swine flu epidemic that killed 12,000 between 2009 and 2010 was likely caused by a laboratory mistake in the late 1970s—so it’s not implausible to imagine someone in a largely unregulated environment mistakenly releasing a pathogenic bug.

In a world where nearly everyone has access to a doomsday button, exactly how long could it last? We can, in fact, quantify the danger here. Let’s begin by imagining a world in which all 10 billion people have (for the sake of argument) a doomsday button on their smartphone. This button could be pushed at any moment if one opens up the Doomsday App. Further imagine that of the 10 billion people who live in this world, not a single one has any desire to destroy it. Everyone wants the world to continue and humanity to flourish.

Now, how likely is this world to survive the century if each individual has a tiny chance of pressing the button? Crunching a few numbers, it turns out that doom would be all-but-guaranteed if each person had a negligible 0.000001 percent chance of error. The reason is that even though the likelihood of any one person causing total annihilation by accident is incredibly small, these small chances compound across the population. With 10 billion people, one should expect an existential catastrophe even if everyone is very, very, very careful not to press the button.
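The arithmetic behind this claim can be checked with a short Python sketch. It assumes (the article does not say so explicitly) that each person’s chance of error is independent of everyone else’s and that the 0.000001 percent figure covers the whole period in question. The function name `prob_at_least_one` is illustrative only.

```python
import math


def prob_at_least_one(n_people: int, p_each: float) -> float:
    """P(at least one of n independent people errs) = 1 - (1 - p)^n.

    log1p/expm1 are used so that precision is kept when p_each is tiny.
    """
    return -math.expm1(n_people * math.log1p(-p_each))


population = 10_000_000_000   # 10 billion people, as in the article
p_error = 0.000001 / 100      # 0.000001 percent expressed as a probability (1e-8)

# Expected number of accidental presses: about 100, even with very careful people.
expected_presses = population * p_error

# Probability that at least one press happens: 1 - e^(-100), indistinguishable from 1.
p_doom = prob_at_least_one(population, p_error)
```

Under these assumptions, the expected number of accidental presses is about 100, which is why the chance that nobody at all slips up is vanishingly small.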

Consider an alternative scenario: imagine a world of 10 billion morally good people in which only 500 have the Doomsday App on their smartphone. This constitutes a mere 0.000005 percent of the total population. Imagine further that each of these individuals has an incredibly small 1 percent chance of pushing the button each decade. How long should civilization as a whole, with its 10 billion denizens, expect to survive? Crunching a few numbers again reveals that the probability of annihilation in the next 10 years would be a whopping 99 percent—that is, more or less certain.
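The numbers in this second scenario can be reproduced the same way. This is only a sketch under the assumption that the 500 holders act independently and that the 1 percent chance per decade stays constant; the variable names are illustrative.

```python
population = 10_000_000_000
holders = 500
p_press_per_decade = 0.01

# Share of the population holding the app, in percent.
share_percent = holders / population * 100

# Civilization survives a decade only if none of the 500 holders presses the button.
p_survive_decade = (1.0 - p_press_per_decade) ** holders
p_doom_decade = 1.0 - p_survive_decade

# With a constant per-decade risk, the waiting time is geometric: the mean number
# of decades up to and including the one in which the button is pressed is 1 / p.
expected_decades = 1.0 / p_doom_decade
```

With these inputs the per-decade risk comes out a little above 99 percent, so the expected wait for a catastrophe is barely more than a single decade.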

The staggering danger of this situation stems from the two trends mentioned above: the growing power and accessibility of technology. A world in which fanatics want to blow everything up would be extremely dangerous if “weapons of total destruction” were to become widespread. But even if future people are perfectly compassionate—perhaps because of moral bioenhancements or what Steven Pinker calls the “moral Flynn effect”—the fact of human fallibility will make survival for centuries or decades highly uncertain. As Rees puts this point:

If there were millions of independent fingers on the button of a Doomsday machine, then one person’s act of irrationality, or even one person’s error, could do us all in. … Disastrous accidents (for instance, the unintended creation or release of a noxious fast-spreading pathogen, or a devastating software error) are possible even in well-regulated institutions. As the threats become graver, and the possible perpetrators more numerous, disruption may become so pervasive that society corrodes and regresses. There is a longer-term risk even to humanity itself.

As scholars have noted, “an elementary consequence of probability theory [is] that even very improbable outcomes are very likely to happen, if we wait long enough.” The exact same goes for improbable events that could be caused by a sufficiently large number of individuals—not across time, but across space.

Could this situation be avoided? Maybe. For example, perhaps engineers could design future technologies with safety mechanisms that prevent accidents from causing widespread harm—although this may turn out to be more difficult than it seems. Or, as Ray Kurzweil suggests, we could build a high-tech nano-immune system to detect and destroy self-replicating nanobots released into the biosphere (a doomsday scenario known as “grey goo”).

Another possibility advocated by Ingmar Persson and Julian Savulescu entails making society just a little less “liberal” by trading personal privacy for global security. While many people may, at first glance, be resistant to this proposal (after all, privacy seems like a moral right of all humans), if the alternative is annihilation then this trade-off might be worth the sacrifice. Or perhaps we could adopt the notion of sousveillance, whereby citizens themselves monitor society through the use of wearable cameras and other apparatuses. In other words, the surveillees (those being watched) could use advanced technologies to surveil the surveillers (those doing the watching), a kind of “inverse panopticon” to protect people from the misuse and abuse of state power.

While terror gets the majority of attention from scholars and the media, we should all be thinking more about the existential dangers inherent in the society-wide distribution of offensive capabilities involving advanced technologies. There’s a frighteningly good chance that future civilization will be more susceptible to error than terror.

(Parts of this are excerpted from my forthcoming book Morality, Foresight, and Human Flourishing: An Introduction to Existential Risks.)

Ontologies are Operating Systems

4 lifelonglearner 18 February 2017 05:00AM

Ontologies are Operating Systems: Post-CFAR 1

[I recently came back from volunteering at a CFAR workshop. I found the whole experience to be 100% enjoyable, and I’ll be doing an actual workshop review soon. I also learned some new things and updated my mind. This is the first in a four-part series on new thoughts that I’ve gotten as a result of the workshop. If LW seems to like this one, I'll post the rest too.]

I’ve been thinking more about the idea of how we even reason about our own thinking, our “ontology of mind”, and our internal mental model of how our brain works.


(Roughly speaking, “ontology” means the framework you view reality through, and I’ll be using it here to refer specifically to how we view our minds.)

Before I continue, it might be helpful to ask yourself some of the below questions:

  • What is my brain like, perhaps in the form of a metaphor?

  • How do I model my thoughts?

  • What things can and can’t my brain do?

  • What does it feel like when I am thinking?

  • Do my thoughts often influence my actions?

<reminder to actually think a little before continuing>

I don’t know about you, but for me, my thoughts often feel like they float into my head. There’s a general sense of effortlessly having things stream in. If I’m especially aware (i.e. metacognitive), I can then reflect on my thoughts. But for the most part, I’m filled with thoughts about the task I’m doing.

Though I don’t often go meta, I’m aware of the fact that I’m able to. In specific situations, knowing this helps me debug my thinking processes. For example, say my internal dialogue looks like this:

“Okay, so I’ve sent the forms to Steve, and now I’ve just got to do… oh wait, what about my physics test… ARGH PAIN NO… now I’ve just got to do the write-up for… wait, I just thought about physics and felt some pain. Huh… I wonder why… Move past the pain, what’s bugging me about physics? It looks like I don’t want to do it because… because I don’t think it’ll be useful?”

Because my ontology of how my thoughts operate includes the understanding that metacognition is possible, this is a “lever” I can pull on in my own mind.

I suspect that people who don’t engage in thinking about their thinking (via recursion, talking to themselves, or other things to this effect) may have a less developed internal picture of how their minds work. Things inside their head might seem to just pop in, with less explanation.

I posit that having a less fleshed-out model of our brains affects our perception of what our brains can and can’t do.

We can imagine a hypothetical person who is self-aware and generally a fine human, except that their internal picture of their mind feels very much like a black box. They might have a sense of fatalism about some things in their mind or just feel a little confused about how their thoughts originate.

Then they come to a CFAR workshop.

What I think a lot of the CFAR rationality techniques give these people is an upgraded internal picture of their mind with many additional levers. By “lever”, I mean a thing we can do in our brain, like metacognition or focusing (I’ll write more about levers next post). The upgraded internal picture of their mind draws attention to these levers and empowers people to have greater awareness and control in their heads by “pulling” on them.

But it’s not exactly these new levers that are the point. CFAR has mentioned that the point of teaching rationality techniques is to not only give people shiny new tools, but also improve their mindset. I agree with this view—there does seem to be something like an “optimizing mindset” that embodies rationality.

I posit that CFAR’s rationality techniques upgrade people’s ontologies of mind by changing their sense of what is possible. This, I think, is the core of an improved mindset—an increased corrigibility of mind.


Consider: Our hypothetical human goes to a rationality workshop and leaves with a lot of skills, but the general lesson is bigger than that. They’ve just seen that their thoughts can be accessed and even changed! It’s as if a huge blind spot in their thinking has been removed, and they’re now looking at entirely new classes of actions they can take!

When we talk about levers and internal models of our thinking, it’s important to remember that we’re really just talking about analogies or metaphors that exist in the mind. We don’t actually have direct access to our brain activity, so we need to make do with intermediaries that exist as concepts, which are made up of concepts, which are made up of concepts, and so on.

Your ontology, the way that you think about how your thoughts work, is really just an abstract framework that makes it easier for “meta-you” (the part of your brain that seems like “you”) to interface with your real brain.


Kind of like an operating system.

In other words, we can’t directly deal with all those neurons; our ontology, which contains thoughts, memories, internal advisors, and everything else is a conceptual interface that allows us to better manipulate information stored in our brain.

However, the operating system you acquire by interacting with CFAR-esque rationality techniques isn’t the only type of upgraded ontology you can acquire. There exist other models which may be just as valid. Different ontologies may draw boundaries around other mental things and empower your mind in different ways.

Leverage Research, for example, seems to be building its view of rationality from a perspective deeply grounded in introspection. I don’t know too much about them, but in a few conversations, they’ve acknowledged that their view of the mind is based much more on beliefs and internal views of things. It seems like they’d have a different sense of what is and isn’t possible.

My own view of rationality often treats humans as, for the most part, merely a collection of TAPs (basically glorified if-then loops). This ontology leads me to often think about shaping the environment, precommitment, priming/conditioning, and other ways to modify my habit structure. Within this framework of “humans as TAPs”, I search for ways to improve.

This is in contrast with another view I hold of myself as an “agenty” human that has free will in a meaningful sense. Under this ontology, I’m focusing on metacognition and executive function. Of course, this assertion of my ability to choose and pick my actions seems to be at odds with my first view of myself as a habit-stuffed zombie.

It seems plausible then, that rationality techniques which often seem at odds with one another, like the above examples, occur because they’re operating on fundamentally different assumptions of how to interface with the human mind.

In some way, it seems like I’m stating that every ontology of mind is correct. But what about mindsets that model the brain as a giant hamburger? That seems obviously wrong. My response here is to appeal to practicality. In reality, all these mental models are wrong, but some of them can be useful. No ontology accurately depicts what’s happening in our brains, but the helpful ones can allow us to think better and make better choices.


The biggest takeaway for me after realizing all this was that even my mental framework, the foundation from which I built up my understanding of instrumental rationality, is itself based on certain assumptions of my ontology. And these assumptions, though perhaps reasonable, are still just a helpful abstraction that makes it easier for me to deal with my brain.


[Link] Gates 2017 Annual letter

4 ike 15 February 2017 02:39AM

What are you surprised people don't just buy?

4 AspiringRationalist 13 February 2017 01:07AM

Two of the main resources people have are time and money.  The world offers many opportunities to trade one for the other, at widely varying rates.

I've often heard people recommend trading money for time in the abstract, but this advice is rarely accompanied by specific recommendations on how to do so.

How do you use money to buy time or otherwise make your life better/easier?

See also the flip-side of this post, "what are you surprised people pay for instead of doing themselves?"

Satisfaction Levers

4 ig0r 11 February 2017 10:54PM

(Cross-posted on my blog:

I believe gnawing and uncomfortable sensations (nihilism, restlessness, etc) that one may not quite understand how to resolve are a manifestation of poorly understood desires, and there are concrete practices one can develop to help understand and resolve these sensations. We’ve come to associate certain sensations in our stomach with the idea of hunger because they are resolved by putting certain types of objects into our mouth and chewing. What if we didn’t know about food — how would we understand “hunger”? What does this say about a complex sensation like “anxiety”?

The human mind can be thought of as a machine that produces and satisfies desires. We become familiar with these desires from birth. When we exit the womb we don’t yet know how to breathe, but it is likely that we already desire to. It appears as though the mere exposure to air is sufficient to make the newborn aware that “breathing in” is an option available to it, and that upon doing so it comes to realize that this breathing thing satisfies some gnawing feeling (a desire for air). This is the mind’s first exposure to a “satisfaction lever” — an affordance for desire-satisfaction. As the mind matures it becomes aware of (produces!) new desires for itself: mother, food, stimulus, friends, approval, status, money, expression, meaning, etc. We create habits, both “good” and “bad”, that create their own desires. Pulling satisfaction levers gives us access to objects of desire — the things that can be taken from outside the organism and brought in — which temporarily satisfy some desire.

This may feel strange, but it seems that there is no a priori relationship between the sensations of desire and the corresponding objects that satisfy them. From our point of view, it feels intuitive that the hunger sensation in the stomach would logically be related to a desire for food. But as we can see with children, they often have little sense of when they are hungry or thirsty or sleepy and often adults must force some levers upon them — often in response to crankiness or general antisocial behavior on the part of the child. Over many repetitions, as the sensations of desire present themselves and are then followed by their satisfaction with a familiar pattern of objects — available through the pulling of satisfaction levers — the mind makes the association stronger and stronger until it just “is”. It is hard to imagine alternative manifestations of the feeling of hunger.

As the mind matures and continues to manufacture new desires, we must continue to seek the satisfaction levers that satiate them. Without a parent paying attention to our whining and offering us potential levers, we must seek them out on our own. This becomes especially tricky with desires that only rear their heads every once in a while rather than on a daily basis. The ability to satisfy feelings of having low energy with exercise is a non-intuitive one, but once a habit is established the lever becomes one we can easily reach for because we know it’s there. However, minds often find themselves experiencing frustrating sensations that they don’t associate with obvious levers. Feelings described with words such as anxiety, restlessness, ennui, or nihilism may fall into this category. Reasoning from the raw sensations to the corresponding action which would satisfy them seems exceptionally difficult. A more fruitful approach is to find some potential satisfaction levers to pull and pay attention to what happens to these ill-defined sensations.

Furthermore, there seems to be a capacity where we can seek out new levers, even if it is not clear what they may be for. Sometimes we accidentally pull a lever that gives us some unexpected feeling of relief or pleasure. This seems to be the satisfaction of a desire that one was not aware of or could not previously articulate. This is an important feeling. When this happens, one can take note of the relationship and begin building a list of “non-obvious satisfaction levers”. Then, periodically, one can scan this list. By allowing the mind to imagine pulling on one of these levers, it can feel out whether at that time it would satisfy some hidden, poorly understood desire. At the same time, by starting to map which levers satisfy which kinds of feelings, we are able to better understand and describe these amorphous feelings of desire.

Some ideas for satisfaction levers that may relate to complex, hard to describe desires:

  1. Cultivating presence and mindfulness: paying attention to the moment on a purely physical level rather than to thoughts and ideas generated
  2. Creating objects: anything from abstract art to software to social experiences
  3. Destroying objects: getting rid of stuff, tearing something down into its parts for potential reuse, clearing away or reorganizing space
  4. Taking physical or social risks: seeking out unfamiliar manifestations of fear

[Link] Introduction to Local Interpretable Model-Agnostic Explanations (LIME)

4 Gunnar_Zarncke 09 February 2017 08:29AM

Stupid Questions February 2017

4 Erfeyah 08 February 2017 07:51PM


This thread is for asking any questions that might seem obvious, tangential, silly or what-have-you. Don't be shy, everyone has holes in their knowledge, though the fewer and the smaller we can make them, the better.

Please be respectful of other people's admitting ignorance and don't mock them for it, as they're doing a noble thing.

To any future monthly posters of SQ threads, please remember to add the "stupid_questions" tag.

[Link] The humility argument for honesty

4 Benquo 05 February 2017 05:26PM

A question about the rules

4 phl43 01 February 2017 10:55PM

I'm new to Less Wrong and I have a question about the rules. I posted a link to the latest post on my blog, in which I argue in a polemical way against the claim that Trump's election caused a wave of hate crimes in the US. Someone complained about the tone of my post, which is fair enough (although I tend not to take very seriously criticisms about tone that aren't accompanied by any substantive criticism), but I noticed that my link was taken down.

The same person also said that he or she thought LW tried to avoid politics, so I'm wondering if that's why the link was taken down. I don't really mind that my link was taken down, although I think part of the criticism was unfair (the person in question complained that I hadn't provided any evidence that people had made the claim I was attacking, which is true although it's only because I don't see how anyone could seriously deny it unless they have been living on another planet these past few months, but in any case I edited the post to address the criticism), but I would like to know what I'm permitted to post for future reference.

Like I said, I'm new here, so I apologize if I violated the rules and I'm not asking you to change them for me (obviously), but I would like to know what they are. (I didn't find anything that says we can't share links about politics, though it's true that when I browse past discussions, which I should probably have done in the first place, there doesn't seem to be any.) Is it forbidden to post anything that is related to politics, even if it makes a serious effort at evidence-based analysis, as I think it's fair to say my post does? I plan to post plenty of things on my blog that have nothing to do with politics, such as the post I just shared about moral relativism, but I just want to make sure I don't run afoul of the rules again.

[Link] "What Happens When Doctors Only Take Cash"? Everybody, Especially Patients, Wins

4 morganism 30 January 2017 11:57PM

[Link] Did slavery make the US an economic superpower and would the industrial revolution have happened without it?

4 phl43 27 January 2017 09:01PM
