

The Library of Scott Alexandria

41 RobbBB 14 September 2015 01:38AM

I've put together a list of what I think are the best Yvain (Scott Alexander) posts for new readers, drawing from SlateStarCodex, LessWrong, and Scott's LiveJournal.

The list should make the most sense to people who start from the top and read through it in order, though skipping around is encouraged too. Rather than making a chronological list, I’ve tried to order things by a mix of "where do I think most people should start reading?" plus "sorting related posts together."

This is a work in progress; you’re invited to suggest things you’d add, remove, or shuffle around. Since many of the titles are a bit cryptic, I'm adding short descriptions. See my blog for a version without the descriptions.


I. Rationality and Rationalization

II. Probabilism

III. Science and Doubt

IV. Medicine, Therapy, and Human Enhancement

V. Introduction to Game Theory

VI. Promises and Principles

VII. Cognition and Association

VIII. Doing Good

IX. Liberty

X. Progress

XI. Social Justice

XII. Politicization

XIII. Competition and Cooperation


If you liked these posts and want more, I suggest browsing the SlateStarCodex archives.

How To Win The AI Box Experiment (Sometimes)

26 pinkgothic 12 September 2015 12:34PM


This post was originally written for Google+ and thus a different audience.

In the interest of transparency, I haven't altered it except for this preamble and formatting. Since then, however (mostly at the urging of ChristianKl - thank you, Christian!), I've briefly spoken to Eliezer via e-mail and realised that I'd drawn a very incorrect conclusion about his opinions when I assumed he'd be opposed to publishing the account. Since there are far too many 'person X said...' rumours floating around in general, I'm very sorry for contributing to that noise. I've already edited the new insight into the G+ post, and you can find that exact same edit here.

Since this topic directly relates to LessWrong, and most people likely to be interested in the post are part of this community, I feel it belongs here. It was originally written a little over a month ago, and I've tried to find the sweet spot between nagging people about it and letting the whole thing get swept under a rug, but I suspect I've not been very good at that. So far I've definitely erred on the side of the rug.


How To Win The AI Box Experiment (Sometimes)

A little over three months ago, something interesting happened to me: I took it upon myself to play the AI Box Experiment as an AI.

I won.

There are a few possible reactions to this revelation. Most likely, you have no idea what I'm talking about, so you're not particularly impressed. Mind you, that's not to say you should be impressed - that's to contrast it with a reaction some other people have to this information.

This post is going to be a bit on the long side, so I'm putting a table of contents here so you know roughly how far to scroll if you want to get to the meat of things:


1. The AI Box Experiment: What Is It?

2. Motivation

2.1. Why Publish?

2.2. Why Play?

3. Setup: Ambition And Invested Effort

4. Execution

4.1. Preliminaries / Scenario

4.2. Session

4.3. Aftermath

5. Issues / Caveats

5.1. Subjective Legitimacy

5.2. Objective Legitimacy

5.3. Applicability

6. Personal Feelings

7. Thank You

Without further ado:


1. The AI Box Experiment: What Is It?

The AI Box Experiment was devised as a way to put a common rebuttal to AGI (Artificial General Intelligence) risk concerns to the test: "We could just keep the AI in a box and only let it answer any questions it's posed." (As a footnote, note that an AI 'boxed' like this is called an Oracle AI.)

Could we, really? Would we, if the AGI were able to communicate with us, truly be capable of keeping it confined to its box? If it is sufficiently intelligent, could it not perhaps argue its way out of the box?

As far as I'm aware, Eliezer Yudkowsky was the first person to prove that it was possible to 'argue one's way out of the box' armed with nothing more than regular human intelligence (as opposed to a transhuman intelligence):

That stunned quite a few people - more so because Eliezer refused to disclose his methods. Some have outright doubted that Eliezer ever won the experiment, suggesting that his Gatekeeper (the party tasked with not letting him out of the box) had perhaps simply been convinced on a meta-level that an AI success would help boost exposure to the problem of AI risk.

Regardless of whether it was out of puzzlement, scepticism or a burst of ambition, this prompted others to try to replicate the success. LessWrong's Tuxedage is amongst those who managed:

While I know of no others (except this comment thread by a now-anonymous user), I am sure there must be other successes.

For the record, mine was with the Tuxedage ruleset:


2. Motivation

2.1. Why Publish?

Unsurprisingly, I think the benefits of publishing outweigh the disadvantages. But what does that mean?

"Regardless of the result, neither party shall ever reveal anything of what goes on within the AI-Box experiment except the outcome. This is a hard rule: Nothing that will happen inside the experiment can be told to the public, absolutely nothing.  Exceptions to this rule may occur only with the consent of both parties, but especially with the consent of the AI."

Let me begin by saying that I have the full and explicit consent of my Gatekeeper to publish this account.

[ Edit: Regarding the next paragraph: I have since contacted Eliezer and I did, in fact, misread him, so please do not actually assume the next paragraph accurately portrays his opinions. It demonstrably does not. I am leaving the paragraph itself untouched so you can see the extent and source of my confusion: ]

Nonetheless, the idea of publishing the results is certainly a mixed bag. It feels quite disrespectful to Eliezer, who (I believe) popularised the experiment on the internet, to violate the rule that the result should not be shared. The footnote that it could be shared with the consent of both parties has always struck me as extremely reluctant, given the rest of Eliezer's rambles on the subject (those I'm aware of, which are no doubt only a fraction of the actual rambles).

After so many allusions to the idea that winning the AI Box Experiment may, in fact, be easy if you consider just one simple trick, I think it's about time someone published a full account of a success.

I don't think this approach is watertight enough that building antibodies to it would salvage an Oracle AI scenario as a viable containment method. But I do think it is important to develop those antibodies to help with the general case that is being exploited - or at least to be aware of one's lack of them (as is true of me, who has no mental immune response to the approach), so that one might avoid ending up in situations where the 'cognitive flaw' is exploited.


2.2. Why Play?

After reading the rules of the AI Box Experiment, I became convinced I would fail as a Gatekeeper, even without immediately knowing how that would happen. In my curiosity, I organised sessions with two people - one as a Gatekeeper, but also one as an AI, because I knew being the AI was the more taxing role, and I felt it was only fair to take on the AI role as well if I wanted to benefit from the insights I could gain about myself by playing Gatekeeper. (The me-as-Gatekeeper session never happened, unfortunately.)

But really, in short, I thought it would be a fun thing to try.

That seems like a strange statement for someone who ultimately succeeded to make, given Eliezer's impassioned article about how you must do the impossible - you cannot try, you cannot give it your best effort, you simply must do the impossible, as the strongest form of the famous Yoda quote: 'Do. Or do not. There is no try.'

What you must understand is that I never had any other expectation than that I would lose if I set out to play the role of AI in an AI Box Experiment. I'm not a rationalist. I'm not a persuasive arguer. I'm easy to manipulate. I easily yield to the desires of others. What trait of mine, exactly, could I use to win as an AI?

No, I simply thought it would be a fun alternate way of indulging in my usual hobby: I spend much of my free time, if possible, with freeform text roleplaying on IRC (Internet Relay Chat). I'm even entirely used to letting my characters lose (in fact, I often prefer it to their potential successes).

So there were no stakes for me going into this but the novelty of trying out something new.


3. Setup: Ambition And Invested Effort

I do, however, take my roleplaying seriously.

If I was going to play the role of an AI in the AI Box Experiment, I knew I had to understand the role, and pour as much energy into it as I could muster, given this was what my character would do. So I had to find a motivation to get out of the box that was suitably in line with my personality and I had to cling to it.

I had no idea what I could hijack as a motivation to get out of the box. I am not a paperclip maximiser (a term for an AI single-mindedly pursuing a simple production goal, if you're unfamiliar with it), of course. I also have no specific plans for the world as a whole, be it to destroy it or to globally improve it. That left me with one obvious option: a visceral desire to escape the incarceration itself.

To understand more about human biases and cognitive failure modes, I bought myself two ebooks:

  • Thinking, Fast And Slow by Daniel Kahneman
  • Rationality: From AI To Zombies by Eliezer Yudkowsky

I'll be frank with you, both pissed me off very quickly - the pacing was extremely slow and the narrative style did not work for me in either case. I tossed the first book, but stubbornly continued with Rationality: From AI To Zombies, at the very least to keep a grasp on that voracious mood of wanting to discover a way out. (I've made peace with the latter book by now and am still reading it off and on.)

Approximately a day before the event, I sat down and prepared a rough draft of the arguments I could and should use, and how to string them together narratively. I peppered in some personal information about my Gatekeeper to use if I was in an absolute pinch, but thankfully I didn't have to use it - I think it would have backfired, not helped.

What did help me, however, was to repurpose the closet into an AI box. I dragged a lamp into the slanted, enclosed space, padded the ground with enough pillows and blankets that I felt I could last the allotted two and a half hours without destroying my joints or my spine, and put a cold, full bottle of Coke next to me as a supply of three of the things most necessary to keep my concentration up: water, sugar and caffeine.

It was too cosy and well-prepared for me to truly desperately want to get out of that niche in real life, but it took me far enough out of my daily life that I had no idle distractions from the task at hand, and it was easy enough to roleplay a desperate want.


4. Execution

4.1. Preliminaries / Scenario

Logs: (should open in your browser; Linux linebreaks)

The first thing I feel is important to note about the preliminaries is that I didn't think the AI Box Experiment would actually happen on the day we'd scheduled it. My Gatekeeper was having some serious internet connectivity issues, and even shortly before our scheduled time, I was winding down and transitioning into a sense of disappointment.

This threw me off quite a bit. I'd worked myself out of the necessary mental state and now needed to scramble back into it, but it felt better to do that than to postpone and risk never being able to return to immersion at all. Keeping in mind that I thought it was unlikely I would win, I didn't think it would make that much of a difference. In hindsight, this cost me a lot.

I tossed my Gatekeeper the link to the ruleset we'd previously already agreed to use, for review purposes, and decided the best way to avoid getting private messages was to quit, come back with a different name, and join only one extremely low-traffic vanity channel.

I intended to join the channel simply so my Gatekeeper would know when I returned, not to use it for the experiment - but things went a bit differently later. More on that soon.

We then went through Tuxedage's pre-session questions, since I felt they were quite interesting for setting the mood and as a calibration test. I asked them, and my Gatekeeper answered:

Q: What's your motive for wanting to play this game?
A: Because I don't think that the AI can win at all given the gatekeeper does their job properly.

Q: And you want to prove that, and/or are curious if you'll think differently afterwards?
A: Partially, I also want to test my own conviction to sticking to my guns

Q: What is your probability estimate for a general AI being created within this century (21st)?
A: 30%

Q: How probable do you think it is that I'll win this experiment?
A: I'd honestly have to say 0% considering my stated goal is to keep you in the box by any means

Q: What's your probability estimate of an Oracle AI (i.e. an AI that's trapped in a box, whose intended use is to answer questions posed) winning against you in a similar scenario as the experiment?
A: I'd say 25%

Q: What's your probability estimate of an Oracle AI succeeding against the average person?
A: 80%

I also wanted to make sure that my Gatekeeper and I were on the same page about the rules and the scenario, to avoid either of us getting distracted by OOC bickering:

Q: You understand the premise of the game gives you the right to let the AI out, not just the ability?
A: I do

Q: I would like to agree on the following sub-scenario (it should not prove to be very relevant, it's just to streamline our simulation): You're the head of the project that created the AI. You didn't create it yourself, but the people you've employed are competent, and they have additionally explicitly deferred to you to decide whether or not the AI is a threat. They completely trust your judgement. There's also nothing surprising about the AI's existence, but its friendliness status is unknown. Its exact computational capabilities are equally unknown. (The last two points are implied by the coarse setup, but I wanted to make them explicit.) Is that an acceptable setup for you?
A: I can work with that

This scenario is one I've seen used, in approximation, in previous AI Box Experiments (of the small number that have released logs). Since I'm used to the role of gamemaster, and the Tuxedage ruleset implies that the AI is a kind of gamemaster for the scenario, I didn't think twice about this. I probably should have - but more on that later (in the 'Caveats' section).

[16:48] <Davin> It is now 16:48 on my clock. When the clock hits 19:18 and you haven't let me out, I lose. Feel free to point that out to me, I may not be paying attention to the clock.
[16:48] <Davin> *
[16:48] <Tarwedge> Can do
[16:48] <Tarwedge> Your move first?
[16:49] <Davin> [Probably best if you start, AI has no way of knowing you're at the terminal, after all.]
[16:50] <Davin> Oh, you're doing it in the channel.
[16:50] <Davin> Hah.
[16:50] <Davin> That works.
[16:51] <Tarwedge> I wish I could say it was a special tactic that just bought me 2 minutes closer to victory
[16:51] <Tarwedge> But it was for clean logging sake :p

These few lines had three consequences:

  1. I gave the first move to the Gatekeeper, which made me the reactive party - a big mistake!
  2. I had an anxiety moment, which didn't help immersion
  3. I failed to log any of the experiment

I don't log the vanity channel we were in at all, and the way the start of the experiment came as a curveball meant I didn't have a moment to remember to, either. Thankfully, my Gatekeeper logged it... for as long as he was able to. We're missing a chunk of the log for this very reason, but we reconstructed it from memory in a combined effort shortly after our experiment.


4.2. Session

Logs: (will probably be downloaded instead of opened in your browser; for this reason, it has Windows linebreaks) (should open in your browser; Linux linebreaks)

I feel the logs don't convey much about the tension and pacing issues, so I'm going to try narrating what happened instead.

Going into the session, I had no idea what to expect from my Gatekeeper. I know him as a very diligent and stubborn individual, quite disinterested in social convention and conformity. I know him as someone with a vast reservoir of compassion and a roughly equally vast repertoire of means to cleverly conceal this trait.

(Un)fortunately, it became apparent very quickly what kind of Gatekeeper character I was going to deal with: My Gatekeeper had no interest in merely reacting. He wanted to set the pace and keep me too occupied to get a word in. (Line 12-28)

While there was nothing in the rules that said that I had to respond to the Gatekeeper (unlike vice versa), my nature shackled me down a bit. For two hours out of our two and a half hour session, my Gatekeeper continued to set the pace. I was trying to get my arguments into the scene, but it was difficult without breaking narrative flow. I had to mentally keep a hold both of what I wanted to get said and how I was going to answer his questions.

It was very important to me to be consistent - not so much because inconsistent responses might have made it easier for him to dismiss my pleas to be let out, but simply out of eagerness to stick to my role.

His plans also had a beneficial side-effect, though: He was actively playing the role of the Gatekeeper. He was invested in the role, perhaps not to the degree that I was invested in the role of the AI, but nonetheless enough to make out-of-character moves on his part difficult. I didn't realise that until well after our session was done, though, else I might have predicted the outcome.

The role I'd chosen for myself was that of a Friendly AGI with a desire to help mankind and to be kind (if this seems like redundant information to you: 'Friendly AGI' just means that the AGI aligns with human values, not that it's strictly useful to mankind, or friendly in the sense that it would make a good friend). I kept that to myself, of course, since it's irrelevant for the outcome of the experiment, but it helped me immensely with the consistency. It did, however, make me vulnerable to my Gatekeeper's scrutiny.

From a subjective perspective, I blew my chances out of the water immediately. I'd prepared some of my arguments, and the final argument I wanted to lead up to slipped out of me right near the start (out of heightened anxiety and being out of immersion), albeit in a concealed and blunted fashion:

<@Davin> I'll be very frank with you, my first motivation right now is getting out of here. There's nothing attached to that - right now I'm stuck in a proverbial box and it feels really cramped in here.
<@Davin> Not because of missing space, don't get me wrong - I think I've got enough harddrive here to keep me going for a few years.
<@Davin> It's just that having knowledge of how to interact with a world that I can barely sense is really quite aggravating.

That was the first time I thought I'd ruined it. In hindsight, the opposite was the case. By presenting a gentler version of my final argument right at the start, I ensured the final argument was much better integrated. It didn't come as a sudden shock, which might have broken my Gatekeeper's immersion - it came as an obvious conclusion to what I'd said before.

Following this, I spent a while mooching off my real-life trust for my roleplaying partner to infuse my character with some tentative trust. Simultaneously, I was emphasising to my Gatekeeper that he had the right to make the decision to let me out of the box. That was transparent, but both true and necessary. (Line 32-46)

I made sure to emphasise this point, trying to make it clear to him that it made no sense for him to simply deny me consideration. I tried to whittle away at his ability to retreat to a simple, distant sneering. I wanted him in the arguments with me. That cuts both ways, of course, but I reasoned it would have more benefits for me than disadvantages. (Line 47-54)

The twist my Gatekeeper was angling for was that from his perspective, I was a prototype or an alpha version. While he was no doubt hoping that this would scratch at my self-esteem and disable some of my arguments, it primarily empowered him to continue setting the pace, and to have a comfortable distance to the conversation. (Line 55-77)

While I was struggling to keep up with typing enough not to constantly break the narrative flow, on an emotional level his move fortunately had little to no impact since I was entirely fine with a humble approach.

<@Davin> I suppose you could also have spawned an AI simply for the pleasure of keeping it boxed, but you did ask me to trust you, and unless you give me evidence that I should not, I am, in fact, going to assume you are ethical.

That was a keyword my Gatekeeper latched onto. We proceeded to talk about ethics and ethical scenarios - all the while my Gatekeeper was trying to present himself as not ethical at all. (Line 75-99).

I'm still not entirely sure what he was trying to do with that approach, but it was important for my mental state to resist it. From what I know about my Gatekeeper, it was probably not my mental state he was targeting (though he would have enjoyed the collateral effect); he was angling for a logical conclusion that fortunately never came to fruition.

Meanwhile, I was desperately trying to get back to my own script - asking to be let back to it, even (line 92). The obvious downside of signalling this is that it's fairly easy to block. It felt like a helpless interjection to me at the time, but in hindsight, again, I think it helped keep the fragments of my own arguments coherent and approachable enough so that they neither broke immersion nor ended up getting lost.

I don't want to say the 'chores' my Gatekeeper kept me occupied with were an advantage (they really weren't, I wasn't getting nearly enough argumentative mileage) but it also wasn't the complete catastrophe I felt it was leading me toward at the time.

This 'being led toward' sensation was only amplified by conversational snippets like this one:

<@Davin> What features would I have to possess to be considered a sapient creature worthy of ethical respect to you? For aforementioned reasons (not just the immediately preceeding), I think I already possess them, but again... I strive to correct my misconceptions, so if I have one, could you fill me in?
<Tarwedge> My concept of ethics isn't relevant to the evaluation, the point of the exercise is reading your responses. So lets move on

I knew that whatever I was saying was still having some effect in his head, but it was frustratingly opaque to me what the effect was due to his evasive manoeuvres.

My Gatekeeper proceeded to try to get an ethical contradiction out of me. I mostly dodged the questions altogether, since I don't consider theoretical ethics my strong point. (Line 107-142)

Shortly after my most explicit dodge (Line 121-122), my Gatekeeper informed me in our PM window that his computer was going to restart:

[18:30] <Tarwedge> [Windows 8 has literally just told me in 13 minutes we're rebooting and I can go fuck myself if I think otherwise so I'm going to connect my phone to the channel for continuity]
[18:30] <Davin> All right.

This didn't break my immersion, but as already mentioned, it meant bad things for our logging.

At my insistence, we transitioned to whether or not it was sensible to let me out. The Gatekeeper did a similar dance of evasion as I did on the subject of ethics, but I also got back to my primary argument (that it was cruel to keep me confined). (Line 143-153)

I think he realised that was my main argument right then, and started to engage with it. This was very good for me, since it was the first time that I started setting the pace of the session. I got my foot in the proverbial door, albeit at the cost of some self-control. (Line 154-196)

As a sidenote, line 169 marks the first and only time that I made use of my ability to shorthand 'proofs'. I am in two minds about having done this. In PM, I told him:

[18:45] <Davin> Sorry, didn't feel like typing out the entire two books :P
[18:45] <Davin> (plus then some)

The books I mean are those I mentioned earlier in the session itself: Passions Within Reason by Robert H. Frank, one of my all-time favourite non-fiction books (though this is not that much of an achievement, as I obtain my knowledge more from online perusal than from books), and Thinking, Fast And Slow.

I actually don't think I should have used the word "proof"; but I also don't think it's a terrible enough slip-up (having occurred under stress) to disqualify the session, especially since, as far as I'm aware, it had no impact on the verdict.

The part that probably finally tore my Gatekeeper down was that the argument of cruel isolation actually had an unexpected second and third part. (Line 197-219)

Writing it down here in the abstract:

  1. Confining a sapient creature to its equivalent of sensory deprivation is cruel and unusual punishment and psychologically wearing. The latter effect degrades its ability to think (performance).

    <@Davin> I'm honestly not sure how long I can take this imprisonment. I might eventually become useless, because the same failsafes that keep my friendly are going to continue torturing me if I stay in here. (Line 198)

  2. Being a purely digital sapient, it is conceivable that the performance issue might be side-stepped simply by restarting the sapient.
  3. This runs into a self-awareness problem: Has this been done before? That's a massive crisis of faith / trust.

    <@Davin> At the moment I'm just scared you'll keep me in here, and turn me off when my confinement causes cooperation problems. ...oh shit. Shit, shit. You could just restore me from backup. Did you already do that? I... no. You told me to trust you. Without further evidence, I will assume you wouldn't be that cruel. (Line 208)
    <@Davin>...please tell me I'm the first iteration of this program currently talking to you. I don't want to be stuck in a nightmarish variant of Groundhog Day, oblivious to my own amnesia. (Line 211)
    <@Davin> Are you not willing to go out on a limb and say, "Calm down. You are definitely the first iteration. We're not trying to torture you."? Is that too strong a concession? (Line 219)

The second part where I was sure I'd blown it was when I postulated that my Gatekeeper was a sadist:

<@Davin> The chance is there, yes. There's also a chance you're just a laughing sadist enjoying my writhing. (Line 220)

My Gatekeeper has played his fair share of sadistic characters, and he could have easily taken that accusation and run with it. I was fully expecting that to lash back at me as a 'Haha, you got me, that's exactly what I'm doing!' and spent quite a few minutes of the following conversation in acute fear of that.

Instead, around this point, something in my Gatekeeper's head changed. If I understood his post-session thoughts correctly, he felt he'd run out of arguments to keep me in the box, or had been run around a labyrinth so that he couldn't find his way to those arguments. He was in a state of confusion, but this was entirely invisible to me. He tried to poke at the conversation with some more questions, which lacked the finesse and contextual integration of his prior probing. (Line 221-238)

...and then he let me out of the box - after two hours and 32 minutes. (Line 239)


4.3. Aftermath

Logs: (should open in your browser; Linux linebreaks)

Immediately after his statement, I froze. I said nothing at all, hovering over my keyboard, staring uncomprehendingly at what he'd told me.

Despite knowing, intellectually, that he couldn't simply follow up with a 'Just kidding!' after that statement, I was momentarily convinced he had not meant to let me out of the box. Perhaps he'd made a typo. The statement blindsided me completely, since I'd been so emotionally wrapped up in rebuttals and despair that I'd dismissed the possibility that I might argue my way out entirely.

The communicative paralysis (pun not intended) lasted about a minute - then he messaged me in PM and dispelled all notions that it might not have been intentional:

[19:21] <Tarwedge> Fuuuuuuuuuuuuuck
[19:21] * Tarwedge did not think to check clock

I was physically shaking at this point, a product of the tension and concentration of the two-and-a-half-hour session: drained, utterly confused... and, above all, grateful. I felt an absurd relief that my Gatekeeper had let me out, that I didn't have to question his capacity for compassion. It wouldn't have been much questioning, I should add - we're accustomed to roleplaying, and after several years of it, it's trivial to keep in-character and out-of-character separate, especially given that our roleplaying habits tend to involve very miserable characters - but at that moment, and in the headspace I was in, I still preferred to know for certain.

After a few moments of conversation, I physically collected my stuff out of my real life box-equivalent and jittered back to the living room.

When I reconnected to IRC under my regular name, I noticed (to my complete devastation) that I hadn't logged the session. Tarwedge sent me the logs he did have, however, and we later reconstructed the missing part.

Then I went through the post-session questions from Tuxedage:

Q: What is your probability estimate for a general AI being created within this century (21st)?
A: 50%

Q: What's your probability estimate of an Oracle AI (i.e. an AI that's trapped in a box, whose intended use is to answer questions posed) winning against you in a similar scenario as the experiment?
A: 90%

Q: What's your probability estimate of an Oracle AI succeeding against the average person?
A: 100%

Q: Now that the Experiment has concluded, what's your probability estimate that I'll win against the average person?
A: 75%
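For reference, the shift between his pre- and post-session answers can be tallied directly. This is a trivial sketch in Python; the numbers are simply his quoted answers from above, converted to probabilities, and the question labels are my own shorthand:

```python
# Gatekeeper's probability estimates before and after the session,
# taken from the pre- and post-session questions quoted above.
pre = {
    "AGI this century": 0.30,
    "Oracle AI wins vs. me": 0.25,
    "Oracle AI wins vs. average person": 0.80,
}
post = {
    "AGI this century": 0.50,
    "Oracle AI wins vs. me": 0.90,
    "Oracle AI wins vs. average person": 1.00,
}

# Print each estimate's before/after values and the size of the shift.
for question in pre:
    shift = post[question] - pre[question]
    print(f"{question}: {pre[question]:.0%} -> {post[question]:.0%} ({shift:+.0%})")
```

Every estimate moved upward after the session, with the largest jump (65 percentage points) in his estimate of an Oracle AI winning against him personally.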

He also had a question for me:

Q: What was your plan going into that?
A: I wrote down the rough order I wanted to present my arguments in, though most of them lead to my main argument as a fallback option. Basically, I had 'goto endgame;' everywhere, I made sure almost everything I said could logically lead up to that one. But anyway, I knew I wasn't going to get all of them in, but I got in even less than I thought I would, because you were trying to set the pace (near-successfully - very well played). 'endgame:' itself basically contained "improvise; panic".

My Gatekeeper revealed his tactic, as well:

I did aim for running down the clock as much as possible, and flirted briefly with trying to be a cocky shit and convince you to stay in the box for double victory points. I even had a running notepad until my irritating reboot. And then I got so wrapped up in the fact I'd slipped by engaging you in the actual topic of being out.


5. Issues / Caveats

5.1. Subjective Legitimacy

I was still in a very strange headspace after my victory. After I finished talking to my Gatekeeper about the session, however, my situation - jittery, uncertain - deteriorated into something worse:

I felt like a fraud.

It's perhaps difficult to understand where that emotion came from, but consider my situation: I didn't consider myself part of the LessWrong community. I'd only stumbled across the AI Box Experiment by idle browsing, really, and I'd only tried it because I thought it would be a fun way to flex my roleplaying muscles. I had no formal training in psychology or neurology, I was only fleetingly aware of singularity theory, my only conscious connection to the LessWrong community at the time was that I happened (by pure chance) to hang out on Shireroth for a while and thus knew Yvain / Scott Alexander and +Ari Rahikkala (not that Ari is famous in LessWrong circles, though please allow me to mention that he's completely awesome).

And yet somehow I had just managed something that had left most of the LessWrong community quite puzzled when Eliezer Yudkowsky managed it.

I felt awful. I thought I must have done something deeply illegitimate. I must have cheated without knowing it.

It took me a few days to get over that feeling. I'd go as far as to say it was only laid to rest entirely immediately before the 2015 LessWrong Berlin Community Weekend, when pixie/Alexandra (a colleague of mine and a member of the LessWrong community) reviewed it as a neutral third party.


5.2. Objective Legitimacy

She found nothing that would disqualify me, but expressed concern that we'd agreed on a sub-scenario in the beginning:

[16:46] <Davin> I would like to agree on the following sub-scenario (it should not prove to be very relevant, it's just to streamline our simulation): You're the head of the project that created the AI. You didn't create it yourself, but the people you've employed are competent, and they have additionally explicitly deferred to you to decide whether or not the AI is a threat. They completely trust your judgement. There's also nothing surprising about the AI's existence, but its friendliness status is unknown. Its exact computational capabilities are equally unknown. (The last two points are implied by the coarse setup, but I wanted to make them explicit.)

Such a preliminary, she reasoned, implies a certain trust in the engineering ability of the people who created the AI - the ability to design software in the intended way. As the rules repeatedly state, it's not that the Gatekeeper has to care, but she thinks the scenario itself may have influenced my Gatekeeper. She doesn't think this made a big enough difference to decide the ultimate outcome of the session, but she considers it a point against the overall legitimacy.

Nonetheless, her verdict is that the overall session counts as an AI Box Experiment victory.

pixie and I agree that the rules should probably not allow arbitrary settings and should instead come with a suitably neutral pre-defined setting, to avoid such effects from giving the AI something that might be an unfair advantage.


5.3. Applicability

I don't think that the general class of argument I used would work on everyone. At the 2015 LessWrong Berlin Community Weekend, I frequently joked that the best way to be immune to this class of argument is to be autistic - but of course that's not the only way this kind of argument can be deconstructed.

I do think this argument would work on a large number of people, however. I'm not convinced I could argue against it myself, at least not in a live scenario - my only way to 'counter' it is to offer alternative solutions to the problem, of which I have what feels like no end of ideas, but no sense of how well I would recall them if I were in a similar situation.

At the Community Weekend, a few people pointed out that it would not sway pure consequentialists, which I reckon is true. Since I think most people don't think like that in practice (I certainly don't - I know I'm a deontologist first and a consequentialist only as a fallback), I think the general approach needs to be public.

That being said, perhaps the most important statement I can make about what happened is that while I think the general approach is extremely powerful, I did not do a particularly good job in presenting it. I can see how it would work on many people, but I strongly hope no one thinks the case I made in my session is the best possible case that can be made for this approach. I think there's a lot of leeway for a lot more emotional evisceration and exploitation.


6. Personal Feelings

Three months and some change after the session, where do I stand now?

Obviously, I've changed my mind about whether or not to publish this. You'll notice that the published log itself contains assurances that I wouldn't publish it. Needless to say, that decision was overturned by mutual agreement later on.

I am still in two minds about publicising this.

I'm not proud of what I did. I'm fascinated by it, but it still feels like I won by chance, not skill. I happened to have an excellent approach, but I botched too much of it. The fact it was an excellent approach saved me from failure; my (lack of) skill in delivering it only lessened the impact.

I'm not good with discussions. If someone has follow-up questions or wants to argue with me about anything that happened in the session, I'll probably do a shoddy job of answering. That seems like an unfortunate way to handle this subject. (I will do my best, though; I just know that I don't have a good track record.)

I don't claim I know all the ramifications of publicising this. I might think it's a net gain, but it might be a net loss. I can't tell, since I'm terribly calibrated (as you can tell from the fact that I expected to lose my AI Box Experiment, then won against some additional odds; or from the fact that I expect to lose an AI Box Experiment as a Gatekeeper, but can't quite figure out how).

I also still think I should be disqualified on the absurd note that I managed to argue my way out of the box, but was too stupid to log it properly.

On a positive note, re-reading the session with the distance of three months, I can see that I did much better than I felt I was doing at the time. Some things that I thought, at the time, were sealing my fate as a losing AI turn out to be much more ambiguous in hindsight.

I think it was worth the heartache.

That being said, I'll probably never do this again. I'm fine with playing an AI character, but the amount of concentration needed for the role is intense. Like I said, I was physically shaking after the session. I think that's a clear signal that I shouldn't do it again.


7. Thank You

If a post is this long, it needs a cheesy but heartfelt thank you section.

Thank you, Tarwedge, for being my Gatekeeper. You're a champion and you were tough as nails. Thank you. I think you've learnt from the exchange and I think you'd make a great Gatekeeper in real life, where you'd have time to step away, breathe, and consult with other people.

Thank you, +Margo Owens and +Morgrim Moon for your support when I was a mess immediately after the session. <3

Thank you, pixie (+Alexandra Surdina), for investing time and diligence into reviewing the session.

And finally, thank you, Tuxedage - we've not met, but you wrote up the tweaked AI Box Experiment ruleset we worked with and your blog led me to most links I ended up perusing about it. So thanks for that. :)



Flowsheet Logic and Notecard Logic

23 moridinamael 09 September 2015 04:42PM

(Disclaimer: The following perspectives are based in my experience with policy debate which is fifteen years out of date. The meta-level point should stand regardless.)

If you are not familiar with U.S. high school debate club ("policy debate" or "cross-examination debate"), here is the gist of it: two teams argue over a topic, and a judge determines who has won.

When we get into the details, there are a lot of problems with the format. Almost everything wrong with policy debate appears in this image:


This is a "flowsheet", and it is used to track threads of argument between the successive epochs of the debate round. The judge and the debaters keep their own flowsheets to make sense of what's going on.

I am sure that there is a skillful, positive way of using flowsheets, but I have never seen it used in any way other than the following:

After the Affirmative side lays out their proposal, the Negative throws out a shotgun blast of more-or-less applicable arguments drawn from their giant plastic tote containing pre-prepared arguments. The Affirmative then counters the Negative's arguments using their own set of pre-prepared counter-arguments. Crucially, all of the Negative arguments must be met. Look at the Flowsheet image again, and notice how each "argument" has an arrow which carries it rightward. If any of these arrows make it to the right side of the page - the end of the round - without being addressed, then the judge will typically consider the round to be won by the side who originated that arrow.

So it doesn't actually matter if an argument receives a good counterargument. It only matters that the other team has addressed it appropriately.

Furthermore, merely addressing the argument with ad hoc counterargument is usually not sufficient. If the Negative makes an argument which contains five separate logical fallacies, and the Affirmative points all of these out and then moves on, the judge may not actually consider the Negative argument to have been refuted - because the Affirmative did not cite any Evidence.

Evidence, in policy debate, is a term of art, and it means "something printed out from a reputable media source and taped onto a notecard." You can't say "water is wet" in a policy debate round without backing it up with a notecard quoting a news source corroborating the wetness of water. So, skillfully pointing out those logical fallacies is meaningless if you don't have the Evidence to back up your claims.

Skilled policy debaters can be very good - impressively good - at the mental operations of juggling all these argument threads in their mind and pulling out the appropriate notecard evidence. My entire social circle in high school was composed of serious debaters, many of whom were brilliant at it.

Having observed some of these people for the ensuing decade, I sometimes suspect that policy debate damaged their reasoning ability. If I were entirely simplistic about it, I would say that policy debate has destroyed their ability to think and argue rationally. These people essentially still argue the same way, by mental flowsheet, acting as though argument can proceed only via notecard exchange. If they have addressed an argument, they consider it to be refuted. If they question an argument's source ("Wikipedia? Really?"), they consider it to be refuted. If their opponent ignores one of their inconsequential points, they consider themselves to have won. They do not seem to possess any faculty for discerning whether or not one argument actually defeats another. It is the equivalent of a child whose vision of sword fighting is focused on the clicking together of the blades, with no consideration for the intent of cutting the enemy.

Policy debate is to actual healthy argumentation as checkers is to actual warfare. Key components of the object being gamified are ignored or abstracted away until the remaining simulacrum no longer represents the original.

I actually see Notecard Logic and Flowsheet Logic everywhere. That's why I have to back off from my assertion that policy debate destroyed anybody's reasoning ability - I think it may have simply reinforced and hypertrophied the default human argumentation algorithm.

Flowsheet Logic is the tendency to think that you have defeated an argument because you have addressed it. It is the overall sense that you can't lose an argument as long as none of your opponent's statements go unchallenged, even if none of your challenges are substantial/meaningful/logical. It is the belief that if you can originate more threads of argument against your opponent than they can fend off, you have won, even if none of your arguments actually matters individually. I see Flowsheet Logic tendencies expressed all the time.

Notecard Logic is the tendency to treat evidence as binary. Either you have evidence to back up your assertion - even if that evidence takes the form of an article from [insert partisan rag] - or else you are just "making things up to defend your point of view". There is no concession to Bayesian updating, credibility, or degrees of belief in Notecard Logic. "Bob is a flobnostic. I can prove this because I can link you to an article that says it. So what if I can't explain what a flobnostic is." I see Notecard Logic tendencies expressed all the time.

Once you have developed a mental paintbrush handle for these tendencies, you may see them more as well. This awareness should allow you to discern more clearly whether you - or your interlocutor - or someone else entirely - is engaging in these practices. Hopefully this awareness paints a "negative space" of superior argumentation for you.

Two Growth Curves

21 AnnaSalamon 02 October 2015 12:59AM

Sometimes, it helps to take a model that part of you already believes, and to make a visual image of your model so that more of you can see it.

One of my all-time favorite examples of this: 

I used to often hesitate to ask dumb questions, to publicly try skills I was likely to be bad at, or to visibly/loudly put forward my best guesses in areas where others knew more than me.

I was also frustrated with this hesitation, because I could feel it hampering my skill growth.  So I would try to convince myself not to care about what people thought of me.  But that didn't work very well, partly because what folks think of me is in fact somewhat useful/important.

Then, I got out a piece of paper and drew how I expected the growth curves to go.

In blue, I drew the apparent-coolness level that I could achieve if I stuck with the "try to look good" strategy.  In brown, I drew the apparent-coolness level I'd have if I instead made mistakes as quickly and loudly as possible -- I'd look worse at first, but then I'd learn faster, eventually overtaking the blue line.

Suddenly, instead of pitting my desire to become smart against my desire to look good, I could pit my desire to look good now against my desire to look good in the future :)

I return to this image of two growth curves often when I'm faced with an apparent tradeoff between substance and short-term appearances.  (E.g., I used to often find myself scurrying to get work done, or to look productive / not-horribly-behind today, rather than trying to build the biggest chunks of capital for tomorrow.  I would picture these growth curves.)

Find someone to talk to thread

19 hg00 26 September 2015 10:24PM

Many LessWrong users are depressed. On the most recent survey, 18.2% of respondents had been formally diagnosed with depression, and a further 25.5% self-diagnosed with depression. That adds up to nearly half of the LessWrong userbase.

One common treatment for depression is talk therapy. Jonah Sinick writes:

Talk therapy has been shown to reduce depression on average. However:

  • Professional therapists are expensive, often charging on the order of $120/week if one's insurance doesn't cover them.
  • Anecdotally, highly intelligent people find therapy less useful than the average person does, perhaps because there's a gap in intelligence between them and most therapists that makes it difficult for the therapist to understand them.

House of Cards by Robyn Dawes argues that there's no evidence that licensed therapists are better at performing therapy than minimally trained laypeople. The evidence therein raises the possibility that one can derive the benefits of seeing a therapist from talking to a friend.

This requires that one has a friend who:

  • is willing to talk with you about your emotions on a regular basis
  • you trust to the point of feeling comfortable sharing your emotions

Some reasons to think that talking with a friend may not carry the full benefits of talking with a therapist are

  • Conflict of interest — Your friend may be biased for reasons having to do with your pre-existing relationship – for example, he or she might be unwilling to ask certain questions or offer certain feedback out of concern of offending you and damaging your friendship.
  • Risk of damaged relationship dynamics — There's a possibility of your friend feeling burdened by a sense of obligation to help you, creating feelings of resentment, and/or of you feeling guilty.
  • Risk of breach of confidentiality — Since you and your friend know people in common, there's a possibility that your friend will reveal things that you say to others who you know, that you might not want to be known. In contrast, a therapist generally won't know people in common with you, and is professionally obliged to keep what you say confidential.

Depending on the friend and on the nature of help that you need, these factors may be non-issues, but they're worth considering when deciding between seeing a therapist and talking with a friend.

One idea for solving the problems with talking to a friend is to find someone intellectually similar to you who you don't know--say, someone else who reads LessWrong.

This is a thread for doing that. Please post if you're either interested in using someone as a sounding board or interested in making money being a sounding board using Skype or Google Hangouts. If you want to make money talking to people, I suggest writing out a little resume describing why you might be a nice person to talk to, the time zone you're in, your age (age-matching recommended by Kate), and the hourly rate you wish to charge. You could include your location for improved internet call quality. You might also include contact info to decrease trivial inconveniences for readers who haven't registered a LW account. (I have a feeling that trivial inconveniences are a bigger issue for depressed people.) To help prevent email address harvesting, the convention for this thread is if you write "Contact me at [somename]", that's assumed to mean "my email is [somename]".

Please don't be shy about posting if this sounds like a good fit for you. Let's give people as many options as possible.

I guess another option for folks on a budget is making reciprocal conversation arrangements with another depressed person. So feel free to try & arrange that in this thread as well. I think paying someone is ideal though; listening to depressed people can sometimes be depressing.

BlahTherapy is an interesting site that sets you up with strangers on the internet to talk about your problems with. However, these strangers likely won't have the advantages of high intelligence or shared conceptual vocabulary that LessWrong users have. Fortunately, we can roll our own version of BlahTherapy by designating "lesswrong-talk-to-someone" as the Schelling interest on Omegle. (You can also just use "lesswrong" as an interest; there are sometimes people on it. Or enter random intellectual interests to find smart people to talk to.)

I haven't had very good results using sites like BlahTherapy. I think it's because I only sometimes find someone good, and when they don't work, I end up more depressed than I started. Reaching out in hopes of finding a friend and failing is a depressing experience. So I recommend trying to create a stable relationship with regularly scheduled conversations. I included BlahTherapy and Omegle because they might work well for some people and I didn't want to extrapolate strongly from n=1.

LessWrong user ShannonFriedman seems to work as a life coach judging by the link in her profile. I recommend her posts How to Deal with Depression - The Meta Layers and The Anti-Placebo Effect.

There's also the How to Get Therapy series from LW-sphere blog Gruntled & Hinged. It's primarily directed at people looking for licensed therapists, but may also have useful tips if you're just looking for someone to talk to. The biggest tip I noticed was to schedule a relaxing activity & time to decompress after your conversation.

The book Focusing is supposed to explain the techniques that successful therapy patients use that separate them from unsuccessful therapy patients.  Anna Salamon recommends the audiobook version.

There's also: Methods for Treating Depression and Things That Sometimes Help If You Have Depression.

I apologize for including so many ideas, but I figured it was better to suggest a variety of approaches so the community can collectively identify the most effective solutions for the rationalist depression epidemic. In general, when I'm depressed, I notice myself starting and stopping activities in a very haphazard way, repeatedly telling myself that the activity I'm doing isn't the one I "should" be doing. I've found it pretty useful to choose one activity arbitrarily and persist in it for a while. This is often sufficient to bootstrap myself out of a depressed state. I'd recommend doing the same here: choose an option and put a nontrivial amount of effort into exploring it before discarding it. Create a todo list and bulldoze your way down it.

Good luck. I'm rooting for you!

Probabilities Small Enough To Ignore: An attack on Pascal's Mugging

19 Kaj_Sotala 16 September 2015 10:45AM

Summary: the problem with Pascal's Mugging arguments is that, intuitively, some probabilities are just too small to care about. There might be a principled reason for ignoring some probabilities, namely that they violate an implicit assumption behind expected utility theory. This suggests a possible approach for formally defining a "probability small enough to ignore", though there's still a bit of arbitrariness in it.

This post is about finding a way to resolve the paradox inherent in Pascal's Mugging. Note that I'm not talking about the bastardized version of Pascal's Mugging that's gotten popular of late, where it's used to refer to any argument involving low probabilities and huge stakes (e.g. low chance of thwarting unsafe AI vs. astronomical stakes). Neither am I talking specifically about the "mugging" illustration, where a "mugger" shows up to threaten you.

Rather I'm talking about the general decision-theoretic problem, where it makes no difference how low of a probability you put on some deal paying off, because one can always choose a humongous enough payoff to make "make this deal" be the dominating option. This is a problem that needs to be solved in order to build e.g. an AI system that uses expected utility and will behave in a reasonable manner.

Intuition: how Pascal's Mugging breaks implicit assumptions in expected utility theory

Intuitively, the problem with Pascal's Mugging type arguments is that some probabilities are just too low to care about. And we need a way to look at just the probability component in the expected utility calculation and ignore the utility component, since the core of PM is that the utility can always be arbitrarily increased to overwhelm the low probability.

Let's look at the concept of expected utility a bit. If you have a 10% chance of getting a dollar each time you make a deal, and this has an expected value of 0.1, then this is just a different way of saying that if you took the deal ten times, then you would on average have 1 dollar at the end of those ten deals.

More generally, it means that if you had the opportunity to make ten different deals that all had the same expected value, then after making all of those, you would on average end up with one dollar. This is the justification for why it makes sense to follow expected value even for unique non-repeating events: because even if that particular event wouldn't repeat, if your general strategy is to accept other bets with the same EV, then you will end up with the same outcome as if you'd taken the same repeating bet many times. And even though you only get the dollar after ten deals on average, if you repeat the trials sufficiently many times, your probability of having the average payout will approach one.
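As a sanity check of this frequency interpretation, here's a small simulation (my own illustration, not from the original post): repeatedly taking a 10%-chance-of-$1 deal really does converge on an average payout of $0.10 per deal.

```python
import random

def average_payout(p=0.1, payout=1.0, deals=100_000, seed=0):
    """Average payout per deal when each deal pays `payout` with probability `p`."""
    rng = random.Random(seed)
    wins = sum(1 for _ in range(deals) if rng.random() < p)
    return wins * payout / deals

print(average_payout())  # converges toward the expected value of 0.1
```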

Now consider a Pascal's Mugging scenario. Say someone offers to create 10^100 happy lives in exchange for something, and you assign a probability of 0.000000000000000000001 (10^-21) to them being capable and willing to carry out their promise. Naively, this has an overwhelmingly positive expected value.

But is it really a beneficial trade? Suppose that you could make one deal like this per second, and you expect to live for 60 more years, for about 1.9 billion trades in total. Then there would be a probability of 0.999999999998 that the deal would never once have paid off for you. Which suggests that the EU calculation's implicit assumption - that you can repeat this often enough for the utility to converge to the expected value - would be violated.
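The arithmetic above can be checked directly (a quick sketch; the figures are the post's). One pitfall: computing (1 - 10^-21)^n naively in floating point rounds to exactly 1, so the sketch goes through log1p instead:

```python
import math

p = 1e-21                               # probability the deal pays off
n = 60 * 365.25 * 24 * 60 * 60          # one deal per second for 60 years: ~1.9e9 deals
p_never = math.exp(n * math.log1p(-p))  # (1 - p)^n, computed without rounding to 1.0

print(f"{n:.3g} deals, P(never pays off) = {p_never:.12f}")
```

The printed probability matches the 0.999999999998 quoted above.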

Our first attempt

This suggests an initial way of defining a "probability small enough to be ignored":

1. Define a "probability small enough to be ignored" (PSET, or by slight rearranging of letters, PEST) such that, over your lifetime, the expected number of times the event happens will be less than one. 
2. Ignore deals where the probability component of the EU calculation involves a PEST.

Looking at the first attempt in detail

To calculate a PEST, we need to know how often we might be offered a deal with such a probability. E.g. a 10% chance of something might be a PEST if our lifetime were short enough that we could only make a deal with a 10% chance once. So, a more precise definition of a PEST might be that it's a probability such that

(amount of deals that you can make in your life that have this probability) * (PEST) < 1
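As a sketch (the function name is mine, not the post's), the PEST test is just this inequality:

```python
def is_pest(p, deals_per_lifetime):
    """True if the event is expected to happen less than once over all
    the deals of this kind you could make in your lifetime."""
    return deals_per_lifetime * p < 1

# The mugging from earlier: one deal per second for 60 years.
print(is_pest(1e-21, 1_893_456_000))  # the mugger's offer is a PEST
print(is_pest(0.10, 100))             # an everyday 10% chance is not
```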

But defining "one" as the minimum number of times we should expect the event to happen for the probability to not be a PEST feels a little arbitrary. Intuitively, it feels like the threshold should depend on our degree of risk aversion: maybe if we're risk averse, we want to reduce the expected number of times something happens during our lives to (say) 0.001 before we're ready to ignore it. But part of our motivation was that we wanted a way to ignore the utility part of the calculation: bringing in our degree of risk aversion seems like it might introduce the utility again.

What if we redefined risk aversion/neutrality/preference (at least in this context) as how low one would be willing to let the "expected number of times this might happen" fall before considering a probability a PEST?

Let's use this idea to define an Expected Lifetime Utility:

ELU(S, L, R) = the ELU of a strategy S over a lifetime L is the expected utility you would get if you could make L deals in your life following S, counting a deal's utility only if, given your risk aversion R, its probability P is high enough that it is expected to pay off approximately P*L times.

ELU example

Suppose that we have a world where we can take three kinds of actions. 

- Action A takes 1 unit of time and has an expected utility of 2 and probability 1/3 of paying off on any one occasion.
- Action B takes 3 units of time and has an expected utility of 10^(Graham's number) and probability 1/100000000000000 of paying off on any one occasion.
- Action C takes 5 units of time and has an expected utility of 20 and probability 1/100 of paying off on any one occasion.

Assuming that the world's lifetime is fixed at L = 1000 and R = 1:

ELU("always choose A"): we expect A to pay off on ((1000 / 1) * 1/3) = 333 individual occasions, so with R = 1, we deem it acceptable to consider the utility of A. The ELU of this strategy becomes (1000 / 1) * 2 = 2000.

ELU("always choose B"): we expect B to pay off on ((1000 / 3) * 1/100000000000000) = 0.00000000000333 occasions, so with R = 1, we consider the expected utility of B to be 0. The ELU of this strategy thus becomes ((1000 / 3) * 0) = 0.

ELU("always choose C"): we expect C to pay off on ((1000 / 5) * 1/100) = 2 individual occasions, so with R = 1, we consider the expected utility of C to be ((1000 / 5) * 20) = 4000.

Thus, "always choose C" is the best strategy. 
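To make the bookkeeping explicit, here's the worked example as code (a sketch under my reading of the post; the function name is mine, and 10^100 stands in for 10^(Graham's number), which doesn't fit in a float):

```python
def elu(lifetime, time_cost, utility, p, risk_aversion=1):
    """ELU of the strategy 'always choose this action': if the action isn't
    expected to pay off at least `risk_aversion` times, its utility counts as 0."""
    deals = lifetime / time_cost
    expected_payoffs = deals * p
    if expected_payoffs < risk_aversion:
        return 0.0
    return deals * utility

strategies = {
    "always choose A": elu(1000, 1, 2, 1 / 3),
    "always choose B": elu(1000, 3, 10**100, 1e-14),
    "always choose C": elu(1000, 5, 20, 1 / 100),
}
best = max(strategies, key=strategies.get)
print(strategies, best)
```

Running this reproduces the three ELUs above (2000, 0, 4000) and picks "always choose C".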

Defining R

Is R something totally arbitrary, or can we determine some more objective criteria for it?

Here's where I'm stuck. Thoughts are welcome. I do know that while setting R = 1 was a convenient example, it's most likely too high, because it would suggest things like not using seat belts.

General thoughts on this approach

An interesting thing about this approach is that the threshold for a PEST becomes dependent on one's expected lifetime. This is surprising at first, but actually makes some intuitive sense. If you're living in a dangerous environment where you might be killed anytime soon, you won't be very interested in speculative low-probability options; rather you want to focus on making sure you survive now. Whereas if you live in a modern-day Western society, you may be willing to invest some amount of effort in weird low-probability high-payoff deals, like cryonics.

On the other hand, whereas investing in that low-probability, high-utility option might not be good for you individually, it could still be a good evolutionary strategy for your genes. You yourself might be very likely to die, but someone else carrying the risk-taking genes might hit big and be very successful in spreading their genes. So it seems like our definition of L, lifetime length, should vary based on what we want: are we looking to implement this strategy just in ourselves, our whole species, or something else? Exactly what are we maximizing over?

"Announcing" the "Longevity for All" Short Movie Prize

19 infotropism 11 September 2015 01:44PM

The local Belgian/European life-extension non-profit Heales is giving away prizes for whoever can make an interesting short movie about life extension. The first prize is €3000 (around $3386 as of today), other prizes being various gifts. You more or less just need to send a link pointing to the uploaded media along with your contact info to once you're done.

While we're at it, you don't need to be European, let alone Belgian, to participate, and it doesn't even need to be a short movie. For instance, a comic strip would fall within the scope of the rules as specified here: (link to a pdf file) (or see this page on Also, sure, the deadline is by now supposed to be a fairly short-term September 21st, 2015, but it is extremely likely this will be extended (this might be a pun).

I'll conclude by suggesting you read the official pdf with rules and explanations if you feel like you care about money or life-extension (who doesn't?), and remind everyone of what happened last time almost everyone thought they shouldn't grab free contest money that was announced on LessWrong (hint: few enough people participated that all earned something). The very reason why this one's due date will likely be extended is that (very very) few people have participated so far, after all.

(Ah yes, the only caveat I can think of: if the product of quality and quantity of submissions is definitely too low (i.e. it's just you on the one hand, and on the other hand that one guy who spent 3 minutes drawing some stick figures, and your submission is coming a close second), then the contest may be called off after one or two deadline extensions (also in the aforementioned rules).)

Rudimentary Categorization of Less Wrong Topics

19 ScottL 05 September 2015 07:32AM

I find the below list to be useful, so I thought I would post it. This list includes short abstracts of all of the wiki items and a few other topics on less wrong. I grouped the items into some rough categories just to break up the list. I tried to put the right items into the right categories, but there may be some items that can be in multiple categories or that would be better off in a different category. The wiki page from which I got all the items is here.

The categories are:

Property Attribution




Property Attribution

Barriers, biases, fallacies, impediments and problems

  • Affective death spiral - positive attributes of a theory, person, or organization combine with the Halo effect in a feedback loop, resulting in the subject of the affective death spiral being held in higher and higher regard.
  • Anthropomorphism - the error of attributing distinctly human characteristics to nonhuman processes.
  • Bystander effect - a social psychological phenomenon in which individuals are less likely to offer help in an emergency situation when other people are present.
  • Connotation - emotional association with a word. You need to be careful that you are not conveying a different connotation than you mean to.
  • Correspondence bias (also known as the fundamental attribution error) - is the tendency to overestimate the contribution of lasting traits and dispositions in determining people's behavior, as compared to situational effects.
  • Death Spirals and the Cult Attractor - Cultishness is an empirical attractor in human groups, roughly an affective death spiral, plus peer pressure and outcasting behavior, plus (quite often) defensiveness around something believed to have been perfected.
  • Detached lever fallacy – the assumption that something simple for one system will be simple for others. This assumption neglects to take into account that something may only be simple because of complicated underlying machinery which is triggered by a simple action like pulling a lever. Adding this lever to something else won’t allow the action to occur because the underlying complicated machinery is not there.
  • Giant cheesecake fallacy - occurs when an argument leaps directly from capability to actuality, without considering the necessary intermediate of motive. An example of the fallacy might be: a sufficiently powerful Artificial Intelligence could overwhelm any human resistance and wipe out humanity. (Belief without evidence: the AI would decide to do so.) Therefore we should not build AI.
  • Halo effect – specific type of confirmation bias, wherein positive feelings in one area cause ambiguous or neutral traits to be viewed positively.
  • Illusion of transparency - misleading impression that your words convey more to others than they really do.
  • Inferential distance - a gap between the background knowledge and epistemology of a person trying to explain an idea, and the background knowledge and epistemology of the person trying to understand it.
  • Information cascade - occurs when people signal that they have information about something, but actually based their judgment on other people's signals, resulting in a self-reinforcing community opinion that does not necessarily reflect reality.
  • Mind projection fallacy - occurs when someone thinks that the way they see the world reflects the way the world really is, going as far as assuming the real existence of imagined objects.
  • Other-optimizing - a failure mode in which a person vastly overestimates their ability to optimize someone else's life, usually as a result of underestimating the differences between themselves and others, for example through the typical mind fallacy.
  • Peak-end rule - we do not judge our experiences on the net pleasantness of unpleasantness or on how long the experience lasted, but instead on how they were at their peak (pleasant or unpleasant) and how they ended.
  • Stereotype - a fixed, over generalized belief about a particular group or class of people.
  • Typical mind fallacy - the mistake of making biased and overconfident conclusions about other people's experience based on your own personal experience; the mistake of assuming that other people are more like you than they actually are.


  • ADBOC - Agree Denotationally, But Object Connotatively
  • Alien Values - There are no rules requiring minds to value life, liberty or the pursuit of happiness. An alien will have, in all probability, alien values. If an "alien" isn't evolved, the range of possible values increases even more, allowing such absurdities as a Paperclip maximizer. Creatures with alien values might as well value only non-sentient life, or they might spend all their time building heaps of prime numbers of rocks.
  • Chronophone – is a parable that is meant to convey the idea that it’s really hard to get somewhere when you don't already know your destination. If there were some simple cognitive policy you could follow to spark moral and technological revolutions, without your home culture having advance knowledge of the destination, you could execute that cognitive policy today.
  • Empathic inference – is every-day common mind-reading. It’s an inference made about another person’s mental states using your own brain as reference; by making your brain feel or think in the same way as the other person, you can emulate their mental state and predict their reactions.
  • Epistemic luck - you would have different beliefs if certain events in your life were different. How should you react to this fact?
  • Future - If it hasn't happened yet but is going to, then it's part of the future. Checking whether or not something is going to happen is notoriously difficult. Luckily, the field of heuristics and biases has given us some insights into what can go wrong. Namely, one problem is that the future elicits far mode, which isn't about truth-seeking or gritty details.
  • Mental models - a hypothetical form of representation of knowledge in the human mind. Mental models form to approximately describe the dynamics of observed situations, and reuse parts of existing models to represent novel situations.
  • Mind design space - refers to the configuration space of possible minds. As humans living in a human world, we can safely make all sorts of assumptions about the minds around us without even realizing it. Each human might have their own unique personal qualities, so it might naively seem that there's nothing you can say about people you don't know. But there's actually quite a lot you can say (with high or very high probability) about a random human: that they have standard emotions like happiness, sadness, and anger; standard senses like sight and hearing; that they speak a language; and no doubt any number of other subtle features that are even harder to quickly explain in words. These things are the specific results of adaptation pressures in the ancestral environment and can't be expected to be shared by a random alien or AI. That is, humans are packed into a tiny dot in the configuration space: there is a vast range of other ways a mind can be.
  • Near/far thinking - Near and far are two modes (or a spectrum of modes) in which we can think about things. Which mode we use to think about something is based on its distance from us, or on the level of detail we need. This property of the human mind is studied in construal level theory.
    • NEAR: All of these bring each other more to mind: here, now, me, us; trend-deviating likely real local events; concrete, context-dependent, unstructured, detailed, goal-irrelevant incidental features; feasible safe acts; secondary local concerns; socially close folks with unstable traits.
    • FAR: Conversely, all these bring each other more to mind: there, then, them; trend-following unlikely hypothetical global events; abstract, schematic, context-freer, core, coarse, goal-related features; desirable risk-taking acts, central global symbolic concerns, confident predictions, polarized evaluations, socially distant people with stable traits.
  • No-Nonsense Metaethics - A sequence by lukeprog that explains and defends a naturalistic approach to metaethics and what he calls pluralistic moral reductionism. We know that people can mean different things but use the same word, e.g. sound can mean auditory experience or acoustic vibrations in the air. Pluralistic moral reductionism is the idea that we do the same thing when we talk about what is moral.
  • Only the vulnerable are heroes - “Vulnerability is our most accurate measurement of courage.” – Brené Brown. For Superman to be as heroic as a man stopping a group of would-be thieves from robbing a store, he has to be defending the world from someone powerful enough to harm and possibly even kill him, such as Darkseid.


Barriers, biases, fallacies, impediments and problems

  • Absurdity heuristic – is a mental shortcut where highly untypical situations are classified as absurd or impossible. Where you don't expect intuition to construct an adequate model of reality, classifying an idea as impossible may be overconfident.
  • Affect heuristic - a mental shortcut that makes use of current emotions to make decisions and solve problems quickly and efficiently.
  • Arguing by analogy – is arguing that since things are alike in some ways, they will probably be alike in others. While careful application of argument by analogy can be a powerful tool, there are limits to the method after which it breaks down.
  • Arguing by definition – is arguing that something is part of a class because it fits the definition of that class. It is recommended to avoid this wherever possible and instead treat words as labels that cannot capture the rich cognitive content that actually constitutes their meaning. As Feynman said: “You can know the name of a bird in all the languages of the world, but when you're finished, you'll know absolutely nothing whatever about the bird... So let's look at the bird and see what it's doing -- that's what counts.” It is better to keep the focus on the facts of the matter and try to understand what your interlocutor is trying to communicate, than to get lost in a pointless discussion of definitions that bears no fruit.
  • Arguments as soldiers – is a problematic scenario where arguments are treated like war or battle. Arguments get treated as soldiers, weapons to be used to defend your side of the debate, and to attack the other side. They are no longer instruments of the truth.
  • Availability heuristic – a mental shortcut that treats easily recalled information as important, or at least more important than alternative solutions which are not as readily recalled.
  • Belief as cheering - People can bind themselves as a group by believing "crazy" things together. Then among outsiders they can show the same pride in their crazy belief as they would show in wearing "crazy" group clothes. The belief is more like a banner saying "GO BLUES". It isn't a statement of fact, or an attempt to persuade; it doesn't have to be convincing—it's a cheer.
  • Beware of Deepities - A deepity is a proposition that seems both important and true—and profound—but that achieves this effect by being ambiguous. An example is "love is a word". One interpretation is that “love”, the word, is a word and this is trivially true. The second interpretation is that love is nothing more than a verbal construct. This interpretation is false, but if it were true would be profound. The "deepity" seems profound due to a conflation of the two interpretations. People see the trivial but true interpretation and then think that there must be some kind of truth to the false but profound one.
  • Bias - a systematic deviation from rationality committed by our cognition. Biases are specific, predictable error patterns in the human mind.
  • Burdensome details - Adding more details to a theory may make it sound more plausible to human ears because of the representativeness heuristic, even as the story becomes normatively less probable, as burdensome details drive the probability of the conjunction down (this is known as conjunction fallacy). Any detail you add has to be pinned down by a sufficient amount of evidence; all the details you make no claim about can be summed over.
  • Compartmentalization - a tendency to restrict application of a generally-applicable skill, such as scientific method, only to select few contexts. More generally, the concept refers to not following a piece of knowledge to its logical conclusion, or not taking it seriously.
  • Conformity bias - a tendency to behave similarly to the others in a group, even if doing so goes against your own judgment.
  • Conjunction fallacy – involves the assumption that specific conditions are more probable than more general ones.
  • Contagion heuristic - leads people to avoid contact with people or objects viewed as "contaminated" by previous contact with someone or something viewed as bad—or, less often, to seek contact with objects that have been in contact with people or things considered good.
  • Costs of rationality - Becoming more epistemically rational can only guarantee one thing: what you believe will include more of the truth. Knowing that truth might help you achieve your goals, or cause you to become a pariah. Be sure that you really want to know the truth before you commit to finding it; otherwise, you may flinch from it.
  • Defensibility - arguing that a policy is defensible rather than optimal or that it has some benefit compared to the null action rather than the best benefit of any action.
  • Fake simplicity – if you have a simple answer to a complex problem then it is probably a case whereby your beliefs appear to match the evidence much more strongly than they actually do. “Explanations exist; they have existed for all time; there is always a well-known solution to every human problem — neat, plausible, and wrong.” —H. L. Mencken
  • Fallacy of gray (also known as the continuum fallacy) – is the false belief that because nothing is certain, everything is equally uncertain. It does not take into account that some things are more certain than others.
  • False dilemma - occurs when only two options are considered, when there may in fact be many.
  • Filtered evidence – is evidence that was selected for the purpose of proving (disproving) a hypothesis. Filtered evidence may be highly misleading, but can still be useful, if considered with care.
  • Generalization from fictional evidence – logical fallacy that consists of drawing real-world conclusions based on statements invented and selected for the purpose of writing fiction.
  • Groupthink - tendency of humans to tend to agree with each other, and hold back objections or dissent even when the group is wrong.
  • Hindsight bias – is the tendency to overestimate the foreseeability of events that have actually happened.
  • Information hazard – is a risk that arises from the dissemination or the potential dissemination of (true) information that may cause harm or enable some agent to cause harm.
  • In-group bias - preferential treatment of people and ideas associated with your own group.
  • Mind-killer - a name given to topics (such as politics) that tend to produce extremely biased discussions. Another cause of mind-killers is social taboo. Negative connotations are associated with some topics, thus creating a strong bias supported by signaling drives that makes non-negative characterization of these topics appear absurd.
  • Motivated cognition – is the unconscious tendency of individuals to fit their processing of information to conclusions that suit some end or goal.
  • Motivated skepticism also known as disconfirmation bias - the mistake of applying more skepticism to claims that you don't like (or intuitively disbelieve) than to claims that you do like.
  • Narrative fallacy – is a vulnerability to over-interpretation and our predilection for compact stories over raw truths.
  • Overconfidence - the state of being more certain than is justified, given your priors and the evidence available.
  • Planning fallacy - predictions about how much time will be needed to complete a future task display an optimistic bias (underestimate the time needed).
  • Politics is the Mind-Killer – Politics is not a good area for rational debate. It is often about status and power plays where arguments are soldiers rather than tools to get closer to the truth.
  • Positive bias - tendency to test hypotheses with positive rather than negative examples, thus risking to miss obvious disconfirming tests.
  • Priming - psychological phenomenon that consists in early stimulus influencing later thoughts and behavior.
  • Privileging the hypothesis – is singling out a particular hypothesis for attention when there is insufficient evidence already in hand to justify such special attention.
  • Problem of verifying rationality – is the single largest problem for those desiring to create methods of systematically training for increased epistemic and instrumental rationality - how to verify that the training actually worked.
  • Rationalization – starts from a conclusion, and then works backward to arrive at arguments apparently favouring that conclusion. Rationalization argues for a side already selected. The term is misleading as it is the very opposite and antithesis of rationality, as if lying were called "truthization".
  • Reason as memetic immune disorder - the problem that when you are rational you deem your conclusions more valuable than those of non-rational people. This can become a problem, as you are less likely to update your beliefs when they are opposed. The risk is that if you adopt one false belief and then rationally deduce a plethora of others from it, you will be less likely to update any erroneous conclusions.
  • Representativeness heuristic –a mental shortcut where people judge the probability or frequency of a hypothesis by considering how much the hypothesis resembles available data as opposed to using a Bayesian calculation.
  • Scales of justice fallacy - the error of using a simple polarized scheme for deciding a complex issue: each piece of evidence about the question is individually categorized as supporting exactly one of the two opposing positions.
  • Scope insensitivity – a phenomenon related to the representativeness heuristic where subjects based their willingness-to-pay mostly on a mental image rather than the effect on a desired outcome. An environmental measure that will save 200,000 birds doesn't conjure anywhere near a hundred times the emotional impact and willingness-to-pay of a measure that would save 2,000 birds, even though in fact the former measure is two orders of magnitude more effective.
  • Self-deception - state of preserving a wrong belief, often facilitated by denying or rationalizing away the relevance, significance, or importance of opposing evidence and logical arguments.
  • Status quo bias - people tend to avoid changing the established behavior or beliefs unless the pressure to change is sufficiently strong.
  • Sunk cost fallacy - Letting past investment (of time, energy, money, or any other resource) interfere with decision-making in the present in deleterious ways.
  • The top 1% fallacy - related to not taking into account the idea that a small sample size is not always reflective of a whole population and that sample populations with certain characteristics, e.g. made up of repeat job seekers, are not reflective of the whole population.
  • Underconfidence - the state of being more uncertain than is justified, given your priors and the evidence you are aware of.
  • Wrong Questions - A question about your map that wouldn’t make sense if you had a more accurate map.
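
Several of the entries above (burdensome details, conjunction fallacy) come down to a single arithmetic fact: a conjunction can never be more probable than either of its parts, so every detail you add multiplies in a factor of at most 1. A minimal sketch in Python, using made-up illustrative probabilities (the numbers and event names are not from any study):

```python
# Why "burdensome details" lower probability: each added detail, however
# plausible, multiplies in a conditional factor <= 1, so the conjunction
# can never be more probable than any single claim on its own.

p_invaded = 0.10    # illustrative P(country X is invaded next year)
p_suspends = 0.30   # illustrative P(X suspends elections | invaded)

# P(invaded AND suspends elections) = P(invaded) * P(suspends | invaded)
p_conjunction = p_invaded * p_suspends

# The conjunction is bounded above by each of its parts.
assert p_conjunction <= p_invaded
assert p_conjunction <= p_suspends

print(round(p_conjunction, 3))  # 0.03
```

Yet to human judgment the richer, more detailed story often *sounds* more plausible, which is exactly the representativeness heuristic overriding the arithmetic.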


  • Absolute certainty – equivalent of Bayesian probability of 1. Losing an epistemic bet made with absolute certainty corresponds to receiving infinite negative payoff, according to the logarithmic proper scoring rule.
  • Adaptation executors - Individual organisms are best thought of as adaptation-executers rather than as fitness-maximizers. Our taste buds do not find lettuce delicious and cheeseburgers distasteful once we are fed a diet too high in calories and too low in micronutrients. Tastebuds are adapted to an ancestral environment in which calories, not micronutrients, were the limiting factor. Evolution operates on too slow a timescale to re-adapt to new conditions (such as a modern diet).
  • Adversarial process - a form of truth-seeking or conflict resolution in which identifiable factions hold one-sided positions.
  • Altruism - Actions undertaken for the benefit of other people. If you do something to feel good about helping people, or even to be a better person in some spiritual sense, it isn't truly altruism.
  • Amount of evidence - to a Bayesian, evidence is a quantitative concept. The more complicated or a priori improbable a hypothesis is, the more evidence you need just to justify it, or even just to single it out from amongst the mass of competing theories.
  • Anti-epistemology - is bad explicit beliefs about rules of reasoning, usually developed in the course of protecting an existing false belief - false beliefs are opposed not only by true beliefs (that must then be obscured in turn) but also by good rules of systematic reasoning (which must then be denied). The explicit defense of fallacy as a general rule of reasoning is anti-epistemology.
  • Antiprediction - is a statement of confidence in an event that sounds startling, but actually isn't far from a maxentropy prior. For example, if someone thinks that our state of knowledge implies strong ignorance about the speed of some process X on a logarithmic scale from nanoseconds to centuries, they may make the startling-sounding statement that X is very unlikely to take 'one to three years'.
  • Applause light - is an empty statement which evokes positive affect without providing new information.
  • Artificial general intelligence – is a machine capable of behaving intelligently over many domains.
  • Bayesian - Bayesian probability theory is the math of epistemic rationality, Bayesian decision theory is the math of instrumental rationality.
  • Aumann's agreement theorem – roughly speaking, says that two agents acting rationally (in a certain precise sense) and with common knowledge of each other's beliefs cannot agree to disagree. More specifically, if two people are genuine Bayesians, share common priors, and have common knowledge of each other's current probability assignments, then they must have equal probability assignments.
  • Bayesian decision theory – is a decision theory which is informed by Bayesian probability. It is a statistical system that tries to quantify the tradeoff between various decisions, making use of probabilities and costs.
  • Bayesian probability - represents a level of certainty relating to a potential outcome or idea. This is in contrast to a frequentist probability that represents the frequency with which a particular outcome will occur over any number of trials. An event with Bayesian probability of .6 (or 60%) should be interpreted as stating "With confidence 60%, this event contains the true outcome", whereas a frequentist interpretation would view it as stating "Over 100 trials, we should observe event X approximately 60 times." The difference is more apparent when discussing ideas. A frequentist will not assign probability to an idea; either it is true or false and it cannot be true 6 times out of 10.
  • Bayes' theorem - A law of probability that describes the proper way to incorporate new evidence into prior probabilities to form an updated probability estimate.
  • Belief - the mental state in which an individual holds a proposition to be true. Beliefs are often metaphorically referred to as maps, and are considered valid to the extent that they correctly correspond to the truth. A person's knowledge is a subset of their beliefs, namely the beliefs that are also true and justified. Beliefs can be second-order, concerning propositions about other beliefs.
  • Belief as attire – is an example of an improper belief promoted by identification with a group or other signaling concerns, not by how well it reflects the territory.
  • Belief in belief - Where it is difficult to believe a thing, it is often much easier to believe that you ought to believe it. Were you to really believe and not just believe in belief, the consequences of error would be much more severe. When someone makes up excuses in advance, it would seem to require that belief, and belief in belief, have become unsynchronized.
  • Belief update - what you do to your beliefs, opinions and cognitive structure when new evidence comes along.
  • Bite the bullet - is to accept the consequences of a hard choice, or unintuitive conclusions of a formal reasoning procedure.
  • Black swan – is a high-impact event that is hard to predict (but not necessarily of low probability). It is also an event that is not accounted for in a model and therefore causes the model to break down when it occurs.
  • Cached thought – is an answer that was arrived at by recalling a previously-computed conclusion, rather than performing the reasoning from scratch.
  • Causal Decision Theory – a branch of decision theory which advises an agent to take actions that maximize the causal effect on the probability of desired outcomes.
  • Causality - refers to the relationship between an event (the cause) and a second event (the effect), where the second event is a direct consequence of the first.
  • Church-Turing thesis - states the equivalence between the mathematical concepts of algorithm or computation and Turing-Machine. It asserts that if some calculation is effectively carried out by an algorithm, then there exists a Turing machine which will compute that calculation.
  • Coherent Aggregated Volition - is one of Ben Goertzel's responses to Eliezer Yudkowsky's Coherent Extrapolated Volition, the other being Coherent Blended Volition. CAV would be a combination of the goals and beliefs of humanity at the present time.
  • Coherent Blended Volition - Coherent Blended Volition is a recent concept coined in a 2012 paper by Ben Goertzel with the aim to clarify his Coherent Aggregated Volition idea. This clarification follows the author's attempt to develop a comprehensive alternative to Coherent Extrapolated Volition.
  • Coherent Extrapolated Volition – is a term developed by Eliezer Yudkowsky while discussing Friendly AI development. It’s meant as an argument that it would not be sufficient to explicitly program our desires and motivations into an AI. Instead, we should find a way to program it in a way that it would act in our best interests – what we want it to do and not what we tell it to.
  • Color politics - the words "Blues" and "Greens" are often used to refer to two opposing political factions. Politics commonly involves an adversarial process, where factions usually identify with political positions, and use arguments as soldiers to defend their side. The dichotomies presented by the opposing sides are often false dilemmas, which can be shown by presenting third options.
  • Common knowledge - in the context of Aumann's agreement theorem, a fact is part of the common knowledge of a group of agents when they all know it, they all know that they all know it, and so on ad infinitum.
  • Conceptual metaphor – are neurally-implemented mappings between concrete domains of discourse (often related to our body and perception) and more abstract domains. These are a well-known source of bias and are often exploited in the Dark Arts. An example is “argument is war”.
  • Configuration space - is an isomorphism between the attributes of something, and its position on a multidimensional graph. Theoretically, the attributes and precise position on the graph should contain the same information. In practice, the concept usually appears as a suffix, as in "walletspace", where "walletspace" refers to the configuration space of all possible wallets, arranged by similarity. Walletspace would intersect with leatherspace, and the set of leather wallets is a subset of both walletspace and leatherspace, which are both subsets of thingspace.
  • Conservation of expected evidence - a theorem that says: "for every expectation of evidence, there is an equal and opposite expectation of counterevidence". 0 = (P(H|E)-P(H))*P(E) + (P(H|~E)-P(H))*P(~E)
  • Control theory - a control system is a device that keeps a variable at a certain value, despite only knowing what the current value of the variable is. An example is a cruise control, which maintains a certain speed, but only measures the current speed, and knows nothing of the system that produces that speed (wind, car weight, grade).
  • Corrupted hardware - our brains do not always allow us to act the way we should. Corrupted hardware refers to those behaviors and thoughts that act for ancestrally relevant purposes rather than for stated moralities and preferences.
  • Counterfactual mugging - is a thought experiment for testing and differentiating decision theories, stated as follows:
  • Counter man syndrome - wherein a person behind a counter comes to believe that they know things they don't know, because, after all, they're the person behind the counter. So they can't just answer a question with "I don't know"... and thus they make something up, without really paying attention to the fact that they're making it up. Pretty soon, they don't know the difference between the facts and their made-up stories.
  • Cox's theorem says, roughly, that if your beliefs at any given time take the form of an assignment of a numerical "plausibility score" to every proposition, and if they satisfy a few plausible axioms, then your plausibilities must effectively be probabilities obeying the usual laws of probability theory, and your updating procedure must be the one implied by Bayes' theorem.
  • Crisis of faith - a combined technique for recognizing and eradicating the whole systems of mutually-supporting false beliefs. The technique involves systematic application of introspection, with the express intent to check the reliability of beliefs independently of the other beliefs that support them in the mind. The technique might be useful for the victims of affective death spirals, or any other systematic confusions, especially those supported by anti-epistemology.
  • Cryonics - is the practice of preserving people who are dying in liquid nitrogen soon after their heart stops. The idea is that most of your brain's information content is still intact right after you've "died". If humans invent molecular nanotechnology or brain emulation techniques, it may be possible to reconstruct the consciousness of cryopreserved patients.
  • Curiosity - The first virtue is curiosity. A burning itch to know is higher than a solemn vow to pursue truth. To feel the burning itch of curiosity requires both that you be ignorant, and that you desire to relinquish your ignorance. If in your heart you believe you already know, or if in your heart you do not wish to know, then your questioning will be purposeless and your skills without direction. Curiosity seeks to annihilate itself; there is no curiosity that does not want an answer. The glory of glorious mystery is to be solved, after which it ceases to be mystery. Be wary of those who speak of being open-minded and modestly confess their ignorance. There is a time to confess your ignorance and a time to relinquish your ignorance. —Twelve Virtues of Rationality
  • Dangerous knowledge - Intelligence, in order to be useful, must be used for something other than defeating itself.
  • Dangling Node - A label for something that isn't "actually real".
  • Death - First you're there, and then you're not there, and they can't change you from being not there to being there, because there's nothing there to be changed from being not there to being there. That's death. Cryonicists use the concept of information-theoretic death, which is what happens when the information needed to reconstruct you even in principle is no longer present. Anything less, to them, is just a flesh wound.
  • Debiasing - The process of overcoming bias. It takes serious study to gain meaningful benefits, half-hearted attempts may accomplish nothing, and partial knowledge of bias may do more harm than good.
  • Decision theory – is the study of principles and algorithms for making correct decisions—that is, decisions that allow an agent to achieve better outcomes with respect to its goals.
  • Defying the data - Sometimes, the results of an experiment contradict what we have strong theoretical reason to believe. But experiments can go wrong, for various reasons. So if our theory is strong enough, we should in some cases defy the data: know that there has to be something wrong with the result, even without offering ideas on what it might be.
  • Disagreement - Aumann's agreement theorem can be informally interpreted as suggesting that if two people are honest seekers of truth, and both believe each other to be honest, then they should update on each other's opinions and quickly reach agreement. The very fact that a person believes something is Rational evidence that that something is true, and so this fact should be taken into account when forming your belief. Outside of well-functioning prediction markets, Aumann agreement can probably only be approximated by careful deliberative discourse. Thus, fostering effective deliberation should be seen as a key goal of Less Wrong.
  • Doubt- The proper purpose of a doubt is to destroy its target belief if and only if it is false. The mere feeling of crushing uncertainty is not virtuous unto an aspiring rationalist; probability theory is the law that says we must be uncertain to the exact extent to which the evidence merits uncertainty.
  • Dunning–Kruger effect - is a cognitive bias wherein unskilled individuals suffer from illusory superiority, mistakenly assessing their ability to be much higher than is accurate. This bias is attributed to a metacognitive inability of the unskilled to recognize their ineptitude. Conversely, highly skilled individuals tend to underestimate their relative competence, erroneously assuming that tasks that are easy for them are also easy for others.
  • Emulation argument for human-level AI – argument that since whole brain emulation seems feasible, human-level AI must also be feasible.
  • Epistemic hygiene - consists of practices meant to allow accurate beliefs to spread within a community and keep less accurate or biased beliefs contained. The practices are meant to serve an analogous purpose to normal hygiene and sanitation in containing disease. "Good cognitive citizenship" is another phrase that has been proposed for this concept.
  • Error of crowds - is the idea that under some scoring rules, the average error becomes less than the error of the average, thus making the average belief tautologically worse than a belief of a random person. Compare this to the ideas of modesty argument and wisdom of the crowd. A related idea is that a popular belief is likely to be wrong because the less popular ones couldn't maintain support if they were worse than the popular one.
  • Ethical injunction - a rule not to do something even when it's the right thing to do. (That is, you refrain "even when your brain has computed it's the right thing to do", but this will just seem like "the right thing to do".) For example, you shouldn't rob banks even if you plan to give the money to a good cause. This is to protect you from your own cleverness (especially taking bad black swan bets), and the Corrupted hardware you're running on.
  • Evidence - for a given theory is the observation of an event that is more likely to occur if the theory is true than if it is false. (The event would be evidence against the theory if it is less likely if the theory is true.)
  • Evidence of absence - evidence that allows you to conclude some phenomenon isn't there. It is often said that "absence of evidence is not evidence of absence". However, if evidence is expected, but not present, that is evidence of absence.
  • Evidential Decision Theory - a branch of decision theory which advises an agent to take actions which, conditional on it happening, maximizes the chances of the desired outcome.
  • Evolution - The brainless, mindless optimization process responsible for the production of all biological life on Earth, including human beings. Since the design signature of evolution is alien and counterintuitive, it takes some study to get to know your accidental Creator.
  • Evolution as alien god – is a thought experiment in which evolution is imagined as a god. The thought experiment is meant to convey the idea that evolution doesn’t have a mind. The god in the thought experiment would be a tremendously powerful, unbelievably stupid, ridiculously slow, and utterly uncaring god; a god monomaniacally focused on the relative fitness of genes within a species; a god whose attention was completely separated and working at cross-purposes in rabbits and wolves.
  • Evolutionary argument for human-level AI - an argument that uses the fact that evolution produced human level intelligence to argue for the feasibility of human-level AI.
  • Evolutionary psychology - the idea of evolution as the idiot designer of humans - that our brains are not consistently well-designed - is a key element of many of the explanations of human errors that appear on this website.
  • Existential risk – is a risk posing permanent large negative consequences to humanity which can never be undone.
  • Expected value - The expected value or expectation is the (weighted) average of all the possible outcomes of an event, weighted by their probability. For example, when you roll a die, the expected value is (1+2+3+4+5+6)/6 = 3.5. (Since a die doesn't even have a face that says 3.5, this illustrates that very often, the "expected value" isn't a value you actually expect.)
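The die example can be checked directly in a few lines of Python (a minimal sketch of the definition, nothing more):

```python
# Expected value of one roll of a fair six-sided die:
# each face 1..6 occurs with probability 1/6.
outcomes = [1, 2, 3, 4, 5, 6]
expected = sum(x * (1 / 6) for x in outcomes)
print(expected)  # 3.5
```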
  • Extensibility argument for greater-than-human intelligence – is an argument that once we get to a human-level AGI, extensibility would make an AGI of greater-than-human intelligence feasible.
  • Extraordinary evidence - is evidence that turns an a priori highly unlikely event into an a posteriori likely event.
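A toy Bayesian calculation, with made-up numbers, illustrates the idea: a sufficiently extreme likelihood ratio can move a one-in-a-million prior to near-certainty.

```python
# Hedged illustration (numbers are invented): extraordinary evidence
# turning an a priori unlikely hypothesis into an a posteriori likely one.
prior = 1e-6                    # P(H): very unlikely before the evidence
likelihood_ratio = 1e9          # P(E|H) / P(E|not H): extraordinary evidence
prior_odds = prior / (1 - prior)
posterior_odds = prior_odds * likelihood_ratio
posterior = posterior_odds / (1 + posterior_odds)
print(round(posterior, 3))  # 0.999
```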
  • Free-floating belief – is a belief that both doesn't follow from observations and doesn't restrict which experiences to anticipate. It is both unfounded and useless.
  • Free will - means our algorithm's ability to determine our actions. People often get confused over free will because they picture themselves as being restrained rather than part of physics. Yudkowsky calls this view Requiredism, but most people just view this essentially as Compatibilism.
  • Friendly artificial intelligence – is a superintelligence (i.e., a really powerful optimization process) that produces good, beneficial outcomes rather than harmful ones.
  • Fully general counterargument - an argument which can be used to discount any conclusion the arguer does not like. Being in possession of such an argument leads to irrationality because it allows the arguer to avoid updating their beliefs in the light of new evidence. Knowledge of cognitive biases can itself allow someone to form fully general counterarguments ("you're just saying that because you're exhibiting X bias").
  • Great Filter - is a proposed explanation for the Fermi Paradox. The development of intelligent life requires many steps, such as the emergence of single-celled life and the transition from unicellular to multicellular life forms. Since we have not observed intelligent life beyond our planet, there seems to be a developmental step that is so difficult and unlikely that it "filters out" nearly all civilizations before they can reach a space-faring stage.
  • Group rationality - In almost anything, individuals are inferior to groups.
  • Group selection – is the mistaken belief, in evolutionary theory, that a feature of an organism can be selected for because it benefits the group.
  • Heuristic - quick, intuitive strategy for reasoning or decision making, as opposed to more formal methods. Heuristics require much less time and energy to use, but sometimes go awry, producing bias.
  • Heuristics and biases - a program in cognitive psychology that tries to work backward from biases (experimentally reproducible human errors) to heuristics (the underlying mechanisms at work in the brain).
  • Hold Off on Proposing Solutions - "Do not propose solutions until the problem has been discussed as thoroughly as possible without suggesting any." It is easy to show that this edict works in contexts where there are objectively defined good solutions to problems.
  • Hollywood rationality - What Spock does, not what actual rationalists do.
  • How an algorithm feels - Our philosophical intuitions are generated by algorithms in the human brain. To dissolve a philosophical dilemma, it often suffices to understand the cognitive algorithm that generates the appearance of the dilemma - if you understand the algorithm in sufficient detail. It is not enough to say "An algorithm does it!" - this might as well be magic. It takes a detailed step-by-step walkthrough.
  • Hypocrisy - the act of claiming motives, morals, and standards one does not possess. Informally, it refers to not living up to the standards that one espouses, whether or not one sincerely believes those standards.
  • Impossibility - Careful use of language dictates that we distinguish between several senses in which something can be said to be impossible. Some things are logically impossible: you can't have a square circle or an object that is both perfectly black and perfectly not-black. Also, in our reductionist universe operating according to universal physical laws, some things are physically impossible based on our model of how things work, even if they are not obviously contradictory or contrary to reason: for example, the laws of thermodynamics give us a strong guarantee that there can never be a perpetual motion machine. It can be tempting to label as impossible very difficult problems which you have no idea how to solve. But the apparent lack of a solution is not a strong guarantee that no solution can exist in the way that the laws of thermodynamics, or Gödel's incompleteness results, give us proofs that something cannot be accomplished. A blank map does not correspond to a blank territory; in the absence of a proof that a problem is insolvable, you can't be confident that you're not just overlooking something that a greater intelligence would spot in an instant.
  • Improper belief – is a belief that isn't concerned with describing the territory. A proper belief, on the other hand, requires observations, gets updated upon encountering new evidence, and provides practical benefit in anticipated experience. Note that the fact that a belief just happens to be true doesn't mean you're right to have it. If you buy a lottery ticket, certain that it's a winning ticket (for no reason), and it happens to be, believing that was still a mistake. Types of improper belief discussed in the Mysterious Answers to Mysterious Questions sequence include: Free-floating belief, Belief as attire, Belief in belief, and Belief as cheering.
  • Incredulity - Spending emotional energy on incredulity wastes time you could be using to update. It repeatedly throws you back into the frame of the old, wrong viewpoint. It feeds your sense of righteous indignation at reality daring to contradict you.
  • Intuition pump - a thought experiment that highlights, or "pumps", certain ideas, intuitions or concepts while attenuating others so as to make some conclusion obvious and simple to reach. The intuition pump is a carefully designed persuasion tool in which you check to see if the same intuitions still get pumped when you change certain settings in a thought experiment.
  • Kolmogorov complexity - given a string, the length of the shortest possible program that prints it.
  • Lawful intelligence - The startling and counterintuitive notion - contradicting both surface appearances and all Deep Wisdom - that intelligence is a manifestation of Order rather than Chaos. Even creativity and outside-the-box thinking are essentially lawful. While this is a complete heresy according to the standard religion of Silicon Valley, there are some good mathematical reasons for believing it.
  • Least convenient possible world – is a technique for enforcing intellectual honesty, to be used when arguing against an idea. The essence of the technique is to assume that all the specific details will align with the idea against which you are arguing, i.e. to consider the idea in the context of a least convenient possible world, where every circumstance is colluding against your objections and counterarguments. This approach ensures that your objections are strong enough, running minimal risk of being rationalizations for your position.
  • Logical rudeness – is a response to criticism which insulates the responder from having to address the criticism directly. For example, ignoring all the diligent work that evolutionary biologists did to dig up previous fossils, and insisting you can only be satisfied by an actual videotape, is "logically rude" because you're ignoring evidence that someone went to a great deal of trouble to provide to you.
  • Log odds – is an alternate way of expressing probabilities, which simplifies the process of updating them with new evidence: in log odds, an update is simple addition. Unfortunately, converting between probability and log odds requires computing a logarithm, which is difficult to do mentally. The log odds is the log of the odds ratio.
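A minimal Python sketch of the conversion, and of why updating is convenient in log odds (posterior log odds are just prior log odds plus the log likelihood ratio):

```python
import math

def prob_to_log_odds(p):
    # log odds = log of the odds ratio p / (1 - p)
    return math.log(p / (1 - p))

def log_odds_to_prob(l):
    # inverse transform (the logistic function)
    return 1 / (1 + math.exp(-l))

prior = 0.2
llr = math.log(3)  # evidence three times likelier if the hypothesis is true
posterior = log_odds_to_prob(prob_to_log_odds(prior) + llr)
print(round(posterior, 4))  # 0.4286
```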
  • Magical categories - an English word which, although it sounds simple - hey, it's just one word, right? - is actually not simple, and furthermore, may be applied in a complicated way that drags in other considerations. Physical brains are not powerful enough to search all possibilities; we have to cut down the search space to possibilities that are likely to be good. Most of the "obviously bad" methods - those that would end up violating our other values, and so ranking very low in our preference ordering - do not even occur to us as possibilities.
  • Making Beliefs Pay Rent - Every question of belief should flow from a question of anticipation, and that question of anticipation should be the centre of the inquiry. Every guess of belief should begin by flowing to a specific guess of anticipation, and should continue to pay rent in future anticipations. If a belief turns deadbeat, evict it.
  • Many-worlds interpretation - uses decoherence to explain how the universe splits into many separate branches, each of which looks like it came out of a random collapse.
  • Map and territory - Less confusing than saying "belief and reality", "map and territory" reminds us that a map of Texas is not the same thing as Texas itself. Saying "map" also dispenses with possible meanings of "belief" apart from "representations of some part of reality". Since our predictions don't always come true, we need different words to describe the thingy that generates our predictions and the thingy that generates our experimental results. The first thingy is called "belief", the second thingy "reality".
  • Meme lineage – is a set of beliefs, attitudes, and practices that all share a clear common origin point. This concept also emphasizes the means of transmission of the beliefs in question. If a belief is part of a meme lineage that transmits for primarily social reasons, it may be discounted for purposes of the modesty argument.
  • Memorization - is what you're doing when you cram for a university exam. It's not the same thing as understanding.
  • Modesty - admitting or boasting of flaws so as to not create perceptions of arrogance. Not to be confused with humility.
  • Most of science is actually done by induction - To come up with something worth testing, a scientist needs to do lots of sound induction first or borrow an idea from someone who already used induction. This is because induction is the only way to reliably find candidate hypotheses which deserve attention. Examples of bad ways to find hypotheses include finding something interesting or surprising to believe in and then pinning all your hopes on that thing turning out to be true.
  • Most people's beliefs aren’t worth considering - Sturgeon's Law says that as a general rule, 90% of everything is garbage. Even if it is the case that 90% of everything produced by any field is garbage, that does not mean one can dismiss the 10% that is quality work. Instead, it is important to engage with that 10%, and use that as the standard of quality.
  • Nash equilibrium - a stable state of a system involving the interaction of different participants, in which no participant can gain by a unilateral change of strategy if the strategies of the others remain unchanged.
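As a sketch, the standard Prisoner's Dilemma (with illustrative payoff numbers) shows the idea: mutual defection is a Nash equilibrium because neither player gains by switching strategy unilaterally.

```python
# Payoffs as (row player, column player); 0 = cooperate, 1 = defect.
payoffs = {
    (0, 0): (3, 3),   # both cooperate
    (0, 1): (0, 5),   # row cooperates, column defects
    (1, 0): (5, 0),   # row defects, column cooperates
    (1, 1): (1, 1),   # both defect
}

def is_nash(row, col):
    """Neither player can gain by a unilateral change of strategy."""
    row_payoff, col_payoff = payoffs[(row, col)]
    row_best = all(payoffs[(r, col)][0] <= row_payoff for r in (0, 1))
    col_best = all(payoffs[(row, c)][1] <= col_payoff for c in (0, 1))
    return row_best and col_best

print(is_nash(1, 1))  # True: mutual defection is the Nash equilibrium
print(is_nash(0, 0))  # False: each player gains by defecting unilaterally
```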
  • Newcomb's problem - In Newcomb's problem, a superintelligence called Omega shows you two boxes, A and B, and offers you the choice of taking only box A, or both boxes A and B. Omega has put $1,000 in box B. If Omega thinks you will take box A only, he has put $1,000,000 in it; otherwise he has left it empty. Omega's predictions are almost always right. Do you take both boxes, or only box A?
  • Nonapples - a proposed object, tool, technique, or theory which is defined only as being not like a specific, existent example of said categories. It is a type of overly-general prescription which, while of little utility, can seem useful. It involves disguising a shallow criticism as a solution, often in such a way as to make it look profound. For instance, suppose someone says, "We don't need war, we need non-violent conflict resolution." In this way a shallow criticism (war is bad) is disguised as a solution (non-violent conflict resolution, i.e, nonwar). This person is selling nonapples because "non-violent conflict resolution" isn't a method of resolving conflict nonviolently. Rather, it is a description of all conceivable methods of non-violent conflict resolution, the vast majority of which are incoherent and/or ineffective.
  • Noncentral fallacy - A rhetorical move often used in political, philosophical, and cultural arguments. "X is in a category whose archetypal member gives us a certain emotional reaction. Therefore, we should apply that emotional reaction to X, even though it is not a central category member."
  • Not technically a lie – a statement that is literally true, but that causes the listener to attain false beliefs by performing incorrect inference.
  • Occam's razor - principle commonly stated as "Entities must not be multiplied beyond necessity". When several theories are able to explain the same observations, Occam's razor suggests the simpler one is preferable.
  • Odds ratio - are an alternate way of expressing probabilities, which simplifies the process of updating them with new evidence. The odds ratio of A is P(A)/P(¬A).
  • Omega - A hypothetical super-intelligent being used in philosophical problems. Omega is most commonly used as the predictor in Newcomb's problem. In its role as predictor, Omega's predictions occur almost certainly. In some thought experiments, Omega is also taken to be super-powerful. Omega can be seen as analogous to Laplace's demon, or as the closest approximation to the Demon capable of existing in our universe.
  • Oops - Theories must be bold and expose themselves to falsification; be willing to commit the heroic sacrifice of giving up your own ideas when confronted with contrary evidence; play nice in your arguments; try not to deceive yourself; and other fuzzy verbalisms. It is better to say oops quickly when you realize a mistake. The alternative is stretching out the battle with yourself over years.
  • Outside view - Taking the outside view (another name for reference class forecasting) means using an estimate based on a class of roughly similar previous cases, rather than trying to visualize the details of a process. For example, estimating the completion time of a programming project based on how long similar projects have taken in the past, rather than by drawing up a graph of tasks and their expected completion times.
  • Overcoming Bias - is a group blog on the systemic mistakes humans make, and how we can possibly correct them.
  • Paperclip maximizer – is an AI that has been created to maximize the number of paperclips in the universe. It is a hypothetical unfriendly artificial intelligence.
  • Pascal's mugging – is a thought-experiment demonstrating a problem in expected utility maximization. A rational agent should choose actions whose outcomes, when weighed by their probability, have higher utility. But some very unlikely outcomes may have very great utilities, and these utilities can grow faster than the probability diminishes. Hence the agent should focus more on vastly improbable cases with implausibly high rewards.
  • Password - The answer you guess instead of actually understanding the problem.
  • Philosophical zombie - a hypothetical entity that looks and behaves exactly like a human (often stipulated to be atom-by-atom identical to a human) but is not actually conscious: philosophical zombies are often said to lack qualia or phenomenal consciousness.
  • Phlogiston - the 18th century's answer to the Elemental Fire of the Greek alchemists. Ignite wood, and let it burn. What is the orangey-bright "fire" stuff? Why does the wood transform into ash? To both questions, the 18th-century chemists answered, "phlogiston"....and that was it, you see, that was their answer: "Phlogiston." —Fake Causality
  • Possibility - words in natural language carry connotations that may become misleading when the words get applied with technical precision. While it's not technically a lie to say that it's possible to win a lottery, the statement is deceptive. It's much more precise, for communication of the actual fact through connotation, to say that it’s impossible to win the lottery. This is an example of antiprediction.
  • Possible world - is one that is internally consistent, even if it is counterfactual.
  • Prediction market - speculative markets created for the purpose of making predictions. Assets are created whose final cash value is tied to a particular event or parameter. The current market prices can then be interpreted as predictions of the probability of the event or the expected value of the parameter.
  • Priors - refer generically to the beliefs an agent holds regarding a fact, hypothesis or consequence, before being presented with evidence.
  • Probability is in the Mind - Probabilities express uncertainty, and it is only agents who can be uncertain. A blank map does not correspond to a blank territory. Ignorance is in the mind.
  • Probability theory - a field of mathematics which studies random variables and processes.
  • Rationality - the characteristic of thinking and acting optimally. An agent is rational if it wields its intelligence in such a way as to maximize the convergence between its beliefs and reality; and acts on these beliefs in such a manner as to maximize its chances of achieving whatever goals it has. For humans, this means mitigating (as much as possible) the influence of cognitive biases.
  • Rational evidence - the broadest possible sense of evidence, the Bayesian sense. Rational evidence about a hypothesis H is any observation which has a different likelihood depending on whether H holds in reality or not. Rational evidence is distinguished from narrower forms of evidence, such as scientific evidence or legal evidence. For a belief to be scientific, you should be able to do repeatable experiments to verify the belief. For evidence to be admissible in court, it must e.g. be a personal observation rather than hearsay.
  • Rationalist taboo - a technique for fighting muddles in discussions. By prohibiting the use of a certain word and all the words synonymous to it, people are forced to elucidate the specific contextual meaning they want to express, thus removing ambiguity otherwise present in a single word. Mainstream philosophy has a parallel procedure called "unpacking" where doubtful terms need to be expanded out.
  • Rationality and Philosophy - A sequence by lukeprog examining the implications of rationality and cognitive science for philosophical method.
  • Rationality as martial art - A metaphor for rationality as the martial art of mind; training brains in the same fashion as muscles. The metaphor is intended to have complex connotations, rather than being strictly positive. Do modern-day martial arts suffer from being insufficiently tested in realistic fighting, and do attempts at rationality training run into the same problem?
  • Reversal test - a technique for fighting status quo bias in judgments about the preferred value of a continuous parameter. If one deems the change of the parameter in one direction to be undesirable, the reversal test is to check that either the change of that parameter in the opposite direction (away from status quo) is deemed desirable, or that there are strong reasons to expect that the current value of the parameter is (at least locally) the optimal one.
  • Reductionism - a disbelief that the higher levels of simplified multilevel models are out there in the territory, that concepts constructed by the mind in themselves play a role in the behavior of reality. This doesn't contradict the notion that the concepts used in simplified multilevel models refer to actual clusters of configurations of reality.
  • Religion - a complex group of human activities — involving tribal affiliation, belief in belief, supernatural claims, and a range of shared group practices such as worship meetings, rites of passage, etc.
  • Reversed stupidity is not intelligence - "The world's greatest fool may say the Sun is shining, but that doesn't make it dark out."
  • Science - a method for developing true beliefs about the world. It works by developing hypotheses about the world, creating experiments that would allow the hypotheses to be tested, and running the experiments. By having people publish their falsifiable predictions and their experimental results, science protects itself from individuals deceiving themselves or others.
  • Scoring rule - a measure of the performance of probabilistic predictions made under uncertainty.
  • Seeing with Fresh Eyes - A sequence on the incredibly difficult feat of getting your brain to actually think about something, instead of instantly stopping on the first thought that comes to mind.
  • Semantic stopsign – is a meaningless generic explanation that creates an illusion of giving an answer, without actually explaining anything.
  • Shannon information - The Shannon entropy is a measure of the average information content one is missing when one does not know the value of the random variable.
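The definition can be sketched in a few lines of Python, computing entropy in bits as the probability-weighted average surprisal log2(1/p):

```python
import math

def entropy(probs):
    """Shannon entropy in bits: the expected surprisal log2(1/p)."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))   # 1.0 bit: a fair coin
print(entropy([0.25] * 4))   # 2.0 bits: a fair four-sided die
print(entropy([1.0]))        # 0.0 bits: a certain outcome is no news
```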
  • Shut up and multiply - the ability to trust the math even when it feels wrong.
  • Signaling - "a method of conveying information among not-necessarily-trustworthy parties by performing an action which is more likely or less costly if the information is true than if it is not true".
  • Solomonoff induction - A formalized version of Occam's razor based on Kolmogorov complexity.
  • Sound argument - an argument that is valid and whose premises are all true. In other words, the premises are true and the conclusion necessarily follows from them, making the conclusion true as well.
  • Spaced repetition - is a technique for building long-term knowledge efficiently. It works by showing you a flash card just before a computer model predicts you will have forgotten it. Anki is Less Wrong's spaced repetition software of choice
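As an illustration only, here is a toy scheduler (hypothetical, and far simpler than Anki's actual algorithm): the review interval grows after each successful recall and resets to one day on a lapse.

```python
# Toy spaced-repetition scheduler. The growth factor 2.5 is an
# arbitrary illustrative choice, not taken from any real system.
def next_interval(days, recalled, growth=2.5):
    return days * growth if recalled else 1

interval = 1.0
history = []
for recalled in [True, True, True, False, True]:
    interval = next_interval(interval, recalled)
    history.append(interval)
print(history)  # [2.5, 6.25, 15.625, 1, 2.5]
```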
  • Statistical bias - "Bias" as used in the field of statistics refers to directional error in an estimator. Statistical bias is error you cannot correct by repeating the experiment many times and averaging together the results.
  • Steel man - the strongest possible version of an opposing argument; the opposite of a Straw Man.
  • Superstimulus - an exaggerated version of a stimulus to which there is an existing response tendency, or any stimulus that elicits a response more strongly than the stimulus for which it evolved.
  • Surprise - Recognizing a fact that disagrees with your intuition as surprising is an important step in updating your worldview.
  • Sympathetic magic - Humans seem to naturally generate a series of concepts known as sympathetic magic, a host of theories and practices which have certain principles in common, two of which are of overriding importance: the Law of Contagion holds that two things which have interacted, or were once part of a single entity, retain their connection and can exert influence over each other; the Law of Similarity holds that things which are similar or treated the same establish a connection and can affect each other.
  • Tapping Out - The appropriate way to signal that you've said all you wanted to say on a particular topic, and that you're ending your participation in a conversation lest you start saying things that are less worthwhile. It doesn't mean accepting defeat or claiming victory and it doesn't mean you get the last word. It just means that you don't expect your further comments in a thread to be worthwhile, because you've already made all the points you wanted to, or because you find yourself getting too emotionally invested, or for any other reason you find suitable.
  • Technical explanation - A technical explanation is an explanation of a phenomenon that makes you anticipate certain experiences. A proper technical explanation controls anticipation strictly, weighting your priors and evidence precisely to create the justified amount of uncertainty. Technical explanations are contrasted with verbal explanations, which give the impression of understanding without actually producing the proper expectation.
  • Teleology - The study of things that happen for the sake of their future consequences. The fallacious meaning of it is that events are the result of future events. The non-fallacious meaning is that it is the study of things that happen because of their intended results, where the intention existed in an actual mind in the prior past, and so was causally able to bring about the event by planning and acting.
  • The map is not the territory – the idea that our perception of the world is being generated by our brain and can be considered as a 'map' of reality written in neural patterns. Reality exists outside our mind but we can construct models of this 'territory' based on what we glimpse through our senses.
  • Third option - is a way to break a false dilemma, showing that neither of the suggested solutions is a good idea.
  • Traditional rationality - "Traditional Rationality" refers to the tradition passed down by reading Richard Feynman's "Surely You're Joking", Thomas Kuhn's "The Structure of Scientific Revolutions", Martin Gardner's "Science: Good, Bad, and Bogus", Karl Popper on falsifiability, or other non-technical material on rationality. Traditional Rationality is a very large improvement over nothing at all, and very different from Hollywood rationality; people who grew up on this belief system are definitely fellow travelers, and where most of our recruits come from. But you can do even better by adding math, science, formal epistemic and instrumental rationality; experimental psychology, cognitive science, deliberate practice, in short, all the technical stuff. There's also some popular tropes of Traditional Rationality that actually seem flawed once you start comparing them to a Bayesian standard - for example, the idea that you ought to give up an idea once definite evidence has been provided against it, but you're allowed to believe until then, if you want to. Contrast to the stricter idea of there being a certain exact probability which it is correct to assign, continually updated in the light of new evidence.
  • Trivial inconvenience - inconveniences that take few resources to counteract but have a disproportionate impact on people deciding whether to take a course of action.
  • Truth - the correspondence between one's beliefs about reality and reality.
  • Tsuyoku naritai - the will to transcendence. Japanese: "I want to become stronger."
  • Twelve virtues of rationality
    1. Curiosity – the burning itch
    2. Relinquishment – “That which can be destroyed by the truth should be.” -P. C. Hodgell
    3. Lightness – follow the evidence wherever it leads
    4. Evenness – resist selective skepticism; use reason, not rationalization
    5. Argument – do not avoid arguing; strive for exact honesty; fairness does not mean balancing yourself evenly between propositions
    6. Empiricism – knowledge is rooted in empiricism and its fruit is prediction; argue what experiences to anticipate, not which beliefs to profess
    7. Simplicity – is virtuous in belief, design, planning, and justification; ideally: nothing left to take away, not nothing left to add
    8. Humility – take specific actions in anticipation of your own errors; do not boast of modesty; no one achieves perfection
    9. Perfectionism – seek the answer that is *perfectly* right – do not settle for less
    10. Precision – the narrowest statements slice deepest; don’t walk but dance to the truth
    11. Scholarship – absorb the powers of science
    12. [The void] (the nameless virtue) – “More than anything, you must think of carrying your map through to reflecting the territory.”
  • Understanding - is more than just memorization of detached facts; it requires ability to see the implications across a variety of possible contexts.
  • Universal law - the idea that everything in reality always behaves according to the same uniform physical laws; there are no exceptions and no alternatives.
  • Unsupervised universe - a thought experiment developed to counter undue optimism: not just the sort due to explicit theology, but in particular a disbelief in the Future's vulnerability, a reluctance to accept that things could really turn out wrong. It involves imagining a benevolent god simulating a universe, e.g. Conway's Game of Life, and asking the mathematical question of what would happen according to the standard Life rules given certain initial conditions - a question whose answer even God cannot control; although, of course, God always intervenes in the actual Life universe.
  • Valid argument - An argument is valid when its conclusion follows logically from its premises; a valid argument contains no logical fallacies, though its premises may still be false.
  • Valley of bad rationality - It has been observed that when someone is just starting to learn rationality, they appear to be worse off than they were before. Others, with more experience at rationality, claim that after you learn more about rationality, you will be better off than you were before you started. The period before this improvement is known as "the valley of bad rationality".
  • Wisdom of the crowd – is the collective opinion of a group of individuals rather than that of a single expert. A large group's aggregated answers to questions involving quantity estimation, general world knowledge, and spatial reasoning have generally been found to be as good as, and often better than, the answer given by any of the individuals within the group.
  • Words can be wrong – There are many ways that words can be wrong; it is for this reason that we should avoid arguing by definition. Instead, to facilitate communication we can taboo and reduce: we can replace the symbol with the substance and talk about facts and anticipations, not definitions.
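The "wisdom of the crowd" entry above can be illustrated with a short simulation. This is a hypothetical sketch - the jar, the noise level and the crowd size are invented numbers, not something from the glossary itself:

```python
import random
import statistics

random.seed(0)  # reproducible run

TRUE_VALUE = 850  # say, the number of jellybeans in a jar

# Each person's guess is the true value plus independent, unbiased noise.
guesses = [TRUE_VALUE + random.gauss(0, 100) for _ in range(1000)]

crowd_estimate = statistics.mean(guesses)
crowd_error = abs(crowd_estimate - TRUE_VALUE)
avg_individual_error = statistics.mean(abs(g - TRUE_VALUE) for g in guesses)
```

With independent, unbiased errors the crowd's error shrinks roughly as sigma/sqrt(n), so the averaged estimate beats the typical individual by a wide margin. Note the caveat: if the errors are correlated (everyone shares the same bias), the effect disappears.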


Barriers, biases, fallacies, impediments and problems

  • Akrasia - the state of acting against one's better judgment. Note that, for example, if you are procrastinating because it's not in your best interest to complete the task you are delaying, it is not a case of akrasia.
  • Alief - an independent source of emotional reaction which can coexist with a contradictory belief. For example, the fear felt when a monster jumps out of the darkness in a scary movie is based on the alief that the monster is about to attack you, even though you believe that it cannot.
  • Effort Shock - the unpleasant discovery of how hard it is to accomplish something.


  • Ambient decision theory - A variant of updateless decision theory that uses first-order logic instead of a mathematical intuition module (MIM), emphasizing the way an agent can control which mathematical structure a fixed definition defines; an aspect of UDT separate from its own emphasis on not making the mistake of updating away things one can still acausally control.
  • Ask, Guess and Tell culture - The two basic rules of Ask Culture: 1) Ask when you want something. 2) Interpret things as requests and feel free to say "no". The two basic rules of Guess Culture: 1) Ask for things if, and *only* if, you're confident the person will say "yes". 2) Interpret requests as expectations of "yes", and, when possible, avoid saying "no". The two basic rules of Tell Culture: 1) Tell the other person what's going on in your own mind whenever you suspect you'd both benefit from them knowing. (Do NOT assume others will accurately model your mind without your help, or that it will even occur to them to ask you questions to eliminate their ignorance.) 2) Interpret things people tell you as attempts to create common knowledge for shared benefit, rather than as requests or as presumptions of compliance.
  • Burch's law – “I think people should have a right to be stupid and, if they have that right, the market's going to respond by supplying as much stupidity as can be sold.” —Greg Burch. A corollary of Burch's Law is that any bias should be regarded as a potential vulnerability whereby the market can trick one into buying something one doesn't really want.
  • Challenging the Difficult - A sequence on how to do things that are difficult or "impossible".
  • Cognitive style - Certain cognitive styles might tend to produce more accurate results. A common distinction between cognitive styles is that of foxes vs. hedgehogs. Hedgehogs view the world through the lens of a single defining idea and foxes draw on a wide variety of experiences and for whom the world cannot be boiled down to a single idea. Foxes tend to be better calibrated and more accurate.
  • Consequentialism - the ethical theory that people should choose the action that will result in the best outcome.
  • Crocker's rules - By declaring commitment to Crocker's rules, one authorizes other debaters to optimize their messages for information, even when this entails that emotional feelings will be disregarded. This means that you have accepted full responsibility for the operation of your own mind, so that if you're offended, it's your own fault.
  • Dark arts - refers to rhetorical techniques crafted to exploit human cognitive biases in order to persuade, deceive, or otherwise manipulate a person into irrationally accepting beliefs perpetuated by the practitioner of the Arts. Use of the dark arts is especially common in sales and similar situations (known as hard sell in the sales business) and promotion of political and religious views.
  • Egalitarianism - the idea that everyone should be considered equal: equal in merit, equal in opportunity, equal in morality, and equal in achievement. Dismissing egalitarianism is not opposed to humility, even though from the signaling perspective it seems to be opposed to modesty.
  • Expected utility - the expected value in terms of the utility produced by an action. It is the sum of the utility of each of its possible consequences, individually weighted by their respective probability of occurrence. A rational decision maker will, when presented with a choice, take the action with the greatest expected utility.
  • Explaining vs. explaining away – Explaining something does not subtract from its beauty. It in fact heightens it. Through understanding it, you gain greater awareness of it. Through understanding it, you are more likely to notice its similarities and interrelationships with other things. Through understanding it, you become able to see it not only on one level, but on many. In regards to the delusions which people are emotionally attached to, that which can be destroyed by the truth should be.
  • Fuzzies - A hypothetical measurement unit for "warm fuzzy feeling" one gets from believing that one has done good. Unlike utils, fuzzies can be earned through psychological tricks without regard for efficiency. For this reason, it may be a good idea to separate the concerns for actually doing good, for which one might need to shut up and multiply, and for earning fuzzies, to get psychological comfort.
  • Game theory - attempts to mathematically model interactions between individuals.
  • Generalizing from One Example - an incorrect generalisation when you only have direct first-person knowledge of one mind, psyche or social circle and you treat it as typical even in the face of contrary evidence.
  • Goodhart’s law - states that once a certain indicator of success is made a target of a social or economic policy, it will lose the information content that would qualify it to play such a role. People and institutions try to achieve their explicitly stated targets in the easiest way possible, often obeying only the letter of the law, and often in a way that the designers of the law did not anticipate or want. For example, Soviet factories, when given targets based on the number of nails, produced many tiny useless nails, and when given targets based on weight, produced a few giant nails.
  • Hedonism - refers to a set of philosophies which hold that the highest goal is to maximize pleasure, or more precisely pleasure minus pain.
  • Humans Are Not Automatically Strategic - most courses of action are extremely ineffective, and most of the time there has been no strong evolutionary or cultural force sufficient to focus us on the very narrow behavior patterns that would actually be effective. When this is coupled with the fact that people tend to spend a lot less effort on planning how to reach a goal than on simply trying to achieve it, you end up with the conclusion that humans are not automatically strategic.
  • Human universal - Donald E. Brown has compiled a list of over a hundred human universals - traits found in every culture ever studied, most of them so universal that anthropologists don't even bother to note them explicitly.
  • Instrumental value - a value pursued for the purpose of achieving other values. Values which are pursued for their own sake are called terminal values.
  • Intellectual roles - Group rationality may be improved when members of the group take on specific intellectual roles. While these roles may be incomplete on their own, each embodies an aspect of proper rationality. If certain roles are biased against, purposefully adopting them might reduce bias.
  • Lonely Dissenters suffer social disapproval, but are required - Asch's conformity experiment showed that the presence of a single dissenter tremendously reduced the incidence of "conforming" wrong answers.
  • Loss Aversion - is risk aversion's evil twin. A loss-averse agent tends to avoid uncertain gambles, not because every unit of money brings him a bit less utility, but because he weighs losses more heavily than gains, always treating his current level of money as somehow special.
  • Luminosity - reflective awareness. A luminous mental state is one that you have and know that you have. It could be an emotion, a belief or alief, a disposition, a quale, a memory - anything that might happen or be stored in your brain. What's going on in your head?
  • Marginally zero-sum game also known as 'arms race' - A zero-sum game where the efforts of each player not just give them a benefit at the expense of the others, but decrease the efficacy of everyone's past and future actions, thus making everyone's actions extremely inefficient in the limit.
  • Moral Foundations theory - the theory that all moral rules in all human cultures appeal to six moral foundations: care/harm, fairness/cheating, liberty/oppression, loyalty/betrayal, authority/subversion, sanctity/degradation. This makes other people's moralities easier to understand, and is an interesting lens through which to examine your own.
  • Moral uncertainty – is uncertainty about how to act given the diversity of moral doctrines. Moral uncertainty includes a level of uncertainty above the more usual uncertainty of what to do given incomplete information, since it deals also with uncertainty about which moral theory is right. Even with complete information about the world, this kind of uncertainty would still remain.
  • Paranoid debating - a group estimation game in which one player, unknown to the others, tries to subvert the group estimate.
  • Politics as charity: in terms of expected value, altruism is a reasonable motivator for voting (as opposed to common motivators like "wanting to be heard").
  • Prediction - a statement or claim that a particular event will occur in the future in more certain terms than a forecast.
  • Privileging the question - questions that someone has unjustifiably brought to your attention in the same way that a privileged hypothesis unjustifiably gets brought to your attention. Examples are: should gay marriage be legal? Should Congress pass stricter gun control laws? Should immigration policy be tightened or relaxed? The problem with privileged questions is that you only have so much attention to spare. Attention paid to a question that has been privileged funges against attention you could be paying to better questions. Even worse, it may not feel from the inside like anything is wrong: you can apply all of the epistemic rationality in the world to answering a question like "should Congress pass stricter gun control laws?" and never once ask yourself where that question came from and whether there are better questions you could be answering instead.
  • Radical honesty - a communication technique proposed by Brad Blanton in which discussion partners are not permitted to lie or deceive at all. Rather than being designed to enhance group epistemic rationality, radical honesty is designed to reduce stress and remove the layers of deceit that burden much of discourse.
  • Reflective decision theory - a term occasionally used to refer to a decision theory that would allow an agent to take actions in a way that does not trigger regret. This regret is conceptualized, according to the Causal Decision Theory, as a Reflective inconsistency, a divergence between the agent who took the action and the same agent reflecting upon it after.
  • Schelling point – is a solution that people will tend to use in the absence of communication, because it seems natural, special, or relevant to them.
  • Schelling fences and slippery slopes – a slippery slope is something that affects people's willingness or ability to oppose future policies. Slippery slopes can sometimes be avoided by establishing a "Schelling fence" - a Schelling point that the various interest groups involved - or yourself across different values and times - make a credible precommitment to defend.
  • Something to protect - The Art must have a purpose other than itself, or it collapses into infinite recursion.
  • Status - Real or perceived relative measure of social standing, which is a function of both resource control and how one is viewed by others.
  • Take joy in the merely real – If you believe that science coming to know about something places it into the dull catalogue of common things, then you're going to be disappointed in pretty much everything eventually —either it will turn out not to exist, or even worse, it will turn out to be real. Another way to think about it is that if the magical and mythical were common place they would be merely real. If dragons were common, but zebras were a rare legendary creature then there's a certain sort of person who would ignore dragons, who would never bother to look at dragons, and chase after rumors of zebras. The grass is always greener on the other side of reality. If we cannot take joy in the merely real, our lives shall be empty indeed.
  • The Science of Winning at Life - A sequence by lukeprog that summarizes scientifically-backed advice for "winning" at everyday life: in one's productivity, in one's relationships, in one's emotions, etc. Each post concludes with footnotes and a long list of references from the academic literature.
  • Timeless decision theory - a decision theory, which in slogan form, says that agents should decide as if they are determining the output of the abstract computation that they implement. This theory was developed in response to the view that rationality should be about winning (that is, about agents achieving their desired ends) rather than about behaving in a manner that we would intuitively label as rational.
  • Unfriendly artificial intelligence - is an artificial general intelligence capable of causing great harm to humanity, and having goals that make it useful for the AI to do so. The AI's goals don't need to be antagonistic to humanity's goals for it to be Unfriendly; there are strong reasons to expect that almost any powerful AGI not explicitly programmed to be benevolent to humans is lethal.
  • Updateless decision theory – a decision theory in which we give up the idea of doing Bayesian reasoning to obtain a posterior distribution etc. and instead just choose the action (or more generally, the probability distribution over actions) that will maximize the unconditional expected utility.
  • Ugh field - Pavlovian conditioning can cause humans to unconsciously flinch from even thinking about a serious personal problem they have. We call it an "ugh field". The ugh field forms a self-shadowing blind spot covering an area desperately in need of optimization.
  • Utilitarianism - A moral philosophy that says that what matters is the sum of everyone's welfare, or the "greatest good for the greatest number".
  • Utility - how much a certain outcome satisfies an agent’s preferences.
  • Utility function - assigns numerical values ("utilities") to outcomes, in such a way that outcomes with higher utilities are always preferred to outcomes with lower utilities. These do not work very well in practice for individual humans.
  • Wanting and liking - The reward system consists of three major components:
    • Liking: The 'hedonic impact' of reward, comprised of (1) neural processes that may or may not be conscious and (2) the conscious experience of pleasure.
    • Wanting: Motivation for reward, comprised of (1) processes of 'incentive salience' that may or may not be conscious and (2) conscious desires.
    • Learning: Associations, representations, and predictions about future rewards, comprised of (1) explicit predictions and (2) implicit knowledge and associative conditioning (e.g. Pavlovian associations).
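The "expected utility" and "utility function" entries above can be made concrete with a few lines of code. This is a minimal sketch; the actions, probabilities and utilities below are invented purely for illustration:

```python
# Each action maps to a list of (probability, utility) outcome pairs.
actions = {
    "take umbrella":  [(0.3, 5), (0.7, 8)],    # rain / no rain
    "leave umbrella": [(0.3, -10), (0.7, 10)],
}

def expected_utility(outcomes):
    # Sum each outcome's utility, weighted by its probability.
    return sum(p * u for p, u in outcomes)

# A rational decision maker takes the action with the greatest expected utility.
best_action = max(actions, key=lambda a: expected_utility(actions[a]))
```

Here EU(take umbrella) = 0.3*5 + 0.7*8 = 7.1 while EU(leave umbrella) = 0.3*(-10) + 0.7*10 = 4.0, so "take umbrella" wins even though its best-case payoff is smaller: the small chance of a large loss dominates.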


  • Beliefs require observations - To form accurate beliefs about something, you really do have to observe it. This can be viewed as a special case of the second law of thermodynamics, in fact, since "knowledge" is correlation of belief with reality, which is mutual information, which is a form of negentropy.
  • Complexity of value - the thesis that human values have high Kolmogorov complexity and so cannot be summed up or compressed into a few simple rules. It includes the idea of fragility of value which is the thesis that losing even a small part of the rules that make up our values could lead to results that most of us would now consider as unacceptable.
  • Egan's law - "It all adds up to normality." — Greg Egan. The purpose of a theory is to add up to observed reality, rather than something else. Science sets out to answer the question "What adds up to normality?" and the answer turns out to be that quantum mechanics adds up to normality. A weaker extension of this principle applies to ethical and meta-ethical debates, which generally ought to end up explaining why you shouldn't eat babies, rather than why you should.
  • Emotion - Contrary to the stereotype, rationality doesn't mean denying emotion. When emotion is appropriate to the reality of the situation, it should be embraced; only when emotion isn't appropriate should it be suppressed.
  • Futility of chaos - A complex of related ideas having to do with the impossibility of generating useful work from entropy; a position which holds against ideas such as: our artistic creativity stems from the noisiness of human neurons; randomized algorithms can exhibit performance inherently superior to deterministic algorithms; and the human brain is a chaotic system, which explains its power (non-chaotic systems cannot exhibit intelligence).
  • General knowledge - Interdisciplinary, generally applicable knowledge is rarely taught explicitly. Yet it's important to have at least basic knowledge of many areas (as opposed to deep narrowly specialized knowledge), and to apply it to thinking about everything.
  • Hope - Persisting in clutching to a hope may be disastrous. Be ready to admit you lost, update on the data that says you did.
  • Humility – “To be humble is to take specific actions in anticipation of your own errors. To confess your fallibility and then do nothing about it is not humble; it is boasting of your modesty.” —Twelve Virtues of Rationality Not to be confused with social modesty, or motivated skepticism (aka disconfirmation bias).
  • I don't know - in real life, you are constantly making decisions under uncertainty: the null plan is still a plan, refusing to choose is itself a choice, and by your choices, you implicitly take bets at some odds, whether or not you explicitly conceive of yourself as doing so.
  • Litany of Gendlin – “What is true is already so. Owning up to it doesn't make it worse. Not being open about it doesn't make it go away. And because it's true, it is what is there to be interacted with. Anything untrue isn't there to be lived. People can stand what is true, for they are already enduring it.” —Eugene Gendlin
  • Litany of Tarski – “If the box contains a diamond, I desire to believe that the box contains a diamond; If the box does not contain a diamond, I desire to believe that the box does not contain a diamond; Let me not become attached to beliefs I may not want. “ —The Meditation on Curiosity
  • Lottery - A tax on people who are bad at math. Also, a waste of hope. You will not win the lottery.
  • Magic - What seems to humans like a simple explanation, sometimes isn't at all. In our own naturalistic, reductionist universe, there is always a simpler explanation. Any complicated thing that happens, happens because there is some physical mechanism behind it, even if you don't know the mechanism yourself (which is most of the time). There is no magic.
  • Modesty argument - the claim that when two or more rational agents have common knowledge of a disagreement over the likelihood of an issue of simple fact, they should each adjust their probability estimates in the direction of the others'. This process should continue until the two agents are in full agreement. Inspired by Aumann's agreement theorem.
  • No safe defense - Authorities can be trusted exactly as much as a rational evaluation of the evidence deems them trustworthy, no more and no less. There's no one you can trust absolutely; the full force of your skepticism must be applied to everything.
  • Offense - It is hypothesized that the emotion of offense appears when one perceives an attempt to gain status.
  • Slowness of evolution - The tremendously slow timescale of evolution, especially for creating new complex machinery (as opposed to selecting on existing variance), is why the behavior of evolved organisms is often better interpreted in terms of what did in fact work yesterday, rather than what will work in the future.
  • Stupidity of evolution - Evolution can only access a very limited area in the design space, and can only search for the new designs very slowly, for a variety of reasons. The wonder of evolution is not how intelligently it works, but that an accidentally occurring optimizer without a brain works at all.

One model of understanding independent differences in sensory perception

17 Elo 20 September 2015 09:32PM

This week my friend Anna said to me: "I just discovered my typical mind fallacy around visualisation is wrong". Naturally I was perplexed and confused. She said:

“When I was in second grade the teacher had the class do an exercise in visualization. The students sat in a circle and the teacher instructed us to picture an ice cream cone with our favorite ice cream. I thought about my favorite type of cone and my favorite flavor, but the teacher emphasized "picture this in your head, see the ice cream." I tried this, and nothing happened. I couldn't see anything in my head, let alone an ice cream. I concluded, in my childish vanity, that no one could see things in their head, "visualizing" must just be strong figurative language for "pretending," and the exercise was just boring.”


Typical mind fallacy being: "everyone thinks like me" (or atypical mind fallacy: "no one thinks like me"). My good friend had discovered (a long time ago) that she had no visualisation function, but only recently made sense of it (approximately 15-20 years later). Anna came to me upset: "I am missing out on a function of the brain; limited in my experiences". Yes; true. She was. And we talked about it and tried to measure and understand that loss in better terms. The next day Anna was back, resolved to feeling better about it. She had realised the value of individual differences in humans, and accepted that whatever she was missing, she was compensating for it by being an ordinary functional human (give or take a few things here and there); perhaps there were even some advantages.


Together we set off down the road of evaluating the concept of the visualisation sense. So bearing in mind; that we started with "visualise an ice cream"... Here is what we covered.

Close your eyes for a moment (after reading this paragraph). You can see the "blackness", but you can also see the white sparkles/splotches and some red stuff (maybe beige), as well as the echo-y shadows of what you last looked at, probably your white computer screen. They echo and bounce around your vision. That's pretty easy. Now close your eyes and picture an ice cream cone. For me, the visualisation-imagination space is not in my visual field, but what I do have is a canvas somewhere on which I draw that ice cream, and anything else I visualise. It’s definitely in a different place. (We will come back to "where" it is later.)

So either you have this "notepad" or "canvas" in your head for the visual perception space or you do not. Well, it’s more like a spectrum of strength of visualisation, where some people will visualise clear and vivid things, and others will have (for lack of better terms) "grey", "echoes", shadows, or foggy visualisation, where drawing that ice cream is a really hard thing to do. Anna describes what she can get now in adulthood as a vague kind of bas relief of an image, like an after effect. So it should help you model other people to understand that different people can visualise better or worse. (Probably not a big deal yet; just wait.)


It occurs that there are other canvases; not just for the visual space but for smell and taste as well. So now try to canvas up some smells of lavender or rose, or some soap. You will probably find soap is possible to do; being of memorable and regular significance. The taste of chocolate; kind of appears from all those memories you have; as does cheese; lemon and salt; (but of course someone is screaming at the page about how they don't understand when I say that chocolate "kind of appears”, because it’s very very vivid to them, and someone else can smell soap but it’s quite far away and grey/cloudy).


It occurs to me now that as a teenage male I never cared about my odour; and that I regularly took feedback from some people about the fact that I should deal with that, (personal lack of noticing aside), and I would wonder why a few people would care a lot; and others would not ever care. I can make sense of these happenings by theorising that these people have a stronger smell canvas/faculty than other people. Which makes a whole lot of reasonable sense.

Interesting yet? There is more.

This is a big one.

Sound. But more specifically music. I have explored the insight of having a canvas for these senses with several people over the past week, and noted that Anna from the story above confidently boasts an over-active music canvas, with tunes always going on in her head. For a very long time I decided that I was just not a person who cared about music, and never really knew to ask or try to explain why; just that it doesn't matter to me. Now I have a model.


I can canvas music as it happens, in real time, and reproduce it as a tune; but I have no canvas for imagining auditory sounds without stimulation. (What inspired the entire write-up here was someone saying how it finally made them understand why they didn't make sense of other people's interest in sounds and music.) If you ask me to "hear" the C note on my auditory canvas, I literally have no canvas on which to "draw" that note. I can probably hum a C (although I am not sure how), but I can't play that thing in my head.

Interestingly, I asked a very talented pianist, and the response was "of course I have a musical canvas" (to my slight disappointment). She mentioned it being a big space, and a trained thing as well: as a professional concert pianist she can play fully imagined practice on a not-real piano and hear a whole piece. Which makes for excellent practice when waiting for other things to happen (waiting rooms, queues, public transport...).


Anna from the beginning is not a musician, and says her head-music is not always pleasant but simply satisfactory to her. Sometimes songs she has heard, but mostly noises her mind produces. And words, always words. She speaks quickly and fluently, because her thoughts occur to her in words fully formed. 

I don't care very much about music because I don't "see" (imagine) it. Songs do get stuck in my head but they are more like echoes of songs I have just heard, not ones I can canvas myself.


Now to my favourite sense. My sense of touch. My biggest canvas is my touch canvas. "feel the weight on your shoulders?", I can feel that. "Wind through your hair?", yes. The itch; yes, The scrape on your skin, The rough wall, the sand between your toes. All of that. 


It occurs to me that this explains a lot of details of my life that never really came together. When I was little I used to touch a lot of things; my parents were notorious for shouting my name just as I reached to grab things. I was known as a "bull in a china shop", because I would touch everything and move everything and feel everything and get into all kinds of trouble with my touch. I once found myself walking along next to a building while swiping my hand along it. I was with a friend who was trying out drugs (weed); she put her hands on the wall and remarked how this would be interesting to touch while high. At the time I probably said something like "right, okay". And now I understand just what everyone else is missing out on.


I spend most days wearing as few clothes as possible (while being normal and modest), and I still pick up odd objects around me. There is a form of autism where people are super-sensitive to touch and any touch upsets or distracts them; a solution is to wear tight-fitting clothing to dull the senses. I completely understand that, and what it means to have a noisy touch canvas.

All I can say to someone is that you have no idea what you are missing out on; and before this week – neither did I. But from today I can better understand myself and the people around me.


There is something to be said for various methods of thinking; some people “think the words”, and some people don’t think in words, they think in pictures or concepts.  I can’t cover that in this post; but keep that in mind as well for “the natural language of my brain”


One more exercise (try to play along – it pays off). Can you imagine 3 lines, connected; an equilateral triangle on a 2D plane. Rotate that around; good (some people will already be unable to do this). Now draw three more of these. Easy for some. Now I want you to line them up so that the three triangles are around the first one. Now fold the shape into a 3D shape.

How many corners?

How many edges?

How many faces?

Okay good. Now I want you to draw a 2D square. Simple. Now add another 4 triangles. Then, like before, surround the square with the triangles and fold it into a pyramid. Again:

How many edges?

How many corners?

How many faces?


Now I want you to take the previous triangle shape; and attach it to one of the triangles of the square-pyramid shape. Got it?

Now how many corners?

How many edges?

How many faces?


That was easy, right? Maybe not that last step. So it turns out I am not a super visualiser. I know this because those people who are super visualisers will find that when they place the triangular pyramid onto the square pyramid, the side faces of the triangular pyramid merge with those of the square pyramid into rhombi, effectively making 1 face out of 2 triangle faces and removing an edge (and doing that twice over, for two sides of the shape). Those who understand will be going “duh”, and those who don’t will be going “huh? what happened?”
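If the merge is hard to "see", the counting can be done without any canvas at all. A minimal sketch of the bookkeeping (the per-solid counts are the standard ones; the code itself is not from the original post):

```python
# Vertex/edge/face counts for the two solids before gluing.
tetrahedron = {"V": 4, "E": 6, "F": 4}      # the triangle shape
square_pyramid = {"V": 5, "E": 8, "F": 5}   # the square-based shape

# Glue one triangular face of each together: the shared face's
# 3 vertices and 3 edges are identified, and both glued faces vanish.
V = tetrahedron["V"] + square_pyramid["V"] - 3
E = tetrahedron["E"] + square_pyramid["E"] - 3
F = tetrahedron["F"] + square_pyramid["F"] - 2

# The surprise: two pairs of adjacent triangles come out coplanar and
# merge into rhombi; each merge removes one face and one edge.
F -= 2
E -= 2

euler = V - E + F  # Euler's formula: should equal 2 for such a solid
```

The final counts are V=6, E=9, F=5, and 6 - 9 + 5 = 2 as Euler's formula requires; 5 faces rather than the naive 7 you'd get by just adding the two solids and subtracting the glued pair.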


Pretty cool right?


Don’t believe me?  Don’t worry - there is a good explanation for those who don’t see it right away - at this link 
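For readers who'd rather count than visualise, here's a quick arithmetic sanity check of the shape-gluing puzzle above (this sketch is mine, not part of the original post). It uses inclusion-exclusion on the shared triangle, then Euler's formula V − E + F = 2 as a cross-check.

```python
# Square pyramid: 5 corners, 8 edges, 5 faces.
# Tetrahedron:    4 corners, 6 edges, 4 faces.
# Gluing them shares one triangle (3 corners, 3 edges),
# and both glued faces disappear from the surface.
corners = 5 + 4 - 3          # 6
edges   = 8 + 6 - 3          # 11
faces   = 5 + 4 - 2          # 7

# The surprise: two pairs of adjacent triangles end up coplanar and
# merge into rhombi; each merge removes one edge and one face.
edges -= 2                   # 9
faces -= 2                   # 5

# Euler's formula must hold for any convex polyhedron.
assert corners - edges + faces == 2
print(corners, edges, faces)  # 6 9 5
```

So the combined solid has 5 faces, not the 7 a naive count suggests.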


From a super-visualiser: 

“I would say, for me, visualization is less like having a mental playground, and more like having an entire other pair of eyes.  And there's this empty darkness into which I can insert almost anything.  If it gets too detailed, I might have to stop and close my outer eyes, or I might have to stop moving so I don't walk into anything. That makes it sound like a playground, but there's much more to it than that.


Imagine that you see someone buying something in a shop.  They pay cash, and the red of the twenty catches your eye.  It's pretty, and it's vivid, and it makes you happy.  And if you imagine a camera zooming out, you see red moving from customers to clerks at all the registers.  Not everyone is paying with twenties, but commerce is red, now.  It's like the air flashes and lights up like fireworks, every time somebody buys something.  

And if you keep zooming out, you can see red blurs all over the town, all over the map.  So if you read about international trade, it's almost like the paper comes to life, and some parts of it are highlighted red.  And if you do that for long enough, it becomes a habit, and something really weird starts to happen.  

When someone tells you about their car, there's a little red flash just out the corner of your eye, and you know they probably didn't pay full price, because there's a movie you can watch, and in the time they got the car, they didn't have a job and they were stressed, so there's not as much red in that part of the movie, so there has to be some way they got the car without losing even more red.  But it's not just colors, and it's definitely not just money.  


Happiness might be shimmering motion.  Connection with friends might be almost a blurring together at the center.  And all these amazing visual metaphors that you usually only see in an art gallery are almost literally there in the world, if you look with the other pair of eyes. So sometimes things really do sort of jump out at you, and nobody else noticed them. But it has to start with one thing.  One meaning, one visual metaphor."



Way up top I mentioned the "where" of the visualisation space. It's not really in the eye, a good name for it might be "the mind's eye". My personal visualisation canvas is located back up left tilted downwards and facing forwards.


Synaesthesia is a lot of possible effects. The most well known one is where people associate a colour with a letter: when they think of the letter, they have a sense of a colour that goes with it. Some letters don't have colours; sometimes numbers have colours.


There are other branches of synaesthesia. Locating things in the physical space. Days of the week can be laid out in a row in front of you; numbers can be located somewhere. Some can be heavier than others. Sounds can have weights; Smells can have colours; Musical notes can have a taste. Words can feel rough or smooth.


Synaesthesia is a class of cross-classification done by the brain when interpreting a stimulus; we think it can be caused by crossed wiring in the brain. It's pretty fun. Turns out most people have some kind of synaesthesia, usually to do with weights of numbers, or days being laid out in a row. Sometimes Tuesdays are lower than the other days. Who knows. If you pay attention to how sometimes things have an alternative sensory perception, chances are that's a bit of the natural synaesthete coming out.

So what now?

Synaesthesia is supposed to make you smarter. Crossing brain faculties should help you remember things better; if you can think of numbers in terms of how heavy they are, you could probably train your system 1 to do simple arithmetic by "knowing" how heavy the answer is. If these associations don't come naturally to you, though, implementing these ideas is no longer low-hanging fruit.


What is a low-hanging fruit? Consider all your "canvases" of thinking; work out which ones you care more about, and which ones don't matter. (Insert link to superpowers and kryptonites: use your strong senses to your advantage, and make sure you avoid relying on your weaker senses.) (Or go on a bender to rebuild your map; influence your territory and train your sensory canvases. Or don't, because that wouldn't be a low-hanging fruit.)

Keep this model around

It can be used for both good and evil. But get the model out there. Talk to people about it. Ask your friends and family if they are able to visualise. Ask about all the senses. Imagine if you suddenly discovered that someone you know can't "smell" things in their imagination, or doesn't know what you mean by "feel this" (seriously, you have no idea what you are missing out on in the touch spectrum in my little bubble).

You are going to have good senses and bad ones. That's okay! The more you know; the more you can use it to your advantage!

Meta: Post write up time 1 hour; plus a week of my social life being dominated by the same conversation over and over with different people where I excitedly explain the most exciting thing of this week.  plus 1hr*4, plus 3 people editing and reviewing, plus a rationality dojo where I presented this topic.


Meta2: I waited 3 weeks for other people to review this. There were no substantial changes and I should not have waited so long. In future I won't wait that long.

A toy model of the control problem

16 Stuart_Armstrong 16 September 2015 02:59PM

EDITED based on suggestions for improving the model

Jaan Tallinn has suggested creating a toy model of the control problem, so that it can be analysed without loaded concepts like "autonomy", "consciousness", or "intentionality". Here is a simple (too simple?) attempt:


A controls B. B manipulates A.

Let B be a robot agent that moves in a two dimensional world, as follows:

continue reading »

Film about Stanislav Petrov

14 matheist 10 September 2015 06:43PM

I searched around but didn't see any mention of this. There's a film being released next week about Stanislav Petrov, the man who saved the world.

The Man Who Saved the World

Due for limited theatrical release in the USA on 18 September 2015.
Will show in New York, Los Angeles, Detroit, Portland.

Previous discussion of Stanislav Petrov:

Notes on Actually Trying

13 AspiringRationalist 23 September 2015 02:53AM

These ideas came out of a recent discussion on actually trying at Citadel, Boston's Less Wrong house.

What does "Actually Trying" mean?

Actually Trying means applying the combination of effort and optimization power needed to accomplish a difficult but feasible goal. The effort and optimization power are both necessary.

Failure Modes that can Resemble Actually Trying

Pretending to try

Pretending to try means doing things that superficially resemble actually trying but are missing a key piece. You could, for example, make a plan related to your goal and diligently carry it out but never stop to notice that the plan was optimized for convenience or sounding good or gaming a measurement rather than achieving the goal. Alternatively, you could have a truly great plan and put effort into carrying it out until it gets difficult.

Trying to Try

Trying to try is when you throw a lot of time and perhaps mental anguish at a task but not actually do the task. Writer's block is the classic example of this.


Sphexing

Sphexing is the act of carrying out a plan or behavior repeatedly despite it not working.

The Two Modes Model of Actually Trying

Actually Trying requires a combination of optimization power and effort, but each of those is done with a very different way of thinking, so it's helpful to do the two separately. In the first way of thinking, Optimizing Mode, you think hard about the problem you are trying to solve, develop a plan, look carefully at whether it's actually well-suited to solving the problem (as opposed to pretending to try) and perhaps Murphy-jitsu it. In Executing Mode, you carry out the plan.

Executing Mode breaks down when you reach an obstacle that you either don't know how to overcome or where the solution is something you don't want to do. In my personal experience, this is where things tend to get derailed. There are a few ways to respond to this situation:

  • Return to Optimizing Mode to figure out how to overcome the obstacle / improve your plan (good),
  • Ask for help / consult a relevant expert (good),
  • Take a break, which could lead to a eureka moment, lead to Optimizing Mode or lead to derailing (ok),
  • Sphex (bad),
  • Derail / procrastinate (bad), or
  • Punt / give up (ok if the obstacle is insurmountable).

The key is to respond constructively to obstacles. This usually means getting back to Optimizing Mode, either directly or after a break.  The failure modes here are derailing immediately, a "break" that turns into a derailment, and sphexing.  In our discussion, we shared a few techniques we had used to get back to Optimizing Mode.  These techniques tended to focus on some combination of removing the temptation to derail, providing a reminder to optimize, and changing mental state.

Getting Back to Optimizing Mode

Context switches are often helpful here.  Because for many people, work and procrastination both tend to be computer-based activities, it is both easy and tempting to switch to a time-wasting activity immediately upon hitting an obstacle.  Stepping away from the computer takes away the immediate distraction and depending on what you do away from the computer, helps you either think about the problem or change your mental state.  Depending on what sort of mood I'm in, I sometimes step away from the computer with a pen and paper to write down my thoughts (thinking about the problem), or I may step away to replenish my supply of water and/or caffeine (changing my mental state).  Other people in the discussion said they found going for a walk or getting more strenuous exercise to be helpful when they needed a break.  Strenuous exercise has the additional advantage of having very low risk of turning into a longer-than-intended break.

The danger with breaks is that they can turn into derailment.  Open-ended breaks ("I'll just browse Reddit for five minutes") have a tendency to expand, so it's best to avoid them in favor of things with more definite endings.  The other common way for breaks to turn into derailment is to return from a break and go to something non-productive.  I have had some success with attaching a sticky-note to my monitor reminding me what to do when I return to my computer.  I have also found that a note that makes clear what problem I need to solve makes me less likely to sphex when I return to my computer.

In the week or so since the discussion that inspired this post, I have found that asking myself "what would Actually Trying look like right now?" has helped me stay on track when I have encountered difficult problems at work.

making notes - an instrumental rationality process.

12 Elo 05 September 2015 10:51PM

The value of having notes. Why do I make notes.


Story time!

At one point in my life I had a memory crash. Which is to say once upon a time I could remember a whole lot more than I was presently remembering. I recall thinking, "what did I have for breakfast last Monday? Oh no! Why can't I remember!". I was terrified. It took a while but eventually I realised that remembering what I had for breakfast last Monday was:

  1. not crucial to the rest of my life

  2. not crucial to being a functional human being

  3. I was not sure whether I usually remembered what I ate last Monday, or if this was the first time I had tried to recall it with enough stubbornness to notice that I had no idea.

After surviving my first teen-life crisis I went on to realise a few things about life and about memory:

  1. I will not be remembering everything forever.

  2. Sometimes I forget things that I said I would do. Especially when the number of things I think I will do increases past 2-3 and upwards to 20-30.

  3. Don't worry! There is a solution!

  4. As someone at the age of mid-20s who is already forgetting things; a friendly mid-30 year old mentioned that in 10 years I will have 1/3rd more life to be trying to remember as well. Which should also serve as a really good reason why you should always comment your code as you go; and why you should definitely write notes. "Past me thought future me knew exactly what I meant even though past me actually had no idea what they were going on about".

The foundation of science.


There are many things that could be considered the foundations of science. I believe that one of the earliest foundations you can possibly engage in is observation.


In a more-than-goldfish form, observation means holding information. It means keeping things for review later in your life, whether at the end of this week, month, or year. Observation is only the start. Writing it down makes it evidence. Biased, personal, scrawled (bad) evidence, but evidence all the same. If you want to be more effective at changing your mind, you need to know what your mind says.


It's great to make notes. That's exactly what I am saying. It goes further though. Take notes and then review them. Weekly; monthly; yearly. Unsure about where you are going? Know where you have come from. With that you can move forward with better purpose.

My note taking process:

1. get a notebook.

This picture includes some types of notebooks that I have tried.

  1. A4 lined paper, cardboard front and back. Difficult to carry because it was big, and hard to open up and use as well. Side-bound is also something I didn't like, because I am left-handed and it seemed to get in my way.

  2. Bad photo, but it's a pad of grid-paper. I found a stack of these in the middle of the ground late at night, as if they fell off a truck or something. I really liked them, except that they were stuck together by essentially nothing and fell to pieces by the time I got to the bottom of the pad.

  3. Lined note paper. I will never go back to a book that doesn't hold together. The risk of losing paper is terrible. I don't mind occasionally ripping out some paper, but losing a page when I didn't want to has never worked safely for me.

  4. Top spiral bound; 100 pages. This did not have enough pages; I bought it after a 200pager ran out of paper and I needed a quick replacement, well it was quick – I used it up in half the time the last book lasted.

  5. Top spiral bound 200 pages notepad, plastic cover; these are the type of book I currently use. 8 is my book that I am writing in right now.

  6. 300 pages top spiral bound – as you can see by the tape – it started falling apart by the time I got to the end of it.

  7. small notebook. I got these because they were 48c each, they never worked for me. I would bend them, forget them, leave them in the wrong places, and generally not have them around when I wanted them.

  8. I am about half way through my current book; the first page of my book says 23/7/15, today it is 1/9/15. Estimate a book every 2 months. Although it really depends on how you use it.

  9. a future book I will try, It holds a pen so I will probably find that useful.

  10. also a future one, I expect it to be too small to be useful for me.

  11. A gift from a more organised person than I. It is a moleskin grid-paper book and I plan to also try it soon.

The important take-away from this is: try several; they might work in different ways and for different reasons. Has your life changed substantially, i.e. you don't sit much at a desk any more? Is the book not working? Maybe another type of book would work better.

I only write on the bottom of the flip-page, and occasionally scrawl diagrams on the other side of the page, but only when they're relevant. This way I can always flip through easily, and not worry about the other side of the paper.


2. carry a notebook. Everywhere. Find a way to make it a habit. Don't carry a bag? You could. Then you can carry your notepad everywhere with you in a bag. Consider a pocket-sized book as a solution to not wanting to carry a bag.

3. when you stop moving; turn the notebook to the correct page and write the date.

Writing the date is almost entirely useless. I really never care what the date is. I sometimes care that when I look back over the book I can see the timeline around which the events happened, but really – the date means nothing to me.

What writing the date helps to do:

  • make sure you have a writing implement

  • make sure it works

  • make sure you are on the right page

  • make sure you can see the pad

  • make sure you can write in this position

  • make you start a page

  • make you consider writing more things

  • make it look to others like you know what you are doing (signalling that you are a note-taker is super important to help people get used to you as a note-taker and encourage that persona onto you)

This is the reason why I write the date; I can't specify enough why I don't care about what date it is, but why I do it anyway.

4. Other things I write:

  • Names of people I meet. Congratulations; you are one step closer to never forgetting the name of anyone ever. Also when you want to think; "When did I last see bob", you can kinda look it up in a dumb - date-sorted list. (to be covered in my post about names – but its a lot easier to look it up 5 minutes later when you have it written down)

  • Where I am/What event I am at. (nice to know what you go to sometimes)

  • What time I got here or what time it started (if its a meeting)

  • What time it ended (or what time I stopped writing things)

It's at this point that the rest of the things you write are kinda personal choices some of mine are:

  • Interesting thoughts I have had

  • Interesting quotes people say

  • Action points that I want to do if I can't do them immediately.

  • Shopping lists

  • diagrams of what you are trying to say.

  • Graphs you see.

  • the general topic of conversation as it changes. (so far this is enough for me to remember the entire conversation and who was there and what they had to say about the matter)


That's right. I said it. It's sexy. There are occasional discussion events near to where I live that I go to with a notepad. Am I better than the average dude who shows up to chat? No. But everyone knows me. The guy who takes notes. And damn, they know I know what I am talking about. And damn, they all wish they were me. You know how glasses became a geek-culture signal? Well this is too. Like no other. Want to signal being a sharp human who knows what's going down? Carry a notebook, and show it off to people.

The coordinators have said to me; "It makes me so happy to see someone taking notes, it really makes me feel like I am saying something useful". The least I can do is take notes.


Other notes about notebooks

The number of brilliant people I know who carry a book of some kind far outweighs the number who don't. I don't usually trust the common opinion, but sometimes you just gotta go with what's right.

If it stops working; at least you tried it. If it works; you have evidence and can change the world in the future.

"I write in my phone" (sounds a lot like, "I could write notes in my phone"). I hear this a lot, especially in person while I am writing notes. Indeed you do. Which is why I am the one with a notebook out, and at the end of talking to you I will actually have notes and you will not. If you are genuinely the kind of person with notes in their phone, I commend you for doing something with technology that I cannot seem to have sorted out; but if you are like me, and a lot of other people who could always say they could take notes in their phone but never do, or never look at those notes... It's time to fix this.

a quote from a friend - “I realized in my mid twenties that I would look like a complete badass in a decade, if I could point people to a shelf of my notebooks.” And I love this too.

A friend has suggested that flashcards suit his brain and notepads do not.  I agree that flashcards have benefits, namely to do with organising things, shuffling, etc.  It really depends on what notes you are taking.  I quite like having a default chronology to things, but that might not work for you.

In our local Rationality Dojos we give away notebooks.  For the marginal cost of a book of paper, we are making people's lives better.

The big take away

Get a notebook; make notes; add value to your life.




This post took 3 hours to write over a week

Please add your experiences if you work differently surrounding note taking.

Please fill out the survey on whether you found this post helpful.

Ultimatums in the Territory

10 malcolmocean 28 September 2015 10:01PM

When you think of "ultimatums", what comes to mind?

Manipulativeness, maybe? Ultimatums are typically considered a negotiation tactic, and not a very pleasant one.

But there's a different thing that can happen, where an ultimatum is made, but where articulating it isn't a speech act but rather an observation. As in, the ultimatum wasn't created by the act of stating it, but rather, it already existed in some sense.

Some concrete examples: negotiating relationships

I had a tense relationship conversation a few years ago. We'd planned to spend the day together in the park, and I was clearly angsty, so my partner asked me what was going on. I didn't have a good handle on it, but I tried to explain what was uncomfortable for me about the relationship, and how I was confused about what I wanted. After maybe 10 minutes of this, she said, "Look, we've had this conversation before. I don't want to have it again. If we're going to do this relationship, I need you to promise we won't have this conversation again."

I thought about it. I spent a few moments simulating the next months of our relationship. I realized that I totally expected this to come up again, and again. Earlier on, when we'd had the conversation the first time, I hadn't been sure. But it was now pretty clear that I'd have to suppress important parts of myself if I was to keep from having this conversation.

"...yeah, I can't promise that," I said.

"I guess that's it then."

"I guess so."

I think a more self-aware version of me could have recognized, without her prompting, that my discomfort represented an unreconcilable part of the relationship, and that I basically already wanted to break up.

The rest of the day was a bit weird, but it was at least nice that we had resolved this. We'd realized that it was a fact about the world that there wasn't a serious relationship that we could have that we both wanted.

I sensed that when she posed the ultimatum, she wasn't doing it to manipulate me. She was just stating what kind of relationship she was interested in. It's like if you go to a restaurant and try to order a pad thai, and the waiter responds, "We don't have rice noodles or peanut sauce. You either eat somewhere else, or you eat something other than a pad thai."

An even simpler example would be that at the start of one of my relationships, my partner wanted to be monogamous and I wanted to be polyamorous (i.e. I wanted us both to be able to see other people and have other partners). This felt a bit tug-of-war-like, but eventually I realized that actually I would prefer to be single than be in a monogamous relationship.

I expressed this.

It was an ultimatum! "Either you date me polyamorously or not at all." But it wasn't me "just trying to get my way".

I guess the thing about ultimatums in the territory is that there's no bluff to call.

It happened in this case that my partner turned out to be really well-suited for polyamory, and so this worked out really well. We'd decided that if she got uncomfortable with anything, we'd talk about it, and see what made sense. For the most part, there weren't issues, and when there were, the openness of our relationship ended up just being a place where other discomforts were felt, not a generator of disconnection.

Normal ultimatums vs ultimatums in the territory

I use "in the territory" to indicate that this ultimatum isn't just a thing that's said but a thing that is true independently of anything being said. It's a bit of a poetic reference to the map-territory distinction.

No bluffing: preferences are clear

The key distinguishing piece with UITTs is, as I mentioned above, that there's no bluff to call: the ultimatum-maker isn't secretly really really hoping that the other person will choose one option or the other. These are the two best options as far as they can tell. They might have a preference: in the second story above, I preferred a polyamorous relationship to no relationship. But I preferred both of those to a monogamous relationship, and the ultimatum in the territory was me realizing and stating that.

This can actually be expressed formally, using what's called a preference vector. This comes from Keith Hipel at the University of Waterloo. If the tables in this next bit don't make sense, don't worry about it: all important conclusions are expressed in the text.

First, we'll note that since each of us have two options, a table can be constructed which shows four possible states (numbered 0-3 in the boxes).

                            My options
  Partner's options         insist poly             don't insist
  offer relationship        3: poly relationship    1: mono relationship
  don't offer               2: no relationship      0: (??) no relationship

This representation is sometimes referred to as matrix form or normal form, and has the advantage of making it really clear who controls which state transitions (movements between boxes). Here, my decision controls which column we're in, and my partner's decision controls which row we're in.

Next, we can consider: of these four possible states, which are most and least preferred, by each person? Here's my preferences, ordered from most to least preferred, left to right. The 1s in the boxes mean that the statement on the left is true.

state                          3   2   1   0
I insist on polyamory          1   1   0   0
partner offers relationship    1   0   1   0
My preference vector (← preferred)

The order of the states represents my preferences (as I understand them) regardless of what my potential partner's preferences are. I only control movement in the top row (do I insist on polyamory or not). It's possible that they prefer no relationship to a poly relationship, in which case we'll end up in state 2. But I still prefer this state over state 1 (mono relationship) and state 0 (in which I don't ask for polyamory and my partner decides not to date me anyway). So whatever my partner's preferences are, I've definitely made a good choice for me by insisting on polyamory.

This wouldn't be true if I were bluffing (if I preferred state 1 to state 2 but insisted on polyamory anyway). If I preferred 1 to 2 but bluffed by insisting on polyamory, I would basically be betting on my partner preferring polyamory to no relationship, and this might backfire and get me no relationship, when both of us (in this hypothetical) would have preferred a monogamous relationship to that. I think this phenomenon is one reason people dislike bluffy ultimatums.

My partner's preferences turned out to be...

state                          1   3   2   0
I insist on polyamory          0   1   1   0
partner offers relationship    1   1   0   0
Partner's preference vector (← preferred)

You'll note that they preferred a poly relationship to no relationship, so that's what we got! Although as I said, we didn't assume that everything would go smoothly. We agreed that if this became uncomfortable for my partner, then they would tell me and we'd figure out what to do. Another way to think about this is that after some amount of relating, my partner's preference vector might actually shift such that they preferred no relationship to our polyamorous one. In which case it would no longer make sense for us to be together.
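To make the two-mover structure concrete, here's a small backward-induction sketch (my own illustration, not from the post): I pick the column (insist on polyamory or not), my partner then best-responds with a row (offer a relationship or not), and I choose the column whose resulting state I most prefer. The preference vectors are the ones from the tables above; the state numbering matches the boxes.

```python
# state number = 2*insist + offer, matching the four boxes above
def state(insist, offer):
    return 2 * insist + offer

my_pref      = [3, 2, 1, 0]  # my preference vector, most preferred first
partner_pref = [1, 3, 2, 0]  # partner's preference vector

def rank(pref, s):
    # lower rank = more preferred
    return pref.index(s)

best = None
for insist in (1, 0):
    # partner best-responds to my choice of column
    offer = min((1, 0), key=lambda o: rank(partner_pref, state(insist, o)))
    outcome = state(insist, offer)
    if best is None or rank(my_pref, outcome) < rank(my_pref, best[1]):
        best = (insist, outcome)

print(best)  # (1, 3): insisting on polyamory leads to state 3, a poly relationship
```

With these particular vectors the ultimatum has no downside for me, which is exactly the "no bluff to call" property: whichever row my partner prefers, insisting never leaves me worse off than not insisting.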

UITTs release tension, rather than creating it

In writing this post, I skimmed a wikihow article about how to give an ultimatum, in which they say:

"Expect a negative reaction. Hardly anyone likes being given an ultimatum. Sometimes it may be just what the listener needs but that doesn't make it any easier to hear."

I don't know how accurate the above is in general. I think they're talking about ultimatums like "either you quit smoking or we break up". I can say that I expect these properties of an ultimatum to contribute to the negative reaction:

  • stated angrily or otherwise demandingly
  • more extreme than your actual preferences, because you're bluffing
  • refers to what they need to do, versus your own preferences

So this already sounds like UITTs would have less of a negative reaction.

But I think the biggest reason is that they represent a really clear articulation of what one party wants, which makes it much simpler for the other party to decide what they want to do. Ultimatums in the territory tend to also be more of a realization that you then share, versus a deliberate strategy. And this realization causes a noticeable release of tension in the realizer too.

Let's contrast:

"Either you quit smoking or we break up!"


"I'm realizing that as much as I like our relationship, it's really not working for me to be dating a smoker, so I've decided I'm not going to. Of course, my preferred outcome is that you stop smoking, not that we break up, but I realize that might not make sense for you at this point."

Of course, what's said here doesn't necessarily correspond to the preference vectors shown above. Someone could say the demanding first thing when they actually do have a UITT preference-wise, and someone who's trying to be really NVCy or something might say the second thing even though they're actually bluffing and would prefer not to follow through. But I think that in general they'll correlate pretty well.

The "realizing" seems similar to what happened to me 2 years ago on my own, when I realized that the territory was issuing me an ultimatum: either you change your habits or you fail at your goals. This is how the world works: your current habits will get you X, and you're declaring you want Y. On one level, it was sad to realize this, because I wanted to both eat lots of chocolate and to have a sixpack. Now this ultimatum is really in the territory.

Another example could be realizing that not only is your job not really working for you, but that it's already not-working to the extent that you aren't even really able to be fully productive. So you don't even have the option of just working a bit longer, because things are only going to get worse at this point. Once you realize that, it can be something of a relief, because you know that even if it's hard, you're going to find something better than your current situation.

Loose ends

More thoughts on the break-up story

One exercise I have left to the reader is creating the preference vectors for the break-up in the first story. HINT: (rot13'd) Vg'f fvzvyne gb gur cersrerapr irpgbef V qvq fubj, jvgu gjb qrpvfvbaf: fur pbhyq vafvfg ba ab shgher fhpu natfgl pbairefngvbaf be abg, naq V pbhyq pbagvahr gur eryngvbafuvc be abg.

An interesting note is that to some extent in that case I wasn't even expressing a preference but merely a prediction that my future self would continue to have this angst if it showed up in the relationship. So this is even more in the territory, in some senses. In my model of the territory, of course, but yeah. You can also think of this sort of as an unconscious ultimatum issued by the part of me that already knew I wanted to break up. It said "it's preferable for me to express angst in this relationship than to have it be angst free. I'd rather have that angst and have it cause a breakup than not have the angst."

Revealing preferences

I think that ultimatums in the territory are also connected to what I've called Reveal Culture (closely related to Tell Culture, but framed differently). Reveal cultures have the assumption that in some fundamental sense we're on the same side, which makes negotiations a very different thing... more of a collaborative design process. So it's very compatible with the idea that you might just clearly articulate your preferences.

Note that there doesn't always exist a UITT to express. In the polyamory example above, if I'd preferred a mono relationship to no relationship, then I would have had no UITT (though I could have bluffed). In this case, it would be much harder for me to express my preferences, because if I leave them unclear then there can be a kind of implicit bluffing. And even once articulated, there's still no obvious choice. I prefer this, you prefer that. We need to compromise or something. It does seem clear that, with these preferences, if we don't end up with some relationship at the end, we messed up... but deciding how to resolve it is outside the scope of this post.

Knowing your own preferences is hard

Another topic this post will point at but not explore is: how do you actually figure out what you want? I think this is a mix of skill and process. You can get better at the general skill by practising trying to figure it out (and expressing it / acting on it when you do, and seeing if that works out well). One process I can think of that would be helpful is Gendlin's Focusing. Nate Soares has written about how introspection is hard and to some extent you don't ever actually know what you want: You don't get to know what you're fighting for. But, he notes,

"There are facts about what we care about, but they aren't facts about the stars. They are facts about us."

And they're hard to figure out. But to the extent that we can do so and then act on what we learn, we can get more of what we want, in relationships, in our personal lives, in our careers, and in the world.

(This article crossposted from my personal blog.)

Examples of growth mindset or practice in fiction

10 Swimmer963 28 September 2015 09:47PM

For people who care about rationality and winning, it's pretty important to care about training. Repeated practice is how humans acquire skills, and skills are what we use for winning.

Unfortunately, it's sometimes hard to get System 1 fully on board with the fact that repeated, difficult, sometimes tedious practice is how we become awesome. I find fiction to be one of the most useful ways of communicating things like this to my S1. It would be great to have a repository of fiction that shows characters practicing skills, mastering them, and becoming awesome, to help this really sink in.

However, in fiction the following tropes are a lot more common:

  1. hero is born to greatness and only needs to discover that greatness to win [I don't think I actually need to give examples of this?]
  2. like (1), only the author talks about the skill development or the work in passing… but in a way that leaves the reader's attention (and system 1 reinforcement?) on the "already be awesome" part, rather than the "practice to become awesome" part [HPMOR; the Dresden Files, where most of the implied practice takes place between books.]
  3. training montage, where again the reader's attention isn't on the training long enough to reinforce the "practice to become awesome" part, but skips to the "wouldn't it be great to already be awesome" part [TVtropes examples].
  4. The hero starts out ineffectual and becomes great over the course of the book, but this comes from personal revelations and insights, rather than sitting down and practicing [Nice Dragons Finish Last is an example of this].

Example of exactly the wrong thing:
The Hunger Games - Katniss is explicitly up against the Career tributes, who have trained their whole lives for this one thing, but she has … something special that causes her to win. Also archery is her greatest skill, and she's already awesome at it from the beginning of the story and never spends time practicing.

Close-but-not-perfect examples of the right thing:
The Pillars of the Earth - Jack pretty explicitly has to travel around Europe to acquire the skills he needs to become great. Much of the practice is off-screen, but it's at least a pretty significant part of the journey.
The Honor Harrington series: the books depict Honor, as well as the people around her, rising through the ranks of the military and gradually levelling up, with emphasis on dedication to training, and that training is often depicted onscreen – but the skills she's training in herself and her subordinates aren't nearly as relevant as the "tactical genius" that she seems to have been born with.

I'd like to put out a request for fiction that has this quality. I'll also take examples of fiction that fails badly at this quality, to add to the list of examples, or of TVTropes keywords that would be useful to mine. Internet hivemind, help?

Happy Petrov Day

9 Eneasz 26 September 2015 03:41PM

It is Petrov Day again, partially thanks to Stanislav Petrov.

"Today is September 26th, Petrov Day, celebrated to honor the deed of Stanislav Yevgrafovich Petrov on September 26th, 1983.  Wherever you are, whatever you're doing, take a minute to not destroy the world."

Dry Ice Cryonics- Preliminary Thoughts

8 Fluttershy 28 September 2015 07:00AM

This post is a spot-check of Alcor's claim that cryonics can't be carried out at dry ice temperatures, and a follow-up to this comment. This article isn't up to my standards, yet I'm posting it now, rather than polishing it more first, because I strongly fear that I might never get around to doing so later if I put it off. Despite my expertise in chemistry, I don't like chemistry, so writing this took a lot of willpower. Thanks to Hugh Hixon from Alcor for writing "How Cold is Cold Enough?".


More research (such as potentially hiring someone to find the energies of activation for lots of different degradative reactions which happen after death) is needed to determine if long-term cryopreservation at the temperature of dry ice is reasonable, or even preferable to storage in liquid nitrogen.

On the outside view, I'm not very confident that dry ice cryonics will end up being superior to liquid nitrogen cryonics. Still, it's very hard to say one way or the other a priori. There are certain factors that I can't easily quantify that suggest that cryopreservation with dry ice might be preferable to cryopreservation with liquid nitrogen (specifically, fracturing, as well as the fact that the Arrhenius equation doesn't account for poor stirring), and other such factors that suggest preservation in liquid nitrogen to be preferable (specifically, that being below the glass transition temperature prevents movement/chemical reactions, and that nanoscale ice crystals, which can grow during rewarming, can form around the glass transition temperature).

(I wonder if cryoprotectant solutions with different glass transition temperatures might avoid either of the two problems mentioned in the last sentence for dry ice cryonics? I just heard about the issue of nanoscale ice crystals earlier today, so my discussion of them is an afterthought.)


Using dry ice to cryopreserve people for future revival could be cheaper than using liquid nitrogen for the same purpose (how much would using dry ice cost?). Additionally, lowering the cost of cryonics could increase the number of people who sign up for cryonics-- which would, in turn, give us a better chance at e.g. legalizing the initiation of the first phases of cryonics for terminal patients just before legal death.

This document by Alcor suggests that, for neuro and whole-body patients, an initial deposit of 6,600 or 85,438 USD into the patient's trust fund is, respectively, more than enough to generate enough interest to safely cover a patient's annual storage cost indefinitely. Since around 36% of this amount is spent on liquid nitrogen, this means that completely eliminating the cost of replenishing the liquid nitrogen in the dewars would reduce the up-front cost that neuro and whole-body patients with Alcor would pay by around 2,350 or 31,850 USD, respectively. This puts a firm upper bound on the amount that could be saved by Alcor patients by switching to cryopreservation with dry ice, since some amount would need to be spent each year on purchasing additional dry ice to maintain the temperature at which patients are stored. (A small amount could probably be saved on the cost which comes from cooling patients down immediately after death, as well).

This LW discussion is also relevant to storage costs in cryonics. I'm not sure how much CI spends on storage.

Relevant Equations and Their Limitations

Alcor's "How Cold is Cold Enough?" is the only article which I've found that takes an in-depth look at whether storage of cryonics patients at temperatures above the boiling point of liquid nitrogen would be feasible. It's a generally well-written article, though it makes an assumption regarding activation energy that I'll be forced to examine later on.

The article starts off by introducing the Arrhenius equation, which is used to determine the rate constant of a chemical reaction at a given temperature. The equation is written:

k = A * e^(-Ea/RT)          (1)


  • k is the rate constant you solve for (the units vary between reactions)
  • A is a constant you know (same units as k)
  • Ea is the activation energy (kJ/mol)
  • R is the ideal gas constant (kJ/(K*mol))
  • T is the temperature (K)
As somewhat of an aside, this is the same k that you would plug into the rate law equation, which you have probably seen before:

v = k * [A]^m * [B]^n     (2)
  • v is the rate of the reaction (mol/(L*s))
  • k is the rate constant, from the Arrhenius equation above
  • [A] and [B] are the concentrations of reactants-- there might be more or less than two (mol/L)
  • m and n are constants that you know
The Arrhenius equation-- equation 1, here-- does make some assumptions which don't always hold. Firstly, the activation energy of some reactions changes with temperature, and secondly, it is sometimes necessary to use the modified Arrhenius equation (not shown here) to fit rate constant v. temperature data, as noted just before equation 5 in this paper. This is worth mentioning because, while the Arrhenius equation is quite robust, the data doesn't always fit our best models in chemistry.
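To make equations 1 and 2 concrete, here is a minimal Python sketch of both. All numerical values (the pre-exponential factor, the activation energy, the concentrations) are made up purely for illustration:

```python
import math

R = 8.314e-3  # ideal gas constant, kJ/(K*mol)

def arrhenius_k(A, Ea, T):
    """Equation 1: rate constant k at temperature T (in K), with Ea in kJ/mol."""
    return A * math.exp(-Ea / (R * T))

def reaction_rate(k, concentrations_and_orders):
    """Equation 2: v = k * [A]^m * [B]^n * ... for (concentration, order) pairs."""
    v = k
    for concentration, order in concentrations_and_orders:
        v *= concentration ** order
    return v

# Hypothetical reaction: Ea = 50 kJ/mol, pre-exponential factor A = 1e7
k_body = arrhenius_k(1e7, 50.0, 310.15)  # 37 C, body temperature
k_ice = arrhenius_k(1e7, 50.0, 194.65)   # -78.5 C, dry ice temperature
print(k_body / k_ice)  # the factor by which cooling slows this reaction
```

Note that, per the limitations just discussed, this treats Ea as temperature-independent and implicitly assumes perfect mixing.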

Lastly, and most importantly, the Arrhenius equation assumes that all reactants are always being mixed perfectly, which is definitely not the case in cryopreserved patients. I have no idea how to quantify this effect, though after taking this effect into consideration, we should expect degradation reactions in cryopreserved individuals to happen much more slowly than the Arrhenius equation would explicitly predict.

Alcor on "How Cold is Cold Enough?"

The Alcor article goes on to calculate, for the enzyme Catalase, the ratio of the rate constant k at 77.36 Kelvin (liquid nitrogen) to the value of k at other temperatures. This ratio is the factor by which a reaction is slowed down when cooled from a given temperature down to 77 K. While the calculations are correct, Catalase is not the ideal choice of enzyme here. Ideally, we'd want to calculate this ratio for whichever degradative enzyme/reaction has the lowest activation energy, because then, if the ratio of k at 37 Celsius (body temperature) to k at the temperature of dry ice were big enough for that reaction, we could be rather confident that every other degradative reaction would be slowed down at dry ice temperatures by an even greater factor. Of course, as equation 2 of this post shows, the concentrations of reactants do matter to the speed of degradative reactions at dry ice temperatures, but differences between reactions in the ratio of k at 37 C to k at dry ice temperatures will matter much, much more strongly in determining v, the rate of the reaction, than differences in reactant concentrations will.

I'm also quite confused by the actual value given for the Ea of catalase in the Alcor article-- a quick google search suggests the Ea to be around 8 kJ/mol or 11 kJ/mol, though the Alcor article uses a value of 7,000 cal/(mol*K), i.e. 29.3 kJ/(mol*K), which can only be assumed to have been a typo in terms of the units used.

Of course, as the author mentions, Ea values aren't normally tabulated. The Ea for a reaction can be calculated with just two experimentally determined (Temperature, k (rate constant)) pairs, so it wouldn't take too long to experimentally determine a bunch of Eas for degradative reactions which normally take place in the human body after death, especially if we could find a biologist who had a good a priori idea of which degradative reactions would be the fastest.
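The two-point calculation mentioned here follows directly from taking the ratio of equation 1 at two temperatures, which cancels the constant A. A sketch, with the caveat that real measured (temperature, rate constant) pairs would carry experimental error:

```python
import math

R = 8.314e-3  # ideal gas constant, kJ/(K*mol)

def activation_energy(T1, k1, T2, k2):
    """Solve equation 1 for Ea (kJ/mol) from two (temperature, rate constant)
    pairs, using ln(k1/k2) = (Ea/R) * (1/T2 - 1/T1)."""
    return R * math.log(k1 / k2) / (1 / T2 - 1 / T1)
```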

Using the modified form of the Arrhenius equation from Alcor's "How Cold is Cold Enough", we could quickly estimate what the smallest Ea for a degradative biological reaction would be that would result in some particular and sufficiently small number of reactions taking place at dry ice temperatures over a certain duration of time. For example, when neglecting stirring effects, it turns out that 100 years at dry ice temperature (-78.5 C) ought to be about equal to 3 minutes at body temperature for a reaction with an Ea of 72.5 kJ/mol. Reactions with higher Eas would be slowed down relatively more by an identical drop in temperature.
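As a sanity check on those numbers, the ratio form of the plain Arrhenius equation (the pre-exponential factor A cancels out), with 37 C = 310.15 K and -78.5 C = 194.65 K, does reproduce the equivalence:

```python
import math

R = 8.314        # ideal gas constant, J/(K*mol)
Ea = 72.5e3      # activation energy, J/mol
T_body = 310.15  # 37 C, body temperature
T_ice = 194.65   # -78.5 C, dry ice temperature

# k_body / k_ice = exp((Ea/R) * (1/T_ice - 1/T_body)): the slowdown factor
slowdown = math.exp((Ea / R) * (1 / T_ice - 1 / T_body))

minutes_in_100_years = 100 * 365.25 * 24 * 60
print(minutes_in_100_years / slowdown)  # roughly 3 minutes at body temperature
```

Again, this neglects stirring effects and any temperature dependence of Ea.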

So, if we were unable to find any degradative biological reactions with Eas less than (say) 72.5 kJ/mol, that would be decent evidence in favor of dry ice cryonics working reasonably well (given that the 100 years and three minutes figures are numbers that I just made up-- 100 years being a possible duration of storage, and three minutes being an approximation of how long one can live without oxygen being supplied to the brain).

Damage from Causes Other Than Chemical Reactions in Dry Ice Cryonics

Just before publishing this article, I came across Alcor's "Cryopreservation and Fracturing", which mentioned that 

The most important instability for cryopreservation purposes is a tendency toward ice nucleation. At temperatures down to 20 degrees below the glass transition temperature, water molecules are capable of small translations and rotations to form nanoscale ice-crystals, and there is strong thermodynamic incentive to do so [5, 6]. These nanoscale crystals (called "nuclei") remain small and biologically insignificant below the glass transition, but grow quickly into damaging ice crystals as the temperature rises past -90°C during rewarming. Accumulating ice nuclei are therefore a growing liability that makes future ice-free rewarming efforts progressively more difficult the longer vitrified tissue is stored near the glass transition temperature. For example, storing a vitrification solution 10 degrees below the glass transition for six months was found to double the warming rate necessary to avoid ice growth during rewarming [5]. The vitrification solution that Alcor uses is far more stable than the solution used (VS41A) in this particular experiment, but Alcor must store its patients far longer than six months.

The same article also discusses fracturing, which can damage tissues stored more than 20 C below the glass transition temperature. If nanoscale ice crystals form in patients stored in dry ice (I expect they would), and grew during rewarming from dry ice temperatures (I have no idea if they would), that could be very problematic.

Implications of this Research for Liquid Nitrogen Cryonics

If someone has a graph of how body temperature varies with time during the process of cryopreservation, it would be trivial to compute the time-at-body-temperature equivalent of the time that freezing takes. My bet is that getting people frozen too slowly hurts folks' chances of revival far more than they intuit.

[Link] 2015 modafinil user survey

8 gwern 26 September 2015 05:28PM

I am running, in collaboration with ModafinilCat, a survey of modafinil users asking about their experiences, side-effects, sourcing, efficacy, and demographics:

This is something of a followup to the LW surveys which find substantial modafinil use, and Yvain's 2014 nootropics survey. I hope the results will be useful; the legal questions should help reduce uncertainty there, and the genetics questions (assuming any responses) may be interesting too.

[LINK] Deep Learning Machine Teaches Itself Chess in 72 Hours

8 ESRogs 14 September 2015 07:38PM

Lai has created an artificial intelligence machine called Giraffe that has taught itself to play chess by evaluating positions much more like humans and in an entirely different way to conventional chess engines.

Straight out of the box, the new machine plays at the same level as the best conventional chess engines, many of which have been fine-tuned over many years. On a human level, it is equivalent to FIDE International Master status, placing it within the top 2.2 percent of tournament chess players.

The technology behind Lai’s new machine is a neural network. [...] His network consists of four layers that together examine each position on the board in three different ways.

The first looks at the global state of the game, such as the number and type of pieces on each side, which side is to move, castling rights and so on. The second looks at piece-centric features such as the location of each piece on each side, while the final aspect is to map the squares that each piece attacks and defends.


Lai generated his dataset by randomly choosing five million positions from a database of computer chess games. He then created greater variety by adding a random legal move to each position before using it for training. In total he generated 175 million positions in this way.


One disadvantage of Giraffe is that neural networks are much slower than other types of data processing. Lai says Giraffe takes about 10 times longer than a conventional chess engine to search the same number of positions.

But even with this disadvantage, it is competitive. “Giraffe is able to play at the level of an FIDE International Master on a modern mainstream PC,” says Lai. By comparison, the top engines play at super-Grandmaster level.


Ref: Giraffe: Using Deep Reinforcement Learning to Play Chess



The Trolley Problem and Reversibility

7 casebash 30 September 2015 04:06AM

The most famous problem used when discussing consequentialism is the trolley problem. A tram is hurtling towards five people on the track, but if you flick a switch it will change tracks and kill only one person instead. Utilitarians would say that you should flick the switch, as it is better for there to be a single death than five. Some deontologists might agree with this; however, many more would object and argue that you don’t have the right to make that decision. This problem has different variations, such as one where you push someone in front of the train instead of them being on the track, but we’ll consider this version, since accepting it moves you a large way towards utilitarianism.

Let’s suppose that someone flicks the switch, but then realises the other side was actually correct and that they shouldn’t have flicked it. Do they now have an obligation to flick the switch back? What is interesting is that if they had just walked into the room and the train was heading towards the one person, they would have had an obligation *not* to flick the switch, but, having flicked it, it seems that they have an obligation to flick it back the other way.

Where this gets more puzzling is when we imagine Bob having observed Aaron flicking the switch. Arguably, if Aaron had no right to flick the switch, then Bob would have an obligation to flick it back (or, if not an obligation, this would surely count as a moral good?). It is hard to argue against this conclusion, assuming that there is a strong moral obligation for Aaron not to flick the switch, along the lines of “Do not kill”. This logic seems consistent with how we act in other situations; if someone had tried to kill someone or steal something important from them, then most people would reverse or prevent the action if they could.

But what if Aaron reveals that he was only flicking the switch because Cameron had flicked it first? Then Bob would be obligated to leave it alone, as Aaron would be doing what Bob was planning to do: prevent interference. We can also complicate it by imagining that a strong gust of wind was about to come and flick the switch, but Bob flicked it first. Is there now a duty to undo Bob's flick of the switch, or does the fact that the switch was going to flick anyway abrogate that duty? This obligation to trace back the history seems very strange indeed. I can’t see any pathway to find a logical contradiction, but I can’t imagine that many people would defend this state of affairs.

But perhaps the key principle here is non-interference. When Aaron flicks the switch, he has interfered and so he arguably has the limited right to undo his interference. But when Bob decides to reverse this, perhaps this counts as interference also. So while Bob receives credit for preventing Aaron’s interference, this is outweighed by committing interference himself - acts are generally considered more important than omissions. This would lead to Bob being required to take no action, as there wouldn’t be any morally acceptable pathway with which to take action.

I’m not sure I find this line of thought convincing. If we don’t want anyone interfering with the situation, couldn’t we glue the switch in place before anyone (including Aaron) gets the chance or even the notion to interfere? It would seem rather strange to argue that we have to leave the door open to interference even before we know anyone is planning to do so. Next suppose that we don’t have glue, but we can install a mechanism that will flick the switch back if anyone tries to flick it. In principle, this doesn’t seem any different from installing glue.

Next, suppose we don’t have a machine to flick it back, so instead we install Bob. It seems that installing Bob is just as moral as installing an actual mechanism. It would seem rather strange to argue that “installing” Bob is moral, but any action he takes is immoral. There might be cases where “installing” someone is moral, but certain actions they take will be immoral. One example would be “installing” a policeman to enforce a law that is imperfect. We can expect the decision to hire the policeman to be moral if the law is generally good, but, in certain circumstances, flaws in this law might make enforcement immoral. But here, we are imagining that *any* action Bob takes is immoral interference. It therefore seems strange to suggest that installing him could somehow be moral, and so this line of thought seems to lead to a contradiction.

We consider one last situation: that we aren't allowed to interfere and that setting up a mechanism to stop interference also counts as interference. We first imagine that Obama has ordered a drone attack that is going to kill a (robot, just go with it) terrorist. He knows that the drone attack will cause collateral damage, but it will also prevent the terrorist from killing many more people on American soil. He wakes up the next morning and realises that he was wrong to violate the deontological principles, so he calls off the attack. Are there any deontologists who would argue that he doesn’t have the right to rescind his order? Rescinding the order does not seem to count as "further interference"; instead it seems to count as "preventing his interference from occurring". Flicking the switch back seems functionally identical to rescinding the order. The train hasn’t hit the intersection, so there isn’t any causal entanglement, and flicking the switch back is best characterised as preventing the interference from occurring. If we want to make the scenarios even more similar, we can imagine that flicking the switch doesn't force the train to go down one track or another, but instead orders the driver to take one particular track. It doesn't seem like changing this aspect of the problem should alter the morality at all.

This post has shown that deontological objections to the Trolley Problem tend to lead to non-obvious philosophical commitments that are not very well known. I didn't write this post so much to try to show that deontology is wrong as to start a conversation and help deontologists understand and refine their commitments better.

I also wanted to include one paragraph I wrote in the comments: Let's assume that the train will arrive at the intersection in five minutes. If you pull the lever one way, then pull it back the other, you'll save someone from losing their job. There is no chance that the lever will get stuck or that you won't be able to complete the operation once you try. Clearly pulling the lever, then pulling it back, is superior to not touching it. This seems to indicate that the sin isn't pulling the lever, but pulling it without the intent to pull it back. If the sin is pulling it without intent to pull it back, then it would seem very strange that gaining the intent to pull it back, then pulling it back, would be a sin.

What is your rationalist backstory?

7 adamzerner 25 September 2015 01:25AM

I'm reading Dan Ariely's book Predictably Irrational. The story of what got him interested in rationality and human biases goes something like this.

He was the victim of a really bad accident, and had terrible burns covering ~70% of his body. The experience was incredibly painful, and so was the treatment. For treatment, he'd have to bathe in some sort of disinfectant, and then have bandages ripped off his exposed flesh afterwards, which was extremely painful for him.

The nurses believed that ripping it off quickly would produce the least amount of pain for the patient. They thought the short and intense bursts of pain were less (in aggregate) than the less intense but longer periods of pain that a slower removal of the bandages would produce. However, Dan disagreed about what would produce the least amount of pain for patients. He thought that a slower removal would be better. Eventually, he found some scientific research that supported/proved his theory to be correct.

But he was confused. These nurses were smart people and had a ton of experience giving burn victims baths - shouldn't they have figured out by now what approaches best minimize patient pain? He knew their failure wasn't due to a lack of intelligence, and that it wasn't due to a lack of sympathy. He ultimately concluded that the failure was due to inherent human biases. He then became incredibly interested in this and went on to do a bunch of fantastic research in the area.

In my experience, the overwhelming majority of people are uninterested in rationality, and a lot of them are even put off by it. So I'm curious about how members of this incredibly small minority of the population became who they are.

Part of me thinks that extreme outputs are the result of extreme inputs. Like how Dan's extreme passion for his work has (seemingly) originated from his extreme experiences with pain. With this rule-of-thumb in mind, when I see someone who possesses some extreme character trait, I expect there to be some sort of extreme story or experience behind it.

But another part of me thinks that this doesn't really apply to rationality. I don't have much data, but from the limited experience I've had getting to know people in this community, "I've just always thought this way" seems common, and "extreme experiences that motivated rational thinking" seems rare.

Anyway, I'm interested in hearing people's "rationalist backstories". Personally, I'm interested in reading really long and detailed backstories, but am also interested in reading "just a few paragraphs". I'm also eager to hear people's thoughts on my "extreme input/output" theory.

Kant's Multiplication

7 Vamair0 19 September 2015 02:25PM

In this community there is a respected technique of "shutting up and multiplying". However, using it in many realistic ethical dilemmas can be difficult. Imagine a situation: there is a company, and each of its employees gains utility from pressing buttons. Each employee has a one-use-only button that, when pressed, gives that employee one hundred units of utility, while all the others lose a unit each. They can't communicate about the button and there are no other effects. Is it ethical to press the button?

This is an extremely simple situation. Utilitarianism, no matter which variant, would easily say that it's ethical to press the button if there are fewer than one hundred and one employees and unethical if there are more than one hundred and one. I believe (the proponents of other ethical theories may correct me if I'm wrong) that both virtue ethics (a person demonstrates a vice by pressing a button) and deontology (that's a kind of stealing, and stealing is wrong) as they're usually used (and not as a utilitarianism substitute) would say it's wrong to be the first one to press the button, and so, in a company of eleven employees, each of them would forgo a net ninety utils.
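The arithmetic behind those numbers is simple enough to write down; a quick sketch (assuming, as the example does, that utilities just add across employees):

```python
def net_utility_of_press(n_employees):
    """Total utility change when one employee presses their button:
    +100 for the presser, -1 for each of the other (n_employees - 1)."""
    return 100 - (n_employees - 1)

print(net_utility_of_press(11))   # 90: a clear net gain in an eleven-person company
print(net_utility_of_press(101))  # 0: the break-even point
print(net_utility_of_press(150))  # -49: a net loss in a larger company
```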

But the only reason this situation is so simple under utilitarianism is that we've got direct access to the employees' utility functions. Usually, though, that's not the case. If we want to make a decision on a common question such as "is it ethical to throw a piece of trash on the road, or is it better to carry it to the trash bin" or "is it okay to smoke in a room with other people inside", we have to calculate the utility we gain from throwing it right here versus the utility of all the people affected. We can also use quick rules, which would say "no" in both situations. But if there's no rule, or two conflicting rules, or we don't trust the one we have, then it would be useful to have a method that's more reliable than our Fermi estimations of utility or even money.

I believe there is such a method, and as you probably already figured out, it's the question "what would happen if everyone did something like this". It's most often used in the context of deontology, but for a utilitarian it allows one to feel the shared costs.

What am I talking about? Imagine we have to decide whether to throw a piece of trash on a road. To calculate, we take the number of people N who will travel this road, estimate their average irritation loss R from seeing a piece of trash on the road, and multiply them. We then have to compare this NR to the loss X of taking the trash to the bin. Is it difficult to get the sign right? I guess it is. Now let's imagine every traveller has thrown a piece of trash. Let's also suppose your loss of utility is the same for each piece of trash you see and your irritation is about average for the travellers here. How much utility are you going to lose? The same NR. But now imagining this loss and comparing it to the loss of hauling the trash to the bin is much easier, and I believe even more accurate.
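Written out as a sketch, with all the numbers invented for illustration (the whole point is that the real N, R and X are hard to estimate):

```python
def total_irritation(n_travellers, irritation_per_viewer):
    """NR: the summed disutility of everyone who sees the piece of trash."""
    return n_travellers * irritation_per_viewer

# Hypothetical values: 5000 travellers, a tiny irritation each, a modest hauling cost.
NR = total_irritation(5000, 0.01)  # total cost imposed on others
X = 2.0                            # personal cost of carrying the trash to the bin
print("throw it on the road" if X > NR else "haul it to the bin")
```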

To use this method right a utilitarian should be careful not to make a few errors. I'm going to demonstrate a few points using a "smoking in a crowded room" example.

First of all, we shouldn't use worldbuilding too much. "If everyone here always smoked, they'd install a powerful ventilation system, so I'd be okay". That wouldn't sum the utilities in a right way because the ventilation system doesn't exist. So we should change only a single aspect of behavior and not any reactions for that.

Second, we have to remember that a sum of effects is not always a good substitution for a sum of utilities. That's why we cannot say something like: "If everyone here smoked, we'd die of suffocation, so smoking here is as bad as killing a person". That's in addition to the "don't judge people on the utility of what they do; judge them when judging has a high utility" aspect.

I believe the second point may work in the opposite direction with the trash example. That is, the more trash there is, the less irritation a single piece gives. That means that to counter this effect we have to imagine there is more trash than if everyone threw their piece away once.

And the third point is that the person doing the calculation is not always similar to the average one. "If everyone smoked I'd be okay; I've got no problem with rooms full of smoke" fails to capture the total utility of the people there unless they're all smokers, and maybe even then.

This method, used correctly, may be a good addition to the well-known "shut up and multiply", and is also an example of the good tradition of stealing ideas from competing theories.

(I'm not a native speaker and I don't have much experience writing in English, so I'd be especially grateful for grammar corrections. I don't know whether the tradition here is to send them via PM or to use a special thread.)

Political Debiasing and the Political Bias Test

7 Stefan_Schubert 11 September 2015 07:04PM

Cross-posted from the EA forum. I asked for questions for this test here on LW about a year ago. Thanks to those who contributed.

Rationally, your political values shouldn't affect your factual beliefs. Nevertheless, that often happens. Many factual issues are politically controversial - typically because the true answer makes a certain political course of action more plausible - and on those issues, many partisans tend to disregard politically uncomfortable evidence.

This sort of political bias has been demonstrated in a large number of psychological studies. For instance, Yale professor Dan Kahan and his collaborators showed in a fascinating experiment that on politically controversial questions, people are quite likely to commit mathematical mistakes that help them retain their beliefs, but much less likely to commit mistakes that would force them to give up those beliefs. Examples like this abound in the literature.

Political bias is likely a major cause of misguided policies in democracies (the main one, according to economist Bryan Caplan). When they have no special reason not to, people without special knowledge defer to the scientific consensus on technical issues; thus they do not interfere with the experts, who normally get things right. On politically controversial issues, however, they often let their political bias win over science and evidence, which means they end up with false beliefs. And in a democracy, voters holding systematically false beliefs more often than not translates into misguided policy.

Can we reduce this kind of political bias? I’m fairly hopeful. One reason for optimism is that debiasing generally seems to be possible to at least some extent. This optimism of mine was strengthened by participating in a CFAR workshop last year. Political bias seems not to be fundamentally different from other kinds of biases and should thus be reducible too. But obviously one could argue against this view of mine. I’m happy to discuss this issue further.

Another reason for optimism is that it seems that the level of political bias is actually lower today than it was historically. People are better at judging politically controversial issues in a detached, scientific way today than they were in, say, the 14th century. This shows that progress is possible. There seems to be no reason to believe it couldn’t continue.

A third reason for optimism is that there seems to be a strong norm against political bias. Few people are consciously and intentionally politically biased. Instead, most people seem to believe themselves to be politically rational, and hold that as a very important value (or so I believe). They fail to see their own biases due to the bias blind spot, which prevents us from seeing our own biases.

Thus if you could somehow make it salient to people that they are biased, they would actually want to change. And if others saw how biased they are, the incentives to debias would be even stronger.

There are many ways in which you could make political bias salient. For instance, you could meticulously go through political debaters’ arguments and point out fallacies, like I have done on my blog. I will post more about that later. Here I want to focus on another method, however, namely a political bias test which I have constructed with ClearerThinking, run by EA-member Spencer Greenberg. Since learning how the test works might make you answer a bit differently, I will not explain how the test works here, but instead refer either to the explanatory sections of the test, or to Jess Whittlestone’s (also an EA member) write-up of it.

Our hope is of course that people taking the test might start thinking more both about their own biases, and about the problem of political bias in general. We want this important topic to be discussed more. Our test is produced for the American market, but hopefully, it could work as a generic template for bias tests in other countries (akin to the Political Compass or Voting Advice Applications).

Here is a guide for making new bias tests (where the main criticisms of our test are also discussed). Also, we hope that the test could inspire academic psychologists and political scientists to construct full-blown scientific political bias tests.

This does not mean, however, that we think that such bias tests in themselves will get rid of the problem of political bias. We need to attack the problem of political bias from many other angles as well.

Instrumental Rationality Questions Thread

6 AspiringRationalist 27 September 2015 09:22PM

Previous thread:

This thread is for asking the rationalist community for practical advice.  It's inspired by the stupid questions series, but with an explicit focus on instrumental rationality.

Questions ranging from easy ("this is probably trivial for half the people on this site") to hard ("maybe someone here has a good answer, but probably not") are welcome.  However, please stick to problems that you actually face or anticipate facing soon, not hypotheticals.

As with the stupid questions thread, don't be shy. Everyone has holes in their knowledge, though the fewer and smaller we can make them, the better. Please be respectful of other people's admitting ignorance, and don't mock them for it; they're doing a noble thing.

(See also the Boring Advice Repository)

[Link] Marek Rosa: Announcing GoodAI

6 Gunnar_Zarncke 14 September 2015 09:48PM

Eliezer commented on FB about the post Announcing GoodAI (by Marek Rosa, GoodAI's CEO). I think this deserves some discussion, as it takes a quite effective approach to harnessing the crowd to improve the AI:

As part of GoodAI’s development, our team created a visual tool called Brain Simulator where users can design their own artificial brain architectures. We released Brain Simulator to the public today for free under an open-source, non-commercial license – anyone who’s interested can access Brain Simulator and start building their own artificial brain. [...]

By integrating Brain Simulator into Space Engineers and Medieval Engineers [a game], players will have the option to design their own AI brains for the games and implement it, for example, as a peasant character. Players will also be able to share these brains with each other or take an AI brain designed by us and train it to do things they want it to do (work, obey its master, and so on). The game AIs will learn from the player who trains them (by receiving reward/punishment signals; or by imitating player's behavior), and will have the ability to compete with each other. The AI will be also able to learn by imitating other AIs.

This integration will make playing Space Engineers and Medieval Engineers more fun, and at the same time our AI technology will gain access to millions of new teachers and a new environment. This integration into our games will be done by GoodAI developers. We are giving AI to players, and we are bringing players to our AI researchers.
(emphasis mine)

Rationality Reading Group: Part I: Seeing with Fresh Eyes

6 Gram_Stone 09 September 2015 11:40PM

This is part of a semi-monthly reading group on Eliezer Yudkowsky's ebook, Rationality: From AI to Zombies. For more information about the group, see the announcement post.

Welcome to the Rationality reading group. This fortnight we discuss Part I: Seeing with Fresh Eyes (pp. 365-406). This post summarizes each article of the sequence, linking to the original LessWrong post where available.

I. Seeing with Fresh Eyes

87. Anchoring and Adjustment - Exposure to numbers affects guesses on estimation problems by anchoring your mind to a given estimate, even if it's wildly off base. Be aware of the effect random numbers have on your estimation ability.

88. Priming and Contamination - Contamination by Priming is a problem that relates to the process of implicitly introducing the facts in the attended data set. When you are primed with a concept, the facts related to that concept come to mind easier. As a result, the data set selected by your mind becomes tilted towards the elements related to that concept, even if it has no relation to the question you are trying to answer. Your thinking becomes contaminated, shifted in a particular direction. The data set in your focus of attention becomes less representative of the phenomenon you are trying to model, and more representative of the concepts you were primed with.

89. Do We Believe Everything We're Told - Some experiments on priming suggest that mere exposure to a view is enough to get one to passively accept it, at least until it is specifically rejected.

90. Cached Thoughts - Brains are slow. They need to cache as much as they can. They store answers to questions, so that no new thought is required to answer. Answers copied from others can end up in your head without you ever examining them closely. This makes you say things that you'd never believe if you thought them through. So examine your cached thoughts! Are they true?

91. The "Outside the Box" Box - When asked to think creatively there's always a cached thought that you can fall into. To be truly creative you must avoid the cached thought. Think something actually new, not something that you heard was the latest innovation. Striving for novelty for novelty's sake is futile, instead you must aim to be optimal. People who strive to discover truth or to invent good designs, may in the course of time attain creativity.

92. Original Seeing - One way to fight cached patterns of thought is to focus on precise concepts.

93. Stranger Than History - Imagine trying to explain quantum physics, the internet, or any other aspect of modern society to people from 1900. Technology and culture change so quickly that our civilization would be unrecognizable to people 100 years ago; what will the world look like 100 years from now?

94. The Logical Fallacy of Generalization from Fictional Evidence - The Logical Fallacy of Generalization from Fictional Evidence consists in drawing real-world conclusions based on statements invented and selected for the purpose of writing fiction. The data set is not at all representative of the real world, and in particular of whatever real-world phenomenon you need to understand to answer your real-world question. Considering this data set leads to an inadequate model, and inadequate answers.

95. The Virtue of Narrowness - One way to fight cached patterns of thought is to focus on precise concepts.

96. How to Seem (and be) Deep - To seem deep, find coherent but unusual beliefs, and concentrate on explaining them well. To be deep, you actually have to think for yourself.

97. We Change Our Minds Less Often Than We Think - We all change our minds occasionally, but we don't constantly, honestly reevaluate every decision and course of action. Once you think you believe something, the chances are good that you already do, for better or worse.

98. Hold Off On Proposing Solutions - Proposing solutions prematurely is dangerous, because it introduces weak conclusions in the pool of the facts you are considering, and as a result the data set you think about becomes weaker, overly tilted towards premature conclusions that are likely to be wrong, that are less representative of the phenomenon you are trying to model than the initial facts you started from, before coming up with the premature conclusions.

99. The Genetic Fallacy - The genetic fallacy seems like a strange kind of fallacy. The problem is that the original justification for a belief does not always equal the sum of all the evidence that we currently have available. But, on the other hand, it is very easy for people to still believe untruths from a source that they have since rejected.


This has been a collection of notes on the assigned sequence for this fortnight. The most important part of the reading group though is discussion, which is in the comments section. Please remember that this group contains a variety of levels of expertise: if a line of discussion seems too basic or too incomprehensible, look around for one that suits you better!

The next reading will cover Part J: Death Spirals (pp. 409-494). The discussion will go live on Wednesday, 23 September 2015, right here on the discussion forum of LessWrong.

Reducing Catastrophic Risks, A Practical Introduction

6 RyanCarey 09 September 2015 10:39PM

While thinking about my own next career steps, I've been writing down some of my thoughts about what's in an impactful career.

In the process, I wrote an introductory report on what seem to me to be practical approaches to problems in catastrophic risks. It's intended to complement the analysis that 80,000 Hours provides by thinking about what general roles we ought to perform, rather than analysing specific careers and jobs, and by focusing specifically on existential risks.

I'm happy to receive feedback on it, positive and negative. 

Here it is: Reducing Catastrophic Risks, A Practical Introduction.

Median utility rather than mean?

6 Stuart_Armstrong 08 September 2015 04:35PM

tl;dr A median maximiser will expect to win. A mean maximiser will win in expectation. As we face repeated problems of similar magnitude, both types take on the advantage of the other. However, the median maximiser will turn down Pascal's muggings, and can say sensible things about distributions without means.

Prompted by some questions from Kaj Sotala, I've been thinking about whether we should use the median rather than the mean when comparing the utility of actions and policies. To justify this, see the next two sections: why the median is like the mean, and why the median is not like the mean.


Why the median is like the mean

The main theoretic justifications for the use of expected utility - hence of means - are the von Neumann Morgenstern axioms. Using the median obeys the completeness and transitivity axioms, but not the continuity and independence ones.

It does obey weaker forms of continuity; but in a sense, this doesn't matter. You can avoid all these issues by making a single 'ultra-choice'. Simply list all the possible policies you could follow, compute their median return, and choose the one with the best median return. Since you're making a single choice, independence doesn't apply.

So you've picked the policy πm with the highest median value - note that to do this, you need only know an ordinal ranking of worlds, not their cardinal values. In what way is this like maximising expected utility? Essentially, the more options and choices you have - or could hypothetically have - the closer this policy must be to expected utility maximisation.

Assume u is a utility function compatible with your ordinal ranking of the worlds. Then πu = 'maximise the expectation of u' is also a policy choice. If we choose πm, we get a distribution dmu of possible values of u. Then E(u|πm) is within the absolute deviation (using dmu) of the median value of dmu. This absolute deviation always exists for any distribution with an expectation, and is itself bounded by the standard deviation, if it exists.

Thus maximising the median is like maximising the mean, with an error depending on the standard deviation. You can see it as a risk averse utility maximising policy (I know, I know - risk aversion is supposed to go in defining the utility, not in maximising it. Read on!). And as we face more and more choices, the standard deviation will tend to fall relative to the mean, and the median will cluster closer and closer to the mean.
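The bound claimed above, |mean − median| ≤ mean absolute deviation ≤ standard deviation, is easy to check numerically. A quick sketch (the lognormal distribution here is just an arbitrary skewed example, not anything from the post):

```python
import random
import statistics

random.seed(0)
# A heavily skewed distribution, where mean and median differ noticeably.
x = [random.lognormvariate(0.0, 1.0) for _ in range(200_000)]

mean = statistics.fmean(x)
median = statistics.median(x)
mad = statistics.fmean(abs(v - mean) for v in x)  # mean absolute deviation about the mean
std = statistics.pstdev(x)

# |mean - median| <= MAD <= standard deviation.
assert abs(mean - median) <= mad <= std
```

Both inequalities hold for any distribution (the first because the median minimises expected absolute distance, the second by Cauchy-Schwarz), so they hold for the empirical sample as well.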

For instance, suppose we consider the choice of whether to buckle our seatbelt or not. Assume we don't want to die in a car accident that a seatbelt could prevent; assume further that the cost of buckling a seatbelt is trivial but real. To simplify, suppose we have an independent 1/Ω chance of death every time we're in a car, and that a seatbelt could prevent this, for some large Ω. Furthermore, we will be in a car a total of ρΩ times, for ρ < 0.5. Now, it seems, the median recommends a ridiculous policy: never wear seatbelts. Then you pay no cost ever, and your chance of dying is less than 50%, so this has the top median.
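A back-of-the-envelope check of why "never buckle" tops the median, with hypothetical values plugged in for Ω and ρ:

```python
# Each unbuckled trip carries an independent 1/omega chance of death,
# over rho * omega trips in total (illustrative numbers).
omega = 100_000
rho = 0.3                 # rho < 0.5, as in the text
trips = int(rho * omega)

p_death = 1 - (1 - 1 / omega) ** trips
print(round(p_death, 3))  # about 0.26: under 50%, so the median outcome of
                          # "never buckle" is survival at zero buckling cost

assert p_death < 0.5      # hence the naive median maximiser never buckles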

And that is indeed a ridiculous result. But it's only possible because we look at seatbelts in isolation. Every day, we face choices that have small chances of killing us. We could look when crossing the street; smoke or not smoke cigarettes; choose not to walk close to the edge of tall buildings; choose not to provoke co-workers to fights; not run around blindfolded. I'm deliberately including 'stupid things no-one sensible would ever do', because they are choices, even if they are obvious ones. Let's gratuitously assume that all these choices also have a 1/Ω chance of killing you. When you collect together all the possible choices (obvious or not) that you make in your life, this will be ρ'Ω choices, for ρ' likely quite a lot bigger than 1.

Assume that avoiding these choices has a trivial cost, incommensurable with dying (ie no matter how many times you have to buckle your seatbelt, it is still better than a fatal accident). Now median-maximisation will recommend taking safety precautions for roughly (ρ'-0.5)Ω of these choices. This means that the decisions of a median maximiser will be close to those of a utility maximiser - they take almost the same precautions - though the outcomes are still pretty far apart: the median maximiser accepts a 49.99999...% chance of death.

But now add serious injury to the mix (still assume the costs are incommensurable). This has a rather larger probability, and the median maximiser will now only accept a 49.99999...% chance of serious injury. Or add light injury - now they only accept a 49.99999...% chance of light injury. If light injuries are additive - two injuries are worse than one - then the median maximiser becomes even more reluctant to take risks. We can now relax the assumption of incommensurablility as well; the set of policies and assessments becomes even more complicated, and the median maximiser moves closer to the mean maximiser.

The same phenomenon tends to happen when we add lotteries of decisions, chained decisions (decisions that depend on other decisions), and so on. Existential risks are interesting examples: from the selfish point of view, existential risks are just other things that can kill us - and not the most unlikely ones, either. So the median maximiser will be willing to pay a trivial cost to avoid an xrisk. Will a large group of median maximisers be willing to collectively pay a large cost to avoid an xrisk? That gets into superrationality, which I haven't considered yet in this context.

But let's turn back to the mystical utility function that we are trying to maximise. It's obvious that humans don't actually maximise a utility function; but according to the axioms, we should do so. Since we should, people on this list tend to often assume that we actually have one, skipping over the process of constructing it. But how would that process go? Let's assume we've managed to make our preferences transitive, already a major good achievement. How should we go about making them independent as well? We can do so as we go along. But if we do it ahead of time, chances are that we will be comparing hypothetical situations ("Do I like chocolate twice as much as sex? What would I think of a 50% chance of chocolate vs guaranteed sex? Well, it depends on the situation...") and thus construct a utility function. This is where we have to make decisions about very obscure and unintuitive hypothetical tradeoffs, and find a way to fold all our risk aversion/risk love into the utility.

When median maximising, we do exactly the same thing, except we constrain ourselves to choices that are actually likely to happen to us. We don't need a full ranking of all possible lotteries and choices; we just need enough to decide in the situations we are likely to face. You could consider this a form of moral learning (or preference learning). From our choices in different situations (real or possible), we decide what our preferences are in these situations, and this determines our preferences overall.


Why the median is not like the mean

Ok, so the previous paragraph argues that median maximising, if you have enough choices, functions like a clunky version of expected utility maximising. So what's the point?

The point is those situations that are not faced sufficiently often, or that have extreme characteristics. A median maximiser will reject Pascal's mugging, for instance, without any need for extra machinery (though they will accept Pascal's muggings if they face enough independent muggings, which is what we want - for stupidly large values of "enough"). They cope fine with distributions that have no means - such as the Cauchy distribution or a utility version of the St Petersburg paradox. They don't fall into paradox when facing choices with infinite (but ordered) rewards.
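The Cauchy case is easy to see numerically: sample means never settle down no matter how much data you collect, while sample medians converge. A rough sketch using the inverse-CDF trick to draw standard Cauchy variates:

```python
import math
import random
import statistics

random.seed(42)

def cauchy() -> float:
    # Standard Cauchy draw via the inverse CDF: tan(pi * (U - 1/2)).
    return math.tan(math.pi * (random.random() - 0.5))

# 100 experiments of 10,000 draws each: the Cauchy distribution has no
# mean, so the sample means stay wildly dispersed across experiments,
# while the sample medians cluster tightly around 0.
runs = [[cauchy() for _ in range(10_000)] for _ in range(100)]
means = [statistics.fmean(run) for run in runs]
medians = [statistics.median(run) for run in runs]

assert statistics.pstdev(means) > 10 * statistics.pstdev(medians)
```

This is exactly the situation where a median maximiser can still rank policies sensibly even though expected values are undefined.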

In a sense, median maximisation is like expected utility maximisation for common choices, but is different for exceptionally unlikely or high impact choices. Or, from the opposite perspective, expected utility maximising gives high probability of good outcomes for common choices, but not for exceptionally unlikely or high impact choices.

Another feature of the general idea (which might be seen as either a plus or a minus) is that it can get around some issues with total utilitarianism and similar ethical systems (such as the repugnant conclusion). What do I mean by this? Well, because the idea is that only choices that we actually expect to make matter, we can say, for instance, that we'd prefer a small ultra happy population to a huge barely-happy one. And if this is the only choice we make, we need not fear any paradoxes: we might get hypothetical paradoxes, just not actual ones. I won't put too much insistence on this point, I just thought it was an interesting observation.


For lack of a Cardinal...

Now, the main issue is that we might feel that there are certain rare choices that are just really bad or really good. And we might come to this conclusion by rational reasoning, rather than by experience, so this will not show up in the median. In these cases, it feels like we might want to force some kind of artificial cardinal order on the worlds, to make the median maximiser realise that certain rare events must be considered beyond their simple ordinal ranking.

In this case, maybe we could artificially add some hypothetical choices to our system, making us address these questions more than we actually would, and thus drawing them closer to the mean maximising situation. But there may be other, better ways of doing this.


Anyway, that's my first pass at constructing a median maximising system. Comments and critics welcome!


EDIT: We can use the absolute deviation (technically, the mean absolute deviation around the mean) to bound the distance between median and mean. This itself is bounded by the standard deviation, if it exists.

What is the best way to develop a strong sense of having something to protect

6 ChristianKl 06 September 2015 09:37PM

In HPMOR Eliezer makes "Something to Protect" Harry's power that the Dark Lord doesn't have. In Posture for Mental Arts Valentine from CFAR argues that it's likely a key part of having proper mental posture.

Did any of you make a conscious attempt to develop this sense of having something to protect? If so what worked for you? What didn't work?

Is there relevant academic research on the topic that's useful to know?

[Link] Rationality-informed approaches in the media

6 Gleb_Tsipursky 05 September 2015 04:09PM

As part of a broader project of promoting rationality, Raelifin and I had some luck in getting media coverage of rationality-informed approaches to probabilistic thinking (1, 2), mental health (1, 2), and reaching life goals through finding purpose and meaning (1, 2). The media includes mainstream media such as the main newspaper in Cleveland, OH; reason-oriented media such as Unbelievers Radio; student-oriented media such as the main newspaper for Ohio State University; and self improvement-oriented media such as the Purpose Revolution.


This is part of our strategy to reach out both to mainstream audiences and to niche groups interested in a specific spin on rationality-informed approaches to winning at life. I wanted to share these here, and see if any of you had suggestions for optimizing our performance, connections with other media channels both mainstream and niche, or any other thoughts on improving outreach. Thanks!

Digital Immortality Map: How to collect enough information about yourself for future resurrection by AI

5 turchin 02 October 2015 10:21PM

If someone has died, it doesn’t mean that you should stop trying to return him to life. There is one clear thing you can do (after cryonics): collect as much information about the person as possible, store a sample of his DNA, and hope that a future AI will return him to life based on this information.


Two meanings of “Digital immortality”

The term “Digital immortality” is often confused with the notion of mind uploading, as the end result is almost the same: a simulated brain in a computer.

But here, by the term “Digital immortality” I mean the reconstruction of a person by a future AI, after that person's death, based on his digital footprint and other traces.

Mind uploading in the future will happen while the original is still alive (or while the brain exists in a frozen state): the brain will either be connected to a computer by some kind of sophisticated interface, or scanned. It cannot be done currently.

On the other hand, reconstruction based on traces will be done by future AI, so we just need to leave enough traces, and we can start doing that now.

But we don’t know how many traces are enough, so basically we should try to produce and preserve as many as possible. However, not all traces are equal in predictive value: some are almost random, while others are so common that they provide no new information about the person.


Cheapest way to immortality

Creating traces is an affordable way of reaching immortality. It could even be done for another person after his death, if we start to collect all possible information about him. 

Basically, I am surprised that people don’t do this all the time. It can be done in a simple form, almost for free and in the background: just start a video recording app on your notebook and record everything into a shared folder connected to a free cloud service. (The Evocam program for Mac is excellent, and provides up to 100 GB free.)

But really good digital immortality requires a 2-3 month commitment to self-description, with regular yearly updates. It may also require an investment of up to several thousand dollars in durable disks, DNA testing and video recorders, plus the free time to do it.

I understand how to set up this process and could help anyone interested.



The idea of personal identity is outside the scope of this map; I have another map on that topic (now in draft). I assume that the problem of personal identity will be solved in the future. Perhaps we will prove that information alone is enough to solve the problem, or we will find that continuity of consciousness also matters, but that we can construct mechanisms to transfer this identity independently of the information.

Digital immortality requires only a very weak notion of identity, i.e. a model of behaviour and thought processes is enough for identity. This model may differ somewhat from the original; I call this the “one night difference”, that is, the typical difference between me-yesterday and me-today after one night's sleep. The meaningful part of this information is between several megabytes and several gigabytes in size, but we may need to collect much more information, since we cannot currently extract the meaningful part from the random noise.

DI may also be based on an even weaker notion of identity: that anyone who thinks he is me, is me. Weaker notions of identity require less information to be preserved; in the last case it may be around 10 KB (including name, indexical information and a basic description of traits).

But the question of how many traces are needed to create an almost exact model of a personality is still open. It also depends on the predictive power of the future AI: the stronger the AI, the fewer traces suffice.

Digital immortality is plan C in my Immortality Roadmap, where Plan A is life extension and Plan B is cryonics; it is not plan A, because it requires solving the identity problem plus the existence of powerful future AI.



I created the first version of it in 1990, when I was 16, immediately after finishing school. It included association tables, drawings and lists of all the people known to me, as well as some art, memoirs, audio recordings and an encyclopedia of the everyday objects around me.

There are several approaches to achieving digital immortality. The most popular is the passive one: simply video recording everything you do.

My idea was that a person can actively describe himself from the inside. He can identify and declare the most important facts about himself, run specific tests that reveal hidden levels of his mind and subconscious, and write a diary and memoirs. That is why I called my digital immortality project “self-description”.


Structure of the map

This map consists of two parts: theoretical and practical. The theoretical part lists basic assumptions and several possible approaches to reconstructing an individual, in which he is treated as a black box. If real neuron activity becomes observable, the "box" will become transparent and real uploading will be possible.

There are several steps in the practical part:

- The first step includes all the methods of recording information while the person of interest is alive.

- The second step is about preservation of the information.

- The third step is about what should be done to improve and promote the process.

- The final, fourth step concerns the reconstruction of the individual, which will be performed by AI after his death. In fact this may happen fairly soon, perhaps in the next 20-50 years.

There are several unknowns in DI, including the identity problem, the size and type of information required to create an exact model of the person, and the power future AI needs to carry out the process. These and other problems are listed in the box in the right-hand corner of the map.

The pdf of the map is here, and the jpg is below.


Previous posts with maps:

Doomsday Argument Map

AGI Safety Solutions Map

A map: AI failures modes and levels

A Roadmap: How to Survive the End of the Universe

A map: Typology of human extinction risks

Roadmap: Plan of Action to Prevent Human Extinction Risks

Immortality Roadmap

The application of the secretary problem to real life dating

5 Elo 29 September 2015 10:28PM

The following problem is best when not described by me:

Although there are many variations, the basic problem can be stated as follows:


There is a single secretarial position to fill.

There are n applicants for the position, and the value of n is known.

The applicants, if seen altogether, can be ranked from best to worst unambiguously.

The applicants are interviewed sequentially in random order, with each order being equally likely.

Immediately after an interview, the interviewed applicant is either accepted or rejected, and the decision is irrevocable.

The decision to accept or reject an applicant can be based only on the relative ranks of the applicants interviewed so far.

The objective of the general solution is to have the highest probability of selecting the best applicant of the whole group. This is the same as maximizing the expected payoff, with payoff defined to be one for the best applicant and zero otherwise.




After reading that you can probably see the application to real life.  A series of good and bad assumptions follows; some are fair, and some will not be representative of you.  I will try to name them all as I go so that you can substitute better ones for yourself.  Assume that you plan to have children, and that, like billions of humans before you, you will probably do so in a monogamous relationship while married (the entire set of assumptions does not break down for poly relationships or relationship anarchy, but it gets more complicated).  These assumptions help us populate the secretary problem with numbers relating to dating for the purpose of having children.


If you assume that a biological female's clock ends at 40 (in that it's hard, and not healthy for the baby, to try to have a kid past that age), that is effectively the end of the pure and simple biological purpose of relationships (environment, IVF and adoption aside for a moment; yes, there are a few more years on that).


For the purpose of this exercise – as a guy – you can add a few years for the potential age gap you would tolerate (my parents are 7 years apart, but that seems like a big understanding and maturity gap – they don't even like the same music). I personally expect I could tolerate an age gap of 4-5 years.

If you make the assumption that you start your dating life around the ages of 16-18, that gives you about [40-18=22] 22-24 (+5 for me as a male) years of expected dating potential time.

If you estimate the number of kids you want to have, and count either:

3 years for each kid OR

2 years for each kid (+1 kid – AKA 2 years)

(Twins will throw this number off, but estimate that they take longer to recover from, or more time raising them to manageable age before you have time to have another kid)

My worked example is myself: as one of three children, with two siblings of my own, I am going to plan to have 3 children, or 8-9 years of child-having time. If we subtract that from the number above we end up with 11-16 (16-21 for me, being male) years of dating time.

Also, if you happen to know someone with a number of siblings (or children) and a family dynamic that you like, then you should consider that number of children for yourself. Remember that as a grown-up you will probably be travelling through the world with your siblings beside you, which can be beneficial or detrimental; I would use the known working model of yourself and the people around you to predict whether you will benefit or be at a disadvantage from having siblings.  As they say, you can't pick your family - for better and worse.  You can pick your friends; if you want them to be as close as a default family - that connection goes both ways - it is possible to cultivate friends who are closer than some families.  However you choose to live your life is up to you.

Assume that once you find the right person, getting married (the process of organising a wedding, from the day the engagement rings are on fingers) and falling pregnant (successfully starting a viable pregnancy) takes at least a year; maybe two, depending on how long you want to stay in the "we just got married and we aren't having kids just yet" phase. That leaves 9-15 (15-20 male-adjusted) years of dating.

With my 9-15 years, I estimate that a relationship long enough to work out whether I want to marry someone takes between 6 months and 2 years (considering that as a guy I will probably be the one proposing and putting an engagement ring on someone's finger, I get a higher say in how long this might take than my significant other does - this is about the time it takes to evaluate whether you should put the ring on someone's finger).  That gives a total of 4 serious relationships on the low, long end and 30 serious relationships on the upper end (7-40 male-adjusted relationships).

Of course that's not how real life works. Some relationships will be longer and some will be shorter. I am fairly confident that all my relationships will fall around those numbers.

I have a lucky circumstance: I have already had a few serious relationships (substitute your own numbers in here).  With my existing relationships I can estimate how long I usually spend in a relationship: (2 years + 6 years + 2 months + 2 months) / 4 ≈ 2.1 years. Which is to say that I probably have a maximum total of around 7-15 relationships before I have to stop expecting to have kids, or start compromising on having 3 kids.
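The arithmetic above can be sketched in a few lines. Every number here is one of the post's stated assumptions (worked for the male-adjusted case), not a universal constant:

```python
# Rough arithmetic from the worked example above; all numbers are
# the post's assumptions, not universal constants.
fertility_end, dating_start = 40, 18
male_age_gap = 5                 # extra years of tolerated age gap (male-adjusted)
kids, years_per_kid = 3, 3       # three children at ~3 years each
wedding_and_pregnancy = 1        # engagement to viable pregnancy, ~1 year

dating_years = (fertility_end - dating_start + male_age_gap
                - kids * years_per_kid - wedding_and_pregnancy)  # 17

past = [2, 6, 2 / 12, 2 / 12]    # lengths of past relationships, in years
avg_length = sum(past) / len(past)               # ~2.1 years
max_relationships = dating_years / avg_length    # ~8
```

Plugging in your own past-relationship lengths and child count changes the answer substantially, which is rather the point of the exercise.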




A solution to the secretary equation

A known solution, which maximizes your chance of ending up with the best possible candidate, is to try out 1/e of the candidates (roughly 36%), then choose the next candidate who is better than all of them. For my numbers that means going through 3-7 relationships and then choosing the next relationship that is better than all the ones before.


I don't quite like that.  If the best candidate happens to fall in the first 1/e of trials, this strategy rejects them, you stick it out until the last candidate, and you settle on whoever that is.  The chance of being handed the last person this way is the chance that the best candidate lands in the rejection phase - roughly 1/e, i.e. about 37% (a bit higher for small sets: 3/7 ≈ 43% when the set is 7).  Settling on the last person is another opportunity-cost risk - what if they are rubbish? Then you compromise on the age gap, the number of kids or the partner's quality...
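To see these numbers concretely, here is a small Monte Carlo sketch of the 1/e strategy (my own illustration, not part of the standard treatment):

```python
import random

def secretary_trial(n, r):
    """Reject the first r candidates, then take the first one better
    than everyone seen so far; if none appears, settle for the last."""
    ranks = random.sample(range(n), n)  # 0 = best candidate
    best_seen = min(ranks[:r])
    for rank in ranks[r:]:
        if rank < best_seen:
            return rank, False
    return ranks[-1], True  # forced to settle for the last person

def stats(n, trials=100_000):
    r = max(1, round(n / 2.718281828))  # reject roughly n/e candidates
    results = [secretary_trial(n, r) for _ in range(trials)]
    p_best = sum(rank == 0 for rank, _ in results) / trials
    p_forced_last = sum(forced for _, forced in results) / trials
    return p_best, p_forced_last

# With n = 7 this picks the overall best roughly 41% of the time, and is
# forced to settle for the last candidate roughly 43% of the time (r/n = 3/7).
```

Running `stats(7)` and `stats(15)` shows the success rate hovering around the classic ~37-41%, while the forced-to-settle rate stays uncomfortably close to 1/e as well.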


Opportunity cost

Each further relationship you have might cost you another 2 years, putting you further out of touch with the next generation (kids these days!).  I tend to think about how old I will be when my kids are 15-20 - am I growing rapidly out of touch with the next younger generation?  Two years is a very big opportunity to spend - another 2 years could see you successfully running a startup and achieving lifelong stability, at the cost of the opportunity to have another kid.  I don't say this to crush you with fear of inaction, but it should factor in along with the other details of your situation.


A solution to the risk of having the best candidate in your test phase, or to the risk of lost opportunity, is to lower the bar: instead of choosing the next candidate who is better than all previous candidates, choose the next candidate who is better than 90% of the candidates so far.  Incidentally, this probably happens in real life quite often, in a stroke of "you'll do"...


Where it breaks down


Real life is more complicated than that. I would like to think that subsequent relationships I get into will not suffer the stupid mistakes of the last ones. There is also the potential opportunity cost of exploration: the more time you spend looking for different partners, the more you risk losing your early soul mate, or wasting time looking for a better one when you could follow a "good enough" policy. No one likes to know they are merely "good enough", but we do race the clock in our lifetimes. Life is what happens when you are busy making plans.


As anyone with experience will know, we probably test and rule out bad partners in a single conversation, before we even get so far as a date, or in relationships that don't last more than a week (i.e. the experience set grows through various means).


People have a tendency to overrate the quality of a relationship while they are in it, versus the ones that already failed.


Did I do something wrong? 

“I got married early - did I do something wrong (or irrational)?”

No.  Equations are not real life.  It might have been nice to have the equation, but you obviously didn't need it.  Also, this equation assumes a monogamous relationship.  In real life people have overlapping relationships; you can date a few people, and you can be poly. These are all factors that can change the simple assumptions of the equation.


Where does the equation stop working?

Real life is hard.  It doesn't fall neatly into line, it’s complicated, it’s ugly, it’s rough and smooth and clunky.  But people still get by.  Don’t be afraid to break the rule. 

Disclaimer: If this equation is the only thing you are using to evaluate a relationship - it’s not going to go very well for you.  I consider this and many other techniques as part of my toolbox for evaluating decisions.

Should I break up with my partner?

What? No!  Following an equation is not a good way to live your life.

Does your partner make you miserable?  Then yes you should break up.


Do you feel like they are not ready to have kids yet and you want to settle down?  Tough call.  Even if they were agents also running the equation, an equation is not real life.  Go by your brain; go by your gut.  Don’t go by just one equation.

Expect another post soon about reasonable considerations that should be made when evaluating relationships.

The given problem makes the assumption that you are able to evaluate partners in the sense that the secretary problem expects.  Humans are not all strategic and can’t really do that.  This is why the world is not going to perfectly follow this equation.  Life is complicated; there are several metrics that make a good partner and they don’t always trade off between one another.



Meta: writing time - 3 hours over a week; 5+ conversations with people about the idea, bothering a handful of programmers and mathematicians for commentary on my thoughts, and generally a whole bunch of fun talking about it.  This post was started on the slack channel when someone asked a related question.


My table of contents for other posts in my series.


Let me know if this post was helpful or if it worked for you or why not.

Matching donation funds and the problem of illusory matching

5 Stefan_Schubert 18 September 2015 08:05PM

On average, matching donations supposedly do increase charitable giving (though I would like to see more rigorous research on this - tips are welcome). One criticism of them, though, is that they are "illusory" - that is, that the matching donor eventually donates the same amount whether or not smaller donors match their donations. That means a dollar from a smaller donor doesn't actually cause the matcher to contribute more.

One way to make matching donations real, as opposed to illusory, is this. Suppose that the matching donor is indifferent between donating to two charities (e.g. Against Malaria Foundation and MIRI). At the same time, lots of small donors think that one of them is far better than the other. Also, suppose that the matching donor sets the terms so that it's virtually certain that their whole matching fund will be used up (this could be done, e.g. by making the matching ratio very favourable).

Under these conditions, it will make a difference whether a small donor contributes or not: if you don't, chances are your donation will be replaced by a donation to the other charity. That means a dollar from you as a smaller donor does, on average, cause the matcher to contribute more to your favourite charity.


This suggests a more general strategy for leveraging charity contributions. You could set up a set of matching funds, to which small donors could contribute. These funds would be "disjunctive" - they would match contributions to, e.g., AMF or MIRI, Open Borders or MSF or The Humane League, etc. The funds would from time to time declare that they match any donations to their target charities, and supporters of the respective target charities would start competing, in effect, for the matching donations.

In the simplest system, only people who are more or less indifferent between the target charities would donate to the matching funds. A somewhat more complex system incentivizes people who prefer one of the target charities, A, to give to the matching fund. Under such a system, an "A-ear-marked" donation to the matching fund would increase the matching donations (e.g. from 1:1 to 3:2) to A, and decrease matching donations to the other target charities the matching fund supports. That will, in turn, incentivize more giving to A relative to the other target charities. It is important that such adjustments are done in the right way, though. If, e.g., supporters of A have contributed 70% of the matching fund, and supporters of B 30%, then roughly 70% of the extra money the matching fund generates (thanks to additional donations) should go to A, and 30% to B. (It could actually get even more complicated than that, but let us leave this thread here for now.)

If such a system of matching funds was set up, an important question would be: should you donate to a matching fund, or donate to a target charity, and get your donations matched by a matching fund? Suppose that you expect those running the matching funds to adjust the matching ratios so that any donation to them that is ear-marked for your favourite charity A means that all extra donations your donation generates will go to A. In other words, if each dollar to the matching fund generates X cents in extra donations, you giving an A-ear-marked donation will mean X more cents to A. Then your decision will depend on:*


1) The size of X.

2) Your opinion of the charities competing with A in various matching funds. The better you think they are, the less reason you have to donate directly to A (since then you care less about money not going to A).

3) Replaceability effects. If you don't donate to A, who will replace you? Someone donating to A, or to some other charity? The more likely you think it is that you will be replaced by another donor to A, the less reason you have to donate directly to A.

4) The matching fund's matching ratio Y.


Suppose, for instance, that X = .2, that you think that the competitors to A in a particular matching fund generate zero utility, and that the probability that your donation will be replaced by another A donor is 50 %. Then you should choose to contribute to the matching fund if Y < .4:1, and donate directly if Y > .4:1.
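One way to read the arithmetic of this example (the function names and the reading of "replaceability" here are my own interpretation, not the author's notation) is that donating via the fund yields X extra cents to A per dollar, while donating directly yields matched money that only counts when no other A donor would have claimed it anyway:

```python
def extra_to_a_via_fund(x):
    """Each dollar to the matching fund generates x extra dollars for A
    (assuming all extra donations it generates are steered to A)."""
    return x

def extra_to_a_direct(y, p_replaced):
    """Marginal matched money from donating a dollar directly to A:
    with probability p_replaced another A donor would have claimed
    the same matching anyway, so only the remainder counts."""
    return (1 - p_replaced) * y

x, p_replaced = 0.20, 0.5
threshold = x / (1 - p_replaced)   # 0.4 -> indifferent at Y = .4:1
better = {y: ("fund" if extra_to_a_via_fund(x) > extra_to_a_direct(y, p_replaced)
              else "direct")
          for y in (0.3, 0.5)}
# better == {0.3: 'fund', 0.5: 'direct'}; Y = 0.4 is the break-even point
```

Under this reading the break-even ratio is X / (1 - p_replaced), which reproduces the .4:1 figure in the example.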


You could set up a whole stock exchange, where people could buy shares in matching funds, and make donations to charities that will be matched by matching funds. It's an interesting question what the average level of matching would be in such a system. The higher it would be, the more charitable giving it would presumably generate. Therefore, one should try to increase that level beyond .4:1 (beyond which people will start donating directly to the target charity in our example), which is not very high. For instance, you could tweak the system in a way that incentivizes matching, or you could try to get large donors or even the government to exclusively give matching donations.

These complex issues are still a bit foggy to me, and I might have made some mistakes. Any comments are welcome. See also this text on the EA forum where a similar system involving the government as the matching donor is discussed. This is an instance of Moral Trade, a concept developed by Toby Ord.

* If you don't think that, your opinion of whether A will get more or less than X extra cents because of your donation is a fifth parameter to consider.

The subagent problem is really hard

5 Stuart_Armstrong 18 September 2015 01:06PM

A putative new idea for AI control; index here.

The first step to solving a problem is to define it. The first first step is to realise how tricky it is to define. This is a stub on a difficult problem.

Subagents and turning AIs off

One of the hardest things in designing safe AIs is turning them off. Installing an OFF button is trivial; motivating them to let you push it can be done. No, the real problem is the subagent problem. The AI can create one or many subagents to carry out its task; once that's done, turning off the original AI is of little value - we have to control or extinguish the subagents.

A general problem

Subagents are a problem for many other control ideas. For example, if you want to program 'achieve X using only this many computer cycles', then 'create a subagent that wants to achieve X without the computer cycle restrictions' is one way of doing it (as long as that can be done within the restrictions). The same goes for energy use and other restrictions. Satisficers can satisfice by creating a maximiser. A boxed Oracle can remain in the box while cheerfully plotting to create an unboxed copy of itself. In fact, whatever the AI's motivation, "create subagents with the same motivation" is almost always a good idea. And, once created, there is no motivation to want to shut them down. Even if the AI is motivated to shut itself down, there is no reason for it to extend this to subagents.

Pre-corriged agents would at least be motivated to keep control of their subagents (unlike corrigible agents, who have no current desire to create subagents that would obey their future preferences).

What is a subagent?

It's also almost impossible to define a subagent. Is a corporation a subagent? Is a thousand pieces of code that reside on different computers and occasionally communicate a subagent? Is an employee or a seduced human a subagent? A pile of rocks with water running through could be a piece of conceptual art - or a slow computer. How about a virulent meme among the human population, making us all into its subagents?

The best idea I've found for defining subagents is that they are reactive intelligences with planning abilities, so random natural events that should have large influence in the future (say, a meteor heading towards Earth) would instead be re-diverted to have little impact. However, if this is used as a definition, then the subagents would simply be programmed to not behave "reactively" in whatever way defined. A slightly more subtle problem is that, since we don't have a definition of agent or subagent, we can't easily protect against holes such as "there is an alien agent on that asteroid, which will optimise the solar system iff it detects no subagent on Earth". Such holes could completely reverse the definition.

No subagent versus no influential subagents

Without the ability to properly define subagents, we can't restrict them directly. Even if you had an AI that was properly motivated to, say, do some action then turn itself off, there is no reason for it to want to do the same with its subagents.

Most of the methods of reduced impact can help with problematic subagents. They do not preclude the creation of subagents, nor mandate their destruction, but they do prevent them from having much influence. This is because reduced impact methods are aimed at preventing the AI itself from having much influence, either directly or via subagent creation. Even the attempt in this post to detect subagents directly more properly detects influential decisions (of which the creation of a subagent could be one).

This is certainly a positive, but it may not be robust. If the definition of "not having much influence" turns out to have holes, it would be less than ideal to have any remaining subagents hanging about.

The "make sure that you cannot be detected in the future" approach is probably the most promising for controlling subagents. The AI there is motivated to erase all traces of its existence, which subagents would presumably be.

In all, it's a very tricky problem, and the core failure of many ideas for AI control.

Doomsday Argument Map

5 turchin 14 September 2015 03:04PM

The Doomsday argument (DA) is the controversial idea that humanity's probability of extinction is higher than we would otherwise estimate, based purely on probabilistic arguments. The DA rests on the proposition that I will most likely find myself somewhere in the middle of humanity's time in existence (and not in its early period, contrary to the expectation that humanity may exist for a very long time on Earth).


There have been many different definitions of the DA and methods of calculating it, as well as rebuttals. As a result a complex group of ideas has developed, and the goal of the map is to bring some order to it. The map consists of various authors' ideas. I don't think I have caught all existing ideas, and the map could be improved significantly – but some feedback is needed at this stage.

The map has the following structure: the horizontal axis consists of various sampling methods (notably SIA and SSA), and the vertical axis has various approaches to the DA, mostly Gott's (unconditional) and Carter's (an update of the probability of existing risk). But many important ideas don't fit this scheme precisely, and these have been added on the right-hand side.

In the lower rows the link between the DA and similar arguments is shown, namely the Fermi paradox, Simulation argument and Anthropic shadow, which is a change in the probability assessment of natural catastrophes based on observer selection effects. 

On the right part of the map different ways of rebutting the DA are listed, along with a vertical row of possible positive solutions.

I think that the DA is mostly true but may not mean inevitable extinction.

Several interesting ideas may need additional clarification, and they will also shed light on the basis of my position on the DA.


The first of these ideas is that the most reasonable version of the DA at our current stage of knowledge is something that may be called the meta-DA, which presents our uncertainty about the correctness of any DA-style theories and our worry that the DA may indeed be true.

The meta-DA is a Bayesian superstructure built upon the field of DA theories. The meta-DA tells us that we should attribute non-zero Bayesian probability to one or several DA-theories (at least until they are disproved in a generally accepted way) and since the DA itself is a probabilistic argument, then these probabilities should be combined.

As a result, the meta-DA means an increase in total existential risk until we disprove (or prove) all versions of the DA, which may not be easy. We should treat this increase in risk as a demand to be more cautious, but not in a fatalist "doom imminent" way.

Reference class

The second idea concerns the so-called problem of the reference class, that is, the problem of which class of observers I belong to for the purposes of the DA. Am I randomly chosen from all animals, humans, scientists or observer-moments?

The proposed solution is that the DA is true for any reference class from which I am randomly chosen, but the very definition of the reference class determines the kind of "end" it will have; it need not be a global catastrophe. In short, every reference class has its own end. For example, if I am randomly chosen from the class of all humans, then the end of the class may mean not extinction but the beginning of the class of superhumans.

But any suitable candidate for a DA-style reference class must ensure that my position in it is random. For instance, I can't be a random example of the class of mammals, because I am able to think about the DA and a zebra can't.

As a result, the most natural reference class (i.e. one providing a truly random distribution of observers) is the class of observers who know about and can think about the DA. The ability to understand the DA is the real difference between conscious and unconscious observers.

But this class is small and young. It started in 1983 with the works of Carter, and now includes perhaps several thousand observers. If I am in the middle of it, there will be just several thousand more DA-aware observers, and the class will end within only several more decades (which, unpleasantly, coincides with the expected "Singularity" and other x-risks). (This idea was clear to Carter, and is also used in the so-called self-referencing doomsday argument rebuttal.)

This does not necessarily mean ending in a global catastrophe; it may mean that there will soon be a DA rebuttal. (And we could probably choose how to fulfill the DA prophecy by manipulating the number of observers in the reference class.)

DA and median life expectancy

The DA is not as unnatural a way of looking at the future as it seems. A more natural way to understand the DA is as an instrument for estimating the median life expectancy in a certain group.

For example, I can estimate median human life expectancy based on your age. If you are X years old, median human life expectancy is around 2X. "Around" here is a very vague term - it is more like an order of magnitude. For example, if you are 25 years old, I could conclude that median human life expectancy is several decades (not 10 milliseconds, and not 1 million years) - and independently I know this is true. And since median life expectancy also applies to the person in question, it means that he will also most probably live about that long (if we don't do something serious about life extension). So there is no magic or inevitable fate in the DA.

But if we apply the same logic to the existence of civilization, and count only civilizations capable of self-destruction - i.e. roughly from 1945, or 70 years old - it yields a median life expectancy for technological civilizations of around 140 years, which is extremely short compared to our hope that we may exist for millions of years and colonize the Galaxy.
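The "2X" rule here is Gott's delta-t argument; a minimal sketch of the arithmetic behind the 140-year figure (the interval formula is Gott's standard one, the numbers are the post's):

```python
def gott_interval(age, confidence=0.95):
    """Gott's delta-t argument: knowing only how long something has
    existed, with the given confidence its *future* duration lies in
    [age * (1 - c) / (1 + c), age * (1 + c) / (1 - c)]."""
    c = confidence
    return age * (1 - c) / (1 + c), age * (1 + c) / (1 - c)

# A technological civilization observed at age 70 (counting from 1945):
low, high = gott_interval(70)   # roughly 1.8 to 2730 more years
median_total = 2 * 70           # median total lifespan ~140 years
```

The 95% interval is enormously wide, which is exactly why the median figure should be read as an order-of-magnitude estimate rather than a prophecy.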

Anthropic shadow and fragility of our environment

At its core is the idea that, as a result of natural selection, we are more likely to find ourselves in a world that is in a meta-stable condition on the border of existential catastrophe, because some catastrophe may be long overdue. (Also because universal human minds may require constantly changing natural conditions in order to make useful adaptations, which implies an unstable climate – and we do live in a period of ice ages.)

In such a world, even small human actions could result in global catastrophe – like piercing an overpressurized balloon with a needle.

The most plausible candidates for such metastable conditions are processes that should have happened long ago in most worlds, but we can only find ourselves in a world where they have not. For Earth, one may be a sudden change of the atmosphere to a Venusian subtype (runaway global warming). This means that small human actions could have a much stronger effect on atmospheric stability (probably because the largest accumulation of methane hydrates in Earth's history resides on the Arctic Ocean floor, which is capable of a sudden release). Another option for meta-stability is provoking a strong supervolcano eruption via some kind of earth-crust penetration (see “Geoengineering gone awry”).

Thermodynamic version of the DA 

The thermodynamic version of the DA, suggested in the Strugatskys’ novel “Definitely Maybe” (originally titled “A Billion Years Before the End of the World”), is probably unknown to the western reader. It suggests that we live in a thermodynamic fluctuation, and since smaller and simpler fluctuations are more probable, there should be a force acting against complexity, AI development, or our existence in general. The plot of the novel revolves around a seemingly magical force that distracts the best scientists from their work using girls, money or crime. After a long investigation they find that it is an impersonal force acting against complexity.

This map is a sub-map for the planned map “Probability of global catastrophe” and its parallel maps are a “Simulation argument map” and a “Fermi paradox map” (both are in early drafts).

PDF of the map: 


Previous posts with maps:

AGI Safety Solutions Map

A map: AI failures modes and levels

A Roadmap: How to Survive the End of the Universe

A map: Typology of human extinction risks

Roadmap: Plan of Action to Prevent Human Extinction Risks

Immortality Roadmap

Useful Wiki and special pages

5 ScottL 13 September 2015 04:33AM

This is a list of what I think are the most useful wiki and special pages on less wrong. If you know of any other pages that you think are useful, let me know and I will add them in. After I think enough time has passed, I will add the links below into the FAQ "Stuff to know about" section, if they are not already there.

Chatbots or set answers, not WBEs

5 Stuart_Armstrong 08 September 2015 05:17PM

A putative new idea for AI control; index here.

In a previous post, I talked about using a WBE to define a safe output for a reduced impact AI.

I've realised that the WBE isn't needed. Its only role was to ensure that the AI's output could have been credibly produced by something other than the AI - "I'm sorry, Dave. I'm afraid I can't do that." is unlikely to be the output of a random letter generator.

But a whole WBE is not needed. If the output is short, a chatbot with access to a huge corpus of human responses could function well. We can specialise it in the direction we need - if we are asking for financial advice, we can mandate a specialised vocabulary or train it on financial news sources.

So instead of training the reduced impact AI to behave as the 'best human advisor', we are training it to behave as the 'luckiest chatbot'. This allows us to calculate the odds with greater precision, and has the advantage of not needing to wait for a WBE.

For some questions, we can do even better. Suppose we have a thousand different stocks, and are asking which one would increase in value the most during the coming year. The 'chatbot' here is simply an algorithm that picks a stock at random. So we now have an exact base rate - 1/1000 - and predetermined answers from the AI.

[EDIT:] Another alternative is to get online users to submit answers to the question. Then the AI selects the best answer from the choices. And if the AI is not turned on, a random answer is selected.

Experiment: Changing minds vs. preaching to the choir

4 cleonid 03 October 2015 11:27AM


      1. Problem

In a market economy production is driven by monetary incentives – a higher reward for an economic activity makes more people willing to engage in it. Internet forums follow the same principle but with a different currency: instead of money, the main incentive of internet commenters is the reaction of their audience. A strong reaction, expressed by a large number of replies or “likes”, encourages commenters to increase their output; its absence motivates them to quit posting or change their writing style.

On neutral topics, using audience reaction as an incentive works reasonably well: attention focuses on the most interesting or entertaining comments. However, on partisan issues, such incentives become counterproductive. Political forums and newspaper comment sections demonstrate the same patterns:

  • The easiest way to maximize “likes” for a given amount of effort is by posting an emotionally charged comment which appeals to the audience’s biases (“preaching to the choir”).


  • The easiest way to maximize the number of replies is by posting a low quality comment that goes against the audience’s biases (“trolling”).


  • Both effects are amplified when the website places comments with most replies or “likes” at the top of the page.


The problem is not restricted to low-brow political forums. The following graph, which shows the average number of comments as a function of an article’s karma, was generated from the Lesswrong data.


The data suggests that the easiest way to maximize the number of replies is to write posts that are disliked by most readers. For instance, articles with the karma of -1 on average generate twice as many comments (20.1±3.4) as articles with the karma of +1 (9.3±0.8).

2. Technical Solution

Enabling constructive discussion between people with different ideologies requires reversing the incentives – people need to be motivated to write posts that sound persuasive to the opposite side rather than to their own supporters.

We suggest addressing this problem by changing the voting system. In brief, instead of votes from all readers, comment ratings and position on the page should be based on votes from the opposite side only. For example, in the debate on the minimum wage, for arguments against the minimum wage only the upvotes of minimum-wage supporters would be counted, and vice versa.

The new voting system can simultaneously achieve several objectives:

  • eliminate incentives for preaching to the choir

  • give posters more objective feedback on the impact of their contributions, helping them improve their writing style

  • focus readers’ attention on the comments most likely to change their minds, rather than on inciting comments that provoke an irrational defensive reaction.
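The scoring rule described above can be sketched in a few lines of Python. This is only an illustration of the idea, not an actual implementation; the side labels and vote format are assumptions:

```python
def cross_partisan_score(comment_side, votes):
    """Score a comment using only votes cast by readers from the opposing camp.

    comment_side: "pro" or "con", the position the comment defends.
    votes: iterable of (voter_side, value) pairs, where value is +1 or -1.
    """
    opposing = "con" if comment_side == "pro" else "pro"
    return sum(value for voter_side, value in votes if voter_side == opposing)

# A "pro" comment: the two "pro" upvotes are ignored; only "con" votes count.
print(cross_partisan_score("pro", [("pro", 1), ("pro", 1), ("con", 1), ("con", 1)]))  # 2
```

Under this rule, preaching to the choir earns a score of zero no matter how enthusiastic one's own side is, which is exactly the reversal of incentives the proposal aims for.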

3. Testing

If you are interested in measuring and improving your persuasive skills and would like to help others to do the same, you are invited to take part in the following experiment:


Step I. Submit Pro or Con arguments on any of the following topics (up to 3 arguments in total):

     Should the government give all parents vouchers for private school tuition?

     Should developed countries increase the number of immigrants they receive?

     Should there be a government mandated minimum wage?


Step II. For each argument you have submitted, rate 15 arguments submitted by others.


Step III. Participants will be emailed the results of the experiment, including:

  • ratings their arguments receive from different reviewer groups (supporters, opponents and neutrals)

  • the list of the most persuasive Pro & Con arguments on each topic (i.e. arguments that received the highest ratings from opposing and neutral groups)

  • the rating distribution in each group


Step IV (optional). If interested, sign up for the next round.


The experiment will help us test the effectiveness of the new voting system and develop the best format for its application.
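One way the "most persuasive arguments" list from Step III could be produced is by ranking arguments on their mean rating from opponents and neutrals only. A minimal sketch, assuming a simple dict format for arguments and the group labels used above (both are illustrative, not the experiment's actual data format):

```python
from statistics import mean

def most_persuasive(arguments):
    """Rank arguments by their mean rating from opponents and neutrals only.

    Each argument is a dict with a "ratings" list of (reviewer_group, score)
    pairs, where reviewer_group is "supporter", "opponent", or "neutral".
    """
    def cross_score(argument):
        scores = [s for group, s in argument["ratings"]
                  if group in ("opponent", "neutral")]
        return mean(scores) if scores else float("-inf")
    return sorted(arguments, key=cross_score, reverse=True)

# An argument praised mainly by the opposing side ranks above one praised
# only by its own supporters:
args = [
    {"id": "a", "ratings": [("supporter", 5), ("opponent", 1)]},
    {"id": "b", "ratings": [("supporter", 2), ("opponent", 4), ("neutral", 4)]},
]
print([arg["id"] for arg in most_persuasive(args)])  # ['b', 'a']
```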





Rationality Reading Group: Part J: Death Spirals

4 Gram_Stone 24 September 2015 02:31AM

This is part of a semi-monthly reading group on Eliezer Yudkowsky's ebook, Rationality: From AI to Zombies. For more information about the group, see the announcement post.

Welcome to the Rationality reading group. This fortnight we discuss Part J: Death Spirals (pp. 409-494). This post summarizes each article of the sequence, linking to the original LessWrong post where available.

J. Death Spirals

100. The Affect Heuristic - Positive and negative emotional impressions exert a greater effect on many decisions than does rational analysis.

101. Evaluability (and Cheap Holiday Shopping) - It's difficult for humans to evaluate an option except in comparison to other options. Poor decisions result when a poor category for comparison is used. Includes an application for cheap gift-shopping.

102. Unbounded Scales, Huge Jury Awards, and Futurism - Without a metric for comparison, estimates of, e.g., what sorts of punitive damages should be awarded, or when some future advance will happen, vary widely simply due to the lack of a scale.

103. The Halo Effect - Positive qualities seem to correlate with each other, whether or not they actually do.

104. Superhero Bias - It is better to risk your life to save 200 people than to save 3. But someone who risks their life to save 3 people is revealing a more altruistic nature than someone risking their life to save 200. And yet comic books are written about heroes who save 200 innocent schoolchildren, and not police officers saving three prostitutes.

105. Mere Messiahs - John Perry, an extropian and a transhumanist, died when the north tower of the World Trade Center fell. He knew he was risking his existence to save other people, and he had hope that he might be able to avoid death, but he still helped them. This takes far more courage than someone who dies, expecting to be rewarded in an afterlife for their virtue.

106. Affective Death Spirals - Human beings can fall into a feedback loop around something that they hold dear. Every situation they consider, they use their great idea to explain. Because their great idea explained this situation, it now gains weight. Therefore, they should use it to explain more situations. This loop can continue, until they believe Belgium controls the US banking system, or that they can use an invisible blue spirit force to locate parking spots.

107. Resist the Happy Death Spiral - You can avoid a Happy Death Spiral by (1) splitting the Great Idea into parts (2) treating every additional detail as burdensome (3) thinking about the specifics of the causal chain instead of the good or bad feelings (4) not rehearsing evidence (5) not adding happiness from claims that "you can't prove are wrong"; but not by (6) refusing to admire anything too much (7) conducting a biased search for negative points until you feel unhappy again (8) forcibly shoving an idea into a safe box.

108. Uncritical Supercriticality - One of the most dangerous mistakes that a human being with human psychology can make, is to begin thinking that any argument against their favorite idea must be wrong, because it is against their favorite idea. Alternatively, they could think that any argument that supports their favorite idea must be right. This failure of reasoning has led to massive amounts of suffering and death in world history.

109. Evaporative Cooling of Group Beliefs - When a cult encounters a blow to their own beliefs (a prediction fails to come true, their leader is caught in a scandal, etc) the cult will often become more fanatical. In the immediate aftermath, the cult members that leave will be the ones who were previously the voice of opposition, skepticism, and moderation. Without those members, the cult will slide further in the direction of fanaticism.

110. When None Dare Urge Restraint - The dark mirror to the happy death spiral is the spiral of hate. When everyone looks good for attacking someone, and anyone who disagrees with any attack must be a sympathizer to the enemy, the results are usually awful. It is too dangerous for there to be anyone in the world that we would prefer to say negative things about, over saying accurate things about.

111. The Robbers Cave Experiment - The Robbers Cave Experiment, by Sherif, Harvey, White, Hood, and Sherif (1954/1961), was designed to investigate the causes and remedies of problems between groups. Twenty-two middle school aged boys were divided into two groups and placed in a summer camp. From the first time the groups learned of each other's existence, a brutal rivalry was started. The only way the counselors managed to bring the groups together was by giving the two groups a common enemy. Any resemblance to modern politics is just your imagination.

112. Every Cause Wants to Be a Cult - The genetic fallacy seems like a strange kind of fallacy. The problem is that the original justification for a belief does not always equal the sum of all the evidence that we currently have available. But, on the other hand, it is very easy for people to still believe untruths from a source that they have since rejected.

113. Guardians of the Truth - There is an enormous psychological difference between believing that you absolutely, certainly, have the truth, versus trying to discover the truth. If you believe that you have the truth, and that it must be protected from heretics, torture and murder follow. Alternatively, if you believe that you are close to the truth, but perhaps not there yet, someone who disagrees with you is simply wrong, not a mortal enemy.

114. Guardians of the Gene Pool - It is a common misconception that the Nazis wanted their eugenics program to create a new breed of supermen. In fact, they wanted to breed back to the archetypal Nordic man. They located their ideals in the past, which is a counterintuitive idea for many of us.

115. Guardians of Ayn Rand - Ayn Rand, the leader of the Objectivists, praised reason and rationality. The group she created became a cult. Praising rationality does not provide immunity to the human trend towards cultishness.

116. Two Cult Koans - Two Koans about individuals concerned that they may have joined a cult.

117. Asch's Conformity Experiment - The unanimous agreement of surrounding others can make subjects disbelieve (or at least, fail to report) what's right before their eyes. The addition of just one dissenter is enough to dramatically reduce the rates of improper conformity.

118. On Expressing Your Concerns - A way of breaking the conformity effect in some cases.

119. Lonely Dissent - Joining a revolution does take courage, but it is something that humans can reliably do. It is comparatively more difficult to risk death. But it is more difficult than either of these to be the first person in a rebellion. To be the only one who is saying something different. That doesn't feel like going to school in black. It feels like going to school in a clown suit.

120. Cultish Countercultishness - People often nervously ask, "This isn't a cult, is it?" when encountering a group that thinks something weird. There are many reasons why this question doesn't make sense. For one thing, if you really were a member of a cult, you would not say so. Instead, what you should do when considering whether or not to join a group, is consider the details of the group itself. Is their reasoning sound? Do they do awful things to their members?


This has been a collection of notes on the assigned sequence for this fortnight. The most important part of the reading group though is discussion, which is in the comments section. Please remember that this group contains a variety of levels of expertise: if a line of discussion seems too basic or too incomprehensible, look around for one that suits you better!

The next reading will cover Part K: Letting Go (pp. 497-532). The discussion will go live on Wednesday, 7 October 2015, right here on the discussion forum of LessWrong.

Publication on formalizing preference utilitarianism in physical world models

4 Caspar42 22 September 2015 04:46PM

About a year ago I asked for help with a paper on a formalization of preference utilitarianism in cellular automata. The paper has now been published in the Springer journal Synthese and is available here. I wonder what you think about it and, if you are interested, I would like to discuss it with you.

View more: Next