
Less Wrong is a community blog devoted to refining the art of human rationality. Please visit our About page for more information.

Project Hufflepuff

26 Raemon 18 January 2017 06:57PM

(This is a crossposted FB post, so it might read a bit weird)

My goal this year (in particular, my main focus once I arrive in the Bay, but also my focus in NY and online in the meanwhile), is to join and champion the growing cause of people trying to fix some systemic problems in EA and Rationalsphere relating to "lack of Hufflepuff virtue".

I want Hufflepuff Virtue to feel exciting and important, because it is, and I want it to be something that flows naturally into our pursuit of epistemic integrity, intellectual creativity, and concrete action.

Some concrete examples:

- on the 5 second reflex level, notice when people need help or when things need doing, and do those things.

- have an integrated understanding that being kind to people is *part* of helping them (and you!) to learn more, and have better ideas.

(There are a bunch of ways to be kind to people that do NOT do this, e.g. politely agreeing to disagree. That's not what I'm talking about. We need to hold each other to higher standards but not talk down to people in a fashion that gets in the way of understanding. There are tradeoffs and I'm not sure of the best approach, but there's a lot of room for improvement.)

- be excited and willing to be the person doing the grunt work to make something happen

- foster a sense that the community encourages people to try new events, and to actively take personal responsibility for noticing and fixing community-wide problems that aren't necessarily sexy.

- when starting new projects, try to have mentorship and teamwork built into their ethos from the get-go, rather than hastily tacked on later

I want these sorts of things to come easily to mind when the future people of 2019 think about the rationality community, and have them feel like central examples of the community rather than things that we talk about wanting-more-of.

Infinite Summations: A Rationality Litmus Test

21 shev 20 January 2017 09:31AM

You may have seen that Numberphile video that circulated the social media world a few years ago. It showed the 'astounding' mathematical result:

1+2+3+4+5+… = -1/12

(quote: "the answer to this sum is, remarkably, minus a twelfth")

Then they tell you that this result is used in many areas of physics, and show you a page of a string theory textbook (oooo) that states it as a theorem.

The video caused quite an uproar at the time, since it was many people's first introduction to the rather outrageous idea and they had all sorts of very reasonable objections.

Here's the 'proof' from the video:

First, consider P = 1 - 1 + 1 - 1 + 1…
Clearly the value of P oscillates between 1 and 0 depending on how many terms you take. Numberphile decides that it equals 1/2, because that's halfway between the two.
Alternatively, consider P+P with the terms interleaved, and check out this quirky arithmetic:
  1-1+1-1…
+   1-1+1…
= 1 + (-1+1) + (1-1) … = 1, so 2P = 1, so P = 1/2
Now consider Q = 1-2+3-4+5…
And write out Q+Q this way:
  1-2+3-4+5…
+   1-2+3-4…
= 1-1+1-1+1… = P = 1/2 = 2Q, so Q = 1/4
Now consider S = 1+2+3+4+5…
Write S-4S as
  1+2+3+4+5…
-   4  -8 …
= 1-2+3-4+5… = Q = 1/4
So S-4S = -3S = 1/4, so S = -1/12

How do you feel about that? Probably amused but otherwise not very good, regardless of your level of math proficiency. But in another way it's really convincing - I mean, string theorists use it, by god. And, to quote the video, "these kinds of sums appear all over physics".

So the question is this: when you see a video or hear a proof like this, do you 'believe them'? Even if it's not your field, and not in your area of expertise, do you believe someone who tells you "even though you thought mathematics worked this way, it actually doesn't; it's still totally mystical and insane results are lurking just around the corner if you know where to look"? What if they tell you string theorists use it, and it appears all over physics?

I imagine this as a sort of rationality litmus test. See how you react to the video or the proof (or remember how you reacted when you initially heard this argument). Is it the 'rational response'? How do you weigh your own intuitions vs a convincing argument from authority plus math that seems to somehow work, if you turn your head a bit?

If you don't believe them, what does that feel like? How confident are you?

(spoilers below)

It's totally true that, as an everyday rationalist (or even as a scientist or mathematician or theorist), there will always be computational conclusions that are out of your reach to verify. You pretty much have to believe theoretical physicists who tell you "the Standard Model of particle physics accurately models reality and predicts basically everything we see at the subatomic scale with unerring accuracy"; you're likely in no position to argue.

But - and this is the point - it's highly unlikely that all of your tools are lies, even if 'experts' say so, and you ought to require extraordinary evidence to be convinced that they are. It's not enough that someone out there can contrive a plausible-sounding argument that you don't know how to refute, if your tools are logically sound and their claims don't fit into that logic.

(On the other hand, if you believe something because you heard it was a good idea from one expert, and then another expert tells you a different idea, take your pick; there's no way to tell. It's the personal experience that makes this example lead to sanity-questioning, and that's where the problem lies.)

In my (non-expert but well-informed) view, the correct response to this argument is to say "no, I don't believe you", and hold your ground. Because the claim made in the video is so absurd that, even if you believe the video is correct and made by experts and the string theory textbook actually says that, you should consider a wide range of other explanations as to "how it could have come to be that people are claiming this" before accepting that addition might work in such an unlikely way.

Not because you know about how infinite sums work better than a physicist or mathematician does, but because you know how mundane addition works just as well as they do, and if a conclusion this shattering to your model comes around -- even to a layperson's model of how addition works, that adding positive numbers to positive numbers results in bigger numbers --, then either "everything is broken" or "I'm going insane" or (and this is by far the theory that Occam's Razor should prefer) "they and I are somehow talking about different things".

That is, the unreasonable mathematical result arises because the mathematician or physicist is talking about one "sense" of addition, but it's not the same one that you're using when you do everyday sums or when you apply your intuitions about addition to everyday life. This is by far the simplest explanation: addition works just how you thought it does, even in your inexpertise; you and the mathematician are just talking past each other somehow, and you don't have to know in what way to be pretty sure that it's happening. Anyway, there's no reason expert mathematicians can't be amateur communicators, and even that is a much more palatable result than what they're claiming.

(As it happens, my view is that any trained mathematician who claims that 1+2+3+4+5… = -1/12 without qualification is so incredibly confused or poor at communicating or actually just misanthropic that they ought to be, er, sent to a re-education camp.)

So, is this what you came up with? Did your rationality win out in the face of fallacious authority?

(Also, do you agree that I've represented the 'rational approach' to this situation correctly? Give me feedback!)

Postscript: the explanation of the proof

There's no shortage of explanations of this online, and a mountain of them emerged after this video became popular. I'll write out a simple version anyway for the curious.

It turns out that there is a sense in which those summations are valid, but it's not the sense you're using when you perform ordinary addition. It's also true that the summations emerge in physics. It is also true that the validity of these summations is in spite of the rules of "you can't add, subtract, or otherwise deal with infinities, and yes all these sums diverge" that you learn in introductory calculus; it turns out that those rules are also elementary and there are ways around them but you have to be very rigorous to get them right.
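To make "yes all these sums diverge" concrete, here is a minimal Python sketch that prints the first few ordinary partial sums of the three series from the proof. The helper names (partial_sums, grandi, alternating, naturals) are made up for this illustration and are not from the video.

```python
from itertools import accumulate, count, islice

def partial_sums(terms, n):
    """Return the first n partial sums of a (possibly infinite) iterable of terms."""
    return list(islice(accumulate(terms), n))

def grandi():
    """Terms of P = 1 - 1 + 1 - 1 + ..."""
    return ((-1) ** k for k in count())

def alternating():
    """Terms of Q = 1 - 2 + 3 - 4 + ..."""
    return ((-1) ** (k + 1) * k for k in count(1))

def naturals():
    """Terms of S = 1 + 2 + 3 + 4 + ..."""
    return count(1)

print(partial_sums(grandi(), 6))       # [1, 0, 1, 0, 1, 0]
print(partial_sums(alternating(), 6))  # [1, -1, 2, -2, 3, -3]
print(partial_sums(naturals(), 6))     # [1, 3, 6, 10, 15, 21]
```

None of the three sequences of partial sums settles on a value, which is all that "diverges" means in the introductory-calculus sense. Whatever 1/2, 1/4 and -1/12 are, they are not limits of these lists.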

An elementary explanation of what happened in the proof is that, in all three infinite sum cases, it is possible to interpret the infinite sum as a more accurate form (but STILL not precise enough to use for regular arithmetic, because infinities are very much not valid, still, we're serious):

S(infinity) = 1+2+3+4+5… ≈ -1/12 + O(infinity)

Where S(n) is a function giving the n'th partial sum of the series, and S(infinity) is an analytic continuation (basically, theoretical extension) of the function to infinity. (The O(infinity) part means "something on the order of infinity")

Point is, that O(infinity) bit hangs around, but doesn't really disrupt math on the finite part, which is why algebraic manipulations still seem to work. (Another cute fact: the curve that fits the partial sum function also non-coincidentally takes the value -1/12 at n=0.)
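One way to watch a finite part of -1/12 sit next to a divergent part is to damp the terms and see what is left over. The sketch below is an assumption-laden illustration, not the construction used in the video or in any textbook cited here: it picks an exponential damping factor exp(-n*eps), for which the damped sum is known to behave like 1/eps^2 - 1/12 plus terms that vanish as eps shrinks. The function name smoothed_sum and the truncation rule are made up for this example.

```python
import math

def smoothed_sum(eps, terms=None):
    """Sum of n * exp(-n * eps) for n = 1, 2, 3, ...

    The series is truncated once the remaining terms are negligible.
    The exponential damping is an illustrative choice of regulator.
    """
    if terms is None:
        terms = int(60 / eps)  # exp(-60) is about 1e-26, so the tail is negligible
    return math.fsum(n * math.exp(-n * eps) for n in range(1, terms + 1))

for eps in (0.1, 0.05, 0.01):
    total = smoothed_sum(eps)
    divergent_part = 1 / eps ** 2       # blows up as eps -> 0
    finite_part = total - divergent_part
    print(eps, total, finite_part)      # finite_part approaches -1/12, about -0.0833
```

As eps shrinks, the total grows without bound, but the leftover after removing 1/eps^2 settles near -1/12. This is the "finite part plus something on the order of infinity" picture in miniature.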

And it's true that this series always associates with the finite part -1/12; even though there are some manipulations that can get you to other values, there's a list of 'valid' manipulations that constrains it. (Well, there are other kinds of summations that I don't remember that might get different results. But this value is not accidentally associated with this summation.)

And the fact that the series emerges in physics is complicated but amounts to the fact that, in the particular way we've glued math onto physical reality, we've constructed a framework that also doesn't care about the infinity term (it's rejected as "nonphysical"!), and so we get the right answer despite dubious math. But physicists are fine with that, because it seems to be working and they don't know a better way to do it yet.

Thoughts on "Operation Make Less Wrong the single conversational locus", Month 1

13 Raemon 19 January 2017 05:16PM

About a month ago, Anna posted about the Importance of Less Wrong or Another Single Conversational Locus, followed shortly by Sarah Constantin's

There was a week or two of heavy activity by some old-timers. Since then, there's been a decent array of good posts, but nothing quite as inspiring as the first week was, and I don't know whether to think "we just need to try harder" or to change tactics in some way.

Some thoughts:
 - I do feel it's been better to quickly be able to see a lot of posts in the community in one place 

 - I don't think the quality of the comments is that good, which is a bit demotivating.
 - on Facebook, lots of great conversations happen in a low-friction way, and when someone starts being annoying, the person whose Facebook wall it is has the authority to delete comments with abandon, which I think is helpful.
- I could see the solution being to either continue trying to incentivize better LW comments, or to just have LW be "single locus for big important ideas, but discussion to flesh them out still happen in more casual environments"

 - I'm frustrated that the intellectual projects on Less Wrong are largely silo'd from the Effective Altruism community, which I think could really use them.

 - The Main RSS feed has a lot of subscribers (I think I recall "about 10k"), so having things posted there seems good.
 - I think it's good to NOT have people automatically post things there, since that produced a lot of weird anxiety/tension on "is my post good enough for main? I dunno!"
 - But, there's also not a clear path to get something promoted to Main, or a sense of which things are important enough for Main

 - I notice that I (personally) feel an ugh response to link posts and don't like being taken away from LW when I'm browsing LW. I'm not sure why.

Curious if others have thoughts.

0.999...=1: Another Rationality Litmus Test

12 shev 21 January 2017 02:16AM

People seemed to like my post from yesterday about infinite summations and how to rationally react to a mathematical argument you're not equipped to validate, so here's another in the same vein that highlights a different way your reasoning can go.

(It's probably not quite as juicy of an example as yesterday's, but it is one that I'm equipped to write about today so I figure it's worth it.)

This example is somewhat more widely known and a bit more elementary. I won't be surprised if most people already know the 'solution'. But the point of writing about it is not to explain the math - it's to talk about "how you should feel" about the problem, and how to rationally approach rectifying it with your existing mental model. If you already know the solution, try to pretend or think back to when you didn't. I think it was initially surprising to most people, whenever you learned it.

The claim: that 1 = 0.999... repeating (infinite 9s). (I haven't found an easy way to put a bar over the last 9, so I'm using ellipses throughout.)

The questionable proof:

x = 0.9999...
10x = 9.9999... (everyone knows multiplying by ten moves the decimal over one place)
10x-x = 9.9999... - 0.9999....
9x = 9
x = 1

People's response when they first see this is usually: wait, what? an infinite series of 9s equals 1? no way, they're obviously different.

The litmus test is this: what do you think a rational person should do when confronted with this argument? How do you approach it? Should you accept the seemingly plausible argument, or reject it (as with yesterday's example) as "no way, it's more likely that we're somehow talking about different objects and it's hidden inside the notation"?

Or are there other ways you can proceed to get more information on your own?

One of the things I want to highlight here is related to the nature of mathematics.

I think people have a tendency to think that, if they are not well-trained students of mathematics (at least at the collegiate level), then rigor or precision involving numbers is out of their reach. I think this is definitely not the case: you should not be afraid to attempt to be precise with numbers even if you only know high school algebra, and you should especially not be afraid to demand precision, even if you don't know the correct way to implement it.

Particularly, I'd like to emphasize that mathematics as a mental discipline (as opposed to an academic field), basically consists of "the art of making correct statements about patterns in the world" (where numbers are one of the patterns that appears everywhere you have things you can count, but there are others). This sounds suspiciously similar to rationality - which, as a practice, might be "about winning", but as a mental art is "about being right, and not being wrong, to the best of your ability". More or less. So mathematical thinking and rational thinking are very similar, except that we categorize rationality as being primarily about decisions and real-world things, and mathematics as being primarily about abstract structures and numbers.

In many cases in math, you start with a structure that you don't understand, or even know how to understand, precisely, and start trying to 'tease' precise results out of it. As a layperson you might have the same approach to arguments and statements about elementary numbers and algebraic manipulations, like in the proof above, and you're just as in the right to attempt to find precision in them as a professional mathematician is when they perform the same process on their highly esoteric specialty. You also have the bonus that you can go look for the right answer to see how you did, afterwards.

All this to say, I think any rational person should be willing to 'go under the hood' one or two levels when they see a proof like this. It doesn't have to be rigorous. You just need to do some poking around if you see something surprising to your intuition. Insights are readily available if you look, and you'll be a stronger rational thinker if you do.

There are a few angles that I think a rational but untrained-in-math person can think to take straightaway.

You're shown that 0.9999... = 1. If this is a surprise, that means your model of what these terms mean doesn't jibe with how they behave in relation to each other, or that the proof was fallacious. You can immediately conclude that it's either:

a) true without qualification, in which case your mental model of what the symbols "0.999...", "=", or "1" mean is suspect
b) true in a sense, but it's hidden behind a deceptive argument (like in yesterday's post), and even if the sense is more technical and possibly beyond your intuition, it should be possible to verify if it exists -- either through careful inspection or turning to a more expert source or just verifying that options (a) and (c) don't work
c) false, in which case there should be a logical inconsistency in the proof, though it's not necessarily true that you're equipped to find it

Moreover, (a) is probably the default, by Occam's Razor. It's more likely that a seemingly correct argument is correct than that there is a more complicated explanation, such as (b), "there are mysterious forces at work here", or (c), "this correct-seeming argument is actually wrong", without other reasons to disbelieve it. The only evidence against it is basically that it's surprising. But how do you test (a)?

Note there are plenty of other 'math paradoxes' that fall under (c) instead: for example, those ones that secretly divide by 0 and derive nonsense afterwards. (a=b ; a^2=ab ; a^2-b^2=ab-b^2 ; (a+b)(a-b)=b(a-b) ; a+b=b ; 2a = a ; 2=1). But the difference is that their conclusions are obviously false, whereas this one is only surprising and counterintuitive. 1=2 involves two concepts we know very well. 0.999...=1 involves one we know well, but one that likely has a feeling of sketchiness about it; we're not used to having to think carefully about what a construction like 0.999... means, and we should immediately realize that when doubting the conclusion.
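The divide-by-zero paradox above can be checked step by step. This is a small sketch using exact rational arithmetic, with a = b = 1 chosen arbitrarily; every step holds until the one that cancels (a - b).

```python
from fractions import Fraction

a = b = Fraction(1)

# Each of these steps is a true statement when a == b.
assert a == b
assert a**2 == a * b
assert a**2 - b**2 == a * b - b**2
assert (a + b) * (a - b) == b * (a - b)  # both sides are 0

# The next step cancels (a - b) from both sides, i.e. divides by 0.
try:
    lhs = (a + b) * (a - b) / (a - b)
    step_is_valid = True
except ZeroDivisionError:
    step_is_valid = False

print(step_is_valid)  # False
```

The inconsistency sits in exactly one step, and a mechanical check finds it. That is what an option (c) paradox looks like, in contrast to the 0.999... argument.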

Here are a few angles you can take to testing (a):

1. The "make it more precise" approach: Drill down into what you mean by each symbol. In particular, it seems very likely that the mystery is hiding inside what "0.999..." means, because that's the one that seems complicated and liable to be misunderstood.

What does 0.999... infinitely repeating actually mean? It seems like it's "the limit of the series of finite numbers of 9s", if you know what a limit is. It seems like it might be "the number larger than every number of the form 0.abcd..., consisting of infinitely many digits (optionally, all 0 after a point)". That's awfully similar to 1, also, though.

A very good question is "what kinds of objects are these, anyway?" The rules of arithmetic generally assume we're working with real numbers, and the proof seems to hold for those in our customary ruleset. So what's the 'true' definition of a real number?

Well, we can look it up, and find that it's fairly complicated and involves identifying reals with sets of rationals in one or another specific way. If you can parse the definitions, you'll find that one definition is "a real number is a Dedekind cut of the rational numbers", that is, "a partition of the rational numbers into two sets A and B such that A is nonempty and closed downwards, B is nonempty and closed upwards, and A contains no greatest element", and from that it Can Be Seen (tm) that the two symbols "1" and "0.999..." both refer to the same partition of Q, and therefore are equivalent as real numbers.
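The "limit of finite numbers of 9s" reading can be poked at without any real analysis. Here is a minimal sketch using exact fractions; the helper name nines is made up for this example.

```python
from fractions import Fraction

def nines(n):
    """Exact value of 0.99...9 with n nines after the decimal point."""
    return sum(Fraction(9, 10**k) for k in range(1, n + 1))

for n in (1, 2, 5, 10):
    gap = 1 - nines(n)
    print(n, nines(n), gap)
    # The distance to 1 is exactly 10**-n at every stage.
    assert gap == Fraction(1, 10**n)
```

Every finite truncation falls short of 1 by exactly 1/10^n, and that shortfall can be made as small as you like. If 0.999... means the limit of these truncations, 1 is the only candidate for its value.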

2. The "functional" approach: if 0.999...=1, then it should behave the same as 1 in all circumstances. Is that something we can verify? Does it survive obvious tests, like other arguments of the same form?

Does 0.999... always act the same way that 1 does? It appears to act the same in the algebraic manipulations that we saw, of course. What are some other things to try?
We might think to try: 1-0.9999... = 1-1 = 0, but also seems to equal 0.000....0001, if that's valid: an 'infinite decimal that ends in a 1'. So those must be equivalent also, if that's a valid concept. We can't find anything to multiply 0.000...0001 by to 'move the decimal' all the way into the finite decimal positions, seemingly, because we would have to multiply by infinity and that wouldn't prove anything because we already know such operations are suspect.
I, at least, cannot see any reason when doing math that the two shouldn't be the same. It's not proof, but it's evidence that the conclusion is probably OK.

3. The "argument from contradiction" approach: what would be true if the claim were false?

If 0.999... isn't equal to 1, what does that entail? Well, let a=0.999... and b=1. We can, according to our familiar rules of algebra, construct the number halfway between them: (a+b)/2, alternatively written as a+(b-a)/2. But our intuition for decimals doesn't seem to let there be a number between the two. What would it be -- 0.999...9995? "Capping" the decimal with a 5? (Yes, we capped a decimal with a 1 earlier, but we didn't know if that was valid either.) What does that imply 0.999... - 0.999...9995 should be? 0.000...0004? Does that equal 4*0.000...0001? None of this math seems to be working either.
As long as we're not being rigorous, this isn't "proof", but it is a compelling reason to think the conclusion might be right after all. If it's not, we get into things that seem considerably more insane.
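The contradiction can be made a little more mechanical. Suppose the gap 1 - 0.999... were some positive number d. The sketch below (the function name nines_needed is made up for this example) finds a finite string of 9s that is already closer to 1 than d, which is awkward, since 0.999... should be at least as large as any of its truncations.

```python
from fractions import Fraction

def nines_needed(d):
    """Smallest n such that 0.99...9 (n nines) is within d of 1, i.e. 10**-n < d."""
    if d <= 0:
        raise ValueError("d must be positive")
    n = 1
    while Fraction(1, 10**n) >= d:
        n += 1
    return n

print(nines_needed(Fraction(1, 1000)))    # 4
print(nines_needed(Fraction(1, 10**30)))  # 31
```

The loop terminates for every positive rational d, so no positive rational gap survives. This is still not a proof about the reals, but it is the same squeeze that the rigorous argument uses.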

4. The "reexamine your surprise" approach: how bad is it if this is true? Does that cause me to doubt other beliefs? Or is it actually just as easy to believe it's true as not? Perhaps I am just biased against the conclusion for aesthetic reasons?

How bad is it if 0.999...=1? Well, it's not like yesterday's example with 1+2+3+4+5... = -1/12. It doesn't utterly defy our intuition for what arithmetic is. It says that one object we never use is equivalent to another object we're familiar with. I think that, since we probably have no reason to strongly believe anything about what an infinite sum of 9/10 + 9/100 + 9/1000 + ... should equal, it's perfectly palatable that it might equal 1, despite our initial reservations.

(I'm sure there are other approaches too, but this got long with just four so I stopped looking. In real life, if you're not interested in the details there's always the very legitimate fifth approach of "see what the experts say and don't worry about it", also. I can't fault you for just not caring.)

By the way, the conclusion that 0.999...=1 is completely, unequivocally true in the real numbers, basically for the Dedekind cut reason given above, which is the commonly accepted structure we are using when we write out mathematics if none is indicated. It is possible to find structures where it's not true, but you probably wouldn't write 0.999... in those structures anyway. It's not like 1+2+3+4+5...=-1/12, for which claiming truth is wildly inaccurate and outright deceptive.

But note that none of these approaches are out of reach to a careful thinker, even if they're not a mathematician. Or even mathematically-inclined.

So it's not required that you have the finesse to work out detailed mathematical arguments -- certainly the definitions of real numbers are too precise and technical for the average layperson to deal with. The question here is whether you take math statements at face value, or disbelieve them automatically (you would have done fine yesterday!), or pick the more rational choice -- breaking them down and looking for low-hanging ways to convince yourself one way or the other.

When you read a surprising argument like the 0.999...=1 one, does it occur to you to break down ways of inspecting it further? To look for contradictions, functional equivalences, second-guess your surprise as being a run-of-the-mill cognitive bias, or seek out precision to realign your intuition with the apparent surprise in 'reality'?

I think it should. Though I am pretty biased because I enjoy math and study it for fun. But -- if you subconsciously treat math as something that other people do and you just believe what they say at the end of the day, why? Does this cause you to neglect to rationally analyze mathematical conclusions, at whatever level you might be comfortable with? If so, I'll bet this isn't optimal and it's worth isolating in your mind and looking more closely at. Precise mathematical argument is essentially just rationalism applied to numbers, after all. Well - plus a lot of jargon.

(Do you think I represented the math or the rational arguments correctly? is my philosophy legitimate? Feedback much appreciated!)

In Defense of the Obvious

10 lifelonglearner 21 January 2017 03:06AM

        [Cross-posed from blog]     

My brain does this thing where it shuts off when I experience some warning signs.  A lot of these have to do with my identity or personal beliefs, which go off when I believe my tribe is being attacked.  I don’t think I’ll go as far as to say that all brain shutoffs are bad (which feels like a Cleaving Statement), but there’s another type of warning sign I’ve recently noticed: Dismissing The Obvious.

              Just because a statement is tautological or obvious does not mean it is useless.

              Here are some examples:

              “If you want to get all of your tasks done every day, be sure to make a to-do list and a schedule!  That way, you can keep track of what you’ve done/need to do!”

              My brain’s response: <doesn’t even quite register the points> “Whatever, this doesn’t sound interesting.” <pattern-matches it as “boring advice stuff" that "isn't groundbreaking”>.

              In actuality: The advice still stands, even if it’s self-evident and obvious.  People who make to-do lists have a better idea of what they need to get done.  It’s still useful to know, if you care about getting stuff done!

              “If you want to exercise more, you should probably exercise more.  Then, you’d become the type of person who exercises more, and then you’d exercise more.”


              “If you have more energy, then you’re more energetic, which means you have more energy to do things.”

              My brain’s response: “Those conclusions follow each other, by definition!  There’s nothing here that I don’t know!” <scoffs>

              In actuality: Just because two things are logically equivalent doesn’t mean there’s nothing to learn.  In my head, the nodes for “energetic” and “energy = increased doing-stuff capacity” are not the same nodes.  Consequently, bringing the two together can still link previously unconnected ideas, or allow you to see the connection, which is still beneficial!

              What my brain is doing here is tuning out information simply because “it sounds like the kind of obvious information that everyone knows”.  I’m not actually considering the point.  More than that, obvious tips tend to be effective for a large group of people.  That’s why they’re obvious or commonly seen.  The fact that I see some advice show up in lots of places should even perhaps be increased reason for me to try it out.  

              A related problem is when smart, experienced people give me advice that my brain pattern-matches to “boring advice”.  When their advice sounds so “mundane”, it can be easy to forget that the “boring advice” is what their brain thought was the best thing to give me.  If they tried to distill all of their wisdom into a simple adage, I should probably at least try it out.

              In fact, I suspect that my brain’s aversion to Obvious/Boring Advice may be because I’ve become acclimated to normal self-improvement ideas.  I’m stuck on the hedonic treadmill of insight porn, or as someone put it, I’m a rationality junkie.

              Overwhelmed by the sort of ideas found in insight porn, it looks like I actually crave more and more obscure forms of insight.  And it’s this type of dangerous conditioning that I think causes people to dismiss normal helpful ideas— simply because they’re not paradigm-crushing, mind-blowing, or stimulating enough.

              So, in an effort to fight back, I’m trying to get myself hooked on the meta-contrarian idea that, despite my addiction to obscure ideas, the Obvious is still usually what has the most leverage.  Often, the best thing to do is merely the obvious one.  Some things in life are simple.

              Take that, hedonic treadmill.



Projects-in-Progress Thread

9 lifelonglearner 21 January 2017 05:11AM

From Raemon's Project Hufflepuff thread, I thought it might be helpful for there to be a periodical check-in thread where people can post about their projects in the spirit of cooperation. This is the first one. If it goes well, maybe we can make it a monthly thing.

If you're looking for a quick proofread, trial users, a reference to a person in a specific field, or something else related to a project-in-progress this is the place to put it. Otherwise, if you think you're working on something cool the community might like to hear about, I guess it goes here too.

Metarationality Repository

8 JacobLiechty 22 January 2017 12:47AM

In the past couple years, if you've poked your head into Rationality-sphere discussions you may have heard tell of a mental framework which has eluded clear boundaries but has nonetheless raised some interested eyebrows and has begun to solidify into a coherent conversation point. This system of thought has been variously referred to as "Postrationality" or "Metarationality" or "Keganism" or "Meaningness."  Briefly put, Metarationality is a set of related Rationality concepts that place less emphasis on idealized Less Wrong style Rationality and more on one's place in a developmental psychology pathway. This description is imperfect in that there is not yet an agreed-upon definition of Metarationality; it currently stands only as a fuzzy set of relationships between certain specific writings emerging from the traditional Rationality space.

In the spirit of Repositories, a few other LW-ers and I have compiled some source materials that fall inside or adjacent to this memespace. If you are aware of any conspicuously missing links, posts, or materials, please post a list in a single comment and I'll add to the lists! I'm aiming to keep this repository as uneditorial as possible. But since there is some prior confusion as to the correctness, coherence, usefulness, or even existence of Metarationality as a category, I'll define my request for additional material as "articles that do not substantially alter the Principal Component Analysis of whatever source materials are currently present."

Primary Texts

  • In Over Our Heads - Robert Kegan. Introduction to the 5-Stage model of psychological development. The "Thinking, Fast and Slow" of Metarationality, and spiritual sequel to his earlier work, The Evolving Self.
  • Metaphors We Live By - George Lakoff and Mark Johnson. A theory of language and the mind, claimed by many as substantially improving their practical ability to interact with both the world and writing.
  • Impro: Improvisation and the Theatre - Keith Johnstone. A meandering and beautiful if not philosophically rigorous description of life in education and theater, and for many readers proof that logic is not the only thing that induces mental updates.
  • Ritual and its Consequences - Adam Seligman et al. An anthropological work describing the role of ritual and culture in shaping attitudes, action, and beliefs on a societal scale. The subtitle An Essay on the Limits of Sincerity closely matches metarationalist themes.
Primary Blogs
  • Meaningness - David Chapman. Having originally written Rationality-adjacently, Chapman now encompasses a broader ranging and well internally-referenced collection of useful metarationalist concepts, including the very coining of "Metarationality."
  • Ribbonfarm - A group blog from Venkatesh Rao and Sarah Perry, self-described as "Longform Insight Porn" and caring not for its relationship or non-relationship to Rationality as a category.
Secondary and Metarationality-Adjacent Blogs

Individual Introductory Posts

Weird Twitter Accounts

My main intention at this juncture is to encourage and coordinate understanding of the social phenomenon and thought systems entailed by the vast network spanning from the above links. There is a lot to be said and argued about the usefulness or correctness of even using terms such as Metarationality, such as arguments that it is only a subset of Rationalist thought, or that terms like Postrationality are meant to signal ingroup superiority to Rationalism. There is plenty of ink to be spilled over these questions, but we'll get there in due time.

Let's start with charitability, understanding, goodwill, and empiricism, and work from there.

Thanks to /u/agilecaveman for their continued help and brilliance.

[Link] Could a Neuroscientist Understand a Microprocessor?

7 Gunnar_Zarncke 20 January 2017 12:40PM

Did EDT get it right all along? Introducing yet another medical Newcomb problem

5 Johannes_Treutlein 24 January 2017 11:43AM

One of the main arguments given against Evidential Decision Theory (EDT) is that it would “one-box” in medical Newcomb problems. Whether this is the winning action has been a hotly debated issue on LessWrong. A majority, including experts in the area such as Eliezer Yudkowsky and Wei Dai, seem to think that one should two-box (see e.g. Yudkowsky 2010, p.67). Others have tried to argue in favor of EDT by claiming that the winning action would be to one-box, or by offering reasons why EDT would in some cases two-box after all. In this blog post, I want to argue that EDT gets it right: one-boxing is the correct action in medical Newcomb problems. I introduce a new thought experiment, the Coin Flip Creation problem, in which I believe the winning move is to one-box. This new problem is structurally similar to other medical Newcomb problems such as the Smoking Lesion, though it might elicit the intuition to one-box even in people who would two-box in some of the other problems. I discuss both how EDT and other decision theories would reason in the problem and why people’s intuitions might diverge in different formulations of medical Newcomb problems.

Two kinds of Newcomblike problems

There are two different kinds of Newcomblike problems. In Newcomb’s original paradox, both EDT and Logical Decision Theories (LDT), such as Timeless Decision Theory (TDT), would one-box and therefore, unlike CDT, win $1 million. In medical Newcomb problems, EDT’s and LDT’s decisions diverge. This is because in the latter, a (physical) causal node that isn’t itself a decision algorithm influences both the current world state and our decisions – resulting in a correlation between action and environment but, unlike the original Newcomb, no “logical” causation.

It’s often unclear exactly how a causal node can exert influence on our decisions. Does it change our decision theory, utility function, or the information available to us? In the case of the Smoking Lesion problem, it seems plausible that it’s our utility function that is being influenced. But then it seems that as soon as we observe our utility function (“notice a tickle”; see Eells 1982), we lose “evidential power” (Almond 2010a, p.39), i.e. there’s nothing new to learn about our health by acting a certain way if we already know our utility function. In any case, as long as we don’t know and therefore still have the evidential power, I believe we should use it.

The Coin Flip Creation Problem is an adaptation of Caspar Oesterheld’s “Two-Boxing Gene” problem and, like the latter, attempts to take Newcomb’s original problem and make it into a medical Newcomb problem, triggering the intuition that we should one-box. In Oesterheld’s Two-Boxing Gene, it’s stated that a certain gene correlates with our decision to one-box or two-box in Newcomb’s problem, and that Omega, instead of simulating our decision algorithm, just looks at this gene.

Unfortunately, it’s not specified how the correlation between two-boxing and the gene arises, casting doubt on whether it’s a medical Newcomb problem at all, and whether other decision algorithms would disagree with one-boxing. Wei Dai argues that in the Two-Boxing Gene, if Omega conducts a study to find out which genes correlate with which decision algorithm, then Updateless Decision Theory (UDT) could just commit to one-boxing and thereby determine that all the genes UDT agents have will always correlate with one-boxing. So in some sense, UDT’s genes will still indirectly constitute a “simulation” of UDT’s algorithm, and there is a logical influence between the decision to one-box and Omega’s decision to put $1 million in box A. Similar considerations could apply for other LDTs.

The Coin Flip Creation problem is intended as an example of a problem in which EDT would give the right answer, but all causal and logical decision theories would fail. It works explicitly through a causal influence on the decision theory itself, thus reducing ambiguity about the origin of the correlation.

The Coin Flip Creation problem

One day, while pondering the merits and demerits of different acausal decision theories, you’re visited by Omega, a being assumed to possess flawless powers of prediction and absolute trustworthiness. You’re presented with Newcomb’s paradox, but with one additional caveat: Omega informs you that you weren’t born like a normal human being, but were instead created by Omega. On the day you were born, Omega flipped a coin: If it came up heads, Omega created you in such a way that you would one-box when presented with the Coin Flip Creation problem, and it put $1 million in box A. If the coin came up tails, you were created such that you’d two-box, and Omega didn’t put any money in box A. We don’t know how Omega made sure what your decision would be. For all we know, it may have inserted either CDT or EDT into your source code, or even just added one hard-coded decision rule on top of your messy human brain. Do you choose both boxes, or only box A?

It seems like EDT gets it right: one-boxing is the winning action here. There’s a correlation between our decision to one-box, the coin flip, and Omega’s decision to put money in box A. Conditional on us one-boxing, the probability that there is money in box A increases, and we “receive the good news” – that is, we discover that the coin must have come up heads, and we thus get the million dollars. In fact, we can be absolutely certain of the better outcome if we one-box. However, the problem persists if the correlation between our actions and the content of box A isn’t perfect. As long as the correlation is high enough, it is better to one-box.
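To make the "high enough" threshold concrete, here is a minimal sketch of the evidential expected-value comparison. It assumes the standard Newcomb payoffs, which the problem statement above does not spell out: $1,000,000 in box A when the coin came up heads, and a guaranteed $1,000 in the second box. The symmetric accuracy parameter `p` (the probability that the coin matches your action) and the function names are hypothetical, introduced only for illustration.

```python
# Assumed payoffs (standard Newcomb values; not stated in the post itself).
BOX_A = 1_000_000  # contents of box A if the coin came up heads
BOX_B = 1_000      # guaranteed contents of the second box


def edt_expected_values(p: float) -> tuple[float, float]:
    """Return (EV of one-boxing, EV of two-boxing) as EDT would compute them.

    p is the assumed probability that the coin flip matches the action:
    P(heads | one-box) = P(tails | two-box) = p.
    """
    ev_one_box = p * BOX_A                # box A is full with probability p
    ev_two_box = (1 - p) * BOX_A + BOX_B  # box A is full with probability 1 - p
    return ev_one_box, ev_two_box


def edt_threshold() -> float:
    """Value of p above which one-boxing has the higher evidential EV."""
    # p * A > (1 - p) * A + B   <=>   p > (A + B) / (2 * A)
    return (BOX_A + BOX_B) / (2 * BOX_A)


if __name__ == "__main__":
    print("threshold:", edt_threshold())
    for p in (0.5, 0.6, 1.0):
        print(p, edt_expected_values(p))
```

With these assumed payoffs the threshold works out to p > 0.5005, so under EDT even a correlation only slightly better than chance already favors one-boxing.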

Nevertheless, neither causal nor logical counterfactuals seem to imply that we can determine whether there is money in box A. The coin flip isn’t a decision algorithm itself, so we can’t determine its outcome. The logical uncertainty about our own decision output doesn’t seem to coincide with the empirical uncertainty about the outcome of the coin flip. In absence of a causal or logical link between their decision and the content of box A, CDT and TDT would two-box.
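For contrast, a causal calculation holds the probability of heads fixed whatever we do, since the action cannot influence the coin. The following is a rough sketch of that calculation only (it does not model TDT), again assuming the standard Newcomb payoffs of $1,000,000 and $1,000; the function name and the parameter `q_heads` are hypothetical.

```python
# Assumed payoffs (standard Newcomb values; not stated in the post itself).
BOX_A = 1_000_000
BOX_B = 1_000


def cdt_expected_values(q_heads: float) -> tuple[float, float]:
    """Return (EV of one-boxing, EV of two-boxing) with the coin held fixed.

    q_heads is the agent's credence that the coin came up heads. It is the
    same for both actions, because the action has no causal effect on it.
    """
    ev_one_box = q_heads * BOX_A
    ev_two_box = q_heads * BOX_A + BOX_B  # same box A, plus the second box
    return ev_one_box, ev_two_box


if __name__ == "__main__":
    for q in (0.0, 0.5, 1.0):
        one, two = cdt_expected_values(q)
        print(q, two - one)  # the gap equals BOX_B for each of these credences
```

Two-boxing comes out ahead by the contents of the second box for every credence, which is the dominance reasoning behind the two-boxing recommendation.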

Updateless Decision Theory

As far as I understand, UDT would come to a similar conclusion. AlephNeil writes in a post about UDT:

In the Smoking Lesion problem, the presence of a 'lesion' is somehow supposed to cause Player's to choose to smoke (without altering their utility function), which can only mean that in some sense the Player's source code is 'partially written' before the Player can exercise any control over it. However, UDT wants to 'wipe the slate clean' and delete whatever half-written nonsense is there before deciding what code to write.

Ultimately this means that when UDT encounters the Smoking Lesion, it simply throws away the supposed correlation between the lesion and the decision and acts as though that were never a part of the problem.

This approach seems wrong to me. If we use an algorithm that changes our own source code, then this change, too, has been physically determined and can therefore correlate with events that aren’t copies of our own decision algorithm. If UDT reasons as though it could just rewrite its own source code and discard the correlation with the coin flip altogether, then UDT two-boxes and thus by definition ends up in the world where there is no money in box A.

Note that updatelessness seemingly makes no difference in this problem, since it involves no a priori decision: Before the coin flip, there’s a 50% chance of becoming either a one-boxing or a two-boxing agent. In any case, we can’t do anything about the coin flip, and therefore also can’t influence whether box A contains any money.

I am uncertain how UDT works, though, and would be curious about other people’s thoughts. Maybe UDT reasons that by one-boxing, it becomes a decision theory of the sort that would never be installed into an agent in a tails world, thus rendering impossible all hypothetical tails worlds with UDT agents in them. But if so, why wouldn’t UDT “one-box” in the Smoking Lesion? As far as the thought experiments are specified, the causal connection between coin flip and two-boxing in the Coin Flip Creation appears to be no different from the connection between gene and smoking in the Smoking Lesion.

More adaptations and different formalizations of LDTs exist, e.g. Proof-Based Decision Theory. I could very well imagine that some of those might one-box in the thought experiment I presented. If so, then I’m once again curious as to where the benefits of such decision theories lie in comparison to plain EDT (aside from updatelessness – see Concluding thoughts).

Coin Flip Creation, Version 2

Let’s assume UDT would two-box in the Coin Flip Creation. We could alter our thought experiment a bit so that UDT would probably one-box after all:

The situation is identical to the Coin Flip Creation, with one key difference: After Omega flips the coin and creates you with the altered decision algorithm, it actually simulates your decision, just as in Newcomb’s original paradox. Only after Omega has determined your decision via simulation does it decide whether to put money in box A, conditional on your decision. Do you choose both boxes, or only box A?

Here is a causal graph for the first and second version of the Coin Flip Creation problem. In the first version, a coin flip determines whether there is money in box A. In the second one, a simulation of your decision algorithm decides:

Since in Version 2, there’s a simulation involved, UDT would probably one-box. I find this to be a curious conclusion. The situation remains exactly the same – we can rule out any changes in the correlation between our decision and our payoff. It seems confusing to me, then, that the optimal decision should be a different one.
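The two causal structures described above can be written down as small edge lists. This is only a sketch based on the verbal description: the node names and the `has_path` helper are hypothetical, and a directed path from the decision algorithm to box A is used as a rough stand-in for the kind of link that a simulation provides.

```python
# Version 1: the coin flip sets both the decision algorithm and box A.
VERSION_1 = {
    "coin_flip": ["decision_algorithm", "box_a"],
    "decision_algorithm": ["decision"],
}

# Version 2: the coin flip sets the algorithm; a simulation of it fills box A.
VERSION_2 = {
    "coin_flip": ["decision_algorithm"],
    "decision_algorithm": ["decision", "simulation"],
    "simulation": ["box_a"],
}


def has_path(graph: dict[str, list[str]], start: str, goal: str) -> bool:
    """Depth-first search for a directed path from start to goal."""
    stack, seen = [start], set()
    while stack:
        node = stack.pop()
        if node == goal:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(graph.get(node, []))
    return False


if __name__ == "__main__":
    for name, graph in (("v1", VERSION_1), ("v2", VERSION_2)):
        print(name, has_path(graph, "decision_algorithm", "box_a"))
```

In both graphs the coin flip is an ancestor of both the decision and box A, so the correlation between them is the same. Only in the second graph is there a path from the decision algorithm itself to box A.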

Copy-altruism and multi-worlds

The Coin Flip Creation problem assumes a single world and an egoistic agent. In the following, I want to include a short discussion of how the Coin Flip Creation would play out in a multi-world environment.

Suppose Omega’s coin is based on a quantum number generator and produces 50% heads worlds and 50% tails worlds. If we’re copy-egoists, EDT still recommends one-boxing, since doing so would reveal to us that we’re in one of the branches in which the coin came up heads. If we’re copy-altruists, then in practice, we’d probably care a bit less about copies whose decision algorithms have been tampered with, since they would make less effective use of the resources they gain than we ourselves would (i.e. their decision algorithm sometimes behaves differently). But in theory, if we care about all the copies equally, we should be indifferent with respect to one-boxing or two-boxing, since there will always be 50% of us in either of the worlds no matter what we do. The two groups always take the opposite action. The only thing we can change is whether our own copy belongs to the tails or the heads group.

To summarize, UDT and EDT would both be indifferent in the altruistic multi-world case, but UDT would (presumably) two-box, and EDT would one-box, in both the copy-egoistic multi-worlds and in the single-world case.

“But I don’t have a choice”

There seems to be an especially strong intuition of “absence of free will” inherent to the Coin Flip Creation problem. When presented with the problem, many respond that if someone had created their source code, they didn’t have any choice to begin with. But that’s the exact situation in which we all find ourselves at all times! Our decision architecture and choices are determined by physics, just like a hypothetical AI’s source code, and all of our choices will thus be determined by our “creator.” When we’re confronted with the two boxes, we know that our decisions are predetermined, just like every word of this blogpost has been predetermined. But that knowledge alone won’t help us make any decision. As far as I’m aware, even an agent with complete knowledge of its own source code would have to treat its own decision outputs as uncertain, or it would fail to implement a decision algorithm that takes counterfactuals into account.

Note that our decision in the Coin Flip Creation is also no less determined than in Newcomb’s paradox. In both cases, the prediction has been made, and physics will guide our thoughts and our decision in a deterministic and predictable manner. Nevertheless, we can still assume that we have a choice until we make our decision, at which point we merely “find out” what has been our destiny all along.

Concluding thoughts

I hope that the Coin Flip Creation motivates some people to reconsider EDT’s answers in Newcomblike problems. A thought experiment somewhat similar to the Coin Flip Creation can be found in Arif Ahmed 2014.

Of course, the particular setup of the Coin Flip Creation means it isn’t directly relevant to the question of which decision theory we should program into an AI. We obviously wouldn’t flip a coin before creating an AI. Also, the situation doesn’t really look like a decision problem from the outside; an impartial observer would just see Omega forcing you to pick either A or B. Still, the example demonstrates that from the inside view, evidence from the actions we take can help us achieve our goals better. Why shouldn’t we use this information? And if evidential knowledge can help us, why shouldn’t we allow a future AI to take it into account? In any case, I’m not overly confident in my analysis and would be glad to have any mistakes pointed out to me.

Medical Newcomb is also not the only class of problems that challenge EDT. Evidential blackmail is an example of a different problem, wherein giving the agent access to specific compromising information is used to extract money from EDT agents. The problem attacks EDT from a different angle, though: namely by exploiting its lack of updatelessness, similar to the challenges in Transparent Newcomb, Parfit’s Hitchhiker, Counterfactual Mugging, and the Absent-Minded Driver. I plan to address questions related to updatelessness, e.g. whether it makes sense to give in to evidential blackmail if you already have access to the information and haven’t precommitted not to give in, at a later point.

[Link] Putanumonit - Bayesian inference vs. null hypothesis testing

5 Jacobian 22 January 2017 02:31PM

[Link] Willpower duality (a very short essay)

5 Gyrodiot 20 January 2017 09:56AM

What would you like to see posts about?

5 shev 19 January 2017 11:39PM

I've just come back from the latest post on revitalizing LW as a conversational locus in the larger Rational-Sphere community and I'm personally still very into the idea. This post is directed at you if you're also into the idea. If you're not, that's okay; I'd still like to give it a try.

A number of people in the comments mentioned that the Discussion forum mostly gets Link posts these days, and that those aren't particularly rewarding. But there are also not a lot of people investing time in making quality text posts; certainly nothing like the 'old days'.

This also means that the volume of text posts is low enough that writing one (to me) feels like speaking up in a quiet room -- sort of embarrassingly ostentatious, amplified by the fact that without an 'ongoing conversation' it's hard to know what would be a good idea to speak up about. Some things aren't socially acceptable here (politics, social justice?); some things feel like they've been done so many times that there's not much useful to say (It feels hard to have anything novel to say about, say, increasing one's productivity, without some serious research.) 

(I know the answer is probably 'post about anything you want', but it feels much easier to actually do that if there's some guidance or requests.)

So, here's the question: what would you like to see posts about?

I'm personally probably equipped to write about ideas in math, physics, and computer science, so if there are requests in those areas I might be able to help (I have some ideas that I'm stewing, also). I'm not sure what math level to write at, though, since there's no recent history of mathematically technical posts. Is it better to target "people who probably took some math in college but always wished they knew more?" or better to just be technical and risk missing lots of people?

My personal requests:

1. I really value surveys of subjects or subfields. They provide a lot of knowledge and understanding for little time invested, as a reader, and I suspect that overviews are relatively easy to create as a writer since they don't have to go deep into details. Since they explain existing ideas instead of introducing new ones, they're easier and less stressful to get right. If you have a subject you feel like you broadly understand the landscape of, I'd encourage you to write out a quick picture of it.

For instance, u/JacobLiechty posted about "Keganism" in the thread I linked at the top of the post, and I don't know what that is but it sounds interconnected to many other things. But in many cases I can only learn so much by *going and reading the relevant material*, especially on philosophical ideas. What's more important is how it fits into ongoing conversations, or political groups, or social groups, or whatever. There's no efficient way for me to learn to understand the landscape of discussion around a concept that compares to having someone just explain it.
(I'll probably volunteer to do this in the near future for a couple of fields I pay attention to.)

It's also (in my opinion) *totally okay* to do a mediocre job with these, especially if others can help fill in the gaps in the comments. Much better to try. A mostly-correct survey is still super useful compared to none at all. They don't have to be just academic subjects, either. I found u/gjm's explanation of what 'postrationalism' refers to in the aforementioned thread very useful, because it put a lot of mentions of the subject into a framework that I didn't have in place already -- and that was just describing a social phenomenon in the blog-sphere.

2. I've seen expressed by others a desire to see more material about instrumental rationality, that is, implementing rationality 'IRL' in order to achieve goals. These can be general mindsets or ways of looking at the world, or in-the-moment techniques you can exercise (and ideally, practice). (Example) If you've got personal anecdotes about successes (or failures) at implementing rational decision-making in real life, I'm certain that we'd like to hear about them.

[Link] Dodging a bullet: "the price of insufficient medical vigilance can be very high."

5 James_Miller 18 January 2017 04:11AM

[Link] Life Extension Possibilities

4 sarahconstantin 24 January 2017 01:54AM

[Link] Why We Should Abandon Weak Arguments (Religious)

4 Bound_up 20 January 2017 10:30PM

[Link] The Quaker and the Parselmouth

4 Vaniver 20 January 2017 09:24PM

[Link] Universal Hate

4 gworley 18 January 2017 06:32PM

First impressions...

3 ArisC 24 January 2017 03:14PM

... of LW: a while ago, a former boss and friend of mine said that rationality is irrational because you never have sufficient computational power to evaluate everything rationally. I thought he was missing the point - but after two posts on LW, I am inclined to agree with him.

It's kind of funny - every post gets broken down into its tiniest constituents, and these get overanalysed and then people go on tangents only marginally relevant to the intent of the original article.

This would be fine if the original questions of the post were answered; but when I asked for metrics to evaluate a presidency, few people actually provided any - most started debating the validity of metrics, and one subthread went off to discuss the appropriateness of the term "gender equality".

I am new here, and I don't want to be overly critical of a culture I do not yet understand. But I just want to point out - rationality is a great tool to solve problems; if it becomes overly abstract, it kind of misses its point I think.

Strategic Thinking: Paradigm Selection

3 katydee 24 January 2017 06:24AM

Perhaps the most important concept in strategy is the importance of operating within the right paradigm. It is extremely important to orient towards the right style or the right doctrine before you begin implementing your plan - that could be "no known doctrine, we'll have to improvise", but if it is you need to know that! If you choose the wrong basic procedure or style, you will end up refining a plan or method that ultimately can't get you to where you want to go, and you will likely find it difficult to escape.

This is one of the Big Deep Concepts that seem to crop up all over the place. A few examples:

  • In software development, one form of this error is known as "premature optimization," where you focus on optimizing existing processes before you consider whether those processes are really what the final version of your system needs. If those processes end up getting cut, you've wasted a bunch of time; if you end up avoiding "wasting work" by keeping these processes, the sunk cost fallacy may have blocked you from implementing superior architecture.
  • In the military, a common mistake of this type leads to "fighting the last war" - the tendency of military planners and weapons designers to create strategies and weapon systems that would be optimal for fighting a repeat of the previous big war, only to find that paradigm shifts have rendered these methods obsolete. For instance, many tanks used early in World War II had been designed based on the trench warfare conditions of World War I and proved extremely ineffective in the more mobile style of warfare that actually developed. 
  • In competitive gaming, this explains what David Sirlin calls "scrubs" - players who play by their own made-up rules rather than the true ones, and thus find themselves unprepared to play against people without the same constraints. It isn't that the scrub is a fundamentally bad or incompetent player - it's just that they've chosen the wrong paradigm, one that greatly limits their ability when they come into contact with the real world.

This same limitation is present in almost every field that I have seen, and considering it is critical. Before you begin investing heavily in a project, you should ask yourself whether this is really the right paradigm to accomplish your goals. Overinvesting in the wrong paradigm has a doubly pernicious effect - not only are your immediate efforts not as effective as they could be, but it also renders you especially vulnerable to the sunk cost fallacy. Keep in mind that even those who are aware of the sunk cost fallacy are not immune to it!

Therefore, when making big decisions, don't just jump into the first paradigm that presents itself, or even the one that seems to make the most sense on initial reflection. Instead, really, truly consider whether this approach is the best one to get you what you want. Look at the goal that you're aiming for, and consider whether there are other ways to achieve it that might be more effective, less expensive, or both.

Here are some sample situations that can be considered paradigm-selection problems:

  • Do you really need to go and get a CS degree in order to become a computer programmer, or will a bootcamp get you started faster and cheaper?
  • Does your organization's restructuring plan really hit the core problems, or is it merely addressing the most obvious surface-level issues?
  • Will aircraft carrier-centric naval tactics be effective in a future large-scale conventional war, or is the aircraft carrier the modern equivalent of the battleship in WW2?

I don't necessarily know the answers to all these questions - note that only one is even framed as a clear choice between two options, and there are obviously other options available even in that case - but I do know that they're questions worth asking! When it comes time to make big decisions, evaluating what paradigms are available and whether the one you've chosen is the right one for the job can be critical.

Evaluating Moral Theories

3 ArisC 23 January 2017 05:04AM

I would like to use my first post to expand on a framework I introduced in the Welcome thread for evaluating moral theories, and to request your feedback.

This thesis rests on the fact that a moral theory is a tool for helping us make choices. Starting from this premise, I believe that a moral theory needs to meet three criteria for it to be acceptable:

a) Its comprising principles must be non-contradictory. I think this is pretty self-evident: if a theory consists of a number of principles that contradict each other, there will be situations where the theory will suggest contradictory actions - hence failing its purpose as a tool to enable choice making.

b) Its comprising principles must be non-arbitrary as far as possible. What I mean by this is that the principles must be derived logically from facts on which everyone agrees. Otherwise, if a moral theory rests on an arbitrary and subjective principle, the theory's advocates will never be able to convince people who do not share that principle of their theory's validity.

c) If the principles of the moral theory are taken to their logical conclusion, they must not lead to a society that the theory's proponents themselves would consider dystopian.

Note that my premise (i.e. that a moral theory is supposed to help us make choices) necessitates that the theory is not vague. So saying that a utilitarian system, using some magical measurement of utility, is a good moral theory is pointless in my view.

However, I want to draw a distinction between morality at the social level and morality at the personal level. The former refers to a moral system whose proponents believe should apply to the whole world; the latter, to the principles by which people live their private lives. The three criteria I listed should only be used to evaluate morality at the social level: if you want to impose your principles over every single human, you'd better make sure they are non-contradictory, acceptable by everyone and won't mess up the world.

Morality at the personal level is different: if you are using your principles to determine your actions only, it's fine if these principles are arbitrary. If lying makes you feel uncomfortable, I think it's fair enough for you to value honesty as a principle, even if you cannot provide a very rational justification.

One final comment: I believe there are some moral issues which cause disagreement because of the fundamental inability of our language to define certain concepts. For instance, the whole debate on abortion comes down to the definition of life - and since we lack one, I don't think we can ever rationally settle that debate.


Now I also have a question for whoever is reading this: the only theory I can think of that meets all three criteria is libertarianism:

a) It only has one principle - do not do anything that infringes on other people's liberty - so it's inherently consistent.

b) The fact on which everyone has to agree: we have no proof of some sort of moral authority, hence any moral command is arbitrary. In the absence of such moral authority, no-one has the right to impose their own morality on others.

c) Though libertarianism may lead to meanness - e.g. inability to condemn people for lack of kindness or charity - it's not dystopian by my view.

My question is - are there other theories that would meet all three criteria? (I think total anarchy, including the lack of condemnation of violence, could meet the first two criteria, but I think few would argue it meets the third one).

Optimizing Styles

3 TheSinceriousOne 20 January 2017 01:48AM

(Cross-Posted from my blog.)

You know roughly what a fighting style is, right? A set of heuristics, skills, patterns made rote for trying to steer a fight into the places where your skills are useful, means of categorizing things to get a subset of the vast overload of information available to you to make the decisions you need, tendencies to prioritize certain kinds of opportunities, that fit together. Fighting isn't the only optimization problem where you see "styles" like this. Some of them are general enough that you can see them across many domains.

Here are some examples:

Just as fighting styles are distinct from why you would fight, optimizing styles are distinct from what you value.

In limited optimization domains like games, there is known to be a one true style. The style that is everything. The null style. Raw "what is available and how can I exploit it", with no preferred way for the game to play out. Like Scathach's fighting style.

If you know probability and decision theory, you'll know there is a one true style for optimization in general too. All the other ways are fragments of it, and they derive their power from the degree to which they approximate it.

Don't think this means it is irrational to favor an optimization style besides the null style. The ideal agent may use the null style, but the ideal agent doesn't have skill or non-skill at things. As a bounded agent, you must take into account skill as a resource. And even if you've gained skills for irrational reasons, those are the resources you have.

Don't think that since one of the optimization styles you feel motivated to use is explicit in the way it tries to be the one true style, that it is the one true style.

It is very very easy to leave something crucial out of your explicitly-thought-out optimization. I assert that having done that is a possibility you must always consider if you're feeling divided, distinct from subagent value differences and subagent belief differences.

Hour for hour, one of the most valuable things I've ever done was "wasting my time" watching a bunch of videos on the internet because I wanted to. The specific videos I wanted to watch were from the YouTube atheist community of old. "Pwned" videos, the vlogging equivalent of fisking. Debates over theism with Richard Dawkins and Christopher Hitchens. Very adversarial, not much of people trying to improve their own world-model through arguing. But I was fascinated. Eventually I came to notice how many of the arguments of my side were terrible. And I gravitated towards vloggers who made less terrible arguments. This led to me watching a lot of philosophy videos. And getting into philosophy of ethics. My pickiness about arguments grew. I began talking about ethical philosophy with all my friends. I wanted to know what everyone would do in the trolley problem. This led to me becoming a vegetarian, then a vegan. Then reading a forum about utilitarian philosophy led me to find the LessWrong sequences, and the most important problem in the world.

It's not luck that this happened. When you have certain values and aptitudes, it's a predictable consequence of following long enough the joy of knowing something that feels like it deeply matters, that few other people know, the shocking novelty of "how is everyone so wrong?", the satisfying clarity of actually knowing why something is true or false with your own power, the intriguing dissonance of moral dilemmas and paradoxes...

It wasn't just curiosity as a pure detached value, predictably having a side effect good for my other values either. My curiosity steered me toward knowledge that felt like it mattered to me.

It turns out the optimal move was in fact "learn things". Specifically, "learn how to think better". And watching all those "Pwned" videos and following my curiosity from there was a way (for me) to actually do that, far better than lib arts classes in college.

I was not wise enough to calculate explicitly the value of learning to think better. And if I had calculated that, I probably would have come up with a worse way to accomplish it than just "train your argument discrimination on a bunch of actual arguments of steadily increasing refinement". Non-explicit optimizing style subagent for the win.

On Arrogance

3 casebash 20 January 2017 01:04AM

Arrogance is an interesting topic.

Let's imagine we have two people who are having a conversation. One of them is a professor in quantum mechanics and the other person is an enthusiast who has read a few popular science articles online.

The professor always gives his honest opinion, but in an extremely blunt manner, not holding anything back and not making any attempts to phrase it politely. That is, the professor does not merely tell the enthusiast that they are wrong, but also provides his honest assessment that the enthusiast does not possess even a basic understanding of the core concepts of quantum mechanics.

The enthusiast is polite throughout, even when subject to this criticism. They respond to the professor's objections about their viewpoints to the best of their ability, trying to engage directly with the professor's arguments. At the same time, the enthusiast is convinced that he is correct - equally convinced as the professor in fact - but he does not vocalise this in the same way as the professor.

Who is the most arrogant in these circumstances? Is this even a useful question to ask - or should we be dividing arrogance into two components - over-confidence and dismissive behaviour?

Let's imagine the same conversation, but imagine that the enthusiast does not know that the professor is a professor and neither do the bystanders. The bystanders don't have a knowledge of quantum physics - they can't tell who is the professor and who is the enthusiast since both appear to be able to talk fluently about the topics. All they can see is that one person is incredibly blunt and dismissive, while the other person is perfectly polite and engages with all the arguments raised. Who would the bystanders see as most arrogant?

[Link] Self medicating for Schizophrenia with - cigarettes ?

2 morganism 24 January 2017 12:08AM

Corrigibility thoughts I: caring about multiple things

2 Stuart_Armstrong 18 January 2017 03:39PM

This is the first of three articles about limitations and challenges in the concept of corrigibility (see articles 2 and 3).

The desiderata for corrigibility are:

  1. A corrigible agent tolerates, and preferably assists, its operators in their attempts to alter or shut down the agent.
  2. A corrigible agent does not attempt to manipulate or deceive its operators.
  3. A corrigible agent has incentives to repair safety measures (such as shutdown buttons, tripwires, or containment tools) if they break, or at least notify its operators in the event of a breakage.
  4. A corrigible agent preserves its corrigibility, even as it creates new sub-systems or sub-agents, even if it undergoes significant self-modification.

In this post, I'll be looking more at point 4. A summary of the result will be:

Unless giving the AI extra options can reduce expected utility, the AI must care about every possible utility at least a bit.

Some of the results are formal, but the boundaries of the model are very unclear, so the warning in this post should always be borne in mind.

Note that the indifference agents fail to be fully corrigible (they don't create corrigible subagents) and they also don't care about the other possible utilities before being changed (as this is a point of indifference).


Agents versus non-agents

First I'll present a cleaner version of an old argument. Basically, it seems that defining what a sub-agent or sub-system is, is tricky or impossible.

Consider for instance a paperclip maximiser, that may get corriged into a staple maximiser at a later date. The AI has some income, and may buy a large proportion of shares in either General Paperclips (GP) or Staples United (SU). Assume the best way of promoting the use of one of the tools is to take over the company that makes them.

There are two scenarios; in each scenario, the AI has one of two choices.

  • In scenario 1, the AI has choice A: it buys stocks in GP, but cannot make further trades (in particular, it can't sell its stocks and buy others). It also has choice B: it retains flexibility, and can sell its GP stocks at some later date to buy stocks in SU.
  • In scenario 2, the AI delegates its investment decisions to a subagent. Under choice A, the subagent is a pure paperclip maximiser. Under choice B, the subagent is corrigible, and can be changed into a staple maximiser.

Clearly 1A and 2A are equivalent, while 1B and 2B are also equivalent. This trick can be extended: in most toy models, it's possible to replace a set of options with a choice of sub-agents, and vice-versa.

Thus point 4, in the real world, cannot be merely about subagents and subsystems; it must be about the agent maintaining flexibility for its future decisions.


Burn the (rest of) the world

Let's now consider giving the corrigible agent an option C: it gets slightly more shares in GP than otherwise. But, in exchange, SU is destroyed, along with most of the rest of the world. This is the highest option for paperclip maximalisation, but the absolutely worst option for staples and anything else.

Also assume that the AI's operators are virtually certain (99.9999...9% sure) to be changing it into a staple maximiser. This means that taking option C is dramatically worse than option B, from the perspective of expected utility.

But the AI will only prefer option B if it "cares" about staples, at least a little bit, from the very beginning (we've shown that there is no clear distinction between subagent and general actions, thus the agent has to have a general care for staples, rather than-only-caring-about-B-in-the-context-of-subagents).

Thus, if we want to avoid corrigible agents getting worse expected outcomes when we give them extra "option Cs", or if we just want to rule out "option Cs" in the first place, then the AI needs to care about staples (or any possible future utility), from the very beginning.


Maintaining flexibility

You could argue that what's important about the agent is maintaining flexibility, and that option B is flexible, while options A and C are not. Thus the agent doesn't need to care about staples in advance, just to have a general preference for flexibility.

That doesn't seem sufficient. Let's add another, flexible, option D, and put numbers on the whole thing. If Up is the paperclip maximising utility and Us is the staple maximising one, then:

  • Option A: Up=100, Us=0
  • Option B: Up=100 or 0, Us=0 or 100
  • Option C: Up=101, Us=-1,000,000
  • Option D: Up=101 or 0, Us=-1,000,000 or -999,000

The "or" denotes options the AI has kept open. Here D is the most flexible option, allowing the greatest range of outcomes in terms of value. Yet it's clearly a disaster for expected utility if changes from Up to Us are likely. If we want to rule out the AI choosing D, then we need it to actively value Us at least a bit, not just have a desire for flexibility.
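The comparison above can be checked with a few lines of arithmetic. The following is only a rough sketch, not part of the original argument: it assumes that an agent with several outcomes open picks the one that is best for whichever utility it holds at the time, and that the operators change it from Up to Us with probability p_change. The function names and the "flexibility only" agent are hypothetical, introduced purely for illustration.

```python
# Each option maps to the (Up, Us) outcomes the AI has kept open.
OPTIONS = {
    "A": [(100, 0)],
    "B": [(100, 0), (0, 100)],
    "C": [(101, -1_000_000)],
    "D": [(101, -1_000_000), (0, -999_000)],
}


def expected_utility(outcomes, p_change):
    """Expected value of whichever utility the AI ends up holding.

    Assumption: if the AI stays a paperclip maximiser it picks the open
    outcome with the best Up; if it is changed, it picks the best Us.
    """
    best_up = max(up for up, _ in outcomes)
    best_us = max(us for _, us in outcomes)
    return (1 - p_change) * best_up + p_change * best_us


def flexibility_only_choice(options):
    """Hypothetical agent that ignores Us entirely: it maximises the best
    reachable Up and breaks ties by the number of outcomes kept open."""
    return max(
        options,
        key=lambda name: (max(up for up, _ in options[name]), len(options[name])),
    )


if __name__ == "__main__":
    p_change = 0.999  # illustrative value for "changes are likely"
    for name, outcomes in OPTIONS.items():
        print(name, expected_utility(outcomes, p_change))
    print("flexibility-only agent picks:", flexibility_only_choice(OPTIONS))
```

Under these assumptions, option B has the highest expected utility when a change is likely, while the agent that cares only about Up plus flexibility selects D, which is the failure the section describes.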

[Link] Neural nets designing neural nets

2 Stuart_Armstrong 18 January 2017 03:28PM

[Link] Please Help: How to make a big improvement in the alignment of political parties’ incentives with the public interest?

2 interstice 18 January 2017 12:51AM

[Link] Another UBI article, this one on costs and benefits to society

1 morganism 23 January 2017 11:21PM

[Link] Most empirical questions are unresolveable; The good, the bad, and the appropriately under-powered

1 Davidmanheim 23 January 2017 08:35PM

Polling Thread January 2017

1 Gunnar_Zarncke 22 January 2017 11:26PM

The next installment of the Polling Thread.

The Polling thread seems to have fallen out of regular use but I will use it shamelessly to post one of few polls. To make it a bit more useful and in line with the current use of LW I suggest that you post links to interesting polls outside of LW.

This is your chance to ask your multiple choice question you always wanted to throw in. Get qualified numeric feedback to your comments. Post fun polls.

These used to be the rules:

  1. Each poll (or link to a poll) goes into its own top level comment and may be commented there.
  2. You should at least vote in all polls that were posted earlier than your own. This ensures participation in all polls and also limits the total number of polls. You may of course vote without posting a poll.
  3. Your poll should include a 'don't know' option (to avoid conflict with 2). I don't know whether we need to add a troll catch option here but we will see.

If you don't know how to make a poll in a comment look at the Poll Markup Help.

This is a somewhat regular thread. If it is successful I may post again. Or you may. In that case do the following:

  • Use "Polling Thread" in the title.
  • Copy the rules.
  • Add the tag "poll".
  • Link to this Thread or a previous Thread.
  • Create a top-level comment saying 'Discussion of this thread goes here; all other top-level comments should be polls or similar'
  • Add a second top-level comment with an initial poll to start participation.

Weekly LW Meetups

1 FrankAdamek 20 January 2017 04:56PM

Instrumental Rationality: Overriding Defaults

1 lifelonglearner 20 January 2017 05:14AM

[I'd previously posted this essay as a link. From now on, I'll be cross-posting blog posts here instead of linking them, to keep the discussions LW central. This is the first in an in-progress sequence of articles that'll focus on identifying instrumental rationality techniques and cataloging my attempt to integrate them into my life with examples and insight from habit research.]

[Epistemic Status: Pretty sure. The stuff on habits being situation-response links seems fairly robust. I'll be writing something later with the actual research. I'm basically just retooling existing theory into an optimizational framework for improving life.]


I’m interested in how rationality can help us make better decisions.

Many of these decisions seem to involve split-second choices where it’s hard to sit down and search a handbook for the relevant bits of information—you want to quickly react in the correct way, else the moment passes and you’ve lost. On a very general level, it seems to be about reacting in the right way once the situation provides a cue.

Consider these situation-reaction pairs:

  • You are having an argument with someone. As you begin to notice the signs of yourself getting heated, you remember to calm down and talk civilly. Maybe also some deep breaths.
  • You are giving yourself a deadline or making a schedule for a task, and you write down the time you expect to finish. Quickly, though, you remember to actually check if it took you that long last time, and you adjust accordingly.
  • You feel yourself slipping towards doing something some part of you doesn’t want to do. Say you are reneging on a previous commitment. As you give in to temptation, you remember to pause and really let the two sides of yourself communicate.
  • You think about doing something, but you feel aversive / flinch-y to it. As you shy away from the mental pain, rather than just quickly thinking about something else, you also feel curious as to why you feel that way. You query your brain and try to pick apart the “ugh” feeling.

Two things seem key to the above scenarios:

One, each situation above involves taking an action that is different from our keyed-in defaults.

Two, the situation-reaction pair paradigm is pretty much CFAR’s Trigger Action Plan (TAP) model, paired with a multi-step plan.

Also, knowing about biases isn’t enough to make good decisions. Even memorizing a mantra like “Notice signs of aversion and query them!” probably isn’t going to be clear enough to be translated into something actionable. It sounds nice enough on the conceptual level, but when, in the moment, you remember such a mantra, you still need to figure out how to “notice signs of aversion and query them”.

What we want is a series of explicit steps that turn the abstract mantra into small, actionable steps. Then, we want to quickly deploy the steps at the first sign of the situation we’re looking out for, like a new cached response.

This looks like a problem that a combination of focused habit-building and a breakdown of the 5-second level can help solve.

In short, the goal looks to be to combine triggers with clear algorithms to quickly optimize in the moment. Reference class information from habit studies can also help give good estimates on how long the whole process will take to internalize (on average 66 days, according to Lally et al.).

But these Trigger Action Plan-type plans don’t seem to directly cover the willpower related problems with akrasia.

Sure, TAPs can help alert you to the presence of an internal problem, like in the above example where you notice aversion. And the actual internal conversation can probably be operationalized to some extent, like how CFAR has described the process of Double Crux.

But most of the Overriding Default Habit actions seem to be ones I’d be happy to do anytime—I just need a reminder—whereas akrasia-related problems are centrally related to me trying to debug my motivational system. For that reason, I think it helps to separate the two. Also, it makes the outside-seeming TAP algorithms complementary, rather than at odds, with the inside-seeming internal debugging techniques.

Loosely speaking, then, I think it still makes quite a bit of sense to divide the things rationality helps with into two categories:

  • Overriding Default Habits:

These are the situation-reaction pairs I’ve covered above. Here, you’re substituting a modified action instead of your “default action”. But the cue serves as mainly a reminder/trigger. It’s less about diagnosing internal disagreement.

  • Akrasia / Willpower Problems:

Here we’re talking about problems that might require you to precommit (although precommitment might not be all you need to do), perhaps because of decision instability. The “action-intention gap” caused by akrasia, where you (sort of) want to do something but you don’t want to, also goes in here.

Still, it’s easy to point to lots of other things that fall in the bounds of rationality that my approach doesn’t cover: epistemology, meta-levels, VNM rationality, and many other concepts are conspicuously absent. Part of this is because I’ve been focusing on instrumental rationality, while a lot of those ideas are more in the epistemic camp.

Ideas like meta-levels do seem to have some place in informing other ideas and skills. Even as declarative knowledge, they do chain together in a way that results in useful real world heuristics.  Meta-levels, for example, can help you keep track of the ultimate direction in a conversation. Then, it can help you table conversations that don’t seem immediately useful/relevant and not get sucked into the object-level discussion.

At some point, useful information about how the world works should actually help you make better decisions in the real world. For an especially pragmatic approach, it may be useful to ask yourself, each time you learn something new, “What do I see myself doing as a result of learning this information?”

There’s definitely more to mine from the related fields of learning theory, habits, and debiasing, but I think I’ll have more than enough skills to practice if I just focus on the immediately practical ones.



[Link] Marginal Revolution Thoughts on Black Lives Matter Movement

1 scarcegreengrass 18 January 2017 06:12PM

Corrigibility thoughts III: manipulating versus deceiving

1 Stuart_Armstrong 18 January 2017 03:57PM

This is the third of three articles about limitations and challenges in the concept of corrigibility (see articles 1 and 2).

The desiderata for corrigibility are:

  1. A corrigible agent tolerates, and preferably assists, its operators in their attempts to alter or shut down the agent.
  2. A corrigible agent does not attempt to manipulate or deceive its operators.
  3. A corrigible agent has incentives to repair safety measures (such as shutdown buttons, tripwires, or containment tools) if they break, or at least notify its operators in the event of a breakage.
  4. A corrigible agent preserves its corrigibility, even as it creates new sub-systems or sub-agents, even if it undergoes significant self-modification.

In this post, I'll be looking more at some aspects of point 2. A summary of the result will be:

Defining manipulation simply may be possible, but defining deception is a whole other problem.

The warning in this post should always be borne in mind, of course; it's possible that we might find a semi-formal version of deception that does the trick.


Manipulation versus deception

In the previous post, I mentioned that we may need to define clearly what an operator was, rather than relying on the pair: {simple description of a value correction event, physical setup around that event}. Can we define manipulation and deception without defining what an operator is?

For manipulation, it seems we can, because manipulation is all about getting certain preferred outcomes. By specifying that the AI cannot aim to optimise certain outcomes, we can stop at least certain types of manipulation, along with other more direct ways of achieving those outcomes.

For deception, the situation is much more complicated. It seems impossible to define how one agent can communicate to another agent (especially one as biased as a human), and increase the accuracy of the second agent, without defining the second agent properly. More confusingly, this doesn't even stop deception; sometimes lying to a bounded agent can increase their accuracy about the world.

There may be some ways to define deception or truth behaviourally, such as using a human as a crucial node in an autoencoder between two AIs. But those definitions are dangerous, because the AI is incentivised to make the human behave in a certain way, rather than having them believe certain things. Manipulating the human or replacing them entirely is positively encouraged.

In all, it seems that the problem of AI deception is vast and complicated, and should probably be separated from the issue of corrigibility.

Corrigibility thoughts II: the robot operator

1 Stuart_Armstrong 18 January 2017 03:52PM

This is the second of three articles about limitations and challenges in the concept of corrigibility (see articles 1 and 3).

The desiderata for corrigibility are:

  1. A corrigible agent tolerates, and preferably assists, its operators in their attempts to alter or shut down the agent.
  2. A corrigible agent does not attempt to manipulate or deceive its operators.
  3. A corrigible agent has incentives to repair safety measures (such as shutdown buttons, tripwires, or containment tools) if they break, or at least notify its operators in the event of a breakage.
  4. A corrigible agent preserves its corrigibility, even as it creates new sub-systems or sub-agents, even if it undergoes significant self-modification.

In this post, I'll be looking at one aspect of points 1 and 3. I feel that they're actually pretty much the same point -- the AI has to act while preserving the corrigibility infrastructure.

A summary of the result will be:

Without a formal definition of what an operator is, corrigibility is vulnerable to the AI replacing the operator with a fake version.

The warning in this post should always be borne in mind, of course; it's possible that we might find a semi-formal version of operator that does the trick.


Defining the operator or defining the event

Typically the process of correction of values is defined physically, as being the outcome of a specific event (such as a button press, or a specific stochastic event triggered by or triggering a button press). This could also be defined by a digital event (receiving a certain encoded signal, or similar).

In all cases, the idea is the same: there is a very narrow and specific definition of the value correction event, and the physical or digital infrastructure surrounding it plays the role of ensuring the value corrections happen just as we intended. For instance, the physical event could be the button press (or a charge in a wire triggered by the button press) and the button itself could be in a room with an operator, with the operator keeping an eye on the AI and motivated to press the button under the right circumstances.

Such a setup is naturally vulnerable if the AI is motivated to manipulate the operator, or one of its subagents is. But even if we can solve points 2. and 4., there remains another vulnerability: the AI can choose to replace the operator with an artificial one (maybe a robot, maybe a second wire).

Why would the AI be so motivated? Well, for a start, it's indifferent to doing so. If the operator is not part of the definition of corrigibility, then having a human operator or a robot one that does exactly the same thing will be equivalent to the AI.

Now, there are frictions that might push against this. Replacing the operator might carry an expected cost. Or maybe the operator has private information that the AI wants to know, hence replacing them is a negative as the robot wouldn't have that information. And maybe the AI has a rough "don't harm humans" motive, that, while imperfect, at least prevents it from removing the operator in this circumstance.

But there may be motivations pushing the other way. For example, if a learning agent is unbiased but not uninfluenceable, then it will be motivated to replace the operator with a robot with the same expected outcome, but different variance. Or maybe the operator has private information, but it's much more effective to have a robot read that info from their brain than to rely on the human to reveal that through their behaviour. Or maybe the AI is decently motivated to preserve the corrigibility infrastructure, and is aware of how mortal humans can be, so replacing them with a robot is the prudent thing to do.

All this stems from the fact that the operator is not well defined as part of the corrigibility infrastructure, but their position relies on physical facts about the world, along with a narrow definition of the correction of value event. To combat that, we'd need to define the operator properly, a very tricky challenge, or physically and cognitively secure them, or hope the AI learns early on not to harm them.

[Link] Choosing prediction over explanation in psychology: Lessons from machine learning

1 Kaj_Sotala 17 January 2017 09:23PM

Too Much Effort | Too Little Evidence

0 Erfeyah 24 January 2017 12:37PM

I would like to explore certain kinds of experiential knowledge that appear to me to be difficult to investigate rationally as rationality itself might be the cause of a reluctance to explore. If this is already covered in one of the articles on the site please refer me to it.

In this thought experiment we will use the example of lucid dreaming. Lucid dreaming is a state in which a person realises they are dreaming while they are dreaming. The subtleties of the state are not relevant to this discussion.


[1] We will assume the experiment takes place at a time where the existence of the experience of lucid dreaming hasn't been scientifically proven yet. We will also assume that a proof is not possible in the current state of technological or methodological development.
[2] Person A has a (true) belief on the existence of lucid dreaming that is based on his personal experience of the state.
[3] He is trying to communicate the existence of lucid dreaming to someone else. Let us call him person B.
[4] Actually becoming lucid in a dream is quite a complex process that requires among other things1:
    [4.1] Expending large amounts of effort.
    [4.2] Following guidelines and exercises that appear strange.
    [4.3] A time investment of significant length.

In the described circumstances we have an internal experience that has not been scientifically proven but is nevertheless true. We know this in our time through scientific studies but B does not know it in his world. Person B would have to actually believe in the existence of lucid dreaming and trust A to guide him through the process. But since there is no rational evidence to support the claim of A, the required effort is significantly large, and the methods appear strange to those not understanding the state, how can B rationally decide to expend the effort?

Proposed Conclusion

[5] Rational assessment can be misleading when dealing with experiential knowledge that is not yet scientifically proven, has no obvious external function but is, nevertheless, experientially accessible.



1 Even if you disagree with the level of difficulty or the steps required please accept [4] and its sub-headings as being accurate for the duration of the argument.

Metrics to evaluate a Presidency

0 ArisC 24 January 2017 01:02AM

I got lots of helpful comments in my first post, so I'll try a second: I want to develop a list of criteria by which to evaluate a presidency. Coming up with criteria and metrics on the economy is pretty easy, but I'd like to ask for suggestions on proxies for evaluating:


  • Racial relations;
  • Gender equality;
  • Impact on free trade / protectionism;
  • Education;
  • Any other significant factor that would determine whether a president is successful.
Note: a few people have pointed out that the president is restrained by senators and congressmen etc - I realise that; but if we are willing to admit that presidents do have some effect in society, we should be prepared to measure them.



[Link] MAD Trollies

0 gworley 23 January 2017 06:57PM

[Link] Quick modeling: resolving disagreements.

0 ProofOfLogic 23 January 2017 06:18PM

Open thread, Jan. 23 - Jan. 29, 2017

0 MrMind 23 January 2017 07:41AM

If it's worth saying, but not worth its own post, then it goes here.

Notes for future OT posters:

1. Please add the 'open_thread' tag.

2. Check if there is an active Open Thread before posting a new one. (Immediately before; refresh the list-of-threads page before posting.)

3. Open Threads should start on Monday, and end on Sunday.

4. Unflag the two options "Notify me of new top level comments on this article" and "
