Less Wrong is a community blog devoted to refining the art of human rationality. Please visit our About page for more information.

Munchkining for Fun and Profit, Ideas, Experience, Successes, Failures

0 Username 19 December 2014 05:39AM

A munchkin is someone who follows the letter of the rules of a game while breaking their spirit, someone who wins by exploiting possibilities that others don't see, reject as impossible or unsporting, or just don't believe can possibly work.

If you have done something that everyone around you thought would not work, something that people around you didn't do after they saw it work, please share your experiences. If you tried something and failed or have ideas you want to hear critique of, likewise please share those with us.

[Short, Meta] Should open threads be more frequent?

2 Metus 18 December 2014 11:41PM

Currently open threads are weekly and very well received. However they tend to fill up quickly. Personally I fear that my contribution will drown unless posted early on so I tend to wait if I want to add a new top level post. Does anyone else have this impression? Someone with better coding skills than me could put this statistically by plotting the number of top level posts and total posts over time: If the curve is convex people tend to delay their posts.
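A rough sketch of that analysis, assuming "the curve" means the cumulative number of posts within a single weekly thread. The timestamps below are made up for illustration (nothing here is pulled from the site), and `cumulative_counts` and `second_differences` are hypothetical helper names:

```python
from datetime import datetime, timedelta

def cumulative_counts(timestamps, start, days=7):
    """Cumulative number of posts made before the end of each day of the thread."""
    counts = []
    for day in range(1, days + 1):
        cutoff = start + timedelta(days=day)
        counts.append(sum(1 for t in timestamps if t < cutoff))
    return counts

def second_differences(values):
    """Discrete second derivative: positive entries mean the curve bends upward there."""
    return [values[i + 1] - 2 * values[i] + values[i - 1]
            for i in range(1, len(values) - 1)]

# Hypothetical thread opened on Monday 15 December 2014, with invented posting times
start = datetime(2014, 12, 15)
hours_after_open = [1, 2, 3, 5, 8, 12, 20, 30, 47, 70, 100, 150]
timestamps = [start + timedelta(hours=h) for h in hours_after_open]

counts = cumulative_counts(timestamps, start)
curvature = second_differences(counts)
print(counts)
print(curvature)
```

Mostly positive second differences would indicate a convex curve, mostly negative ones a concave (front-loaded) curve. Real data would need the actual comment timestamps for each thread.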

So should open threads be more frequent and if so what frequency?

An explanation of the 'Many Interacting Worlds' theory of quantum mechanics (by Sean Carroll and Chip Sebens)

2 Ander 18 December 2014 11:36PM

This is the first explanation of a 'many worlds' theory of quantum mechanics that has ever made sense to me. The animations are excellent:



Bayes Academy Development Report 2 - improved data visualization

2 Kaj_Sotala 18 December 2014 10:11PM

See here for the previous update if you missed / forgot it.

In this update, no new game content, but new graphics.

I wasn’t terribly happy about the graphical representation of the various nodes in the last update. Especially in the first two networks, if you didn’t read the descriptions of the nodes carefully, it was very easy to just click your way through them without really having a clue of what the network was actually doing. Needless to say, for a game that’s supposed to teach how the networks function, this is highly non-optimal.

Here’s the representation that I’m now experimenting with: the truth table of the nodes is represented graphically inside the node. The prior variable at the top doesn’t really have a truth table, it’s just true or false. The “is” variable at the bottom is true if its parent is true, and false if its parent is false.

You may remember that in the previous update, unobservable nodes were represented in grayscale. I ended up dropping that, because that would have been confusing in this representation: if the parent is unobservable, should the blobs representing its truth values in the child node be in grayscale as well? Both “yes” and “no” answers felt confusing.

Instead the observational state of a node is now represented by its border color. Black for unobservable, gray for observable, no border for observed. The metaphor is supposed to be something like, a border is a veil of ignorance blocking us from seeing the node directly, but if the veil is gray it’s weak enough to be broken, whereas a black veil is strong enough to resist a direct assault. Or something.

When you observe a node, not only does its border disappear, but the truth table entries that get reduced to a zero probability disappear, to be replaced by white boxes. I experimented with having the eliminated entries still show up in grayscale, so you could e.g. see that the “is” node used to contain the entry for (false -> false), but felt that this looked clearer.

The “or” node at the bottom is getting a little crowded, but hopefully not too crowded. Since we know that its value is “true”, the truth table entry showing (false, false -> false) shows up in all whites. It’s also already been observed, so it starts without a border.

After we observe that there’s no monster behind us, the “or” node loses its entries for (monster, !waiting -> looks) and (monster, waiting -> looks), leaving only (!monster, waiting -> looks): meaning that the boy must be waiting for us to answer.
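As a minimal sketch of this elimination step (not the game's actual code), the "or" node can be treated as a deterministic truth table over (monster, waiting) -> looks, with an observation simply filtering out the entries that contradict it. The `observe` helper is a hypothetical name:

```python
from itertools import product

# Every (monster, waiting, looks) entry of a deterministic "or" node
entries = [(monster, waiting, monster or waiting)
           for monster, waiting in product([True, False], repeat=2)]

def observe(entries, index, value):
    """Keep only the truth table entries consistent with an observed value."""
    return [entry for entry in entries if entry[index] == value]

# We already know the boy looks at us, so (false, false -> false) is ruled out
after_looks = observe(entries, 2, True)

# Then we observe that there is no monster behind us
after_monster = observe(after_looks, 0, False)
print(after_monster)
```

The single surviving entry has waiting set to true, which is the conclusion described above.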

This could still be made clearer: currently the network updates instantly. I’m thinking about adding a brief animation where the “monster” variable would first be revealed as false, which would then propagate an update to the values of “looks at you” (with e.g. the red tile in “monster” blinking at the same time as the now-invalid truth table entries, and when the tiles stopped blinking, those now-invalid entries would have disappeared), and that would in turn propagate the update to the “waiting” node, deleting the red color from it. But I haven’t yet implemented this.

The third network is where things get a little tricky. The “attacking” node is of type “majority vote” - i.e. it’s true if at least two of its parents are true, and false otherwise. That would make for a truth table with eight entries, each holding four blobs each, and we could already see the “or” node in the previous screen being crowded. I’m not quite sure of what to do here. At this moment I’m thinking of just leaving the node as is, and displaying more detailed information in the sidebar.
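To see the size of that table, here is a small sketch that enumerates a three-parent majority vote node. The `majority` function is an illustrative stand-in, not the game's implementation:

```python
from itertools import product

def majority(*parents):
    """True if at least two of the parent values are true."""
    return sum(parents) >= 2

# One row per combination of the three parents' truth values
table = {combo: majority(*combo) for combo in product([False, True], repeat=3)}
true_rows = [combo for combo, value in table.items() if value]

for combo, value in table.items():
    print(combo, "->", value)
```

Three binary parents give 2^3 = 8 rows, and half of them make the node true.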

Here’s another possible problem. Just having the truth table entries works fine to make it obvious where the overall probability of the node comes from… for as long as the valid values of the entries are restricted to “possible” and “impossible”. Then you can see at a glance that, say, of the three possible entries, two would make this node true and one would make this false, so there’s a ⅔ chance of it being true.
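The counting rule can be sketched like this, under the assumption that every remaining entry is equally likely. The entries are hypothetical (monster, waiting, looks) rows of the "or" node after learning that looks is true:

```python
from fractions import Fraction

def probability_true(entries, index):
    """Share of still-possible entries in which the variable at `index` is true.

    Only valid when every remaining entry is equally likely.
    """
    return Fraction(sum(1 for entry in entries if entry[index]), len(entries))

# Hypothetical remaining entries: (monster, waiting, looks)
possible = [(True, True, True), (True, False, True), (False, True, True)]
p_waiting = probability_true(possible, 1)
print(p_waiting)
```

Two of the three entries have waiting set to true, which gives the ⅔ figure. Once entries carry different weights, this simple count no longer works.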

But in this screen, that has ceased to be the case. The “attacking” node has a 75% chance of being true, meaning that, for instance, the “is / block” node’s “true -> true” entry also has a 75% chance of being the right one. This isn’t reflected in the truth table visualization. I thought of adding small probability bars under each truth table entry, or having the size of the truth table blobs reflect their probability, but then I’d have to make the nodes even bigger, and it feels like it would easily start looking cluttered again. But maybe it’d be the right choice anyway? Or maybe just put the more detailed information in the sidebar? I’m not sure of the best thing to do here.

If anyone has good suggestions, I would be grateful to get advice from people who have more of a visual designer gene than I do!

Rationality Jokes Thread

4 Gunnar_Zarncke 18 December 2014 04:17PM

This is an experimental thread. It is somewhat in the spirit of the Rationality Quotes Thread but without the requirements and with a focus on humorous value. You may post insightful jokes, nerd or math jokes or try out rationality jokes of your own invention. 

ADDED: Apparently there has been an earlier Jokes Thread which was fairly successful. Consider this another instance.

(Very Short) PSA: Combined Main and Discussion Feed

6 Gondolinian 18 December 2014 03:46PM

For anyone who's annoyed by having to check newest submissions for Main and Discussion separately, there is a feed for combined submissions from both, in the form of Newest Submissions - All (RSS feed).  (There's also Comments - All (RSS feed), but for me at least, it seems to only show comments from Main and none from Discussion.)

Thanks to RichardKennaway for bringing this to my attention, and to Unknowns for asking the question that prompted him.  (If you've got the time, head over there and give them some karma.)  I thought this deserved the visibility of a post in Discussion, as not everyone reads through the Open Thread, and I think there's a chance that many would benefit from this information.

How many words do we have and how many distinct concepts do we have?

-4 MazeHatter 17 December 2014 11:04PM

In another message, I suggested that, given how many cultures we have to borrow from, our language may include multiple words from various sources that apply to a single concept.

An example is Reality, or Existence, or Being, or Universe, or Cosmos, or Nature, etc.

Another is Subjectivity, Mind, Consciousness, Experience, Qualia, Phenomenal, Mental, etc.

Is there any problem with accepting these claims so far? Curious what case would be made to the contrary.

(Here's a bit of a contextual aside: between quantum mechanics and cosmology, the words "universe", "multiverse", and "observable universe" mean at least 10 different things, depending on who you ask. People often say the Multiverse comes from Hugh Everett. But what they are calling the multiverse, Everett called the "universal wave function", or "universe". How did Everett's universe become the Multiverse? DeWitt came along and emphasized some part of the wave function branching into different worlds. So, if you're following, one Universe, many worlds. Over the next few decades, this idea was popularized as having "many parallel universes", which is obviously inaccurate. Well, a Scottish chap decided to correct this. He stated the Universe was the Universal Wave Function, where it was "a complete one", because that's what "uni" means. And that our perceived worlds of various objects are a "multiverse". One Universe, many Multiverses. Again, the "parallel universes" idea seemed cooler, so as it became more popular the Multiverse became one and the universe became many. What's my point? The use of these words is a legitimate fiasco, and I suggest we abandon them altogether.)

If these claims are found to be palatable, what do they suggest?

I propose, respectfully and humbly as I can imagine there may be compelling alternatives presented here, that in the 21st century, we make a decision about which concepts are necessary, which term we will use to describe that concept, and respectfully leave the remaining terms for the domain of poetry.

Here are the words I think we need:

  1. reality
  2. model
  3. absolute
  4. relative
  5. subjective
  6. objective
  7. measurement
  8. observer

With these terms I feel we can construct a concise metaphysical framework, consistent with the great rationalists of history, and one that accurately describes Everett's "Relative State Formulation of Quantum Mechanics".

  1. Absolute reality is what is. It is relative to no observer. It is real prior to measurement.
  2. Subjective reality is what is, relative to a single observer. It exists at measurement.
  3. Objective reality is the model relative to all observers. It exists post-measurement.

Everett's Relative State formulation is roughly this:

  1. The wave function is the "absolute state" of the model
  2. The wave function contains an observer and their measurement apparatus
  3. An observer makes measurements and records the results in a memory
  4. Those measurement records are the "relative state" of the model

Here we see that the words multiverse and universe are abandoned for absolute and relative states, which is actually the language used in the Relative State Formulation.

My conclusion then, for your consideration and comment, is that a technical view of reality can be attained by having a select set of terms, and this view is not only consistent with themes of philosophy (which I didn't really explain) but also the proper framework in which to interpret quantum mechanics (à la Everett).

(I'm not sure how familiar everyone here is with Everett specifically. His thesis depended on "automatically functioning machines" that make measurements with sensory gear and record them. After receiving his PhD, he left theoretical physics, and had a lifelong fascination with computer vision and computer hearing. That suggests to me that the reason his papers have been largely confounding to physicists in general is that they didn't realize the extent to which Everett really thought he could mathematically model an observer.)

I should note, it may clarify things to add another term "truth", though this would in general be taken as an analog of "real". For example, if something is absolutely true, then it is of absolute reality. If something is objectively true, then it is of objective reality. The word "knowledge" in this sense is a poetic word for objective truth, understood on the premise that objective truth is not absolute truth.

Giving What We Can - New Year drive

5 Smaug123 17 December 2014 03:26PM

If you’ve been planning to get around to maybe thinking about Effective Altruism, we’re making your job easier. A group of UK students has set up a drive for people to sign up to the Giving What We Can pledge to donate 10% of their future income to charity. It does not specify the charities - that decision remains under your control. The pledge is not legally binding, but honour is a powerful force when it comes to promising to help. If 10% is a daunting number, or you don't want to sign away your future earnings in perpetuity, there is a Try Giving scheme in which you may donate less money for less time. I suggest five years (that is, from 2015 to 2020) of 5% as a suitable "silver" option to the 10%-until-retirement "gold medal".


We’re hoping to take advantage of the existing Schelling point of “new year” as a time for resolutions, as well as building the kind of community spirit that gets people signing up in groups. If you feel it’s a word worth spreading, please feel free to spread it. As of this writing, GWWC reported 41 new members this month, which is a record for monthly acquisitions (and we’re only halfway through the month, three days into the event).


If anyone has suggestions about how to better publicise this event (or Effective Altruism generally), please do let me know. We’re currently talking to various news outlets and high-profile philanthropists to see if they can give us a mention, but suggestions are always welcome. Likewise, comments on the effectiveness of this post itself will be gratefully noted.


About Giving What We Can: GWWC is under the umbrella of the Centre for Effective Altruism, was co-founded by a LessWronger, and in 2013 had verbal praise from lukeprog.

Entropy and Temperature

26 spxtr 17 December 2014 08:04AM

Eliezer Yudkowsky previously wrote (6 years ago!) about the second law of thermodynamics. Many commenters were skeptical about the statement, "if you know the positions and momenta of every particle in a glass of water, it is at absolute zero temperature," because they don't know what temperature is. This is a common confusion.


To specify the precise state of a classical system, you need to know its location in phase space. For a bunch of helium atoms whizzing around in a box, phase space is the position and momentum of each helium atom. For N atoms in the box, that means 6N numbers to completely specify the system.

Let's say you know the total energy of the gas, but nothing else. It will be the case that a fantastically huge number of points in phase space will be consistent with that energy.* In the absence of any more information it is correct to assign a uniform distribution to this region of phase space. The entropy of a uniform distribution is the logarithm of the number of points, so that's that. If you also know the volume, then the number of points in phase space consistent with both the energy and volume is necessarily smaller, so the entropy is smaller.
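A quick numerical check of both claims, using a toy discrete state space rather than a real phase space. The state counts are arbitrary illustrative numbers, and entropy is in units where kB = 1:

```python
import math

def shannon_entropy(probs):
    """Entropy -sum(p ln p) of a discrete distribution, with kB set to 1."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# Knowing only the energy: 1000 states are consistent (arbitrary toy number)
n_energy_only = 1000
s_energy_only = shannon_entropy([1 / n_energy_only] * n_energy_only)

# Also knowing the volume: suppose only 100 of those states remain consistent
n_energy_and_volume = 100
s_energy_and_volume = shannon_entropy([1 / n_energy_and_volume] * n_energy_and_volume)

print(s_energy_only, math.log(n_energy_only))
print(s_energy_and_volume, math.log(n_energy_and_volume))
```

The uniform distribution's entropy matches the logarithm of the number of states, and the extra constraint lowers it.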

This might be confusing to chemists, since they memorized a formula for the entropy of an ideal gas, and it's ostensibly objective. Someone with perfect knowledge of the system will calculate the same number on the right side of that equation, but to them, that number isn't the entropy. It's the entropy of the gas if you know nothing more than energy, volume, and number of particles.


The existence of temperature follows from the zeroth and second laws of thermodynamics: thermal equilibrium is transitive, and entropy is maximum in equilibrium. Temperature is then defined as the thermodynamic quantity that is shared by systems in equilibrium.

If two systems are in equilibrium then they cannot increase entropy by flowing energy from one to the other. That means that if we flow a tiny bit of energy from one to the other (δU1 = -δU2), the entropy change in the first must be the opposite of the entropy change of the second (δS1 = -δS2), so that the total entropy (S1 + S2) doesn't change. For systems in equilibrium, this leads to (∂S1/∂U1) = (∂S2/∂U2). Define 1/T = (∂S/∂U), and we are done.
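Spelled out, the step from zero total entropy change to equal derivatives is a first-order expansion. This is a sketch in the notation above, using nothing beyond δU1 = -δU2:

```latex
\begin{align}
\delta S_1 + \delta S_2
  &= \frac{\partial S_1}{\partial U_1}\,\delta U_1
   + \frac{\partial S_2}{\partial U_2}\,\delta U_2 \\
  &= \left(\frac{\partial S_1}{\partial U_1}
   - \frac{\partial S_2}{\partial U_2}\right)\delta U_1 = 0
  \quad\Longrightarrow\quad
  \frac{\partial S_1}{\partial U_1}
  = \frac{\partial S_2}{\partial U_2}
  \equiv \frac{1}{T}.
\end{align}
```

Since δU1 is arbitrary (but small), the bracket itself has to vanish, which is the equality of the two derivatives.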

Temperature is sometimes taught as, "a measure of the average kinetic energy of the particles," because for an ideal gas U/N = (3/2) kBT. This is wrong, for the same reason that the ideal gas entropy isn't the definition of entropy.

Probability is in the mind. Entropy is a function of probabilities, so entropy is in the mind. Temperature is a derivative of entropy, so temperature is in the mind.

Second Law Trickery

With perfect knowledge of a system, it is possible to extract all of its energy as work. EY states it clearly:

So (again ignoring quantum effects for the moment), if you know the states of all the molecules in a glass of hot water, it is cold in a genuinely thermodynamic sense: you can take electricity out of it and leave behind an ice cube.

Someone who doesn't know the state of the water will observe a violation of the second law. This is allowed. Let that sink in for a minute. Jaynes calls it second law trickery, and I can't explain it better than he does, so I won't try:

A physical system always has more macroscopic degrees of freedom beyond what we control or observe, and by manipulating them a trickster can always make us see an apparent violation of the second law.

Therefore the correct statement of the second law is not that an entropy decrease is impossible in principle, or even improbable; rather that it cannot be achieved reproducibly by manipulating the macrovariables {X1, ..., Xn} that we have chosen to define our macrostate. Any attempt to write a stronger law than this will put one at the mercy of a trickster, who can produce a violation of it.

But recognizing this should increase rather than decrease our confidence in the future of the second law, because it means that if an experimenter ever sees an apparent violation, then instead of issuing a sensational announcement, it will be more prudent to search for that unobserved degree of freedom. That is, the connection of entropy with information works both ways; seeing an apparent decrease of entropy signifies ignorance of what were the relevant macrovariables.


I've actually given you enough information on statistical mechanics to calculate an interesting system. Say you have N particles, each fixed in place to a lattice. Each particle can be in one of two states, with energies 0 and ε. Calculate and plot the entropy if you know the total energy: S(E), and then the energy as a function of temperature: E(T). This is essentially a combinatorics problem, and you may assume that N is large, so use Stirling's approximation. What you will discover should make sense using the correct definitions of entropy and temperature.
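If you want to check your answer numerically, here is one possible setup (skip it if you would rather work the exercise yourself). It assumes n of the N particles are excited, so E = nε, and takes the entropy to be the log of the binomial coefficient. The function names, and the default units kB = ε = 1, are choices made for this sketch:

```python
import math

def entropy_exact(n_excited, n_total):
    """S/kB = ln C(N, n), evaluated with log-gamma to avoid huge factorials."""
    return (math.lgamma(n_total + 1)
            - math.lgamma(n_excited + 1)
            - math.lgamma(n_total - n_excited + 1))

def entropy_stirling(n_excited, n_total):
    """Stirling form: S/kB = -N [x ln x + (1 - x) ln(1 - x)], with x = n/N."""
    x = n_excited / n_total
    if x in (0.0, 1.0):
        return 0.0
    return -n_total * (x * math.log(x) + (1 - x) * math.log(1 - x))

def energy(temperature, n_total, eps=1.0, k_b=1.0):
    """E(T) = N eps / (exp(eps / (kB T)) + 1), obtained from 1/T = dS/dE."""
    return n_total * eps / (math.exp(eps / (k_b * temperature)) + 1)

N = 10**6
for n in (N // 10, N // 4, N // 2, 3 * N // 4):
    print(n, entropy_exact(n, N), entropy_stirling(n, N))
for T in (0.5, 1.0, 10.0, 1e9, -1.0):
    print(T, energy(T, N) / N)
```

The entropy peaks at E = Nε/2. Positive temperatures only reach energies below that point, and a negative temperature passed to `energy` gives an energy above it.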

*: How many combinations of 10²³ numbers between 0 and 10 add up to 5×10²³?

"incomparable" outcomes--multiple utility functions?

3 Emanresu 17 December 2014 12:06AM

I know that this idea might sound a little weird at first, so just hear me out please?

A couple weeks ago I was pondering decision problems where a human decision maker has to choose between two acts that lead to two "incomparable" outcomes. I thought, if outcome A is not more preferred than outcome B, and outcome B is not more preferred than outcome A, then of course the decision maker is indifferent between both outcomes, right? But if that's the case, the decision maker should be able to just flip a coin to decide. Not only that, but adding even a tiny amount of extra value to one of the outcomes should always make that outcome be preferred. So why can't a human decision maker just make up their mind about their preferences between "incomparable" outcomes until they're forced to choose between them? Also, if a human decision maker is really indifferent between both outcomes, then they should be able to know that ahead of time and have a plan for deciding, such as flipping a coin. And, if they're really indifferent between both outcomes, then they should not be regretting and/or doubting their decision before an outcome even occurs regardless of which act they choose. Right?

I thought of the idea that maybe the human decision maker has multiple utility functions that when you try to combine them into one function some parts of the original functions don't necessarily translate well. Like some sort of discontinuity that corresponds to "incomparable" outcomes, or something. Granted, it's been a while since I've taken Calculus, so I'm not really sure how that would look on a graph.
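One way to make this concrete (an assumption on my part, and not the only possible formalization) is to give each outcome a tuple of utilities, one per utility function, and to prefer an outcome only when every function agrees. The outcomes and numbers below are invented for illustration:

```python
def compare(a, b):
    """Compare two outcomes, each a tuple with one utility per utility function."""
    a_at_least_b = all(x >= y for x, y in zip(a, b))
    b_at_least_a = all(y >= x for x, y in zip(a, b))
    if a_at_least_b and b_at_least_a:
        return "indifferent"
    if a_at_least_b:
        return "prefer a"
    if b_at_least_a:
        return "prefer b"
    return "incomparable"

# Hypothetical outcomes scored by two utility functions
career = (10.0, 2.0)
family = (3.0, 9.0)
sweetened_career = (10.01, 2.0)  # a tiny amount of extra value on one axis

print(compare(career, family))
print(compare(sweetened_career, family))
print(compare((1.0, 1.0), (1.0, 1.0)))
print(compare((1.01, 1.0), (1.0, 1.0)))
```

Under this rule a small sweetener breaks true indifference but leaves "incomparable" outcomes incomparable, which matches the behavior described above.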

I had read Yudkowsky's "Thou Art Godshatter" a couple months ago, and there was a point where it said "one pure utility function splintered into a thousand shards of desire". That sounds like the "shards of desire" are actually a bunch of different utility functions.

I'd like to know what others think of this idea. Strengths? Weaknesses? Implications?

Superintelligence 14: Motivation selection methods

5 KatjaGrace 16 December 2014 02:00AM

This is part of a weekly reading group on Nick Bostrom's book, Superintelligence. For more information about the group, and an index of posts so far see the announcement post. For the schedule of future topics, see MIRI's reading guide.

Welcome. This week we discuss the fourteenth section in the reading guide: Motivation selection methods. This corresponds to the second part of Chapter Nine.

This post summarizes the section, and offers a few relevant notes, and ideas for further investigation. Some of my own thoughts and questions for discussion are in the comments.

There is no need to proceed in order through this post, or to look at everything. Feel free to jump straight to the discussion. Where applicable and I remember, page numbers indicate the rough part of the chapter that is most related (not necessarily that the chapter is being cited for the specific claim).

Reading: “Motivation selection methods” and “Synopsis” from Chapter 9.


  1. One way to control an AI is to design its motives. That is, to choose what it wants to do (p138)
  2. Some varieties of 'motivation selection' for AI safety:
    1. Direct specification: figure out what we value, and code it into the AI (p139-40)
      1. Isaac Asimov's 'three laws of robotics' are a famous example
      2. Direct specification might be fairly hard: both figuring out what we want and coding it precisely seem hard
      3. This could be based on rules, or something like consequentialism
    2. Domesticity: the AI's goals limit the range of things it wants to interfere with (140-1)
      1. This might make direct specification easier, as the world the AI interacts with (and thus which has to be thought of in specifying its behavior) is simpler.
      2. Oracles are an example
      3. This might be combined well with physical containment: the AI could be trapped, and also not want to escape.
    3. Indirect normativity: instead of specifying what we value, specify a way to specify what we value (141-2)
      1. e.g. extrapolate our volition
      2. This means outsourcing the hard intellectual work to the AI
      3. This will mostly be discussed in chapter 13 (weeks 23-5 here)
    4. Augmentation: begin with a creature with desirable motives, then make it smarter, instead of designing good motives from scratch. (p142)
      1. e.g. brain emulations are likely to have human desires (at least at the start)
      2. Whether we use this method depends on the kind of AI that is developed, so usually we won't have a choice about whether to use it (except inasmuch as we have a choice about e.g. whether to develop uploads or synthetic AI first).
  3. Bostrom provides a summary of the chapter:
  4. The question is not which control method is best, but rather which set of control methods are best given the situation. (143-4)

Another view


Would you say there's any ethical issue involved with imposing limits or constraints on a superintelligence's drives/motivations? By analogy, I think most of us have the moral intuition that technologically interfering with an unborn human's inherent desires and motivations would be questionable or wrong, supposing that were even possible. That is, say we could genetically modify a subset of humanity to be cheerful slaves; that seems like a pretty morally unsavory prospect. What makes engineering a superintelligence specifically to serve humanity less unsavory?


1. Bostrom tells us that it is very hard to specify human values. We have seen examples of galaxies full of paperclips or fake smiles resulting from poor specification. But these - and Isaac Asimov's stories - seem to tell us only that a few people spending a small fraction of their time thinking does not produce any watertight specification. What if a thousand researchers spent a decade on it? Are the millionth most obvious attempts at specification nearly as bad as the most obvious twenty? How hard is it? A general argument for pessimism is the thesis that 'value is fragile', i.e. that if you specify what you want very nearly but get it a tiny bit wrong, it's likely to be almost worthless. Much like if you get one digit wrong in a phone number. The degree to which this is so (with respect to value, not phone numbers) is controversial. I encourage you to try to specify a world you would be happy with (to see how hard it is, or produce something of value if it isn't that hard).

2. If you'd like a taste of indirect normativity before the chapter on it, the LessWrong wiki page on coherent extrapolated volition links to a bunch of sources.

3. The idea of 'indirect normativity' (i.e. outsourcing the problem of specifying what an AI should do, by giving it some good instructions for figuring out what you value) brings up the general question of just what an AI needs to be given to be able to figure out how to carry out our will. An obvious contender is a lot of information about human values. Though some people disagree with this - these people don't buy the orthogonality thesis. Other issues sometimes suggested to need working out ahead of outsourcing everything to AIs include decision theory, priors, anthropics, feelings about pascal's mugging, and attitudes to infinity. MIRI's technical work often fits into this category.

4. Danaher's last post on Superintelligence (so far) is on motivation selection. It mostly summarizes and clarifies the chapter, so is mostly good if you'd like to think about the question some more with a slightly different framing. He also previously considered the difficulty of specifying human values in The golem genie and unfriendly AI (parts one and two), which is about Intelligence Explosion and Machine Ethics.

5. Brian Clegg thinks Bostrom should have discussed Asimov's stories at greater length:

I think it’s a shame that Bostrom doesn’t make more use of science fiction to give examples of how people have already thought about these issues – he gives only half a page to Asimov and the three laws of robotics (and how Asimov then spends most of his time showing how they’d go wrong), but that’s about it. Yet there has been a lot of thought and dare I say it, a lot more readability than you typically get in a textbook, put into the issues in science fiction than is being allowed for, and it would have been worthy of a chapter in its own right.

If you haven't already, you might consider (sort-of) following his advice, and reading some science fiction.

In-depth investigations

If you are particularly interested in these topics, and want to do further research, these are a few plausible directions, some inspired by Luke Muehlhauser's list, which contains many suggestions related to parts of Superintelligence. These projects could be attempted at various levels of depth.

  1. Can you think of novel methods of specifying the values of one or many humans?
  2. What are the most promising methods for 'domesticating' an AI? (i.e. constraining it to only care about a small part of the world, and not want to interfere with the larger world to optimize that smaller part).
  3. Think more carefully about the likely motivations of drastically augmenting brain emulations.

If you are interested in anything like this, you might want to mention it in the comments, and see whether other people have useful thoughts.

How to proceed

This has been a collection of notes on the chapter.  The most important part of the reading group though is discussion, which is in the comments section. I pose some questions for you there, and I invite you to add your own. Please remember that this group contains a variety of levels of expertise: if a line of discussion seems too basic or too incomprehensible, look around for one that suits you better!

Next week, we will start to talk about a variety of more and less agent-like AIs: 'oracles', 'genies' and 'sovereigns'. To prepare, read “Oracles” and “Genies and Sovereigns” from Chapter 10. The discussion will go live at 6pm Pacific time next Monday 22nd December. Sign up to be notified here.

How many people am I?

4 Manfred 15 December 2014 06:11PM

Strongly related: the Ebborians

Imagine mapping my brain into two interpenetrating networks. For each brain cell, half of it goes to one map and half to the other. For each connection between cells, half of each connection goes to one map and half to the other. We can call these two mapped out halves Manfred One and Manfred Two. Because neurons are classical, as I think, both of these maps change together. They contain the full pattern of my thoughts. (This situation is even more clear in the Ebborians, who can literally split down the middle.)

So how many people am I? Are Manfred One and Manfred Two both people? Of course, once we have two, why stop there - are there thousands of Manfreds in here, with "me" as only one of them? Put like that it sounds a little overwrought - what's really going on here is the question of what physical system corresponds to "I" in English statements like "I wake up." This may matter.

The impact on anthropic probabilities is somewhat straightforward. With everyday definitions of "I wake up," I wake up just once per day no matter how big my head is. But if the "I" in that sentence is some constant-size physical pattern, then "I wake up" is an event that happens more times if my head is bigger. And so using the variable people-number definition, I expect to wake up with a gigantic head.
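A toy calculation of the two definitions, assuming just two equally likely possibilities (a small head and a big one) and, for the variable people-number definition, that the number of "I"s scales with head size. The sizes are arbitrary units chosen for illustration:

```python
from fractions import Fraction

def p_big_head(prior_big, size_small, size_big, count_by_size):
    """Probability that "I wake up" happens in the big head.

    count_by_size=False: one "I" per head, whatever its size.
    count_by_size=True:  the number of "I"s is proportional to head size.
    """
    weight_small = (1 - prior_big) * (size_small if count_by_size else 1)
    weight_big = prior_big * (size_big if count_by_size else 1)
    return weight_big / (weight_small + weight_big)

everyday = p_big_head(Fraction(1, 2), 1, 10, count_by_size=False)
by_size = p_big_head(Fraction(1, 2), 1, 10, count_by_size=True)
print(everyday, by_size)
```

With a head ten times larger, the everyday definition leaves the probability at 1/2, while the constant-size-pattern definition pushes it to 10/11.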

The impact on decisions is less big. If I'm in this head with a bunch of other Manfreds, we're all on the same page - it's a non-anthropic problem of coordinated decision-making. For example, if I were to make any monetary bets about my head size, and then donate profits to charity, no matter what definition I'm using, I should bet as if my head size didn't affect anthropic probabilities. So to some extent the real point of this effect is that it is a way anthropic probabilities can be ill-defined. On the other hand, what about preferences that depend directly on person-numbers like how to value people with different head sizes? Or for vegetarians, should we care more about cows than chickens, because each cow is more animals than a chicken is?


According to my common sense, it seems like my body has just one person in it. Why does my common sense think that? I think there are two answers, one unhelpful and one helpful.

The first answer is evolution. Having kids is an action that's independent of what physical system we identify with "I," and so my ancestors never found modeling their bodies as being multiple people useful.

The second answer is causality. Manfred One and Manfred Two are causally distinct from two copies of me in separate bodies but with the same input/output. If a difference between the two separated copies arose somehow (reminiscent of Dennett's factual account), henceforth the two bodies would do and say different things and have different brain states. But if some difference arises between Manfred One and Manfred Two, it is erased by diffusion.

Which is to say, the map that is Manfred One is statically the same pattern as my whole brain, but it's causally different. So is "I" the pattern, or is "I" the causal system? 

In this sort of situation I am happy to stick with common sense, and thus when I say me, I think the causal system is referring to the causal system. But I'm not very sure.


Going back to the Ebborians, one interesting thing about that post is the conflict between common sense and common sense - it seems like common sense that each Ebborian is equally much one person, but it also seems like common sense that if you looked at an Ebborian dividing, there doesn't seem to be a moment where the amount of subjective experience should change, and so amount of subjective experience should be proportional to thickness. But as it is said, just because there are two opposing ideas doesn't mean one of them is right.

On the questions of subjective experience raised in that post, I think this mostly gets cleared up by precise description and anthropic narrowness. I'm unsure of the relative sizes of this margin and the proof, but the sketch is to replace a mysterious "subjective experience" that spans copies with individual experiences of people who are using a TDT-like theory to choose so that they individually achieve good outcomes given their existence.

Has LessWrong Ever Backfired On You?

24 Evan_Gaensbauer 15 December 2014 05:44AM

Several weeks ago I wrote a heavily upvoted post called Don't Be Afraid of Asking Personally Important Questions on LessWrong. I thought it would only be due diligence if I tried to track users on LessWrong who have received advice on this site and it's backfired. In other words, to avoid bias in the record, we might notice what LessWrong as a community is bad at giving advice about. So, I'm seeking feedback. If you have anecdotes or data of how a plan or advice directly from LessWrong backfired, failed, or didn't lead to satisfaction, please share below. 

Group Rationality Diary, December 16-31

3 therufs 15 December 2014 03:30AM

This is the public group rationality diary for December 16-31.

It's a place to record and chat about it if you have done, or are actively doing, things like: 

  • Established a useful new habit
  • Obtained new evidence that made you change your mind about some belief
  • Decided to behave in a different way in some set of situations
  • Optimized some part of a common routine or cached behavior
  • Consciously changed your emotions or affect with respect to something
  • Consciously pursued new valuable information about something that could make a big difference in your life
  • Learned something new about your beliefs, behavior, or life that surprised you
  • Tried doing any of the above and failed

Or anything else interesting which you want to share, so that other people can think about it, and perhaps be inspired to take action themselves. Try to include enough details so that everyone can use each other's experiences to learn about what tends to work out, and what doesn't tend to work out.

Thanks to cata for starting the Group Rationality Diary posts, and to commenters for participating.

Previous diary: December 1-15

Rationality diaries archive

Open thread, Dec. 15 - Dec. 21, 2014

2 Gondolinian 15 December 2014 12:01AM

If it's worth saying, but not worth its own post (even in Discussion), then it goes here.

Previous Open Thread

Notes for future OT posters:

1. Please add the 'open_thread' tag.

2. Check if there is an active Open Thread before posting a new one. (Immediately before; refresh the list-of-threads page before posting.)

3. Open Threads should be posted in Discussion, and not Main.

4. Open Threads should start on Monday, and end on Sunday.

Podcast: Rationalists in Tech

12 JoshuaFox 14 December 2014 04:14PM
I'll appreciate feedback on a new podcast, Rationalists in Tech. 

I'm interviewing founders, executives, CEOs, consultants, and other people in the tech sector, mostly software. Thanks to Laurent Bossavit, Daniel Reeves, and Alexei Andreev who agreed to be the guinea pigs for this experiment. 
  • The audience:
Software engineers and other tech workers, at all levels of seniority.
  • The hypothesized need
Some of you are thinking: "I see that some smart and fun people hang out at LessWrong. It's hard to find people like that to work with. I wonder if my next job/employee/cofounder could come from that community."
  • What this podcast does for you
You will get insights into other LessWrongers as real people in the software profession. (OK, you knew that, but this helps.) You will hear the interviewees' ideas on CfAR-style techniques as a productivity booster, on working with other aspiring rationalists, and on the interviewees' own special areas of expertise. (At the same time, interviewees benefit from exposure that can get them business contacts, employees, or customers.) Software engineers from LW will reach out to interviewees and others in the tech sector, and soon, more hires and startups will emerge.

Please give your feedback on the first episodes of the podcast. Do you want to hear more? Should there be other topics? A different interview style? Better music?

Discussion of AI control over at worldbuilding.stackexchange [LINK]

6 ike 14 December 2014 02:59AM


Go insert some rationality into the discussion! (There are actually some pretty good comments in there, and some links to the right places, including LW).

[Resolved] Is the SIA doomsday argument wrong?

5 Brian_Tomasik 13 December 2014 06:01AM

[EDIT: I think the SIA doomsday argument works after all, and my objection to it was based on framing the problem in a misguided way. Feel free to ignore this post or skip to the resolution at the end.]


Katja Grace has developed a kind of doomsday argument from SIA combined with the Great Filter. It has been discussed by Robin Hanson, Carl Shulman, and Nick Bostrom. The basic idea is that if the filter comes late, there are more civilizations with organisms like us than if the filter comes early, and more organisms in positions like ours means a higher expected number of (non-fake) experiences that match ours. (I'll ignore simulation-argument possibilities in this post.)

I used to agree with this reasoning. But now I'm not sure, and here's why. Your subjective experience, broadly construed, includes knowledge of a lot of Earth's history and current state, including when life evolved, which creatures evolved, the Earth's mass and distance from the sun, the chemical composition of the soil and atmosphere, and so on. The information that you know about your planet is sufficient to uniquely locate you within the observable universe. Sure, there might be exact copies of you in vastly distant Hubble volumes, and there might be many approximate copies of Earth in somewhat nearer Hubble volumes. But within any reasonable radius, probably what you know about Earth requires that your subjective experiences (if veridical) could only take place on Earth, not on any other planet in our Hubble volume.

If so, then whether there are lots of human-level extraterrestrials (ETs) or none doesn't matter anthropically, because none of those ETs within any reasonable radius could contain your exact experiences. No matter how hard or easy the emergence of human-like life is in general, it can happen on Earth, and your subjective experiences can only exist on Earth (or some planet almost identical to Earth).

A better way to think about SIA is that it favors hypotheses containing more copies of our Hubble volume within the larger universe. Within a given Hubble volume, there can be at most one location where organisms veridically perceive what we perceive.

Katja's blog post on the SIA doomsday draws orange boxes with humans waving their hands. She has us update on knowing we're in the human-level stage, i.e., that we're one of those orange boxes. But we know much more: We know that we're a particular one of those boxes, which is easily distinguished from the others based on what we observe about the world. So any hypothesis that contains us at all will have the same number of boxes containing us (namely, just one box). Hence, no anthropic update.

Am I missing something? :)



The problem with my argument was that I compared the hypothesis "filter is early and you exist on Earth" against "filter is late and you exist on Earth". If the hypotheses already say that you exist on Earth, then there's no more anthropic work to be done. But the heart of the anthropic question is whether an early or late filter predicts that you exist on Earth at all.

Here's an oversimplified example. Suppose that the hypothesis of "early filter" tells us that there are four planets, exactly one of which contains life. "Late filter" says there are four planets, all of which contain life. Suppose for convenience that if life exists on Earth at all, you will exist on Earth. Then P(you exist | early filter) = 1/4 while P(you exist | late filter) = 1. This is where the doomsday update comes from.
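The size of that update can be sketched with a short Bayes calculation. The two likelihoods are the ones from the example above; the equal prior on "early filter" and "late filter" is an added assumption for illustration, and the function name is hypothetical.

```python
from fractions import Fraction

def sia_posterior(prior, likelihood):
    """Bayes update: posterior is proportional to prior * P(you exist | hypothesis)."""
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnormalized.values())
    return {h: w / total for h, w in unnormalized.items()}

# Assumption (not in the post): equal prior credence in both hypotheses.
prior = {"early": Fraction(1, 2), "late": Fraction(1, 2)}

# From the example: four planets, with life on exactly one (early filter)
# or on all four (late filter).
likelihood = {"early": Fraction(1, 4), "late": Fraction(1, 1)}

posterior = sia_posterior(prior, likelihood)
print(posterior["early"], posterior["late"])
```

With an even prior, the credence in a late filter rises from 1/2 to 4/5, and a different prior would shift the result but not the direction of the update.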

A forum for researchers to publicly discuss safety issues in advanced AI

11 RobbBB 13 December 2014 12:33AM

MIRI has an organizational goal of putting a wider variety of mathematically proficient people in a position to advance our understanding of beneficial smarter-than-human AI. The MIRIx workshops, our new research guide, and our more detailed in-the-works technical agenda are intended to further that goal.

To encourage the growth of a larger research community where people can easily collaborate and get up to speed on each other's new ideas, we're also going to roll out an online discussion forum that's specifically focused on resolving technical problems in Friendly AI. MIRI researchers and other interested parties will be able to have more open exchanges there, and get rapid feedback on their ideas and drafts. A relatively small group of people with relevant mathematical backgrounds will be authorized to post on the forum, but all discussion on the site will be publicly visible to visitors.

Topics will run the gamut from logical uncertainty in formal agents to cognitive models of concept generation. The exact range of discussion topics is likely to evolve over time as researchers' priorities change and new researchers join the forum.

We're currently tossing around possible names for the forum, and I wanted to solicit LessWrong's input, since you've been helpful here in the past. (We're also getting input from non-LW mathematicians and computer scientists.) We want to know how confusing, apt, etc. you perceive these variants on 'forum for doing exploratory engineering research in AI' to be:

1. AI Exploratory Research Forum (AIXRF)

2. Forum for Exploratory Engineering in AI (FEEAI)

3. Forum for Exploratory Research in AI (FERAI, or FXRAI)

4. Exploratory AI Research Forum (XAIRF, or EAIRF)

We're also looking at other name possibilities, including:

5. AI Foundations Forum (AIFF)

6. Intelligent Agent Foundations Forum (IAFF)

7. Reflective Agents Research Forum (RARF)

We're trying to avoid names like "friendly" and "normative" that could reinforce someone's impression that we think of AI risk in anthropomorphic terms, that we're AI-hating technophobes, or that we're moral philosophers.

Feedback on the above ideas is welcome, as are new ideas. Feel free to post separate ideas in separate comments, so they can be upvoted individually. We're especially looking for feedback along the lines of: 'I'm a grad student in theoretical computer science and I feel that the name [X] would look bad in a comp sci bibliography or C.V.' or 'I'm friends with a lot of topologists, and I'm pretty sure they'd find the name [Y] unobjectionable and mildly intriguing; I don't know how well that generalizes to mathematical logicians.'

Approval-directed agents

8 paulfchristiano 12 December 2014 10:38PM

Most concern about AI comes down to the scariness of goal-oriented behavior. A common response to such concerns is “why would we give an AI goals anyway?” I think there are good reasons to expect goal-oriented behavior, and I’ve been on that side of a lot of arguments. But I don’t think the issue is settled, and it might be possible to get better outcomes without them. I flesh out one possible alternative here, based on the dictum "take the action I would like best" rather than "achieve the outcome I would like best."

(As an experiment I wrote the post on medium, so that it is easier to provide sentence-level feedback, especially feedback on writing or low-level comments.)

Harper's Magazine article on LW/MIRI/CFAR and Ethereum

37 gwern 12 December 2014 08:34PM

Cover title: “Power and paranoia in Silicon Valley”; article title: “Come with us if you want to live: Among the apocalyptic libertarians of Silicon Valley” (mirrors: 1, 2, 3), by Sam Frank; Harper’s Magazine, January 2015, pg26-36 (~8500 words). The beginning/ending are focused on Ethereum and Vitalik Buterin, so I'll excerpt the LW/MIRI/CFAR-focused middle:

…Blake Masters - the name was too perfect - had, obviously, dedicated himself to the command of self and universe. He did CrossFit and ate Bulletproof, a tech-world variant of the paleo diet. On his Tumblr’s About page, since rewritten, the anti-belief belief systems multiplied, hyperlinked to Wikipedia pages or to the confoundingly scholastic website Less Wrong: “Libertarian (and not convinced there’s irreconcilable fissure between deontological and consequentialist camps). Aspiring rationalist/Bayesian. Secularist/agnostic/ignostic . . . Hayekian. As important as what we know is what we don’t. Admittedly eccentric.” Then: “Really, really excited to be in Silicon Valley right now, working on fascinating stuff with an amazing team.” I was startled that all these negative ideologies could be condensed so easily into a positive worldview. …I saw the utopianism latent in capitalism - that, as Bernard Mandeville had it three centuries ago, it is a system that manufactures public benefit from private vice. I started CrossFit and began tinkering with my diet. I browsed venal tech-trade publications, and tried and failed to read Less Wrong, which was written as if for aliens.

…I left the auditorium of Alice Tully Hall. Bleary beside the silver coffee urn in the nearly empty lobby, I was buttonholed by a man whose name tag read MICHAEL VASSAR, METAMED research. He wore a black-and-white paisley shirt and a jacket that was slightly too big for him. “What did you think of that talk?” he asked, without introducing himself. “Disorganized, wasn’t it?” A theory of everything followed. Heroes like Elon and Peter (did I have to ask? Musk and Thiel). The relative abilities of physicists and biologists, their standard deviations calculated out loud. How exactly Vassar would save the world. His left eyelid twitched, his full face winced with effort as he told me about his “personal war against the universe.” My brain hurt. I backed away and headed home. But Vassar had spoken like no one I had ever met, and after Kurzweil’s keynote the next morning, I sought him out. He continued as if uninterrupted. Among the acolytes of eternal life, Vassar was an eschatologist. “There are all of these different countdowns going on,” he said. “There’s the countdown to the broad postmodern memeplex undermining our civilization and causing everything to break down, there’s the countdown to the broad modernist memeplex destroying our environment or killing everyone in a nuclear war, and there’s the countdown to the modernist civilization learning to critique itself fully and creating an artificial intelligence that it can’t control. There are so many different - on different time-scales - ways in which the self-modifying intelligent processes that we are embedded in undermine themselves. I’m trying to figure out ways of disentangling all of that. . . .I’m not sure that what I’m trying to do is as hard as founding the Roman Empire or the Catholic Church or something. But it’s harder than people’s normal big-picture ambitions, like making a billion dollars.” Vassar was thirty-four, one year older than I was. 
He had gone to college at seventeen, and had worked as an actuary, as a teacher, in nanotech, and in the Peace Corps. He’d founded a music-licensing start-up called Sir Groovy. Early in 2012, he had stepped down as president of the Singularity Institute for Artificial Intelligence, now called the Machine Intelligence Research Institute (MIRI), which was created by an autodidact named Eliezer Yudkowsky, who also started Less Wrong. Vassar had left to found MetaMed, a personalized-medicine company, with Jaan Tallinn of Skype and Kazaa, $500,000 from Peter Thiel, and a staff that included young rationalists who had cut their teeth arguing on Yudkowsky’s website. The idea behind MetaMed was to apply rationality to medicine - “rationality” here defined as the ability to properly research, weight, and synthesize the flawed medical information that exists in the world. Prices ranged from $25,000 for a literature review to a few hundred thousand for a personalized study. “We can save lots and lots and lots of lives,” Vassar said (if mostly moneyed ones at first). “But it’s the signal - it’s the ‘Hey! Reason works!’ - that matters. . . . It’s not really about medicine.” Our whole society was sick - root, branch, and memeplex - and rationality was the only cure. …I asked Vassar about his friend Yudkowsky. “He has worse aesthetics than I do,” he replied, “and is actually incomprehensibly smart.” We agreed to stay in touch.

One month later, I boarded a plane to San Francisco. I had spent the interim taking a second look at Less Wrong, trying to parse its lore and jargon: “scope insensitivity,” “ugh field,” “affective death spiral,” “typical mind fallacy,” “counterfactual mugging,” “Roko’s basilisk.” When I arrived at the MIRI offices in Berkeley, young men were sprawled on beanbags, surrounded by whiteboards half black with equations. I had come costumed in a Fermat’s Last Theorem T-shirt, a summary of the proof on the front and a bibliography on the back, printed for the number-theory camp I had attended at fifteen. Yudkowsky arrived late. He led me to an empty office where we sat down in mismatched chairs. He wore glasses, had a short, dark beard, and his heavy body seemed slightly alien to him. I asked what he was working on. “Should I assume that your shirt is an accurate reflection of your abilities,” he asked, “and start blabbing math at you?” Eight minutes of probability and game theory followed. Cogitating before me, he kept grimacing as if not quite in control of his face. “In the very long run, obviously, you want to solve all the problems associated with having a stable, self-improving, beneficial-slash-benevolent AI, and then you want to build one.” What happens if an artificial intelligence begins improving itself, changing its own source code, until it rapidly becomes - foom! is Yudkowsky’s preferred expression - orders of magnitude more intelligent than we are? A canonical thought experiment devised by Oxford philosopher Nick Bostrom in 2003 suggests that even a mundane, industrial sort of AI might kill us. Bostrom posited a “superintelligence whose top goal is the manufacturing of paper-clips.” For this AI, known fondly on Less Wrong as Clippy, self-improvement might entail rearranging the atoms in our bodies, and then in the universe - and so we, and everything else, end up as office supplies. 
Nothing so misanthropic as Skynet is required, only indifference to humanity. What is urgently needed, then, claims Yudkowsky, is an AI that shares our values and goals. This, in turn, requires a cadre of highly rational mathematicians, philosophers, and programmers to solve the problem of “friendly” AI - and, incidentally, the problem of a universal human ethics - before an indifferent, unfriendly AI escapes into the wild.

Among those who study artificial intelligence, there’s no consensus on either point: that an intelligence explosion is possible (rather than, for instance, a proliferation of weaker, more limited forms of AI) or that a heroic team of rationalists is the best defense in the event. That MIRI has as much support as it does (in 2012, the institute’s annual revenue broke $1 million for the first time) is a testament to Yudkowsky’s rhetorical ability as much as to any technical skill. Over the course of a decade, his writing, along with that of Bostrom and a handful of others, has impressed the dangers of unfriendly AI on a growing number of people in the tech world and beyond. In August, after reading Superintelligence, Bostrom’s new book, Elon Musk tweeted, “Hope we’re not just the biological boot loader for digital superintelligence. Unfortunately, that is increasingly probable.” In 2000, when Yudkowsky was twenty, he founded the Singularity Institute with the support of a few people he’d met at the Foresight Institute, a Palo Alto nanotech think tank. He had already written papers on “The Plan to Singularity” and “Coding a Transhuman AI,” and posted an autobiography on his website, since removed, called “Eliezer, the Person.” It recounted a breakdown of will when he was eleven and a half: “I can’t do anything. That’s the phrase I used then.” He dropped out before high school and taught himself a mess of evolutionary psychology and cognitive science. He began to “neuro-hack” himself, systematizing his introspection to evade his cognitive quirks. Yudkowsky believed he could hasten the singularity by twenty years, creating a superhuman intelligence and saving humankind in the process. He met Thiel at a Foresight Institute dinner in 2005 and invited him to speak at the first annual Singularity Summit. The institute’s paid staff grew. 
In 2006, Yudkowsky began writing a hydra-headed series of blog posts: science-fictionish parables, thought experiments, and explainers encompassing cognitive biases, self-improvement, and many-worlds quantum mechanics that funneled lay readers into his theory of friendly AI. Rationality workshops and Meetups began soon after. In 2009, the blog posts became what he called Sequences on a new website: Less Wrong. The next year, Yudkowsky began publishing Harry Potter and the Methods of Rationality at fanfiction.net. The Harry Potter category is the site’s most popular, with almost 700,000 stories; of these, HPMoR is the most reviewed and the second-most favorited. The last comment that the programmer and activist Aaron Swartz left on Reddit before his suicide in 2013 was on /r/hpmor. In Yudkowsky’s telling, Harry is not only a magician but also a scientist, and he needs just one school year to accomplish what takes canon-Harry seven. HPMoR is serialized in arcs, like a TV show, and runs to a few thousand pages when printed; the book is still unfinished. Yudkowsky and I were talking about literature, and Swartz, when a college student wandered in. Would Eliezer sign his copy of HPMoR? “But you have to, like, write something,” he said. “You have to write, ‘I am who I am.’ So, ‘I am who I am’ and then sign it.” “Alrighty,” Yudkowsky said, signed, continued. “Have you actually read Methods of Rationality at all?” he asked me. “I take it not.” (I’d been found out.) “I don’t know what sort of a deadline you’re on, but you might consider taking a look at that.” (I had taken a look, and hated the little I’d managed.) “It has a legendary nerd-sniping effect on some people, so be warned. That is, it causes you to read it for sixty hours straight.”

The nerd-sniping effect is real enough. Of the 1,636 people who responded to a 2013 survey of Less Wrong’s readers, one quarter had found the site thanks to HPMoR, and many more had read the book. Their average age was 27.4, their average IQ 138.2. Men made up 88.8% of respondents; 78.7% were straight, 1.5% transgender, 54.7% American, 89.3% atheist or agnostic. The catastrophes they thought most likely to wipe out at least 90% of humanity before the year 2100 were, in descending order, pandemic (bioengineered), environmental collapse, unfriendly AI, nuclear war, pandemic (natural), economic/political collapse, asteroid, nanotech/gray goo. Forty-two people, 2.6%, called themselves futarchists, after an idea from Robin Hanson, an economist and Yudkowsky’s former coblogger, for reengineering democracy into a set of prediction markets in which speculators can bet on the best policies. Forty people called themselves reactionaries, a grab bag of former libertarians, ethno-nationalists, Social Darwinists, scientific racists, patriarchists, pickup artists, and atavistic “traditionalists,” who Internet-argue about antidemocratic futures, plumping variously for fascism or monarchism or corporatism or rule by an all-powerful, gold-seeking alien named Fnargl who will free the markets and stabilize everything else. At the bottom of each year’s list are suggestive statistical irrelevancies: “every optimizing system’s a dictator and i’m not sure which one i want in charge,” “Autocracy (important: myself as autocrat),” “Bayesian (aspiring) Rationalist. Technocratic. Human-centric Extropian Coherent Extrapolated Volition.” “Bayesian” refers to Bayes’s Theorem, a mathematical formula that describes uncertainty in probabilistic terms, telling you how much to update your beliefs when given new information. 
This is a formalization and calibration of the way we operate naturally, but “Bayesian” has a special status in the rationalist community because it’s the least imperfect way to think. “Extropy,” the antonym of “entropy,” is a decades-old doctrine of continuous human improvement, and “coherent extrapolated volition” is one of Yudkowsky’s pet concepts for friendly artificial intelligence. Rather than our having to solve moral philosophy in order to arrive at a complete human goal structure, C.E.V. would computationally simulate eons of moral progress, like some kind of Whiggish Pangloss machine. As Yudkowsky wrote in 2004, “In poetic terms, our coherent extrapolated volition is our wish if we knew more, thought faster, were more the people we wished we were, had grown up farther together.” Yet can even a single human’s volition cohere or compute in this way, let alone humanity’s? We stood up to leave the room. Yudkowsky stopped me and said I might want to turn my recorder on again; he had a final thought. “We’re part of the continuation of the Enlightenment, the Old Enlightenment. This is the New Enlightenment,” he said. “Old project’s finished. We actually have science now, now we have the next part of the Enlightenment project.”

In 2013, the Singularity Institute changed its name to the Machine Intelligence Research Institute. Whereas MIRI aims to ensure human-friendly artificial intelligence, an associated program, the Center for Applied Rationality, helps humans optimize their own minds, in accordance with Bayes’s Theorem. The day after I met Yudkowsky, I returned to Berkeley for one of CFAR’s long-weekend workshops. The color scheme at the Rose Garden Inn was red and green, and everything was brocaded. The attendees were mostly in their twenties: mathematicians, software engineers, quants, a scientist studying soot, employees of Google and Facebook, an eighteen-year-old Thiel Fellow who’d been paid $100,000 to leave Boston College and start a company, professional atheists, a Mormon turned atheist, an atheist turned Catholic, an Objectivist who was photographed at the premiere of Atlas Shrugged II: The Strike. There were about three men for every woman. At the Friday-night meet and greet, I talked with Benja, a German who was studying math and behavioral biology at the University of Bristol, whom I had spotted at MIRI the day before. He was in his early thirties and quite tall, with bad posture and a ponytail past his shoulders. He wore socks with sandals, and worried a paper cup as we talked. Benja had felt death was terrible since he was a small child, and wanted his aging parents to sign up for cryonics, if he could figure out how to pay for it on a grad-student stipend. He was unsure about the risks from unfriendly AI - “There is a part of my brain,” he said, “that sort of goes, like, ‘This is crazy talk; that’s not going to happen’” - but the probabilities had persuaded him. He said there was only about a 30% chance that we could make it another century without an intelligence explosion. He was at CFAR to stop procrastinating. Julia Galef, CFAR’s president and cofounder, began a session on Saturday morning with the first of many brain-as-computer metaphors. 
We are “running rationality on human hardware,” she said, not supercomputers, so the goal was to become incrementally more self-reflective and Bayesian: not perfectly rational agents, but “agent-y.” The workshop’s classes lasted six or so hours a day; activities and conversations went well into the night. We got a condensed treatment of contemporary neuroscience that focused on hacking our brains’ various systems and modules, and attended sessions on habit training, urge propagation, and delegating to future selves. We heard a lot about Daniel Kahneman, the Nobel Prize-winning psychologist whose work on cognitive heuristics and biases demonstrated many of the ways we are irrational. Geoff Anders, the founder of Leverage Research, a “meta-level nonprofit” funded by Thiel, taught a class on goal factoring, a process of introspection that, after many tens of hours, maps out every one of your goals down to root-level motivations - the unchangeable “intrinsic goods,” around which you can rebuild your life. Goal factoring is an application of Connection Theory, Anders’s model of human psychology, which he developed as a Rutgers philosophy student disserting on Descartes, and Connection Theory is just the start of a universal renovation. Leverage Research has a master plan that, in the most recent public version, consists of nearly 300 steps. It begins from first principles and scales up from there: “Initiate a philosophical investigation of philosophical method”; “Discover a sufficiently good philosophical method”; have 2,000-plus “actively and stably benevolent people successfully seek enough power to be able to stably guide the world”; “People achieve their ultimate goals as far as possible without harming others”; “We have an optimal world”; “Done.” On Saturday night, Anders left the Rose Garden Inn early to supervise a polyphasic-sleep experiment that some Leverage staff members were conducting on themselves. 
It was a schedule called the Everyman 3, which compresses sleep into three twenty-minute REM naps each day and three hours at night for slow-wave. Anders was already polyphasic himself. Operating by the lights of his own best practices, goal-factored, coherent, and connected, he was able to work 105 hours a week on world optimization. For the rest of us, for me, these were distant aspirations. We were nerdy and unperfected. There was intense discussion at every free moment, and a genuine interest in new ideas, if especially in testable, verifiable ones. There was joy in meeting peers after years of isolation. CFAR was also insular, overhygienic, and witheringly focused on productivity. Almost everyone found politics to be tribal and viscerally upsetting. Discussions quickly turned back to philosophy and math. By Monday afternoon, things were wrapping up. Andrew Critch, a CFAR cofounder, gave a final speech in the lounge: “Remember how you got started on this path. Think about what was the time for you when you first asked yourself, ‘How do I work?’ and ‘How do I want to work?’ and ‘What can I do about that?’ . . . Think about how many people throughout history could have had that moment and not been able to do anything about it because they didn’t know the stuff we do now. I find this very upsetting to think about. It could have been really hard. A lot harder.” He was crying. “I kind of want to be grateful that we’re now, and we can share this knowledge and stand on the shoulders of giants like Daniel Kahneman . . . I just want to be grateful for that. . . . And because of those giants, the kinds of conversations we can have here now, with, like, psychology and, like, algorithms in the same paragraph, to me it feels like a new frontier. . . . Be explorers; take advantage of this vast new landscape that’s been opened up to us in this time and this place; and bear the torch of applied rationality like brave explorers. 
And then, like, keep in touch by email.” The workshop attendees put giant Post-its on the walls expressing the lessons they hoped to take with them. A blue one read RATIONALITY IS SYSTEMATIZED WINNING. Above it, in pink: THERE ARE OTHER PEOPLE WHO THINK LIKE ME. I AM NOT ALONE.

That night, there was a party. Alumni were invited. Networking was encouraged. Post-its proliferated; one, by the beer cooler, read SLIGHTLY ADDICTIVE. SLIGHTLY MIND-ALTERING. Another, a few feet to the right, over a double stack of bound copies of Harry Potter and the Methods of Rationality: VERY ADDICTIVE. VERY MIND-ALTERING. I talked to one of my roommates, a Google scientist who worked on neural nets. The CFAR workshop was just a whim to him, a tourist weekend. “They’re the nicest people you’d ever meet,” he said, but then he qualified the compliment. “Look around. If they were effective, rational people, would they be here? Something a little weird, no?” I walked outside for air. Michael Vassar, in a clinging red sweater, was talking to an actuary from Florida. They discussed timeless decision theory (approximately: intelligent agents should make decisions on the basis of the futures, or possible worlds, that they predict their decisions will create) and the simulation argument (essentially: we’re living in one), which Vassar traced to Schopenhauer. He recited lines from Kipling’s “If-” in no particular order and advised the actuary on how to change his life: Become a pro poker player with the $100k he had in the bank, then hit the Magic: The Gathering pro circuit; make more money; develop more rationality skills; launch the first Costco in Northern Europe. I asked Vassar what was happening at MetaMed. He told me that he was raising money, and was in discussions with a big HMO. He wanted to show up Peter Thiel for not investing more than $500,000. “I’m basically hoping that I can run the largest convertible-debt offering in the history of finance, and I think it’s kind of reasonable,” he said. “I like Peter. I just would like him to notice that he made a mistake . . . I imagine a hundred million or a billion will cause him to notice . . . I’d like to have a pi-billion-dollar valuation.” I wondered whether Vassar was drunk. 
He was about to drive one of his coworkers, a young woman named Alyssa, home, and he asked whether I would join them. I sat silently in the back of his musty BMW as they talked about potential investors and hires. Vassar almost ran a red light. After Alyssa got out, I rode shotgun, and we headed back to the hotel.

It was getting late. I asked him about the rationalist community. Were they really going to save the world? From what? “Imagine there is a set of skills,” he said. “There is a myth that they are possessed by the whole population, and there is a cynical myth that they’re possessed by 10% of the population. They’ve actually been wiped out in all but about one person in three thousand.” It is important, Vassar said, that his people, “the fragments of the world,” lead the way during “the fairly predictable, fairly total cultural transition that will predictably take place between 2020 and 2035 or so.” We pulled up outside the Rose Garden Inn. He continued: “You have these weird phenomena like Occupy where people are protesting with no goals, no theory of how the world is, around which they can structure a protest. Basically this incredibly, weirdly, thoroughly disempowered group of people will have to inherit the power of the world anyway, because sooner or later everyone older is going to be too old and too technologically obsolete and too bankrupt. The old institutions may largely break down or they may be handed over, but either way they can’t just freeze. These people are going to be in charge, and it would be helpful if they, as they come into their own, crystallize an identity that contains certain cultural strengths like argument and reason.” I didn’t argue with him, except to press, gently, on his particular form of elitism. His rationalism seemed so limited to me, so incomplete. “It is unfortunate,” he said, “that we are in a situation where our cultural heritage is possessed only by people who are extremely unappealing to most of the population.” That hadn’t been what I’d meant. I had meant rationalism as itself a failure of the imagination. “The current ecosystem is so totally fucked up,” Vassar said. 
“But if you have conversations here” - he gestured at the hotel - “people change their mind and learn and update and change their behaviors in response to the things they say and learn. That never happens anywhere else.” In a hallway of the Rose Garden Inn, a former high-frequency trader started arguing with Vassar and Anna Salamon, CFAR’s executive director, about whether people optimize for hedons or utilons or neither, about mountain climbers and other high-end masochists, about whether world happiness is currently net positive or negative, increasing or decreasing. Vassar was eating and drinking everything within reach. My recording ends with someone saying, “I just heard ‘hedons’ and then was going to ask whether anyone wants to get high,” and Vassar replying, “Ah, that’s a good point.” Other voices: “When in California . . .” “We are in California, yes.”

…Back on the East Coast, summer turned into fall, and I took another shot at reading Yudkowsky’s Harry Potter fanfic. It’s not what I would call a novel, exactly, rather an unending, self-satisfied parable about rationality and trans-humanism, with jokes.

…I flew back to San Francisco, and my friend Courtney and I drove to a cul-de-sac in Atherton, at the end of which sat the promised mansion. It had been repurposed as cohousing for children who were trying to build the future: start-up founders, singularitarians, a teenage venture capitalist. The woman who coined the term “open source” was there, along with a Less Wronger and Thiel Capital employee who had renamed himself Eden. The Day of the Idealist was a day for self-actualization and networking, like the CFAR workshop without the rigor. We were to set “mega goals” and pick a “core good” to build on in the coming year. Everyone was a capitalist; everyone was postpolitical. I squabbled with a young man in a Tesla jacket about anti-Google activism. No one has a right to housing, he said; programmers are the people who matter; the protesters’ antagonistic tactics had totally discredited them.

…Thiel and Vassar and Yudkowsky, for all their far-out rhetoric, take it on faith that corporate capitalism, unchecked just a little longer, will bring about this era of widespread abundance. Progress, Thiel thinks, is threatened mostly by the political power of what he calls the “unthinking demos.”

Pointer thanks to /u/Vulture.

Weekly LW Meetups

3 FrankAdamek 12 December 2014 05:06PM

This summary was posted to LW Main on December 5th. The following week's summary is here.

Irregularly scheduled Less Wrong meetups are taking place in:

The remaining meetups take place in cities with regular scheduling, but involve a change in time or location, special meeting content, or simply a helpful reminder about the meetup:

Locations with regularly scheduled meetups: Austin, Berkeley, Berlin, Boston, Brussels, Buffalo, Cambridge UK, Canberra, Columbus, London, Madison WI, Melbourne, Moscow, Mountain View, New York, Philadelphia, Research Triangle NC, Seattle, Sydney, Toronto, Vienna, Washington DC, and West Los Angeles. There's also a 24/7 online study hall for coworking LWers.


Robin Hanson talking about Bias on Stossel tonight

3 buybuydandavis 12 December 2014 05:15AM


Stossel has a page with full episodes. I don't know when it will show there. Hanson was the first guest, and was done by the 12 minute mark.




What Peter Thiel thinks about AI risk

12 Dr_Manhattan 11 December 2014 09:22PM

This is probably the clearest statement from him on the issue:


25:30 mins in


TL;DR: he thinks it's an issue but also feels AGI is very distant, and hence he is less worried about it than Musk.


I recommend the rest of the lecture as well; it's a good summary of "Zero to One", with a good Q&A afterwards.

Make your own cost-effectiveness Fermi estimates for one-off problems

8 owencb 11 December 2014 12:00PM

In some recent work (particularly this article) I built models for estimating the cost effectiveness of work on problems when we don’t know how hard those problems are. The estimates they produce aren’t perfect, but they can get us started where it’s otherwise hard to make comparisons.

Now I want to know: what can we use this technique on? I have a couple of applications I am working on, but I’m keen to see what estimates other people produce.

There are complicated versions of the model which account for more factors, but we can start with a simple version. This is a tool for initial Fermi calculations: it’s relatively easy to use but should get us around the right order of magnitude. That can be very useful, and we can build more detailed models for the most promising opportunities.

The model is given by:

$$\text{expected benefit per unit of resources} = \frac{p \, B}{R(0) \, \ln(y/z)}$$

This expresses the expected benefit of adding another unit of resources to solving the problem. You can denominate the resources in dollars, researcher-years, or another convenient unit. To use this formula we need to estimate four variables:

  • R(0) denotes the current resources going towards the problem each year. Whatever units you measure R(0) in, those are the units we’ll get an estimate for the benefit of. So if R(0) is measured in researcher-years, the formula will tell us the expected benefit of adding a researcher year.

    • You want to count all of the resources going towards the problem. That includes the labour of those who work on it in their spare time, and some weighting for the talent of the people working in the area (if you doubled the budget going to an area, you couldn’t get twice as many people who are just as good; ideally we’d use an elasticity here).

    • Some resources may be aimed at something other than your problem, but be tangentially useful. We should count some fraction of those, according to how much resources devoted entirely to the problem they seem equivalent to.

  • B is the annual benefit that we’d get from a solution to the problem. You can measure this in its own units, but whatever you use here will be the units of value that come out in the cost-effectiveness estimate.

  • p and y/z are parameters that we will estimate together. p is the probability of getting a solution by the time y resources have been dedicated to the problem, if z resources have been dedicated so far. Note that we only need the ratio y/z, so we can estimate this directly.

    • Although y/z is hard to estimate, we will take a (natural) logarithm of it, so don’t worry too much about making this term precise.

    • I think it will often be best to use middling values of p, perhaps between 0.2 and 0.8.

And that’s it.

Example: How valuable is extra research into nuclear fusion? Assume:

  • R(0) = $5 billion (after a quick google turns up $1.5B for current spending, and adjusting upwards to account for non-financial inputs);

  • B = $1000 billion (guesswork, a bit over 1% of the world economy; a fraction of the current energy sector);

  • There’s a 50% chance of success (p = 0.5) by the time we’ve spent 100 times as many resources as today (log(y/z) = log(100) = 4.6).

Putting these together would give an expected societal benefit of (0.5*$1000B)/($5B*4.6) ≈ $22 for every dollar spent. This is high enough to suggest that we may be significantly under-investing in fusion, and that a more careful calculation (with better-researched numbers!) might be justified.
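For anyone who wants to try their own numbers, the fusion arithmetic above can be reproduced in a few lines of Python. This is only a minimal sketch: it assumes the simple model has the form p*B / (R(0) * ln(y/z)), which is what the worked example implies, and the function and parameter names are made up for illustration rather than taken from the original article.

```python
import math


def marginal_benefit(p, annual_benefit, current_resources, y_over_z):
    """Expected benefit of one extra unit of resources.

    Assumed form of the simple model: p * B / (R(0) * ln(y/z)).
    The result is in units of B per unit of R(0).
    """
    if not 0 < p <= 1:
        raise ValueError("p must be a probability in (0, 1]")
    if current_resources <= 0:
        raise ValueError("current_resources (R(0)) must be positive")
    if y_over_z <= 1:
        raise ValueError("y_over_z must exceed 1 so that ln(y/z) is positive")
    return p * annual_benefit / (current_resources * math.log(y_over_z))


# The fusion example from the post: R(0) = $5B, B = $1000B, p = 0.5, y/z = 100.
fusion = marginal_benefit(
    p=0.5,
    annual_benefit=1000e9,
    current_resources=5e9,
    y_over_z=100,
)
print(f"Roughly ${fusion:.0f} of benefit per extra dollar")
```

Because y/z only enters through a logarithm, even changing it by a factor of ten moves the answer by much less than an order of magnitude, which is why that term does not need to be very precise.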


To get the simple formula, the model made a number of assumptions. Since we’re just using it to get rough numbers, it’s okay if we don’t fit these assumptions exactly, but if they’re totally off then the model may be inappropriate. One restriction in particular I’d want to bear in mind:

  • It should be plausible that we could solve the problem in the next decade or two.

It’s okay if this is unlikely, but I’d want to change the model if I were estimating the value of e.g. trying to colonise the stars.

Request for applications

So -- what would you like to apply this method to? What answers do you get?

To help structure the comment thread, I suggest attempting only one problem in each comment. Include the value of p, and the units of R(0) and units of B that you’d like to use. Then you can give your estimates for R(0), B, and y/z as a comment reply, and so can anyone else who wants to give estimates for the same thing.

I’ve also set up a google spreadsheet where we can enter estimates for the questions people propose. For the time being anyone can edit this.

Have fun!

[link] Etzioni: AI will empower us, not exterminate us

4 CBHacking 11 December 2014 08:51AM


(Slashdot discussion: http://tech.slashdot.org/story/14/12/10/1719232/ai-expert-ai-wont-exterminate-us----it-will-empower-us)

Not sure what the local view of Oren Etzioni or the Allen Institute for AI is, but I'm curious what people think of his views on UFAI risk. As far as I can tell from this article, it basically boils down to "AGI won't happen, at least not any time soon." Is there (significant) reason to believe he's wrong, or is it simply too great a risk to leave to chance?

Cognitive distortions of founders

3 Dr_Manhattan 11 December 2014 03:19AM

Interesting take on entrepreneurial success:


More in depth here:


Curious what people here think of this. 


Lifehack Ideas December 2014

9 Gondolinian 10 December 2014 12:21AM
Life hacking refers to any trick, shortcut, skill, or novelty method that increases productivity and efficiency, in all walks of life.



This thread is for posting any promising or interesting ideas for lifehacks you've come up with or heard of.  If you've implemented your idea, please share the results.  You are also encouraged to post lifehack ideas you've tried out that have not been successful, and why you think they weren't.  If you can, please give credit for ideas that you got from other people.

To any future posters of Lifehack Ideas threads, please remember to add the "lifehacks_thread" tag.

The Limits of My Rationality

1 JoshuaMyer 09 December 2014 09:08PM

As requested, here is an introductory abstract.

The search for bias in the linguistic representations of our cognitive processes serves several purposes in this community. By pruning irrational thoughts, we can potentially affect each other in complex ways. Leaning heavily on cognitivist pedagogy, this essay represents my subjective experience trying to reconcile a perceived conflict between the rhetorical goals of the community and the absence of a generative, organic conceptualization of rationality.

The Story

    Though I've only been here a short time, I find myself fascinated by this discourse community. To discover a group of individuals bound together under the common goal of applied rationality has been an experience that has enriched my life significantly. So please understand, I do not mean to insult by what I am about to say, merely to encourage a somewhat more constructive approach to what I understand as the goal of this community: to apply collectively reinforced notions of rational thought to all areas of life.
    As I followed the links and read the articles on the homepage, I found myself somewhat disturbed by the juxtaposition of these highly specific definitions of biases to the narrative structures of parables providing examples in which a bias results in an incorrect conclusion. At first, I thought that perhaps my emotional reaction stemmed from rejecting the unfamiliar; naturally, I decided to learn more about the situation.

    As I read on, my interests drifted from the rhetorical structure of each article (if anyone is interested I might pursue an analysis of rhetoric further though I'm not sure I see a pressing need for this), towards the mystery of how others in the community apply the lessons contained therein. My belief was that the parables would cause most readers to form a negative association of the bias with an undesirable outcome.

    Even a quick skim of the discussions taking place on this site will reveal energetic debate on a variety of topics of potential importance, peppered heavily with accusations of bias. At this point, I noticed the comments that seem to get voted up are ones that are thoughtfully composed, well informed, soundly conceptualized and appropriately referential. Generally, this is true of the articles as well, and so it should be in productive discourse communities. Though I thought it prudent to not read every conversation in absolute detail, I also noticed that the most participated in lines of reasoning were far more rhetorically complex than the parables' portrayal of bias alone could explain. Sure the establishment of bias still seemed to represent the most commonly used rhetorical device on the forums ...

    At this point, I had been following a very interesting discussion on this site about politics. I typically have little or no interest in political theory, but "NRx" vs. "Prog" Assumptions: Locating the Sources of Disagreement Between Neoreactionaries and Progressives (Part 1) seemed so out of place in a community whose political affiliations might best be summarized by the phrase "politics is the mind killer" that I couldn't help but investigate. More specifically, I was trying to figure out why it had been posted here at all (I didn't take issue with either the scholarship or intent of the article, but the latter wasn't obvious to me, perhaps because I was completely unfamiliar with the coinage "neoreactionary").

    On my third read, I made a connection to an essay about the socio-historical foundations of rhetoric. In structure, the essay progressed through a wide variety of specific observations on both theory and practice of rhetoric in classical Europe, culminating in a well argued but very unwieldy thesis; at some point in the middle of the essay, I recall a paragraph that begins with the assertion that every statement has political dimensions. I conveyed this idea as eloquently as I could muster, and received a fair bit of karma for it. And to think that it all began with a vague uncomfortable feeling and a desire to understand!

The Lesson

    So you are probably wondering what any of this has to do with rationality, cognition, or the promise of some deeply insightful transformative advice mentioned in the first paragraph. Very good.

    Cognition, a prerequisite for rationality, is a complex process; cognition can be described as the process by which ideas form, interact and evolve. Notice that this definition alone cannot explain how concepts like rationality form, why ideas form or how they should interact to produce intelligence. That specific shortcoming has long crippled cognitivist pedagogies in many disciplines -- no matter which factors you believe to determine intelligence, it is undeniably true that the process by which it occurs organically is not well-understood.

    More intricate models of cognition traditionally vary according to the sets of behavior they seek to explain; in general, this forum seems to concern itself with the wider sets of human behavior, with a strange affinity for statistical analysis. It also seems as if most of the people here associate agency with intelligence, though this should be regarded as unsubstantiated anecdote; I have little interest in what people believe, but those beliefs can have interesting consequences. In general, good models of cognition that yield a sense of agency have to be able to explain how a mushy organic collection of cells might become capable of generating a sense of identity. For this reason, our discussion of cognition will treat intelligence as a confluence of passive processes that lead to an approximation of agency.

    Who are we? What is intelligence? To answer these or any natural language questions we first search for stored-solutions to whatever we perceive as the problem, even as we generate our conception of the question as a set of abstract problems from interactions between memories. In the absence of recognizing a pattern that triggers a stored solution, a new solution is generated by processes of association and abstraction. This process may be central to the generation of every rational and irrational thought a human will ever have. I would argue that the phenomenon of agency approximates an answer to the question: "who am I?" and that any discussion of consciousness should at least acknowledge how critical natural language use is to universal agreement on any matter. I will gladly discuss this matter further and in greater detail if asked.

    At this point, I feel compelled to mention that my initial motivation for pursuing this line of reasoning stems from the realization that this community discusses rationality in a way that differs somewhat from my past encounters with the word.

    Out there, it is commonly believed that rationality develops (in hindsight) to explain the subjective experience of cognition; here we assert a fundamental difference between rationality and this other concept called rationalization. I do not see the utility of this distinction, nor have I found a satisfying explanation of how this distinction operates within accepted models for human learning in such a way that does not assume an a priori method of sorting the values which determine what is considered "rational". Thus we find there is a general dearth of generative models of rational cognition beside a plethora of techniques for spotting irrational or biased methods of thinking.

    I see a lot of discussion on the forums very concerned with objective predictions of the future wherein it seems as if rationality (often of a highly probabilistic nature) is, in many cases, expected to bridge the gap between the worlds we can imagine to be possible and our many somewhat subjective realities. And the force keeping these discussions from splintering off into unproductive pissing about is a constant search for bias.

    I know I'm not going to be the first among us to suggest that the search for bias is not truly synonymous with rationality, but I would like to clarify before concluding. Searching for bias in cognitive processes can be a very productive way to spend one's waking hours, and it is a critical element to structuring the subjective world of cognition in such a way that allows abstraction to yield the kind of useful rules that comprise rationality. But it is not, at its core, a generative process.

    Let us consider the cognitive process of association (when beliefs, memories, stimuli or concepts become connected to form more complex structures). Without that period of extremely associative and biased cognition experienced during early childhood, we might never learn to attribute the perceived cause of a burn to a hot stove. Without concepts like better and worse to shape our young minds, I imagine many of us would simply lack the attention span to learn about ethics. And what about all the biases that make parables an effective way of conveying information? After all, the strength of a rhetorical argument is in its appeal to the interpretive biases of its intended audience and not the relative consistency of the conceptual foundations of that argument.

    We need to shift discussions involving bias towards models of cognition more complex than portraying it as simply an obstacle to rationality. In my conception of reality, recognizing the existence of bias seems to play a critical role in the development of more complex methods of abstraction; indeed, biases are an intrinsic side effect of the generative grouping of observations that is the core of Bayesian reasoning.

    In short, biases are not generative processes. Discussions of bias are not necessarily useful, rational or intelligent. A deeper understanding of the nature of intelligence requires conceptualizations that embrace the organic truths at the core of sentience; we must be able to describe our concepts of intelligence, our "rationality", such that it can emerge organically as the generative processes at the core of cognition.

The Idea

    I'd be interested to hear some thoughts about how we might grow to recognize our own biases as necessary to the formative stages of abstraction alongside learning to collectively search for and eliminate biases from our decision-making processes. The human mind is limited and while most discussions in natural language never come close to pressing us to those limits, our limitations can still be relevant to those discussions as well as to discussions of artificial intelligences. The way I see things, a bias-free machine possessing a model of our own cognition would either have to have stored solutions for every situation it could encounter or methods of generating stored solutions for all future perceived problems (both of which sound like descriptions of oracles to me, though the latter seems more viable from a programmer's perspective).

    A machine capable of making the kinds of decisions considered "easy" for humans might need biases at some point during its journey to the complex and self-consistent methods of decision making associated with rationality. This is a rhetorically complex community, but at the risk of my reach exceeding my grasp, I would be interested in seeing an examination of the Affect Heuristic in human decision making as an allegory for the historic utility of fuzzy values in chess AI.

    Thank you for your time, and I look forward to what I can only hope will be challenging and thoughtful responses.

Feedback requested by Intentional Insights on workbook conveying rational thinking about meaning and purpose to a broad audience

5 Gleb_Tsipursky 09 December 2014 07:03PM

We at Intentional Insights would appreciate your feedback to help optimize a workbook that conveys rational thinking to find meaning and purpose in life for a broad audience. Last time, we asked for your feedback, and we changed our content offerings based on comments we received from fellow Less Wrongers, as you can see from the Edit to this post. We would be glad to update our beliefs again and revise the workbook based on your feedback.

For a bit of context, the workbook is part of our efforts to promote rational thinking to a broad audience and thus raise the sanity waterline. It’s based on research on how other societies besides the United States helped their citizens find meaning and purpose, such as research I did on the Soviet Union and Zuckerman did on Sweden and Denmark. It’s also based on research on the contemporary United States by psychologists such as Steger, Duffy and Dik, Seligman, and others.

The target audience is reason-minded youth and young adults, especially secular-oriented ones. The goal is to get such people to engage with academic research on how our minds work, and thus get them interested in exploring rational thinking more broadly, eventually getting them turned on to more advanced rationality, such as found on Less Wrong itself. The workbook is written in a style aimed to create cognitive ease, with narratives, personal stories, graphics, and research-based exercises.

Here is the link to the workbook draft itself. Any and all suggestions are welcomed, and thanks for taking the time to engage with this workbook and give your feedback – much appreciated!


Does utilitarianism "require" extreme self sacrifice? If not why do people commonly say it does?

5 Princess_Stargirl 09 December 2014 08:32AM

Chris Hallquist wrote the following in an article (if you know the article, please, please don't bring it up; I don't want to discuss the article in general):

"For example, utilitarianism apparently endorses killing a single innocent person and harvesting their organs if it will save five other people. It also appears to imply that donating all your money to charity beyond what you need to survive isn’t just admirable but morally obligatory. "

The non-bold part is not what is confusing me. But where does the "obligatory" part come in? I don't really see how it's obvious what, if any, ethical obligations utilitarianism implies. Given a set of basic assumptions, utilitarianism lets you argue whether one action is more moral than another. But I don’t see how it's obvious which, if any, moral benchmarks utilitarianism sets for “obligatory.” I can see how certain frameworks on top of utilitarianism imply certain moral requirements. But I do not see how the bolded quote is a criticism of the basic theory of utilitarianism.

However, this criticism comes up all the time. Honestly, the best explanation I could come up with was that people were being unfair to utilitarianism and not thinking through their statements. But the above quote is by HallQ, who is intelligent and thoughtful. So now I am genuinely very curious.

Do you think utilitarianism really requires such extreme self-sacrifice, and if so, why? And if it does not require this, why do so many people say it does? I am very confused and would appreciate help working this out.


I am having trouble asking this question clearly, since utilitarianism is probably best thought of as a cluster of beliefs, so it's not clear what asking "does utilitarianism imply X" actually means. Still, I made this post since I am confused. Many thoughtful people identify as utilitarian (for example Ozy and theunitofcaring) yet do not think people have extreme obligations. However, I can think of examples where people do not seem to understand the implications of their ethical frameworks. For example, many Jewish people endorse the message of the following story:

Rabbi Hillel was asked to explain the Torah while standing on one foot and responded "What is hateful to you, do not do to your neighbor. That is the whole Torah; the rest is the explanation of this--go and study it!"

The story is presumably apocryphal, but it is repeated all the time by Jewish people. However, it's hard to see how the story makes even a semblance of sense. The Torah includes huge amounts of material that violates the "Golden Rule" very badly. So people who think this story gives even a moderately accurate picture of the Torah's message are mistaken, imo.

An investment analogy for Pascal's Mugging

5 [deleted] 09 December 2014 07:50AM

A lottery ticket sometimes has positive expected value (a $1 ticket might be expected to pay out $1.30). How many tickets should you buy?

Probably none. Informally, all but the richest players can expect to go broke before they win, despite the positive expected value of a ticket.

In more precise terms: In order to maximize the long-term growth rate of your money (or log money), you'll want to put a very small fraction of your bankroll into lottery tickets, which will imply an "amount to invest" that is less than the cost of a single ticket (excluding billionaires). If you put too great a proportion of your resources into a risky but positive expected value asset, the long-term growth rate of your resources can become negative. For an intuitive example, imagine Bill Gates dumping 99% of his wealth into a series of positive expected-value bets with single-lottery-ticket-like odds.

This article has some graphs and details on the lottery. This pdf on the Kelly criterion has some examples and general discussion of this type of problem.
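To make the "less than the cost of a single ticket" claim concrete, here is a rough sketch using the standard Kelly formula for a simple win-or-lose bet, f* = p - (1 - p)/b, where b is the net payout per dollar staked. The $1 price and $1.30 expected payout come from the example above; the one-in-a-hundred-million win probability and the single-jackpot payout structure are made-up assumptions for illustration only.

```python
def kelly_fraction(win_probability, net_odds):
    """Kelly stake as a fraction of bankroll for a bet paying net_odds-to-1."""
    lose_probability = 1.0 - win_probability
    return win_probability - lose_probability / net_odds


ticket_price = 1.0
expected_payout = 1.30   # from the post: a $1 ticket expected to pay out $1.30
win_probability = 1e-8   # hypothetical jackpot odds (assumption)

# Single-prize lottery (assumption): the whole expected payout comes from the jackpot.
jackpot = expected_payout / win_probability
net_odds = (jackpot - ticket_price) / ticket_price

fraction = kelly_fraction(win_probability, net_odds)
# Bankroll at which the Kelly stake equals the price of exactly one ticket.
min_bankroll = ticket_price / fraction

print(f"Kelly fraction of bankroll: {fraction:.3e}")
print(f"Bankroll needed before one ticket is justified: ${min_bankroll:,.0f}")
```

Under these assumed numbers the Kelly stake only reaches a full $1 ticket once the bankroll is in the hundreds of millions of dollars, which matches the "excluding billionaires" caveat in spirit; different odds or prize structures would shift the threshold.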

Can we think about Pascal mugging the same way?

The applicability might depend on whether we're trading resource-generating-resources for non-resource-generating assets. So if we're offered something like cash, the lottery ticket model (with payout inversely varying with estimated odds) is a decent fit. But what if we're offered utility in some direct and non-interest-bearing form?

Another limit: For a sufficiently unlikely but positive-expected-value gamble, you can expect the heat death of the universe before actually realizing any of the expected value.
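To put rough numbers on this limit: with win probability p per independent attempt, the expected number of attempts before the first win is 1/p, and the chance of at least one win in n attempts is 1 - (1 - p)^n. The sketch below uses invented figures: p = 10^-30 and a budget of 10^18 attempts (one per second for roughly the order of magnitude of the universe's current age, which is far short of any heat-death estimate).

```python
import math

def prob_at_least_one_win(p_win, attempts):
    """P(at least one success in `attempts` independent tries), computed stably for tiny p."""
    return -math.expm1(attempts * math.log1p(-p_win))

p_win = 1e-30       # hypothetical odds of the gamble paying off
attempts = 1e18     # hypothetical number of attempts available

expected_attempts = 1.0 / p_win
print(f"Expected attempts until first win: {expected_attempts:.1e}")
print(f"P(any win within the horizon): {prob_at_least_one_win(p_win, attempts):.1e}")
```

With these placeholder values the expected wait exceeds the available attempts by twelve orders of magnitude, which is the sense in which the expected value is never realized.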


Superintelligence 13: Capability control methods

7 KatjaGrace 09 December 2014 02:00AM

This is part of a weekly reading group on Nick Bostrom's book, Superintelligence. For more information about the group, and an index of posts so far see the announcement post. For the schedule of future topics, see MIRI's reading guide.

Welcome. This week we discuss the thirteenth section in the reading guide: capability control methods. This corresponds to the start of chapter nine.

This post summarizes the section, and offers a few relevant notes, and ideas for further investigation. Some of my own thoughts and questions for discussion are in the comments.

There is no need to proceed in order through this post, or to look at everything. Feel free to jump straight to the discussion. Where applicable and I remember, page numbers indicate the rough part of the chapter that is most related (not necessarily that the chapter is being cited for the specific claim).

Reading: “Two agency problems” and “Capability control methods” from Chapter 9


  1. If the default outcome is doom, how can we avoid it? (p127)
  2. We can divide this 'control problem' into two parts:
    1. The first principal-agent problem: the well known problem faced by a sponsor wanting an employee to fulfill their wishes (usually called 'the principal agent problem')
    2. The second principal-agent problem: the emerging problem of a developer wanting their AI to fulfill their wishes
  3. How to solve the second problem? We can't rely on behavioral observation (as seen in week 11). Two other options are 'capability control methods' and 'motivation selection methods'. We see the former this week, and the latter next week.
  4. Capability control methods: avoiding bad outcomes through limiting what an AI can do. (p129)
  5. Some capability control methods:
    1. Boxing: minimize interaction between the AI and the outside world. Note that the AI must interact with the world to be useful, and that it is hard to eliminate small interactions. (p129)
    2. Incentive methods: set up the AI's environment such that it is in the AI's interest to cooperate. e.g. a social environment with punishment or social repercussions often achieves this for contemporary agents. One could also design a reward system, perhaps with cryptographic rewards (so that the AI could not wirehead) or heavily discounted rewards (so that long term plans are not worth the short term risk of detection) (p131)
      • Anthropic capture: an AI thinks it might be in a simulation, and so tries to behave as will be rewarded by simulators (box 8; p134)
    3. Stunting: limit the AI's capabilities. This may be hard to do to a degree that avoids danger and is still useful. An option here is to limit the AI's information. A strong AI may infer much from little apparent access to information however. (p135)
    4. Tripwires: test the system without its knowledge, and shut it down if it crosses some boundary. This might be combined with 'honey pots' to tempt undesirable AIs into taking an action that would reveal them. Tripwires could test behavior, ability, or content. (p137)
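The 'heavily discounted rewards' idea under incentive methods can be illustrated with a toy calculation. Everything below is an assumption made for illustration (the reward sizes, the per-step detection risk, the 20-step delay). It is not Bostrom's formalism, only a sketch of why a steep discount factor can make a slow, risky defection plan worth less than steady cooperation.

```python
def cooperation_value(reward_per_step, discount, steps):
    """Discounted value of collecting a small reward on every step."""
    return sum(reward_per_step * discount**t for t in range(steps))

def defection_value(payoff, discount, delay, detection_risk):
    """Discounted value of a plan that pays off once after `delay` steps,
    provided the agent is not detected (and shut down) on any step before then."""
    survival = (1.0 - detection_risk) ** delay
    return survival * payoff * discount**delay

# Hypothetical parameters.
REWARD, PAYOFF, DELAY, RISK = 1.0, 1_000_000.0, 20, 0.05

for discount in (0.99, 0.5):
    coop = cooperation_value(REWARD, discount, DELAY + 1)
    defect = defection_value(PAYOFF, discount, DELAY, RISK)
    better = "cooperate" if coop > defect else "defect"
    print(f"discount={discount}: cooperate={coop:.2f}, defect={defect:.2f} -> {better}")
```

With a mild discount (0.99) the delayed payoff dominates, while with a steep discount (0.5) the same plan is worth less than a couple of steps of ordinary reward. The conclusion depends entirely on the made-up numbers, which is one reason the method is hard to rely on.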

Another view

Brian Clegg reviews the book mostly favorably, but isn't convinced that controlling an AI via merely turning it off should be so hard:

I also think a couple of the fundamentals aren’t covered well enough, but pretty much assumed. One is that it would be impossible to contain and restrict such an AI. Although some effort is put into this, I’m not sure there is enough thought put into the basics of ways you can pull the plug manually – if necessary by shutting down the power station that provides the AI with electricity.

Kevin Kelly also apparently doubts that AI will substantially impede efforts to modify it:

...We’ll reprogram the AIs if we are not satisfied with their performance...

...This is an engineering problem. So far as I can tell, AIs have not yet made a decision that its human creators have regretted. If they do (or when they do), then we change their algorithms. If AIs are making decisions that our society, our laws, our moral consensus, or the consumer market, does not approve of, we then should, and will, modify the principles that govern the AI, or create better ones that do make decisions we approve. Of course machines will make “mistakes,” even big mistakes – but so do humans. We keep correcting them. There will be tons of scrutiny on the actions of AI, so the world is watching. However, we don’t have universal consensus on what we find appropriate, so that is where most of the friction about them will come from. As we decide, our AI will decide...

This may be related to his view that AI is unlikely to modify itself (from further down the same page):

3. Reprogramming themselves, on their own, is the least likely of many scenarios.

The great fear pumped up by some, though, is that as AI gain our confidence in making decisions, they will somehow prevent us from altering their decisions. The fear is they lock us out. They go rogue. It is very difficult to imagine how this happens. It seems highly improbable that human engineers would program an AI so that it could not be altered in any way. That is possible, but so impractical. That hobble does not even serve a bad actor. The usual scary scenario is that an AI will reprogram itself on its own to be unalterable by outsiders. This is conjectured to be a selfish move on the AI’s part, but it is unclear how an unalterable program is an advantage to an AI. It would also be an incredible achievement for a gang of human engineers to create a system that could not be hacked. Still it may be possible at some distant time, but it is only one of many possibilities. An AI could just as likely decide on its own to let anyone change it, in open source mode. Or it could decide that it wanted to merge with human will power. Why not? In the only example we have of an introspective self-aware intelligence (hominids), we have found that evolution seems to have designed our minds to not be easily self-reprogrammable. Except for a few yogis, you can’t go in and change your core mental code easily. There seems to be an evolutionary disadvantage to being able to easily muck with your basic operating system, and it is possible that AIs may need the same self-protection. We don’t know. But the possibility they, on their own, decide to lock out their partners (and doctors) is just one of many possibilities, and not necessarily the most probable one.



1. What do you do with a bad AI once it is under your control?

Note that capability control doesn't necessarily solve much: boxing, stunting and tripwires seem to just stall a superintelligence rather than provide means to safely use one to its full capacity. This leaves the controlled AI to be overtaken by some other unconstrained AI as soon as someone else isn't so careful. In this way, capability control methods seem much like slowing down AI research: helpful in the short term while we find better solutions, but not in itself a solution to the problem.

However this might be too pessimistic. An AI whose capabilities are under control might either be almost as useful as an uncontrolled AI who shares your goals (if interacted with the right way), or at least be helpful in getting to a more stable situation.

Paul Christiano outlines a scheme for safely using an unfriendly AI to solve some kinds of problems. We have both blogged on general methods for getting useful work from adversarial agents, which is related.

2. Cryptographic boxing

Paul Christiano describes a way to stop an AI interacting with the environment using a cryptographic box.

3. Philosophical Disquisitions

Danaher again summarizes the chapter well. Read it if you want a different description of any of the ideas, or to refresh your memory. He also provides a table of the methods presented in this chapter.

4. Some relevant fiction

That Alien Message by Eliezer Yudkowsky

5. Control through social integration

Robin Hanson argues that it matters more that a population of AIs are integrated into our social institutions, and that they keep the peace among themselves through the same institutions we keep the peace among ourselves, than whether they have the right values. He thinks this is why you trust your neighbors, not because you are confident that they have the same values as you. He has several followup posts.

6. More miscellaneous writings on these topics

LessWrong wiki on AI boxing; Armstrong et al on controlling and using an oracle AI; Roman Yampolskiy on 'leakproofing' the singularity. I have not necessarily read these.


In-depth investigations

If you are particularly interested in these topics, and want to do further research, these are a few plausible directions, some inspired by Luke Muehlhauser's list, which contains many suggestions related to parts of Superintelligence. These projects could be attempted at various levels of depth.


  1. Choose any control method and work out the details better. For instance:
    1. Could one construct a cryptographic box for an untrusted autonomous system?
    2. Investigate steep temporal discounting as an incentives control method for an untrusted AGI.
  2. Are there other capability control methods we could add to the list?
  3. Devise uses for a malicious but constrained AI.
  4. How much pressure is there likely to be to develop AI which is not controlled?
  5. If existing AI methods had unexpected progress and were heading for human-level soon, what precautions should we take now?


If you are interested in anything like this, you might want to mention it in the comments, and see whether other people have useful thoughts.

How to proceed

This has been a collection of notes on the chapter.  The most important part of the reading group though is discussion, which is in the comments section. I pose some questions for you there, and I invite you to add your own. Please remember that this group contains a variety of levels of expertise: if a line of discussion seems too basic or too incomprehensible, look around for one that suits you better!

Next week, we will talk about 'motivation selection methods'. To prepare, read “Motivation selection methods” and “Synopsis” from Chapter 9. The discussion will go live at 6pm Pacific time next Monday 15th December. Sign up to be notified here.

Stupid Questions December 2014

14 Gondolinian 08 December 2014 03:39PM

This thread is for asking any questions that might seem obvious, tangential, silly or what-have-you. Don't be shy, everyone has holes in their knowledge, though the fewer and the smaller we can make them, the better.

Please be respectful of other people's admitting ignorance and don't mock them for it, as they're doing a noble thing.

To any future monthly posters of SQ threads, please remember to add the "stupid_questions" tag.

[Link] An exact mapping between the Variational Renormalization Group and Deep Learning

4 Gunnar_Zarncke 08 December 2014 02:33PM

An exact mapping between the Variational Renormalization Group and Deep Learning by Pankaj Mehta, David J. Schwab

Deep learning is a broad set of techniques that uses multiple layers of representation to automatically learn relevant features directly from structured data. Recently, such techniques have yielded record-breaking results on a diverse set of difficult machine learning tasks in computer vision, speech recognition, and natural language processing. Despite the enormous success of deep learning, relatively little is understood theoretically about why these techniques are so successful at feature learning and compression. Here, we show that deep learning is intimately related to one of the most important and successful techniques in theoretical physics, the renormalization group (RG). RG is an iterative coarse-graining scheme that allows for the extraction of relevant features (i.e. operators) as a physical system is examined at different length scales. We construct an exact mapping from the variational renormalization group, first introduced by Kadanoff, and deep learning architectures based on Restricted Boltzmann Machines (RBMs). We illustrate these ideas using the nearest-neighbor Ising Model in one and two-dimensions. Our results suggests that deep learning algorithms may be employing a generalized RG-like scheme to learn relevant features from data.

To me this paper suggests that deep learning is an approach that could be made or is already conceptually general enough to learn everything there is to learn (assuming sufficient time and resources). Thus it could already be used as the base algorithm of a self-optimizing AGI. 
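For readers unfamiliar with coarse-graining, the sketch below shows one step of a much simpler real-space scheme than the variational RG used in the paper: a majority-rule block-spin transformation on a 2D lattice of +1/-1 spins. It is only meant to show what "examining a system at a larger length scale" can look like in code. The 3x3 block size and the random lattice are arbitrary choices, and nothing here reproduces the paper's RBM mapping.

```python
import random

def block_spin(lattice, block=3):
    """Coarse-grain a square lattice of +1/-1 spins by majority vote within
    non-overlapping block x block tiles (block must be odd to avoid ties)."""
    size = len(lattice)
    assert size % block == 0 and block % 2 == 1
    coarse = []
    for i in range(0, size, block):
        row = []
        for j in range(0, size, block):
            total = sum(lattice[i + di][j + dj]
                        for di in range(block) for dj in range(block))
            row.append(1 if total > 0 else -1)
        coarse.append(row)
    return coarse

random.seed(0)
fine = [[random.choice((-1, 1)) for _ in range(9)] for _ in range(9)]
coarse = block_spin(fine)       # 9x9 lattice -> 3x3 lattice
coarser = block_spin(coarse)    # 3x3 lattice -> single spin

for row in coarse:
    print(row)
```

Each application discards short-range detail and keeps one summary spin per block, loosely analogous to how each layer of a deep network passes a compressed representation to the next.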

Open thread, Dec. 8 - Dec. 15, 2014

6 Gondolinian 08 December 2014 12:06AM

If it's worth saying, but not worth its own post (even in Discussion), then it goes here.

Previous OT

Next OT

Notes for future OT posters:

1. Please add the 'open_thread' tag.

2. Check if there is an active Open Thread before posting a new one. (Immediately before; refresh the list-of-threads page before posting.)

3. Open Threads should be posted in Discussion, and not Main.

4. Open Threads should start on Monday, and end on Sunday.

If you have any comments about the Open Thread posts themselves or this post specifically, please post them as a reply to the [META] comment.  Aside from that, this thread is as organized as you collectively wish to make it.

PSA: Eugine_Nier evading ban?

13 Dahlen 07 December 2014 11:23PM

I know this reeks of witch-hunting, but... I have a hunch that u/Eugine_Nier is back under the guise of u/Azathoth123. Reasons:

  •  Same political views, with a tendency to be outspoken about them
  • Karma hovering in the 70s% for both accounts, occasionally going into the 60s%, significantly lower than the LW average
  • The dates match up. Kaj Sotala announced on July 03, 2014 that Eugine was to be permanently banned. The first comment from Azathoth123 was on July 12, 2014.
  • The one that got my attention was the posting pattern. Particularly, Eugine_Nier had a pervasive pattern of exceeding the quote limits per rationality thread. That's actually the first thing I had noticed about the guy back when he was first active, and a few times I thought about drawing attention to the way he flouted the rules, but never got around to it/cared enough about the matter. Now, I see Azathoth123 doing the same thing. The current Rationality Quotes thread has four quotes from him already and it hasn't even been a week since the thread was posted; all of them have something to do with his political views. As do basically all of his postings so far.
  • Each one of these points, taken separately, would be fairly unlikely if the two of them were not the same person. Together, they are less likely still. Especially the predilection for posting one too many rationality quotes; seriously, how common an occurrence is that one in particular?
  • My experience so far with the internet has been that people like Eugine never really leave an online community they have pestered for so long. It doesn't matter if they're IP banned or something. They always come back, just under a different name, and they come back shortly.

I don't have an axe to grind against the guy, I've only spoken to him a couple of times and didn't notice any particularly large karma hits afterwards, I just really dislike it when someone skirts the rules like that. Disruptive users evading permanent bans never helped any community ever.

Obviously I'm posting this here because I think a moderator should look into the matter. Usually I would be posting a disclaimer of some sort, apologizing in advance to Azathoth123 for attacking his standing with slanderous accusations if this turned out not to be the case. Well, I won't. The more I look into the matter, the more confident I get that they're the same person. Azathoth, if you're reading this and you're not Eugine_Nier, then I strongly advise you go search for your twin brother, I think you'll get along very well. Seriously, I'm saying this in good faith. You have a suspiciously great deal of things in common.

If retributive downvoting is (still) a concern (if not, then disregard this paragraph): I'd like to request, if such a thing is possible, that a mod karma-blocks me until the issue is over, so as to not incur undeserved downvotes (it would also mean I'd get no upvotes). In turn, I promise not to abuse the system by spamming the boards with garbage without consequences, but then again given my history so far on LW I don't think that such an abuse should be expected from me. For the record, I could have made a throwaway account just to say this, and not risk being karmassassinated, but 1) a zero karma account has no credibility and 2) for signalling reasons I prefer to put my money where my mouth is.

P.S. I only made this announcement its own post because the latest open thread was about to "expire".

A bit of word-dissolving in political discussion

2 eli_sennesh 07 December 2014 05:05PM

I found Scott Alexander's steelmanning of the NRx critique to be an interesting, even persuasive critique of modern progressivism, having not been exposed to this movement prior to today. However I am also equally confused at the jump from "modern liberal democracies are flawed" to "restore the divine-right-of-kings!" I've always hated the quip "democracy is the worst form of government, except for all the others" (we've yet tried), but I think it applies here.

-- Mark Friedenbach

Of course, with the prompting to state my own thoughts, I simply had to go and start typing them out.  The following contains obvious traces of my own political leanings and philosophy (in short summary: if "Cthulhu only swims left", then I AM CTHULHU... at least until someone explains to me what a Great Old One is doing out of R'lyeh and in West Coast-flavored American politics), but those traces should be taken as evidence of what I believe rather than statements about it.

Because what I was actually trying to talk about, is rationality in politics.  Because in fact, while it is hard, while it is spiders, all the normal techniques work on it.  There is only one real Cardinal Sin of Attempting to be Rational in Politics, and it is the following argument, stated in generic form that I might capture it from the ether and bury it: "You only believe what you believe for political reasons!"  It does not matter if those "reasons" are signaling, privilege, hegemony, or having an invisible devil on your shoulder whispering into your bloody ear: to impugn someone else's epistemology entirely at the meta-level without saying a thing against their object-level claims is anti-epistemology.

Now, on to the ranting!  The following are more-or-less a semi-random collection of tips I vomited out for trying to deal with politics rationally.  I hope they help.  This is a Discussion post because Mark said that might be a good idea.

  1. Dissolve "democracy", and not just in the philosophical sense, but in the sense that there have been many different kinds of actually existing democracies.  There are always multiple object-level implementations of any meta-level idea, and most political ideas are sufficiently abstract to count as meta-level.  Even if, for purposes of a thought experiment, you find yourself saying, "I WILL ONLY EVER CONSIDER SYSTEMS THAT COUNT AS DEMOCRACY ACCORDING TO MY INTUITIVE DEMOCRACY-P() PREDICATE!", one can easily debate whether a mixed-member proportional Parliament performs better than a district-based bicameral Congress, or whether a pure Westminster system beats them both, or whether a Presidential system works better, or whatever.  Particular institutional designs yield particular institutional behaviors, and successfully inducing complex generalizations across large categories of institutional designs requires large amounts of evidence -- just as it does in any other form of hierarchical probabilistic reasoning.
  2. Dissolve words like "democracy", "capitalism", "socialism", and "government" in the philosophical sense, and ask: what are the terminal goals democracy serves?  How much do we support those goals, and how much do current democratic systems suffer approximation error by forcing our terminal goals to fit inside the hypothesis space our actual institutions instantiate?  For however much we do support those goals, why do we shape these particular institutions to serve those goals, and not other institutions? For all values of X, mah nishtana ha-X hazeh mikol ha-X-im? is a fundamental question of correct reasoning.  (Asking the question of why we instantiate particular institutions in particular places, when one believes in democratic states, is the core issue of democratic socialism, and I would indeed count myself a democratic socialist.  But you get different answers and inferences if you ask about schools or churches, don't you?)
  3. Learn first to explicitly identify yourself with a political "tribe", and next to consider political ideas individually, as questions of fact and value subject to investigation via epistemology and moral epistemology, rather than treating politics as "tribal".  Tribalism is the mind-killer: keeping your own explicit tribal identification in mind helps you notice when you're being tribalist, and helps you distinguish your own tribe's customs from universal truths -- both aids to your political rationality.  And yes, while politics has always been at least a little tribal, the particular form the tribes take varies through time and space: the division of society into a "blue tribe" and a "red tribe" (as oft-described by Yvain on Slate Star Codex), for example, is peculiar to late-20th-century and early-21st-century USA.  Those colors didn't even come into usage until the 2000 Presidential election, and hadn't firmly solidified as describing seemingly separate nationalities until 2004!  Other countries, and other times, have significantly different arrangements of tribes, so if you don't learn to distinguish between ideas and tribes, you'll not only fail at political rationality, you'll give yourself severe culture shock the first time you go abroad.
    1. General rule: you often think things are general rules of the world not because you have the large amount of evidence necessary to reason that they really are, but because you've seen so few alternatives that your subjective distribution over models contains only one or two models, both coarse-grained.  Unquestioned assumptions always feel like universal truths from the inside!
  4. Learn to check political ideas by looking at the actually-existing implementations, including the ones you currently oppose -- think of yourself as bloody Sauron if you have to!  This works, since most political ideas are not particularly original.  Commons trusts exist, for example, the "movement" supporting them just wants to scale them up to cover all society's important common assets rather than just tracts of land donated by philanthropists.  Universal health care exists in many countries.  Monarchy and dictatorship exist in many countries.  Religious rule exists in many countries.  Free tertiary education exists in some countries, and has previously existed in more.  Non-free but subsidized tertiary education exists in many countries.  Running the state off oil revenue has been tried in many countries.  Centrally-planned economies have been tried in many countries.  And it's damn well easier to compare "Canadian health-care" to "American health-care" to "Chinese health-care", all sampled in 2014, using fact-based policy studies, than to argue about the Visions of Human Life represented by each (the welfare state, the Company Man, and the Lone Fox, let's say) -- which of course assumes consequentialism.  In fact, I should issue a much stronger warning here: argumentation is an utterly unreliable guide to truth compared to data, and all these meta-level political conclusions require vast amounts of object-level data to induce correct causal models of the world that allow for proper planning and policy.
    1. This means that while the Soviet Union is not evidence for the total failure of "socialism" as I use the word, that's because I define socialism as a larger category of possible economies that strictly contains centralized state planning -- centralized state planning really was, by and large, a total fucking failure.  But there's a rationality lesson here: in politics, all opponents of an idea will have their own definition for it, but the supporters will only have one.  Learn to identify political terminology with the definitions advanced by supporters: these definitions might contain applause lights, but at least they pick out one single spot in policy-space or society-space (or, hopefully, a reasonably small subset of that space), while opponents don't generally agree on which precise point in policy-space or society-space they're actually attacking (because they're all opposed for their own reasons and thus not coordinating with each-other).
    2. This also means that if someone wants to talk about monarchies that rule by religious right, or even about absolute monarchies in general, they do have to account for the behavior of the Arab monarchies today, for example.  Or if they want to talk about religious rule in general (which very few do, to my knowledge, but hey, let's go with it), they actually do have to account for the behavior of Da3esh/ISIS.  Of course, they might do so by endorsing such regimes, just as some members of Western Communist Parties endorsed the Soviet Union -- and this can happen by lack of knowledge, by failure of rationality, or by difference of goals.
    3. And then of course, there are the complications of the real world: in the real world, neither perfect steelman-level central planning nor perfect steelman-level markets have ever been implemented, anywhere, with the result that once upon a time, the Soviet economy was allocatively efficient and prices in capitalist West Germany were just as bad at reflecting relative scarcities as those in centrally-planned East Germany. The real advantage of market systems has ended up being the autonomy of firms, not allocative optimality (and that's being argued, right there, in the single most left-wing magazine I know of!).  Which leads us to repeat the warning: correct conclusions are induced from real-world data, not argued from a priori principles that usually turn out to be wildly mis-emphasized if not entirely wrong.
  5. Learn to notice when otherwise uninformed people are adopting political ideas as attire to gain status by joining a fashionable cause.  Keep in mind that what constitutes "fashionable" depends on the joiner's own place in society, not on your opinions about them.  For some people, things you and I find low-status (certain clothes or haircuts) are, in fact, high-status.  See Yvain's "Republicans are Douchebags" post for an example in a Western context: names that the American Red Tribe considers solid and respectable are viewed by the American Blue Tribe as "douchebag names".
  6. A heuristic that tends to immunize against certain failures of political rationality: if an argument does not base itself at all in facts external to itself or to the listener, but instead concentrates entirely on reinterpreting evidence, then it is probably either an argument about definitions, or sheer nonsense.  This is related to my comments on hierarchical reasoning above, and also to the general sense in which trying to refute an object-level claim by meta-level argumentation is not even wrong, but in fact anti-epistemology.
  7. A further heuristic, usable on actual electioneering campaigns the world over: whenever someone says "values", he is lying, and you should reach for your gun.  The word "values" is the single most overused, drained, meaningless word in politics.  It is a normative pronoun: it directs the listener to fill in warm fuzzy things here without concentrating the speaker and the listener on the same point in policy-space at all.  All over the world, politicians routinely seek power on phrases like "I have values", or "My opponent has no values", or "our values" or "our $TRIBE values", or "$APPLAUSE_LIGHT values".  Just cross those phrases and their entire containing sentences out with a big black marker, and then see what the speaker is actually saying.  Sometimes, if you're lucky (ie: voting for a Democrat), they're saying absolutely nothing.  Often, however, the word "values" means, "Good thing I'm here to tell you that you want this brand new oppressive/exploitative power elite, since you didn't even know!"
  8. As mentioned above, be very, very sure about what ethical framework you're working within before having a political discussion.  A consequentialist and a virtue-ethicist will often take completely different policy positions on, say, healthcare, and have absolutely nothing to talk about with each-other.  The consequentialist can point out the utilitarian gains of universal single-payer care, and the virtue-ethicist can point out the incentive structure of corporate-sponsored group plans for promoting hard work and loyalty to employers, but they are fundamentally talking past each-other.
    1. Often, the core matter of politics is how to trade off between ethical ideals that are otherwise left talking past each-other, because society has finite material resources, human morals are very complex, and real policies have unintended consequences.  For example, if we enact Victorian-style "poor laws" that penalize poverty for virtue-ethical reasons, the proponents of those laws need to be held accountable for accepting the unintended consequences of those laws, including higher crime rates, a less educated workforce, etc.  (This is a broad point in favor of consequentialism: a rational consequentialist always considers consequences, intended and unintended, or he fails at consequentialism.  A deontologist or virtue-ethicist, on the other hand, has license from his own ethics algorithm to not care about unintended consequences at all, provided the rules get followed or the rules or rulers are virtuous.)
  9. Almost all policies can be enacted more effectively with state power, and almost no policies can "take over the world" by sheer superiority of the idea all by themselves.  Demanding that a successful policy should "take over the world" by itself, as everyone naturally turns to the One True Path, is intellectually dishonest, and so is demanding that a policy should be maximally effective in miniature (when tried without the state, or in a small state, or in a weak state) before it is justified for the state to experiment with it.  Remember: the overwhelming majority of journals and conferences in professional science still employ frequentist statistics rather than Bayesianism, and this is 20 years after the PC revolution and the World Wide Web, and 40 years after computers became widespread in universities.  Human beings are utility-satisficing, adaptation-executing creatures with mostly-unknown utility functions: expecting them to adopt more effective policies quickly by mere effectiveness of the policy is downright unrealistic.
  10. The Appeal to Preconceptions is probably the single Darkest form of Dark Arts, and it's used everywhere in politics.  When someone says something to you that "stands to reason" or "sounds right", which genuinely seems quite plausible, actually, but without actually providing evidence, you need to interrogate your own beliefs and find the Equivalent Sample Size of the informative prior generating that subjective plausibility before you let yourself get talked into anything.  This applies triply in philosophy.
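One way to make the "Equivalent Sample Size" in item 10 concrete is the Beta-Binomial model, where a Beta(a, b) prior behaves like a + b previously observed data points. The sketch below is an illustration under that assumed model (the post does not specify one), with made-up counts: a belief that "feels obvious" with an equivalent sample size of 100 barely moves after ten mostly contrary observations, while one worth only 2 observations moves a lot.

```python
def equivalent_sample_size(a, b):
    """Number of pseudo-observations encoded by a Beta(a, b) prior."""
    return a + b

def posterior_mean(a, b, successes, failures):
    """Mean of the Beta posterior after observing binomial data."""
    return (a + successes) / (a + b + successes + failures)

# Hypothetical priors, both with prior mean 0.9 that the claim holds.
strong_prior = (90.0, 10.0)   # equivalent sample size 100
weak_prior = (1.8, 0.2)       # equivalent sample size 2

# Hypothetical evidence: the claim held in 2 of 10 observed cases.
successes, failures = 2, 8

for name, (a, b) in (("strong", strong_prior), ("weak", weak_prior)):
    ess = equivalent_sample_size(a, b)
    post = posterior_mean(a, b, successes, failures)
    print(f"{name} prior: ESS={ess:g}, posterior mean={post:.3f}")
```

Asking "how many actual observations is my sense of plausibility worth?" is the interrogation step: if the honest answer is closer to 2 than to 100, the prior should give way quickly to data.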


Linked decisions and a "nice" solution for the Fermi paradox

1 Beluga 07 December 2014 02:58PM

One of the more speculative solutions to the Fermi paradox is that all civilizations decide to stay home, thereby meta-causing other civilizations to stay home too, and thus allowing the Fermi paradox to have a nice solution. (I remember reading this idea in Paul Almond’s writings about evidential decision theory, which unfortunately no longer seem to be available online.) The plausibility of this argument is definitely questionable. It requires a very high degree of goal convergence both within and among different civilizations. Let us grant this convergence and assume that, indeed, most civilizations arrive at the same decision and that they make their decision knowing this. One paradoxical implication then is: If a civilization decides to attempt space colonization, they are virtually guaranteed to face unexpected difficulties (for otherwise space would already be colonized, unless they are the first civilization in their neighborhood attempting space colonization). If, on the other hand, everyone decides to stay home, there is no reason for thinking that there would be any unexpected difficulties if one tried. Space colonization can either be easy, or you can try it, but not both.

Can the basic idea behind the argument be formalized? Consider the following game: There are N>>1 players. Each player in turn is offered the chance to push a button. Pushing the button yields a reward R>0 with probability p and a punishment P<0 otherwise. (R corresponds to successful space colonization while P corresponds to a failed colonization attempt.) Not pushing the button gives zero utility. If a player pushes the button and receives R, the game ends immediately, while the game continues if a player receives P. Players do not know how many other players were offered the button before them; they only know that no player before them received R. Players also don’t know p. Instead, they have a probability distribution u(p) over possible values of p. (u(p)>=0, and u is normalized: int_{0}^{1}u(p)dp=1.) We also assume that the decisions of the different players are perfectly linked, i.e. all players make the same choice.

Naively, it seems that players simply have an effective success probability p_eff,1=int_{0}^{1}p*u(p)dp and they should push the button iff p_eff,1*R+(1-p_eff,1)*P>0. Indeed, if players decide not to push the button they should expect that pushing the button would have given them R with probability p_eff,1. The situation becomes more complicated if a player decides to push the button. If a player pushes the button, they know that all players before them have also pushed the button and have received P. Before taking this knowledge into account, players are completely ignorant about the number i of players who were offered to push the button before them, and have to assign each number i from 0 to N-1 the same probability 1/N. Taking into account that all players before them have received P, the variables i and p become correlated: the larger i, the higher the probability of a small value of p. Formally, the joint probability distribution w(i,p) for the two variables is, according to Bayes’ theorem, given by w(i,p)=c*u(p)*(1-p)^i, where c is a normalization constant. The marginal distribution w(p) is given by w(p)=sum_{i=0}^{N-1}w(i,p). Using N>>1, we find w(p)=c*u(p)/p. The normalization constant is thus c=[int_{0}^{1}u(p)/p*dp]^{-1}. Finally, we find that the effective success probability taking the linkage of decisions into account is given by

p_eff,2 = int_{0}^{1}p*w(p)dp = c = [int_{0}^{1}u(p)/p*dp]^{-1} .
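
The steps leading to p_eff,2 can be written out as follows. This only restates the derivation above in LaTeX notation; the intermediate geometric-sum expression is filled in here and is not spelled out in the original text, and the approximation in the first line assumes p is held fixed and positive while N becomes large.

```latex
\begin{align*}
w(p) &= \sum_{i=0}^{N-1} c\,u(p)\,(1-p)^i
      = c\,u(p)\,\frac{1-(1-p)^N}{p}
      \approx \frac{c\,u(p)}{p}
      \qquad (N \gg 1), \\
1 &= \int_0^1 w(p)\,dp
  \quad\Longrightarrow\quad
  c = \left[\int_0^1 \frac{u(p)}{p}\,dp\right]^{-1}, \\
p_{\mathrm{eff},2} &= \int_0^1 p\,w(p)\,dp
                    = c\int_0^1 u(p)\,dp
                    = c .
\end{align*}
```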

This is the expected chance of success if players decide to push the button. Players should push the button iff p_eff,2*R+(1-p_eff,2)*P>0. It follows from convexity of the function x->1/x (for positive x), via Jensen’s inequality, that p_eff,2<=p_eff,1. So by deciding to push the button, players decrease their expected success probability from p_eff,1 to p_eff,2; they cannot both push the button and have the unaltered success probability p_eff,1. Linked decisions can explain why no one pushes the button if p_eff,2*R+(1-p_eff,2)*P<0, even though we might have p_eff,1*R+(1-p_eff,1)*P>0 and pushing the button naively seems to have positive expected utility.
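
To make the gap between the two effective probabilities concrete, here is a minimal numerical sketch. Everything specific in it is an assumption chosen for illustration and does not come from the post: the prior u(p) is taken to be uniform on [A, 1] with A = 0.1, the payoffs are R = 1 and P = -1, and the helper names (`integrate`, `effective_probabilities`, `should_push`) are hypothetical.

```python
import math

def integrate(f, lo, hi, steps=100_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / steps
    return h * math.fsum(f(lo + (k + 0.5) * h) for k in range(steps))

def effective_probabilities(u, lo, hi):
    """Return (p_eff_1, p_eff_2) for a prior density u supported on [lo, hi]."""
    # Naive value: the prior mean of p.
    p_eff_1 = integrate(lambda p: p * u(p), lo, hi)
    # Linked-decision value: the inverse of the prior mean of 1/p (large-N limit).
    p_eff_2 = 1.0 / integrate(lambda p: u(p) / p, lo, hi)
    return p_eff_1, p_eff_2

def should_push(p_eff, reward, punishment):
    """Decision rule from the text: push iff expected utility is positive."""
    return p_eff * reward + (1.0 - p_eff) * punishment > 0

# Assumed example prior: uniform density on [A, 1] (not taken from the post).
A = 0.1

def uniform_prior(p):
    return 1.0 / (1.0 - A)

# Assumed example payoffs (not taken from the post).
REWARD, PUNISHMENT = 1.0, -1.0

p1, p2 = effective_probabilities(uniform_prior, A, 1.0)
naive_push = should_push(p1, REWARD, PUNISHMENT)
linked_push = should_push(p2, REWARD, PUNISHMENT)

print(f"p_eff,1 = {p1:.4f}, push? {naive_push}")
print(f"p_eff,2 = {p2:.4f}, push? {linked_push}")
```

For this particular prior the closed forms are p_eff,1 = (1+A)/2 and p_eff,2 = (1-A)/ln(1/A), so the naive calculation recommends pushing while the linked-decision calculation does not.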

It is also worth noting that if u(0)>0, the integral int_{0}^{1}u(p)/p*dp diverges such that we have p_eff,2=0. This means that given perfectly linked decisions and a sufficiently large number of players N>>1, players should never push the button if their distribution u(p) satisfies u(0)>0, irrespective of the ratio of R and P. This is due to an observer selection effect: If a player decides to push the button, then the fact that they are even offered to push the button is most likely due to p being very small and thus a lot of players being offered to push the button.
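
The slow approach to zero can be checked for a finite number of players. The sketch below assumes a uniform prior u(p)=1 on [0,1], which satisfies u(0)>0; that choice and the function name `p_eff_2_uniform` are illustrative assumptions, not part of the post. Starting from the joint distribution w(i,p)=c*u(p)*(1-p)^i without taking the large-N limit, the normalization is the harmonic number H_N and the success mass telescopes to N/(N+1).

```python
import math

def p_eff_2_uniform(n_players):
    """p_eff,2 for a uniform prior on [0, 1] and a finite number of players.

    Uses w(i, p) = c * (1 - p)**i for i = 0 .. n_players - 1:
      normalization:  sum_i int (1-p)^i dp   = sum_i 1/(i+1) = H_N
      success mass:   sum_i int p (1-p)^i dp = N / (N + 1)
    """
    harmonic = math.fsum(1.0 / (i + 1) for i in range(n_players))
    success_mass = n_players / (n_players + 1)
    return success_mass / harmonic

# With a single player there is no selection effect, so the result is the prior mean 0.5.
values = {n: p_eff_2_uniform(n) for n in (1, 10, 1000, 10**6)}
for n, value in values.items():
    print(f"N = {n:>7}: p_eff,2 = {value:.4f}")
```

Under these assumptions p_eff,2 falls roughly like 1/ln(N), so it does go to zero but only logarithmically; for a modest number of players the ratio of R and P can still matter.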

[Link] Eric S. Raymond - Me and Less Wrong

20 philh 05 December 2014 11:44PM


I’ve gotten questions from a couple of different quarters recently about my relationship to the rationalist community around Less Wrong and related blogs. The one sentence answer is that I consider myself a fellow-traveler and ally of that culture, but not really part of it nor particularly wishing to be.

The rest of this post is a slightly longer development of that answer.
