Less Wrong is a community blog devoted to refining the art of human rationality. Please visit our About page for more information.

[Link] The Dominant Life Form In the Cosmos Is Probably Superintelligent Robots

1 Gunnar_Zarncke 20 December 2014 12:28PM

An article on Motherboard reports on Alien Minds by Susan Schneider, who claims that the dominant life form in the cosmos is probably superintelligent robots. The article is cross-linked to other posts about superintelligence and at the end discusses the question of why these alien robots leave us alone. The arguments put forth on this don't convince me, though. 


Velocity of behavioral evolution

3 Dr_Manhattan 19 December 2014 05:34PM

This suggested a major update on the velocity of behavioral trait evolution.

Basically, mice transmitted a fear of cherry smell reliably to the very next generation (via epigenetics). 


This seems pretty important.

Weekly LW Meetups

3 FrankAdamek 19 December 2014 05:27PM

This summary was posted to LW Main on December 12th. The following week's summary is here.

Irregularly scheduled Less Wrong meetups are taking place in:

The remaining meetups take place in cities with regular scheduling, but involve a change in time or location, special meeting content, or simply a helpful reminder about the meetup:

Locations with regularly scheduled meetups: Austin, Berkeley, Berlin, Boston, Brussels, Buffalo, Cambridge UK, Canberra, Columbus, London, Madison WI, Melbourne, Moscow, Mountain View, New York, Philadelphia, Research Triangle NC, Seattle, Sydney, Toronto, Vienna, Washington DC, and West Los Angeles. There's also a 24/7 online study hall for coworking LWers.


Post Resource Request

1 Minds_Eye 19 December 2014 05:04PM

After viewing a recent post by Gworley, I noticed that the material was deliberately opaque*.  It's not as complicated as it seems, and should be teachable to people below "Level 4" on Kegan's Constructive Developmental Theory.  The only serious block I saw was the ridiculous gap in inferential distance. 
With that in mind, I was hoping someone might have recommendations for even tangentially related material, as what I have now appears to be insufficient.  (Simplifying CDT appears to be manageable, but not particularly useful without further material like Kantor's Four Player Model and Subject-Object Notation.)

*edit: Not Gworley's, it was Kegan's material that was opaque.

Letting it go: when you shouldn't respond to someone who is wrong

3 [deleted] 19 December 2014 03:12PM

I'm requesting that people follow a simple guide when determining whether to respond to a post. This simple algorithm should raise the quality of discussion here.

  • If you care about the answer to a question, you will research it.
  • If you don't care about the answer, don't waste people's time by arguing about it, even if someone's post seems wrong.
  • If you don't care and still want to argue, do the research.

Why should you follow these rules?


It takes very little effort to post a contradictory assertion. You just have to skim a post, find an assertion (preferably one that isn't followed or preceded immediately by paragraphs of backing evidence, but that's an optional filter), and craft a sentence indicating that that assertion is wrong or flawed. Humans can do this almost by instinct. It's magical.

Refuting a contradiction takes effort. I typically spend at least five minutes of research and five minutes of writing to make a reply refuting a bare contradiction when I have already studied the issue thoroughly and know which sources I want to use. I go to this effort because I care about these statements I've made and because I care about what other people believe. I want to craft a reply that is sufficiently thorough to be convincing. And, I'll admit, I want to crush my opponents with my impeccable data. I'm a bit petty sometimes.

If I haven't researched the issue well -- if my sources are second-hand, or if I'm using personal experience -- I might spend two hours researching a simple topic and ten to fifteen minutes creating a response. This is a fair amount of time invested. I don't mind doing it; it makes me learn more. It's a time investment, though.

So, let's compare. Half a second of thought and two minutes to craft a reply containing nothing but a contradiction, versus two hours of unpaid research. This is a huge imbalance. Let's address this by trying to research people's claims before posting a contradiction, shall we?


You are convinced that someone's argument is flawed. This means that they have not looked into the issue sufficiently, or their reasoning is wrong. As a result, you can't trust their argument to be a good example of arguments for their position. You can look for flaws of reasoning, which is easy. You can look for cases where their data is misleading or wrong -- but that requires actual effort. You have to either find a consensus in the relevant authorities that differs from what this other person is saying, or you have to look at their specific data in some detail. That means you have to do some research.


If you want people to stick around, and you're brusquely denying their points until they do hours of work to prove them, they're going to view LessWrong as a source of stress. This is not likely to encourage them to return. If you do the legwork yourself, you seem knowledgeable. If you're careful with your phrasing, you can also seem helpful. (I expect that to be the tough part.) This reduces the impact of having someone contradict you.

Advancing the argument.

From what I've seen, the flow of argument goes something like: argument → contradiction of two or three claims → proof of said claims → criticism of proof → rebuttal → acceptance, analysis of argument. By doing some research on your own rather than immediately posting a contradiction, you are more quickly getting to the meat of the issue. You aren't as likely to get sidetracked. You can say things like: "This premise seems a bit contentious, but it's a widely supported minority opinion for good reasons. Let's take it as read for now and see if your conclusions are supported, and we can come back to it if we need to."

Bonus: "You're contradicting yourself."

Spoiler: they're not contradicting themselves.

We read here a lot about how people's brains fail them in myriad interesting ways. Compartmentalization is one of them. People's beliefs can contradict each other. But people tend to compartmentalize between different contexts, not within the same context.

One post or article probably doesn't involve someone using two different compartments. What looks like a contradiction is more likely a nuance that you don't understand or didn't bother to read, or a rhetorical device like hyperbole. (I've seen someone here say I'm contradicting myself when I said "This group doesn't experience this as often, and when they do experience it, it's different." Apparently "not as often" is the same as "never"?) Read over the post again. Look for rhetorical devices. Look for something similar that would make sense. If you're uncertain, try to express that similar argument to the other person and ask if that's what they mean.

If you still haven't found anything besides a bare contradiction, a flat assertion that they're contradicting themselves is a bad way to proceed. If you're wrong and they aren't contradicting themselves, they will be annoyed at you. That's bad enough. They will have to watch everything they say very carefully so that they do not use rhetorical devices or idioms or anything that you could possibly lawyer into a contradiction. This takes a lot more effort than simply writing an argument in common modes of speech, as everyone who's worked on a journal article knows.

Arguing with you is not worth that amount of effort. Don't make it harder than it needs to be.

Munchkining for Fun and Profit, Ideas, Experience, Successes, Failures

3 Username 19 December 2014 05:39AM

A munchkin is someone who follows the letter of the rules of a game while breaking their spirit, someone who wins by exploiting possibilities that others don't see, reject as impossible or unsporting, or just don't believe can possibly work.

If you have done something that everyone around you thought would not work, something that people around you didn't do after they saw it work, please share your experiences. If you tried something and failed or have ideas you want to hear critique of, likewise please share those with us.

[Short, Meta] Should open threads be more frequent?

2 Metus 18 December 2014 11:41PM

Currently, open threads are weekly and very well received. However, they tend to fill up quickly. Personally, I fear that my contribution will drown unless it's posted early on, so I tend to wait if I want to add a new top-level post. Does anyone else have this impression? Someone with better coding skills than me could test this statistically by plotting the number of top-level posts and total posts over time: if the curve is convex, people tend to delay their posts.
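The suggested test can be sketched in a few lines. The timestamps below are made up for illustration; the convexity check simply asks whether the daily increments of the cumulative count are non-decreasing:

```python
# Sketch of the proposed test, with made-up timestamps (hours since the
# open thread went up). If people front-load their posts, the cumulative
# count is concave; if they delay, it is convex.

def cumulative_counts(post_hours, horizon=168, step=24):
    """Number of top-level posts made by each daily checkpoint."""
    return [sum(1 for h in post_hours if h <= t)
            for t in range(step, horizon + 1, step)]

def is_convex(counts):
    """True if successive daily increments are non-decreasing."""
    increments = [b - a for a, b in zip([0] + counts, counts)]
    return all(y >= x for x, y in zip(increments, increments[1:]))

# Hypothetical data: most posts land early in the week (concave),
# versus a week where everyone waits until the end (convex).
front_loaded = [1, 2, 3, 5, 8, 30, 40, 50, 70, 90, 100, 110]
delayed = [100, 120, 130, 140, 150, 152, 155, 158, 160, 161, 162, 163]
print(is_convex(cumulative_counts(front_loaded)))  # False
print(is_convex(cumulative_counts(delayed)))       # True
```

Run on real thread data, a convex curve would support the hypothesis that posters are waiting for the next thread.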

So should open threads be more frequent, and if so, at what frequency?

An explanation of the 'Many Interacting Worlds' theory of quantum mechanics (by Sean Carroll and Chip Sebens)

4 Ander 18 December 2014 11:36PM

This is the first explanation of a 'many worlds' theory of quantum mechanics that has ever made sense to me. The animations are excellent:



Bayes Academy Development Report 2 - improved data visualization

4 Kaj_Sotala 18 December 2014 10:11PM

See here for the previous update if you missed / forgot it.

In this update, no new game content, but new graphics.

I wasn’t terribly happy about the graphical representation of the various nodes in the last update. Especially in the first two networks, if you didn’t read the descriptions of the nodes carefully, it was very easy to just click your way through them without really having a clue of what the network was actually doing. Needless to say, for a game that’s supposed to teach how the networks function, this is highly non-optimal.

Here’s the representation that I’m now experimenting with: the truth table of the nodes is represented graphically inside the node. The prior variable at the top doesn’t really have a truth table, it’s just true or false. The “is” variable at the bottom is true if its parent is true, and false if its parent is false.

You may remember that in the previous update, unobservable nodes were represented in grayscale. I ended up dropping that, because that would have been confusing in this representation: if the parent is unobservable, should the blobs representing its truth values in the child node be in grayscale as well? Both “yes” and “no” answers felt confusing.

Instead the observational state of a node is now represented by its border color. Black for unobservable, gray for observable, no border for observed. The metaphor is supposed to be something like, a border is a veil of ignorance blocking us from seeing the node directly, but if the veil is gray it’s weak enough to be broken, whereas a black veil is strong enough to resist a direct assault. Or something.

When you observe a node, not only does its border disappear, but the truth table entries that get reduced to a zero probability disappear, to be replaced by white boxes. I experimented with having the eliminated entries still show up in grayscale, so you could e.g. see that the “is” node used to contain the entry for (false -> false), but felt that this looked clearer.

The “or” node at the bottom is getting a little crowded, but hopefully not too crowded. Since we know that its value is “true”, the truth table entry showing (false, false -> false) shows up in all whites. It’s also already been observed, so it starts without a border.

After we observe that there’s no monster behind us, the “or” node loses its entries for (monster, !waiting -> looks) and (monster, waiting -> looks), leaving only (!monster, waiting -> looks): meaning that the boy must be waiting for us to answer.
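The update being described can be sketched as a brute-force enumeration over the three-node network (looks = monster OR waiting). The uniform priors here are my own placeholder for illustration, not the game's actual numbers:

```python
from itertools import product

# Placeholder priors; the game's actual probabilities aren't given.
p_monster, p_waiting = 0.5, 0.5

def posterior(evidence):
    """Enumerate the joint over (monster, waiting) for the network
    looks = monster OR waiting, and condition on the evidence dict."""
    weights = {}
    for monster, waiting in product([True, False], repeat=2):
        state = {"monster": monster, "waiting": waiting,
                 "looks": monster or waiting}
        w = (p_monster if monster else 1 - p_monster) * \
            (p_waiting if waiting else 1 - p_waiting)
        if all(state[k] == v for k, v in evidence.items()):
            weights[(monster, waiting)] = w
    total = sum(weights.values())
    return {k: v / total for k, v in weights.items()}

# Seeing "looks" leaves three possible (monster, waiting) worlds;
# additionally ruling out the monster leaves only (False, True):
# the boy must be waiting.
print(posterior({"looks": True}))
print(posterior({"looks": True, "monster": False}))
```

The second query collapses to a single world with probability 1, mirroring how the node's last remaining truth-table entry forces the "waiting" value.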

This could still be made clearer: currently the network updates instantly. I’m thinking about adding a brief animation where the “monster” variable would first be revealed as false, which would then propagate an update to the values of “looks at you” (with e.g. the red tile in “monster” blinking at the same time as the now-invalid truth table entries, and when the tiles stopped blinking, those now-invalid entries would have disappeared), and that would in turn propagate the update to the “waiting” node, deleting the red color from it. But I haven’t yet implemented this.

The third network is where things get a little tricky. The “attacking” node is of type “majority vote” - i.e. it’s true if at least two of its parents are true, and false otherwise. That would make for a truth table with eight entries, each holding four blobs each, and we could already see the “or” node in the previous screen being crowded. I’m not quite sure of what to do here. At this moment I’m thinking of just leaving the node as is, and displaying more detailed information in the sidebar.
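For concreteness, the eight-entry table mentioned above is just the truth table of a three-parent majority vote, which can be sketched directly:

```python
from itertools import product

# The "majority vote" node: true iff at least two of its three parents are true.
def majority(*parents):
    return sum(parents) >= 2

# All eight parent combinations -- the table the text says would be crowded.
table = {bits: majority(*bits) for bits in product([False, True], repeat=3)}
for bits, value in sorted(table.items()):
    print(bits, "->", value)
```

Exactly half of the eight combinations make the node true, which is part of why the full table carries more detail than the node icon comfortably fits.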

Here’s another possible problem. Just having the truth table entries works fine to make it obvious where the overall probability of the node comes from… for as long as the valid values of the entries are restricted to “possible” and “impossible”. Then you can see at a glance that, say, of the three possible entries, two would make this node true and one would make this false, so there’s a ⅔ chance of it being true.

But in this screen, that has ceased to be the case. The “attacking” node has a 75% chance of being true, meaning that, for instance, the “is / block” node’s “true -> true” entry also has a 75% chance of being the right one. This isn’t reflected in the truth table visualization. I thought of adding small probability bars under each truth table entry, or having the size of the truth table blobs reflect their probability, but then I’d have to make the nodes even bigger, and it feels like it would easily start looking cluttered again. But maybe it’d be the right choice anyway? Or maybe just put the more detailed information in the sidebar? I’m not sure of the best thing to do here.

If anyone has good suggestions, I would be grateful to get advice from people who have more of a visual designer gene than I do!

Rationality Jokes Thread

6 Gunnar_Zarncke 18 December 2014 04:17PM

This is an experimental thread. It is somewhat in the spirit of the Rationality Quotes Thread but without the requirements and with a focus on humorous value. You may post insightful jokes, nerd or math jokes or try out rationality jokes of your own invention. 

ADDED: Apparently there has been an earlier Jokes Thread which was fairly successful. Consider this another instance.

(Very Short) PSA: Combined Main and Discussion Feed

10 Gondolinian 18 December 2014 03:46PM

For anyone who's annoyed by having to check newest submissions for Main and Discussion separately, there is a feed for combined submissions from both, in the form of Newest Submissions - All (RSS feed).  (There's also Comments - All (RSS feed), but for me at least, it seems to only show comments from Main and none from Discussion.)

Thanks to RichardKennaway for bringing this to my attention, and to Unknowns for asking the question that prompted him.  (If you've got the time, head over there and give them some karma.)  I thought this deserved the visibility of a post in Discussion, as not everyone reads through the Open Thread, and I think there's a chance that many would benefit from this information.

How many words do we have and how many distinct concepts do we have?

-4 MazeHatter 17 December 2014 11:04PM

In another message, I suggested that, given how many cultures we have to borrow from, our language may include multiple words from various sources that apply to a single concept.

An example is Reality, or Existence, or Being, or Universe, or Cosmos, or Nature, etc.

Another is Subjectivity, Mind, Consciousness, Experience, Qualia, Phenomenal, Mental, etc.

Is there any problem with accepting these claims so far? Curious what case would be made to the contrary.

(Here's a bit of a contextual aside: between quantum mechanics and cosmology, the words "universe", "multiverse", and "observable universe" mean at least 10 different things, depending on who you ask. People often say the Multiverse comes from Hugh Everett. But what they are calling the multiverse, Everett called the "universal wave function", or "universe". How did Everett's universe become the Multiverse? DeWitt came along and emphasized some part of the wave function branching into different worlds. So, if you're following: one Universe, many worlds. Over the next few decades, this idea was popularized as having "many parallel universes", which is obviously inaccurate. Well, a Scottish chap decided to correct this. He stated the Universe was the Universal Wave Function, where it was "a complete one", because that's what "uni" means. And that our perceived worlds of various objects is a "multiverse". One Universe, many Multiverses. Again, the "parallel universes" idea seemed cooler, so as it became more popular the Multiverse became one and the universe became many. What's my point? The use of these words is a legitimate fiasco, and I suggest we abandon them altogether.)

If these claims are found to be palatable, what do they suggest?

I propose, respectfully and humbly as I can imagine there may be compelling alternatives presented here, that in the 21st century, we make a decision about which concepts are necessary, which term we will use to describe that concept, and respectfully leave the remaining terms for the domain of poetry.

Here are the words I think we need:

  1. reality
  2. model
  3. absolute
  4. relative
  5. subjective
  6. objective
  7. measurement
  8. observer

With these terms I feel we can construct a concise metaphysical framework, consistent with the great rationalists of history, that accurately describes Everett's "Relative State Formulation of Quantum Mechanics".

  1. Absolute reality is what is. It is relative to no observer. It is real prior to measurement.
  2. Subjective reality is what is, relative to a single observer. It exists at measurement.
  3. Objective reality is the model relative to all observers. It exists post-measurement.

Everett's Relative State formulation is roughly this:

  1. The wave function is the "absolute state" of the model
  2. The wave function contains an observer and their measurement apparatus
  3. An observer makes a measurement and records the result in a memory
  4. Those measurement records are the "relative state" of the model

Here we see that the words multiverse and universe are abandoned for absolute and relative states, which is actually the language used in the Relative State Formulation.

My conclusion then, for your consideration and comment, is that a technical view of reality can be attained by having a select set of terms, and this view is not only consistent with themes of philosophy (which I didn't really explain) but is also the proper framework in which to interpret quantum mechanics (à la Everett).

(I'm not sure how familiar everyone here is with Everett specifically. His thesis depended on "automatically functioning machines" that make measurements with sensory gear and record them. After receiving his PhD, he left theoretical physics and had a lifelong fascination with computer vision and computer hearing. That suggests to me that the reason his papers have been largely confounding to physicists is that they didn't realize the extent to which Everett really thought he could mathematically model an observer.)

I should note, it may clarify things to add another term, "truth", though this would in general be taken as an analog of "real". For example, if something is absolutely true, then it is of absolute reality. If something is objectively true, then it is of objective reality. The word "knowledge" in this sense is a poetic word for objective truth, understood on the premise that objective truth is not absolute truth.

Giving What We Can - New Year drive

8 Smaug123 17 December 2014 03:26PM

If you’ve been planning to get around to maybe thinking about Effective Altruism, we’re making your job easier. A group of UK students has set up a drive for people to sign up to the Giving What We Can pledge to donate 10% of their future income to charity. It does not specify the charities - that decision remains under your control. The pledge is not legally binding, but honour is a powerful force when it comes to promising to help. If 10% is a daunting number, or you don't want to sign away your future earnings in perpetuity, there is a Try Giving scheme in which you may donate less money for less time. I suggest five years (that is, from 2015 to 2020) of 5% as a suitable "silver" option to the 10%-until-retirement "gold medal".


We’re hoping to take advantage of the existing Schelling point of “new year” as a time for resolutions, as well as building the kind of community spirit that gets people signing up in groups. If you feel it’s a word worth spreading, please feel free to spread it. As of this writing, GWWC reported 41 new members this month, which is a record for monthly acquisitions (and we’re only halfway through the month, three days into the event).


If anyone has suggestions about how to better publicise this event (or Effective Altruism generally), please do let me know. We’re currently talking to various news outlets and high-profile philanthropists to see if they can give us a mention, but suggestions are always welcome. Likewise, comments on the effectiveness of this post itself will be gratefully noted.


About Giving What We Can: GWWC is under the umbrella of the Centre for Effective Altruism, was co-founded by a LessWronger, and in 2013 had verbal praise from lukeprog.

Entropy and Temperature

26 spxtr 17 December 2014 08:04AM

Eliezer Yudkowsky previously wrote (6 years ago!) about the second law of thermodynamics. Many commenters were skeptical about the statement, "if you know the positions and momenta of every particle in a glass of water, it is at absolute zero temperature," because they don't know what temperature is. This is a common confusion.


To specify the precise state of a classical system, you need to know its location in phase space. For a bunch of helium atoms whizzing around in a box, phase space is the position and momentum of each helium atom. For N atoms in the box, that means 6N numbers to completely specify the system.

Let's say you know the total energy of the gas, but nothing else. It will be the case that a fantastically huge number of points in phase space will be consistent with that energy.* In the absence of any more information it is correct to assign a uniform distribution to this region of phase space. The entropy of a uniform distribution is the logarithm of the number of points, so that's that. If you also know the volume, then the number of points in phase space consistent with both the energy and volume is necessarily smaller, so the entropy is smaller.

This might be confusing to chemists, since they memorized a formula for the entropy of an ideal gas, and it's ostensibly objective. Someone with perfect knowledge of the system will calculate the same number on the right side of that equation, but to them, that number isn't the entropy. It's the entropy of the gas if you know nothing more than energy, volume, and number of particles.


The existence of temperature follows from the zeroth and second laws of thermodynamics: thermal equilibrium is transitive, and entropy is maximum in equilibrium. Temperature is then defined as the thermodynamic quantity that is shared by systems in equilibrium.

If two systems are in equilibrium then they cannot increase entropy by flowing energy from one to the other. That means that if we flow a tiny bit of energy from one to the other (δU1 = -δU2), the entropy change in the first must be the opposite of the entropy change of the second (δS1 = -δS2), so that the total entropy (S1 + S2) doesn't change. For systems in equilibrium, this leads to (∂S1/∂U1) = (∂S2/∂U2). Define 1/T = (∂S/∂U), and we are done.
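The argument in this paragraph can be compressed into one line of algebra, using the same symbols as above:

```latex
% Equilibrium means a small energy exchange cannot change the total entropy.
\delta S_{\mathrm{tot}}
  = \frac{\partial S_1}{\partial U_1}\,\delta U_1
  + \frac{\partial S_2}{\partial U_2}\,\delta U_2 = 0,
\qquad \delta U_2 = -\delta U_1
\;\Longrightarrow\;
\frac{\partial S_1}{\partial U_1} = \frac{\partial S_2}{\partial U_2}
\equiv \frac{1}{T}.
```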

Temperature is sometimes taught as, "a measure of the average kinetic energy of the particles," because for an ideal gas U/N = (3/2) kBT. This is wrong, for the same reason that the ideal gas entropy isn't the definition of entropy.

Probability is in the mind. Entropy is a function of probabilities, so entropy is in the mind. Temperature is a derivative of entropy, so temperature is in the mind.

Second Law Trickery

With perfect knowledge of a system, it is possible to extract all of its energy as work. EY states it clearly:

So (again ignoring quantum effects for the moment), if you know the states of all the molecules in a glass of hot water, it is cold in a genuinely thermodynamic sense: you can take electricity out of it and leave behind an ice cube.

Someone who doesn't know the state of the water will observe a violation of the second law. This is allowed. Let that sink in for a minute. Jaynes calls it second law trickery, and I can't explain it better than he does, so I won't try:

A physical system always has more macroscopic degrees of freedom beyond what we control or observe, and by manipulating them a trickster can always make us see an apparent violation of the second law.

Therefore the correct statement of the second law is not that an entropy decrease is impossible in principle, or even improbable; rather that it cannot be achieved reproducibly by manipulating the macrovariables {X1, ..., Xn} that we have chosen to define our macrostate. Any attempt to write a stronger law than this will put one at the mercy of a trickster, who can produce a violation of it.

But recognizing this should increase rather than decrease our confidence in the future of the second law, because it means that if an experimenter ever sees an apparent violation, then instead of issuing a sensational announcement, it will be more prudent to search for that unobserved degree of freedom. That is, the connection of entropy with information works both ways; seeing an apparent decrease of entropy signifies ignorance of what were the relevant macrovariables.


I've actually given you enough information on statistical mechanics to calculate an interesting system. Say you have N particles, each fixed in place to a lattice. Each particle can be in one of two states, with energies 0 and ε. Calculate and plot the entropy if you know the total energy: S(E), and then the energy as a function of temperature: E(T). This is essentially a combinatorics problem, and you may assume that N is large, so use Stirling's approximation. What you will discover should make sense using the correct definitions of entropy and temperature.
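For readers who want to check their answer numerically rather than analytically, here is a sketch; the particular N and ε are arbitrary, and lgamma plays the role of Stirling's approximation for the log-binomial:

```python
import math

# Numerical sketch of the exercise: N two-state particles with energies 0
# and eps. With n particles excited, E = n*eps and S(E) = k ln C(N, n).

N, eps, k = 10**6, 1.0, 1.0

def entropy(n):
    """S(E = n*eps) = k ln [N choose n], via log-gamma."""
    return k * (math.lgamma(N + 1) - math.lgamma(n + 1)
                - math.lgamma(N - n + 1))

def temperature(n):
    """1/T = dS/dE, estimated by a finite difference in n (dE = eps)."""
    dS_dE = (entropy(n + 1) - entropy(n)) / eps
    return 1.0 / dS_dE

# Entropy peaks at half filling; try temperatures on either side of it
# and see whether the result makes sense given the definitions above.
assert entropy(N // 2) > entropy(N // 4) > entropy(0)
print(temperature(N // 4))   # below half filling
print(temperature(3 * N // 4))  # above half filling
```

Comparing the two printed temperatures against the definition 1/T = ∂S/∂U is the point of the exercise, so I won't spell out the conclusion here.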

*: How many combinations of 10^23 numbers between 0 and 10 add up to 5×10^23?

"incomparable" outcomes--multiple utility functions?

3 Emanresu 17 December 2014 12:06AM

I know that this idea might sound a little weird at first, so just hear me out please?

A couple weeks ago I was pondering decision problems where a human decision maker has to choose between two acts that lead to two "incomparable" outcomes. I thought, if outcome A is not more preferred than outcome B, and outcome B is not more preferred than outcome A, then of course the decision maker is indifferent between both outcomes, right? But if that's the case, the decision maker should be able to just flip a coin to decide. Not only that, but adding even a tiny amount of extra value to one of the outcomes should always make that outcome be preferred. So why can't a human decision maker just make up their mind about their preferences between "incomparable" outcomes until they're forced to choose between them? Also, if a human decision maker is really indifferent between both outcomes, then they should be able to know that ahead of time and have a plan for deciding, such as flipping a coin. And, if they're really indifferent between both outcomes, then they should not be regretting and/or doubting their decision before an outcome even occurs regardless of which act they choose. Right?

I thought of the idea that maybe the human decision maker has multiple utility functions that when you try to combine them into one function some parts of the original functions don't necessarily translate well. Like some sort of discontinuity that corresponds to "incomparable" outcomes, or something. Granted, it's been a while since I've taken Calculus, so I'm not really sure how that would look on a graph.

I had read Yudkowsky's "Thou Art Godshatter" a couple months ago, and there was a point where it said "one pure utility function splintered into a thousand shards of desire". That sounds like the "shards of desire" are actually a bunch of different utility functions.

I'd like to know what others think of this idea. Strengths? Weaknesses? Implications?

Superintelligence 14: Motivation selection methods

5 KatjaGrace 16 December 2014 02:00AM

This is part of a weekly reading group on Nick Bostrom's book, Superintelligence. For more information about the group, and an index of posts so far see the announcement post. For the schedule of future topics, see MIRI's reading guide.

Welcome. This week we discuss the fourteenth section in the reading guide: Motivation selection methods. This corresponds to the second part of Chapter Nine.

This post summarizes the section and offers a few relevant notes and ideas for further investigation. Some of my own thoughts and questions for discussion are in the comments.

There is no need to proceed in order through this post, or to look at everything. Feel free to jump straight to the discussion. Where applicable and I remember, page numbers indicate the rough part of the chapter that is most related (not necessarily that the chapter is being cited for the specific claim).

Reading: “Motivation selection methods” and “Synopsis” from Chapter 9.


  1. One way to control an AI is to design its motives. That is, to choose what it wants to do (p138)
  2. Some varieties of 'motivation selection' for AI safety:
    1. Direct specification: figure out what we value, and code it into the AI (p139-40)
      1. Isaac Asimov's 'three laws of robotics' are a famous example
      2. Direct specification might be fairly hard: both figuring out what we want and coding it precisely seem hard
      3. This could be based on rules, or something like consequentialism
    2. Domesticity: the AI's goals limit the range of things it wants to interfere with (140-1)
      1. This might make direct specification easier, as the world the AI interacts with (and thus which has to be thought of in specifying its behavior) is simpler.
      2. Oracles are an example
      3. This might be combined well with physical containment: the AI could be trapped, and also not want to escape.
    3. Indirect normativity: instead of specifying what we value, specify a way to specify what we value (141-2)
      1. e.g. extrapolate our volition
      2. This means outsourcing the hard intellectual work to the AI
      3. This will mostly be discussed in chapter 13 (weeks 23-5 here)
    4. Augmentation: begin with a creature with desirable motives, then make it smarter, instead of designing good motives from scratch. (p142)
      1. e.g. brain emulations are likely to have human desires (at least at the start)
      2. Whether we use this method depends on the kind of AI that is developed, so usually we won't have a choice about whether to use it (except inasmuch as we have a choice about e.g. whether to develop uploads or synthetic AI first).
  3. Bostrom provides a summary of the chapter:
  4. The question is not which control method is best, but rather which set of control methods are best given the situation. (143-4)

Another view


Would you say there's any ethical issue involved with imposing limits or constraints on a superintelligence's drives/motivations? By analogy, I think most of us have the moral intuition that technologically interfering with an unborn human's inherent desires and motivations would be questionable or wrong, supposing that were even possible. That is, say we could genetically modify a subset of humanity to be cheerful slaves; that seems like a pretty morally unsavory prospect. What makes engineering a superintelligence specifically to serve humanity less unsavory?


1. Bostrom tells us that it is very hard to specify human values. We have seen examples of galaxies full of paperclips or fake smiles resulting from poor specification. But these - and Isaac Asimov's stories - seem to tell us only that a few people spending a small fraction of their time thinking do not produce any watertight specification. What if a thousand researchers spent a decade on it? Are the millionth most obvious attempts at specification nearly as bad as the most obvious twenty? How hard is it? A general argument for pessimism is the thesis that 'value is fragile', i.e. that if you specify what you want very nearly but get it a tiny bit wrong, the result is likely to be almost worthless. Much like if you get one digit wrong in a phone number. The degree to which this is so (with respect to value, not phone numbers) is controversial. I encourage you to try to specify a world you would be happy with (to see how hard it is, or to produce something of value if it isn't that hard).

2. If you'd like a taste of indirect normativity before the chapter on it, the LessWrong wiki page on coherent extrapolated volition links to a bunch of sources.

3. The idea of 'indirect normativity' (i.e. outsourcing the problem of specifying what an AI should do, by giving it some good instructions for figuring out what you value) brings up the general question of just what an AI needs to be given to be able to figure out how to carry out our will. An obvious contender is a lot of information about human values. Though some people disagree with this - these people don't buy the orthogonality thesis. Other issues sometimes suggested to need working out ahead of outsourcing everything to AIs include decision theory, priors, anthropics, feelings about Pascal's mugging, and attitudes to infinity. MIRI's technical work often fits into this category.

4. Danaher's last post on Superintelligence (so far) is on motivation selection. It mostly summarizes and clarifies the chapter, so is mostly good if you'd like to think about the question some more with a slightly different framing. He also previously considered the difficulty of specifying human values in The golem genie and unfriendly AI (parts one and two), which is about Intelligence Explosion and Machine Ethics.

5. Brian Clegg thinks Bostrom should have discussed Asimov's stories at greater length:

I think it’s a shame that Bostrom doesn’t make more use of science fiction to give examples of how people have already thought about these issues – he gives only half a page to Asimov and the three laws of robotics (and how Asimov then spends most of his time showing how they’d go wrong), but that’s about it. Yet there has been a lot of thought and dare I say it, a lot more readability than you typically get in a textbook, put into the issues in science fiction than is being allowed for, and it would have been worthy of a chapter in its own right.

If you haven't already, you might consider (sort-of) following his advice, and reading some science fiction.

In-depth investigations

If you are particularly interested in these topics, and want to do further research, these are a few plausible directions, some inspired by Luke Muehlhauser's list, which contains many suggestions related to parts of Superintelligence. These projects could be attempted at various levels of depth.

  1. Can you think of novel methods of specifying the values of one or many humans?
  2. What are the most promising methods for 'domesticating' an AI? (i.e. constraining it to only care about a small part of the world, and not want to interfere with the larger world to optimize that smaller part).
  3. Think more carefully about the likely motivations of drastically augmented brain emulations

If you are interested in anything like this, you might want to mention it in the comments, and see whether other people have useful thoughts.

How to proceed

This has been a collection of notes on the chapter.  The most important part of the reading group though is discussion, which is in the comments section. I pose some questions for you there, and I invite you to add your own. Please remember that this group contains a variety of levels of expertise: if a line of discussion seems too basic or too incomprehensible, look around for one that suits you better!

Next week, we will start to talk about a variety of more and less agent-like AIs: 'oracles', 'genies' and 'sovereigns'. To prepare, read “Oracles” and “Genies and Sovereigns” from Chapter 10. The discussion will go live at 6pm Pacific time next Monday, 22nd December. Sign up to be notified here.

How many people am I?

4 Manfred 15 December 2014 06:11PM

Strongly related: the Ebborians

Imagine mapping my brain into two interpenetrating networks. For each brain cell, half of it goes to one map and half to the other. For each connection between cells, half of each connection goes to one map and half to the other. We can call these two mapped-out halves Manfred One and Manfred Two. Because neurons are classical, both of these maps change together as I think. They contain the full pattern of my thoughts. (This situation is even more clear in the Ebborians, who can literally split down the middle.)

So how many people am I? Are Manfred One and Manfred Two both people? Of course, once we have two, why stop there - are there thousands of Manfreds in here, with "me" as only one of them? Put like that it sounds a little overwrought - what's really going on here is the question of what physical system corresponds to "I" in English statements like "I wake up." This may matter.

The impact on anthropic probabilities is somewhat straightforward. With everyday definitions of "I wake up," I wake up just once per day no matter how big my head is. But if the "I" in that sentence is some constant-size physical pattern, then "I wake up" is an event that happens more times if my head is bigger. And so using the variable people-number definition, I expect to wake up with a gigantic head.

The impact on decisions is less big. If I'm in this head with a bunch of other Manfreds, we're all on the same page - it's a non-anthropic problem of coordinated decision-making. For example, if I were to make any monetary bets about my head size, and then donate profits to charity, no matter what definition I'm using, I should bet as if my head size didn't affect anthropic probabilities. So to some extent the real point of this effect is that it is a way anthropic probabilities can be ill-defined. On the other hand, what about preferences that depend directly on person-numbers like how to value people with different head sizes? Or for vegetarians, should we care more about cows than chickens, because each cow is more animals than a chicken is?


According to my common sense, it seems like my body has just one person in it. Why does my common sense think that? I think there are two answers, one unhelpful and one helpful.

The first answer is evolution. Having kids is an action that's independent of what physical system we identify with "I," and so my ancestors never found modeling their bodies as being multiple people useful.

The second answer is causality. Manfred One and Manfred Two are causally distinct from two copies of me in separate bodies but with the same input/output. If a difference between the two separated copies arose somehow (reminiscent of Dennett's factual account), the two bodies would henceforth do and say different things and have different brain states. But if some difference arises between Manfred One and Manfred Two, it is erased by diffusion.

Which is to say, the map that is Manfred One is statically the same pattern as my whole brain, but it's causally different. So is "I" the pattern, or is "I" the causal system? 

In this sort of situation I am happy to stick with common sense, and thus when I say "me," I think I'm referring to the causal system. But I'm not very sure.


Going back to the Ebborians, one interesting thing about that post is the conflict between common sense and common sense - it seems like common sense that each Ebborian is equally much one person, but it also seems like common sense that if you looked at an Ebborian dividing, there doesn't seem to be a moment where the amount of subjective experience should change, and so amount of subjective experience should be proportional to thickness. But as it is said, just because there are two opposing ideas doesn't mean one of them is right.

On the questions of subjective experience raised in that post, I think this mostly gets cleared up by precise description and anthropic narrowness. I'm unsure of the relative sizes of this margin and the proof, but the sketch is to replace a mysterious "subjective experience" that spans copies with the individual experiences of people who are using a TDT-like theory to choose so that they individually achieve good outcomes given their existence.

Has LessWrong Ever Backfired On You?

24 Evan_Gaensbauer 15 December 2014 05:44AM

Several weeks ago I wrote a heavily upvoted post called Don't Be Afraid of Asking Personally Important Questions on LessWrong. I thought it would only be due diligence to track users on LessWrong who have received advice on this site that backfired. In other words, to keep the record unbiased, we should notice what LessWrong as a community is bad at giving advice about. So, I'm seeking feedback. If you have anecdotes or data about how a plan or advice directly from LessWrong backfired, failed, or didn't lead to satisfaction, please share below.

Group Rationality Diary, December 16-31

3 therufs 15 December 2014 03:30AM

This is the public group rationality diary for December 16-31.

It's a place to record and chat about it if you have done, or are actively doing, things like: 

  • Established a useful new habit
  • Obtained new evidence that made you change your mind about some belief
  • Decided to behave in a different way in some set of situations
  • Optimized some part of a common routine or cached behavior
  • Consciously changed your emotions or affect with respect to something
  • Consciously pursued new valuable information about something that could make a big difference in your life
  • Learned something new about your beliefs, behavior, or life that surprised you
  • Tried doing any of the above and failed

Or anything else interesting which you want to share, so that other people can think about it, and perhaps be inspired to take action themselves. Try to include enough details so that everyone can use each other's experiences to learn about what tends to work out, and what doesn't tend to work out.

Thanks to cata for starting the Group Rationality Diary posts, and to commenters for participating.

Previous diary: December 1-15

Rationality diaries archive

Open thread, Dec. 15 - Dec. 21, 2014

2 Gondolinian 15 December 2014 12:01AM

If it's worth saying, but not worth its own post (even in Discussion), then it goes here.

Previous Open Thread

Notes for future OT posters:

1. Please add the 'open_thread' tag.

2. Check if there is an active Open Thread before posting a new one. (Immediately before; refresh the list-of-threads page before posting.)

3. Open Threads should be posted in Discussion, and not Main.

4. Open Threads should start on Monday, and end on Sunday.

Podcast: Rationalists in Tech

12 JoshuaFox 14 December 2014 04:14PM
I'd appreciate feedback on a new podcast, Rationalists in Tech.

I'm interviewing founders, executives, CEOs, consultants, and other people in the tech sector, mostly software. Thanks to Laurent Bossavit, Daniel Reeves, and Alexei Andreev who agreed to be the guinea pigs for this experiment. 
  • The audience:
Software engineers and other tech workers, at all levels of seniority.
  • The hypothesized need
Some of you are thinking: "I see that some smart and fun people hang out at LessWrong. It's hard to find people like that to work with. I wonder if my next job/employee/cofounder could come from that community."
  • What this podcast does for you
You will get insights into other LessWrongers as real people in the software profession. (OK, you knew that, but this helps.) You will hear the interviewees' ideas on CfAR-style techniques as a productivity booster, on working with other aspiring rationalists, and on the interviewees' own special areas of expertise. (At the same time, interviewees benefit from exposure that can get them business contacts,  employees, or customers.) Software engineers from LW will reach out to interviewees and others in the tech sector, and soon, more hires and startups will emerge. 

Please give your feedback on the first episodes of the podcast. Do you want to hear more? Should there be other topics? A different interview style? Better music?

Discussion of AI control over at worldbuilding.stackexchange [LINK]

6 ike 14 December 2014 02:59AM


Go insert some rationality into the discussion! (There are actually some pretty good comments in there, and some links to the right places, including LW).

[Resolved] Is the SIA doomsday argument wrong?

5 Brian_Tomasik 13 December 2014 06:01AM

[EDIT: I think the SIA doomsday argument works after all, and my objection to it was based on framing the problem in a misguided way. Feel free to ignore this post or skip to the resolution at the end.]


Katja Grace has developed a kind of doomsday argument from SIA combined with the Great Filter. It has been discussed by Robin Hanson, Carl Shulman, and Nick Bostrom. The basic idea is that if the filter comes late, there are more civilizations with organisms like us than if the filter comes early, and more organisms in positions like ours means a higher expected number of (non-fake) experiences that match ours. (I'll ignore simulation-argument possibilities in this post.)

I used to agree with this reasoning. But now I'm not sure, and here's why. Your subjective experience, broadly construed, includes knowledge of a lot of Earth's history and current state, including when life evolved, which creatures evolved, the Earth's mass and distance from the sun, the chemical composition of the soil and atmosphere, and so on. The information that you know about your planet is sufficient to uniquely locate you within the observable universe. Sure, there might be exact copies of you in vastly distant Hubble volumes, and there might be many approximate copies of Earth in somewhat nearer Hubble volumes. But within any reasonable radius, probably what you know about Earth requires that your subjective experiences (if veridical) could only take place on Earth, not on any other planet in our Hubble volume.

If so, then whether there are lots of human-level extraterrestrials (ETs) or none doesn't matter anthropically, because none of those ETs within any reasonable radius could contain your exact experiences. No matter how hard or easy the emergence of human-like life is in general, it can happen on Earth, and your subjective experiences can only exist on Earth (or some planet almost identical to Earth).

A better way to think about SIA is that it favors hypotheses containing more copies of our Hubble volume within the larger universe. Within a given Hubble volume, there can be at most one location where organisms veridically perceive what we perceive.

Katja's blog post on the SIA doomsday draws orange boxes with humans waving their hands. She has us update on knowing we're in the human-level stage, i.e., that we're one of those orange boxes. But we know much more: We know that we're a particular one of those boxes, which is easily distinguished from the others based on what we observe about the world. So any hypothesis that contains us at all will have the same number of boxes containing us (namely, just one box). Hence, no anthropic update.

Am I missing something? :)



The problem with my argument was that I compared the hypothesis "filter is early and you exist on Earth" against "filter is late and you exist on Earth". If the hypotheses already say that you exist on Earth, then there's no more anthropic work to be done. But the heart of the anthropic question is whether an early or late filter predicts that you exist on Earth at all.

Here's an oversimplified example. Suppose that the hypothesis of "early filter" tells us that there are four planets, exactly one of which contains life. "Late filter" says there are four planets, all of which contain life. Suppose for convenience that if life exists on Earth at all, you will exist on Earth. Then P(you exist | early filter) = 1/4 while P(you exist | late filter) = 1. This is where the doomsday update comes from.
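The update in this oversimplified example is just Bayes' theorem applied to the two filter hypotheses. As a minimal sketch (assuming, purely for illustration, a 50/50 prior over early vs. late filter; the likelihoods are the ones given above):

```python
# Toy version of the SIA doomsday update, using the four-planet example.
# Assumption (not from the post): a uniform prior over the two hypotheses.

def posterior_early(prior_early, p_exist_early, p_exist_late):
    """P(early filter | you exist), by Bayes' theorem."""
    joint_early = prior_early * p_exist_early
    joint_late = (1 - prior_early) * p_exist_late
    return joint_early / (joint_early + joint_late)

# Early filter: life on 1 of 4 planets -> P(you exist | early) = 1/4.
# Late filter: life on all 4 planets  -> P(you exist | late) = 1.
p = posterior_early(prior_early=0.5, p_exist_early=0.25, p_exist_late=1.0)
print(p)  # 0.125 / (0.125 + 0.5) = 0.2: the evidence favors a late filter
```

Observing your own existence moves the early-filter hypothesis from 50% down to 20%, which is exactly the doomsday-flavored shift toward "the filter is still ahead of us."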

A forum for researchers to publicly discuss safety issues in advanced AI

12 RobbBB 13 December 2014 12:33AM

MIRI has an organizational goal of putting a wider variety of mathematically proficient people in a position to advance our understanding of beneficial smarter-than-human AI. The MIRIx workshops, our new research guide, and our more detailed in-the-works technical agenda are intended to further that goal.

To encourage the growth of a larger research community where people can easily collaborate and get up to speed on each other's new ideas, we're also going to roll out an online discussion forum that's specifically focused on resolving technical problems in Friendly AI. MIRI researchers and other interested parties will be able to have more open exchanges there, and get rapid feedback on their ideas and drafts. A relatively small group of people with relevant mathematical backgrounds will be authorized to post on the forum, but all discussion on the site will be publicly visible to visitors.

Topics will run the gamut from logical uncertainty in formal agents to cognitive models of concept generation. The exact range of discussion topics is likely to evolve over time as researchers' priorities change and new researchers join the forum.

We're currently tossing around possible names for the forum, and I wanted to solicit LessWrong's input, since you've been helpful here in the past. (We're also getting input from non-LW mathematicians and computer scientists.) We want to know how confusing, apt, etc. you perceive these variants on 'forum for doing exploratory engineering research in AI' to be:

1. AI Exploratory Research Forum (AIXRF)

2. Forum for Exploratory Engineering in AI (FEEAI)

3. Forum for Exploratory Research in AI (FERAI, or FXRAI)

4. Exploratory AI Research Forum (XAIRF, or EAIRF)

We're also looking at other name possibilities, including:

5. AI Foundations Forum (AIFF)

6. Intelligent Agent Foundations Forum (IAFF)

7. Reflective Agents Research Forum (RARF)

We're trying to avoid names like "friendly" and "normative" that could reinforce someone's impression that we think of AI risk in anthropomorphic terms, that we're AI-hating technophobes, or that we're moral philosophers.

Feedback on the above ideas is welcome, as are new ideas. Feel free to post separate ideas in separate comments, so they can be upvoted individually. We're especially looking for feedback along the lines of: 'I'm a grad student in theoretical computer science and I feel that the name [X] would look bad in a comp sci bibliography or C.V.' or 'I'm friends with a lot of topologists, and I'm pretty sure they'd find the name [Y] unobjectionable and mildly intriguing; I don't know how well that generalizes to mathematical logicians.'

Approval-directed agents

8 paulfchristiano 12 December 2014 10:38PM

Most concern about AI comes down to the scariness of goal-oriented behavior. A common response to such concerns is “why would we give an AI goals anyway?” I think there are good reasons to expect goal-oriented behavior, and I’ve been on that side of a lot of arguments. But I don’t think the issue is settled, and it might be possible to get better outcomes without them. I flesh out one possible alternative here, based on the dictum "take the action I would like best" rather than "achieve the outcome I would like best."

(As an experiment I wrote the post on medium, so that it is easier to provide sentence-level feedback, especially feedback on writing or low-level comments.)

Harper's Magazine article on LW/MIRI/CFAR and Ethereum

37 gwern 12 December 2014 08:34PM

Cover title: “Power and paranoia in Silicon Valley”; article title: “Come with us if you want to live: Among the apocalyptic libertarians of Silicon Valley” (mirrors: 1, 2, 3), by Sam Frank; Harper’s Magazine, January 2015, pg26-36 (~8500 words). The beginning/ending are focused on Ethereum and Vitalik Buterin, so I'll excerpt the LW/MIRI/CFAR-focused middle:

…Blake Masters-the name was too perfect-had, obviously, dedicated himself to the command of self and universe. He did CrossFit and ate Bulletproof, a tech-world variant of the paleo diet. On his Tumblr’s About page, since rewritten, the anti-belief belief systems multiplied, hyperlinked to Wikipedia pages or to the confoundingly scholastic website Less Wrong: “Libertarian (and not convinced there’s irreconcilable fissure between deontological and consequentialist camps). Aspiring rationalist/Bayesian. Secularist/agnostic/ ignostic . . . Hayekian. As important as what we know is what we don’t. Admittedly eccentric.” Then: “Really, really excited to be in Silicon Valley right now, working on fascinating stuff with an amazing team.” I was startled that all these negative ideologies could be condensed so easily into a positive worldview. …I saw the utopianism latent in capitalism-that, as Bernard Mandeville had it three centuries ago, it is a system that manufactures public benefit from private vice. I started CrossFit and began tinkering with my diet. I browsed venal tech-trade publications, and tried and failed to read Less Wrong, which was written as if for aliens.

…I left the auditorium of Alice Tully Hall. Bleary beside the silver coffee urn in the nearly empty lobby, I was buttonholed by a man whose name tag read MICHAEL VASSAR, METAMED research. He wore a black-and-white paisley shirt and a jacket that was slightly too big for him. “What did you think of that talk?” he asked, without introducing himself. “Disorganized, wasn’t it?” A theory of everything followed. Heroes like Elon and Peter (did I have to ask? Musk and Thiel). The relative abilities of physicists and biologists, their standard deviations calculated out loud. How exactly Vassar would save the world. His left eyelid twitched, his full face winced with effort as he told me about his “personal war against the universe.” My brain hurt. I backed away and headed home. But Vassar had spoken like no one I had ever met, and after Kurzweil’s keynote the next morning, I sought him out. He continued as if uninterrupted. Among the acolytes of eternal life, Vassar was an eschatologist. “There are all of these different countdowns going on,” he said. “There’s the countdown to the broad postmodern memeplex undermining our civilization and causing everything to break down, there’s the countdown to the broad modernist memeplex destroying our environment or killing everyone in a nuclear war, and there’s the countdown to the modernist civilization learning to critique itself fully and creating an artificial intelligence that it can’t control. There are so many different - on different time-scales - ways in which the self-modifying intelligent processes that we are embedded in undermine themselves. I’m trying to figure out ways of disentangling all of that. . . .I’m not sure that what I’m trying to do is as hard as founding the Roman Empire or the Catholic Church or something. But it’s harder than people’s normal big-picture ambitions, like making a billion dollars.” Vassar was thirty-four, one year older than I was. 
He had gone to college at seventeen, and had worked as an actuary, as a teacher, in nanotech, and in the Peace Corps. He’d founded a music-licensing start-up called Sir Groovy. Early in 2012, he had stepped down as president of the Singularity Institute for Artificial Intelligence, now called the Machine Intelligence Research Institute (MIRI), which was created by an autodidact named Eliezer Yudkowsky, who also started Less Wrong. Vassar had left to found MetaMed, a personalized-medicine company, with Jaan Tallinn of Skype and Kazaa, $500,000 from Peter Thiel, and a staff that included young rationalists who had cut their teeth arguing on Yudkowsky’s website. The idea behind MetaMed was to apply rationality to medicine-“rationality” here defined as the ability to properly research, weight, and synthesize the flawed medical information that exists in the world. Prices ranged from $25,000 for a literature review to a few hundred thousand for a personalized study. “We can save lots and lots and lots of lives,” Vassar said (if mostly moneyed ones at first). “But it’s the signal-it’s the ‘Hey! Reason works!’-that matters. . . . It’s not really about medicine.” Our whole society was sick - root, branch, and memeplex - and rationality was the only cure. …I asked Vassar about his friend Yudkowsky. “He has worse aesthetics than I do,” he replied, “and is actually incomprehensibly smart.” We agreed to stay in touch.

One month later, I boarded a plane to San Francisco. I had spent the interim taking a second look at Less Wrong, trying to parse its lore and jargon: “scope insensitivity,” “ugh field,” “affective death spiral,” “typical mind fallacy,” “counterfactual mugging,” “Roko’s basilisk.” When I arrived at the MIRI offices in Berkeley, young men were sprawled on beanbags, surrounded by whiteboards half black with equations. I had come costumed in a Fermat’s Last Theorem T-shirt, a summary of the proof on the front and a bibliography on the back, printed for the number-theory camp I had attended at fifteen. Yudkowsky arrived late. He led me to an empty office where we sat down in mismatched chairs. He wore glasses, had a short, dark beard, and his heavy body seemed slightly alien to him. I asked what he was working on. “Should I assume that your shirt is an accurate reflection of your abilities,” he asked, “and start blabbing math at you?” Eight minutes of probability and game theory followed. Cogitating before me, he kept grimacing as if not quite in control of his face. “In the very long run, obviously, you want to solve all the problems associated with having a stable, self-improving, beneficial-slash-benevolent AI, and then you want to build one.” What happens if an artificial intelligence begins improving itself, changing its own source code, until it rapidly becomes - foom! is Yudkowsky’s preferred expression - orders of magnitude more intelligent than we are? A canonical thought experiment devised by Oxford philosopher Nick Bostrom in 2003 suggests that even a mundane, industrial sort of AI might kill us. Bostrom posited a “superintelligence whose top goal is the manufacturing of paper-clips.” For this AI, known fondly on Less Wrong as Clippy, self-improvement might entail rearranging the atoms in our bodies, and then in the universe - and so we, and everything else, end up as office supplies. 
Nothing so misanthropic as Skynet is required, only indifference to humanity. What is urgently needed, then, claims Yudkowsky, is an AI that shares our values and goals. This, in turn, requires a cadre of highly rational mathematicians, philosophers, and programmers to solve the problem of “friendly” AI - and, incidentally, the problem of a universal human ethics - before an indifferent, unfriendly AI escapes into the wild.

Among those who study artificial intelligence, there’s no consensus on either point: that an intelligence explosion is possible (rather than, for instance, a proliferation of weaker, more limited forms of AI) or that a heroic team of rationalists is the best defense in the event. That MIRI has as much support as it does (in 2012, the institute’s annual revenue broke $1 million for the first time) is a testament to Yudkowsky’s rhetorical ability as much as to any technical skill. Over the course of a decade, his writing, along with that of Bostrom and a handful of others, has impressed the dangers of unfriendly AI on a growing number of people in the tech world and beyond. In August, after reading Superintelligence, Bostrom’s new book, Elon Musk tweeted, “Hope we’re not just the biological boot loader for digital superintelligence. Unfortunately, that is increasingly probable.” In 2000, when Yudkowsky was twenty, he founded the Singularity Institute with the support of a few people he’d met at the Foresight Institute, a Palo Alto nanotech think tank. He had already written papers on “The Plan to Singularity” and “Coding a Transhuman AI,” and posted an autobiography on his website, since removed, called “Eliezer, the Person.” It recounted a breakdown of will when he was eleven and a half: “I can’t do anything. That’s the phrase I used then.” He dropped out before high school and taught himself a mess of evolutionary psychology and cognitive science. He began to “neuro-hack” himself, systematizing his introspection to evade his cognitive quirks. Yudkowsky believed he could hasten the singularity by twenty years, creating a superhuman intelligence and saving humankind in the process. He met Thiel at a Foresight Institute dinner in 2005 and invited him to speak at the first annual Singularity Summit. The institute’s paid staff grew. 
In 2006, Yudkowsky began writing a hydra-headed series of blog posts: science-fictionish parables, thought experiments, and explainers encompassing cognitive biases, self-improvement, and many-worlds quantum mechanics that funneled lay readers into his theory of friendly AI. Rationality workshops and Meetups began soon after. In 2009, the blog posts became what he called Sequences on a new website: Less Wrong. The next year, Yudkowsky began publishing Harry Potter and the Methods of Rationality at fanfiction.net. The Harry Potter category is the site’s most popular, with almost 700,000 stories; of these, HPMoR is the most reviewed and the second-most favorited. The last comment that the programmer and activist Aaron Swartz left on Reddit before his suicide in 2013 was on /r/hpmor. In Yudkowsky’s telling, Harry is not only a magician but also a scientist, and he needs just one school year to accomplish what takes canon-Harry seven. HPMoR is serialized in arcs, like a TV show, and runs to a few thousand pages when printed; the book is still unfinished. Yudkowsky and I were talking about literature, and Swartz, when a college student wandered in. Would Eliezer sign his copy of HPMoR? “But you have to, like, write something,” he said. “You have to write, ‘I am who I am.’ So, ‘I am who I am’ and then sign it.” “Alrighty,” Yudkowsky said, signed, continued. “Have you actually read Methods of Rationality at all?” he asked me. “I take it not.” (I’d been found out.) “I don’t know what sort of a deadline you’re on, but you might consider taking a look at that.” (I had taken a look, and hated the little I’d managed.) “It has a legendary nerd-sniping effect on some people, so be warned. That is, it causes you to read it for sixty hours straight.”

The nerd-sniping effect is real enough. Of the 1,636 people who responded to a 2013 survey of Less Wrong’s readers, one quarter had found the site thanks to HPMoR, and many more had read the book. Their average age was 27.4, their average IQ 138.2. Men made up 88.8% of respondents; 78.7% were straight, 1.5% transgender, 54.7 % American, 89.3% atheist or agnostic. The catastrophes they thought most likely to wipe out at least 90% of humanity before the year 2100 were, in descending order, pandemic (bioengineered), environmental collapse, unfriendly AI, nuclear war, pandemic (natural), economic/political collapse, asteroid, nanotech/gray goo. Forty-two people, 2.6 %, called themselves futarchists, after an idea from Robin Hanson, an economist and Yudkowsky’s former coblogger, for reengineering democracy into a set of prediction markets in which speculators can bet on the best policies. Forty people called themselves reactionaries, a grab bag of former libertarians, ethno-nationalists, Social Darwinists, scientific racists, patriarchists, pickup artists, and atavistic “traditionalists,” who Internet-argue about antidemocratic futures, plumping variously for fascism or monarchism or corporatism or rule by an all-powerful, gold-seeking alien named Fnargl who will free the markets and stabilize everything else. At the bottom of each year’s list are suggestive statistical irrelevancies: “every optimizing system’s a dictator and i’m not sure which one i want in charge,” “Autocracy (important: myself as autocrat),” “Bayesian (aspiring) Rationalist. Technocratic. Human-centric Extropian Coherent Extrapolated Volition.” “Bayesian” refers to Bayes’s Theorem, a mathematical formula that describes uncertainty in probabilistic terms, telling you how much to update your beliefs when given new information. 
This is a formalization and calibration of the way we operate naturally, but “Bayesian” has a special status in the rationalist community because it’s the least imperfect way to think. “Extropy,” the antonym of “entropy,” is a decades-old doctrine of continuous human improvement, and “coherent extrapolated volition” is one of Yudkowsky’s pet concepts for friendly artificial intelligence. Rather than our having to solve moral philosophy in order to arrive at a complete human goal structure, C.E.V. would computationally simulate eons of moral progress, like some kind of Whiggish Pangloss machine. As Yudkowsky wrote in 2004, “In poetic terms, our coherent extrapolated volition is our wish if we knew more, thought faster, were more the people we wished we were, had grown up farther together.” Yet can even a single human’s volition cohere or compute in this way, let alone humanity’s? We stood up to leave the room. Yudkowsky stopped me and said I might want to turn my recorder on again; he had a final thought. “We’re part of the continuation of the Enlightenment, the Old Enlightenment. This is the New Enlightenment,” he said. “Old project’s finished. We actually have science now, now we have the next part of the Enlightenment project.”
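The update rule the community names itself after can be sketched in a few lines. This is a minimal illustration of Bayes's Theorem only; the function name and the numbers are mine, not from the survey or the article:

```python
def bayes_update(prior, p_evidence_if_true, p_evidence_if_false):
    """Bayes's Theorem: P(H|E) = P(E|H) * P(H) / P(E),
    where P(E) sums over the hypothesis being true or false."""
    p_evidence = (p_evidence_if_true * prior
                  + p_evidence_if_false * (1 - prior))
    return p_evidence_if_true * prior / p_evidence

# A 1% prior belief, updated on evidence ten times likelier if true:
posterior = bayes_update(0.01, 0.8, 0.08)  # rises to about 0.092
```

The point of the formalization is quantitative: evidence ten times likelier under the hypothesis moves a 1% belief to roughly 9%, not to certainty.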

In 2013, the Singularity Institute changed its name to the Machine Intelligence Research Institute. Whereas MIRI aims to ensure human-friendly artificial intelligence, an associated program, the Center for Applied Rationality, helps humans optimize their own minds, in accordance with Bayes’s Theorem. The day after I met Yudkowsky, I returned to Berkeley for one of CFAR’s long-weekend workshops. The color scheme at the Rose Garden Inn was red and green, and everything was brocaded. The attendees were mostly in their twenties: mathematicians, software engineers, quants, a scientist studying soot, employees of Google and Facebook, an eighteen-year-old Thiel Fellow who’d been paid $100,000 to leave Boston College and start a company, professional atheists, a Mormon turned atheist, an atheist turned Catholic, an Objectivist who was photographed at the premiere of Atlas Shrugged II: The Strike. There were about three men for every woman. At the Friday-night meet and greet, I talked with Benja, a German who was studying math and behavioral biology at the University of Bristol, whom I had spotted at MIRI the day before. He was in his early thirties and quite tall, with bad posture and a ponytail past his shoulders. He wore socks with sandals, and worried a paper cup as we talked. Benja had felt death was terrible since he was a small child, and wanted his aging parents to sign up for cryonics, if he could figure out how to pay for it on a grad-student stipend. He was unsure about the risks from unfriendly AI - “There is a part of my brain,” he said, “that sort of goes, like, ‘This is crazy talk; that’s not going to happen’” - but the probabilities had persuaded him. He said there was only about a 30% chance that we could make it another century without an intelligence explosion. He was at CFAR to stop procrastinating. Julia Galef, CFAR’s president and cofounder, began a session on Saturday morning with the first of many brain-as-computer metaphors. 
We are “running rationality on human hardware,” she said, not supercomputers, so the goal was to become incrementally more self-reflective and Bayesian: not perfectly rational agents, but “agent-y.” The workshop’s classes lasted six or so hours a day; activities and conversations went well into the night. We got a condensed treatment of contemporary neuroscience that focused on hacking our brains’ various systems and modules, and attended sessions on habit training, urge propagation, and delegating to future selves. We heard a lot about Daniel Kahneman, the Nobel Prize-winning psychologist whose work on cognitive heuristics and biases demonstrated many of the ways we are irrational. Geoff Anders, the founder of Leverage Research, a “meta-level nonprofit” funded by Thiel, taught a class on goal factoring, a process of introspection that, after many tens of hours, maps out every one of your goals down to root-level motivations-the unchangeable “intrinsic goods,” around which you can rebuild your life. Goal factoring is an application of Connection Theory, Anders’s model of human psychology, which he developed as a Rutgers philosophy student disserting on Descartes, and Connection Theory is just the start of a universal renovation. Leverage Research has a master plan that, in the most recent public version, consists of nearly 300 steps. It begins from first principles and scales up from there: “Initiate a philosophical investigation of philosophical method”; “Discover a sufficiently good philosophical method”; have 2,000-plus “actively and stably benevolent people successfully seek enough power to be able to stably guide the world”; “People achieve their ultimate goals as far as possible without harming others”; “We have an optimal world”; “Done.” On Saturday night, Anders left the Rose Garden Inn early to supervise a polyphasic-sleep experiment that some Leverage staff members were conducting on themselves. 
It was a schedule called the Everyman 3, which compresses sleep into three twenty-minute REM naps each day and three hours at night for slow-wave. Anders was already polyphasic himself. Operating by the lights of his own best practices, goal-factored, coherent, and connected, he was able to work 105 hours a week on world optimization. For the rest of us, for me, these were distant aspirations. We were nerdy and unperfected. There was intense discussion at every free moment, and a genuine interest in new ideas, if especially in testable, verifiable ones. There was joy in meeting peers after years of isolation. CFAR was also insular, overhygienic, and witheringly focused on productivity. Almost everyone found politics to be tribal and viscerally upsetting. Discussions quickly turned back to philosophy and math. By Monday afternoon, things were wrapping up. Andrew Critch, a CFAR cofounder, gave a final speech in the lounge: “Remember how you got started on this path. Think about what was the time for you when you first asked yourself, ‘How do I work?’ and ‘How do I want to work?’ and ‘What can I do about that?’ . . . Think about how many people throughout history could have had that moment and not been able to do anything about it because they didn’t know the stuff we do now. I find this very upsetting to think about. It could have been really hard. A lot harder.” He was crying. “I kind of want to be grateful that we’re now, and we can share this knowledge and stand on the shoulders of giants like Daniel Kahneman . . . I just want to be grateful for that. . . . And because of those giants, the kinds of conversations we can have here now, with, like, psychology and, like, algorithms in the same paragraph, to me it feels like a new frontier. . . . Be explorers; take advantage of this vast new landscape that’s been opened up to us in this time and this place; and bear the torch of applied rationality like brave explorers. 
And then, like, keep in touch by email.” The workshop attendees put giant Post-its on the walls expressing the lessons they hoped to take with them. A blue one read RATIONALITY IS SYSTEMATIZED WINNING. Above it, in pink: THERE ARE OTHER PEOPLE WHO THINK LIKE ME. I AM NOT ALONE.

That night, there was a party. Alumni were invited. Networking was encouraged. Post-its proliferated; one, by the beer cooler, read SLIGHTLY ADDICTIVE. SLIGHTLY MIND-ALTERING. Another, a few feet to the right, over a double stack of bound copies of Harry Potter and the Methods of Rationality: VERY ADDICTIVE. VERY MIND-ALTERING. I talked to one of my roommates, a Google scientist who worked on neural nets. The CFAR workshop was just a whim to him, a tourist weekend. “They’re the nicest people you’d ever meet,” he said, but then he qualified the compliment. “Look around. If they were effective, rational people, would they be here? Something a little weird, no?” I walked outside for air. Michael Vassar, in a clinging red sweater, was talking to an actuary from Florida. They discussed timeless decision theory (approximately: intelligent agents should make decisions on the basis of the futures, or possible worlds, that they predict their decisions will create) and the simulation argument (essentially: we’re living in one), which Vassar traced to Schopenhauer. He recited lines from Kipling’s “If-” in no particular order and advised the actuary on how to change his life: Become a pro poker player with the $100k he had in the bank, then hit the Magic: The Gathering pro circuit; make more money; develop more rationality skills; launch the first Costco in Northern Europe. I asked Vassar what was happening at MetaMed. He told me that he was raising money, and was in discussions with a big HMO. He wanted to show up Peter Thiel for not investing more than $500,000. “I’m basically hoping that I can run the largest convertible-debt offering in the history of finance, and I think it’s kind of reasonable,” he said. “I like Peter. I just would like him to notice that he made a mistake . . . I imagine a hundred million or a billion will cause him to notice . . . I’d like to have a pi-billion-dollar valuation.” I wondered whether Vassar was drunk. 
He was about to drive one of his coworkers, a young woman named Alyssa, home, and he asked whether I would join them. I sat silently in the back of his musty BMW as they talked about potential investors and hires. Vassar almost ran a red light. After Alyssa got out, I rode shotgun, and we headed back to the hotel.

It was getting late. I asked him about the rationalist community. Were they really going to save the world? From what? “Imagine there is a set of skills,” he said. “There is a myth that they are possessed by the whole population, and there is a cynical myth that they’re possessed by 10% of the population. They’ve actually been wiped out in all but about one person in three thousand.” It is important, Vassar said, that his people, “the fragments of the world,” lead the way during “the fairly predictable, fairly total cultural transition that will predictably take place between 2020 and 2035 or so.” We pulled up outside the Rose Garden Inn. He continued: “You have these weird phenomena like Occupy where people are protesting with no goals, no theory of how the world is, around which they can structure a protest. Basically this incredibly, weirdly, thoroughly disempowered group of people will have to inherit the power of the world anyway, because sooner or later everyone older is going to be too old and too technologically obsolete and too bankrupt. The old institutions may largely break down or they may be handed over, but either way they can’t just freeze. These people are going to be in charge, and it would be helpful if they, as they come into their own, crystallize an identity that contains certain cultural strengths like argument and reason.” I didn’t argue with him, except to press, gently, on his particular form of elitism. His rationalism seemed so limited to me, so incomplete. “It is unfortunate,” he said, “that we are in a situation where our cultural heritage is possessed only by people who are extremely unappealing to most of the population.” That hadn’t been what I’d meant. I had meant rationalism as itself a failure of the imagination. “The current ecosystem is so totally fucked up,” Vassar said. 
“But if you have conversations here”-he gestured at the hotel-“people change their mind and learn and update and change their behaviors in response to the things they say and learn. That never happens anywhere else.” In a hallway of the Rose Garden Inn, a former high-frequency trader started arguing with Vassar and Anna Salamon, CFAR’s executive director, about whether people optimize for hedons or utilons or neither, about mountain climbers and other high-end masochists, about whether world happiness is currently net positive or negative, increasing or decreasing. Vassar was eating and drinking everything within reach. My recording ends with someone saying, “I just heard ‘hedons’ and then was going to ask whether anyone wants to get high,” and Vassar replying, “Ah, that’s a good point.” Other voices: “When in California . . .” “We are in California, yes.”

…Back on the East Coast, summer turned into fall, and I took another shot at reading Yudkowsky’s Harry Potter fanfic. It’s not what I would call a novel, exactly; rather, it’s an unending, self-satisfied parable about rationality and transhumanism, with jokes.

…I flew back to San Francisco, and my friend Courtney and I drove to a cul-de-sac in Atherton, at the end of which sat the promised mansion. It had been repurposed as cohousing for children who were trying to build the future: start-up founders, singularitarians, a teenage venture capitalist. The woman who coined the term “open source” was there, along with a Less Wronger and Thiel Capital employee who had renamed himself Eden. The Day of the Idealist was a day for self-actualization and networking, like the CFAR workshop without the rigor. We were to set “mega goals” and pick a “core good” to build on in the coming year. Everyone was a capitalist; everyone was postpolitical. I squabbled with a young man in a Tesla jacket about anti-Google activism. No one has a right to housing, he said; programmers are the people who matter; the protesters’ antagonistic tactics had totally discredited them.

…Thiel and Vassar and Yudkowsky, for all their far-out rhetoric, take it on faith that corporate capitalism, unchecked just a little longer, will bring about this era of widespread abundance. Progress, Thiel thinks, is threatened mostly by the political power of what he calls the “unthinking demos.”

Pointer thanks to /u/Vulture.

Weekly LW Meetups

3 FrankAdamek 12 December 2014 05:06PM

This summary was posted to LW Main on December 5th. The following week's summary is here.

Irregularly scheduled Less Wrong meetups are taking place in:

The remaining meetups take place in cities with regular scheduling, but involve a change in time or location, special meeting content, or simply a helpful reminder about the meetup:

Locations with regularly scheduled meetups: Austin, Berkeley, Berlin, Boston, Brussels, Buffalo, Cambridge UK, Canberra, Columbus, London, Madison WI, Melbourne, Moscow, Mountain View, New York, Philadelphia, Research Triangle NC, Seattle, Sydney, Toronto, Vienna, Washington DC, and West Los Angeles. There's also a 24/7 online study hall for coworking LWers.


Robin Hanson talking about Bias on Stossel tonight

3 buybuydandavis 12 December 2014 05:15AM


Stossel has a page with full episodes. I don't know when it will show there. Hanson was the first guest, and was done by the 12 minute mark.




What Peter Thiel thinks about AI risk

12 Dr_Manhattan 11 December 2014 09:22PM

This is probably the clearest statement from him on the issue:


25:30 mins in


TL;DR: he thinks it's an issue but also feels AGI is very distant, and hence is less worried about it than Musk.


I recommend the rest of the lecture as well; it's a good summary of "Zero to One" and a good Q&A afterwards.

Make your own cost-effectiveness Fermi estimates for one-off problems

8 owencb 11 December 2014 12:00PM

In some recent work (particularly this article) I built models for estimating the cost-effectiveness of work on problems when we don’t know how hard those problems are. The estimates they produce aren’t perfect, but they can get us started where it’s otherwise hard to make comparisons.

Now I want to know: what can we use this technique on? I have a couple of applications I am working on, but I’m keen to see what estimates other people produce.

There are complicated versions of the model which account for more factors, but we can start with a simple version. This is a tool for initial Fermi calculations: it’s relatively easy to use but should get us around the right order of magnitude. That can be very useful, and we can build more detailed models for the most promising opportunities.

The model is given by:

    Expected benefit per marginal unit of resources = p × B / (R(0) × ln(y/z))

This expresses the expected benefit of adding another unit of resources to solving the problem. You can denominate the resources in dollars, researcher-years, or another convenient unit. To use this formula we need to estimate four variables:

  • R(0) denotes the current resources going towards the problem each year. Whatever units you measure R(0) in, those are the units we’ll get an estimate for the benefit of. So if R(0) is measured in researcher-years, the formula will tell us the expected benefit of adding a researcher year.

    • You want to count all of the resources going towards the problem. That includes the labour of those who work on it in their spare time, and some weighting for the talent of the people working in the area (if you doubled the budget going to an area, you couldn’t get twice as many people who are just as good; ideally we’d use an elasticity here).

    • Some resources may be aimed at something other than your problem, but be tangentially useful. We should count some fraction of those, according to how much resources devoted entirely to the problem they seem equivalent to.

  • B is the annual benefit that we’d get from a solution to the problem. You can measure this in its own units, but whatever you use here will be the units of value that come out in the cost-effectiveness estimate.

  • p and y/z are parameters that we will estimate together. p is the probability of getting a solution by the time y resources have been dedicated to the problem, if z resources have been dedicated so far. Note that we only need the ratio y/z, so we can estimate this directly.

    • Although y/z is hard to estimate, we will take a (natural) logarithm of it, so don’t worry too much about making this term precise.

    • I think it will often be best to use middling values of p, perhaps between 0.2 and 0.8.

And that’s it.

Example: How valuable is extra research into nuclear fusion? Assume:

  • R(0) = $5 billion (after a quick google turns up $1.5B for current spending, and adjusting upwards to account for non-financial inputs);

  • B = $1000 billion (guesswork, a bit over 1% of the world economy; a fraction of the current energy sector);

  • There’s a 50% chance of success (p = 0.5) by the time we’ve spent 100 times as many resources as today (log(y/z) = log(100) = 4.6).

Putting these together would give an expected societal benefit of (0.5*$1000B)/(5B*4.6) ≈ $22 for every dollar spent. This is high enough to suggest that we may be significantly under-investing in fusion, and that a more careful calculation (with better-researched numbers!) might be justified.
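The whole estimate fits in a few lines of code, which also makes it easy to vary the inputs (function and variable names are my own):

```python
import math

def marginal_benefit(r0, b, p, y_over_z):
    """Expected benefit per extra unit of resources: p * B / (R(0) * ln(y/z)).

    r0: current annual resources; b: annual benefit of a solution;
    p: probability of success by the time y resources have been spent,
    given z spent so far (only the ratio y/z matters).
    """
    return p * b / (r0 * math.log(y_over_z))

# The fusion example: R(0) = $5B/yr, B = $1000B/yr, p = 0.5 at 100x resources
per_dollar = marginal_benefit(5e9, 1e12, 0.5, 100)  # about $21.7 per dollar
```

Note that because y/z enters only through a logarithm, even an order-of-magnitude error in that ratio changes the answer by much less than an order of magnitude, which is why rough values of it are tolerable.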


To get the simple formula, the model made a number of assumptions. Since we’re just using it to get rough numbers, it’s okay if we don’t fit these assumptions exactly, but if they’re totally off then the model may be inappropriate. One restriction in particular I’d want to bear in mind:

  • It should be plausible that we could solve the problem in the next decade or two.

It’s okay if this is unlikely, but I’d want to change the model if I were estimating the value of e.g. trying to colonise the stars.

Request for applications

So -- what would you like to apply this method to? What answers do you get?

To help structure the comment thread, I suggest attempting only one problem in each comment. Include the value of p, and the units of R(0) and units of B that you’d like to use. Then you can give your estimates for R(0), B, and y/z as a comment reply, and so can anyone else who wants to give estimates for the same thing.

I’ve also set up a google spreadsheet where we can enter estimates for the questions people propose. For the time being anyone can edit this.

Have fun!

[link] Etzioni: AI will empower us, not exterminate us

4 CBHacking 11 December 2014 08:51AM


(Slashdot discussion: http://tech.slashdot.org/story/14/12/10/1719232/ai-expert-ai-wont-exterminate-us----it-will-empower-us)

Not sure what the local view of Oren Etzioni or the Allen Institute for AI is, but I'm curious what people think of his views on UFAI risk. As far as I can tell from this article, it basically boils down to "AGI won't happen, at least not any time soon." Is there (significant) reason to believe he's wrong, or is it simply too great a risk to leave to chance?

Cognitive distortions of founders

3 Dr_Manhattan 11 December 2014 03:19AM

Interesting take on entrepreneurial success:


More in depth here:


Curious what people here think of this. 


Lifehack Ideas December 2014

9 Gondolinian 10 December 2014 12:21AM
Life hacking refers to any trick, shortcut, skill, or novelty method that increases productivity and efficiency, in all walks of life.



This thread is for posting any promising or interesting ideas for lifehacks you've come up with or heard of.  If you've implemented your idea, please share the results.  You are also encouraged to post lifehack ideas you've tried out that have not been successful, and why you think they weren't.  If you can, please give credit for ideas that you got from other people.

To any future posters of Lifehack Ideas threads, please remember to add the "lifehacks_thread" tag.

The Limits of My Rationality

1 JoshuaMyer 09 December 2014 09:08PM

As requested here is an introductory abstract.

The search for bias in the linguistic representations of our cognitive processes serves several purposes in this community. By pruning irrational thoughts, we can potentially affect each other in complex ways. Leaning heavily on cognitivist pedagogy, this essay represents my subjective experience trying to reconcile a perceived conflict between the rhetorical goals of the community and the absence of a generative, organic conceptualization of rationality.

The Story

    Though I've only been here a short time, I find myself fascinated by this discourse community. To discover a group of individuals bound together under the common goal of applied rationality has been an experience that has enriched my life significantly. So please understand, I do not mean to insult by what I am about to say, merely to encourage a somewhat more constructive approach to what I understand as the goal of this community: to apply collectively reinforced notions of rational thought to all areas of life.
    As I followed the links and read the articles on the homepage, I found myself somewhat disturbed by the juxtaposition of these highly specific definitions of biases to the narrative structures of parables providing examples in which a bias results in an incorrect conclusion. At first, I thought that perhaps my emotional reaction stemmed from rejecting the unfamiliar; naturally, I decided to learn more about the situation.

    As I read on, my interests drifted from the rhetorical structure of each article (if anyone is interested I might pursue an analysis of rhetoric further though I'm not sure I see a pressing need for this), towards the mystery of how others in the community apply the lessons contained therein. My belief was that the parables would cause most readers to form a negative association of the bias with an undesirable outcome.

    Even a quick skim of the discussions taking place on this site will reveal energetic debate on a variety of topics of potential importance, peppered heavily with accusations of bias. At this point, I noticed the comments that seem to get voted up are ones that are thoughtfully composed, well informed, soundly conceptualized and appropriately referential. Generally, this is true of the articles as well, and so it should be in productive discourse communities. Though I thought it prudent to not read every conversation in absolute detail, I also noticed that the most participated in lines of reasoning were far more rhetorically complex than the parables' portrayal of bias alone could explain. Sure the establishment of bias still seemed to represent the most commonly used rhetorical device on the forums ...

    At this point, I had been following a very interesting discussion on this site about politics. I typically have little or no interest in political theory, but "NRx" vs. "Prog" Assumptions: Locating the Sources of Disagreement Between Neoreactionaries and Progressives (Part 1) seemed so out of place in a community whose political affiliations might best be summarized by the phrase "politics is the mind killer" that I couldn't help but investigate. More specifically, I was trying to figure out why it had been posted here at all (I didn't take issue with either the scholarship or intent of the article, but the latter wasn't obvious to me, perhaps because I was completely unfamiliar with the coinage "neoreactionary").

    On my third read, I made a connection to an essay about the socio-historical foundations of rhetoric. In structure, the essay progressed through a wide variety of specific observations on both theory and practice of rhetoric in classical Europe, culminating in a well argued but very unwieldy thesis; at some point in the middle of the essay, I recall a paragraph that begins with the assertion that every statement has political dimensions. I conveyed this idea as eloquently as I could muster, and received a fair bit of karma for it. And to think that it all began with a vague uncomfortable feeling and a desire to understand!

The Lesson

    So you are probably wondering what any of this has to do with rationality, cognition, or the promise of some deeply insightful transformative advice mentioned in the first paragraph. Very good.

    Cognition, a prerequisite for rationality, is a complex process; cognition can be described as the process by which ideas form, interact and evolve. Notice that this definition alone cannot explain how concepts like rationality form, why ideas form or how they should interact to produce intelligence. That specific shortcoming has long crippled cognitivist pedagogies in many disciplines -- no matter which factors you believe to determine intelligence, it is undeniably true that the process by which it occurs organically is not well-understood.

    More intricate models of cognition traditionally vary according to the sets of behavior they seek to explain; in general, this forum seems to concern itself with the wider sets of human behavior, with a strange affinity for statistical analysis. It also seems as if most of the people here associate agency with intelligence, though this should be regarded as unsubstantiated anecdote; I have little interest in what people believe, but those beliefs can have interesting consequences. In general, good models of cognition that yield a sense of agency have to be able to explain how a mushy organic collection of cells might become capable of generating a sense of identity. For this reason, our discussion of cognition will treat intelligence as a confluence of passive processes that lead to an approximation of agency.

    Who are we? What is intelligence? To answer these or any natural language questions we first search for stored-solutions to whatever we perceive as the problem, even as we generate our conception of the question as a set of abstract problems from interactions between memories. In the absence of recognizing a pattern that triggers a stored solution, a new solution is generated by processes of association and abstraction. This process may be central to the generation of every rational and irrational thought a human will ever have. I would argue that the phenomenon of agency approximates an answer to the question: "who am I?" and that any discussion of consciousness should at least acknowledge how critical natural language use is to universal agreement on any matter. I will gladly discuss this matter further and in greater detail if asked.

    At this point, I feel compelled to mention that my initial motivation for pursuing this line of reasoning stems from the realization that this community discusses rationality in a way that differs somewhat from my past encounters with the word.

    Out there, it is commonly believed that rationality develops (in hindsight) to explain the subjective experience of cognition; here we assert a fundamental difference between rationality and this other concept called rationalization. I do not see the utility of this distinction, nor have I found a satisfying explanation of how this distinction operates within accepted models for human learning in such a way that does not assume an a priori method of sorting the values which determine what is considered "rational". Thus we find there is a general dearth of generative models of rational cognition alongside a plethora of techniques for spotting irrational or biased methods of thinking.

    I see a lot of discussion on the forums very concerned with objective predictions of the future wherein it seems as if rationality (often of a highly probabilistic nature) is, in many cases, expected to bridge the gap between the worlds we can imagine to be possible and our many somewhat subjective realities. And the force keeping these discussions from splintering off into unproductive pissing about is a constant search for bias.

    I know I'm not going to be the first among us to suggest that the search for bias is not truly synonymous with rationality, but I would like to clarify before concluding. Searching for bias in cognitive processes can be a very productive way to spend one's waking hours, and it is a critical element to structuring the subjective world of cognition in such a way that allows abstraction to yield the kind of useful rules that comprise rationality. But it is not, at its core, a generative process.

    Let us consider the cognitive process of association (when beliefs, memories, stimuli or concepts become connected to form more complex structures). Without that period of extremely associative and biased cognition experienced during early childhood, we might never learn to attribute the perceived cause of a burn to a hot stove. Without concepts like better and worse to shape our young minds, I imagine many of us would simply lack the attention span to learn about ethics. And what about all the biases that make parables an effective way of conveying information? After all, the strength of a rhetorical argument is in its appeal to the interpretive biases of its intended audience and not the relative consistency of the conceptual foundations of that argument.

    We need to shift discussions involving bias towards models of cognition more complex than portraying it as simply an obstacle to rationality. In my conception of reality, recognizing the existence of bias seems to play a critical role in the development of more complex methods of abstraction; indeed, biases are an intrinsic side effect of the generative grouping of observations that is the core of Bayesian reasoning.

    In short, biases are not generative processes. Discussions of bias are not necessarily useful, rational or intelligent. A deeper understanding of the nature of intelligence requires conceptualizations that embrace the organic truths at the core of sentience; we must be able to describe our concepts of intelligence, our "rationality", such that it can emerge organically as the generative processes at the core of cognition.

    The Idea

    I'd be interested to hear some thoughts about how we might grow to recognize our own biases as necessary to the formative stages of abstraction alongside learning to collectively search for and eliminate biases from our decision making processes. The human mind is limited and while most discussions in natural language never come close to pressing us to those limits, our limitations can still be relevant to those discussions as well as to discussions of artificial intelligences. The way I see things, a bias free machine possessing a model of our own cognition would either have to have stored solutions for every situation it could encounter or methods of generating stored solutions for all future perceived problems (both of which sound like descriptions of oracles to me, though the latter seems more viable from a programmer's perspective).

    A machine capable of making the kinds of decisions considered "easy" for humans might need biases at some point during its journey to the complex and self-consistent methods of decision making associated with rationality. This is a rhetorically complex community, but at the risk of my reach exceeding my grasp, I would be interested in seeing an examination of the Affect Heuristic in human decision making as an allegory for the historic utility of fuzzy values in chess AI.

    Thank you for your time, and I look forward to what I can only hope will be challenging and thoughtful responses.

Feedback requested by Intentional Insights on workbook conveying rational thinking about meaning and purpose to a broad audience

5 Gleb_Tsipursky 09 December 2014 07:03PM

We at Intentional Insights would appreciate your feedback to help optimize a workbook that conveys rational thinking about finding meaning and purpose in life to a broad audience. Last time we asked for your feedback, we changed our content offerings based on comments we received from fellow Less Wrongers, as you can see from the Edit to this post. We would be glad to update our beliefs again and revise the workbook based on your feedback.

For a bit of context, the workbook is part of our efforts to promote rational thinking to a broad audience and thus raise the sanity waterline. It’s based on research on how other societies besides the United States helped their citizens find meaning and purpose, such as research I did on the Soviet Union and Zuckerman did on Sweden and Denmark. It’s also based on research on the contemporary United States by psychologists such as Steger, Duffy and Dik, Seligman, and others.

The target audience is reason-minded youth and young adults, especially secular-oriented ones. The goal is to get such people to engage with academic research on how our minds work, and thus get them interested in exploring rational thinking more broadly, eventually getting them turned on to more advanced rationality, such as found on Less Wrong itself. The workbook is written in a style aimed to create cognitive ease, with narratives, personal stories, graphics, and research-based exercises.

Here is the link to the workbook draft itself. Any and all suggestions are welcomed, and thanks for taking the time to engage with this workbook and give your feedback – much appreciated!


Does utilitarianism "require" extreme self sacrifice? If not why do people commonly say it does?

5 Princess_Stargirl 09 December 2014 08:32AM

Chris Hallquist wrote the following in an article (if you know the article please, please don't bring it up, I don't want to discuss the article in general):

"For example, utilitarianism apparently endorses killing a single innocent person and harvesting their organs if it will save five other people. It also appears to imply that donating all your money to charity beyond what you need to survive isn’t just admirable but morally obligatory. "

The non-bold part is not what is confusing me. But where does the "obligatory" part come in? I don't really see how it's obvious what, if any, ethical obligations utilitarianism implies. Given a set of basic assumptions, utilitarianism lets you argue whether one action is more moral than another. But I don't see how it's obvious which, if any, moral benchmarks utilitarianism sets for "obligatory." I can see how certain frameworks on top of utilitarianism imply certain moral requirements. But I do not see how the bolded quote is a criticism of the basic theory of utilitarianism.

However this criticism comes up all the time. Honestly the best explanation I could come up with was that people were being unfair to utilitarianism and not thinking through their statements. But the above quote is by HallQ who is intelligent and thoughtful. So now I am genuinely very curious.

Do you think utilitarianism really requires such extreme self-sacrifice, and if so, why? And if it does not require this, why do so many people say it does? I am very confused and would appreciate help working this out.


I am having trouble asking this question clearly, since utilitarianism is probably best thought of as a cluster of beliefs, so it's not clear what asking "does utilitarianism imply X" actually means. Still, I made this post since I am confused. Many thoughtful people identify as utilitarian (for example Ozy and theunitofcaring) yet do not think people have extreme obligations. However, I can think of examples where people do not seem to understand the implications of their ethical frameworks. For example, many Jewish people endorse the message of the following story:

Rabbi Hillel was asked to explain the Torah while standing on one foot and responded "What is hateful to you, do not do to your neighbor. That is the whole Torah; the rest is the explanation of this--go and study it!"

The story is presumably apocryphal, but it is repeated all the time by Jewish people. However, it's hard to see how the story makes even a semblance of sense. The Torah includes huge amounts of material that violates the Golden Rule very badly. So people who think this story gives even a moderately accurate picture of the Torah's message are mistaken, imo.

An investment analogy for Pascal's Mugging

5 [deleted] 09 December 2014 07:50AM

A lottery ticket sometimes has positive expected value (a $1 ticket might be expected to pay out $1.30). How many tickets should you buy?

Probably none. Informally, all but the richest players can expect to go broke before they win, despite the positive expected value of a ticket.

In more precise terms: In order to maximize the long-term growth rate of your money (or log money), you'll want to put a very small fraction of your bankroll into lottery tickets, which will imply an "amount to invest" that is less than the cost of a single ticket (excluding billionaires). If you put too great a proportion of your resources into a risky but positive expected value asset, the long-term growth rate of your resources can become negative. For an intuitive example, imagine Bill Gates dumping 99% of his wealth into a series of positive expected-value bets with single-lottery-ticket-like odds.

This article has some graphs and details on the lottery. This pdf on the Kelly criterion has some examples and general discussion of this type of problem.
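The sizing argument above can be sketched numerically with the Kelly criterion. The numbers below are hypothetical, chosen only to match the $1.30-per-$1-ticket example (the 1-in-10-million win probability is illustrative, not taken from the article):

```python
# Kelly sizing for a positive-expected-value lottery ticket.
# Assumed, illustrative numbers: win probability p = 1e-7, with the
# gross payout scaled so a $1 ticket has an expected value of $1.30.

p = 1e-7                 # hypothetical probability the ticket wins
payout = 1.30 / p        # gross payout chosen so EV = p * payout = $1.30
b = payout - 1           # net odds: profit per $1 staked, given a win
q = 1 - p                # probability the ticket loses

# Kelly fraction: the share of bankroll that maximizes long-run log-growth.
f_star = (b * p - q) / b

print(f"Kelly fraction of bankroll: {f_star:.3e}")
print(f"Bankroll needed before one $1 ticket is justified: ${1 / f_star:,.0f}")
```

Under these assumptions the Kelly fraction comes out on the order of 10⁻⁸ of bankroll, so you would need a bankroll in the tens of millions of dollars before buying even a single $1 ticket maximizes long-term growth.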

Can we think about Pascal mugging the same way?

The applicability might depend on whether we're trading resource-generating-resources for non-resource-generating assets. So if we're offered something like cash, the lottery ticket model (with payout inversely varying with estimated odds) is a decent fit. But what if we're offered utility in some direct and non-interest-bearing form?

Another limit: For a sufficiently unlikely but positive-expected-value gamble, you can expect the heat death of the universe before actually realizing any of the expected value.
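That last point can be made concrete via the expected waiting time of a geometric distribution (E[trials to first success] = 1/p). The per-trial probability below is invented purely for illustration:

```python
# Expected waiting time for a first success in repeated independent trials
# (geometric distribution): E[trials] = 1 / p. All figures are illustrative.

p = 1e-30                        # hypothetical per-trial win probability
trials_per_second = 1.0          # suppose you can take one gamble per second
expected_seconds = 1 / (p * trials_per_second)

seconds_per_year = 3.15e7
age_of_universe_years = 1.38e10  # rough current age, for scale

expected_years = expected_seconds / seconds_per_year
print(f"Expected wait: {expected_years:.2e} years "
      f"({expected_years / age_of_universe_years:.2e} times the universe's age)")
```

At these odds the expected wait exceeds the current age of the universe by many orders of magnitude, so the "expected value" is never realized in any practical sense.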


Superintelligence 13: Capability control methods

7 KatjaGrace 09 December 2014 02:00AM

This is part of a weekly reading group on Nick Bostrom's book, Superintelligence. For more information about the group, and an index of posts so far see the announcement post. For the schedule of future topics, see MIRI's reading guide.

Welcome. This week we discuss the thirteenth section in the reading guide: capability control methods. This corresponds to the start of chapter nine.

This post summarizes the section, and offers a few relevant notes, and ideas for further investigation. Some of my own thoughts and questions for discussion are in the comments.

There is no need to proceed in order through this post, or to look at everything. Feel free to jump straight to the discussion. Where applicable and I remember, page numbers indicate the rough part of the chapter that is most related (not necessarily that the chapter is being cited for the specific claim).

Reading: “Two agency problems” and “Capability control methods” from Chapter 9


  1. If the default outcome is doom, how can we avoid it? (p127)
  2. We can divide this 'control problem' into two parts:
    1. The first principal-agent problem: the well known problem faced by a sponsor wanting an employee to fulfill their wishes (usually called 'the principal agent problem')
    2. The second principal-agent problem: the emerging problem of a developer wanting their AI to fulfill their wishes
  3. How can we solve the second problem? We can't rely on behavioral observation (as seen in week 11). Two other options are 'capability control methods' and 'motivation selection methods'. We see the former this week, and the latter next week.
  4. Capability control methods: avoiding bad outcomes through limiting what an AI can do. (p129)
  5. Some capability control methods:
    1. Boxing: minimize interaction between the AI and the outside world. Note that the AI must interact with the world to be useful, and that it is hard to eliminate small interactions. (p129)
    2. Incentive methods: set up the AI's environment such that it is in the AI's interest to cooperate. e.g. a social environment with punishment or social repercussions often achieves this for contemporary agents. One could also design a reward system, perhaps with cryptographic rewards (so that the AI could not wirehead) or heavily discounted rewards (so that long term plans are not worth the short term risk of detection) (p131)
      • Anthropic capture: an AI thinks it might be in a simulation, and so tries to behave as will be rewarded by simulators (box 8; p134)
    3. Stunting: limit the AI's capabilities. This may be hard to do to a degree that avoids danger and is still useful. An option here is to limit the AI's information. A strong AI may infer much from little apparent access to information however. (p135)
    4. Tripwires: test the system without its knowledge, and shut it down if it crosses some boundary. This might be combined with 'honey pots' that tempt undesirable AIs into taking an action that would reveal them. Tripwires could test behavior, ability, or content. (p137)
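The "heavily discounted rewards" idea in point 5.2 can be illustrated with a toy calculation (the discount factor and reward sizes below are invented for the example, not taken from the book): under steep geometric discounting, even an enormous long-horizon payoff is worth less now than a tiny immediate one, so long-term plans lose their appeal relative to short-term cooperation.

```python
# Toy illustration of steep temporal discounting as an incentive method.
# With discount factor gamma per time step, a reward r arriving at step t
# is worth r * gamma**t today. All numbers are hypothetical.

GAMMA = 0.5  # a deliberately steep per-step discount factor

def present_value(reward, t, gamma=GAMMA):
    """Value today of a reward arriving t steps in the future."""
    return reward * gamma ** t

immediate = present_value(1.0, t=0)           # small reward right now
distant = present_value(1_000_000.0, t=50)    # huge reward after 50 steps

print(immediate, distant)  # the distant jackpot is worth far less than $1 now
```

On this toy model, a million-unit reward fifty steps away is discounted below a single unit received immediately, which is the mechanism by which steep discounting is supposed to make long-term defection plans unattractive.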

Another view

Brian Clegg reviews the book mostly favorably, but isn't convinced that controlling an AI via merely turning it off should be so hard:

I also think a couple of the fundamentals aren’t covered well enough, but pretty much assumed. One is that it would be impossible to contain and restrict such an AI. Although some effort is put into this, I’m not sure there is enough thought put into the basics of ways you can pull the plug manually – if necessary by shutting down the power station that provides the AI with electricity.
Kevin Kelly also apparently doubts that AI will substantially impede efforts to modify it:

...We’ll reprogram the AIs if we are not satisfied with their performance...

...This is an engineering problem. So far as I can tell, AIs have not yet made a decision that its human creators have regretted. If they do (or when they do), then we change their algorithms. If AIs are making decisions that our society, our laws, our moral consensus, or the consumer market, does not approve of, we then should, and will, modify the principles that govern the AI, or create better ones that do make decisions we approve. Of course machines will make “mistakes,” even big mistakes – but so do humans. We keep correcting them. There will be tons of scrutiny on the actions of AI, so the world is watching. However, we don’t have universal consensus on what we find appropriate, so that is where most of the friction about them will come from. As we decide, our AI will decide...

This may be related to his view that AI is unlikely to modify itself (from further down the same page):

3. Reprogramming themselves, on their own, is the least likely of many scenarios.

The great fear pumped up by some, though, is that as AI gain our confidence in making decisions, they will somehow prevent us from altering their decisions. The fear is they lock us out. They go rogue. It is very difficult to imagine how this happens. It seems highly improbable that human engineers would program an AI so that it could not be altered in any way. That is possible, but so impractical. That hobble does not even serve a bad actor. The usual scary scenario is that an AI will reprogram itself on its own to be unalterable by outsiders. This is conjectured to be a selfish move on the AI’s part, but it is unclear how an unalterable program is an advantage to an AI. It would also be an incredible achievement for a gang of human engineers to create a system that could not be hacked. Still it may be possible at some distant time, but it is only one of many possibilities. An AI could just as likely decide on its own to let anyone change it, in open source mode. Or it could decide that it wanted to merge with human will power. Why not? In the only example we have of an introspective self-aware intelligence (hominids), we have found that evolution seems to have designed our minds to not be easily self-reprogrammable. Except for a few yogis, you can’t go in and change your core mental code easily. There seems to be an evolutionary disadvantage to being able to easily muck with your basic operating system, and it is possible that AIs may need the same self-protection. We don’t know. But the possibility they, on their own, decide to lock out their partners (and doctors) is just one of many possibilities, and not necessarily the most probable one.



1. What do you do with a bad AI once it is under your control?

Note that capability control doesn't necessarily solve much: boxing, stunting and tripwires seem to just stall a superintelligence rather than provide means to safely use one to its full capacity. This leaves the controlled AI to be overtaken by some other unconstrained AI as soon as someone else isn't so careful. In this way, capability control methods seem much like slowing down AI research: helpful in the short term while we find better solutions, but not in itself a solution to the problem.

However this might be too pessimistic. An AI whose capabilities are under control might either be almost as useful as an uncontrolled AI who shares your goals (if interacted with the right way), or at least be helpful in getting to a more stable situation.

Paul Christiano outlines a scheme for safely using an unfriendly AI to solve some kinds of problems. We have both blogged on general methods for getting useful work from adversarial agents, which is related.

2. Cryptographic boxing

Paul Christiano describes a way to stop an AI interacting with the environment using a cryptographic box.

3. Philosophical Disquisitions

Danaher again summarizes the chapter well. Read it if you want a different description of any of the ideas, or to refresh your memory. He also provides a table of the methods presented in this chapter.

4. Some relevant fiction

That Alien Message by Eliezer Yudkowsky

5. Control through social integration

Robin Hanson argues that it matters more that a population of AIs are integrated into our social institutions, and that they keep the peace among themselves through the same institutions we keep the peace among ourselves, than whether they have the right values. He thinks this is why you trust your neighbors, not because you are confident that they have the same values as you. He has several followup posts.

6. More miscellaneous writings on these topics

- LessWrong wiki on AI boxing
- Armstrong et al on controlling and using an oracle AI
- Roman Yampolskiy on 'leakproofing' the singularity

I have not necessarily read these.


In-depth investigations

If you are particularly interested in these topics, and want to do further research, these are a few plausible directions, some inspired by Luke Muehlhauser's list, which contains many suggestions related to parts of Superintelligence. These projects could be attempted at various levels of depth.


  1. Choose any control method and work out the details better. For instance:
    1. Could one construct a cryptographic box for an untrusted autonomous system?
    2. Investigate steep temporal discounting as an incentives control method for an untrusted AGI.
  2. Are there other capability control methods we could add to the list?
  3. Devise uses for a malicious but constrained AI.
  4. How much pressure is there likely to be to develop AI which is not controlled?
  5. If existing AI methods had unexpected progress and were heading for human-level soon, what precautions should we take now?


If you are interested in anything like this, you might want to mention it in the comments, and see whether other people have useful thoughts.

How to proceed

This has been a collection of notes on the chapter.  The most important part of the reading group though is discussion, which is in the comments section. I pose some questions for you there, and I invite you to add your own. Please remember that this group contains a variety of levels of expertise: if a line of discussion seems too basic or too incomprehensible, look around for one that suits you better!

Next week, we will talk about 'motivation selection methods'. To prepare, read “Motivation selection methods” and “Synopsis” from Chapter 9. The discussion will go live at 6pm Pacific time next Monday, 15th December. Sign up to be notified here.

Stupid Questions December 2014

14 Gondolinian 08 December 2014 03:39PM

This thread is for asking any questions that might seem obvious, tangential, silly or what-have-you. Don't be shy, everyone has holes in their knowledge, though the fewer and the smaller we can make them, the better.

Please be respectful of other people's admitting ignorance and don't mock them for it, as they're doing a noble thing.

To any future monthly posters of SQ threads, please remember to add the "stupid_questions" tag.

[Link] An exact mapping between the Variational Renormalization Group and Deep Learning

4 Gunnar_Zarncke 08 December 2014 02:33PM

An exact mapping between the Variational Renormalization Group and Deep Learning by Pankaj Mehta, David J. Schwab

Deep learning is a broad set of techniques that uses multiple layers of representation to automatically learn relevant features directly from structured data. Recently, such techniques have yielded record-breaking results on a diverse set of difficult machine learning tasks in computer vision, speech recognition, and natural language processing. Despite the enormous success of deep learning, relatively little is understood theoretically about why these techniques are so successful at feature learning and compression. Here, we show that deep learning is intimately related to one of the most important and successful techniques in theoretical physics, the renormalization group (RG). RG is an iterative coarse-graining scheme that allows for the extraction of relevant features (i.e. operators) as a physical system is examined at different length scales. We construct an exact mapping from the variational renormalization group, first introduced by Kadanoff, and deep learning architectures based on Restricted Boltzmann Machines (RBMs). We illustrate these ideas using the nearest-neighbor Ising Model in one and two-dimensions. Our results suggests that deep learning algorithms may be employing a generalized RG-like scheme to learn relevant features from data.

To me this paper suggests that deep learning is already, or could be made, conceptually general enough to learn everything there is to learn (assuming sufficient time and resources). Thus it could serve as the base algorithm of a self-optimizing AGI.
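As a crude illustration of the coarse-graining idea at the heart of RG (and not the paper's actual variational RG / RBM construction), one can decimate a 1D Ising chain by majority rule: each block of spins is replaced by a single coarse spin, discarding short-range detail while keeping large-scale structure, much as successive layers of a deep network keep progressively more abstract features:

```python
# Toy block-spin coarse-graining of a 1D Ising chain (majority rule).
# This is a sketch of the general RG idea only, not the paper's exact
# variational RG / Restricted Boltzmann Machine mapping.

def coarse_grain(spins, block=3):
    """Replace each block of spins with the sign of its sum (majority vote)."""
    out = []
    for i in range(0, len(spins) - block + 1, block):
        s = sum(spins[i:i + block])
        out.append(1 if s > 0 else -1)
    return out

chain = [1, 1, -1,  -1, -1, 1,  1, 1, 1]   # 9 spins -> 3 coarse block spins
print(coarse_grain(chain))  # -> [1, -1, 1]
```

Iterating `coarse_grain` gives ever coarser descriptions of the chain, which is the analogy the paper makes precise for stacked RBM layers.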

View more: Next