
Superintelligence 24: Morality models and "do what I mean"

7 KatjaGrace 24 February 2015 02:00AM

This is part of a weekly reading group on Nick Bostrom's book, Superintelligence. For more information about the group, and an index of posts so far see the announcement post. For the schedule of future topics, see MIRI's reading guide.

Welcome. This week we discuss the twenty-fourth section in the reading guide: Morality models and "Do what I mean".

This post summarizes the section and offers a few relevant notes and ideas for further investigation. Some of my own thoughts and questions for discussion are in the comments.

There is no need to proceed in order through this post, or to look at everything. Feel free to jump straight to the discussion. Where applicable and I remember, page numbers indicate the rough part of the chapter that is most related (not necessarily that the chapter is being cited for the specific claim).

Reading: “Morality models” and “Do what I mean” from Chapter 13.


  1. Moral rightness (MR) AI: AI which seeks to do what is morally right
    1. Another form of 'indirect normativity'
    2. Requires moral realism to be true to do anything, but we could ask the AI to evaluate that and do something else if moral realism is false
    3. Avoids some complications of CEV
    4. If moral realism is true, is better than CEV (though may be terrible for us)
  2. We often want to say 'do what I mean' with respect to goals we try to specify. This is doing a lot of the work sometimes, so if we could specify that well perhaps it could also just stand alone: do what I want. This is much like CEV again.

Another view

Olle Häggström again, on Bostrom's 'Milky Way Preserve':

The idea [of a Moral Rightness AI] is that a superintelligence might be successful at the task (where we humans have so far failed) of figuring out what is objectively morally right. It should then take objective morality to heart as its own values.

Bostrom sees a number of pros and cons of this idea. A major concern is that objective morality may not be in humanity's best interest. Suppose for instance (not entirely implausibly) that objective morality is a kind of hedonistic utilitarianism, where "an action is morally right (and morally permissible) if and only if, among all feasible actions, no other action would produce a greater balance of pleasure over suffering" (p 219). Some years ago I offered a thought experiment to demonstrate that such a morality is not necessarily in humanity's best interest. Bostrom reaches the same conclusion via a different thought experiment, which I'll stick with here in order to follow his line of reasoning. Here is his scenario:
    The AI [...] might maximize the surfeit of pleasure by converting the accessible universe into hedonium, a process that may involve building computronium and using it to perform computations that instantiate pleasurable experiences. Since simulating any existing human brain is not the most efficient way of producing pleasure, a likely consequence is that we all die.
Bostrom is reluctant to accept such a sacrifice for "a greater good", and goes on to suggest a compromise:
    The sacrifice looks even less appealing when we reflect that the superintelligence could realize a nearly-as-great good (in fractional terms) while sacrificing much less of our own potential well-being. Suppose that we agreed to allow almost the entire accessible universe to be converted into hedonium - everything except a small preserve, say the Milky Way, which would be set aside to accommodate our own needs. Then there would still be a hundred billion galaxies devoted to the maximization of pleasure. But we would have one galaxy within which to create wonderful civilizations that could last for billions of years and in which humans and nonhuman animals could survive and thrive, and have the opportunity to develop into beatific posthuman spirits.

    If one prefers this latter option (as I would be inclined to do) it implies that one does not have an unconditional lexically dominant preference for acting morally permissibly. But it is consistent with placing great weight on morality. (p 219-220)

What? Is it? Is it "consistent with placing great weight on morality"? Imagine Bostrom in a situation where he does the final bit of programming of the coming superintelligence, to decide between these two worlds, i.e., the all-hedonium one versus the all-hedonium-except-in-the-Milky-Way-preserve. And imagine that he goes for the latter option. The only difference it makes to the world is to what happens in the Milky Way, so what happens elsewhere is irrelevant to the moral evaluation of his decision. This may mean that Bostrom opts for a scenario where, say, 10^24 sentient beings will thrive in the Milky Way in a way that is sustainable for trillions of years, rather than a scenario where, say, 10^45 sentient beings will be even happier for a comparable amount of time. Wouldn't that be an act of immorality that dwarfs all other immoral acts carried out on our planet, by many, many orders of magnitude? How could that be "consistent with placing great weight on morality"?
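The scale of the gap Häggström is pointing at can be checked with simple arithmetic; a minimal sketch, using his illustrative figures of 10^24 and 10^45 sentient beings:

```python
# Häggström's illustrative figures: ~10^24 beings thriving in the Milky Way
# preserve, vs ~10^45 beings in the all-hedonium alternative.
preserve_beings = 10**24
hedonium_beings = 10**45

# The hedonium option supports this many times more sentient beings:
ratio = hedonium_beings // preserve_beings
print(ratio)  # 10^21, i.e. the preserve option forgoes 21 orders of magnitude
```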



1. Do What I Mean is originally a concept from computer systems, where the (more modest) idea is to have a system correct small input errors.

2. To the extent that people care about objective morality, it seems that coherent extrapolated volition (CEV) or Christiano's proposal would lead the AI to care about objective morality, and thus look into what it is. Thus I doubt it is worth considering our commitments to morality first (as Bostrom does in this chapter, and as one might do before choosing whether to use an MR AI), if general methods for implementing our desires are on the table. This is close to what Bostrom is saying when he suggests we outsource the decision about which form of indirect normativity to use, and eventually winds up back at CEV. But it seems good to be explicit.

3. I'm not optimistic that behind every vague and ambiguous command, there is something specific that a person 'really means'. It seems more likely there is something they would in fact try to mean, if they thought about it a bunch more, but this is mostly defined by further facts about their brains, rather than the sentence and what they thought or felt as they said it. It seems at least misleading to call this 'what they meant'. Thus even when '—and do what I mean' is appended to other kinds of goals than generic CEV-style ones, I would expect the execution to look much like a generic investigation of human values, such as that implicit in CEV.

4. Alexander Kruel criticizes the idea that 'Do What I Mean' is the important, hard part: since every part of what an AI does is designed to be what humans really want it to be, it seems unlikely to him that an AI would do exactly what humans want with respect to instrumental behaviors (e.g. understanding language, using the internet, and carrying out sophisticated plans), but fail on humans' ultimate goals:

Outsmarting humanity is a very small target to hit, requiring a very small margin of error. In order to succeed at making an AI that can outsmart humans, humans have to succeed at making the AI behave intelligently and rationally. Which in turn requires humans to succeed at making the AI behave as intended along a vast number of dimensions. Thus, failing to predict the AI’s behavior does in almost all cases result in the AI failing to outsmart humans.

As an example, consider an AI that was designed to fly planes. It is exceedingly unlikely for humans to succeed at designing an AI that flies planes, without crashing, but which consistently chooses destinations that it was not meant to choose. Since all of the capabilities that are necessary to fly without crashing fall into the category “Do What Humans Mean”, and choosing the correct destination is just one such capability.

I disagree that it would be surprising for an AI to be very good at flying planes in general, but very bad at going to the right places in them. However it seems instructive to think about why this is.

In-depth investigations

If you are particularly interested in these topics, and want to do further research, these are a few plausible directions, some inspired by Luke Muehlhauser's list, which contains many suggestions related to parts of Superintelligence. These projects could be attempted at various levels of depth.

  1. Are there other general forms of indirect normativity that might outsource the problem of deciding what indirect normativity to use?
  2. On common views of moral realism, is morality likely to be amenable to (efficient) algorithmic discovery?
  3. If you knew how to build an AI with a good understanding of natural language (e.g. it knows what the word 'good' means as well as your most intelligent friend), how could you use this to make a safe AI?
If you are interested in anything like this, you might want to mention it in the comments, and see whether other people have useful thoughts.

How to proceed

This has been a collection of notes on the chapter.  The most important part of the reading group though is discussion, which is in the comments section. I pose some questions for you there, and I invite you to add your own. Please remember that this group contains a variety of levels of expertise: if a line of discussion seems too basic or too incomprehensible, look around for one that suits you better!

Next week, we will talk about other abstract features of an AI's reasoning that we might want to get right ahead of time, instead of leaving to the AI to fix. We will also discuss how well an AI would need to fulfill these criteria to be 'close enough'. To prepare, read “Component list” and “Getting close enough” from Chapter 13. The discussion will go live at 6pm Pacific time next Monday 2 March. Sign up to be notified here.

An alarming fact about the anti-aging community

29 diegocaleiro 16 February 2015 05:49PM

Past and Present

Ten years ago teenager me was hopeful. And stupid.

The world neglected aging as a disease, and Aubrey had barely started spreading memes - to the point that it was worth it for him to let me work remotely to help the Methuselah Foundation. They had not yet received that initial $1,000,000 donation from an anonymous donor. The Methuselah Prize stood at less than $400,000, if I remember correctly. Still, I was a believer.

Now we live in the age of Larry Page's Calico, with $100,000,000 trying to tackle the problem, besides many other amazing initiatives - from the research paid for by the Life Extension Foundation and Bill Faloon, to scholars at top universities like Steve Garan and Kenneth Hayworth fixing things from our models of aging to plastination techniques. Yet I am much more skeptical now.

Individual risk

I am skeptical because I could not find a single individual who has already used a simple technique that could certainly save you many years of healthy life. I could not even find a single individual who looked into it and decided it wasn't worth it, or was too pricey, or something of that sort.

That technique is freezing some of your cells now.

Freezing cells is not a far-future hope; it is something that already exists and has been possible for decades. The reason you would want to freeze them, in case you haven't thought of it, is that they are getting older every day, so the ones you have now are the youngest ones you'll ever be able to use.

Using these cells to create new organs is not merely something that may help you if medicine and technology continue progressing according to the law of accelerating returns for another 10 or 30 years. We already know how to make organs out of your cells, right now. Some organs last longer, some shorter, but it can be done - for bladders, for instance - and is being done.

Hope versus Reason

Now, you'd think that if there were an almost non-invasive technique, already shown to work in humans, that can preserve many years of your life and involves only a few trivial inconveniences - compared to changing diet or exercising, for instance - the whole longevist/immortalist crowd would be lining up for it and keeping backup tissue samples all over the place.

Well, I've asked them. I've asked some of the adamant researchers, and I've asked the superwealthy; I've asked the cryonicists and supplement gorgers; I've asked those who work on this 8 hours a day, every day, and I've asked those who pay others to do so. I asked mostly for selfish reasons: I saw the TED talks by Juan Enriquez and Anthony Atala and thought: hey look, a clearly beneficial expected increase in life length, yay! Let me call someone who found this out before me - anyone, I'm probably the last one, silly me - and fix this.

I've asked them all, and I have nothing to show for it.

My takeaway lesson is: whatever it is that other people are doing to solve their own impending death, they are far from doing it rationally, and maybe most of the money and psychology involved in this whole business is about buying hope, not about staring into the void and finding out the best ways of dodging it. Maybe people are not in fact going to go all-in if the opportunity comes.

How to fix this?

Let me disclose first that I have no idea how to fix this problem. I don't mean the problem of getting all longevists to freeze their cells; I mean the problem of getting them to take information from the world of science and biomedicine and apply it to themselves. To become users of the technology they boast about. To behave rationally in a CFAR or even homo economicus sense.

I was hoping for a grandiose idea in this last paragraph, but it didn't come. I'll go with a quote from this emotional song, sung by us during last year's Secular Solstice celebration:

Do you realize? that everyone, you know, someday will die...

And instead of sending all your goodbyes

Let them know you realize that life goes fast

It's hard to make the good things last

Discussion of concrete near-to-middle term trends in AI

13 Punoxysm 08 February 2015 10:05PM

Instead of prognosticating on AGI/Strong AI/Singularities, I'd like to discuss more concrete advancements to expect in the near-term in AI. I invite those who have an interest in AI to discuss predictions or interesting trends they've observed.

This discussion should be useful for anyone looking to research or work in companies involved in AI, and might guide longer-term predictions.

With that, here are my predictions for the next 5-10 years in AI. This is mostly straightforward extrapolation, so it won't excite those who know about these areas but may interest those who don't:

  • Speech Processing, the task of turning spoken words into text, will continue to improve until it is essentially a solved problem. Smartphones and even weaker devices will be capable of quite accurately transcribing heavily-accented speech in many languages and noisy environments. This is the simple continuation of the rapid improvements in speech processing that have brought us from Dragon NaturallySpeaking to Google Now and Siri.
  • Assistant and intent-based systems, like Siri, which try to figure out the "intent" of your input and must interpret a sentence as a particular command they are capable of, will become substantially more accurate and varied, and will take cues like tone and emphasis into account. So, for example, if you're looking for directions you won't have to repeat yourself in an increasingly loud, slowed and annoyed voice. You'll be able to phrase your requests naturally and conversationally. New tasks like "Should I get this rash checked out?" will be available. A substantial degree of personalization and use of your personal history might also allow "show me something funny/sad/stimulating [from the internet]".
  • Natural language processing, the task of parsing the syntax and semantics of language, will improve substantially. Look at the list of traditional NLP tasks with standard benchmarks on Wikipedia: every one of these tasks will see a several-percentage-point improvement, particularly in the understudied area of informal text (chat logs, tweets - anywhere grammar and vocabulary are less rigorous). It won't get so good that it can be confused with solving AI-complete aspects of NLP, but it will allow vast improvements in text mining and information extraction. For instance, search queries like "What papers are critical of VerHoeven and Michaels '08" or "Summarize what Twitter thinks of the 2018 Super Bowl" will be answerable. Open-source libraries will continue to improve from their current just-above-boutique state (NLTK, CoreNLP). Medical diagnosis based on analysis of medical texts will be a major area of research. Large-scale analysis of scientific literature, in areas where it is difficult for researchers to read all relevant texts, will be another. Machine translation will not be ready for most diplomatic business, but it will be very, very good across a wide variety of languages.
  • Computer Vision, interpreting the geometry and contents of images and video, will undergo tremendous advances. In fact, it already has in the past 5 years, but now it makes sense for major efforts - academic, military, and industrial - to try to integrate different modules that have been developed for subtasks like object recognition, motion/gesture recognition, segmentation, etc. I think the single biggest impact this will have is as a foundation for robotics development, since a lot of the arduous work of interpreting sensor input will be partly taken care of by excellent vision libraries. Those general foundations will make it easy to program specialist tasks (like differentiating weeds from crops in an image, or identifying activity associated with crime in a video). This will be complemented by a general proliferation of cheap high-quality cameras and other sensors. Augmented reality also rests on computer vision, and the promise of the most fanciful tech demo videos will be realized in practice.
  • Robotics will advance rapidly. The foundational factors - computer vision, growing availability of cheap platforms, and fast progress on tasks like motion planning and grasping - have the potential to fuel an explosion of smarter industrial and consumer robotics that can perform more complex and unpredictable tasks than most current robots. Prototype ideas like search-and-rescue robots, more complex drones, and autonomous vehicles will come to fruition (though 10 years may be too short a time frame for ubiquity). Simpler robots with exotic chemical sensors will have important applications in medical and environmental research.


Human Minds are Fragile

21 diegocaleiro 11 February 2015 06:40PM

We are familiar with the thesis that Value is Fragile. This is why we are researching how to impart values to an AGI.

Embedded Minds are Fragile

Besides values, it may be worth remembering that human minds too are very fragile.

A little magnetic tampering with your amygdalae, and suddenly you are a wannabe serial killer. A small dose of LSD can get you to believe you can fly, or that the world will end in 4 hours. Remove part of your ventromedial prefrontal cortex, and suddenly you are so utilitarian even Joshua Greene would call you a psycho.

It requires very little material change to substantially modify a human being's behavior. Same holds for other animals with embedded brains, crafted by evolution and made of squishy matter modulated by glands and molecular gates.

A Problem for Paul-Boxing and CEV?

One assumption underlying Paul-Boxing and CEV is that:

It is easier to specify and simulate a human-like mind than to impart values to an AGI by means of teaching it values directly via code or human language.

Usually we assume that because, as we know, value is fragile. But so are embedded minds. Very little tampering is required to profoundly transform people's moral intuitions. A large fraction of the inmate population in the US has frontal lobe or amygdala malfunctions.

Finding the simplest description of a human brain that, when simulated, continues to act as that human brain would act in the real world may turn out to be as fragile as, or even more fragile than, concept learning for AGIs.

Attempted Telekinesis

76 AnnaSalamon 07 February 2015 06:53PM

Related to: Compartmentalization in epistemic and instrumental rationality; That other kind of status.

Summary:  I’d like to share some techniques that made a large difference for me, and for several other folks I shared them with.  They are techniques for reducing stress, social shame, and certain other kinds of “wasted effort”.  These techniques are less developed and rigorous than the techniques that CFAR teaches in our workshops -- for example, they currently only work for perhaps 1/3rd of the dozen or so people I’ve shared them with -- but they’ve made a large enough impact for that 1/3rd that I wanted to share them with the larger group.  I’ll share them through a sequence of stories and metaphors, because, for now, that is what I have.


Reductionist research strategies and their biases

15 PhilGoetz 06 February 2015 04:11AM

I read an extract of (Wimsatt 1980) [1] which includes a list of common biases in reductionist research. I suppose most of us are reductionists most of the time, so these may be worth looking at.

This is not an attack on reductionism! If you think reductionism is too sacred for such treatment, you've got a bigger problem than anything on this list.

Here's Wimsatt's list, with some additions from the parts of his 2007 book Re-engineering Philosophy for Limited Beings that I can see on Google books. His lists often lack specific examples, so I came up with my own examples and inserted them in [brackets].


Request for proposals for Musk/FLI grants

22 danieldewey 05 February 2015 05:04PM

As a follow-on to the recent thread on purchasing research effectively, I thought it'd make sense to post the request for proposals for projects to be funded by Musk's $10M donation. LessWrong's been a place for discussing long-term AI safety and research for quite some time, so I'd be happy to see some applications come out of LW members.

Here's the full Request for Proposals.

If you have questions, feel free to ask them in the comments or to contact me!

Here's the email FLI has been sending around:

Initial proposals (300–1000 words) due March 1, 2015

The Future of Life Institute, based in Cambridge, MA and headed by Max Tegmark (MIT), is seeking proposals for research projects aimed to maximize the future societal benefit of artificial intelligence while avoiding potential hazards. Projects may fall in the fields of computer science, AI, machine learning, public policy, law, ethics, economics, or education and outreach. This 2015 grants competition will award funds totaling $6M USD.

This funding call is limited to research that explicitly focuses not on the standard goal of making AI more capable, but on making AI more robust and/or beneficial; for example, research could focus on making machine learning systems more interpretable, on making high-confidence assertions about AI systems' behavior, or on ensuring that autonomous systems fail gracefully. Funding priority will be given to research aimed at keeping AI robust and beneficial even if it comes to greatly supersede current capabilities, either by explicitly focusing on issues related to advanced future AI or by focusing on near-term problems, the solutions of which are likely to be important first steps toward long-term solutions.

Please do forward this email to any colleagues and mailing lists that you think would be appropriate.


Before applying, please read the complete RFP and list of example topics, which can be found online along with the application form [1].


As explained there, most of the funding is for $100K–$500K project grants, which will each support a small group of collaborators on a focused research project of up to three years' duration. For a list of suggested topics, see the complete RFP [1] and the Research Priorities document [2]. Initial proposals, which are intended to require merely a modest amount of preparation time, must be received on our website [1] on or before March 1, 2015.

Initial proposals should include a brief project summary, a draft budget, the principal investigator’s CV, and co-investigators’ brief biographies. After initial proposals are reviewed, some projects will advance to the next round, completing a Full Proposal by May 17, 2015. Public award recommendations will be made on or about July 1, 2015, and successful proposals will begin receiving funding in September 2015.

References and further resources

[1] Complete request for proposals and application form: http://futureoflife.org/grants/large/initial

[2] Research Priorities document: http://futureoflife.org/static/data/documents/research_priorities.pdf

[3] An open letter from AI scientists on research priorities for robust and beneficial AI: http://futureoflife.org/misc/open_letter

[4] Initial funding announcement: http://futureoflife.org/misc/AI

Questions about Project Grants: dewey@futureoflife.org

Media inquiries: tegmark@mit.edu

4 Common Prediction Failures and How to Fix Them [LINK]

20 Peter_McIntyre 10 February 2015 01:34AM

A post I wrote on CFAR's blog about the errors we make in hedonic prospecting, based on Gilbert and Wilson's 2007 review article on the topic. Let me know what you think! Reposting from the discussion section in case anyone missed it. :)


How to save (a lot of) money on flying

8 T3t 03 February 2015 06:25PM

I was going to wait to post this for reasons, but realized that was pretty dumb when the difference of a few weeks could literally save people hundreds, if not thousands of collective dollars.


If you fly regularly (or at all), you may already know about this method of saving money.  The method is quite simple: instead of buying a round-trip ticket from the airline or reseller, you hunt down much cheaper one-way flights with layovers at your destination and/or your point of origin.  Skiplagged is a service that will do this automatically for you, and has been in the news recently because the creator was sued by United Airlines and Orbitz.  While Skiplagged will allow you to click through to purchase the one-way ticket to your destination, they have broken or disabled the functionality of the redirect to the one-way ticket back (possibly in order to raise more funds for their legal defense).  However, finding the return flight manually is fairly easy, as they provide all the information needed to filter for it on other websites (time, airline, etc.).  I personally have benefited from this - I am flying to Texas from Southern California soon, and instead of a round-trip ticket which would cost me about $450, I spent ~$180 on two one-way tickets (with the return flight being the "layover" at my point of origin).  These are, perhaps, larger than usual savings; I think 20-25% is more common, but even then it's a fairly significant amount of money.
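The arithmetic behind the claimed savings is easy to check; a minimal sketch, assuming the ~$180 splits evenly across the two one-way tickets:

```python
def hidden_city_savings(round_trip_price, one_way_prices):
    """Fractional savings from booking hidden-city one-ways instead of a round trip."""
    total = sum(one_way_prices)
    return (round_trip_price - total) / round_trip_price

# The author's example: a ~$450 round trip replaced by two ~$90 one-way tickets.
print(hidden_city_savings(450, [90, 90]))  # 0.6, i.e. 60% savings
```

At the more typical 20-25% savings the author mentions, the same two tickets would have to total roughly $340-$360 against a $450 round trip.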


Relevant warnings by gwillen:

You should be EXTREMELY CAREFUL when using this strategy. It is, at a minimum, against airline policy.

If you have any kind of airline status or membership, and you do this too often, they will cancel it. If you try to do this on a round-trip ticket, they will cancel your return. If the airlines have any means of making your life difficult available to them, they WILL use it.

Obviously you also cannot check bags when using this strategy, since they will go to the wrong place (your ostensible, rather than your actual, destination.) This also means that if you have an overhead-sized carryon, and you board late and are forced to check it, your bag will NOT make it to your intended destination; it will go to the final destination marked on your ticket. If you try to argue about this, you run the risk of getting your ticket cancelled altogether, since you're violating airline policies by using a ticket in this way.


Additionally, you should do all of your airline/hotel/etc shopping using whatever private browsing mode your web browser has.  This will often let you purchase the exact same product for a cheaper price.


That is all.

Prediction Markets are Confounded - Implications for the feasibility of Futarchy

14 Anders_H 26 January 2015 10:39PM

(tl;dr:  In this post, I show that prediction markets estimate non-causal probabilities, and can therefore not be used for decision making by rational agents following causal decision theory. I provide an example of a simple situation where such confounding leads a society that has implemented futarchy to make an incorrect decision.)


It is October 2016, and the US Presidential Elections are nearing. The most powerful nation on earth is about to make a momentous decision about whether being the brother of a former president is a more impressive qualification than being the wife of a former president. However, one additional criterion has recently become relevant in light of current affairs:   Kim Jong-Un, Great Leader of the Glorious Nation of North Korea, is making noise about his deep hatred for Hillary Clinton. He also occasionally discusses the possibility of nuking a major US city. The US electorate, desperate to avoid being nuked, have come up with an ingenious plan: They set up a prediction market to determine whether electing Hillary will impact the probability of a nuclear attack. 

The following rules are stipulated: There are four possible outcomes: "Hillary elected and US nuked", "Hillary elected and US not nuked", "Jeb elected and US nuked", "Jeb elected and US not nuked". Participants in the market can buy and sell contracts for each of those outcomes; the contract which corresponds to the actual outcome will expire at $100, and all other contracts will expire at $0.

Simultaneously, in a country far, far away, a rebellion is brewing against the Great Leader. The potential challenger not only appears to have no problem with Hillary; he also seems like a reasonable guy who would be unlikely to use nuclear weapons. It is generally believed that the challenger will take power with probability 3/7, and will be exposed and tortured in a forced labor camp for the rest of his miserable life with probability 4/7. Let us stipulate that this information is known to all participants - I am adding this clause in order to demonstrate that this argument does not rely on unknown information or information asymmetry.

A mysterious but trustworthy agent named "Laplace's Demon" has recently appeared, and informed everyone that, to a first approximation,  the world is currently in one of seven possible quantum states.  The Demon, being a perfect Bayesian reasoner with Solomonoff Priors, has determined that each of these states should be assigned probability 1/7.     Knowledge of which state we are in will perfectly predict the future, with one important exception:   It is possible for the US electorate to "Intervene" by changing whether Clinton or Bush is elected. This will then cause a ripple effect into all future events that depend on which candidate is elected President, but otherwise change nothing. 

The Demon swears up and down that the choice about whether Hillary or Jeb is elected has absolutely no impact in any of the seven possible quantum states. However, because the Prediction market has already been set up and there are powerful people with vested interests, it is decided to run the market anyways. 

Roughly, the demon tells you that the world is in one of the following seven states:

State  Kim overthrown  Election winner (if no intervention)  US Nuked if Hillary elected  US Nuked if Jeb elected  US Nuked
1      Yes             Hillary                               No                           No                       No
2      Yes             Jeb                                   No                           No                       No
3      Yes             Jeb                                   No                           No                       No
4      No              Hillary                               Yes                          No                       Yes
5      No              Hillary                               No                           Yes                      No
6      No              Jeb                                   Yes                          Yes                      Yes
7      No              Jeb                                   No                           No                       No

Let us use this table to define some probabilities: if one intervenes to make Hillary win the election, the probability of the US being nuked is 2/7 (this is seen from column 4). If one intervenes to make Jeb win the election, the probability of the US being nuked is also 2/7 (this is seen from column 5). In the language of causal inference, these probabilities are Pr[Nuked | do(Elect Clinton)] and Pr[Nuked | do(Elect Bush)]. The fact that these two quantities are equal confirms the Demon's claim that the choice of President has no effect on the outcome. An agent operating under causal decision theory will use this information to correctly conclude that he has no preference between electing Hillary and electing Jeb.
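As a sanity check, the interventional probabilities can be computed mechanically. The following sketch uses one assignment of seven equally likely states that is consistent with the probabilities stated in this post (the exact assignment is an illustrative assumption; only the column counts matter), and averages the "nuked if X elected" columns over all states:

```python
from fractions import Fraction

# Illustrative assignment of the seven equally likely states, consistent
# with the probabilities stated in the text (assumption, not canonical).
# Each tuple: (Kim overthrown?, winner absent intervention,
#              nuked if Hillary elected, nuked if Jeb elected)
states = [
    (True,  "Hillary", False, False),
    (True,  "Jeb",     False, False),
    (True,  "Jeb",     False, False),
    (False, "Hillary", True,  True),
    (False, "Hillary", False, False),
    (False, "Jeb",     True,  True),
    (False, "Jeb",     False, False),
]

n = len(states)
# Intervening fixes the winner without changing which state we are in, so
# Pr[Nuked | do(Elect X)] is simply the average of the "nuked if X" column.
p_do_hillary = Fraction(sum(nuke_h for _, _, nuke_h, _ in states), n)
p_do_jeb = Fraction(sum(nuke_j for _, _, _, nuke_j in states), n)

print(p_do_hillary, p_do_jeb)  # 2/7 2/7
```

Both interventional probabilities come out to 2/7, matching the Demon's claim that the choice of President makes no causal difference.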

However, if one were to condition on who actually was elected, we would get different numbers: conditional on being in a state where Hillary is elected, the probability of the US being nuked is 1/3; whereas conditional on being in a state where Jeb is elected, the probability of being nuked is 1/4. Mathematically, these probabilities are Pr[Nuked | Clinton elected] and Pr[Nuked | Bush elected]. An agent operating under evidential decision theory will use this information to conclude that he should vote for Bush. Because evidential decision theory is wrong, he will fail to optimize for the outcome he is interested in.

Now, let us ask ourselves which probabilities our prediction markets will converge to, i.e. which probabilities participants in the market have an incentive to provide their best estimate of. We defined our contract as "Hillary is elected and the US is nuked". The probability of this occurring is 1/7; if we normalize by dividing by the marginal probability that Hillary is elected (3/7), we get 1/3, which is equal to Pr[Nuked | Clinton elected]. In other words, the prediction market estimates the wrong quantity.
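The arithmetic can be made explicit with the same kind of illustrative state assignment (an assumption consistent with the probabilities stated in the text), computing the fair price of the joint contract and the conditional probability it implies:

```python
from fractions import Fraction

# Illustrative assignment of the seven equally likely states (assumption).
# Each tuple: (Kim overthrown?, winner absent intervention,
#              nuked if Hillary elected, nuked if Jeb elected)
states = [
    (True,  "Hillary", False, False),
    (True,  "Jeb",     False, False),
    (True,  "Jeb",     False, False),
    (False, "Hillary", True,  True),
    (False, "Hillary", False, False),
    (False, "Jeb",     True,  True),
    (False, "Jeb",     False, False),
]
n = len(states)

# Joint probability of the contract event "Hillary elected and US nuked".
p_joint = Fraction(sum(w == "Hillary" and nuke_h for _, w, nuke_h, _ in states), n)
# Marginal probability that Hillary is elected (absent intervention).
p_hillary = Fraction(sum(w == "Hillary" for _, w, _, _ in states), n)

print(p_joint)              # 1/7: the fair price of the joint contract
print(p_joint / p_hillary)  # 1/3: the conditional Pr[Nuked | Clinton elected]
```

The normalized price recovers the conditional probability 1/3 rather than the interventional probability 2/7, which is the confounding problem in miniature.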

Essentially, what happens is structurally the same phenomenon as confounding in epidemiological studies: there was a common cause of Hillary being elected and the US being nuked. This common cause (whether Kim Jong-Un was still Great Leader of North Korea) led to a correlation between the election of Hillary and the outcome, but that correlation is purely non-causal and is not relevant to a rational decision maker.

The obvious next question is whether there exists a way to save futarchy, i.e. any way to give traders an incentive to pay a price that reflects their beliefs about Pr[Nuked | do(Elect Clinton)] instead of Pr[Nuked | Clinton elected]. We discussed this question at the Less Wrong meetup in Boston a couple of months ago. The only approach we agreed would definitely solve the problem is the following procedure:


  1. The governing body makes an absolute pre-commitment that no matter what happens, the next President will be determined solely on the basis of the prediction market 
  2. The following contracts are listed: “The US is nuked if Hillary is elected” and “The US is nuked if Jeb is elected”
  3. At the pre-specified date, the markets are closed and the President is chosen based on the estimated probabilities
  4. If Hillary is chosen, the contract on Jeb cannot be settled, and all bets on it are reversed.
  5. The Hillary contract expires when it is known whether Kim Jong-Un presses the button.
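The settlement rule in steps 2 through 5 can be sketched as follows (the function name and signature are illustrative, not part of any real exchange's API):

```python
def settle(contract_candidate, chosen_candidate, nuked, face_value=100):
    """Settle a conditional contract "US nuked if <contract_candidate> elected".

    Returns the payout per contract in dollars, or None when the contract's
    condition was never realized, in which case all trades in it must be
    reversed (step 4 of the procedure above).
    """
    if contract_candidate != chosen_candidate:
        return None  # condition not met: void the contract, reverse all bets
    # Condition met: wait until it is known whether the button is pressed,
    # then pay face value if the US was nuked, and zero otherwise (step 5).
    return face_value if nuked else 0

# For example, if Hillary is chosen and the US is subsequently nuked:
print(settle("Hillary", "Hillary", True))  # 100
print(settle("Jeb", "Hillary", True))      # None: the Jeb contract is voided
```

The `None` branch is exactly where the practical trouble lives: reversing voided trades requires clawing back money that may already have left the exchange.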


This procedure will get the correct results in theory, but it has the following practical problems:  It allows maximizing on only one outcome metric (because one cannot precommit to choose the President based on criteria that could potentially be inconsistent with each other).  Moreover, it requires the reversal of trades, which will be problematic if people who won money on the Jeb contract have withdrawn their winnings from the exchange. 

The only other option I can think of for obtaining causal information from a prediction market is to “control for confounding”. If, for instance, the only confounder is whether Kim Jong-Un is overthrown, we can control for it by using do-calculus to show that Pr[Nuked | do(Elect Clinton)] = Pr[Nuked | Clinton elected, Kim overthrown] · Pr[Kim overthrown] + Pr[Nuked | Clinton elected, Kim not overthrown] · Pr[Kim not overthrown]. All of these quantities can be estimated from separate prediction markets.
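Under the assumption that Kim's overthrow is the only confounder, the adjustment formula can be checked numerically. This sketch (again using an illustrative state assignment consistent with the probabilities stated in the text) estimates each term and recovers the interventional probability:

```python
from fractions import Fraction

# Illustrative assignment of the seven equally likely states (assumption).
# Each tuple: (Kim overthrown?, winner absent intervention,
#              nuked if Hillary elected, nuked if Jeb elected)
states = [
    (True,  "Hillary", False, False),
    (True,  "Jeb",     False, False),
    (True,  "Jeb",     False, False),
    (False, "Hillary", True,  True),
    (False, "Hillary", False, False),
    (False, "Jeb",     True,  True),
    (False, "Jeb",     False, False),
]

def pr(event):
    """Probability of an event under the uniform distribution over states."""
    return Fraction(sum(bool(event(s)) for s in states), len(states))

def pr_cond(event, given):
    """Conditional probability Pr[event | given]."""
    return pr(lambda s: event(s) and given(s)) / pr(given)

# Back-door adjustment, summing over the strata of the confounder (whether
# Kim is overthrown). In states where Hillary is elected, "US nuked"
# coincides with the "nuked if Hillary elected" column (s[2]).
adjusted = sum(
    pr_cond(lambda s: s[2],
            lambda s, k=kim: s[1] == "Hillary" and s[0] == k)
    * pr(lambda s, k=kim: s[0] == k)
    for kim in (True, False)
)
print(adjusted)  # 2/7, recovering Pr[Nuked | do(Elect Clinton)]
```

The adjusted estimate equals the interventional 2/7, not the confounded 1/3, which is what the separate conditional markets would need to deliver in aggregate.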

 However, this is problematic for several reasons:


  1. There will be an exponential explosion in the number of required prediction markets, and each of them will ask participants to bet on complicated conditional probabilities that have no obvious causal interpretation. 
  2. There may be disagreement on what the confounders are, which will lead to contested contract interpretations.
  3. The expert consensus on what the important confounders are may change during the lifetime of the contract, which would require the entire market to be relisted; and so on. For practical reasons, therefore, this approach does not seem feasible.


I’d like a discussion of the following questions: Are there any other ways to list a contract that give market participants an incentive to aggregate information on causal quantities? If not, is futarchy doomed?

(Thanks to the Less Wrong meetup in Boston and particularly Jimrandomh for clarifying my thinking on this issue)

(I reserve the right to make substantial updates to this text in response to any feedback in the comments)
