Less Wrong is a community blog devoted to refining the art of human rationality.

MIRI's 2014 Summer Matching Challenge

17 lukeprog 07 August 2014 08:03PM

(Cross-posted from MIRI's blog. MIRI maintains Less Wrong, with generous help from Trike Apps, and much of the core content is written by salaried MIRI staff members.)

Thanks to the generosity of several major donors, every donation made to MIRI between now and August 15th, 2014 will be matched dollar-for-dollar, up to a total of $200,000!  

Now is your chance to double your impact while helping us raise up to $400,000 (with matching) to fund our research program.

Corporate matching and monthly giving pledges will count towards the total! Please email malo@intelligence.org if you intend to leverage corporate matching (check here to see if your employer will match your donation) or would like to pledge 6 months of monthly donations, so that we can properly account for your contributions towards the fundraiser.

(If you're unfamiliar with our mission, see: Why MIRI?)

Donate Now


Accomplishments Since Our Winter 2013 Fundraiser Launched:

Ongoing Activities You Can Help Support

  • We're writing an overview of the Friendly AI technical agenda (as we see it) so far.
  • We're currently developing and testing several tutorials on different pieces of the Friendly AI technical agenda (tiling agents, modal agents, etc.).
  • We're writing several more papers and reports.
  • We're growing the MIRIx program, largely to grow the pool of people we can plausibly hire as full-time FAI researchers in the next couple years.
  • We're planning, or helping to plan, multiple research workshops, including the May 2015 decision theory workshop at Cambridge University.
  • We're finishing the editing for a book version of Eliezer's Sequences.
  • We're helping to fund further SPARC activity, which provides education and skill-building to elite young math talent, and introduces them to ideas like effective altruism and global catastrophic risks.
  • We're continuing to discuss formal collaboration opportunities with UC Berkeley faculty and development staff.
  • We're helping Nick Bostrom promote his Superintelligence book in the U.S.
  • We're investigating opportunities for supporting Friendly AI research via federal funding sources such as the NSF.

Other projects are still being surveyed for likely cost and impact. See also our mid-2014 strategic plan. We appreciate your support for our work!

Donate now, and seize a better-than-usual opportunity to move our work forward. If you have questions about donating, please contact Malo Bourgon at (510) 292-8776 or malo@intelligence.org.

The $200,000 of total matching funds has been provided by Jaan Tallinn, Edwin Evans, and Rick Schwall.


Will AGI surprise the world?

12 lukeprog 21 June 2014 10:27PM

Cross-posted from my blog.

Yudkowsky writes:

In general and across all instances I can think of so far, I do not agree with the part of your futurological forecast in which you reason, "After event W happens, everyone will see the truth of proposition X, leading them to endorse Y and agree with me about policy decision Z."


Example 2: "As AI gets more sophisticated, everyone will realize that real AI is on the way and then they'll start taking Friendly AI development seriously."

Alternative projection: As AI gets more sophisticated, the rest of society can't see any difference between the latest breakthrough reported in a press release and that business earlier with Watson beating Ken Jennings or Deep Blue beating Kasparov; it seems like the same sort of press release to them. The same people who were talking about robot overlords earlier continue to talk about robot overlords. The same people who were talking about human irreproducibility continue to talk about human specialness. Concern is expressed over technological unemployment the same as today or Keynes in 1930, and this is used to fuel someone's previous ideological commitment to a basic income guarantee, inequality reduction, or whatever. The same tiny segment of unusually consequentialist people are concerned about Friendly AI as before. If anyone in the science community does start thinking that superintelligent AI is on the way, they exhibit the same distribution of performance as modern scientists who think it's on the way, e.g. Hugo de Garis, Ben Goertzel, etc.

My own projection goes more like this:

As AI gets more sophisticated, and as more prestigious AI scientists begin to publicly acknowledge that AI is plausibly only 2-6 decades away, policy-makers and research funders will begin to respond to the AGI safety challenge, just like they began to respond to CFC damages in the late 70s, to global warming in the late 80s, and to synbio developments in the 2010s. As for society at large, I dunno. They'll think all kinds of random stuff for random reasons, and in some cases this will seriously impede effective policy, as it does in the USA for science education and immigration reform. Because AGI lends itself to arms races and is harder to handle adequately than global warming or nuclear security are, policy-makers and industry leaders will generally know AGI is coming but be unable to fund the needed efforts and coordinate effectively enough to ensure good outcomes.

At least one clear difference between my projection and Yudkowsky's is that I expect AI-expert performance on the problem to improve substantially as a greater fraction of elite AI scientists begin to think about the issue in Near mode rather than Far mode.

As a friend of mine suggested recently, current elite awareness of the AGI safety challenge is roughly where elite awareness of the global warming challenge was in the early 80s. Except, I expect elite acknowledgement of the AGI safety challenge to spread more slowly than it did for global warming or nuclear security, because AGI is tougher to forecast in general, and involves trickier philosophical nuances. (Nobody was ever tempted to say, "But as the nuclear chain reaction grows in power, it will necessarily become more moral!")

Still, there is a worryingly non-negligible chance that AGI explodes "out of nowhere." Sometimes important theorems are proved suddenly after decades of failed attempts by other mathematicians, and sometimes a computational procedure is sped up by 20 orders of magnitude with a single breakthrough.

Some alternatives to “Friendly AI”

19 lukeprog 15 June 2014 07:53PM

Cross-posted from my blog.

What does MIRI's research program study?

The most established term for this was coined by MIRI founder Eliezer Yudkowsky: "Friendly AI." The term has some advantages, but it might suggest that MIRI is trying to build C-3PO, and it sounds a bit whimsical for a serious research program.

What about safe AGI or AGI safety? These terms are probably easier to interpret than Friendly AI. Also, people like being safe, and governments like saying they're funding initiatives to keep the public safe.

A friend of mine worries that these terms could provoke a defensive response (in AI researchers) of "Oh, so you think me and everybody else in AI is working on unsafe AI?" But I've never actually heard that response to "AGI safety" in the wild, and AI safety researchers regularly discuss "software system safety" and "AI safety" and "agent safety" and more specific topics like "safe reinforcement learning" without provoking negative reactions from people doing regular AI research.

I'm more worried that a term like "safe AGI" could provoke a response of "So you're trying to make sure that a system which is smarter than humans, and able to operate in arbitrary real-world environments, and able to invent new technologies to achieve its goals, will be safe? Let me save you some time and tell you right now that's impossible. Your research program is a pipe dream."

My reply goes something like "Yeah, it's way beyond our current capabilities, but lots of things that once looked impossible are now feasible because people worked really hard on them for a long time, and we don't think we can get the whole world to promise never to build AGI just because it's hard to make safe, so we're going to give AGI safety a solid try for a few decades and see what can be discovered." But that's probably not all that reassuring.

How about high-assurance AGI? In computer science, a "high assurance system" is one built from the ground up for unusually strong safety and/or security guarantees, because it's going to be used in safety-critical applications where human lives — or sometimes simply billions of dollars — are at stake (e.g. autopilot software or Mars rover software). So there's a nice analogy to MIRI's work, where we're trying to figure out what an AGI would look like if it was built from the ground up to get the strongest safety guarantees possible for such an autonomous and capable system.

I think the main problem with this term is that, quite reasonably, nobody will believe that we can ever get anywhere near as much assurance in the behavior of an AGI as we can in the behavior of, say, the relatively limited AI software that controls the European Train Control System. "High assurance AGI" sounds a bit like "Totally safe all-powerful demon lord." It sounds even more wildly unimaginable to AI researchers than "safe AGI."

What about superintelligence control or AGI control, as in Bostrom (2014)? "AGI control" is perhaps more believable than "high-assurance AGI" or "safe AGI," since it brings to mind AI containment methods, which sound more feasible to most people than designing an unconstrained AGI that is somehow nevertheless safe. (It's okay if they learn later that containment probably isn't an ultimate solution to the problem.)

On the other hand, it might provoke a reaction of "What, you don't think sentient robots have any rights, and you're free to control and confine them in any way you please? You're just repeating the immoral mistakes of the old slavemasters!" Which of course isn't true, but it takes some time to explain how I can think it's obvious that conscious machines have moral value while also being in favor of AGI control methods.

How about ethical AGI? First, I worry that it sounds too philosophical, and philosophy is widely perceived as a confused, unproductive discipline. Second, I worry that it sounds like the research assumes moral realism, which many (most?) intelligent people reject. Third, it makes it sound like most of the work is in selecting the goal function, which I don't think is true.

What about beneficial AGI? That's better than "ethical AGI," I think, but like "ethical AGI" and "Friendly AI," the term sounds less like a serious math and engineering discipline and more like some enclave of crank researchers writing a flurry of words (but no math) about how AGI needs to be "nice" and "trustworthy" and "not harmful" and oh yeah it must be "virtuous" too, whatever that means.

So yeah, I dunno. I think "AGI safety" is my least-disliked term these days, but I wish I knew of some better options.

An onion strategy for AGI discussion

13 lukeprog 31 May 2014 07:08PM

Cross-posted from my blog.

"The stabilization of environments" is a paper about AIs that reshape their environments to make it easier to achieve their goals. This is typically called enforcement, but they prefer the term stabilization because it "sounds less hostile."

"I'll open the pod bay doors, Dave, but then I'm going to stabilize the ship..."

Sparrow (2013) takes the opposite approach to plain vs. dramatic language. Rather than using a modest term like iterated embryo selection, Sparrow prefers the phrase in vitro eugenics. Jeepers.

I suppose that's more likely to provoke public discussion, but... will much good come of that public discussion? The public had a needless freak-out about in vitro fertilization back in the 60s and 70s and then, as soon as the first IVF baby was born in 1978, decided they were in favor of it.

Someone recently suggested I use an "onion strategy" for the discussion of novel technological risks. The outermost layer of the communication onion would be aimed at the general public, and focus on benefits rather than risks, so as not to provoke an unproductive panic. A second layer for a specialist audience could include a more detailed elaboration of the risks. The most complete discussion of risks and mitigation options would be reserved for technical publications that are read only by professionals.

Eric Drexler seems to wish he had more successfully used an onion strategy when writing about nanotechnology. Engines of Creation included frank discussions of both the benefits and risks of nanotechnology, including the "grey goo" scenario that was discussed widely in the media and used as the premise for the bestselling novel Prey.

Ray Kurzweil may be using an onion strategy, or at least keeping his writing in the outermost layer. If you look carefully, chapter 8 of The Singularity is Near takes technological risks pretty seriously, and yet it's written in such a way that most people who read the book seem to come away with an overwhelmingly optimistic perspective on technological change.

George Church may be following an onion strategy. Regenesis also contains a chapter on the risks of advanced bioengineering, but it's presented as an "epilogue" that many readers will skip.

Perhaps those of us writing about AGI for the general public should try to discuss:

  • astronomical stakes rather than existential risk
  • Friendly AI rather than AGI risk or the superintelligence control problem
  • the orthogonality thesis and convergent instrumental values and complexity of values rather than "doom by default"
  • etc.

MIRI doesn't have any official recommendations on the matter, but these days I find myself leaning toward an onion strategy.

Can noise have power?

9 lukeprog 23 May 2014 04:54AM

One of the most interesting debates on Less Wrong that seems like it should be definitively resolvable is the one between Eliezer Yudkowsky, Scott Aaronson, and others on The Weighted Majority Algorithm. I'll reprint the debate here in case anyone wants to comment further on it.

In that post, Eliezer argues that "noise hath no power" (read the post for details). Scott disagreed. He replied:

...Randomness provably never helps in average-case complexity (i.e., where you fix the probability distribution over inputs) -- since given any ensemble of strategies, by convexity there must be at least one deterministic strategy in the ensemble that does at least as well as the average.

On the other hand, if you care about the worst-case running time, then there are settings (such as query complexity) where randomness provably does help. For example, suppose you're given n bits, you're promised that either n/3 or 2n/3 of the bits are 1's, and your task is to decide which. Any deterministic strategy to solve this problem clearly requires looking at 2n/3 + 1 of the bits. On the other hand, a randomized sampling strategy only has to look at O(1) bits to succeed with high probability.

Whether randomness ever helps in worst-case polynomial-time computation is the P versus BPP question, which is in the same league as P versus NP. It's conjectured that P=BPP (i.e., randomness never saves more than a polynomial). This is known to be true if really good pseudorandom generators exist, and such PRG's can be constructed if certain problems that seem to require exponentially large circuits, really do require them (see this paper by Impagliazzo and Wigderson). But we don't seem close to proving P=BPP unconditionally.

Eliezer replied:

Scott, I don't dispute what you say. I just suggest that the confusing term "in the worst case" be replaced by the more accurate phrase "supposing that the environment is an adversarial superintelligence who can perfectly read all of your mind except bits designated 'random'".

Scott replied:

I often tell people that theoretical computer science is basically mathematicized paranoia, and that this is the reason why Israelis so dominate the field. You're absolutely right: we do typically assume the environment is an adversarial superintelligence. But that's not because we literally think it is one, it's because we don't presume to know which distribution over inputs the environment is going to throw at us. (That is, we lack the self-confidence to impose any particular prior on the inputs.) We do often assume that, if we generate random bits ourselves, then the environment isn't going to magically take those bits into account when deciding which input to throw at us. (Indeed, if we like, we can easily generate the random bits after seeing the input -- not that it should make a difference.)

Average-case analysis is also well-established and used a great deal. But in those cases where you can solve a problem without having to assume a particular distribution over inputs, why complicate things unnecessarily by making such an assumption? Who needs the risk?

And later added:

...Note that I also enthusiastically belong to a "derandomize things" crowd! The difference is, I think derandomizing is hard work (sometimes possible and sometimes not), since I'm unwilling to treat the randomness of the problems the world throws at me on the same footing as randomness I generate myself in the course of solving those problems. (For those watching at home tonight, I hope the differences are now reasonably clear...)

Eliezer replied:

I certainly don't say "it's not hard work", and the environmental probability distribution should not look like the probability distribution you have over your random numbers - it should contain correlations and structure. But once you know what your probability distribution is, then you should do your work relative to that, rather than assuming "worst case". Optimizing for the worst case in environments that aren't actually adversarial, makes even less sense than assuming the environment is as random and unstructured as thermal noise.

I would defend the following sort of statement: While often it's not worth the computing power to take advantage of all the believed-in regularity of your probability distribution over the environment, any environment that you can't get away with treating as effectively random, probably has enough structure to be worth exploiting instead of randomizing.

(This isn't based on career experience, it's how I would state my expectation given my prior theory.)

Scott replied:

> "once you know what your probability distribution is..."

I'd merely stress that that's an enormous "once." When you're writing a program (which, yes, I used to do), normally you have only the foggiest idea of what a typical input is going to be, yet you want the program to work anyway. This is not just a hypothetical worry, or something limited to cryptography: people have actually run into strange problems using pseudorandom generators for Monte Carlo simulations and hashing (see here for example, or Knuth vol 2).

Even so, intuition suggests it should be possible to design PRG's that defeat anything the world is likely to throw at them. I share that intuition; it's the basis for the (yet-unproved) P=BPP conjecture.

"Any one who considers arithmetical methods of producing random digits is, of course, in a state of sin." --von Neumann

And that's where the debate drops off, at least between Eliezer and Scott, at least on that thread.
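As an aside, Scott's query-complexity example from the quoted exchange is easy to simulate. Below is a minimal sketch (my own code, not from the thread; the function name and parameters are mine): given a bit-string promised to contain either n/3 or 2n/3 ones, sampling a constant number of positions uniformly at random identifies which case holds with high probability, independent of n.

```python
import random

def classify_fraction(bits, samples=300, rng=random):
    """Guess whether the fraction of 1s in `bits` is 1/3 or 2/3.

    Samples a constant number of positions uniformly at random. By a
    Hoeffding bound, the error probability shrinks exponentially in
    `samples` and does not depend on len(bits) at all.
    """
    ones = sum(bits[rng.randrange(len(bits))] for _ in range(samples))
    return 1 / 3 if ones / samples < 0.5 else 2 / 3

# Demo: n = 3000 bits, exactly n/3 of them set to 1.
rng = random.Random(0)
n = 3000
bits = [1] * (n // 3) + [0] * (2 * n // 3)
rng.shuffle(bits)
guess = classify_fraction(bits, rng=rng)
# With 300 samples, the chance of misclassifying is below 1e-7.
```

A deterministic algorithm, by contrast, can be forced by an adversarial input to inspect more than 2n/3 positions before the promise settles the answer, which is exactly the worst-case separation Scott describes.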

Calling all MIRI supporters for unique May 6 giving opportunity!

20 lukeprog 04 May 2014 11:45PM

(Cross-posted from MIRI's blog. MIRI maintains Less Wrong, with generous help from Trike Apps, and much of the core content is written by salaried MIRI staff members.)

Update: I'm liveblogging the fundraiser here.

Read our strategy below, then give here!

As previously announced, MIRI is participating in a massive 24-hour fundraiser on May 6th, called SV Gives. This is a unique opportunity for all MIRI supporters to increase the impact of their donations. To be successful we'll need to pre-commit to a strategy and see it through. If you plan to give at least $10 to MIRI sometime this year, during this event would be the best time to do it!

The plan

We need all hands on deck to help us win the following prize as many times as possible:

$2,000 prize for the nonprofit that has the most individual donors in an hour, every hour for 24 hours.

To paraphrase, every hour, there is a $2,000 prize for the organization that has the most individual donors during that hour. That's a total of $48,000 in prizes, from sources that wouldn't normally give to MIRI.  The minimum donation is $10, and an individual donor can give as many times as they want. Therefore we ask our supporters to:

  1. give $10 an hour, during every hour of the fundraiser that they are awake (I'll be up and donating for all 24 hours!);
  2. for those whose giving budgets won't cover all those hours, see below for a list of which hours you should privilege; and
  3. publicize this effort as widely as possible.

International donors, we especially need your help!

MIRI has a strong community of international supporters, and this gives us a distinct advantage! While North America sleeps, you'll be awake, ready to target all of the overnight $2,000 hourly prizes.


Is my view contrarian?

22 lukeprog 11 March 2014 05:42PM

Previously: Contrarian Excuses, The Correct Contrarian Cluster, What is bunk?, Common Sense as a Prior, Trusting Expert Consensus, Prefer Contrarian Questions.

Robin Hanson once wrote:

On average, contrarian views are less accurate than standard views. Honest contrarians should admit this, that neutral outsiders should assign most contrarian views a lower probability than standard views, though perhaps a high enough probability to warrant further investigation. Honest contrarians who expect reasonable outsiders to give their contrarian view more than normal credence should point to strong outside indicators that correlate enough with contrarians tending more to be right.

I tend to think through the issue in three stages:

  1. When should I consider myself to be holding a contrarian[1] view? What is the relevant expert community?
  2. If I seem to hold a contrarian view, when do I have enough reason to think I’m correct?
  3. If I seem to hold a correct contrarian view, what can I do to give other people good reasons to accept my view, or at least to take it seriously enough to examine it at length?

I don’t yet feel that I have “answers” to these questions, but in this post (and hopefully some future posts) I’d like to organize some of what has been said before,[2] and push things a bit further along, in the hope that further discussion and inquiry will contribute toward significant progress in social epistemology.[3] Basically, I hope to say a bunch of obvious things, in a relatively well-organized fashion, so that less obvious things can be said from there.[4]

In this post, I’ll just address stage 1. Hopefully I’ll have time to revisit stages 2 and 3 in future posts.


Is my view contrarian?

World model differences vs. value differences

Is my effective altruism a contrarian view? It seems to be more of a contrarian value judgment than a contrarian world model,[5] and by “contrarian view” I tend to mean “contrarian world model.” Some apparently contrarian views are probably actually contrarian values.


Expert consensus

Is my atheism a contrarian view? It’s definitely a world model, not a value judgment, and only 2% of people are atheists.

But what’s the relevant expert population, here? Suppose it’s “academics who specialize in the arguments and evidence concerning whether a god or gods exist.” If so, then the expert population is probably dominated by academic theologians and religious philosophers, and my atheism is a contrarian view.

We need some heuristics for evaluating the soundness of the academic consensus in different fields.[6]

For example, we should consider the selection effects operating on communities of experts. If someone doesn’t believe in God, they’re unlikely to spend their career studying arcane arguments for and against God’s existence. So most people who specialize in this topic are theists, but nearly all of them were theists before they knew the arguments.

Perhaps instead the relevant expert community is "scholars who study the fundamental nature of the universe" — maybe philosophers and physicists? They're mostly atheists.[7] This is starting to get pretty ad hoc, but maybe that's unavoidable.

What about my view that the overall long-term impact of AGI will be, most likely, extremely bad? A recent survey of the top 100 authors in artificial intelligence (by citation index)[8] suggests that my view is somewhat out of sync with the views of those researchers.[9] But is that the relevant expert population? My impression is that AI experts know a lot about contemporary AI methods, especially within their subfield, but usually haven’t thought much about, or read much about, long-term AI impacts.

Instead, perhaps I’d need to survey “AGI impact experts” to tell whether my view is contrarian. But who is that, exactly? There’s no standard credential.

Moreover, the most plausible candidates around today for “AGI impact experts” are — like the “experts” of many other fields — mere “scholastic experts,” in that they[10] know a lot about the arguments and evidence typically brought to bear on questions of long-term AI outcomes.[11] They generally are not experts in the sense of “Reliably superior performance on representative tasks” — they don’t have uniquely good track records on predicting long-term AI outcomes, for example. As far as I know, they don’t even have uniquely good track records on predicting short-term geopolitical or sci-tech outcomes — e.g. they aren’t among the “super forecasters” discovered in IARPA’s forecasting tournaments.

Furthermore, we might start to worry about selection effects, again. E.g. if we ask AGI experts when they think AGI will be built, they may be overly optimistic about the timeline: after all, if they didn’t think AGI was feasible soon, they probably wouldn’t be focusing their careers on it.

Perhaps we can salvage this approach for determining whether one has a contrarian view, but for now, let’s consider another proposal.


Mildly extrapolated elite opinion

Nick Beckstead instead suggests that, at least as a strong prior, one should believe what one thinks "a broad coalition of trustworthy people would believe if they were trying to have accurate views and they had access to [one's own] evidence."[12] Below, I'll propose a modification of Beckstead's approach which aims to address the "Is my view contrarian?" question, and I'll call it the "mildly extrapolated elite opinion" (MEEO) method for determining the relevant expert population.[13]

First: which people are “trustworthy”? With Beckstead, I favor “giving more weight to the opinions of people who can be shown to be trustworthy by clear indicators that many people would accept, rather than people that seem trustworthy to you personally.” (This guideline aims to avoid parochialism and self-serving cognitive biases.)

What are some “clear indicators that many people would accept”? Beckstead suggests:

IQ, business success, academic success, generally respected scientific or other intellectual achievements, wide acceptance as an intellectual authority by certain groups of people, or success in any area where there is intense competition and success is a function of ability to make accurate predictions and good decisions…

Of course, trustworthiness can also be domain-specific. Very often, elite common sense would recommend deferring to the opinions of experts (e.g., listening to what physicists say about physics, what biologists say about biology, and what doctors say about medicine). In other cases, elite common sense may give partial weight to what putative experts say without accepting it all (e.g. economics and psychology). In other cases, they may give less weight to what putative experts say (e.g. sociology and philosophy).

Hence MEEO outsources the challenge of evaluating academic consensus in different fields to the “generally trustworthy people.” But in doing so, it raises several new challenges. How do we determine which people are trustworthy? How do we “mildly extrapolate” their opinions? How do we weight those mildly extrapolated opinions in combination?

This approach might also be promising, or it might be even harder to use than the “expert consensus” method.


My approach

In practice, I tend to do something like this:

  • To determine whether my view is contrarian, I ask whether there’s a fairly obvious, relatively trustworthy expert population on the issue. If there is, I try to figure out what their consensus on the matter is. If it’s different than my view, I conclude I have a contrarian view.
  • If there isn’t an obvious trustworthy expert population on the issue from which to extract a consensus view, then I basically give up on step 1 (“Is my view contrarian?”) and just move to the model combination in step 2 (see below), retaining pretty large uncertainty about how contrarian my view might be.

When do I have good reason to think I’m correct?

Suppose I conclude I have a contrarian view, as I plausibly have about long-term AGI outcomes,[14] and as I might have about the technological feasibility of preserving myself via cryonics.[15] How much evidence do I need to conclude that my view is justified despite the informed disagreement of others?

I’ll try to tackle that question in a future post. Not surprisingly, my approach is a kind of model combination and adjustment.



  1. I don’t have a concise definition for what counts as a “contrarian view.” In any case, I don’t think that searching for an exact definition of “contrarian view” is what matters. In an email conversation with me, Holden Karnofsky concurred, making the point this way: “I agree with you that the idea of ‘contrarianism’ is tricky to define. I think things get a bit easier when you start looking for patterns that should worry you rather than trying to Platonically define contrarianism… I find ‘Most smart people think I’m bonkers about X’ and ‘Most people who have studied X more than I have plus seem to generally think like I do think I’m wrong about X’ both worrying; I find ‘Most smart people think I’m wrong about X’ and ‘Most people who spend their lives studying X within a system that seems to be clearly dysfunctional and to have a bad track record think I’m bonkers about X’ to be less worrying.”  ↩

  2. For a diverse set of perspectives on the social epistemology of disagreement and contrarianism not influenced (as far as I know) by the Overcoming Bias and Less Wrong conversations about the topic, see Christensen (2009); Ericsson et al. (2006); Kuchar (forthcoming); Miller (2013); Gelman (2009); Martin & Richards (1995); Schwed & Bearman (2010); Intemann & de Melo-Martin (2013). Also see Wikipedia’s article on scientific consensus.  ↩

  3. I suppose I should mention that my entire inquiry here is, ala Goldman (1998), premised on the assumptions that (1) the point of epistemology is the pursuit of correspondence-theory truth, and (2) the point of social epistemology is to evaluate which social institutions and practices have instrumental value for producing true or well-calibrated beliefs.  ↩

  4. I borrow this line from Chalmers (2014): “For much of the paper I am largely saying the obvious, but sometimes the obvious is worth saying so that less obvious things can be said from there.”  ↩

  5. Holden Karnofsky seems to agree: “I think effective altruism falls somewhere on the spectrum between ‘contrarian view’ and ‘unusual taste.’ My commitment to effective altruism is probably better characterized as ‘wanting/choosing to be an effective altruist’ than as ‘believing that effective altruism is correct.’”  ↩

  6. Without such heuristics, we can also rather quickly arrive at contradictions. For example, the majority of scholars who specialize in Allah’s existence believe that Allah is the One True God, and the majority of scholars who specialize in Yahweh’s existence believe that Yahweh is the One True God. Consistency isn’t everything, but contradictions like this should still be a warning sign.  ↩

  7. According to the PhilPapers Surveys, 72.8% of philosophers are atheists, 14.6% are theists, and 12.6% categorized themselves as “other.” If we look only at metaphysicians, atheism remains dominant at 73.7%. If we look only at analytic philosophers, we again see atheism at 76.3%. As for physicists: Larson & Witham (1997) found that 77.9% of physicists and astronomers are disbelievers, and Pew Research Center (2009) found that 71% of physicists and astronomers did not believe in a god.  ↩

  8. Muller & Bostrom (forthcoming). “Future Progress in Artificial Intelligence: A Poll Among Experts.”  ↩

  9. But, this is unclear. First, I haven’t read the forthcoming paper, so I don’t yet have the full results of the survey, along with all its important caveats. Second, distributions of expert opinion can vary widely between polls. For example, Schlosshauer et al. (2013) reports the results of a poll given to participants in a 2011 quantum foundations conference (mostly physicists). When asked “When will we have a working and useful quantum computer?”, 9% said “within 10 years,” 42% said “10–25 years,” 30% said “25–50 years,” 0% said “50–100 years,” and 15% said “never.” But when the exact same questions were asked of participants at another quantum foundations conference just two years later, Norsen & Nelson (2013) report, the distribution of opinion was substantially different: 9% said “within 10 years,” 22% said “10–25 years,” 20% said “25–50 years,” 21% said “50–100 years,” and 12% said “never.”  ↩

  10. I say “they” in this paragraph, but I consider myself to be a plausible candidate for an “AGI impact expert,” in that I’m unusually familiar with the arguments and evidence typically brought to bear on questions of long-term AI outcomes. I also don’t have a uniquely good track record on predicting long-term AI outcomes, nor am I among the discovered “super forecasters.” I haven’t participated in IARPA’s forecasting tournaments myself because it would just be too time consuming. I would, however, very much like to see these super forecasters grouped into teams and tasked with forecasting longer-term outcomes, so that we can begin to gather scientific data on which psychological and computational methods result in the best predictive outcomes when considering long-term questions. Given how long it takes to acquire these data, we should start as soon as possible.  ↩

  11. Weiss & Shanteau (2012) would call them “privileged experts.”  ↩

  12. Beckstead’s “elite common sense” prior and my “mildly extrapolated elite opinion” method are epistemic notions that involve some kind of idealization or extrapolation of opinion. One earlier such proposal in social epistemology was Habermas’ “ideal speech situation,” a situation of unlimited discussion between free and equal humans. See Habermas’ “Wahrheitstheorien” in Schulz & Fahrenbach (1973) or, for an English description, Geuss (1981), pp. 65–66. See also the discussion in Tucker (2003), pp. 502–504.  ↩

  13. Beckstead calls his method the “elite common sense” prior. I’ve named my method differently for two reasons. First, I want to distinguish MEEO from Beckstead’s prior, since I’m using the method for a slightly different purpose. Second, I think “elite common sense” is a confusing term even for Beckstead’s prior, since there’s some extrapolation of views going on. But also, it’s only a “mild” extrapolation — e.g. we aren’t asking what elites would think if they knew everything, or if they could rewrite their cognitive software for better reasoning accuracy.  ↩

  14. My rough impression is that among the people who seem to have thought long and hard about AGI outcomes, and seem to me to exhibit fairly good epistemic practices on most issues, my view on AGI outcomes is still an outlier in its pessimism about the likelihood of desirable outcomes. But it’s hard to tell: there haven’t been systematic surveys of the important-to-me experts on the issue. I also wonder whether my views about long-term AGI outcomes are more a matter of seriously tackling a contrarian question rather than being a matter of having a particularly contrarian view. On this latter point, see this Facebook discussion.  ↩

  15. I haven’t seen a poll of cryobiologists on the likely future technological feasibility of cryonics. Even if there were such polls, I’d wonder whether cryobiologists also had the relevant philosophical and neuroscientific expertise. I should mention that I’m not personally signed up for cryonics, for these reasons.  ↩

Futurism's Track Record

12 lukeprog 29 January 2014 08:27PM

It would be nice (and expensive) to get a systematic survey on this, but my impressions [1] — after tracking down many past technology predictions, reading histories of technological speculation and invention, and reading about “elite common sense” at various times in the past — are that:

  • Elite common sense at a given time almost always massively underestimates what will be technologically feasible in the future.
  • “Futurists” in history tend to be far more accurate about what will be technologically feasible (when they don’t grossly violate known physics), but they are often too optimistic about timelines, and (like everyone else) show little ability to predict (1) the long-term social consequences of future technologies, or (2) the details of which (technologically feasible; successfully prototyped) things will make commercial sense, or be popular products.

Naturally, as someone who thinks it’s incredibly important to predict the long-term future as well as we can while also avoiding overconfidence, I try to put myself in a position to learn what past futurists were doing right, and what they were doing wrong. For example, I recommend: Be a fox not a hedgehog. Do calibration training. Know how your brain works. Build quantitative models even if you don’t believe the outputs, so that specific pieces of the model are easier to attack and update. Have broad confidence intervals over the timing of innovations. Remember to forecast future developments by looking at trends in many inputs to innovation, not just the “calendar years” input. Use model combination. Study history and learn from it. Etc.
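One item on that list, model combination, is simple enough to sketch in a few lines. This is only an illustration of the general idea — averaging several models' probability forecasts rather than trusting any single model — and the numbers and weights below are made up for the example:

```python
# Combine several models' probability forecasts for the same event
# by taking a weighted average (a simple form of model combination).
def combine_forecasts(forecasts, weights=None):
    """forecasts: list of probabilities in [0, 1], one per model.
    weights: optional list of non-negative weights summing to 1;
    defaults to an unweighted average."""
    if weights is None:
        weights = [1.0 / len(forecasts)] * len(forecasts)
    return sum(p * w for p, w in zip(forecasts, weights))

# Three hypothetical models' estimates that some innovation arrives by 2040:
combined = combine_forecasts([0.2, 0.5, 0.35])  # unweighted average: 0.35
```

Even this crude averaging tends to beat picking a single "best" model, because the models' errors partially cancel.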

Anyway: do others who have studied the history of futurism, elite common sense, innovation, etc. have different impressions about futurism’s track record? And, anybody want to do a PhD thesis examining futurism’s track record? Or on some piece of it, à la this or this or this? :)

  1. I should explain one additional piece of reasoning which contributes to my impressions on the matter. How do I think about futurist predictions of technologies that haven’t yet been definitely demonstrated to be technologically feasible or infeasible? For these, I try to use something like the truth-tracking fields proxy. E.g. very few intellectual elites (outside Turing, von Neumann, Good, etc.) in 1955 thought AGI would be technologically feasible. By 1980, we’d made a bunch of progress in computing and AI and neuroscience, and a much greater proportion of intellectual elites came to think AGI would be technologically feasible. Today, I think the proportion is even greater. The issue hasn’t been “definitely decided” yet (from a social point of view), but things are strongly trending in favor of Good and Turing, and against (e.g.) Dreyfus.  ↩

Tricky Bets and Truth-Tracking Fields

14 lukeprog 29 January 2014 08:52AM

While visiting Oxford for MIRI’s November 2013 workshop, I had the pleasure of visiting a meeting of “Skeptics in the Pub” in the delightfully British-sounding town of High Wycombe in Buckinghamshire. (Say that aloud in a British accent and try not to grin; I dare you!)

I presented a mildly drunk intro to applied rationality, followed by a 2-hour Q&A that, naturally, wandered into the subject of why AI will inevitably eat the Earth. I must have been fairly compelling despite the beer, because at one point I noticed the bartenders were leaning uncomfortably over one end of the bar in order to hear me, ignoring thirsty customers at the other end.

Anyhoo, at one point I was talking about the role of formal knowledge in applied rationality, so I explained Solomonoff’s lightsaber and why it made me think the wave function never collapses.

Someone — I can’t recall who; let’s say “Bob” — wisely asked, “But if quantum interpretations all predict the same observations, what does it mean for you to say the wave function never collapses? What do you anticipate?” [1]

Now, I don’t actually know whether the usual proposals for experimental tests of collapse make sense, so instead I answered:

Well, I think theoretical physics is truth-tracking enough that it eventually converges toward true theories, so one thing I anticipate as a result of favoring a no-collapse view is that a significantly greater fraction of physicists will reject collapse in 20 years, compared to today.

Had Bob and I wanted to bet on whether the wave function collapses or not, that would have been an awfully tricky bet to settle. But if we roughly agree on the truth-trackingness of physics as a field, then we can use the consensus of physicists a decade or two from now as a proxy for physical truth, and bet on that instead.

This won’t work for some fields. For example, philosophy sometimes looks more like a random walk than a truth-tracking inquiry — or, more charitably, it tracks truth on the scale of centuries rather than decades. For example, did you know that one year after the cover of TIME asked “Is God dead?”, a philosopher named Alvin Plantinga launched a renaissance in Christian philosophy, such that theism and Christian particularism were more commonly defended by analytic philosophers in the 1970s than they were in the 1930s? I also have the impression that moral realism was a more popular view in the 1990s than it was in the 1970s, and that physicalism is less common today than it was in the 1960s, but I’m less sure about those.

You can also do this for bets that are hard to settle for a different kind of reason, e.g. an apocalypse bet. [2] Suppose Bob and I want to bet on whether smarter-than-human AI is technologically feasible. Trouble is, if it’s ever proven that superhuman AI is feasible, that event might overthrow the global economy, making it hard to collect the bet, or at least pointless.

But suppose Bob and I agree that AI scientists, or computer scientists, or technology advisors to first-world governments, or some other set of experts, is likely to converge toward the true answer on the feasibility of superhuman AI as time passes, as humanity learns more, etc. Then we can instead make a bet on whether it will be the case, 20 years from now, that a significantly increased or decreased fraction of those experts will think superhuman AI is feasible.

Often, there won’t be acceptable polls of the experts at both times, for settling the bet. But domain experts typically have a general sense of whether some view has become more or less common in their field over time. So Bob and I could agree to poll a randomly chosen subset of our chosen expert community 20 years from now, asking them how common the view in question is at that time and how common it was 20 years earlier, and settle our bet that way.
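As a toy sketch, the settlement procedure above might look like the following. Everything here — the 10% margin, the three-outcome rule, the sample numbers — is a hypothetical illustration, not a protocol Bob and I actually agreed on:

```python
# Settle a long-term bet by polling experts about how common a view is
# now versus ~20 years ago, then checking whether the view's estimated
# prevalence changed by at least a pre-agreed margin.
def settle_bet(responses, margin=0.10):
    """responses: list of (share_now, share_then) pairs in [0, 1],
    one per polled expert. Returns 'yes', 'no', or 'push'."""
    avg_now = sum(now for now, _ in responses) / len(responses)
    avg_then = sum(then for _, then in responses) / len(responses)
    change = avg_now - avg_then
    if change >= margin:
        return "yes"   # the view became significantly more common
    if change <= -margin:
        return "no"    # the view became significantly less common
    return "push"      # no significant change; the bet is a wash

# Three hypothetical experts estimate the view's current and past prevalence:
result = settle_bet([(0.6, 0.4), (0.5, 0.35), (0.55, 0.3)])  # -> "yes"
```

The point of pre-agreeing on the margin and the three-outcome rule is to keep either party from arguing, 20 years later, about what "significantly increased" means.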

Getting the details right for this sort of long-term bet isn’t trivial, but I don't see a fatal flaw. Is there one I’ve missed? [3]


  1. I can’t recall exactly how the conversation went, but it was something like this.  ↩

  2. See also Jones, How to bet on bad futures.  ↩

  3. I also doubt I’m the first person to describe this idea in writing: please link to other articles making this point if you know of any.  ↩

MIRI's Winter 2013 Matching Challenge

20 lukeprog 17 December 2013 08:41PM

Update: The fundraiser has been completed! Details here. The original post follows...



(Cross-posted from MIRI's blog. MIRI maintains Less Wrong, with generous help from Trike Apps, and much of the core content is written by salaried MIRI staff members.)

Thanks to Peter Thiel, every donation made to MIRI between now and January 15th, 2014 will be matched dollar-for-dollar!

Also, gifts from "new large donors" will be matched 3x! That is, if you've given less than $5k to SIAI/MIRI ever, and you now give or pledge $5k or more, Thiel will donate $3 for every dollar you give or pledge.

We don't know whether we'll be able to offer the 3:1 matching ever again, so if you're capable of giving $5k or more, we encourage you to take advantage of the opportunity while you can. Remember that:

  • If you prefer to give monthly, no problem! If you pledge 6 months of monthly donations, your full 6-month pledge will be the donation amount to be matched. So if you give monthly, you can get 3:1 matching for only $834/mo (or $417/mo if you get matching from your employer).
  • We accept Bitcoin (BTC) and Ripple (XRP), both of which have recently jumped in value. If the market value of your Bitcoin or Ripple is $5k or more on the day you make the donation, this will count for matching.
  • If your employer matches your donations at 1:1 (check here), then you can take advantage of Thiel's 3:1 matching by giving as little as $2,500 (because it's $5k after corporate matching).
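The arithmetic in these bullets can be checked directly (a sketch; the dollar figures come from the post itself):

```python
# Check the 3:1 matching thresholds described above.
threshold = 5000      # minimum qualifying gift for 3:1 matching, in dollars

monthly_pledge = 834  # $/mo for a 6-month pledge, no employer matching
assert monthly_pledge * 6 >= threshold        # 5004 >= 5000

matched_monthly = 417 # $/mo for 6 months, with 1:1 employer matching
assert matched_monthly * 6 * 2 >= threshold   # 5004 >= 5000

one_time = 2500       # one-time gift, with 1:1 employer matching
assert one_time * 2 >= threshold              # 5000 >= 5000
```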

Please email malo@intelligence.org if you intend on leveraging corporate matching or would like to pledge 6 months of monthly donations, so that we can properly account for your contributions towards the fundraiser.

Thiel's total match is capped at $250,000. The total amount raised will depend on how many people take advantage of 3:1 matching. We don't anticipate being able to hit the $250k cap without substantial use of 3:1 matching — so if you haven't given $5k thus far, please consider giving/pledging $5k or more during this drive. (If you'd like to know the total amount of your past donations to MIRI, just ask malo@intelligence.org.)

Now is your chance to double or quadruple your impact in funding our research program.

Donate Today

Accomplishments Since Our July 2013 Fundraiser Launched:

How Will Marginal Funds Be Used?

  • Hiring Friendly AI researchers, identified through our workshops, as they become available for full-time work at MIRI.
  • Running more workshops (next one begins Dec. 14th), to make concrete Friendly AI research progress, to introduce new researchers to open problems in Friendly AI, and to identify candidates for MIRI to hire.
  • Describing more open problems in Friendly AI. Our current strategy is for Yudkowsky to explain them as quickly as possible via Facebook discussion, followed by more structured explanations written by others in collaboration with Yudkowsky.
  • Improving humanity's strategic understanding of what to do about superintelligence. In the coming months this will include (1) additional interviews and analyses on our blog, (2) a reader's guide for Nick Bostrom's forthcoming Superintelligence book, and (3) an introductory ebook currently titled Smarter Than Us.

Other projects are still being surveyed for likely cost and impact.

We appreciate your support for our work! Donate now, and seize a better than usual chance to move our work forward. If you have questions about donating, please contact Louie Helm at (510) 717-1477 or louie@intelligence.org.
