The Cryonics Strategy Space

24 Froolow 24 April 2014 04:11PM

In four paragraphs I’m going to claim, “It is highly likely reading this article will increase your chance of living forever”. I’m pretty sure you won’t disagree with me. First, however, I’d like to talk about how much I don’t like Monopoly.

I play a lot of Monopoly, because I am forced into it – against my will – by friends, family, work-related bonding etc. I understand this is a controversial opinion, but I really, really don’t like Monopoly – there is very little scope for creative play. In fact there is so little scope for creative play that I spotted I could win at Monopoly, in a probabilistic sense, by going online and looking up the optimal allocation of houses to properties and valuation of houses in the ‘bargaining’ mid-game. For a while, the fact that nobody but me played ‘perfect’ Monopoly meant I won nearly every game, and I felt much better about playing because games tended to conclude more quickly when one player was a soulless, utility-hungry robot – it left me more time to concentrate on the stuff I actually enjoyed, which was socialising.

But Monopoly, despite being an almost completely deterministic dice-rolling game, hides unexpected complexity – a salutary lesson for an aspiring rationalist. Winning the game was completely secondary to my actual aim, which was forcing the game to take as little time as possible. I realised a few months ago that it didn’t matter who won, as long as somebody won quickly, and it was very unlikely the strategy optimised for one player was the same as the strategy optimised for all of them. As a consequence, I reran the computer simulations I built and developed an optimal ‘turn-reducing’ strategy (it won’t surprise you to know that the basic rule is ‘play with as much variance as you possibly can’; having one maverick player lowers the average number of turns to the first bankruptcy, and bankruptcy is game-breaking in Monopoly).
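(A toy illustration of that last claim, for the curious: the sketch below, in Python, uses a drastically simplified cash-flow model in place of my actual Monopoly simulations, with invented drift and variance numbers, and measures turns until the first bankruptcy with and without one high-variance ‘maverick’ player.)

    import random

    def turns_to_first_bankruptcy(variances, start_cash=1500, drift=-5):
        # Toy model only: every turn each player's cash drifts slowly down
        # (rent, tax) plus a zero-mean shock whose spread stands in for how
        # risky that player's strategy is. Returns the turn on which the
        # first player goes bankrupt.
        cash = [start_cash] * len(variances)
        turn = 0
        while all(c > 0 for c in cash):
            turn += 1
            for i, spread in enumerate(variances):
                cash[i] += drift + random.gauss(0, spread)
        return turn

    def average_turns(variances, trials=500):
        return sum(turns_to_first_bankruptcy(variances) for _ in range(trials)) / trials

    print(average_turns([60, 60, 60, 60]))   # four cautious players: slow game
    print(average_turns([300, 60, 60, 60]))  # one maverick: first bankruptcy arrives sooner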

I agree that I could lower the number of turns even more by simply flipping over the board and storming out when someone suggests I play, but let’s assume I am also trying to balance a nebulously-defined but nonetheless real value of ‘not losing all my friends’, which is satisfied when I play a risky-but-exciting strategy and not satisfied when I constantly demand to play games I find fun. The point is, I had what Kuhn calls a ‘paradigm shift’ – once I realised that my goal when playing Monopoly was not to win as quickly as possible but to ensure anyone wins as quickly as possible, I was able to greatly, greatly increase my utility with no troublesome side-effects.

I’m relating this story to you because noticing my aims and strategy weren’t perfectly aligned improved my experience of Monopoly without doing anything difficult like hacking my motivation, and I’m sure you have similar stories of paradigm shifts improving your experience of a certain event (I hear people talk about the day they discovered coding was fun once they learned the rules, or maths was awesome once they got past the spadework. That has yet to happen to me, but my experience of thrashing my friends at children’s board games means I can totally relate). What’s striking about these paradigm shifts is how obvious the conclusion seems in retrospect, and how opaque it seemed before the lightbulb moment. With that in mind, let me make a claim you might find concerning: “The aims and strategy of people who want to live forever are highly likely to be out of alignment”. In particular, from what I read on LW and other pro-cryo communities, the strategy-space explored is vastly smaller than the strategy-space of all possible cryonics strategies. Indeed, the strategy space explored by people who want to live forever is – in some ways – smaller than that explored by me while trying to get out of playing tedious boardgames. I’m going to talk about that strategy space a little in this article, mostly with the aim of triggering a ‘lightbulb moment’ – if there are any to be had – in readers a lot more committed to cryonics than me. To draw an obvious conclusion, if there are such lightbulb moments to be had, it is highly likely reading this article will increase your chance of living forever by increasing the size of the cryonics strategy space you consider.

That the strategy space explored is small is pretty hard to disagree with; there is an option to freeze or not-freeze in the first place, go with Alcor or The Cryonics Institute (or possibly KrioRus), go for your full body or just your head and – maybe – whether to hang on for plastination or begin investing in cryonics insurance now. As far as I can tell, more ‘fringe’ options are not discussed with very much regularity. A search of the LW archives turned up this thread, which was along similar lines but didn’t trigger anything like the discussion I thought it would; this surprises me – when the ‘prize’ for picking a marginal improvement in your cryonics strategy that doubles your chance of revivification is that you double your chance of living forever, I’m highly surprised the cryonics strategy space has not been exhaustively searched at this point, certainly amongst people who turn rationality into an art form.

For example, there are at least three ways I can think of to raise your chance of being successfully frozen:

  • (Sensible) Redundancy cryonics: Make redundant copies of the information you intend to preserve. For example, MRI scans of your brain and detailed notes on your reactions to certain stimuli. In the event that current technology almost-but-not-quite preserves information in the brain, your notes and images might help future scientists reconstruct your personality. You might even go further and send hippocampal slices to multiple cryonics facilities, gambling on the fact that the increased probability of at least one facility’s survival outweighs the lower probability of revivification from a single hippocampal slice.

  • (Sensible) Diversified cryonics: In addition to cryonics, employ one or more other strategies which might result in you living forever but which are as completely uncorrelated with the success or failure of cryonics as you can manage, given that ‘the complete destruction of the earth on a molecular level by a malevolent alien race’ correlates with many bad outcomes and few good ones. I actually have a list of about ten of these, which I will happily make available on request (i.e. I’ll write another discussion post about them if people are interested), but I don’t want the whole discussion of this post to be about this one single issue, which it was when I tried the content of the post out on my friend. This is about the cryonics strategy-space only, not the living-forever strategy space, which is much bigger.

  • (Inadvisable) Suicide cryonics: Calculate the point at which your belief in the utility of cryonics outweighs the expected utility of the rest of your life (this will likely come a few seconds before the average age of death in your demographic). Kill yourself in the most cryonics-friendly way you can imagine, which I suspect will involve injecting yourself with toxic cryoprotectants on top of a platform suspended over a large vat of liquid nitrogen so that when you collapse, you collapse into the nitrogen and freeze yourself (which should limit the amount of time the dead brain is at body temperature). If you are not concerned about your body, you should also try to decapitate yourself as you fall to raise the surface area to volume ratio of the object you are trying to freeze.

Here are three ways to raise your chance of successfully remaining frozen:

  • (Sensible) Positive cryonics: Lobby for laws that ensure the government protects your body. Either lobby for these laws directly (I talked about a ‘right to not-death’ in my last post on this subject) or promise to report to future!USA’s equivalent of the Department of Defence to see if they can weaponise any microbes on you after you’re unfrozen. Remember that we’re talking in terms of expected utility here; the chance that such lobbying is effective is minute, but it might be an effective way to spend your twilight years if you would otherwise be unproductive.

  • (Sensible, but worryingly immoral) Negative cryonics: Sabotage as many cryonics labs as possible before going under, or lobby for laws that make it illegal to freeze yourself which only come into force after you die. This raises the chances that you are the James Bedford of modern cryonics and society has a particular interest in keeping your body safe. Note that though sabotaging an entire lab is difficult and illegal, trashing the field of cryonics itself is pretty easy and socially high-status because people already think it’s pretty weird – you’d predict that at least some detractors of cryonics are actually extremely pro-cryonics and trying to raise their chances of being kept frozen as a cultural curiosity rather than as only one of millions of corpsicles.

  • (Sensible if your name is Lex Luthor, otherwise implausible) Ninja cryonics: Build a cryonics pod yourself, with enough liquid nitrogen to keep you frozen for several thousand years, known only to the highly trusted individual who transfers your cryo-preserved body from Alcor to this location (if you could somehow get yourself into an unprotected far-earth orbit after freezing this would be perfect). Hope that your pod is discovered by friendly future-humans before you run out of coolant. This is insurance against the possibility that society destroys all cryonics labs somehow and then later regrets it (although, now I think about it, someone following this strategy certainly wouldn’t tell anyone about it on a public forum…)

Here are three ways to raise your chance of successfully being revived:

  • (Sensible if legal) Compound-interest cryonics: Devote a small chunk of your resources towards a fund which you expect to grow faster than the rate of inflation, with exponential growth (the simplest example would be a bank account with a variable rate that pays epsilon percent higher than the rate of inflation in perpetuity). Sign a contract saying the person(s) who revive you receive the entire pot. Since after a few thousand years the pot will nominally contain almost all the money in the world, this strategy will eventually incentivise almost the entire world to dedicate itself to seeking your revival. Although this strategy will not work if post-scarcity happens before unfreezing, in that case it simply collapses into the conventional cryonics problem and therefore costs you no more than the opportunity cost of spending the capital in the fund before you die. (Although apparently this is illegal.) A sketch of the arithmetic appears after this list.

  • (Sensible) Cultural-value cryonics: Freeze yourself with something which is relatively cheap now, but you predict might be worth a lot of money in the future. I suspect that – for example – rare earth metals or gold might be a decent guess at something that will increase in value whatever society does, but the real treasure trove will be things like first editions of books you expect might become classics in the future, original paintings by artists who might become very trendy in the 25th Century or photographs of an important historic event which will become disputed or lionised in the future (my best bet would be anything involving the relationship between China and America if we’re talking a few centuries, and pre-technology parts of Africa if we’re talking millennia). It’s hard to believe even a post-singularity society won’t have some social signalling remaining, so you’ve got a respectable chance of finding a buyer for these artefacts. These fantastically valuable artefacts will be used to pay your way in a society where – thanks to the Flynn effect – you will have an IQ which breaks the curve at the ‘dangerously stupid’ end, and you might not be able to survive otherwise. Be careful nobody knows you’re doing this, otherwise your cryopod will be raided like an Egyptian tomb! Even disregarding this financial advice, it might be a good idea to ensure you freeze yourself with e.g. a beloved pet, or the complete works of Shakespeare. This ensures that even if future society is so totally different to what you were expecting, you will still have some information-age artefacts to protect you from culture-shock.

  • (Inadvisable and high-risk) Game-theory cryonics: Set up an alarm on your cryonics pod that thaws you after five hundred years. This is insurance against the possibility that society is able to unfreeze you, but chooses not to, since no society would just let you die (you hope). You could go more supervillain-y than this by planting a deadly bomb somewhere, timed to go off in five hundred years unless you enter a 128-digit disarming key. This should incentivise society to develop revivification processes as a matter of urgency. Bear in mind that if it is easier for future society to develop extremely strong counter-cryptography or radiation shielding, your plan may backfire as research that would have been undertaken on cryopreservation is redeployed to stop your diabolical scheme.
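(Returning to compound-interest cryonics, here is a minimal sketch of the arithmetic; the $1,000 deposit and the 0.5% real return are assumed figures for illustration only, not recommendations.)

    # Real (inflation-adjusted) value of a revival fund compounding at a
    # rate epsilon above inflation. All figures assumed for illustration.
    principal = 1000      # today's dollars
    epsilon = 0.005       # 0.5% real return, in perpetuity
    for years in (100, 500, 2000):
        value = principal * (1 + epsilon) ** years
        print(years, round(value))
    # Roughly $1,650 after a century, $12,000 after five centuries, and
    # $21 million after two millennia: exponential growth eventually
    # swamps any fixed cost of revival.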

I think most of these strategies have never been written about before, and those that have were throwaway thought experiments on LW. Given that the space of possible cryonics strategies is much bigger than the region cryonics advocates appear to instinctively gravitate towards, I conclude that it is very unlikely there has been a serious effort to optimise the cryonics process beyond the scientific advances made by Alcor (and hence it is very unlikely we have all hit upon the optimal strategy by chance). This is especially true because the optimal strategy in some cases depends on the probability that the future resembles certain kinds of predictions, and I know people on LW disagree over those predictions. For example, the ratio of culturally-valuable artefacts to sanity-preserving artefacts you should take with you probably depends on the relative likelihood you assign that a post-scarcity or post-singularity world will be the one to revive you. I’m not in a very good position to make that particular judgement myself, but I am in a good position to say that there is a very real opportunity cost to considering a narrow strategy space when considering life-extending strategies, just as there is an opportunity cost when considering over-narrow Monopoly strategies. In the first case, the impact of your decision might result in you throwing your life away. In the second, it only feels like it does.

Is my view contrarian?

22 lukeprog 11 March 2014 05:42PM

Previously: Contrarian Excuses, The Correct Contrarian Cluster, What is bunk?, Common Sense as a Prior, Trusting Expert Consensus, Prefer Contrarian Questions.

Robin Hanson once wrote:

On average, contrarian views are less accurate than standard views. Honest contrarians should admit this, that neutral outsiders should assign most contrarian views a lower probability than standard views, though perhaps a high enough probability to warrant further investigation. Honest contrarians who expect reasonable outsiders to give their contrarian view more than normal credence should point to strong outside indicators that correlate enough with contrarians tending more to be right.

I tend to think through the issue in three stages:

  1. When should I consider myself to be holding a contrarian[1] view? What is the relevant expert community?
  2. If I seem to hold a contrarian view, when do I have enough reason to think I’m correct?
  3. If I seem to hold a correct contrarian view, what can I do to give other people good reasons to accept my view, or at least to take it seriously enough to examine it at length?

I don’t yet feel that I have “answers” to these questions, but in this post (and hopefully some future posts) I’d like to organize some of what has been said before,[2] and push things a bit further along, in the hope that further discussion and inquiry will contribute toward significant progress in social epistemology.[3] Basically, I hope to say a bunch of obvious things, in a relatively well-organized fashion, so that less obvious things can be said from there.[4]

In this post, I’ll just address stage 1. Hopefully I’ll have time to revisit stages 2 and 3 in future posts.

 

Is my view contrarian?

World model differences vs. value differences

Is my effective altruism a contrarian view? It seems to be more of a contrarian value judgment than a contrarian world model,[5] and by “contrarian view” I tend to mean “contrarian world model.” Some apparently contrarian views are probably actually contrarian values.

 

Expert consensus

Is my atheism a contrarian view? It’s definitely a world model, not a value judgment, and only 2% of people are atheists.

But what’s the relevant expert population, here? Suppose it’s “academics who specialize in the arguments and evidence concerning whether a god or gods exist.” If so, then the expert population is probably dominated by academic theologians and religious philosophers, and my atheism is a contrarian view.

We need some heuristics for evaluating the soundness of the academic consensus in different fields.[6]

For example, we should consider the selection effects operating on communities of experts. If someone doesn’t believe in God, they’re unlikely to spend their career studying arcane arguments for and against God’s existence. So most people who specialize in this topic are theists, but nearly all of them were theists before they knew the arguments.

Perhaps instead the relevant expert community is “scholars who study the fundamental nature of the universe” — maybe, philosophers and physicists? They’re mostly atheists.[7] This is starting to get pretty ad-hoc, but maybe that’s unavoidable.

What about my view that the overall long-term impact of AGI will be, most likely, extremely bad? A recent survey of the top 100 authors in artificial intelligence (by citation index)[8] suggests that my view is somewhat out of sync with the views of those researchers.[9] But is that the relevant expert population? My impression is that AI experts know a lot about contemporary AI methods, especially within their subfield, but usually haven’t thought much about, or read much about, long-term AI impacts.

Instead, perhaps I’d need to survey “AGI impact experts” to tell whether my view is contrarian. But who is that, exactly? There’s no standard credential.

Moreover, the most plausible candidates around today for “AGI impact experts” are — like the “experts” of many other fields — mere “scholastic experts,” in that they[10] know a lot about the arguments and evidence typically brought to bear on questions of long-term AI outcomes.[11] They generally are not experts in the sense of “Reliably superior performance on representative tasks” — they don’t have uniquely good track records on predicting long-term AI outcomes, for example. As far as I know, they don’t even have uniquely good track records on predicting short-term geopolitical or sci-tech outcomes — e.g. they aren’t among the “super forecasters” discovered in IARPA’s forecasting tournaments.

Furthermore, we might start to worry about selection effects, again. E.g. if we ask AGI experts when they think AGI will be built, they may be overly optimistic about the timeline: after all, if they didn’t think AGI was feasible soon, they probably wouldn’t be focusing their careers on it.

Perhaps we can salvage this approach for determining whether one has a contrarian view, but for now, let’s consider another proposal.

 

Mildly extrapolated elite opinion

Nick Beckstead instead suggests that, at least as a strong prior, one should believe what one thinks “a broad coalition of trustworthy people would believe if they were trying to have accurate views and they had access to [one’s own] evidence.”[12] Below, I’ll propose a modification of Beckstead’s approach which aims to address the “Is my view contrarian?” question, and I’ll call it the “mildly extrapolated elite opinion” (MEEO) method for determining the relevant expert population.[13]

First: which people are “trustworthy”? With Beckstead, I favor “giving more weight to the opinions of people who can be shown to be trustworthy by clear indicators that many people would accept, rather than people that seem trustworthy to you personally.” (This guideline aims to avoid parochialism and self-serving cognitive biases.)

What are some “clear indicators that many people would accept”? Beckstead suggests:

IQ, business success, academic success, generally respected scientific or other intellectual achievements, wide acceptance as an intellectual authority by certain groups of people, or success in any area where there is intense competition and success is a function of ability to make accurate predictions and good decisions…

Of course, trustworthiness can also be domain-specific. Very often, elite common sense would recommend deferring to the opinions of experts (e.g., listening to what physicists say about physics, what biologists say about biology, and what doctors say about medicine). In other cases, elite common sense may give partial weight to what putative experts say without accepting it all (e.g. economics and psychology). In other cases, they may give less weight to what putative experts say (e.g. sociology and philosophy).

Hence MEEO outsources the challenge of evaluating academic consensus in different fields to the “generally trustworthy people.” But in doing so, it raises several new challenges. How do we determine which people are trustworthy? How do we “mildly extrapolate” their opinions? How do we weight those mildly extrapolated opinions in combination?
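To make the combination problem concrete, here is a minimal sketch of two standard pooling rules from the forecasting literature, with invented experts, probabilities, and trustworthiness weights (this illustrates the challenge; it is not part of Beckstead’s proposal):

    import math

    # expert -> (probability assigned to some claim, trustworthiness weight);
    # both the experts and the numbers are invented for illustration.
    opinions = {
        "physicist": (0.10, 3.0),
        "philosopher": (0.25, 2.0),
        "domain specialist": (0.60, 1.0),
    }

    def linear_pool(opinions):
        # Weighted arithmetic mean of the probabilities.
        total = sum(w for _, w in opinions.values())
        return sum(p * w for p, w in opinions.values()) / total

    def logarithmic_pool(opinions):
        # Weighted average in log-odds space (geometric pooling of odds).
        total = sum(w for _, w in opinions.values())
        log_odds = sum(w * math.log(p / (1 - p)) for p, w in opinions.values()) / total
        return 1 / (1 + math.exp(-log_odds))

    print(linear_pool(opinions))       # ~0.23
    print(logarithmic_pool(opinions))  # ~0.20

Even with the weights fixed, the choice of pooling rule moves the answer, which is part of why the weighting question is hard.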

This approach might also be promising, or it might be even harder to use than the “expert consensus” method.

 

My approach

In practice, I tend to do something like this:

  • To determine whether my view is contrarian, I ask whether there’s a fairly obvious, relatively trustworthy expert population on the issue. If there is, I try to figure out what their consensus on the matter is. If it’s different than my view, I conclude I have a contrarian view.
  • If there isn’t an obvious trustworthy expert population on the issue from which to extract a consensus view, then I basically give up on step 1 (“Is my view contrarian?”) and just move to the model combination in step 2 (see below), retaining pretty large uncertainty about how contrarian my view might be.


When do I have good reason to think I’m correct?

Suppose I conclude I have a contrarian view, as I plausibly have about long-term AGI outcomes,[14] and as I might have about the technological feasibility of preserving myself via cryonics.[15] How much evidence do I need to conclude that my view is justified despite the informed disagreement of others?

I’ll try to tackle that question in a future post. Not surprisingly, my approach is a kind of model combination and adjustment.

 

 


  1. I don’t have a concise definition for what counts as a “contrarian view.” In any case, I don’t think that searching for an exact definition of “contrarian view” is what matters. In an email conversation with me, Holden Karnofsky concurred, making the point this way: “I agree with you that the idea of ‘contrarianism’ is tricky to define. I think things get a bit easier when you start looking for patterns that should worry you rather than trying to Platonically define contrarianism… I find ‘Most smart people think I’m bonkers about X’ and ‘Most people who have studied X more than I have plus seem to generally think like I do think I’m wrong about X’ both worrying; I find ‘Most smart people think I’m wrong about X’ and ‘Most people who spend their lives studying X within a system that seems to be clearly dysfunctional and to have a bad track record think I’m bonkers about X’ to be less worrying.”  ↩

  2. For a diverse set of perspectives on the social epistemology of disagreement and contrarianism not influenced (as far as I know) by the Overcoming Bias and Less Wrong conversations about the topic, see Christensen (2009); Ericsson et al. (2006); Kuchar (forthcoming); Miller (2013); Gelman (2009); Martin & Richards (1995); Schwed & Bearman (2010); Intemann & de Melo-Martin (2013). Also see Wikipedia’s article on scientific consensus.  ↩

  3. I suppose I should mention that my entire inquiry here is, à la Goldman (1998), premised on the assumptions that (1) the point of epistemology is the pursuit of correspondence-theory truth, and (2) the point of social epistemology is to evaluate which social institutions and practices have instrumental value for producing true or well-calibrated beliefs.  ↩

  4. I borrow this line from Chalmers (2014): “For much of the paper I am largely saying the obvious, but sometimes the obvious is worth saying so that less obvious things can be said from there.”  ↩

  5. Holden Karnofsky seems to agree: “I think effective altruism falls somewhere on the spectrum between ‘contrarian view’ and ‘unusual taste.’ My commitment to effective altruism is probably better characterized as ‘wanting/choosing to be an effective altruist’ than as ‘believing that effective altruism is correct.’”  ↩

  6. Without such heuristics, we can also rather quickly arrive at contradictions. For example, the majority of scholars who specialize in Allah’s existence believe that Allah is the One True God, and the majority of scholars who specialize in Yahweh’s existence believe that Yahweh is the One True God. Consistency isn’t everything, but contradictions like this should still be a warning sign.  ↩

  7. According to the PhilPapers Surveys, 72.8% of philosophers are atheists, 14.6% are theists, and 12.6% categorized themselves as “other.” If we look only at metaphysicians, atheism remains dominant at 73.7%. If we look only at analytic philosophers, we again see atheism at 76.3%. As for physicists: Larson & Witham (1997) found that 77.9% of physicists and astronomers are disbelievers, and Pew Research Center (2009) found that 71% of physicists and astronomers did not believe in a god.  ↩

  8. Müller & Bostrom (forthcoming). “Future Progress in Artificial Intelligence: A Poll Among Experts.”  ↩

  9. But, this is unclear. First, I haven’t read the forthcoming paper, so I don’t yet have the full results of the survey, along with all its important caveats. Second, distributions of expert opinion can vary widely between polls. For example, Schlosshauer et al. (2013) reports the results of a poll given to participants in a 2011 quantum foundations conference (mostly physicists). When asked “When will we have a working and useful quantum computer?”, 9% said “within 10 years,” 42% said “10–25 years,” 30% said “25–50 years,” 0% said “50–100 years,” and 15% said “never.” But when the exact same questions were asked of participants at another quantum foundations conference just two years later, Norsen & Nelson (2013) report, the distribution of opinion was substantially different: 9% said “within 10 years,” 22% said “10–25 years,” 20% said “25–50 years,” 21% said “50–100 years,” and 12% said “never.”  ↩

  10. I say “they” in this paragraph, but I consider myself to be a plausible candidate for an “AGI impact expert,” in that I’m unusually familiar with the arguments and evidence typically brought to bear on questions of long-term AI outcomes. I also don’t have a uniquely good track record on predicting long-term AI outcomes, nor am I among the discovered “super forecasters.” I haven’t participated in IARPA’s forecasting tournaments myself because it would just be too time consuming. I would, however, very much like to see these super forecasters grouped into teams and tasked with forecasting longer-term outcomes, so that we can begin to gather scientific data on which psychological and computational methods result in the best predictive outcomes when considering long-term questions. Given how long it takes to acquire these data, we should start as soon as possible.  ↩

  11. Weiss & Shanteau (2012) would call them “privileged experts.”  ↩

  12. Beckstead’s “elite common sense” prior and my “mildly extrapolated elite opinion” method are epistemic notions that involve some kind of idealization or extrapolation of opinion. One earlier such proposal in social epistemology was Habermas’ “ideal speech situation,” a situation of unlimited discussion between free and equal humans. See Habermas’ “Wahrheitstheorien” in Schulz & Fahrenbach (1973) or, for an English description, Geuss (1981), pp. 65–66. See also the discussion in Tucker (2003), pp. 502–504.  ↩

  13. Beckstead calls his method the “elite common sense” prior. I’ve named my method differently for two reasons. First, I want to distinguish MEEO from Beckstead’s prior, since I’m using the method for a slightly different purpose. Second, I think “elite common sense” is a confusing term even for Beckstead’s prior, since there’s some extrapolation of views going on. But also, it’s only a “mild” extrapolation — e.g. we aren’t asking what elites would think if they knew everything, or if they could rewrite their cognitive software for better reasoning accuracy.  ↩

  14. My rough impression is that among the people who seem to have thought long and hard about AGI outcomes, and seem to me to exhibit fairly good epistemic practices on most issues, my view on AGI outcomes is still an outlier in its pessimism about the likelihood of desirable outcomes. But it’s hard to tell: there haven’t been systematic surveys of the important-to-me experts on the issue. I also wonder whether my views about long-term AGI outcomes are more a matter of seriously tackling a contrarian question rather than being a matter of having a particularly contrarian view. On this latter point, see this Facebook discussion.  ↩

  15. I haven’t seen a poll of cryobiologists on the likely future technological feasibility of cryonics. Even if there were such polls, I’d wonder whether cryobiologists also had the relevant philosophical and neuroscientific expertise. I should mention that I’m not personally signed up for cryonics, for these reasons.  ↩

New LW Meetup: Saint Petersburg

2 FrankAdamek 18 October 2013 04:09PM

Meetup : Saint Petersburg, Russia

2 efim 08 October 2013 08:46PM

Discussion article for the meetup: Saint Petersburg, Russia

WHEN: 27 October 2013 04:00:00PM (+0400)

WHERE: Saint Petersburg, Tekhnologichesky Institut metro station, 1-ya Krasnoarmeyskaya St., building 15

Come to the (probably) first St. Petersburg meetup in a long time!

It will be held in the cafe "PMG" - for a detailed description, please see the mailing list announcement. At least for this first meeting there will be little or no moderation - just socialising, getting to know each other, and unstructured discussion. We will look at what our interests are and how meetups can meet them.

Please see the mailing list for more information: https://groups.google.com/forum/#!forum/less-wrong-saint-petersburg

Or contact me on 8 911 843 56 44, any day from 18:00 to 00:00.


MIRI's 2013 Summer Matching Challenge

23 lukeprog 23 July 2013 07:05PM

(MIRI maintains Less Wrong, with generous help from Trike Apps, and much of the core content is written by salaried MIRI staff members.)

Update 09-15-2013: The fundraising drive has been completed! My thanks to everyone who contributed.

The original post follows below...


Thanks to the generosity of several major donors, every donation to the Machine Intelligence Research Institute made from now until (the end of) August 15th, 2013 will be matched dollar-for-dollar, up to a total of $200,000!  

Donate Now!

Now is your chance to double your impact while helping us raise up to $400,000 (with matching) to fund our research program.

This post is also a good place to ask your questions about our activities and plans — just post a comment!

If you have questions about what your dollars will do at MIRI, you can also schedule a quick call with MIRI Deputy Director Louie Helm: louie@intelligence.org (email), 510-717-1477 (phone), louiehelm (Skype).



Early this year we made a transition from movement-building to research, and we've hit the ground running with six major new research papers, six new strategic analyses on our blog, and much more. Give now to support our ongoing work on the future's most important problem.

Accomplishments in 2013 so far

Future Plans You Can Help Support

  • We will host many more research workshops, including one in September in Berkeley, one in December (with John Baez attending) in Berkeley, and one in Oxford, UK (dates TBD).
  • Eliezer will continue to publish about open problems in Friendly AI. (Here are #1 and #2.)
  • We will continue to publish strategic analyses and expert interviews, mostly via our blog.
  • We will publish nicely-edited ebooks (Kindle, iBooks, and PDF) for more of our materials, to make them more accessible: The Sequences, 2006-2009 and The Hanson-Yudkowsky AI Foom Debate.
  • We will continue to set up the infrastructure (e.g. new offices, researcher endowments) required to host a productive Friendly AI research team, and (over several years) recruit enough top-level math talent to launch it.
  • We hope to hire an experienced development director (job ad not yet posted), so that the contributions of our current supporters can be multiplied even further by a professional fundraiser.

(Other projects are still being surveyed for likely cost and strategic impact.)

We appreciate your support for our high-impact work! Donate now, and seize a better than usual chance to move our work forward.

If you have questions about donating, please contact Louie Helm at (510) 717-1477 or louie@intelligence.org.

$200,000 of total matching funds has been provided by Jaan Tallinn, Loren Merritt, Rick Schwall, and Alexei Andreev.

Prisoner's dilemma tournament results

32 AlexMennen 09 July 2013 08:50PM

The prisoner's dilemma tournament is over. There were a total of 21 entries. The winner is Margaret Sy, with a total of 39 points. 2nd and 3rd place go to rpglover64 and THE BLACK KNIGHT, with scores of 38 and 36 points respectively. There were some fairly intricate strategies in the tournament, but all three of these top scorers submitted programs that completely ignored the source code of the other player and acted randomly, with the winner having a bias towards defecting.
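For concreteness, the top-scoring class of strategy amounts to something like the following sketch (in Python for readability; the actual entries were Scheme lambdas, and the 70% defection bias is an invented placeholder rather than Margaret Sy's actual parameter):

    import random

    def top_scorer_style(opponent_source):
        # Ignore the opponent's source code entirely and play at random,
        # leaning towards defection. The bias is an assumed placeholder.
        return "D" if random.random() < 0.7 else "C"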

You can download a chart describing the outcomes here, and the source codes for the entries can be downloaded here.

I represented each submission with a single letter while running the tournament. Here is a directory of the entries, along with their scores: (some people gave me a term to refer to the player by, while others gave me a term to refer to the program. I went with whatever they gave me, and if they gave me both, I put the player first and then the program)

A: rpglover64 (38)
B: Watson Ladd (27)
C: THE BLACK KNIGHT (36)
D: skepsci (24)
E: Devin Bayer (30)
F: Billy, Mimic-- (27)
G: itaibn (34)
H: CooperateBot (24)
I: Sean Nolan (28)
J: oaz (26)
K: selbram (34)
L: Alexei (25)
M: LEmma (25)
N: BloodyShrimp (34)
O: caa (32)
P: nshepperd (25)
Q: Margaret Sy (39)
R: So8res, NateBot (33)
S: Quinn (33)
T: HonoreDB (23)
U: SlappedTogetherAtTheLastMinuteBot (20)


Engelbart: Insufficiently Recursive

11 Eliezer_Yudkowsky 26 November 2008 08:31AM

Followup to: Cascades, Cycles, Insight, Recursion, Magic
Reply to: Engelbart As Ubertool?

When Robin originally suggested that Douglas Engelbart, best known as the inventor of the computer mouse, would have been a good candidate for taking over the world via compound interest on tools that make tools, my initial reaction was "What on Earth?  With a mouse?"

On reading the initial portions of Engelbart's "Augmenting Human Intellect: A Conceptual Framework", it became a lot clearer where Robin was coming from.

Sometimes it's hard to see through the eyes of the past.  Engelbart was a computer pioneer, and in the days when all these things were just getting started, he had a vision of using computers to systematically augment human intelligence.  That was what he thought computers were for.  That was the ideology lurking behind the mouse.  Something that makes its users smarter - now that sounds a bit more plausible as an UberTool.

Looking back at Engelbart's plans with benefit of hindsight, I see two major factors that stand out:

  1. Engelbart committed the Classic Mistake of AI: underestimating how much cognitive work gets done by hidden algorithms running beneath the surface of introspection, and overestimating what you can do by fiddling with the visible control levers.
  2. Engelbart anchored on the way that someone as intelligent as Engelbart would use computers, but there was only one of him - and due to point 1 above, he couldn't use computers to make other people as smart as him.

continue reading »

RIP Doug Engelbart

11 Dr_Manhattan 03 July 2013 07:19PM

Prisoner's Dilemma (with visible source code) Tournament

47 AlexMennen 07 June 2013 08:30AM

After the iterated prisoner's dilemma tournament organized by prase two years ago, there was discussion of running tournaments for several variants, including one in which two players submit programs, each of which is given the source code of the other player's program and outputs either “cooperate” or “defect”. However, as far as I know, no such tournament has been run until now.

Here's how it's going to work: Each player will submit a file containing a single Scheme lambda-function. The function should take one input. Your program will play exactly one round against each other program submitted (not including itself). In each round, two programs will be run, each given the source code of the other as input, and will be expected to return either of the symbols “C” or “D” (for "cooperate" and "defect", respectively). The programs will receive points based on the following payoff matrix:

“Other” includes any result other than returning “C” or “D”, including failing to terminate, throwing an exception, and even returning the string “Cooperate”. Notice that “Other” results in a worst-of-both-worlds scenario where you get the same payoff as you would have if you cooperated, but the other player gets the same payoff as if you had defected. This is an attempt to ensure that no one ever has incentive for their program to fail to run properly, or to trick another program into doing so.

Your score is the sum of the number of points you earn in each round. The player with the highest score wins the tournament. Edit: There is a 0.5 bitcoin prize being offered for the winner. Thanks, VincentYu!

Details:
All submissions must be emailed to wardenPD@gmail.com by July 5, at noon PDT (Edit: that's 19:00 UTC). Your email should also say how you would like to be identified when I announce the tournament results.
Each program will be allowed to run for 10 seconds. If it has not returned either “C” or “D” by then, it will be stopped, and treated as returning “Other”. For consistency, I will have Scheme collect garbage right before each run.
One submission per person or team. No person may contribute to more than one entry. Edit: This also means no copying from each others' source code. Describing the behavior of your program to others is okay.
I will be running the submissions in Racket. You may be interested in how Racket handles time (especially the (current-milliseconds) function), threads (in particular, “thread”, “kill-thread”, “sleep”, and “thread-dead?”), and possibly randomness.
Don't try to open the file you wrote your program in (or any other file, for that matter). I'll add code to the file before running it, so if you want your program to use a copy of your source code, you will need to use a quine. Edit: No I/O of any sort.
Unless you tell me otherwise, I assume I have permission to publish your code after the contest.
You are encouraged to discuss strategies for achieving mutual cooperation in the comments thread.
I'm hoping to get as many entries as possible. If you know someone who might be interested in this, please tell them.
It's possible that I've said something stupid that I'll have to change or clarify, so you might want to come back to this page again occasionally to look for changes to the rules. Any edits will be bolded, and I'll try not to change anything too drastically, or make any edits late in the contest.

Here is an example of a correct entry, which cooperates with you if and only if you would cooperate with a program that always cooperates (actually, if and only if you would cooperate with one particular program that always cooperates):

(lambda (x)
    (if (eq? ((eval x) '(lambda (y) 'C)) 'C)
        'C
        'D))

Tiling Agents for Self-Modifying AI (OPFAI #2)

55 Eliezer_Yudkowsky 06 June 2013 08:24PM

An early draft of publication #2 in the Open Problems in Friendly AI series is now available:  Tiling Agents for Self-Modifying AI, and the Löbian Obstacle.  ~20,000 words, aimed at mathematicians or the highly mathematically literate.  The research reported on was conducted by Yudkowsky and Herreshoff, substantially refined at the November 2012 MIRI Workshop with Mihaly Barasz and Paul Christiano, and refined further at the April 2013 MIRI Workshop.

Abstract:

We model self-modification in AI by introducing 'tiling' agents whose decision systems will approve the construction of highly similar agents, creating a repeating pattern (including similarity of the offspring's goals).  Constructing a formalism in the most straightforward way produces a Gödelian difficulty, the Löbian obstacle.  By technical methods we demonstrate the possibility of avoiding this obstacle, but the underlying puzzles of rational coherence are thus only partially addressed.  We extend the formalism to partially unknown deterministic environments, and show a very crude extension to probabilistic environments and expected utility; but the problem of finding a fundamental decision criterion for self-modifying probabilistic agents remains open.
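For readers who don't know the reference, the Löbian obstacle turns on Löb's theorem (a standard result, stated here for convenience rather than quoted from the paper): for a theory T with provability predicate \Box,

    \text{If } T \vdash \Box P \rightarrow P, \text{ then } T \vdash P.

So a consistent T cannot endorse “anything T proves is true” across the board without thereby proving every P outright, which is roughly what blocks a naive agent from trusting a successor that reasons in the same or a stronger system.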

Commenting here is the preferred venue for discussion of the paper.  This is an early draft and has not been reviewed, so it may contain mathematical errors, and reporting of these will be much appreciated.

The overall agenda of the paper is to introduce the conceptual notion of a self-reproducing decision pattern which includes reproduction of the goal or utility function, by exposing a particular possible problem with a tiling logical decision pattern and coming up with some partial technical solutions.  This then makes it conceptually much clearer to point out the even deeper problems with "We can't yet describe a probabilistic way to do this because of non-monotonicity" and "We don't have a good bounded way to do this because maximization is impossible, satisficing is too weak and Schmidhuber's swapping criterion is underspecified."  The paper uses first-order logic (FOL) because FOL has a lot of useful standard machinery for reflection which we can then invoke; in real life, FOL is of course a poor representational fit to most real-world environments outside a human-constructed computer chip with thermodynamically expensive crisp variable states.

As further background, the idea that something-like-proof might be relevant to Friendly AI is not about achieving some chimera of absolute safety-feeling, but rather about the idea that the total probability of catastrophic failure should not have a significant conditionally independent component on each self-modification, and that self-modification will (at least in initial stages) take place within the highly deterministic environment of a computer chip.  This means that statistical testing methods (e.g. an evolutionary algorithm's evaluation of average fitness on a set of test problems) are not suitable for self-modifications which can potentially induce catastrophic failure (e.g. of parts of code that can affect the representation or interpretation of the goals).  Mathematical proofs have the property that they are as strong as their axioms and have no significant conditionally independent per-step failure probability if their axioms are semantically true, which suggests that something like mathematical reasoning may be appropriate for certain particular types of self-modification during some developmental stages.

Thus the content of the paper is very far off from how a realistic AI would work, but conversely, if you can't even answer the kinds of simple problems posed within the paper (both those we partially solve and those we only pose) then you must be very far off from being able to build a stable self-modifying AI.  Being able to say how to build a theoretical device that would play perfect chess given infinite computing power, is very far off from the ability to build Deep Blue.  However, if you can't even say how to play perfect chess given infinite computing power, you are confused about the rules of chess or the structure of chess-playing computation in a way that would make it entirely hopeless for you to figure out how to build a bounded chess-player.  Thus "In real life we're always bounded" is no excuse for not being able to solve the much simpler unbounded form of the problem, and being able to describe the infinite chess-player would be substantial and useful conceptual progress compared to not being able to do that.  We can't be absolutely certain that an analogous situation holds between solving the challenges posed in the paper, and realistic self-modifying AIs with stable goal systems, but every line of investigation has to start somewhere.

Parts of the paper will be easier to understand if you've read Highly Advanced Epistemology 101 For Beginners including the parts on correspondence theories of truth (relevant to section 6) and model-theoretic semantics of logic (relevant to 3, 4, and 6), and there are footnotes intended to make the paper somewhat more accessible than usual, but the paper is still essentially aimed at mathematically sophisticated readers.
