
In response to Meetup : Amsterdam
Comment author: steven0461 21 November 2013 01:15:36PM *  1 point [-]

I've edited the post to include a meetup location: the Starbucks on the Oosterdokseiland just east of Amsterdam Central Station. Hope to see you guys there!

Meetup : Amsterdam

4 steven0461 12 November 2013 09:12AM

Discussion article for the meetup : Amsterdam

WHEN: 23 November 2013 02:00:00PM (+0100)

WHERE: Oosterdokseiland 4, Amsterdam, The Netherlands

Let's have a Netherlands LessWrong meetup on Saturday the 23rd. We're meeting in the Starbucks / "East Dock Lounge" on the Oosterdokseiland next to Amsterdam's Central Station. (The building is new and not yet pictured on Google Maps, but I've verified that it exists in the territory. Note that there's also a Starbucks in the station itself; that isn't where we're meeting.)

I'll bring a sign that says "LW". You can reach me at 0611431304.


Comment author: steven0461 18 July 2013 03:21:46AM 2 points [-]

Black-Box Metaphilosophical AI is also risky, because it's hard to test/debug something that you don't understand.

On the other hand, to the extent that our uncertainty about whether different BBMAI designs do philosophy correctly is independent, we can build multiple ones and see what outputs they agree on. (Or a design could do this internally, achieving the same effect.)
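As a toy illustration of the independence point (everything here, including the function name and the numbers, is my own illustrative assumption, not something from the comment): if each design is independently competent with some probability, competent designs produce the correct answer, and incompetent designs only rarely stumble onto the same wrong answer, then unanimous agreement among several designs is strong evidence that the shared answer is correct.

```python
def p_correct_given_agreement(p_competent, p_wrong_answers_coincide, n):
    """P(shared answer is correct | all n independently built designs agree).

    Assumes: a competent design outputs the correct answer; an incompetent
    design outputs some wrong answer; and wrong answers from different
    designs coincide with probability p_wrong_answers_coincide per
    additional design.
    """
    p_all_right = p_competent ** n
    p_all_wrong_yet_agreeing = ((1 - p_competent) ** n
                                * p_wrong_answers_coincide ** (n - 1))
    return p_all_right / (p_all_right + p_all_wrong_yet_agreeing)

# With each design competent with probability 0.6, agreement among two or
# three designs pushes confidence in the shared output well above 0.6.
for n in (1, 2, 3):
    print(n, round(p_correct_given_agreement(0.6, 0.05, n), 4))
```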

it's unclear why such an AI won't cause disaster in the time period before it achieves philosophical competence.

This seems to be an argument for building a hybrid of what you call metaphilosophical and normative AIs, where the normative part "only" needs to be reliable enough to prevent initial disaster, and the metaphilosophical part can take over afterward.

Comment author: Will_Newsome 17 July 2013 01:16:23PM *  6 points [-]

"create an AI that minimizes the expected amount of astronomical waste"

Of course, this is still just a proxy measure... say that we're "in a simulation", or that there are already superintelligences in our environment who won't let us eat the stars, or something like that—we still want to get as good a bargaining position as we possibly can, or to coordinate with the watchers as well as we possibly can, or in a more fundamental sense we want to not waste any of our potential, which I think is the real driving intuition here. (Further clarifying and expanding on that intuition might be very valuable, both for polemical reasons and for organizing some thoughts on AI strategy.) I cynically suspect that the stars aren't out there for us to eat, but that we can still gain a lot of leverage over the acausal fanfic-writing commun... er, superintelligence-centered economy/ecology, and so, optimizing the hell out of the AGI that might become an important bargaining piece and/or plot point is still the most important thing for humans to do.

Metaphilosophical AI

The thing I've seen that looks closest to white-box metaphilosophical AI in the existing literature is Eliezer's causal validity semantics, or more precisely the set of intuitions Eliezer was drawing on to come up with the idea of causal validity semantics. I would recommend reading the section Story of a Blob and the sections on causal validity semantics in Creating Friendly AI. Note that philosophical intuitions are a fuzzily bordered subset of justification-bearing (i.e. both moral/values-like and epistemic) causes that are theoretically formally identifiable and are traditionally thought of as having a coherent, lawful structure.

Comment author: steven0461 18 July 2013 03:08:07AM 2 points [-]

we still want to get as good a bargaining position as we possibly can, or to coordinate with the watchers as well as we possibly can, or in a more fundamental sense we want to not waste any of our potential, which I think is the real driving intuition here

It seems that we have more morally important potential in some possible worlds than others, and although we don't want our language to commit us to the view that we only have morally important potential in possible worlds where we can prevent astronomical waste, neither do we want to suggest (as I think "not waste any of our potential" does) the view that we have the same morally important potential everywhere and that we should just minimize the expected fraction of our potential that is wasted. A more neutral way of framing things could be "minimize wasted potential, especially if the potential is astronomical", leaving the strength of the "especially" to be specified by theories of how much one can affect the world from base reality vs simulations and zoos, theories of how to deal with moral uncertainty, and so on.
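One hedged way to make the "especially" precise (this notation is mine, not the comment's): write P(w) for the credence that we are in possible world w (base reality, simulation, zoo, and so on), V(w) for the morally important potential available in w, and f(w, a) for the fraction of that potential realized if we act according to plan a. The proposal then amounts to choosing a to maximize the sum over w of P(w) * V(w) * f(w, a). Treating V(w) as constant across worlds recovers "minimize the expected fraction of potential wasted," while letting V(w) be astronomically larger in worlds where we can prevent astronomical waste recovers the "especially" reading; how steeply V varies across worlds is exactly what the theories mentioned above would have to specify.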

Bayesian Adjustment Does Not Defeat Existential Risk Charity

38 steven0461 17 March 2013 08:50AM

(This is a long post. If you’re going to read only part, please read sections 1 and 2, subsubsection 5.6.2, and the conclusion.)

1. Introduction

Suppose you want to give some money to charity: where can you get the most bang for your philanthropic buck? One way to make the decision is to use explicit expected value estimates. That is, you could get an unbiased (averaging to the true value) estimate of what each candidate for your donation would do with an additional dollar, and then pick the charity associated with the most promising estimate.

Holden Karnofsky of GiveWell, an organization that rates charities for cost-effectiveness, disagreed with this approach in two posts he made in 2011. This is a response to those posts, addressing the implications for existential risk efforts.

According to Karnofsky, high returns are rare, and even unbiased estimates don’t take into account the reasons why they’re rare. So in Karnofsky's view, our favorite charity shouldn’t just be one associated with a high estimate; it should be one that supports the estimate with robust evidence derived from multiple independent lines of inquiry.1 If a charity’s returns are being estimated in a way that intuitively feels shaky, maybe that means the fact that high returns are rare should outweigh the fact that high returns were estimated, even if the people making the estimate were doing an excellent job of avoiding bias.

Karnofsky’s first post, Why We Can’t Take Expected Value Estimates Literally (Even When They’re Unbiased), explains how one can mitigate this issue by supplementing an explicit estimate with what Karnofsky calls a “Bayesian Adjustment” (henceforth “BA”). This method treats estimates as merely noisy measures of true values. BA starts with a prior representing what cost-effectiveness values are out there in the general population of charities, then the prior is updated into a posterior in standard Bayesian fashion.
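For concreteness, here is a minimal sketch of such an adjustment under the standard assumption (my choice, for illustration) that both the prior over true cost-effectiveness and the estimate error are normally distributed; in that case the posterior mean is simply a precision-weighted average of the prior mean and the estimate.

```python
# A hedged sketch of the "Bayesian Adjustment" idea under the common
# normal-normal assumption. The normality assumption and all numbers below
# are illustrative choices, not something taken verbatim from Karnofsky's posts.

def bayesian_adjustment(prior_mean, prior_sd, estimate, estimate_sd):
    """Posterior over true cost-effectiveness given one noisy estimate.

    In a normal-normal model the posterior mean is a precision-weighted
    average of the prior mean and the estimate, so noisier estimates are
    shrunk harder toward the prior.
    """
    prior_precision = 1.0 / prior_sd ** 2
    estimate_precision = 1.0 / estimate_sd ** 2
    posterior_precision = prior_precision + estimate_precision
    posterior_mean = (prior_precision * prior_mean
                      + estimate_precision * estimate) / posterior_precision
    return posterior_mean, posterior_precision ** -0.5
```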

Karnofsky provides some example graphs, illustrating his preference for robustness. If the estimate error is small, the posterior lies close to the explicit estimate. But if the estimate error is large, the posterior lies close to the prior. In other words, if there simply aren’t many high-return charities out there, a sharp estimate can be taken seriously, but a noisy estimate that says it has found a high-return charity must represent some sort of fluke.
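Continuing the sketch above with made-up numbers, the same behavior falls out directly: a sharp estimate keeps most of its force, while a noisy one is pulled almost all the way back to the prior.

```python
# Illustrative prior: typical charities cluster around 1 unit of good per
# dollar, with standard deviation 1.
prior_mean, prior_sd = 1.0, 1.0

# The same surprising estimate of 10 units per dollar, measured sharply and
# then noisily (reusing bayesian_adjustment from the sketch above):
print(bayesian_adjustment(prior_mean, prior_sd, estimate=10.0, estimate_sd=1.0))
# posterior mean ~5.5: the sharp estimate is taken fairly seriously
print(bayesian_adjustment(prior_mean, prior_sd, estimate=10.0, estimate_sd=10.0))
# posterior mean ~1.09: the noisy estimate barely moves us off the prior
```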

Karnofsky does not advocate a policy of performing an explicit adjustment. Rather, he uses BA to emphasize that estimates are likely to be inadequate if they don’t incorporate certain kinds of intuitions — in particular, a sense of whether all the components of an estimation procedure feel reliable. If intuitions say an estimate feels shaky and too good to be true, then maybe the estimate was noisy and the prior is more important. On the other hand, if intuitions say an estimate has taken everything into account, then maybe the estimate was sharp and outweighs the prior.

Karnofsky’s second post, Maximizing Cost-Effectiveness Via Critical Inquiry, expands on these points. Where the first post looks at how BA is performed on a single charity at a time, the second post examines how BA affects the estimated relative values of different charities. In particular, it assumes that although the charities are all drawn from the same prior, they come with different estimates of cost-effectiveness. Higher estimates of cost-effectiveness come from estimation procedures with proportionally higher uncertainty.

It turns out that higher estimates aren’t always more auspicious: an estimate may be “too good to be true,” concentrating much of its evidential support on values that the prior already rules out for the most part. On the bright side, this effect can be mitigated via multiple independent observations, and such observations can provide enough evidence to solidify higher estimates despite their low prior probability.
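As a rough illustration of the "multiple independent observations" point (again with invented numbers, and reusing the bayesian_adjustment sketch from above): n independent estimates with the same reading behave like a single estimate whose error standard deviation is divided by sqrt(n), so enough of them can rehabilitate an estimate that looked too good to be true on its own.

```python
import math

def adjust_with_n_observations(prior_mean, prior_sd, estimate, estimate_sd, n):
    """n independent estimates that all read `estimate`, each with error
    standard deviation `estimate_sd`, are equivalent to a single estimate
    with error standard deviation estimate_sd / sqrt(n)."""
    return bayesian_adjustment(prior_mean, prior_sd, estimate,
                               estimate_sd / math.sqrt(n))

# One noisy reading of 10 barely moves the posterior off the prior of 1,
# but many independent noisy readings of 10 largely restore the estimate.
for n in (1, 4, 25, 100):
    print(n, adjust_with_n_observations(1.0, 1.0, 10.0, 10.0, n))
```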

Charities aiming to reduce existential risk have a potential claim to high expected returns, simply because of the size of the stakes. But if such charities are difficult to evaluate, and the prior probability of high expected values is low, then the implications of BA for this class of charities loom large.

This post will argue that competent efforts to reduce existential risk are still likely to be optimal, despite BA. The argument will have three parts:

  1. BA differs from fully Bayesian reasoning, so that BA risks double-counting priors.

  2. The models in Karnofsky’s posts, when applied to existential risk, boil down to our having prior knowledge that the claimed returns are virtually impossible. (Moreover, similar models without extreme priors don’t lead to the same conclusions.)

  3. We don’t have such prior knowledge. Extreme priors would have implied false predictions in the past, imply unphysical predictions for the future, and are justified neither by our past experiences nor by any other considerations.

Claim 1 is not essential to the conclusion. While Claim 2 seems worth expanding on, it’s Claim 3 that makes up the core of the controversy. Each of these concerns will be addressed in turn.

Before responding to the claims themselves, however, it’s worth discussing a highly simplified model that will illustrate what Karnofsky’s basic point is.

continue reading »
Comment author: Elithrion 16 March 2013 06:23:52PM 2 points [-]

I'm surprised this post doesn't at least mention temporal discounting. Even if it's somewhat unpopular in utilitarian circles, it's sufficiently a part of mainstream assessments of the future and of basic human psychology that I would think its effects on astronomical waste (and related) arguments should at the very least be considered.

Comment author: steven0461 16 March 2013 09:36:45PM 5 points [-]

The post discusses the limiting case where astronomical waste has zero importance and the only thing that matters is saving present lives. Extending that to the case where astronomical waste has some finite level of importance based on time discounting seems like a matter of interpolating between full astronomical waste and no astronomical waste.
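One crude way to see that interpolation (my toy numbers, not the post's): with an annual discount rate r, a benefit t years out gets weight (1 + r)^-t, so r = 0 recovers the full astronomical-waste case and a sufficiently large r leaves only near-term lives mattering.

```python
# Toy interpolation between "astronomical waste dominates" (no discounting)
# and "only near-term lives matter" (heavy discounting). All numbers are
# placeholders chosen purely for illustration.

def discounted_value(benefits_by_year, rate):
    """Present value of a stream of benefits under exponential discounting."""
    return sum(b / (1.0 + rate) ** t for t, b in benefits_by_year.items())

# One unit of near-term benefit now versus an astronomically larger benefit
# realized a thousand years from now.
stream = {0: 1.0, 1000: 1e30}

for rate in (0.0, 0.01, 0.05, 0.10):
    print(rate, discounted_value(stream, rate))
```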

Comment author: Nick_Beckstead 15 March 2013 11:56:36PM *  12 points [-]

Thank you for writing this post. I feel that additional discussion of these ideas is valuable, and that this post adds to the discussion.

Note about my comment below: Though I’ve spoken with Holden about these issues in the past, what I say here is what I think, and shouldn’t be interpreted as his opinion.

I don’t think Holden’s arguments are intended to show that existential risk is not a promising cause. To the contrary, global catastrophic risk reduction is one of GiveWell Labs’ priority causes. I think his arguments are only intended to show that one can't appeal to speculative explicit expected value calculations to convincingly argue that targeted existential risk reduction is the best area to focus on. This perspective is much more plausible than the view that these arguments show that existential risk is not the best cause to investigate.

I believe that Holden's position becomes more plausible with the following two refinements:

  • Define the prior over good accomplished in terms of “lives saved, together with all the ripple effects of saving the lives.” By “ripple effects,” I mean all the indirect effects of the action, including speeding up development, reducing existential risk, or having other lasting impacts on the distant future.

  • Define the prior in terms of expected good accomplished, relative to “idealized probabilities,” where idealized probabilities are the probabilities we’d have given the available evidence at the time of the intervention, were we to construct our views in a way that avoided procedural errors (such as the influence of various biases, calculation errors, formulating the problem incorrectly).

When you do the first thing, it makes the adjustment play out rather differently. For instance, I believe the following would not be true:

As we’ve seen, Karnofsky’s toy examples use extreme priors, and these priors would entail a substantial adjustment to EV estimates for existential risk charities. This adjustment would in turn be sufficient to alter existential risk charities from good ideas to bad ideas.

The reason is that if there is a decent probability of humanity having a large and important influence on the far future, ripple effects could be quite large. If that’s true, targeted existential risk reduction—meaning efforts to reduce existential risk which focus on it directly—would not necessarily have many orders of magnitude greater effects on the far future than activities which do not focus on existential risk directly.

For similar reasons, I believe that Carl Shulman’s “Charity Doomsday Argument” would not go through if one follows the first suggestion. If ordinary actions can shape the far future as well, Holden’s framework doesn’t suggest that humanity will have a cramped future.

If we adopt the second suggestion, defining the prior over expected good accomplished, pointing to specific examples of highly successful interventions in the past does not clearly refute a narrow prior probability distribution. We have to establish, in addition, that given what people knew at the time, these interventions had highly outsized expected returns. This is somewhat analogous to the way in which pointing to specific stocks which had much higher returns than other stocks does not refute the efficient markets hypothesis; one has to show that, in the past, those stocks were knowably underpriced. A normal or log-normal prior over expected returns may be refuted still, but a refutation would be more subtle.

A couple of other points seem relevant as well, if one takes the above on board. First, as the “friend in a foreign country” example illustrates, a very low prior probability in a claim does not necessarily mean that the claim is unbelievable in practice. I believe that every time someone reads a newspaper, they can justifiably attain high credence in specific hypotheses, which, prior to reading the newspaper, had extremely low prior probabilities. Something similar may be true when specific novel scientific hypotheses, such as the ideal gas law, are discovered. So it seems that even if one adopts a fairly extreme prior, it wouldn’t have to be impossible to become convinced that humanity would have a very large influence on the far future, or that something would actually reduce existential risk.
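A toy version of the newspaper point, in odds form (the specific numbers are invented): a hypothesis can start with a one-in-a-billion prior and still end up near certainty, provided the evidence carries a likelihood ratio much larger than a billion, which a specific newspaper report plausibly does.

```python
def posterior_probability(prior, likelihood_ratio):
    """Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio."""
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)

# "Today's front page says exactly this" has a tiny prior probability, but
# actually reading the newspaper supplies an enormous likelihood ratio.
print(posterior_probability(prior=1e-9, likelihood_ratio=1e12))  # ~0.999
```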

Finally, I’d like to comment on this idea:

Increased economic growth could have effects not just on timing, but on safety itself. For example, economic growth could increase existential risk by speeding up dangerous technologies more quickly than society can handle them safely, or it could decrease existential risk by promoting some sort of stability. It could also have various small but permanent effects on the future. Still, it would seem to be a fairly major coincidence if the policy of saving people’s lives in the Third World were also the policy that maximized safety. One would at least expect to see more effect from interventions targeted specifically at speeding up economic growth. An approach to foreign aid aimed at maximizing growth effects rather than near-term lives or DALYs saved would probably look quite different. Even then, it’s hard to see how economic growth could be the policy that maximized safety unless our model of what causes safety were so broken as to be useless.

There is a spectrum of strategies for shaping the far future that ranges from the very targeted (e.g., stop that asteroid from hitting the Earth) to very broad (e.g., create economic growth, help the poor, provide education programs for talented youth), with options like “tell powerful people about the importance of shaping the far future” in between. The limiting case of breadth might be just optimizing for proximate benefits or for speeding up development. I suspect that global health is probably not the best place on this spectrum to be, but I don’t find that totally obvious. I think it’s a very interesting question where on this spectrum we should prefer to be, other things being equal. My sense is that many people on LessWrong think that we should be on the highly targeted end of this spectrum. I am highly uncertain about this issue, and I’d be interested in seeing stronger arguments for or against this view.

Comment author: steven0461 16 March 2013 02:41:03AM 5 points [-]

Thanks for your detailed comment! I certainly agree that, if one takes into account ripple effects where saving lives leads to reduced existential risk, the disparities between direct ways of reducing existential risk on the one hand and other efficient ways of saving people's lives on the other hand are no longer astronomical in size. I learned of this argument partway into writing the post, and subsection 5.5 was meant to address it, but it's quite rough and far from the final word on that subject, particularly if you compare direct efforts to medium-direct efforts rather than to very indirect efforts.

It sounds as though, to model your intuitions on the situation, instead of putting a probability distribution on how many DALYs one could save by donating a dollar to a given charity, we'd have to put a probability distribution on what percentage of existential risk one could rationally expect to reduce by donating a dollar to a given charity. Does that sound right?
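A minimal sketch of the kind of model being floated here, with entirely made-up parameters (the log-normal form and the numbers are my illustrative assumptions, not anything from the discussion):

```python
import math
import random

def sample_fraction_of_risk_reduced_per_dollar():
    """Draw from an assumed distribution over the fraction of total
    existential risk removed by one marginal dollar to a given charity:
    log-normal with a median of 1e-15 and very wide uncertainty."""
    return random.lognormvariate(math.log(1e-15), 3.0)

random.seed(0)
samples = [sample_fraction_of_risk_reduced_per_dollar() for _ in range(100_000)]
print("expected fraction of risk removed per dollar:",
      sum(samples) / len(samples))
```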

I would weakly guess that such a model would favor direct over semi-direct existential risk reduction and strongly guess that such a model would favor direct over indirect existential risk reduction. This is just based on thinking that some of the main variables relevant to existential risk are being pushed on by few enough people, and in ways that are sufficiently badly thought through, that there's likely to be low-hanging fruit to be picked by those who analyze the issues in a sufficiently careful and calculating manner. But this is a pretty vague and sketchy argument, and it definitely seems worth discussing this sort of model more thoroughly.

Comment author: 9eB1 15 March 2013 05:45:58AM 16 points [-]

Wonderful post. Thank you.

I have a feeling that the fundamental difference between your position and GiveWell's arises not from a difference of opinion regarding mathematical arguments but because of a difference of values. Utilitarianism doesn't say that I have to value potential people at anything approaching the level of value I assign to living persons. In particular, valuing potential persons at 0 negates many arguments that rely on speculative numbers to pump expected utility into the present, and I'm not even sure if it's not right. Suppose that you had to choose between killing everyone currently alive at the end of their natural life spans, or murdering all but two people whom you were assured would repopulate the planet. My preference would be the former, despite it meaning the end of humanity. Valuing potential people without an extremely high discount rate also leads one to be strongly pro-life, to be against birth control programs in developing nations, etc.

Another possibility is that GiveWell's true reason is that recommending MIRI as an efficient charity would decrease their probability of becoming substantially larger (through attracting large numbers of mainstream donors). After they have established more credibility, they would be able to direct a larger amount of money to existential risk charities, and recommending MIRI now, when doing so would reduce their growth trajectory, could lower their impact in a fairly straightforward way unless the existential risk is truly imminent. But if they actually made this argument explicitly, it would undermine its whole point, as they would be revealing their fringe intentions. Note that I actually think this would be a reasonable thing to do and am not trying to cast any aspersions on GiveWell.

Comment author: steven0461 16 March 2013 02:04:28AM *  3 points [-]

I have a feeling that the fundamental difference between your position and GiveWell's arises not from a difference of opinion regarding mathematical arguments but because of a difference of values.

Karnofsky has, as far as I know, not endorsed measures of charitable effectiveness that discount the utility of potential people. (On the other hand, as Nick Beckstead points out in a different comment and as is perhaps under-emphasized in the current version of the main post, neither has Karnofsky made a general claim that Bayesian adjustment defeats existential risk charity. He has only explicitly come out against "if there's even a chance" arguments. But I think that in the context of his posts being reposted here on LW, many are likely to have interpreted them as providing a general argument that way, and I think it's likely that the reasoning in the posts has at least something to do with why Karnofsky treats the category of existential risk charity as merely promising rather than as a main focus. For MIRI in particular, Karnofsky has specific criticisms that aren't really related to the points here.)

In particular, valuing potential persons at 0 negates many arguments that rely on speculative numbers to pump expected utility into the present, and I'm not even sure if it's not right.

While valuing potential persons at 0 makes existential risk versus other charities a closer call than if you included astronomical waste, I think the case is still fairly strong that the best existential risk charities save more expected currently-existing lives than the best other charities. The estimate from Anna Salamon's talk linked in the main post makes investment into AI risk research roughly 4 orders of magnitude better for preventing the deaths of currently existing people than international aid charities. At the risk of anchoring, my guess is that the estimate is likely to be an overestimate, but not by 4 orders of magnitude. On the other hand, there may be non-existential risk charities that achieve greater returns in present lives but that also have factors barring them from being recommended by GiveWell.

Comment author: private_messaging 15 March 2013 06:25:33AM 2 points [-]

The much bigger issue is that for some anthropogenic risks (such as AI), the risk is caused by people, and can be increased by funding some groups of people. The expected utility thus has both positive and negative terms, and if you generate a biased list of those terms (e.g. by listening to what an organization says about itself) and sum it, the resulting sum tells you nothing about the sign of the expected utility.
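A toy Monte Carlo rendering of that point (my construction, with arbitrary numbers): if the true effect of funding is a sum of positive and negative terms, but the list you sum over is biased toward the terms the funded group highlights, the total comes out positive regardless of the true sign.

```python
import random

random.seed(1)

# True (unknown) per-project effects of funding: a mix of positive and
# negative terms, constructed here so the true average is negative.
true_terms = [random.gauss(-0.1, 1.0) for _ in range(1000)]

# Biased accounting: only the terms the organization points to, i.e. the
# positive ones.
highlighted_terms = [t for t in true_terms if t > 0]

print("true average effect:      ", sum(true_terms) / len(true_terms))
print("sum of highlighted terms: ", sum(highlighted_terms))  # positive either way
```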

Comment author: steven0461 16 March 2013 01:38:55AM *  1 point [-]

I agree: the argument given here doesn't address whether existential risk charities are likely to be helpful or actively harmful. The fourth paragraph of the conclusion and various caveats like "basically competent" were meant to limit the scope of the discussion to only those whose effects were mostly positive rather than negative. Carl Shulman suggested in a feedback comment that one could set up an explicit model where one multiplies (1) a normal variable centered on zero, or with substantial mass below zero, intended to describe uncertainty about whether the charity has mostly positive or mostly negative effects, with (2) a thicker-tailed and always positive variable describing uncertainty about the scale the charity is operating on.
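A rough Monte Carlo rendering of the model Shulman suggests, with placeholder distributions and parameters of my own choosing (a normal "direction" factor with substantial mass below zero, multiplied by a log-normal "scale" factor):

```python
import random

random.seed(0)

def sample_charity_effect():
    # (1) Direction: normal with substantial mass below zero, capturing
    #     uncertainty about whether the charity's net effect is positive.
    direction = random.gauss(0.1, 1.0)
    # (2) Scale: heavy-tailed and always positive, capturing uncertainty
    #     about how large the charity's effects are in either direction.
    scale = random.lognormvariate(0.0, 2.0)
    return direction * scale

samples = [sample_charity_effect() for _ in range(200_000)]
print("expected effect:", sum(samples) / len(samples))
print("fraction of draws with negative effect:",
      sum(s < 0 for s in samples) / len(samples))
```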

Comment author: Larks 15 March 2013 12:53:05PM 3 points [-]

Ok, but if that's your reference class, "isn't a donkey sanctuary" counts as evidence you can update on. It seems there are large classes of charities we can be confident will not be extraordinarily effective, and these don't include FHI, MIRI, etc.

Comment author: steven0461 16 March 2013 01:32:46AM 0 points [-]

Yes. There's a choice as to what to put into the prior and what to put into the likelihood. This makes it more difficult to make claims like "this number is a reasonable prior and this one is not". Instead, one has to specify the population the prior is about, and this in turn affects what likelihood ratios are reasonable.
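A small worked example of that point, with invented counts: whether "isn't a donkey sanctuary" is folded into the reference class that defines the prior or treated as evidence to update on, the posterior is the same, which is why a prior is only reasonable or unreasonable relative to a specified population and a matching likelihood.

```python
# Invented counts for a population of charities.
total = 10_000
extraordinary = 10            # extraordinarily effective charities
donkey_sanctuaries = 2_000    # assumed to include none of the extraordinary ones
not_donkey = total - donkey_sanctuaries

# Option A: broad prior over all charities, then update on the evidence
# "this charity isn't a donkey sanctuary".
prior_a = extraordinary / total
p_not_donkey_given_extraordinary = 1.0        # by the assumption above
p_not_donkey = not_donkey / total
posterior_a = prior_a * p_not_donkey_given_extraordinary / p_not_donkey

# Option B: bake the same fact into the reference class from the start.
posterior_b = extraordinary / not_donkey

print(posterior_a, posterior_b)  # identical: the prior/likelihood split is a modeling choice
```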
