MIRI is now in a close race for the prize for the most unique donors over the 24 hours, which adds substantial extra value to $10 donations from people who haven't donated yet.
(MIRI lost the third hour despite being comfortably on top of the leaderboard: what matters is the increase over the last hour, so at this point the leaderboard is probably misleading as an indicator of how close things are.)
The leaderboards for most unique donors seem pretty close between MIRI and the next contender, so additional $10 donations may be getting unusual value per dollar right now. (The first hour was close between MIRI and a different organization with both having something like 34 unique donors, so in that sort of situation, if the highest number wins, the expected value to MIRI of a donation might be on the rough order of $100.)
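As a back-of-the-envelope check (both the prize amount and the chance of being pivotal below are pure assumptions, not figures from the drive):

```python
# Back-of-the-envelope sketch of the "close race" argument.
# Both numbers below are illustrative assumptions, not figures from the drive.
prize = 1000.0     # assumed dollar value of the most-unique-donors prize
p_pivotal = 0.1    # assumed chance that one extra donor flips the outcome

expected_prize_value = p_pivotal * prize
print(f"A $10 donation adds ~${expected_prize_value:.0f} in expected prize value "
      "on top of the donation itself.")
```

On those made-up assumptions a $10 donation carries roughly $100 of expected prize value, which matches the rough order quoted above.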
I've edited the post to include a meetup location: the Starbucks on the Oosterdokseiland just east of Amsterdam Central Station. Hope to see you guys there!
Black-Box Metaphilosophical AI is also risky, because it's hard to test/debug something that you don't understand.
On the other hand, to the extent that our uncertainty about whether different BBMAI designs do philosophy correctly is independent, we can build multiple ones and see what outputs they agree on. (Or a design could do this internally, achieving the same effect.)
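A minimal sketch of why agreement helps, assuming each design is independently correct with probability p and that incorrect designs rarely coincide on the same wrong output (both assumptions doing real work here):

```python
# Sketch of the agreement argument.  Assumptions: each design does philosophy
# correctly with independent probability p, and designs that are wrong only
# agree on the same wrong output with probability q per extra design.
p = 0.7
q = 0.01

def posterior_correct_given_agreement(k: int) -> float:
    """P(shared output is correct | all k designs agree), via Bayes' rule."""
    agree_correct = p ** k
    agree_wrong = (1 - p) ** k * q ** (k - 1)
    return agree_correct / (agree_correct + agree_wrong)

for k in (1, 2, 3):
    print(k, round(posterior_correct_given_agreement(k), 5))
```

With these invented numbers, one design is right 70% of the time but two agreeing designs are right over 99% of the time; the effect of course evaporates to the extent the designs share correlated errors.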
it's unclear why such an AI won't cause disaster in the time period before it achieves philosophical competence.
This seems to be an argument for building a hybrid of what you call metaphilosophical and normative AIs, where the normative part "only" needs to be reliable enough to prevent initial disaster, and the metaphilosophical part can take over afterward.
"create an AI that minimizes the expected amount of astronomical waste"
Of course, this is still just a proxy measure. Say that we're "in a simulation", or that there are already superintelligences in our environment who won't let us eat the stars, or something like that—we still want to get as good a bargaining position as we possibly can, or to coordinate with the watchers as well as we possibly can, or in a more fundamental sense we want to not waste any of our potential, which I think is the real driving intuition here. (Further clarifying and expanding on that intuition might be very valuable, both for polemical reasons and for organizing some thoughts on AI strategy.) I cynically suspect that the stars aren't out there for us to eat, but that we can still gain a lot of leverage over the acausal fanfic-writing commun... er, superintelligence-centered economy/ecology, and so optimizing the hell out of the AGI that might become an important bargaining piece and/or plot point is still the most important thing for humans to do.
Metaphilosophical AI
The thing I've seen that looks closest to white-box metaphilosophical AI in the existing literature is Eliezer's causal validity semantics, or more precisely the set of intuitions Eliezer was drawing on to come up with the idea of causal validity semantics. I would recommend reading the section Story of a Blob and the sections on causal validity semantics in Creating Friendly AI. Note that philosophical intuitions are a fuzzily bordered subset of justification-bearing (i.e. both moral/values-like and epistemic) causes that are theoretically formally identifiable and are traditionally thought of as having a coherent, lawful structure.
we still want to get as good a bargaining position as we possibly can, or to coordinate with the watchers as well as we possibly can, or in a more fundamental sense we want to not waste any of our potential, which I think is the real driving intuition here
It seems that we have more morally important potential in some possible worlds than others, and although we don't want our language to commit us to the view that we only have morally important potential in possible worlds where we can prevent astronomical waste, neither do we want to suggest (as I think "not waste any of our potential" does) the view that we have the same morally important potential everywhere and that we should just minimize the expected fraction of our potential that is wasted. A more neutral way of framing things could be "minimize wasted potential, especially if the potential is astronomical", leaving the strength of the "especially" to be specified by theories of how much one can affect the world from base reality vs simulations and zoos, theories of how to deal with moral uncertainty, and so on.
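To make the distinction concrete, here is a schematic sketch of the two objectives being contrasted (every number is a placeholder):

```python
# Schematic contrast of the two objectives above.  "P" is a credence over
# possible worlds, "potential" is the morally important potential available
# in that world, and "wasted" is the fraction of it we fail to realize.
# All numbers are placeholders.
worlds = [
    {"name": "base reality, stars reachable", "P": 0.3, "potential": 1e10, "wasted": 0.6},
    {"name": "simulation or zoo",             "P": 0.7, "potential": 1e2,  "wasted": 0.2},
]

# "Minimize the expected fraction of potential wasted" ignores the stakes:
expected_fraction_wasted = sum(w["P"] * w["wasted"] for w in worlds)

# "Minimize wasted potential, especially if astronomical" weights by stakes:
expected_potential_wasted = sum(w["P"] * w["potential"] * w["wasted"] for w in worlds)

print(expected_fraction_wasted, expected_potential_wasted)
```

The strength of the "especially" corresponds to how strongly the second sum is dominated by the high-potential worlds.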
I'm surprised this post doesn't at least mention temporal discounting. Even if it's somewhat unpopular in utilitarian circles, it's sufficiently a part of mainstream assessments of the future and of basic human psychology that I would think its effects on astronomical waste (and related) arguments should at the very least be considered.
The post discusses the limiting case where astronomical waste has zero importance and the only thing that matters is saving present lives. Extending that to the case where astronomical waste has some finite level of importance based on time discounting seems like a matter of interpolating between full astronomical waste and no astronomical waste.
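A sketch of that interpolation under exponential discounting (the horizon and rates are illustrative):

```python
import math

# Interpolation sketch: with exponential discounting at annual rate r,
# benefits t years out get weight exp(-r * t).  r = 0 recovers the full
# astronomical-waste view; a mainstream r makes far-future benefits vanish.
# The million-year horizon is an illustrative assumption.
def weight(t_years: float, r: float) -> float:
    return math.exp(-r * t_years)

far_future = 1e6  # assumed arrival time of astronomical-scale benefits, in years
for r in (0.0, 1e-7, 1e-5, 0.03):
    print(f"r = {r:g}: far-future weight = {weight(far_future, r):.3g}")
```

Anything like a mainstream discount rate collapses this to the no-astronomical-waste limit, while rates very close to zero leave the astronomical term dominant, so the interpolation is extremely sensitive to r.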
Thank you for writing this post. I feel that additional discussion of these ideas is valuable, and that this post adds to the discussion.
Note about my comment below: Though I’ve spoken with Holden about these issues in the past, what I say here is what I think, and shouldn’t be interpreted as his opinion.
I don’t think Holden’s arguments are intended to show that existential risk is not a promising cause. To the contrary, global catastrophic risk reduction is one of GiveWell Labs’ priority causes. I think his arguments are only intended to show that one can't appeal to speculative explicit expected value calculations to convincingly argue that targeted existential risk reduction is the best area to focus on. This perspective is much more plausible than the view that these arguments show that existential risk is not the best cause to investigate.
I believe that Holden's position becomes more plausible with the following two refinements:
Define the prior over good accomplished in terms of “lives saved, together with all the ripple effects of saving the lives.” By “ripple effects,” I mean all the indirect effects of the action, including speeding up development, reducing existential risk, or having other lasting impacts on the distant future.
Define the prior in terms of expected good accomplished, relative to “idealized probabilities,” where idealized probabilities are the probabilities we’d have, given the evidence available at the time of the intervention, were we to construct our views in a way that avoided procedural errors (such as the influence of various biases, calculation errors, or formulating the problem incorrectly).
When you do the first thing, it makes the adjustment play out rather differently. For instance, I believe the following would not be true:
As we’ve seen, Karnofsky’s toy examples use extreme priors, and these priors would entail a substantial adjustment to EV estimates for existential risk charities. This adjustment would in turn be sufficient to alter existential risk charities from good ideas to bad ideas.
The reason is that if there is a decent probability of humanity having a large and important influence on the far future, ripple effects could be quite large. If that’s true, targeted existential risk reduction—meaning efforts to reduce existential risk which focus on it directly—would not necessarily have many orders of magnitude greater effects on the far future than activities which do not focus on existential risk directly.
For similar reasons, I believe that Carl Shulman’s “Charity Doomsday Argument” would not go through if one follows the first suggestion. If ordinary actions can shape the far future as well, Holden’s framework doesn’t suggest that humanity will have a cramped future.
If we adopt the second suggestion, defining the prior over expected good accomplished, pointing to specific examples of highly successful interventions in the past does not clearly refute a narrow prior probability distribution. We have to establish, in addition, that given what people knew at the time, these interventions had highly outsized expected returns. This is somewhat analogous to the way in which pointing to specific stocks with much higher returns than other stocks does not refute the efficient markets hypothesis; one has to show that, in the past, those stocks were knowably underpriced. A normal or log-normal prior over expected returns may still be refuted, but the refutation would have to be more subtle.
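Here is a toy simulation of that last point (all parameters invented): even when ex-ante expected returns are drawn from a narrow distribution, realized returns can contain spectacular outliers, so the outliers alone don't refute the narrow prior.

```python
import random

random.seed(0)

# Toy version of the stocks analogy.  Expected (ex-ante) returns are narrowly
# distributed; realized returns add large independent noise.  Parameters are
# invented for illustration.
n = 10_000
expected = [random.gauss(0.05, 0.01) for _ in range(n)]        # narrow prior
realized = [ev + random.gauss(0.0, 0.30) for ev in expected]   # noisy outcomes

print(f"spread of expected returns: {max(expected) - min(expected):.3f}")
print(f"best realized return:       {max(realized):.3f}")
```

Spotting the best realized return tells you little about the spread of expected returns unless you can show the outperformance was predictable in advance.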
A couple of other points seem relevant as well, if one takes the above on board. First, as the “friend in a foreign country” example illustrates, a very low prior probability in a claim does not necessarily mean that the claim is unbelievable in practice. I believe that every time someone reads a newspaper, they can justifiably attain high credence in specific hypotheses which, prior to reading the newspaper, had extremely low prior probabilities. Something similar may be true when specific novel scientific hypotheses, such as the ideal gas law, are discovered. So it seems that even if one adopts a fairly extreme prior, it wouldn’t have to be impossible to become convinced that humanity will have a very large influence on the far future, or that something would actually reduce existential risk.
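The arithmetic behind the newspaper point, with invented numbers:

```python
# Bayes' rule for the newspaper example.  The prior and likelihood ratio are
# invented to illustrate the structure, e.g. "this lottery number was drawn".
prior = 1e-6             # very low prior probability of the specific claim
likelihood_ratio = 1e8   # newspaper is vastly more likely to print it if true

posterior_odds = (prior / (1 - prior)) * likelihood_ratio
posterior = posterior_odds / (1 + posterior_odds)
print(f"posterior probability: {posterior:.4f}")  # ~0.99
```

Evidence channels with very high likelihood ratios are common, which is why an extreme prior need not make such claims permanently unbelievable.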
Finally, I’d like to comment on this idea:
Increased economic growth could have effects not just on timing, but on safety itself. For example, economic growth could increase existential risk by advancing dangerous technologies more quickly than society can handle them safely, or it could decrease existential risk by promoting some sort of stability. It could also have various small but permanent effects on the future. Still, it would seem to be a fairly major coincidence if the policy of saving people’s lives in the Third World were also the policy that maximized safety. One would at least expect to see more effect from interventions targeted specifically at speeding up economic growth. An approach to foreign aid aimed at maximizing growth effects rather than near-term lives or DALYs saved would probably look quite different. Even then, it’s hard to see how economic growth could be the policy that maximized safety unless our model of what causes safety were so broken as to be useless.
There is a spectrum of strategies for shaping the far future that ranges from the very targeted (e.g., stop that asteroid from hitting the Earth) to very broad (e.g., create economic growth, help the poor, provide education programs for talented youth), with options like “tell powerful people about the importance of shaping the far future” in between. The limiting case of breadth might be just optimizing for proximate benefits or for speeding up development. I suspect that global health is probably not the best place on this spectrum to be, but I don’t find that totally obvious. I think it’s a very interesting question where on this spectrum we should prefer to be, other things being equal. My sense is that many people on LessWrong think that we should be on the highly targeted end of this spectrum. I am highly uncertain about this issue, and I’d be interested in seeing stronger arguments for or against this view.
Thanks for your detailed comment! I certainly agree that, if one takes into account ripple effects where saving lives leads to reduced existential risk, the disparities between direct ways of reducing existential risk on the one hand and other efficient ways of saving people's lives on the other hand are no longer astronomical in size. I learned of this argument partway into writing the post, and subsection 5.5 was meant to address it, but it's quite rough and far from the final word on that subject, particularly if you compare direct efforts to medium-direct efforts rather than to very indirect efforts.
It sounds as though, to model your intuitions on the situation, instead of putting a probability distribution on how many DALYs one could save by donating a dollar to a given charity, we'd have to put a probability distribution on what % of existential risk one could rationally expect to reduce by donating a dollar to a given charity. Does that sound right?
I would weakly guess that such a model would favor direct over semi-direct existential risk reduction and strongly guess that such a model would favor direct over indirect existential risk reduction. This is just based on thinking that some of the main variables relevant to existential risk are being pushed on by few enough people, and in ways that are sufficiently badly thought through, that there's likely to be low-hanging fruit to be picked by those who analyze the issues in a sufficiently careful and calculating manner. But this is a pretty vague and sketchy argument, and it definitely seems worth discussing this sort of model more thoroughly.
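For concreteness, a minimal sketch of the kind of model described above (every number is a placeholder, not an estimate anyone has defended):

```python
import random

random.seed(0)

# Minimal sketch of the modeling move described above: instead of a
# distribution over DALYs per dollar, put one over the fraction of
# existential risk removed per dollar donated.  All numbers are placeholders.
samples = 100_000
log10_frac = [random.gauss(-13, 1) for _ in range(samples)]  # median 1e-13/dollar
expected_frac = sum(10 ** x for x in log10_frac) / samples

current_lives = 7e9  # counting only presently existing people
lives_per_dollar = expected_frac * current_lives
print(f"expected present lives saved per dollar: {lives_per_dollar:.2e}")
print(f"implied cost per expected life saved:    ${1 / lives_per_dollar:,.0f}")
```

The interesting disagreements then live in the spread of the distribution over the fraction reduced, and in how that distribution differs between direct, semi-direct, and indirect interventions.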
Wonderful post. Thank you.
I have a feeling that the fundamental difference between your position and GiveWell's arises not from a difference of opinion regarding mathematical arguments but because of a difference of values. Utilitarianism doesn't say that I have to value potential people at anything approaching the level of value I assign to living persons. In particular, valuing potential persons at 0 negates many arguments that rely on speculative numbers to pump expected utility into the present, and I'm not even sure if it's not right. Suppose that you had to choose between killing everyone currently alive at the end of their natural life spans, and murdering all but two people, whom you were assured would repopulate the planet. My preference would be the former, despite it meaning the end of humanity. Valuing potential people without an extremely high discount rate also leads one to be strongly pro-life, to be against birth control programs in developing nations, etc.
Another possibility is that GiveWell's true reason is that recommending MIRI as an efficient charity would decrease their probability of becoming substantially larger (through attracting large numbers of mainstream donors). After they have more established credibility, they would be able to direct a larger amount of money to existential risk charities, and recommending MIRI now, when it would reduce their growth trajectory, could lower their impact in a fairly straightforward way unless the existential risk is truly imminent. But if they actually made this argument explicitly, it would undermine its whole point, as they would be revealing their fringe intentions. Note that I actually think this would be a reasonable thing to do and am not trying to cast any aspersions on GiveWell.
I have a feeling that the fundamental difference between your position and GiveWell's arises not from a difference of opinion regarding mathematical arguments but because of a difference of values.
Karnofsky has, as far as I know, not endorsed measures of charitable effectiveness that discount the utility of potential people. (On the other hand, as Nick Beckstead points out in a different comment and as is perhaps under-emphasized in the current version of the main post, neither has Karnofsky made a general claim that Bayesian adjustment defeats existential risk charity. He has only explicitly come out against "if there's even a chance" arguments. But I think that in the context of his posts being reposted here on LW, many are likely to have interpreted them as providing a general argument that way, and I think it's likely that the reasoning in the posts has at least something to do with why Karnofsky treats the category of existential risk charity as merely promising rather than as a main focus. For MIRI in particular, Karnofsky has specific criticisms that aren't really related to the points here.)
In particular, valuing potential persons at 0 negates many arguments that rely on speculative numbers to pump expected utility into the present, and I'm not even sure if it's not right.
While valuing potential persons at 0 makes existential risk versus other charities a closer call than if you included astronomical waste, I think the case is still fairly strong that the best existential risk charities save more expected currently-existing lives than the best other charities. The estimate from Anna Salamon's talk linked in the main post makes investment into AI risk research roughly 4 orders of magnitude better for preventing the deaths of currently existing people than international aid charities. At the risk of anchoring, my guess is that the estimate is likely to be an overestimate, but not by 4 orders of magnitude. On the other hand, there may be non-existential risk charities that achieve greater returns in present lives but that also have factors barring them from being recommended by GiveWell.
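To unpack what "4 orders of magnitude" would mean in cost terms (the aid figure below is an assumed round number, not GiveWell's actual estimate):

```python
# Rough arithmetic for the "4 orders of magnitude" comparison.  The aid
# figure is an assumed round number, not GiveWell's actual estimate.
aid_cost_per_life = 2_500.0        # assumed $ per present life saved via aid
advantage_ooms = 4                 # the factor cited from the talk

xrisk_cost_per_life = aid_cost_per_life / 10 ** advantage_ooms
print(f"implied ~${xrisk_cost_per_life:.2f} per expected present life saved")

# Even if the talk's estimate is high by two orders of magnitude, the
# comparison still favors AI risk research by a factor of ~100.
print(f"with a 100x haircut: ~${xrisk_cost_per_life * 100:.0f} per expected life")
```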
If someone wants to describe LW as cultish, they can take any parody of itself and present it as further evidence for their claims.
I think something like this has already happened with the Chuck-Norris-like list of Yudkowsky facts; the "Bayesian conspirator" illustration of the beisutsukai stories; and the redacted lecture screenshot that displayed "Eliezer Yudkowsky" at the right end of the intelligence scale. Instead of "they are cool people who can make fun of themselves," these can be spun into "this is what those people seriously believe / this is how much they are obsessed with themselves... they must be truly insane". See RationalWiki:
On the other hand, if someone wants to describe LW as cultish, they could also use the lack of parodies, or whatever else, as evidence. Once you are charged with being a witch, there is not much you could successfully say in your defense.
So in the end, perhaps we should ignore all such considerations (which, by the way, is what most non-cults do) and simply upvote or downvote things on their own merits. Also, any attempt at this kind of PR automatically destroys itself if it is easy to provide a link to the discussion about the PR. (And LW being LW, such a discussion will almost certainly happen.)
I'd still be happy to remove the EY facts post, although I've been hesitant to do so because it would affect many other people's comments and hiding things might itself be construed as sinister. (I guess your point is that it doesn't matter, but I thought I'd mention it.)