Comment author: 26 March 2013 07:51:36PM * 4 points

Responses on some more minor points (see my previous comment for big-picture responses):

Regarding "BA updates on a point estimate rather than on the full evidence that went into the point estimate" - I don't understand this claim. BA updates on the full probability distribution of the estimate, which takes into account potential estimate error. The more robust the estimate, the smaller the BA.
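To illustrate how the size of the adjustment depends on the robustness of the estimate, here is a minimal sketch. It assumes a normal prior and normally distributed estimate error, which is a simplifying assumption made for this example only; the function name and all numbers are hypothetical.

```python
def bayesian_adjustment(prior_mean, prior_var, estimate, estimate_var):
    """Posterior mean and variance for a normal prior and normal estimate error."""
    # Weight on the estimate: close to 1 when the estimate is robust (low variance).
    weight = prior_var / (prior_var + estimate_var)
    post_mean = prior_mean + weight * (estimate - prior_mean)
    post_var = prior_var * estimate_var / (prior_var + estimate_var)
    return post_mean, post_var


# Hypothetical numbers: the same point estimate, with two levels of robustness.
prior_mean, prior_var = 0.0, 1.0
estimate = 10.0

robust_mean, _ = bayesian_adjustment(prior_mean, prior_var, estimate, estimate_var=0.25)
noisy_mean, _ = bayesian_adjustment(prior_mean, prior_var, estimate, estimate_var=25.0)

# The adjustment is the gap between the raw estimate and the posterior mean.
robust_adjustment = estimate - robust_mean
noisy_adjustment = estimate - noisy_mean
print(robust_adjustment, noisy_adjustment)
```

Under these assumptions the update uses the whole error distribution of the estimate (through `estimate_var`), not just the point value, and the robust estimate is pulled toward the prior far less than the noisy one.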

Regarding "double-counting" priors, I have not advocated for doing both an explicit "skepticism discount" in one's EEV calculation and then performing a BA on the output based on the same reasons for skepticism. Instead, I've discussed the pros and cons of these two different approaches to accounting for skepticism. There are cases in which I think some sources of skepticism (such as "only 10% of studies in this reference class are replicable") should be explicitly adjusted for, while others ("If a calculation tells me that an action is the best I can take, I should be skeptical because the conclusion is a priori unlikely") should be implicitly adjusted for. But I don't believe anything I've said implies that one should "double-count priors."

Regarding "log-normal priors would lead to different graphs in the second post, weakening the conclusion. To take the expectation of the logarithm and interpret that as the logarithm of the true cost-effectiveness is to bias the result downward." - FWIW, I did a version of my original analysis using log-normal distributions (including the correct formula for the expected value) and the picture didn't change much. I don't think this issue is an important one, though I'm open to being convinced otherwise by detailed analysis.
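For readers unfamiliar with the bias in question, the following sketch compares exponentiating the mean of the logarithm with the standard expected-value formula for a log-normal variable, E[X] = exp(mu + sigma^2 / 2). The parameters `mu` and `sigma` are hypothetical, and this is not a reconstruction of the analysis mentioned above.

```python
import math
import random

# Hypothetical parameters of log(X); X is log-normally distributed.
mu, sigma = 0.0, 1.0

# Exponentiating the expected logarithm gives the median, which understates the mean.
naive = math.exp(mu)

# Standard formula for the mean of a log-normal distribution.
correct = math.exp(mu + sigma ** 2 / 2)

# Monte Carlo check of the formula with a fixed seed.
random.seed(0)
samples = [random.lognormvariate(mu, sigma) for _ in range(200_000)]
sample_mean = sum(samples) / len(samples)

print(naive, correct, sample_mean)
```

The gap between `naive` and `correct` grows with `sigma`, so how much the issue matters in practice depends on how wide the distributions are.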

I don't find the "charity doomsday argument" compelling. One could believe in low probability of extinction by (a) disputing that our current probability of extinction is high to begin with, or (b) accepting that it's high but disputing that it can only be lowered by a donation to one of today's charities (it could be lowered by a large set of diffuse actions, or by a small number of actions whose ability to get funding is overdetermined, or by a far-future charity, or by a combination). If one starts off believing that probability of extinction is high and that it can only be lowered by a particular charity working today that cannot close its funding gap without help from oneself, this seems to beg the question. (I don't believe this set of propositions.)

I don't believe any of the alternative solutions to "Pascal's Mugging" are compelling for all possible constructions of "Pascal's Mugging." The only one that seems difficult to get around by modifying the construction is the "bounded utility function" solution, but I don't believe it is reasonable to have a bounded utility function: I believe, for example, that one should be willing to pay $100 for a 1/N chance of saving N lives for any N>=1, if (as is not the case with "Pascal's Mugging") the "1/N chance of saving N lives" calculation is well supported and therefore robust (i.e., has relatively narrow error bars). Thus, "Pascal's Mugging" remains an example of the sort of "absurd implication" I'd expect for an insufficiently skeptical prior.

Finally, regarding "a single percentage point of reduction of existential risks would be worth (from a utilitarian expected utility point-of-view) a delay of over 10 million years." - I'm not aware of reasons to believe it's clear that it would be easier to reduce extinction risk by a percentage point than to speed colonization by 10 million years. If the argument is simply that "a single percentage point seems like a small number," then I believe this is simply an issue of framing, a case of making something very difficult sound easy by expressing it as a small probability of a fantastically difficult accomplishment. Furthermore, I believe that what you call "speedup" reduces net risk of extinction, so I don't think the comparison is valid. (I will elaborate on this belief in the future.)

Comment author: 26 March 2013 07:49:13PM * 5 points

Thanks for this post - I really appreciate the thoughtful discussion of the arguments I've made.

I'd like to respond by (a) laying out what I believe is a big-picture point of agreement, which I consider more important than any of the disagreements; (b) responding to what I perceive as the main argument this post makes against the framework I've advanced; (c) responding on some more minor points. (c) will be a separate comment due to length constraints.

A big-picture point of agreement: the possibility of vast utility gain does not - in itself - disqualify a giving opportunity as a good one, nor does it establish that the giving opportunity is strong. I'm worried that this point of agreement may be lost on many readers.

The OP makes it sound as though I believe that a high enough EEV is "ruled out" by priors; as discussed below, that is not my position. I agree, and always have, that "Bayesian adjustment does not defeat existential risk charity"; however, I think it defeats an existential risk charity that makes no strong arguments for its ability to make an impact, and relies on a "Pascal's Mugging" type argument for its appeal.

On the flip side, I believe that a lot of readers believe that "Pascal's Mugging" type arguments are sufficient to establish that a particular giving opportunity is outstanding. I don't believe the OP believes this.

I believe the OP and I are in agreement that one should support an existential risk charity if and only if it makes a strong overall case for its likely impact, a case that goes beyond the observation that even a tiny probability of success would imply high expected value. We may disagree on precisely how high the burden of argumentation is, and we probably disagree on whether MIRI clears that hurdle in its current form, but I don't believe either of us thinks the burden of argumentation is trivial or is so high that it can never be reached.

Response to what I perceive as the main argument of this post

It seems to me that the main argument of this post runs as follows:

• The priors I'm using imply extremely low probabilities for certain events.
• We don't have sufficient reasons to confidently assign such low probabilities to such events.

I think the biggest problems with this argument are as follows:

1 - Most importantly, nothing I've written implies an extremely low probability for any particular event. Nick Beckstead's comment on this post lays out the thinking here. The prior I describe isn't over expected lives saved or DALYs saved (or a similar metric); it's over the merit of a proposed action relative to the merits of other possible actions. So if one estimates that action A has a 10^-10 chance of saving 10^30 lives, while action B has a 50% chance of saving 1 life, one could be wrong about the difference between A and B by (a) overestimating the probability that action A will have the intended impact; (b) underestimating the potential impact of action B; (c) leaving out other consequences of A and B; (d) making some other mistake.
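The arithmetic behind this comparison can be sketched as follows. The naive expected values use the figures from the paragraph above; the two correction factors are purely hypothetical and are included only to show how errors of type (a) and (b) can reverse the ranking of A and B without any single event being assigned an extreme probability by the prior.

```python
from fractions import Fraction


def expected_lives(probability, lives):
    """Naive expected value: probability of success times lives saved."""
    return probability * lives


# Figures from the text: A has a 10^-10 chance of saving 10^30 lives,
# B has a 50% chance of saving 1 life.
ev_a = expected_lives(Fraction(1, 10 ** 10), 10 ** 30)
ev_b = expected_lives(Fraction(1, 2), 1)
naive_ratio = ev_a / ev_b

# Hypothetical corrections (illustrative only, not estimates):
p_overestimate = 10 ** 12   # (a) A's true success probability is lower by this factor
b_flow_through = 10 ** 9    # (b) B's true impact is larger by this factor

corrected_ratio = (ev_a / p_overestimate) / (ev_b * b_flow_through)
print(naive_ratio, corrected_ratio)
```

With these made-up factors the ratio of A to B falls from 2 * 10^20 to 1/5, which is the sense in which the prior is over relative merit rather than over lives saved.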

My current working theory is that proponents of "Pascal's Mugging" type arguments tend to neglect the "flow-through effects" of accomplishing good. There are many ways in which helping a person may lead to others' being helped, and ultimately may lead to a small probability of an enormous impact. Nick Beckstead raises a point similar to this one, and the OP has responded that it's a new and potentially compelling argument to him. I also think it's worth bearing in mind that there could be other arguments that we haven't thought of yet - and because of the structure of the situation, I expect such arguments to be more likely to point to further "regression to the mean" (so to make proponents of "Pascal's Mugging" arguments less confident that their proposed actions have high relative expected value) than to point in the other direction. This general phenomenon is a major reason that I place less weight on explicit arguments than many in this community - explicit arguments that consist mostly of speculation aren't very stable or reliable, and when "outside views" point the other way, I expect more explicit reflection to generate more arguments that support the "outside views."

2 - That said, I don't accept any of the arguments given here for why it's unacceptable to assign a very low probability to a proposition. I think there is a general confusion here between "low subjective probability that a proposition is correct" and "high confidence that a proposition isn't correct"; I don't think those two things are equivalent. Probabilities are often discussed with an "odds" framing, with the implication that assigning a 10^-10 probability to something means that I'd be willing to wager $10^10 against $1; this framing is a useful thought experiment in many cases, but when the numbers are like this I think it starts encouraging people to confuse their risk aversion with "non-extreme" (i.e., rarely under 1% or over 99%) subjective probabilities. Another framing is to ask, "If we could somehow do a huge number of 'trials' of this idea, say by simulating worlds constrained by the observations you've made, what would your over/under be for the proportion of trials in which the proposition is true?" and in that case one could simultaneously have an over/under of (10^-10 * # trials) and have extremely low confidence in one's view.

It seems to me that for any small p, there must be some propositions that we assign a probability at least as small as p. (For example, there must be some X such that the probability of an impact greater than X is smaller than p.) Furthermore, it isn't the case that assigning small p means that it's impossible to gather evidence that would change one's mind about p. For example, if you state to me that you will generate a random integer N1 between 1 and 10^100, there must be some integer N2 that I implicitly assign a probability of <=10^-100 as the output of your exercise. (This is true even if there are substantial "unknown unknowns" involved, for example if I don't trust that your generator is truly random.) Yet if you complete the exercise and tell me it produced the number N2, I quickly revise my probability from <=10^-100 to over 50%, based on a single quick observation.
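The random-integer example can be worked through with Bayes' rule. The sketch below uses exact rational arithmetic and makes two assumptions that are not in the text: the report is accurate with probability 99/100, and an inaccurate report names one of the other integers uniformly at random.

```python
from fractions import Fraction

N = 10 ** 100                 # size of the range the integer is drawn from
prior = Fraction(1, N)        # prior probability that the output is N2

# Assumption for illustration: the report is accurate 99% of the time.
honest = Fraction(99, 100)

# Likelihood of hearing "the output was N2" under each hypothesis.
p_report_if_true = honest
p_report_if_false = (1 - honest) / (N - 1)   # a wrong report happens to name N2

posterior = (prior * p_report_if_true) / (
    prior * p_report_if_true + (1 - prior) * p_report_if_false
)
print(posterior)
```

Under these assumptions a single observation moves the probability from 10^-100 to the reliability of the report itself, because hearing "N2" is about as improbable under the alternative hypothesis as N2 was under the prior.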

For these reasons, I think the argument that "the mere fact that one assigns a sufficiently low probability to a proposition means that one must be in error" would have unacceptable implications and is not supported by the arguments in the OP.

Comment author: 01 August 2012 02:16:55PM 13 points

I greatly appreciate the response to my post, particularly the highly thoughtful responses of Luke (original post), Eliezer, and many commenters.

Broad response to Luke's and Eliezer's points:

As I see it, there are a few possible visions of SI's mission:

• M1. SI is attempting to create a team to build a "Friendly" AGI.
• M2. SI is developing "Friendliness theory," which addresses how to develop a provably safe/useful/benign utility function without needing iterative/experimental development; this theory could be integrated into an AGI developed by another team, in order to ensure that its actions are beneficial.
• M3. SI is broadly committed to reducing AGI-related risks, and works on whatever will advance that goal, including potentially M1 and M2.

My view is that the broader SI's mission, the higher the bar should be for the overall impressiveness of the organization and team. An organization with a very narrow, specific mission - such as "analyzing how to develop a provably safe/useful/benign utility function without needing iterative/experimental development" - can, relatively easily, establish which other organizations (if any) are trying to provide what it does and what the relative qualifications are; it can set clear expectations for deliverables over time and be held accountable to them; its actions and outputs are relatively easy to criticize and debate. By contrast, an organization with broader aims and less clearly relevant deliverables - such as "broadly aiming to reduce risks from AGI, with activities currently focused on community-building" - is giving a donor (or evaluator) less to go on in terms of what the space looks like, what the specific qualifications are and what the specific deliverables are. In this case it becomes more important that a donor be highly confident in the exceptional effectiveness of the organization and team as a whole.

Many of the responses to my criticisms (points #1 and #4 in Eliezer's response; "SI's mission assumes a scenario that is far less conjunctive than it initially appears" and "SI's goals and activities" section of Luke's response) correctly point out that they have less force, as criticisms, when one views SI's mission as relatively broad. However, I believe that evaluating SI by a broader mission raises the burden of affirmative arguments for SI's impressiveness. The primary such arguments I see in the responses are in Luke's list:

(1) The Sequences, the best tool I know for creating aspiring rationalists, (2) Harry Potter and the Methods of Rationality, a surprisingly successful tool for grabbing the attention of mathematicians and computer scientists around the world, and (3) the Singularity Summit, a mainstream-aimed conference that brings in people who end up making significant contributions to the movement — e.g. Tomer Kagan (an SI donor and board member) and David Chalmers (author of The Singularity: A Philosophical Analysis and The Singularity: A Reply).

I've been a consumer of all three of these, and while I've found them enjoyable, I don't find them sufficient for the purpose at hand. Others may reach a different conclusion. And of course, I continue to follow SI's progress, as I understand that it may submit more impressive achievements in the future.

Both Luke and Eliezer seem to disagree with the basic approach I'm taking here. They seem to believe that it is sufficient to establish that (a) AGI risk is an overwhelmingly important issue and that (b) SI compares favorably to other organizations that explicitly focus on this issue. For my part, I (a) disagree with the statement: "the loss in expected value resulting from an existential catastrophe is so enormous that the objective of reducing existential risks should be a dominant consideration whenever we act out of an impersonal concern for humankind as a whole"; (b) do not find Luke's argument that AI, specifically, is the most important existential risk to be compelling (it discusses only how beneficial it would be to address the issue well, not how likely a donor is to be able to help do so); (c) believe it is appropriate to compare the overall organizational impressiveness of the Singularity Institute to that of all other donation-soliciting organizations, not just to that of other existential-risk- or AGI-focused organizations. I would guess that these disagreements, particularly (a) and (c), come down to relatively deep worldview differences (related to the debate over "Pascal's Mugging") that I will probably write more about in the future.

On tool AI:

Most of my disagreements with SI representatives seem to be over how broad a mission is appropriate for SI, and how high a standard SI as an organization should be held to. However, the debate over "tool AI" is different, with both sides making relatively strong claims. Here SI is putting forth a specific point as an underappreciated insight and thus as a potential contribution/accomplishment; my view is that SI's suggested approach to AGI development is more dangerous than the "traditional" approach to software development, and thus that SI is advocating for an approach that would worsen risks from AGI.

My latest thoughts on this disagreement were posted separately in a comment response to Eliezer's post on the subject.

A few smaller points:

• I disagree with Luke's claim that "objection #1 punts to objection #2." Objection #2 (regarding "tool AI") points out one possible approach to AGI that I believe is both consonant with traditional software development and significantly safer than the approach advocated by SI. But even if the "tool AI" approach is not in fact safer, there may be safer approaches that SI hasn't thought of. SI does not just emphasize the general problem that AGI may be dangerous (something that I believe is a fairly common view), but emphasizes a particular approach to AGI safety, one that seems to me to be highly dangerous. If SI's approach is dangerous relative to other approaches that others are taking/advocating, or even approaches that have yet to be developed (and will be enabled by future tools and progress on AGI), this is a problem for SI.
• Luke states that rationality is "only a ceteris paribus predictor of success" and that it is a "weak one." I wish to register that I believe rationality is a strong (though not perfect) predictor of success, within the population of people who are as privileged (in terms of having basic needs met, access to education, etc.) as most SI supporters/advocates/representatives. So while I understand that success is not part of the definition of rationality, I stand by my statement that it is "the best evidence of superior general rationality (or of insight into it)."
• Regarding donor-advised funds: opening an account with Vanguard, Schwab or Fidelity is a simple process, and I doubt any of these institutions would overrule a recommendation to donate to an organization such as SI (in any case, this is easily testable).

Comment author: 01 August 2012 02:09:11PM 2 points

To summarize how I see the current state of the debate over "tool AI":

• Eliezer and I have differing intuitions about the likely feasibility, safety and usefulness of the "tool" framework relative to the "Friendliness theory" framework, as laid out in this exchange. This relates mostly to Eliezer's point #2 in the original post. We are both trying to make predictions about a technology for which many of the details are unknown, and at this point I don't see a clear way forward for resolving our disagreements, though I did make one suggestion in that thread.
• Eliezer has also made two arguments (#1 and #4 in the original post) that appear to be of the form, "Even if the 'tool' approach is most promising, the Singularity Institute still represents a strong giving opportunity." A couple of thoughts on this point:
• One reason I find the "tool" approach relevant in the context of SI is that it resembles what I see as the traditional approach to software development. My view is that it is likely to be both safer and more efficient for developing AGI than the "Friendliness theory" approach. If this is the case, it seems that the safety of AGI will largely be a function of the competence and care with which its developers execute on the traditional approach to software development, and the potential value-added of a third-party team of "Friendliness specialists" is unclear.
• That said, I recognize that SI has multiple conceptually possible paths to impact, including developing AGI itself and raising awareness of the risks of AGI. I believe that the more the case for SI revolves around activities like these rather than around developing "Friendliness theory," the higher the bar for SI's general impressiveness (as an organization and team) becomes; I will elaborate on this when I respond to Luke's response to me.
• Regarding Eliezer's point #3 - I think this largely comes down to how strong one finds the argument for "tool AI." I agree that one shouldn't expect SI to respond to every possible critique of its plans. But I think it's reasonable to expect it to anticipate and respond to the stronger possible critiques.
• I'd also like to address two common objections to the "tool AI" framework that came up in comments, though neither of these objections appears to have been taken up in official SI responses.
• Some have argued that the idea of "tool AI" is incoherent, or is not distinct from the idea of "Oracle AI," or is conceptually impossible. I believe these arguments to be incorrect, though my ability to formalize and clarify my intuitions on this point has been limited. For those interested in reading attempts to better clarify the concept of "tool AI" following my original post, I recommend jsalvatier's comments on the discussion post devoted to this topic as well as my exchange with Eliezer elsewhere on this thread.
• Some have argued that "agents" are likely to be more efficient and powerful than "tools," since they are not bottlenecked by human input, and thus that the "tool" concept is unimportant. I anticipated this objection in my original post and expanded on my response in my exchange with Eliezer elsewhere on this thread. In a nutshell, I believe the "tool" framework is likely to be a faster and more efficient way of developing a capable and useful AGI than the sort of framework for which "Friendliness theory" would be relevant; and if it isn't, that the sort of work SI is doing on "Friendliness theory" is likely to be of little value. (Again, I recognize that SI has multiple conceptually possible paths to impact other than development of "Friendliness theory" and will address these in a future comment.)

Comment author: 18 July 2012 04:29:00PM 14 points

Thanks for the response. My thoughts at this point are that

• We seem to have differing views of how to best do what you call "reference class tennis" and how useful it can be. I'll probably be writing about my views more in the future.
• I find it plausible that AGI will have to follow a substantially different approach from "normal" software. But I'm not clear on the specifics of what SI believes those differences will be and why they point to the "proving safety/usefulness before running" approach over the "tool" approach.
• We seem to have differing views of how frequently today's software can be made comprehensible via interfaces. For example, my intuition is that the people who worked on the Netflix Prize algorithm had good interfaces for understanding "why" it recommends what it does, and used these to refine it. I may further investigate this matter (casually, not as a high priority); on SI's end, it might be helpful (from my perspective) to provide detailed examples of existing algorithms for which the "tool" approach to development didn't work and something closer to "proving safety/usefulness up front" was necessary.

Comment author: 18 July 2012 02:35:33AM 16 points

Thanks for the response. To clarify, I'm not trying to point to the AIXI framework as a promising path; I'm trying to take advantage of the unusually high degree of formalization here in order to gain clarity on the feasibility and potential danger points of the "tool AI" approach.

It sounds to me like your two major issues with the framework I presented are (to summarize):

(1) There is a sense in which AIXI predictions must be reducible to predictions about the limited set of inputs it can "observe directly" (what you call its "sense data").

(2) Computers model the world in ways that can be unrecognizable to humans; it may be difficult to create interfaces that allow humans to understand the implicit assumptions and predictions in their models.

I don't claim that these problems are trivial to deal with. And stated as you state them, they sound abstractly very difficult to deal with. However, it seems true - and worth noting - that "normal" software development has repeatedly dealt with them successfully. For example: Google Maps works with a limited set of inputs; Google Maps does not "think" like I do and I would not be able to look at a dump of its calculations and have any real sense for what it is doing; yet Google Maps does make intelligent predictions about the external universe (e.g., "following direction set X will get you from point A to point B in reasonable time"), and it also provides an interface (the "route map") that helps me understand its predictions and the implicit reasoning (e.g. "how, why, and with what other consequences direction set X will get me from point A to point B").

Difficult though it may be to overcome these challenges, my impression is that software developers have consistently - and successfully - chosen to take them on, building algorithms that can be "understood" via interfaces and iterated over - rather than trying to prove the safety and usefulness of their algorithms with pure theory before ever running them. Not only does the former method seem "safer" (in the sense that it is less likely to lead to putting software in production before its safety and usefulness have been established) but it seems a faster path to development as well.

It seems that you see a fundamental disconnect between how software development has traditionally worked and how it will have to work in order to result in AGI. But I don't understand your view of this disconnect well enough to see why it would lead to a discontinuation of the phenomenon I describe above. In short, traditional software development seems to have an easier (and faster and safer) time overcoming the challenges of the "tool" framework than overcoming the challenges of up-front theoretical proofs of safety/usefulness; why should we expect this to reverse in the case of AGI?

Comment author: 05 July 2012 04:18:16PM 17 points

Hello,

I appreciate the thoughtful response. I plan to respond at greater length in the future, both to this post and to some other content posted by SI representatives and commenters. For now, I wanted to take a shot at clarifying the discussion of "tool-AI" by discussing AIXI. One of the issues I've found with the debate over FAI in general is that I haven't seen much in the way of formal precision about the challenge of Friendliness (I recognize that I have also provided little formal precision, though I feel the burden of formalization is on SI here). It occurred to me that AIXI might provide a good opportunity to have a more precise discussion, if in fact it is believed to represent a case of "a rare exception who specified his AGI in such unambiguous mathematical terms that he actually succeeded at realizing, after some discussion with SIAI personnel, that AIXI would kill off its users and seize control of its reward button."

So here's my characterization of how one might work toward a safe and useful version of AIXI, using the "tool-AI" framework, if one could in fact develop an efficient enough approximation of AIXI to qualify as a powerful AGI. Of course, this is just a rough outline of what I have in mind, but hopefully it adds some clarity to the discussion.

A. Write a program that

1. Computes an optimal policy, using some implementation of equation (20) on page 22 of http://www.hutter1.net/ai/aixigentle.pdf
2. "Prints" the policy in a human-readable format (using some fixed algorithm for "printing" that is not driven by a utility function)
3. Provides tools for answering user questions about the policy, i.e., "What will be its effect on _?" (using some fixed algorithm for answering user questions that makes use of AIXI's probability function, and is not driven by a utility function)
4. Does not contain any procedures for "implementing" the policy, only for displaying it and its implications in human-readable form

B. Run the program; examine its output using the tools described above (#2 and #3); if, upon such examination, the policy appears potentially destructive, continue tweaking the program (for example, by tweaking the utility it is selecting a policy to maximize) until the policy appears safe and desirable

C. Implement the policy using tools other than the AIXI agent

D. Repeat (B) and (C) until one has confidence that the AIXI agent reliably produces safe and desirable policies, at which point more automation may be called for
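The control flow of steps A through D can be sketched in a few lines. This is only a toy illustration: `compute_policy` is a trivial stand-in and does not implement AIXI or equation (20), and every name here (`develop`, `looks_safe`, `tweak`, the action labels) is hypothetical. Step A3, the question-answering tools, is omitted for brevity.

```python
from typing import Callable, Dict, List

Policy = List[str]
Utility = Dict[str, float]


def compute_policy(utility: Utility, horizon: int = 3) -> Policy:
    """Step A1 stand-in: repeat the highest-utility action. This is NOT AIXI."""
    best = max(sorted(utility), key=lambda action: utility[action])
    return [best] * horizon


def print_policy(policy: Policy) -> str:
    """Step A2: a fixed formatting routine, not driven by any utility function."""
    return " -> ".join(policy)


def develop(utility: Utility,
            looks_safe: Callable[[Policy], bool],
            tweak: Callable[[Utility, Policy], Utility],
            max_rounds: int = 10) -> Policy:
    """Step B: display the policy, let a human review it, tweak until accepted."""
    for _ in range(max_rounds):
        policy = compute_policy(utility)
        print(print_policy(policy))
        if looks_safe(policy):
            # Step A4/C: the program only returns the policy for display;
            # implementing it is left to tools outside this program.
            return policy
        utility = tweak(utility, policy)
    raise RuntimeError("no acceptable policy found within max_rounds")


# Hypothetical usage: the reviewer rejects any policy that seizes the reward button,
# and the tweak heavily penalizes the rejected action.
utility = {"seize_reward_button": 10.0, "answer_question": 1.0}
approved = develop(
    utility,
    looks_safe=lambda p: "seize_reward_button" not in p,
    tweak=lambda u, p: {**u, p[0]: u[p[0]] - 100.0},
)
```

The point of the sketch is only the shape of the loop: the program proposes and explains, a human judges, and nothing in the program acts on the policy. Whether such a review step would be reliable for a powerful system is exactly what is under debate here.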

My claim is that this approach would be superior to that of trying to develop "Friendliness theory" in advance of having any working AGI, because it would allow experiment- rather than theory-based development. Eliezer, I'm interested in your thoughts about my claim. Do you agree? If not, where is our disagreement?

Comment author: 10 May 2012 04:12:28PM 3 points

Thanks for pointing this out. The links now work, though only from the permalink version of the page (not from the list of new posts).

Comment author: 17 January 2012 08:04:23PM 1 point

Carl, it looks like we have a pretty substantial disagreement about key properties of the appropriate prior distribution over expected value of one's actions.

I am not sure whether you are literally endorsing a particular distribution (I am not sure whether "Solomonoff complexity prior" is sufficiently well-defined or, if so, whether you are endorsing that or a varied/adjusted version). I myself have not endorsed a particular distribution. So it seems like the right way to resolve our disagreement is for at least one of us to be more specific about what properties are core to our argument and why we believe any reasonable prior ought to have these properties. I'm not sure when I will be able to do this on my end and will likely contact you by email when I do.

What I do not agree with is the implication that my analysis is irrelevant to Pascal's Mugging. It may be irrelevant for people who endorse the sorts of priors you endorse. But not everyone agrees with you about what the proper prior looks like, and many people who are closer to me on what the appropriate prior looks like still seem unaware of the implications for Pascal's Mugging. If nothing else, my analysis highlights a relationship between one's prior distribution and Pascal's Mugging that I believe many others weren't aware of. Whether it is a decisive refutation of Pascal's Mugging is unresolved (and depends on the disagreement I refer to above).

Comment author: 29 December 2011 12:37:24AM 8 points

Louie, I think you're mischaracterizing these posts and their implications. The argument is much closer to "extraordinary claims require extraordinary evidence" than it is to "extraordinary claims should simply be disregarded." And I have outlined (in the conversation with SIAI) ways in which I believe SIAI could generate the evidence needed for me to put greater weight on its claims.

I wrote more in my comment followup on the first post about why an aversion to arguments that seem similar to "Pascal's Mugging" does not entail an aversion to supporting x-risk charities. (As mentioned in that comment, it appears that important SIAI staff share such an aversion, whether or not they agree with my formal defense of it.)

I also think the message of these posts is consistent with the best available models of how the world works - it isn't just about trying to set incentives. That's probably a conversation for another time - there seems to be a lot of confusion on these posts (especially the second) and I will probably post some clarification at a later date.
