Comment author: Eliezer_Yudkowsky 02 August 2013 08:42:04PM 10 points

Holden, thanks for responding. I apologize again if I'm missing something obvious or straying too far outside my field.

I think the two things I would have to understand in order to accept your reply are that (a) my entire objection does indeed consist of "positing an offsetting harm in the form of inflation" - which isn't how it feels to me - and that (b) we should expect the "series of trades" visualization to mean that no inflation occurs in goods of the form that are being purchased by the low-income Kenyans.

Let me think about this.

Okay, I agree there's a sense in which (a) has to be true. If you could magically print shillings and have them purchase goods with no other prices changing and hence no change in other velocities of trade, this would have to be a good thing. The goods purchased would have to come from somewhere, so nothing can go wrong with the GD model without inflation showing up somewhere else in Kenya. It's not how I think in my native model - I think about 'Who has money?' as a distribution-of-goods question, not a nominal pricing question - but point (a) has to be correct from the relevant point of view.

Let me think about point (b). Hm. So far it's not clear to me yet that point (b) is necessarily true when I try to translate my original model into those terms. Suppose - you'll probably think this sounds very perverse, but bear with me - suppose that GD's operation causes inflation in the price of basic foods and deflation in the price of fancy speedboats. Even if inflation at point A is offset by deflation at point B, this net-no-inflation repricing can be harmfully redistributive.

My visualization of you replies, "Why on Earth should I believe that?" But before answering that, why would I believe that? Either my original worry was incoherent, or I must have already believed this somehow. By argument (a), if inflation in Kenyan goods purchased primarily by low-income Kenyans is offset by a similar amount of deflation in similar goods, then net benefit is fine and there's no problem.

On further reflection I think that my original concern does translate into those terms. I don't know if the following is true, but it is my major concern: First suppose Kenya does not currently have an aggregate demand deficit and cannot directly benefit from printing money. Then suppose goods purchased by low-income Kenyans are denominated primarily in shillings, and goods purchased from the U.S. using U.S. dollars are going primarily to high-income Kenyans (fancy speedboats). Then it seems to me that the series of trades should end up creating inflation in the price of basic goods produced in Kenya, and offsetting deflation within Kenya at the point where U.S. dollars are finally spent on a larger market.

Note that even if this worry is structurally possible, one could very quickly answer it by showing that most foreign goods imported in Kenya are in fact consumed by the same class of Kenyans who are the targets of aid, in which case GD is mostly equivalent to giving low-income Kenyans USD and letting them make foreign purchases directly. (In which case, it correspondingly seems plausible to me that you might do most of the same good by buying shillings and burning them. Though this would lose positive redistributive effects and possibly slow down Kenyan trades by destroying money if they're not in a state of excess aggregate demand - delete the term for the good accomplished by printing money.)

It may also be that my concern is incorrect and that even if most Kenyan goods purchased in USD are not consumed by, or inputs to goods consumed by, the targeted recipients, you still don't get inflation in bread and offsetting deflation in speedboats. For example, I think you said something at the EA summit which I had forgotten up until this point about a series of trades being mutually beneficial. I.e., maybe you could show that the state of the world resulting in Kenya can be reached by starting with giving the target Kenyans USD and letting them buy foreign goods, which I agree is good, and then a series of trades occurring which benefit both sides of each trade and don't disadvantage any other low-income Kenyans or cause trade gains to be redistributed toward wealthier Kenyans. Though it seems to me that this line of argument would also have to show that my concern about inflation in bread offset by deflation in speedboats was misguided to begin with.

I don't suppose there's any relevant economic literature on direct aid which addresses this? Someone said something similar in the GiveWell comments thread on your GD post, so it may not be such a non-obvious concern.

Sorry for forcing you to choose between spending time on this and leaving an unanswered question, I will understand if you choose to do the latter. I hope that the many argumentative people who are deluded into believing that they understand money, possibly including myself, do not put you off direct aid charities.

Comment author: HoldenKarnofsky 05 August 2013 04:27:00AM 6 points

Eliezer, I think inflation caused via cash transfers results (under some fairly basic assumptions) in unchanged - not negative - total real wealth for the aggregate set of people experiencing the inflation, because this aggregate set of people includes the same set of people that causes the inflation as a result of having more currency. There may be situations in which "N people receive X units of currency, but the supply of goods they purchase remains fixed, so they experience inflation and do not end up with more real wealth", but not situations in which "N people receive X units of currency and as a result have less real wealth, or cause inflation for others that lowers the others' real wealth more than their own has risen."
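To make the claim concrete, here is a toy quantity-theory version of it (all numbers invented, velocity fixed at 1, real output fixed): injecting currency raises the price level proportionally, redistributes real wealth toward the recipients, and leaves the group's aggregate real wealth pinned down by the goods supply - unchanged, not reduced.

```python
# Toy quantity-theory economy (invented numbers, velocity fixed at 1):
# price level = money supply / goods supply; real wealth = cash / price.
goods = 1000.0                        # fixed real output consumed per period
money = {"recipients": 100.0, "others": 900.0}

def price(m):
    # Quantity-theory price level for a fixed goods supply.
    return sum(m.values()) / goods

p0 = price(money)
real0 = {k: v / p0 for k, v in money.items()}

money["recipients"] += 200.0          # cash transfer injected from outside
p1 = price(money)
real1 = {k: v / p1 for k, v in money.items()}

# Recipients end up with more real wealth and others with less, but the
# aggregate is unchanged: it equals the goods supply before and after.
print(sum(real0.values()), sum(real1.values()))   # 1000.0 1000.0
print(real1["recipients"] > real0["recipients"])  # True
```

This is of course a sketch under deliberately rigid assumptions, not a model of Kenya; it just isolates why "inflation" here means redistribution within the group rather than an aggregate real loss.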

If you believe that GiveDirectly's transfers cause negligible inflation for the people receiving them (as implied by the studies we've reviewed), this implies that those people become materially wealthier by (almost) the amount of the transfer. There may be other people along the chain who experience inflation, but these people at worst have unchanged real wealth in aggregate (according to the previous paragraph). (BTW, I've focused in on the implications of your scenario for inflation because we have data regarding inflation.)

It's theoretically possible that the distributive effects within this group are regressive (e.g., perhaps the people who sell to the GiveDirectly recipients then purchase goods that are needed by other lower-income people in another location, raising the price of those goods), but in order to believe this one has to make a seemingly large number of unjustified assumptions (including the assumption of an inelastic supply of the goods demanded by low-income Kenyans, which seems particularly unrealistic), and those distributive effects could just as easily be progressive.

It also seems worth noting that your concern would seem to apply equally to any case in which a donor purchases foreign currency and uses it to fund local services (e.g., from nonprofits), which would seem to include ~all cases of direct aid overseas.

If I'm still not fully addressing your point, it might be worth your trying a "toy economy" construction to elucidate your concern - something along the lines of "Imagine that there are a total of 10 people in Kenya; that 8 are poor, have wealth equal to X, and consume goods A and B at prices Pa and Pb; that 2 are wealthy, have wealth equal to Y, and consume goods C and D at prices Pc and Pd. When the transfer is made, dollars are traded for T shillings, which are then distributed as follows, which has the following impact on prices and real consumption ..." I often find these constructions useful in these sorts of discussions to elucidate exactly the potential scenario one has in mind.
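As a starting point, here is a minimal version of that construction (all quantities invented), deliberately using the perfectly-inelastic-supply assumption I flagged above as unrealistic, since that is the assumption under which the "inflation in bread" worry bites hardest:

```python
# Minimal toy construction: 8 poor Kenyans consume good A, 2 wealthy
# Kenyans consume good C. Supplies are fixed (perfectly inelastic), so
# each market clears at price = total spending on the good / quantity.
supply = {"A": 80.0, "C": 10.0}

def clearing_price(spending, qty):
    return spending / qty

# Before the transfer: the poor spend 80 shillings on A, the wealthy 100 on C.
pa0 = clearing_price(80.0, supply["A"])          # 1.0 shilling per unit of A
pc0 = clearing_price(100.0, supply["C"])         # 10.0 per unit of C

# Transfer T = 40 shillings to the poor, all spent on A; the wealthy unchanged.
pa1 = clearing_price(80.0 + 40.0, supply["A"])   # 1.5: inflation in A
pc1 = clearing_price(100.0, supply["C"])         # 10.0: C unaffected

# With fully inelastic supply, real consumption of A is unchanged (80 units
# either way); the transfer only bids up Pa.
print(pa0, pa1, 80.0 / pa0, (80.0 + 40.0) / pa1)
```

The interesting exercise is then to relax the inelasticity (let the supply of A respond to its price) and see how quickly the recipients' real gains reappear.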

Comment author: HoldenKarnofsky 02 August 2013 06:27:24PM 14 points

Thanks for the thoughtful post.

If recipients of cash transfers buy Kenyan goods, and the producers of those goods use their extra shillings to buy more Kenyan goods, and eventually someone down the line trades their shillings for USD, this would seem to be equivalent in the relevant ways to the scenario you outline in which "U.S. dollars were being sent directly to Kenyan recipients and used only to purchase foreign goods" - assuming no directly-caused inflation in Kenyan prices. In other words, it seems to me that you're essentially positing a potential offsetting harm of cash transfers in the form of inflation, and in the case where transfers do not cause inflation, there is no concern.

At the micro/village level, we've reviewed two studies showing minor (if any) inflation. At the country level, it's worth noting that the act of buying Kenyan currency with USD should be as deflationary as the act of putting that currency back into the economy is inflationary. Therefore, it seems to me that inflation is a fairly minor concern.

I'm not entirely sure I've understood your argument, so let me know if that answer doesn't fully address it.

That said, it's important to note that we do not claim 100% confidence - or an absence of plausible negative/offsetting effects - for any of our top charities. For the intervention of each charity we review, we include a "negative/offsetting effects" section that lists possible negative/offsetting effects, and in most cases we can't conclusively dismiss such effects. Nonetheless, having noted and considered the possible negative/offsetting effects, we believe the probability that our top charities are accomplishing substantial net good is quite high, higher than for any other giving opportunities we're aware of.

Comment author: HoldenKarnofsky 26 March 2013 07:51:36PM 4 points

Responses on some more minor points (see my previous comment for big-picture responses):

Regarding "BA updates on a point estimate rather than on the full evidence that went into the point estimate" - I don't understand this claim. BA updates on the full probability distribution of the estimate, which takes into account potential estimate error. The more robust the estimate, the smaller the BA.
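To spell out what "updating on the full probability distribution" means here, consider the conjugate normal-normal case (numbers invented, in log cost-effectiveness units): the posterior is a precision-weighted average of the prior and the estimate, so the adjustment shrinks as the estimate's error bars shrink.

```python
# Sketch of a normal-normal Bayesian adjustment (BA): the posterior mean is
# a precision-weighted average of the skeptical prior and the explicit
# estimate. All numbers are invented, for illustration only.
def posterior_mean(prior_mu, prior_var, est, est_var):
    # Weight on the estimate = its precision's share of total precision.
    w = (1 / est_var) / (1 / prior_var + 1 / est_var)
    return w * est + (1 - w) * prior_mu

prior_mu, prior_var = 0.0, 1.0   # skeptical prior over relative merit
est = 5.0                        # explicit estimate far above the prior

rough = posterior_mean(prior_mu, prior_var, est, est_var=10.0)   # noisy
robust = posterior_mean(prior_mu, prior_var, est, est_var=0.1)   # robust

# The noisier the estimate, the more the posterior shrinks toward the prior;
# a robust estimate survives the adjustment almost intact.
print(round(rough, 3), round(robust, 3))  # 0.455 4.545
```

The point estimate (5.0) is the same in both cases; what drives the size of the adjustment is the estimate's variance, i.e., the full evidence behind it.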

Regarding "double-counting" priors, I have not advocated for doing both an explicit "skepticism discount" in one's EEV calculation and then performing a BA on the output based on the same reasons for skepticism. Instead, I've discussed the pros and cons of these two different approaches to accounting for skepticism. There are cases in which I think some sources of skepticism (such as "only 10% of studies in this reference class are replicable") should be explicitly adjusted for, while others ("If a calculation tells me that an action is the best I can take, I should be skeptical because the conclusion is a priori unlikely") should be implicitly adjusted for. But I don't believe anything I've said implies that one should "double-count priors."

Regarding "log-normal priors would lead to different graphs in the second post, weakening the conclusion. To take the expectation of the logarithm and interpret that as the logarithm of the true cost-effectiveness is to bias the result downward." - FWIW, I did a version of my original analysis using log-normal distributions (including the correct formula for the expected value) and the picture didn't change much. I don't think this issue is an important one, though I'm open to being convinced otherwise by detailed analysis.
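For what it's worth, the bias being described is Jensen's inequality: for log-normal X with log X ~ N(mu, sigma^2), exp(E[log X]) = exp(mu) is the median, while the correct mean is exp(mu + sigma^2/2). A quick simulation (illustrative, not my actual analysis):

```python
# Jensen's-inequality point behind the quoted concern (illustrative numbers):
# exponentiating the mean of the logs recovers the median of a log-normal,
# which understates its mean by a factor of exp(sigma^2 / 2).
import math
import random

random.seed(0)
mu, sigma = 0.0, 1.0
xs = [math.exp(random.gauss(mu, sigma)) for _ in range(200_000)]

naive = math.exp(sum(math.log(x) for x in xs) / len(xs))  # ~ exp(mu) = 1.0
correct = math.exp(mu + sigma**2 / 2)                     # ~ 1.649
sample_mean = sum(xs) / len(xs)                           # ~ correct

print(round(naive, 2), round(correct, 2), round(sample_mean, 2))
```

With sigma = 1 the naive figure understates the mean by about 40%; whether that shifts any conclusion depends, as I said, on the details of the analysis.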

I don't find the "charity doomsday argument" compelling. One could believe in low probability of extinction by (a) disputing that our current probability of extinction is high to begin with, or (b) accepting that it's high but disputing that it can only be lowered by a donation to one of today's charities (it could be lowered by a large set of diffuse actions, or by a small number of actions whose ability to get funding is overdetermined, or by a far-future charity, or by a combination). If one starts off believing that probability of extinction is high and that it can only be lowered by a particular charity working today that cannot close its funding gap without help from oneself, this seems to beg the question. (I don't believe this set of propositions.)

I don't believe any of the alternative solutions to "Pascal's Mugging" are compelling for all possible constructions of "Pascal's Mugging." The only one that seems difficult to get around by modifying the construction is the "bounded utility function" solution, but I don't believe it is reasonable to have a bounded utility function: I believe, for example, that one should be willing to pay $100 for a 1/N chance of saving N lives for any N>=1, if (as is not the case with "Pascal's Mugging") the "1/N chance of saving N lives" calculation is well supported and therefore robust (i.e., has relatively narrow error bars). Thus, "Pascal's Mugging" remains an example of the sort of "absurd implication" I'd expect for an insufficiently skeptical prior.

Finally, regarding "a single percentage point of reduction of existential risks would be worth (from a utilitarian expected utility point-of-view) a delay of over 10 million years." - I'm not aware of reasons to believe it's clear that it would be easier to reduce extinction risk by a percentage point than to speed colonization by 10 million years. If the argument is simply that "a single percentage point seems like a small number," then I believe this is simply an issue of framing, a case of making something very difficult sound easy by expressing it as a small probability of a fantastically difficult accomplishment. Furthermore, I believe that what you call "speedup" reduces net risk of extinction, so I don't think the comparison is valid. (I will elaborate on this belief in the future.)

Comment author: HoldenKarnofsky 26 March 2013 07:49:13PM 8 points

Thanks for this post - I really appreciate the thoughtful discussion of the arguments I've made.

I'd like to respond by (a) laying out what I believe is a big-picture point of agreement, which I consider more important than any of the disagreements; (b) responding to what I perceive as the main argument this post makes against the framework I've advanced; (c) responding on some more minor points. (c) will be a separate comment due to length constraints.

A big-picture point of agreement: the possibility of vast utility gain does not - in itself - disqualify a giving opportunity as a good one, nor does it establish that the giving opportunity is strong. I'm worried that this point of agreement may be lost on many readers.

The OP makes it sound as though I believe that a high enough EEV is "ruled out" by priors; as discussed below, that is not my position. I agree, and always have, that "Bayesian adjustment does not defeat existential risk charity"; however, I think it defeats an existential risk charity that makes no strong arguments for its ability to make an impact, and relies on a "Pascal's Mugging" type argument for its appeal.

On the flip side, I believe that a lot of readers believe that "Pascal's Mugging" type arguments are sufficient to establish that a particular giving opportunity is outstanding. I don't believe the OP believes this.

I believe the OP and I are in agreement that one should support an existential risk charity if and only if it makes a strong overall case for its likely impact, a case that goes beyond the observation that even a tiny probability of success would imply high expected value. We may disagree on precisely how high the burden of argumentation is, and we probably disagree on whether MIRI clears that hurdle in its current form, but I don't believe either of us thinks the burden of argumentation is trivial or is so high that it can never be reached.

Response to what I perceive as the main argument of this post

It seems to me that the main argument of this post runs as follows:

  • The priors I'm using imply extremely low probabilities for certain events.
  • We don't have sufficient reasons to confidently assign such low probabilities to such events.

I think the biggest problems with this argument are as follows:

1 - Most importantly, nothing I've written implies an extremely low probability for any particular event. Nick Beckstead's comment on this post lays out the thinking here. The prior I describe isn't over expected lives saved or DALYs saved (or a similar metric); it's over the merit of a proposed action relative to the merits of other possible actions. So if one estimates that action A has a 10^-10 chance of saving 10^30 lives, while action B has a 50% chance of saving 1 life, one could be wrong about the difference between A and B by (a) overestimating the probability that action A will have the intended impact; (b) underestimating the potential impact of action B; (c) leaving out other consequences of A and B; (d) making some other mistake.

My current working theory is that proponents of "Pascal's Mugging" type arguments tend to neglect the "flow-through effects" of accomplishing good. There are many ways in which helping a person may lead to others' being helped, and ultimately may lead to a small probability of an enormous impact. Nick Beckstead raises a point similar to this one, and the OP has responded that it's a new and potentially compelling argument to him. I also think it's worth bearing in mind that there could be other arguments that we haven't thought of yet - and because of the structure of the situation, I expect such arguments to be more likely to point to further "regression to the mean" (so to make proponents of "Pascal's Mugging" arguments less confident that their proposed actions have high relative expected value) than to point in the other direction. This general phenomenon is a major reason that I place less weight on explicit arguments than many in this community - explicit arguments that consist mostly of speculation aren't very stable or reliable, and when "outside views" point the other way, I expect more explicit reflection to generate more arguments that support the "outside views."

2 - That said, I don't accept any of the arguments given here for why it's unacceptable to assign a very low probability to a proposition. I think there is a general confusion here between "low subjective probability that a proposition is correct" and "high confidence that a proposition isn't correct"; I don't think those two things are equivalent. Probabilities are often discussed with an "odds" framing, with the implication that assigning a 10^-10 probability to something means that I'd be willing to wager $10^10 against $1; this framing is a useful thought experiment in many cases, but when the numbers are like this I think it starts encouraging people to confuse their risk aversion with "non-extreme" (i.e., rarely under 1% or over 99%) subjective probabilities. Another framing is to ask, "If we could somehow do a huge number of 'trials' of this idea, say by simulating worlds constrained by the observations you've made, what would your over/under be for the proportion of trials in which the proposition is true?" and in that case one could simultaneously have an over/under of (10^-10 * # trials) and have extremely low confidence in one's view.

It seems to me that for any small p, there must be some propositions that we assign a probability at least as small as p. (For example, there must be some X such that the probability of an impact greater than X is smaller than p.) Furthermore, it isn't the case that assigning small p means that it's impossible to gather evidence that would change one's mind about p. For example, if you state to me that you will generate a random integer N1 between 1 and 10^100, there must be some integer N2 that I implicitly assign a probability of <=10^-100 as the output of your exercise. (This is true even if there are substantial "unknown unknowns" involved, for example if I don't trust that your generator is truly random.) Yet if you complete the exercise and tell me it produced the number N2, I quickly revise my probability from <=10^-100 to over 50%, based on a single quick observation.
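The arithmetic of that last example is worth making explicit. Suppose (as an illustrative assumption) that the reporter lies with probability q, and that a lie names a uniformly random wrong number; then Bayes' rule takes the probability from 10^-100 to roughly 1-q in one step:

```python
# Sketch of the number-generator example: a uniform draw from 1..10^100
# gives every particular N2 a prior of 10^-100, yet a single report can
# push the posterior above 50%. Assumption for illustration: the reporter
# lies with probability q, and a lie names a uniformly random wrong number.
from fractions import Fraction

N = 10**100
q = Fraction(1, 10)                          # chance the report is a lie

prior = Fraction(1, N)                       # P(draw == N2)
p_report_if_true = 1 - q                     # honest report of the true draw
p_report_if_false = q * Fraction(1, N - 1)   # a lie happens to name N2

posterior = (p_report_if_true * prior) / (
    p_report_if_true * prior + p_report_if_false * (1 - prior)
)
print(float(posterior))  # 0.9: one observation overwhelms the 10^-100 prior
```

Exact rational arithmetic (`Fraction`) is used because floating point underflows at 10^-100; the likelihood ratio of roughly 10^100 : 1 is what lets a single quick observation swamp the tiny prior.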

For these reasons, I think the argument that "the mere fact that one assigns a sufficiently low probability to a proposition means that one must be in error" would have unacceptable implications and is not supported by the arguments in the OP.

Comment author: HoldenKarnofsky 01 August 2012 02:16:55PM 14 points

I greatly appreciate the response to my post, particularly the highly thoughtful responses of Luke (original post), Eliezer, and many commenters.

Broad response to Luke's and Eliezer's points:

As I see it, there are a few possible visions of SI's mission:

  • M1. SI is attempting to create a team to build a "Friendly" AGI.
  • M2. SI is developing "Friendliness theory," which addresses how to develop a provably safe/useful/benign utility function without needing iterative/experimental development; this theory could be integrated into an AGI developed by another team, in order to ensure that its actions are beneficial.
  • M3. SI is broadly committed to reducing AGI-related risks, and work on whatever will work toward that goal, including potentially M1 and M2.

My view is that the broader SI's mission, the higher the bar should be for the overall impressiveness of the organization and team. An organization with a very narrow, specific mission - such as "analyzing how to develop a provably safe/useful/benign utility function without needing iterative/experimental development" - can, relatively easily, establish which other organizations (if any) are trying to provide what it does and what the relative qualifications are; it can set clear expectations for deliverables over time and be held accountable to them; its actions and outputs are relatively easy to criticize and debate. By contrast, an organization with broader aims and less clearly relevant deliverables - such as "broadly aiming to reduce risks from AGI, with activities currently focused on community-building" - is giving a donor (or evaluator) less to go on in terms of what the space looks like, what the specific qualifications are and what the specific deliverables are. In this case it becomes more important that a donor be highly confident in the exceptional effectiveness of the organization and team as a whole.

Many of the responses to my criticisms (points #1 and #4 in Eliezer's response; "SI's mission assumes a scenario that is far less conjunctive than it initially appears" and "SI's goals and activities" section of Luke's response) correctly point out that they have less force, as criticisms, when one views SI's mission as relatively broad. However, I believe that evaluating SI by a broader mission raises the burden of affirmative arguments for SI's impressiveness. The primary such arguments I see in the responses are in Luke's list:

(1) The Sequences, the best tool I know for creating aspiring rationalists, (2) Harry Potter and the Methods of Rationality, a surprisingly successful tool for grabbing the attention of mathematicians and computer scientists around the world, and (3) the Singularity Summit, a mainstream-aimed conference that brings in people who end up making significant contributions to the movement — e.g. Tomer Kagan (an SI donor and board member) and David Chalmers (author of The Singularity: A Philosophical Analysis and The Singularity: A Reply).

I've been a consumer of all three of these, and while I've found them enjoyable, I don't find them sufficient for the purpose at hand. Others may reach a different conclusion. And of course, I continue to follow SI's progress, as I understand that it may submit more impressive achievements in the future.

Both Luke and Eliezer seem to disagree with the basic approach I'm taking here. They seem to believe that it is sufficient to establish that (a) AGI risk is an overwhelmingly important issue and that (b) SI compares favorably to other organizations that explicitly focus on this issue. For my part, I (a) disagree with the statement: "the loss in expected value resulting from an existential catastrophe is so enormous that the objective of reducing existential risks should be a dominant consideration whenever we act out of an impersonal concern for humankind as a whole"; (b) do not find Luke's argument that AI, specifically, is the most important existential risk to be compelling (it discusses only how beneficial it would be to address the issue well, not how likely a donor is to be able to help do so); (c) believe it is appropriate to compare the overall organizational impressiveness of the Singularity Institute to that of all other donation-soliciting organizations, not just to that of other existential-risk- or AGI-focused organizations. I would guess that these disagreements, particularly (a) and (c), come down to relatively deep worldview differences (related to the debate over "Pascal's Mugging") that I will probably write more about in the future.

On tool AI:

Most of my disagreements with SI representatives seem to be over how broad a mission is appropriate for SI, and how high a standard SI as an organization should be held to. However, the debate over "tool AI" is different, with both sides making relatively strong claims. Here SI is putting forth a specific point as an underappreciated insight and thus as a potential contribution/accomplishment; my view is that SI's suggested approach to AGI development is more dangerous than the "traditional" approach to software development, and thus that SI is advocating for an approach that would worsen risks from AGI.

My latest thoughts on this disagreement were posted separately in a comment response to Eliezer's post on the subject.

A few smaller points:

  • I disagree with Luke's claim that "objection #1 punts to objection #2." Objection #2 (regarding "tool AI") points out one possible approach to AGI that I believe is both consonant with traditional software development and significantly safer than the approach advocated by SI. But even if the "tool AI" approach is not in fact safer, there may be safer approaches that SI hasn't thought of. SI does not just emphasize the general problem that AGI may be dangerous (something that I believe is a fairly common view), but emphasizes a particular approach to AGI safety, one that seems to me to be highly dangerous. If SI's approach is dangerous relative to other approaches that others are taking/advocating, or even approaches that have yet to be developed (and will be enabled by future tools and progress on AGI), this is a problem for SI.
  • Luke states that rationality is "only a ceteris paribus predictor of success" and that it is a "weak one." I wish to register that I believe rationality is a strong (though not perfect) predictor of success, within the population of people who are as privileged (in terms of having basic needs met, access to education, etc.) as most SI supporters/advocates/representatives. So while I understand that success is not part of the definition of rationality, I stand by my statement that it is "the best evidence of superior general rationality (or of insight into it)."
  • Regarding donor-advised funds: opening an account with Vanguard, Schwab or Fidelity is a simple process, and I doubt any of these institutions would overrule a recommendation to donate to an organization such as SI (in any case, this is easily testable).

Comment author: HoldenKarnofsky 01 August 2012 02:09:11PM 4 points

To summarize how I see the current state of the debate over "tool AI":

  • Eliezer and I have differing intuitions about the likely feasibility, safety and usefulness of the "tool" framework relative to the "Friendliness theory" framework, as laid out in this exchange. This relates mostly to Eliezer's point #2 in the original post. We are both trying to make predictions about a technology for which many of the details are unknown, and at this point I don't see a clear way forward for resolving our disagreements, though I did make one suggestion in that thread.
  • Eliezer has also made two arguments (#1 and #4 in the original post) that appear to be of the form, "Even if the 'tool' approach is most promising, the Singularity Institute still represents a strong giving opportunity." A couple of thoughts on this point:
    • One reason I find the "tool" approach relevant in the context of SI is that it resembles what I see as the traditional approach to software development. My view is that it is likely to be both safer and more efficient for developing AGI than the "Friendliness theory" approach. If this is the case, it seems that the safety of AGI will largely be a function of the competence and care with which its developers execute on the traditional approach to software development, and the potential value-added of a third-party team of "Friendliness specialists" is unclear.
    • That said, I recognize that SI has multiple conceptually possible paths to impact, including developing AGI itself and raising awareness of the risks of AGI. I believe that the more the case for SI revolves around activities like these rather than around developing "Friendliness theory," the higher the bar for SI's general impressiveness (as an organization and team) becomes; I will elaborate on this when I respond to Luke's response to me.
  • Regarding Eliezer's point #3 - I think this largely comes down to how strong one finds the argument for "tool AI." I agree that one shouldn't expect SI to respond to every possible critique of its plans. But I think it's reasonable to expect it to anticipate and respond to the stronger possible critiques.
  • I'd also like to address two common objections to the "tool AI" framework that came up in comments, though neither of these objections appears to have been taken up in official SI responses.
    • Some have argued that the idea of "tool AI" is incoherent, or is not distinct from the idea of "Oracle AI," or is conceptually impossible. I believe these arguments to be incorrect, though my ability to formalize and clarify my intuitions on this point has been limited. For those interested in reading attempts to better clarify the concept of "tool AI" following my original post, I recommend jsalvatier's comments on the discussion post devoted to this topic as well as my exchange with Eliezer elsewhere on this thread.
    • Some have argued that "agents" are likely to be more efficient and powerful than "tools," since they are not bottlenecked by human input, and thus that the "tool" concept is unimportant. I anticipated this objection in my original post and expanded on my response in my exchange with Eliezer elsewhere on this thread. In a nutshell, I believe the "tool" framework is likely to be a faster and more efficient way of developing a capable and useful AGI than the sort of framework for which "Friendliness theory" would be relevant; and if it isn't, that the sort of work SI is doing on "Friendliness theory" is likely to be of little value. (Again, I recognize that SI has multiple conceptually possible paths to impact other than development of "Friendliness theory" and will address these in a future comment.)

Comment author: Eliezer_Yudkowsky 18 July 2012 02:12:12PM 27 points

So first a quick note: I wasn't trying to say that the difficulties of AIXI are universal and everything goes analogously to AIXI, I was just stating why AIXI couldn't represent the suggestion you were trying to make. The general lesson to be learned is not that everything else works like AIXI, but that you need to look a lot harder at an equation before thinking that it does what you want.

On a procedural level, I worry a bit that the discussion is trying to proceed by analogy to Google Maps. Let it first be noted that Google Maps simply is not playing in the same league as, say, the human brain, in terms of complexity; and that if we were to look at the winning "algorithm" of the million-dollar Netflix Prize competition, which was in fact a blend of 107 different algorithms, you would have a considerably harder time figuring out why it claimed anything it claimed.

But to return to the meta-point, I worry about conversations that go into "But X is like Y, which does Z, so X should do reinterpreted-Z". Usually, in my experience, that goes into what I call "reference class tennis" or "I'm taking my reference class and going home". The trouble is that there's an unlimited number of possible analogies and reference classes, and everyone has a different one. I was just browsing old LW posts today (to find a URL of a quick summary of why group-selection arguments don't work in mammals) and ran across a quotation from Perry Metzger to the effect that so long as the laws of physics apply, there will always be evolution, hence nature red in tooth and claw will continue into the future - to him, the obvious analogy for the advent of AI was "nature red in tooth and claw", and people who see things this way tend to want to cling to that analogy even if you delve into some basic evolutionary biology with math to show how much it isn't like intelligent design. For Robin Hanson, the one true analogy is to the industrial revolution and farming revolutions, meaning that there will be lots of AIs in a highly competitive economic situation with standards of living tending toward the bare minimum, and this is so absolutely inevitable and consonant with The Way Things Should Be as to not be worth fighting at all. That's his one true analogy and I've never been able to persuade him otherwise. For Kurzweil, the fact that many different things proceed at a Moore's Law rate to the benefit of humanity means that all these things are destined to continue and converge into the future, also to the benefit of humanity. For him, "things that go by Moore's Law" is his favorite reference class.

I can have a back-and-forth conversation with Nick Bostrom, who looks much more favorably on Oracle AI in general than I do, because we're not playing reference class tennis with "But surely that will be just like all the previous X-in-my-favorite-reference-class", nor saying, "But surely this is the inevitable trend of technology"; instead we lay out particular proposals - "Suppose we do this?" - and try to discuss how it will work, not with any added language about how surely anyone will do it that way, or how it's got to be like Z because all previous Y were like Z, etcetera.

My own FAI development plans call for trying to maintain programmer-understandability of some parts of the AI during development. I expect this to be a huge headache, possibly 30% of total headache, possibly the critical point on which my plans fail, because it doesn't happen naturally. Go look at the source code of the human brain and try to figure out what a gene does. Go ask the Netflix Prize winner for a movie recommendation and try to figure out "why" it thinks you'll like watching it. Go train a neural network and then ask why it classified something as positive or negative. Try to keep track of all the memory allocations inside your operating system - that part is humanly understandable, but it flies past so fast you can only monitor a tiny fraction of what goes on, and if you want to look at just the most "significant" parts, you would need an automated algorithm to tell you what's significant. Most AI algorithms are not humanly understandable. Part of Bayesianism's appeal in AI is that Bayesian programs tend to be more understandable than non-Bayesian AI algorithms. I have hopeful plans to try and constrain early FAI content to humanly comprehensible ontologies, prefer algorithms with humanly comprehensible reasons-for-outputs, carefully weigh up which parts of the AI can safely be less comprehensible, monitor significant events, slow down the AI so that this monitoring can occur, and so on. That's all Friendly AI stuff, and I'm talking about it because I'm an FAI guy. I don't think I've ever heard any other AGI project express such plans; and in mainstream AI, human-comprehensibility is considered a nice feature, but rarely a necessary one.
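The contrast between comprehensible and opaque algorithms can be made concrete with a toy example (my construction, not from the thread): a naive Bayes text classifier is "humanly comprehensible" in roughly the sense described above, because every learned parameter reads directly as a per-word evidence weight, so a human can ask "why positive?" and get a legible answer. All the data and names below are invented for illustration.

```python
# Toy sketch: a naive Bayes sentiment classifier whose "reasons" are
# inspectable -- each word contributes a single, readable log-odds weight.
import math
from collections import Counter

# Invented training data: (text, label) with 1 = positive, 0 = negative.
docs = [("good great fun", 1), ("great good", 1),
        ("bad awful boring", 0), ("awful bad", 0)]

counts = {0: Counter(), 1: Counter()}
for text, label in docs:
    counts[label].update(text.split())

vocab = set(counts[0]) | set(counts[1])

def log_odds(word):
    # Laplace-smoothed evidence weight: positive value favors label 1.
    p1 = (counts[1][word] + 1) / (sum(counts[1].values()) + len(vocab))
    p0 = (counts[0][word] + 1) / (sum(counts[0].values()) + len(vocab))
    return math.log(p1 / p0)

def classify(text):
    score = sum(log_odds(w) for w in text.split() if w in vocab)
    return 1 if score > 0 else 0

# The "why" is legible: each word's contribution is one number a human
# can read off -- unlike the distributed weights of a neural network.
for w in sorted(vocab):
    print(f"{w:8s} {log_odds(w):+.2f}")
print(classify("good fun"), classify("boring awful"))
```

A trained neural network offers no analogous readout: its "reasons" are spread across many weights with no per-feature interpretation, which is the gap the comment is pointing at.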

It should finally be noted that AI famously does not result from generalizing normal software development. If you start with a map-route program and then try to program it to plan more and more things until it becomes an AI... you're doomed, and all the experienced people know you're doomed. I think there's an entry or two in the old Jargon File aka Hacker's Dictionary to this effect. There's a qualitative jump to writing a different sort of software - from normal programming where you create a program conjugate to the problem you're trying to solve, to AI where you try to solve cognitive-science problems so the AI can solve the object-level problem. I've personally met a programmer or two who've generalized their code in interesting ways, and who feel like they ought to be able to generalize it even further until it becomes intelligent. This is a famous illusion among aspiring young brilliant hackers who haven't studied AI. Machine learning is a separate discipline and involves algorithms and problems that look quite different from "normal" programming.

Comment author: HoldenKarnofsky 18 July 2012 04:29:00PM 15 points [-]

Thanks for the response. My thoughts at this point are that

  • We seem to have differing views of how to best do what you call "reference class tennis" and how useful it can be. I'll probably be writing about my views more in the future.
  • I find it plausible that AGI will have to follow a substantially different approach from "normal" software. But I'm not clear on the specifics of what SI believes those differences will be and why they point to the "proving safety/usefulness before running" approach over the "tool" approach.
  • We seem to have differing views of how frequently today's software can be made comprehensible via interfaces. For example, my intuition is that the people who worked on the Netflix Prize algorithm had good interfaces for understanding "why" it recommends what it does, and used these to refine it. I may further investigate this matter (casually, not as a high priority); on SI's end, it might be helpful (from my perspective) to provide detailed examples of existing algorithms for which the "tool" approach to development didn't work and something closer to "proving safety/usefulness up front" was necessary.
Comment author: Eliezer_Yudkowsky 11 July 2012 10:59:07PM 29 points [-]

Didn't see this at the time, sorry.

So... I'm sorry if this reply seems a little unhelpful, and I wish there was some way to engage more strongly, but...

Point (1) is the main problem. AIXI updates freely over a gigantic range of sensory predictors with no specified ontology - it's a sum over a huge set of programs, and we, the users, have no idea what the representations are talking about, except that at the end of their computations they predict, "You will see a sensory 1 (or a sensory 0)." (In my preferred formalism, the program puts a probability on a 0 instead.) Inside, the program could've been modeling the universe in terms of atoms, quarks, quantum fields, cellular automata, giant moving paperclips, slave agents scurrying around... we, the programmers, have no idea how AIXI is modeling the world and producing its predictions, and indeed, the final prediction could be a sum over many different representations.

This means that equation (20) in Hutter is written as a utility function over sense data, where the reward channel is just a special case of sense data. We can easily adapt this equation to talk about any function computed directly over sense data - we can get AIXI to optimize any aspect of its sense data that we please. We can't get it to optimize a quality of the external universe. One of the challenges I listed in my FAI Open Problems talk, and one of the problems I intend to talk about in my FAI Open Problems sequence, is to take the first nontrivial steps toward adapting this formalism - to e.g. take an equivalent of AIXI in a really simple universe, with a really simple goal, something along the lines of a Life universe and a goal of making gliders, and specify something given unlimited computing power which would behave like it had that goal, without pre-fixing the ontology of the causal representation to that of the real universe, i.e., you want something that can range freely over ontologies in its predictive algorithms, but which still behaves like it's maximizing an outside thing like gliders instead of a sensory channel like the reward channel. This is an unsolved problem!
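The core obstacle here can be shown in miniature (a toy of my own construction, not Hutter's formalism or the actual Life example): if two distinct world states produce identical sensory streams, then any utility function defined over sense data must assign them the same value, so no such function can express "maximize gliders" as opposed to "maximize what the glider-sensor reads".

```python
# Toy sketch: why a utility over sense data cannot express a goal about
# the external world.  Two distinct worlds yield identical sense data,
# so U(senses) is forced to value them equally -- even though one world
# contains the gliders we care about and the other has a seized sensor.

def senses(world):
    # The agent's narrow sensory channel: it sees only a one-bit
    # "glider sensor", never the world state itself.
    return world["sensor"]

world_with_gliders = {"gliders": 5, "sensor": 1}
world_spoofed      = {"gliders": 0, "sensor": 1}   # sensor seized

def utility_over_senses(world):
    return senses(world)       # the kind of utility eq. (20) can express

def utility_over_world(world):
    return world["gliders"]    # what we actually want -- but this
                               # requires an ontology for "glider"

# Identical sense data forces identical sensory utility...
assert senses(world_with_gliders) == senses(world_spoofed)
assert utility_over_senses(world_with_gliders) == utility_over_senses(world_spoofed)
# ...while the world-state utility tells the two worlds apart.
assert utility_over_world(world_with_gliders) != utility_over_world(world_spoofed)
print("sensory utility cannot distinguish the two worlds")
```

The unsolved problem described above is harder still: defining `utility_over_world` without pre-fixing the ontology, so that it keeps tracking gliders even as the agent's internal world-models vary freely.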

We haven't even got to the part where it's difficult to say in formal terms how to interpret what a human says s/he wants the AI to plan, and where failures of phrasing of that utility function can also cause a superhuman intelligence to kill you. We haven't even got to the huge buried FAI problem inside the word "optimal" in point (1), which is the really difficult part in the whole thing. Because so far we're dealing with a formalism that can't even represent a purpose of the type you're looking for - it can only optimize over sense data, and this is not a coincidental fact, but rather a deep problem which the AIXI formalism deliberately avoided.

(2) sounds like you think an AI with an alien, superhuman planning algorithm can tell humans what to do without ever thinking consequentialistically about which different statements will result in human understanding or misunderstanding. Anna says that I need to work harder on not assuming other people are thinking silly things, but even so, when I look at this, it's hard not to imagine that you're modeling AIXI as a sort of spirit containing thoughts, whose thoughts could be exposed to the outside with a simple exposure-function. It's not unthinkable that a non-self-modifying superhuman planning Oracle could be developed with the further constraint that its thoughts are human-interpretable, or can be translated for human use without any algorithms that reason internally about what humans understand, but this would at the least be hard. And with AIXI it would be impossible, because AIXI's model of the world ranges over literally all possible ontologies and representations, and its plans are naked motor outputs.

Similar remarks apply to interpreting and answering "What will be its effect on _?" It turns out that getting an AI to understand human language is a very hard problem, and it may very well be that even though talking doesn't feel like having a utility function, our brains are using consequential reasoning to do it. Certainly, when I write language, that feels like I'm being deliberate. It's also worth noting that "What is the effect on X?" really means "What are the effects I care about on X?" and that there's a large understanding-the-human's-utility-function problem here. In particular, you don't want your language for describing "effects" to partition, as the same state of described affairs, any two states which humans assign widely different utilities. Let's say there are two plans for getting my grandmother out of a burning house, one of which destroys her music collection, one of which leaves it intact. Does the AI know that music is valuable? If not, will it not describe music-destruction as an "effect" of a plan which offers to free up large amounts of computer storage by, as it turns out, overwriting everyone's music collection? If you then say that the AI should describe changes to files in general, well, should it also talk about changes to its own internal files? Every action comes with a huge number of consequences - if we hear about all of them (reality described on a level so granular that it automatically captures all utility shifts, as well as a huge number of other unimportant things) then we'll be there forever.
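The granularity problem in the grandmother example can be sketched numerically (all data invented): report every state change and the human drowns; filter to the changes that "matter" and you have smuggled in a model of the human's utility function.

```python
# Toy sketch of the granularity problem: a plan's full state-diff is
# enormous, while selecting the "relevant" effects requires a
# value-laden predicate -- exactly the understanding-the-human's-
# utility-function problem.  All data here is invented.

before = {f"file_{i}": "old" for i in range(10_000)}
before["grandmother_location"] = "burning house"
before["music_collection"] = "intact"

after = dict(before)
after["grandmother_location"] = "safe"
after["music_collection"] = "overwritten"
for i in range(10_000):
    after[f"file_{i}"] = "new"   # the plan also rewrote storage

# Reality described at full granularity: thousands of "effects".
all_effects = {k for k in before if before[k] != after[k]}
print(len(all_effects))          # far too many to review by hand

# Filtering needs to know what the human values -- the filter itself
# encodes a fragment of the utility function.
human_cares_about = {"grandmother_location", "music_collection"}
relevant = all_effects & human_cares_about
print(sorted(relevant))
```

The hard part is that `human_cares_about` is not given: constructing it is the buried problem, since any partition that lumps together states humans value very differently will hide exactly the effects that matter.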

I wish I had something more cooperative to say in reply - it feels like I'm committing some variant of logical rudeness by this reply - but the truth is, it seems to me that AIXI isn't a good basis for the agent you want to describe; and I don't know how to describe it formally myself, either.

Comment author: HoldenKarnofsky 18 July 2012 02:35:33AM 17 points [-]

Thanks for the response. To clarify, I'm not trying to point to the AIXI framework as a promising path; I'm trying to take advantage of the unusually high degree of formalization here in order to gain clarity on the feasibility and potential danger points of the "tool AI" approach.

It sounds to me like your two major issues with the framework I presented are (to summarize):

(1) There is a sense in which AIXI predictions must be reducible to predictions about the limited set of inputs it can "observe directly" (what you call its "sense data").

(2) Computers model the world in ways that can be unrecognizable to humans; it may be difficult to create interfaces that allow humans to understand the implicit assumptions and predictions in their models.

I don't claim that these problems are trivial to deal with. And stated as you state them, they sound abstractly very difficult to deal with. However, it seems true - and worth noting - that "normal" software development has repeatedly dealt with them successfully. For example: Google Maps works with a limited set of inputs; Google Maps does not "think" like I do and I would not be able to look at a dump of its calculations and have any real sense for what it is doing; yet Google Maps does make intelligent predictions about the external universe (e.g., "following direction set X will get you from point A to point B in reasonable time"), and it also provides an interface (the "route map") that helps me understand its predictions and the implicit reasoning (e.g. "how, why, and with what other consequences direction set X will get me from point A to point B").

Difficult though it may be to overcome these challenges, my impression is that software developers have consistently - and successfully - chosen to take them on, building algorithms that can be "understood" via interfaces and iterated over - rather than trying to prove the safety and usefulness of their algorithms with pure theory before ever running them. Not only does the former method seem "safer" (in the sense that it is less likely to lead to putting software in production before its safety and usefulness has been established) but it seems a faster path to development as well.

It seems that you see a fundamental disconnect between how software development has traditionally worked and how it will have to work in order to result in AGI. But I don't understand your view of this disconnect well enough to see why it would lead to a discontinuation of the phenomenon I describe above. In short, traditional software development seems to have an easier (and faster and safer) time overcoming the challenges of the "tool" framework than overcoming the challenges of up-front theoretical proofs of safety/usefulness; why should we expect this to reverse in the case of AGI?

Comment author: HoldenKarnofsky 05 July 2012 04:18:16PM 18 points [-]


I appreciate the thoughtful response. I plan to respond at greater length in the future, both to this post and to some other content posted by SI representatives and commenters. For now, I wanted to take a shot at clarifying the discussion of "tool-AI" by discussing AIXI. One of the issues I've found with the debate over FAI in general is that I haven't seen much in the way of formal precision about the challenge of Friendliness (I recognize that I have also provided little formal precision, though I feel the burden of formalization is on SI here). It occurred to me that AIXI might provide a good opportunity to have a more precise discussion, if in fact it is believed to represent a case of "a rare exception who specified his AGI in such unambiguous mathematical terms that he actually succeeded at realizing, after some discussion with SIAI personnel, that AIXI would kill off its users and seize control of its reward button."

So here's my characterization of how one might work toward a safe and useful version of AIXI, using the "tool-AI" framework, if one could in fact develop an efficient enough approximation of AIXI to qualify as a powerful AGI. Of course, this is just a rough outline of what I have in mind, but hopefully it adds some clarity to the discussion.

A. Write a program that

  1. Computes an optimal policy, using some implementation of equation (20) on page 22 of http://www.hutter1.net/ai/aixigentle.pdf
  2. "Prints" the policy in a human-readable format (using some fixed algorithm for "printing" that is not driven by a utility function)
  3. Provides tools for answering user questions about the policy, i.e., "What will be its effect on _?" (using some fixed algorithm for answering user questions that makes use of AIXI's probability function, and is not driven by a utility function)
  4. Does not contain any procedures for "implementing" the policy, only for displaying it and its implications in human-readable form

B. Run the program; examine its output using the tools described above (#2 and #3); if, upon such examination, the policy appears potentially destructive, continue tweaking the program (for example, by tweaking the utility it is selecting a policy to maximize) until the policy appears safe and desirable

C. Implement the policy using tools other than the AIXI agent

D. Repeat (B) and (C) until one has confidence that the AIXI agent reliably produces safe and desirable policies, at which point more automation may be called for
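The A-D procedure above can be sketched as a loop. Every function below is a hypothetical stand-in of my own naming: no efficient AIXI approximation exists, so `compute_optimal_policy` and the rest are placeholders that only illustrate the control flow, not workable implementations.

```python
# Minimal sketch of steps A-D.  All functions are hypothetical
# placeholders; in particular compute_optimal_policy() stands in for an
# (unavailable) efficient implementation of Hutter's equation (20).

def compute_optimal_policy(utility):
    # A1: plan -- select a policy maximizing the given utility.
    return {"action": "noop", "utility": utility}

def print_policy(policy):
    # A2: fixed rendering algorithm, not driven by a utility function.
    return f"policy: {policy['action']} (maximizing {policy['utility']})"

def answer_question(policy, question):
    # A3: fixed question-answering over the model's predictions.
    return f"predicted effect of {policy['action']} on {question}: none"

def appears_safe(policy):
    # B: stand-in for the human's judgment after examining the output.
    return policy["action"] == "noop"

utility = "initial utility"
while True:
    policy = compute_optimal_policy(utility)       # A1: plan, don't act
    print(print_policy(policy))                    # A2: display
    print(answer_question(policy, "the economy"))  # A3: interrogate
    if appears_safe(policy):                       # B: human examines
        break
    utility = "tweaked " + utility                 # B: tweak and re-run
# C: the approved policy is then implemented by tools other than the
# AIXI agent itself (step A4: the program contains no implementer).
```

The structural point is that nothing inside the loop acts on the world: the agent's only outputs are a displayed policy and answers to questions, with implementation deferred to step C outside the program.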

My claim is that this approach would be superior to that of trying to develop "Friendliness theory" in advance of having any working AGI, because it would allow experiment- rather than theory-based development. Eliezer, I'm interested in your thoughts about my claim. Do you agree? If not, where is our disagreement?

Thoughts on the Singularity Institute (SI)

256 HoldenKarnofsky 11 May 2012 04:31AM

This post presents thoughts on the Singularity Institute from Holden Karnofsky, Co-Executive Director of GiveWell. Note: Luke Muehlhauser, the Executive Director of the Singularity Institute, reviewed a draft of this post, and commented: "I do generally agree that your complaints are either correct (especially re: past organizational competence) or incorrect but not addressed by SI in clear argumentative writing (this includes the part on 'tool' AI). I am working to address both categories of issues." I take Luke's comment to be a significant mark in SI's favor, because it indicates an explicit recognition of the problems I raise, and thus increases my estimate of the likelihood that SI will work to address them.

September 2012 update: responses have been posted by Luke and Eliezer (and I have responded in the comments of their posts). I have also added acknowledgements.

The Singularity Institute (SI) is a charity that GiveWell has been repeatedly asked to evaluate. In the past, SI has been outside our scope (as we were focused on specific areas such as international aid). With GiveWell Labs we are open to any giving opportunity, no matter what form and what sector, but we still do not currently plan to recommend SI; given the amount of interest some of our audience has expressed, I feel it is important to explain why. Our views, of course, remain open to change. (Note: I am posting this only to Less Wrong, not to the GiveWell Blog, because I believe that everyone who would be interested in this post will see it here.)

I am currently the GiveWell staff member who has put the most time and effort into engaging with and evaluating SI. Other GiveWell staff currently agree with my bottom-line view that we should not recommend SI, but this does not mean they have engaged with each of my specific arguments. Therefore, while the lack of recommendation of SI is something that GiveWell stands behind, the specific arguments in this post should be attributed only to me, not to GiveWell.

Summary of my views

  • The argument advanced by SI for why the work it's doing is beneficial and important seems both wrong and poorly argued to me. My sense at the moment is that the arguments SI is making would, if accepted, increase rather than decrease the risk of an AI-related catastrophe. More
  • SI has, or has had, multiple properties that I associate with ineffective organizations, and I do not see any specific evidence that its personnel/organization are well-suited to the tasks it has set for itself. More
  • A common argument for giving to SI is that "even an infinitesimal chance that it is right" would be sufficient given the stakes. I have written previously about why I reject this reasoning; in addition, prominent SI representatives seem to reject this particular argument as well (i.e., they believe that one should support SI only if one believes it is a strong organization making strong arguments). More
  • My sense is that at this point, given SI's current financial state, withholding funds from SI is likely better for its mission than donating to it. (I would not take this view to the furthest extreme; the argument that SI should have some funding seems stronger to me than the argument that it should have as much as it currently has.)
  • I find existential risk reduction to be a fairly promising area for philanthropy, and plan to investigate it further. More
  • There are many things that could happen that would cause me to revise my view on SI. However, I do not plan to respond to all comment responses to this post. (Given the volume of responses we may receive, I may not be able to even read all the comments on this post.) I do not believe these two statements are inconsistent, and I lay out paths for getting me to change my mind that are likely to work better than posting comments. (Of course I encourage people to post comments; I'm just noting in advance that this action, alone, doesn't guarantee that I will consider your argument.) More

Intent of this post

I did not write this post with the purpose of "hurting" SI. Rather, I wrote it in the hopes that one of these three things (or some combination) will happen:

  1. New arguments are raised that cause me to change my mind and recognize SI as an outstanding giving opportunity. If this happens I will likely attempt to raise more money for SI (most likely by discussing it with other GiveWell staff and collectively considering a GiveWell Labs recommendation).
  2. SI concedes that my objections are valid and increases its determination to address them. A few years from now, SI is a better organization and more effective in its mission.
  3. SI can't or won't make changes, and SI's supporters feel my objections are valid, so SI loses some support, freeing up resources for other approaches to doing good.

Which one of these occurs will hopefully be driven primarily by the merits of the different arguments raised. Because of this, I think that whatever happens as a result of my post will be positive for SI's mission, whether or not it is positive for SI as an organization. I believe that most of SI's supporters and advocates care more about the former than about the latter, and that this attitude is far too rare in the nonprofit world.
