Thoughts on the Singularity Institute (SI)

Post author: HoldenKarnofsky 11 May 2012 04:31AM

This post presents thoughts on the Singularity Institute from Holden Karnofsky, Co-Executive Director of GiveWell. Note: Luke Muehlhauser, the Executive Director of the Singularity Institute, reviewed a draft of this post, and commented: "I do generally agree that your complaints are either correct (especially re: past organizational competence) or incorrect but not addressed by SI in clear argumentative writing (this includes the part on 'tool' AI). I am working to address both categories of issues." I take Luke's comment to be a significant mark in SI's favor, because it indicates an explicit recognition of the problems I raise, and thus increases my estimate of the likelihood that SI will work to address them.

September 2012 update: responses have been posted by Luke and Eliezer (and I have responded in the comments of their posts). I have also added acknowledgements.

The Singularity Institute (SI) is a charity that GiveWell has been repeatedly asked to evaluate. In the past, SI has been outside our scope (as we were focused on specific areas such as international aid). With GiveWell Labs we are open to any giving opportunity, no matter what form and what sector, but we still do not currently plan to recommend SI; given the amount of interest some of our audience has expressed, I feel it is important to explain why. Our views, of course, remain open to change. (Note: I am posting this only to Less Wrong, not to the GiveWell Blog, because I believe that everyone who would be interested in this post will see it here.)

I am currently the GiveWell staff member who has put the most time and effort into engaging with and evaluating SI. Other GiveWell staff currently agree with my bottom-line view that we should not recommend SI, but this does not mean they have engaged with each of my specific arguments. Therefore, while the lack of recommendation of SI is something that GiveWell stands behind, the specific arguments in this post should be attributed only to me, not to GiveWell.

Summary of my views

  • The argument advanced by SI for why the work it's doing is beneficial and important seems both wrong and poorly argued to me. My sense at the moment is that the arguments SI is making would, if accepted, increase rather than decrease the risk of an AI-related catastrophe. More
  • SI has, or has had, multiple properties that I associate with ineffective organizations, and I do not see any specific evidence that its personnel/organization are well-suited to the tasks it has set for itself. More
  • A common argument for giving to SI is that "even an infinitesimal chance that it is right" would be sufficient given the stakes. I have written previously about why I reject this reasoning; in addition, prominent SI representatives seem to reject this particular argument as well (i.e., they believe that one should support SI only if one believes it is a strong organization making strong arguments). More
  • My sense is that at this point, given SI's current financial state, withholding funds from SI is likely better for its mission than donating to it. (I would not take this view to the furthest extreme; the argument that SI should have some funding seems stronger to me than the argument that it should have as much as it currently has.)
  • I find existential risk reduction to be a fairly promising area for philanthropy, and plan to investigate it further. More
  • There are many things that could happen that would cause me to revise my view on SI. However, I do not plan to respond to all comment responses to this post. (Given the volume of responses we may receive, I may not be able to even read all the comments on this post.) I do not believe these two statements are inconsistent, and I lay out paths for getting me to change my mind that are likely to work better than posting comments. (Of course I encourage people to post comments; I'm just noting in advance that this action, alone, doesn't guarantee that I will consider your argument.) More

Intent of this post

I did not write this post with the purpose of "hurting" SI. Rather, I wrote it in the hopes that one of these three things (or some combination) will happen:

  1. New arguments are raised that cause me to change my mind and recognize SI as an outstanding giving opportunity. If this happens I will likely attempt to raise more money for SI (most likely by discussing it with other GiveWell staff and collectively considering a GiveWell Labs recommendation).
  2. SI concedes that my objections are valid and increases its determination to address them. A few years from now, SI is a better organization and more effective in its mission.
  3. SI can't or won't make changes, and SI's supporters feel my objections are valid, so SI loses some support, freeing up resources for other approaches to doing good.

Which one of these occurs will hopefully be driven primarily by the merits of the different arguments raised. Because of this, I think that whatever happens as a result of my post will be positive for SI's mission, whether or not it is positive for SI as an organization. I believe that most of SI's supporters and advocates care more about the former than about the latter, and that this attitude is far too rare in the nonprofit world.

Does SI have a well-argued case that its work is beneficial and important?

I know no more concise summary of SI's views than this page, so here I give my own impressions of what SI believes, in italics.

  1. There is some chance that in the near future (next 20-100 years), an "artificial general intelligence" (AGI) - a computer that is vastly more intelligent than humans in every relevant way - will be created.
  2. This AGI will likely have a utility function and will seek to maximize utility according to this function.
  3. This AGI will be so much more powerful than humans - due to its superior intelligence - that it will be able to reshape the world to maximize its utility, and humans will not be able to stop it from doing so.
  4. Therefore, it is crucial that its utility function be one that is reasonably harmonious with what humans want. A "Friendly" utility function is one that is reasonably harmonious with what humans want, such that a "Friendly" AGI (FAI) would change the world for the better (by human standards) while an "Unfriendly" AGI (UFAI) would essentially wipe out humanity (or worse).
  5. Unless great care is taken specifically to make a utility function "Friendly," it will be "Unfriendly," since the things humans value are a tiny subset of the things that are possible.
  6. Therefore, it is crucially important to develop "Friendliness theory" that helps us to ensure that the first strong AGI's utility function will be "Friendly." The developer of Friendliness theory could use it to build an FAI directly or could disseminate the theory so that others working on AGI are more likely to build FAI as opposed to UFAI.

From the time I first heard this argument, it has seemed to me to be skipping important steps and making major unjustified assumptions. However, for a long time I believed this could easily be due to my inferior understanding of the relevant issues. I believed my own views on the argument to have only very low relevance (as I stated in my 2011 interview with SI representatives). Over time, I have had many discussions with SI supporters and advocates, as well as with non-supporters who I believe understand the relevant issues well. I now believe - for the moment - that my objections are highly relevant, that they cannot be dismissed as simple "layman's misunderstandings" (as they have been by various SI supporters in the past), and that SI has not published anything that addresses them in a clear way.

Below, I list my major objections. I do not believe that these objections constitute a sharp/tight case for the idea that SI's work has low/negative value; I believe, instead, that SI's own arguments are too vague for such a rebuttal to be possible. There are many possible responses to my objections, but SI's public arguments (and the private arguments) do not make clear which possible response (if any) SI would choose to take up and defend. Hopefully the dialogue following this post will clarify what SI believes and why.

Some of my views are discussed at greater length (though with less clarity) in a public transcript of a conversation I had with SI supporter Jaan Tallinn. I refer to this transcript as "Karnofsky/Tallinn 2011."

Objection 1: it seems to me that any AGI that was set to maximize a "Friendly" utility function would be extraordinarily dangerous.

Suppose, for the sake of argument, that SI manages to create what it believes to be an FAI. Suppose that it is successful in the "AGI" part of its goal, i.e., it has successfully created an intelligence vastly superior to human intelligence and extraordinarily powerful from our perspective. Suppose that it has also done its best on the "Friendly" part of the goal: it has developed a formal argument for why its AGI's utility function will be Friendly, it believes this argument to be airtight, and it has had this argument checked over by 100 of the world's most intelligent and relevantly experienced people. Suppose that SI now activates its AGI, unleashing it to reshape the world as it sees fit. What will be the outcome?

I believe that the probability of an unfavorable outcome - by which I mean an outcome essentially equivalent to what a UFAI would bring about - exceeds 90% in such a scenario. I believe the goal of designing a "Friendly" utility function is likely to be beyond the abilities even of the best team of humans willing to design such a function. I do not have a tight argument for why I believe this, but a comment on LessWrong by Wei Dai gives a good illustration of the kind of thoughts I have on the matter:

What I'm afraid of is that a design will be shown to be safe, and then it turns out that the proof is wrong, or the formalization of the notion of "safety" used by the proof is wrong. This kind of thing happens a lot in cryptography, if you replace "safety" with "security". These mistakes are still occurring today, even after decades of research into how to do such proofs and what the relevant formalizations are. From where I'm sitting, proving an AGI design Friendly seems even more difficult and error-prone than proving a crypto scheme secure, probably by a large margin, and there is no decades of time to refine the proof techniques and formalizations. There's a good recent review of the history of provable security, titled Provable Security in the Real World, which might help you understand where I'm coming from.

I think this comment understates the risks, however. For example, when the comment says "the formalization of the notion of 'safety' used by the proof is wrong," it is not clear whether it means that the values the programmers have in mind are not correctly implemented by the formalization, or whether it means they are correctly implemented but are themselves catastrophic in a way that hasn't been anticipated. I would be highly concerned about both. There are other catastrophic possibilities as well; perhaps the utility function itself is well-specified and safe, but the AGI's model of the world is flawed (in particular, perhaps its prior or its process for matching observations to predictions are flawed) in a way that doesn't emerge until the AGI has made substantial changes to its environment.

By SI's own arguments, even a small error in any of these things would likely lead to catastrophe. And there are likely failure forms I haven't thought of. The overriding intuition here is that complex plans usually fail when unaccompanied by feedback loops. A scenario in which a set of people is ready to unleash an all-powerful being to maximize some parameter in the world, based solely on their initial confidence in their own extrapolations of the consequences of doing so, seems like a scenario that is overwhelmingly likely to result in a bad outcome. It comes down to placing the world's largest bet on a highly complex theory - with no experimentation to test the theory first.

So far, all I have argued is that the development of "Friendliness" theory can achieve at best only a limited reduction in the probability of an unfavorable outcome. However, as I argue in the next section, I believe there is at least one concept - the "tool-agent" distinction - that has more potential to reduce risks, and that SI appears to ignore this concept entirely. I believe that tools are safer than agents (even agents that make use of the best "Friendliness" theory that can reasonably be hoped for) and that SI encourages a focus on building agents, thus increasing risk.

Objection 2: SI appears to neglect the potentially important distinction between "tool" and "agent" AI.

Google Maps is a type of artificial intelligence (AI). It is far more intelligent than I am when it comes to planning routes.

Google Maps - by which I mean the complete software package including the display of the map itself - does not have a "utility" that it seeks to maximize. (One could fit a utility function to its actions, as to any set of actions, but there is no single "parameter to be maximized" driving its operations.)

Google Maps (as I understand it) considers multiple possible routes, gives each a score based on factors such as distance and likely traffic, and then displays the best-scoring route in a way that makes it easily understood by the user. If I don't like the route, for whatever reason, I can change some parameters and consider a different route. If I like the route, I can print it out or email it to a friend or send it to my phone's navigation application. Google Maps has no single parameter it is trying to maximize; it has no reason to try to "trick" me in order to increase its utility.

In short, Google Maps is not an agent, taking actions in order to maximize a utility parameter. It is a tool, generating information and then displaying it in a user-friendly manner for me to consider, use and export or discard as I wish.
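The tool pattern described above can be sketched in a few lines of code: the program enumerates options, scores them, and displays the ranking for a human to act on (or not). This is an illustrative sketch only; the function names and the toy scoring formula are hypothetical, not a description of how Google Maps actually works.

```python
# A minimal sketch of the "tool" pattern: score candidate options and
# display them, but never execute any of them. All names and numbers
# here are illustrative stand-ins.

def candidate_routes(origin, destination):
    """Enumerate possible routes (stubbed with fixed data for illustration)."""
    return [
        {"name": "highway", "distance_km": 12.0, "traffic_delay_min": 15},
        {"name": "surface streets", "distance_km": 9.5, "traffic_delay_min": 5},
    ]

def score_route(route):
    """Lower is better: a toy blend of distance and expected delay."""
    return route["distance_km"] + 0.5 * route["traffic_delay_min"]

def plan(origin, destination):
    """Rank routes and return them for a human to inspect - no action taken."""
    routes = candidate_routes(origin, destination)
    return sorted(routes, key=score_route)

for r in plan("home", "office"):
    print(f"{r['name']}: score {score_route(r):.1f}")
```

Note that nothing in this program "wants" anything: if the user dislikes the top-ranked route, she simply ignores it or changes the scoring parameters, and the program has no channel through which to resist.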

Every software application I know of seems to work essentially the same way, including those that involve (specialized) artificial intelligence such as Google Search, Siri, Watson, Rybka, etc. Some can be put into an "agent mode" (as Watson was on Jeopardy!) but all can easily be set up to be used as "tools" (for example, Watson can simply display its top candidate answers to a question, with the score for each, without speaking any of them).

The "tool mode" concept is importantly different from the possibility of Oracle AI sometimes discussed by SI. The discussions I've seen of Oracle AI present it as an Unfriendly AI that is "trapped in a box" - an AI whose intelligence is driven by an explicit utility function and that humans hope to control coercively. Hence the discussion of ideas such as the AI-Box Experiment. A different interpretation, given in Karnofsky/Tallinn 2011, is an AI with a carefully designed utility function - likely as difficult to construct as "Friendliness" - that leaves it "wishing" to answer questions helpfully. By contrast with both these ideas, Tool-AGI is not "trapped" and it is not Unfriendly or Friendly; it has no motivations and no driving utility function of any kind, just like Google Maps. It scores different possibilities and displays its conclusions in a transparent and user-friendly manner, as its instructions say to do; it does not have an overarching "want," and so, as with the specialized AIs described above, while it may sometimes "misinterpret" a question (thereby scoring options poorly and ranking the wrong one #1) there is no reason to expect intentional trickery or manipulation when it comes to displaying its results.

Another way of putting this is that a "tool" has an underlying instruction set that conceptually looks like: "(1) Calculate which action A would maximize parameter P, based on existing data set D. (2) Summarize this calculation in a user-friendly manner, including what Action A is, what likely intermediate outcomes it would cause, what other actions would result in high values of P, etc." An "agent," by contrast, has an underlying instruction set that conceptually looks like: "(1) Calculate which action, A, would maximize parameter P, based on existing data set D. (2) Execute Action A." In any AI where (1) is separable (by the programmers) as a distinct step, (2) can be set to the "tool" version rather than the "agent" version, and this separability is in fact present with most/all modern software. Note that in the "tool" version, neither step (1) nor step (2) (nor the combination) constitutes an instruction to maximize a parameter - to describe a program of this kind as "wanting" something is a category error, and there is no reason to expect its step (2) to be deceptive.
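The separability claimed above can be made concrete in code: step (1) is identical for the "tool" and the "agent," and the two differ only in what happens in step (2). The functions below are hypothetical stand-ins, a sketch of the distinction rather than a proposal for how a real AGI would be structured.

```python
# Sketch of the tool/agent distinction. Step (1) - computing which
# action maximizes parameter P on data set D - is shared; the versions
# differ only in step (2).

def best_action(actions, data, parameter):
    """Step (1): find the action A that maximizes parameter P on data D."""
    return max(actions, key=lambda a: parameter(a, data))

def run_as_tool(actions, data, parameter):
    """Step (2), tool version: return the ranked options with their
    scores for a human to consider - nothing is executed."""
    ranked = sorted(actions, key=lambda a: parameter(a, data), reverse=True)
    return [(a, parameter(a, data)) for a in ranked]

def run_as_agent(actions, data, parameter, execute):
    """Step (2), agent version: execute the top-scoring action directly."""
    execute(best_action(actions, data, parameter))
```

In this framing, choosing "tool mode" amounts to replacing a single call to `execute` with a display of the ranking; neither version contains an instruction to maximize anything beyond completing its one computation.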

I elaborated further on the distinction and on the concept of a tool-AI in Karnofsky/Tallinn 2011.

This is important because an AGI running in tool mode could be extraordinarily useful but far more safe than an AGI running in agent mode. In fact, if developing "Friendly AI" is what we seek, a tool-AGI could likely be helpful enough in thinking through this problem as to render any previous work on "Friendliness theory" moot. Among other things, a tool-AGI would allow transparent views into the AGI's reasoning and predictions without any reason to fear being purposefully misled, and would facilitate safe experimental testing of any utility function that one wished to eventually plug into an "agent."

Is a tool-AGI possible? I believe that it is, and furthermore that it ought to be our default picture of how AGI will work, given that practically all software developed to date can (and usually does) run as a tool and given that modern software seems to be constantly becoming "intelligent" (capable of giving better answers than a human) in surprising new domains. In addition, it intuitively seems to me (though I am not highly confident) that intelligence inherently involves the distinct, separable steps of (a) considering multiple possible actions and (b) assigning a score to each, prior to executing any of the possible actions. If one can distinctly separate (a) and (b) in a program's code, then one can abstain from writing any "execution" instructions and instead focus on making the program list actions and scores in a user-friendly manner, for humans to consider and use as they wish.

Of course, there are possible paths to AGI that may rule out a "tool mode," but it seems that most of these paths would rule out the application of "Friendliness theory" as well. (For example, a "black box" emulation and augmentation of a human mind.) What are the paths to AGI that allow manual, transparent, intentional design of a utility function but do not allow the replacement of "execution" instructions with "communication" instructions? Most of the conversations I've had on this topic have focused on three responses:

  • Self-improving AI. Many seem to find it intuitive that (a) AGI will almost certainly come from an AI rewriting its own source code, and (b) such a process would inevitably lead to an "agent." I do not agree with either (a) or (b). I discussed these issues in Karnofsky/Tallinn 2011 and will be happy to discuss them more if this is the line of response that SI ends up pursuing. Very briefly:
    • The idea of a "self-improving algorithm" intuitively sounds very powerful, but does not seem to have led to many "explosions" in software so far (and it seems to be a concept that could apply to narrow AI as well as to AGI).
    • It seems to me that a tool-AGI could be plugged into a self-improvement process that would be quite powerful but would also terminate and yield a new tool-AI after a set number of iterations (or after reaching a set "intelligence threshold"). So I do not accept the argument that "self-improving AGI means agent AGI." As stated above, I will elaborate on this view if it turns out to be an important point of disagreement.
    • I have argued (in Karnofsky/Tallinn 2011) that the relevant self-improvement abilities are likely to come with or after - not prior to - the development of strong AGI. In other words, any software capable of the relevant kind of self-improvement is likely also capable of being used as a strong tool-AGI, with the benefits described above.
    • The SI-related discussions I've seen of "self-improving AI" are highly vague, and do not spell out views on the above points.
  • Dangerous data collection. Some point to the seeming dangers of a tool-AI's "scoring" function: in order to score different options it may have to collect data, which is itself an "agent" type action that could lead to dangerous actions. I think my definition of "tool" above makes clear what is wrong with this objection: a tool-AGI takes its existing data set D as fixed (and perhaps could have some pre-determined, safe set of simple actions it can take - such as using Google's API - to collect more), and if maximizing its chosen parameter is best accomplished through more data collection, it can transparently output why and how it suggests collecting more data. Over time it can be given more autonomy for data collection through an experimental and domain-specific process (e.g., modifying the AI to skip specific steps of human review of proposals for data collection after it has become clear that these steps work as intended), a process that has little to do with the "Friendly overarching utility function" concept promoted by SI. Again, I will elaborate on this if it turns out to be a key point.
  • Race for power. Some have argued to me that humans are likely to choose to create agent-AGI, in order to quickly gain power and outrace other teams working on AGI. But this argument, even if accepted, has very different implications from SI's view.

    Conventional wisdom says it is extremely dangerous to empower a computer to act in the world until one is very sure that the computer will do its job in a way that is helpful rather than harmful. So if a programmer chooses to "unleash an AGI as an agent" with the hope of gaining power, it seems that this programmer will be deliberately ignoring conventional wisdom about what is safe in favor of shortsighted greed. I do not see why such a programmer would be expected to make use of any "Friendliness theory" that might be available. (Attempting to incorporate such theory would almost certainly slow the project down greatly, and thus would bring the same problems as the more general "have caution, do testing" counseled by conventional wisdom.) It seems that the appropriate measures for preventing such a risk are security measures aiming to stop humans from launching unsafe agent-AIs, rather than developing theories or raising awareness of "Friendliness."
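The gated data-collection process described in the "dangerous data collection" bullet above can be sketched as a simple review loop: the tool outputs a proposal and its rationale, and data is fetched only if a human reviewer approves. All names here are hypothetical; this is an illustration of the process, not a claim about any existing system.

```python
# Illustrative sketch of human-gated data collection: the tool proposes
# a query with a rationale, and a human reviewer must approve before
# any data is actually fetched.

def propose_query(parameter_name):
    """The tool transparently outputs what it wants to collect and why."""
    return {
        "query": f"recent measurements relevant to {parameter_name}",
        "rationale": "current data set D is insufficient to score options",
    }

def review_and_collect(proposal, approve, fetch):
    """Fetch data only if the human reviewer (approve) signs off;
    otherwise the proposal is merely displayed, and nothing happens."""
    if approve(proposal):
        return fetch(proposal["query"])
    return None
```

The experimental, domain-specific relaxation described in the text would then correspond to replacing `approve` with an automatic check for specific, well-tested classes of query, one class at a time.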

One of the things that bothers me most about SI is that there is practically no public content, as far as I can tell, explicitly addressing the idea of a "tool" and giving arguments for why AGI is likely to work only as an "agent." The idea that AGI will be driven by a central utility function seems to be simply assumed. Two examples:

  • I have been referred to Muehlhauser and Salamon 2012 as the most up-to-date, clear explanation of SI's position on "the basics." This paper states, "Perhaps we could build an AI of limited cognitive ability — say, a machine that only answers questions: an 'Oracle AI.' But this approach is not without its own dangers (Armstrong, Sandberg, and Bostrom 2012)." However, the referenced paper (Armstrong, Sandberg and Bostrom 2012) seems to take it as a given that an Oracle AI is an "agent trapped in a box" - a computer that has a basic drive/utility function, not a Tool-AGI. The rest of Muehlhauser and Salamon 2012 seems to take it as a given that an AGI will be an agent.
  • I have often been referred to Omohundro 2008 for an argument that an AGI is likely to have certain goals. But this paper seems, again, to take it as given that an AGI will be an agent, i.e., that it will have goals at all. The introduction states, "To say that a system of any design is an 'artificial intelligence', we mean that it has goals which it tries to accomplish by acting in the world." In other words, the premise I'm disputing seems embedded in its very definition of AI.

The closest thing I have seen to a public discussion of "tool-AGI" is in Dreams of Friendliness, where Eliezer Yudkowsky considers the question, "Why not just have the AI answer questions, instead of trying to do anything? Then it wouldn't need to be Friendly. It wouldn't need any goals at all. It would just answer questions." His response:

To which the reply is that the AI needs goals in order to decide how to think: that is, the AI has to act as a powerful optimization process in order to plan its acquisition of knowledge, effectively distill sensory information, pluck "answers" to particular questions out of the space of all possible responses, and of course, to improve its own source code up to the level where the AI is a powerful intelligence. All these events are "improbable" relative to random organizations of the AI's RAM, so the AI has to hit a narrow target in the space of possibilities to make superintelligent answers come out.

This passage appears vague and does not appear to address the specific "tool" concept I have defended above (in particular, it does not address the analogy to modern software, which challenges the idea that "powerful optimization processes" cannot run in tool mode). The rest of the piece discusses (a) psychological mistakes that could lead to the discussion in question; (b) the "Oracle AI" concept that I have outlined above. The comments contain some more discussion of the "tool" idea (Denis Bider and Shane Legg seem to be picturing something similar to "tool-AGI") but the discussion is unresolved and I believe the "tool" concept defended above remains essentially unaddressed.

In sum, SI appears to encourage a focus on building and launching "Friendly" agents (it is seeking to do so itself, and its work on "Friendliness" theory seems to be laying the groundwork for others to do so) while not addressing the tool-agent distinction. It seems to assume that any AGI will have to be an agent, and to make little to no attempt at justifying this assumption. The result, in my view, is that it is essentially advocating for a more dangerous approach to AI than the traditional approach to software development.

Objection 3: SI's envisioned scenario is far more specific and conjunctive than it appears at first glance, and I believe this scenario to be highly unlikely.

SI's scenario concerns the development of artificial general intelligence (AGI): a computer that is vastly more intelligent than humans in every relevant way. But we already have many computers that are vastly more intelligent than humans in some relevant ways, and the domains in which specialized AIs outdo humans seem to be constantly and continuously expanding. I feel that the relevance of "Friendliness theory" depends heavily on the idea of a "discrete jump" that seems unlikely and whose likelihood does not seem to have been publicly argued for.

One possible scenario is that at some point, we develop powerful enough non-AGI tools (particularly specialized AIs) that we vastly improve our abilities to consider and prepare for the eventuality of AGI - to the point where any previous theory developed on the subject becomes useless. Or (to put this more generally) non-AGI tools simply change the world so much that it becomes essentially unrecognizable from the perspective of today - again rendering any previous "Friendliness theory" moot. As I said in Karnofsky/Tallinn 2011, some of SI's work "seems a bit like trying to design Facebook before the Internet was in use, or even before the computer existed."

Perhaps there will be a discrete jump to AGI, but it will be a sort of AGI that renders "Friendliness theory" moot for a different reason. For example, in the practice of software development, there often does not seem to be an operational distinction between "intelligent" and "Friendly." (For example, my impression is that the only method programmers had for evaluating Watson's "intelligence" was to see whether it was coming up with the same answers that a well-informed human would; the only way to evaluate Siri's "intelligence" was to evaluate its helpfulness to humans.) "Intelligent" often ends up getting defined as "prone to take actions that seem all-around 'good' to the programmer." So the concept of "Friendliness" may end up being naturally and subtly baked in to a successful AGI effort.

The bottom line is that we know very little about the course of future artificial intelligence. I believe that the probability that SI's concept of "Friendly" vs. "Unfriendly" goals ends up seeming essentially nonsensical, irrelevant and/or unimportant from the standpoint of the relevant future is over 90%.

Other objections to SI's views

There are other debates about the likelihood of SI's work being relevant/helpful; for example,

  • It isn't clear whether the development of AGI is imminent enough to be relevant, or whether other risks to humanity are closer.
  • It isn't clear whether AGI would be as powerful as SI's views imply. (I discussed this briefly in Karnofsky/Tallinn 2011.)
  • It isn't clear whether even an extremely powerful UFAI would choose to attack humans as opposed to negotiating with them. (I find it somewhat helpful to analogize UFAI-human interactions to human-mosquito interactions. Humans are enormously more intelligent than mosquitoes; humans are good at predicting, manipulating, and destroying mosquitoes; humans do not value mosquitoes' welfare; humans have other goals that mosquitoes interfere with; humans would like to see mosquitoes eradicated at least from certain parts of the planet. Yet humans haven't accomplished such eradication, and it is easy to imagine scenarios in which humans would prefer honest negotiation and trade with mosquitoes to any other arrangement, if such negotiation and trade were possible.)

Unlike the three objections I focus on, these other issues have been discussed a fair amount, and if these other issues were the only objections to SI's arguments I would find SI's case to be strong (i.e., I would find its scenario likely enough to warrant investment in).

Wrapup

  • I believe the most likely future scenarios are the ones we haven't thought of, and that the most likely fate of the sort of theory SI ends up developing is irrelevance.
  • I believe that unleashing an all-powerful "agent AGI" (without the benefit of experimentation) would very likely result in a UFAI-like outcome, no matter how carefully the "agent AGI" was designed to be "Friendly." I see SI as encouraging (and aiming to take) this approach.
  • I believe that the standard approach to developing software results in "tools," not "agents," and that tools (while dangerous) are much safer than agents. A "tool mode" could facilitate experiment-informed progress toward a safe "agent," rather than needing to get "Friendliness" theory right without any experimentation.
  • Therefore, I believe that the approach SI advocates and aims to prepare for is far more dangerous than the standard approach, so if SI's work on Friendliness theory affects the risk of human extinction one way or the other, it will increase the risk of human extinction. Fortunately I believe SI's work is far more likely to have no effect one way or the other.

For a long time I refrained from engaging in object-level debates over SI's work, believing that others are better qualified to do so. But after talking at great length to many of SI's supporters and advocates and reading everything I've been pointed to as relevant, I still have seen no clear and compelling response to any of my three major objections. As stated above, there are many possible responses to my objections, but SI's current arguments do not seem clear on what responses they wish to take and defend. At this point I am unlikely to form a positive view of SI's work until and unless I do see such responses, and/or SI changes its positions.

Is SI the kind of organization we want to bet on?

This part of the post has some risks. For most of GiveWell's history, sticking to our standard criteria - and putting more energy into recommended than non-recommended organizations - has enabled us to share our honest thoughts about charities without appearing to get personal. But when evaluating a group such as SI, I can't avoid placing a heavy weight on (my read on) the general competence, capability and "intangibles" of the people and organization, because SI's mission is not about repeating activities that have worked in the past. Sharing my views on these issues could strike some as personal or mean-spirited and could lead to the misimpression that GiveWell is hostile toward SI. But it is simply necessary in order to be fully transparent about why I hold the views that I hold.

Fortunately, SI is an ideal organization for our first discussion of this type. I believe the staff and supporters of SI would overwhelmingly rather hear the whole truth about my thoughts - so that they can directly engage them and, if warranted, make changes - than have me sugar-coat what I think in order to spare their feelings. People who know me and my attitude toward being honest vs. sparing feelings know that this, itself, is high praise for SI.

One more comment before I continue: our policy is that non-public information provided to us by a charity will not be published or discussed without that charity's prior consent. However, none of the content of this post is based on private information; all of it is based on information that SI has made available to the public.

There are several reasons that I currently have a negative impression of SI's general competence, capability and "intangibles." My mind remains open and I include specifics on how it could be changed.

  • Weak arguments. SI has produced enormous quantities of public argumentation, and I have examined a very large proportion of this information. Yet I have never seen a clear response to any of the three basic objections I listed in the previous section. One of SI's major goals is to raise awareness of AI-related risks; given this, the fact that it has not advanced clear/concise/compelling arguments speaks, in my view, to its general competence.
  • Lack of impressive endorsements. I discussed this issue in my 2011 interview with SI representatives and I still feel the same way on the matter. I feel that given the enormous implications of SI's claims, if it argued them well it ought to be able to get more impressive endorsements than it has.

    I have been pointed to Peter Thiel and Ray Kurzweil as examples of impressive SI supporters, but I have not seen any on-record statements from either of these people that show agreement with SI's specific views, and in fact (based on watching them speak at Singularity Summits) my impression is that they disagree. Peter Thiel seems to believe that speeding the pace of general innovation is a good thing; this would seem to be in tension with SI's view that AGI will be catastrophic by default and that no one other than SI is paying sufficient attention to "Friendliness" issues. Ray Kurzweil seems to believe that "safety" is a matter of transparency, strong institutions, etc. rather than of "Friendliness." I am personally in agreement with the things I have seen both of them say on these topics. I find it possible that they support SI because of the Singularity Summit or to increase general interest in ambitious technology, rather than because they find "Friendliness theory" to be as important as SI does.

    Clear, on-record statements from these two supporters, specifically endorsing SI's arguments and the importance of developing Friendliness theory, would shift my views somewhat on this point.

  • Resistance to feedback loops. I discussed this issue in my 2011 interview with SI representatives and I still feel the same way on the matter. SI seems to have passed up opportunities to test itself and its own rationality by e.g. aiming for objectively impressive accomplishments. This is a problem because of (a) its extremely ambitious goals (among other things, it seeks to develop artificial intelligence and "Friendliness theory" before anyone else can develop artificial intelligence); (b) its view of its staff/supporters as having unusual insight into rationality, which I discuss in a later bullet point.

    SI's list of achievements is not, in my view, up to where it needs to be given (a) and (b). Yet I have seen no declaration that SI has fallen short to date and explanation of what will be changed to deal with it. SI's recent release of a strategic plan and monthly updates are improvements from a transparency perspective, but they still leave me feeling as though there are no clear metrics or goals by which SI is committing to be measured (aside from very basic organizational goals such as "design a new website" and very vague goals such as "publish more papers") and as though SI places a low priority on engaging people who are critical of its views (or at least not yet on board), as opposed to people who are naturally drawn to it.

    I believe that one of the primary obstacles to being impactful as a nonprofit is the lack of the sort of helpful feedback loops that lead to success in other domains. I like to see groups that are making as much effort as they can to create meaningful feedback loops for themselves. I perceive SI as falling well short on this front. Pursuing more impressive endorsements and developing benign but objectively recognizable innovations (particularly commercially viable ones) are two possible ways to impose more demanding feedback loops. (I discussed both of these in my interview linked above).

  • Apparent poorly grounded belief in SI's superior general rationality. Many of the things that SI and its supporters and advocates say imply a belief that they have special insights into the nature of general rationality, and/or have superior general rationality, relative to the rest of the population. (Examples here, here and here). My understanding is that SI is in the process of spinning off a group dedicated to training people on how to have higher general rationality.

    Yet I'm not aware of any of what I consider compelling evidence that SI staff/supporters/advocates have any special insight into the nature of general rationality or that they have especially high general rationality.

    I have been pointed to the Sequences on this point. The Sequences (which I have read the vast majority of) do not seem to me to be a demonstration or evidence of general rationality. They are about rationality; I find them very enjoyable to read; and there is very little they say that I disagree with (or would have disagreed with before I read them). However, they do not seem to demonstrate rationality on the part of the writer, any more than a series of enjoyable, not-obviously-inaccurate essays on the qualities of a good basketball player would demonstrate basketball prowess. I sometimes get the impression that fans of the Sequences are willing to ascribe superior rationality to the writer simply because the content seems smart and insightful to them, without making a critical effort to determine the extent to which the content is novel, actionable and important. 

    I endorse Eliezer Yudkowsky's statement, "Be careful … any time you find yourself defining the [rationalist] as someone other than the agent who is currently smiling from on top of a giant heap of utility." To me, the best evidence of superior general rationality (or of insight into it) would be objectively impressive achievements (successful commercial ventures, highly prestigious awards, clear innovations, etc.) and/or accumulation of wealth and power. As mentioned above, SI staff/supporters/advocates do not seem particularly impressive on these fronts, at least not as much as I would expect for people who have the sort of insight into rationality that makes it sensible for them to train others in it. I am open to other evidence that SI staff/supporters/advocates have superior general rationality, but I have not seen it.

    Why is it a problem if SI staff/supporters/advocates believe themselves, without good evidence, to have superior general rationality? First, it strikes me as a belief based on wishful thinking rather than rational inference. Second, I would expect a series of problems to accompany overconfidence in one's general rationality, and several of these problems seem to be actually occurring in SI's case:

    • Insufficient self-skepticism given how strong its claims are and how little support its claims have won. Rather than endorsing "Others have not accepted our arguments, so we will sharpen and/or reexamine our arguments," SI seems often to endorse something more like "Others have not accepted their arguments because they have inferior general rationality," a stance less likely to lead to improvement on SI's part.
    • Being too selective (in terms of looking for people who share its preconceptions) when determining whom to hire and whose feedback to take seriously.
    • Paying insufficient attention to the limitations of the confidence one can have in one's untested theories, in line with my Objection 1.
  • Overall disconnect between SI's goals and its activities. SI seeks to build FAI and/or to develop and promote "Friendliness theory" that can be useful to others in building FAI. Yet it seems that most of its time goes to activities other than developing AI or theory. Its per-person output in terms of publications seems low. Its core staff seem more focused on Less Wrong posts, "rationality training" and other activities that don't seem connected to the core goals; Eliezer Yudkowsky, in particular, appears (from the strategic plan) to be focused on writing books for popular consumption. These activities seem neither to be advancing the state of FAI-related theory nor to be engaging the sort of people most likely to be crucial for building AGI.

    A possible justification for these activities is that SI is seeking to promote greater general rationality, which over time will lead to more and better support for its mission. But if this is SI's core activity, it becomes even more important to test the hypothesis that SI's views are in fact rooted in superior general rationality - and these tests don't seem to be happening, as discussed above.

  • Theft. I am bothered by the 2009 theft of $118,803.00 (as against a $541,080.00 budget for the year). In an organization as small as SI, it really seems as though theft that large relative to the budget shouldn't occur and that it represents a major failure of hiring and/or internal controls.

    In addition, I have seen no public SI-authorized discussion of the matter that I consider satisfactory in terms of explaining what happened and what the current status of the case is on an ongoing basis. Some details may have to be omitted, but a clear SI-authorized statement on this point, with as much information as can reasonably be provided, would be helpful.

A couple positive observations to add context here:

  • I see significant positive qualities in many of the people associated with SI. I especially like what I perceive as their sincere wish to do whatever they can to help the world as much as possible, and the high value they place on being right as opposed to being conventional or polite. I have not interacted with Eliezer Yudkowsky but I greatly enjoy his writings.
  • I'm aware that SI has relatively new leadership that is attempting to address the issues behind some of my complaints. I have a generally positive impression of the new leadership; I believe the Executive Director and Development Director, in particular, to represent a step forward in terms of being interested in transparency and in testing their own general rationality. So I will not be surprised if there is some improvement in the coming years, particularly regarding the last couple of statements listed above. That said, SI is an organization and it seems reasonable to judge it by its organizational track record, especially when its new leadership is so new that I have little basis on which to judge these staff.

Wrapup

While SI has produced a lot of content that I find interesting and enjoyable, it has not produced what I consider evidence of superior general rationality or of its suitability for the tasks it has set for itself. I see no qualifications or achievements that specifically seem to indicate that SI staff are well-suited to the challenge of understanding the key AI-related issues and/or coordinating the construction of an FAI. And I see specific reasons to be pessimistic about its suitability and general competence.

When estimating the expected value of an endeavor, it is natural to have an implicit "survivorship bias" - to use organizations whose accomplishments one is familiar with (which tend to be relatively effective organizations) as a reference class. Because of this, I would be extremely wary of investing in an organization with apparently poor general competence/suitability to its tasks, even if I bought fully into its mission (which I do not) and saw no other groups working on a comparable mission.

But if there's even a chance …

A common argument that SI supporters raise with me is along the lines of, "Even if SI's arguments are weak and its staff isn't as capable as one would like to see, their goal is so important that they would be a good investment even at a tiny probability of success."

I believe this argument to be a form of Pascal's Mugging and I have outlined the reasons I believe it to be invalid in two posts (here and here). There have been some objections to my arguments, but I still believe them to be valid. There is a good chance I will revisit these topics in the future, because I believe these issues to be at the core of many of the differences between GiveWell-top-charities supporters and SI supporters.

Regardless of whether one accepts my specific arguments, it is worth noting that the most prominent people associated with SI tend to agree with the conclusion that the "But if there's even a chance …" argument is not valid. (See comments on my post from Michael Vassar and Eliezer Yudkowsky as well as Eliezer's interview with John Baez.)

Existential risk reduction as a cause

I consider the general cause of "looking for ways that philanthropic dollars can reduce direct threats of global catastrophic risks, particularly those that involve some risk of human extinction" to be a relatively high-potential cause. It is on the working agenda for GiveWell Labs and we will be writing more about it.

However, I don't consider "Cause X is the one I care about and Organization Y is the only one working on it" to be a good reason to support Organization Y. For donors determined to donate within this cause, I encourage you to consider donating to a donor-advised fund while making it clear that you intend to grant out the funds to existential-risk-reduction-related organizations in the future. (One way to accomplish this would be to create a fund with "existential risk" in the name; this is a fairly easy thing to do, and one person could do it on behalf of multiple donors.)

For one who accepts my arguments about SI, I believe withholding funds in this way is likely to be better for SI's mission than donating to SI - through incentive effects alone (not to mention my specific argument that SI's approach to "Friendliness" seems likely to increase risks).

How I might change my views

My views are very open to revision.

However, I cannot realistically commit to read and seriously consider all comments posted on the matter. The number of people capable of taking a few minutes to write a comment is sufficient to swamp my capacity. I do encourage people to comment and I do intend to read at least some comments, but if you are looking to change my views, you should not consider posting a comment to be the most promising route.

Instead, what I will commit to is reading and carefully considering up to 50,000 words of content that are (a) specifically marked as SI-authorized responses to the points I have raised; (b) explicitly cleared for release to the general public as SI-authorized communications. In order to consider a response "SI-authorized and cleared for release," I will accept explicit communication from SI's Executive Director or from a majority of its Board of Directors endorsing the content in question. After 50,000 words, I may change my views and/or commit to reading more content, or (if I determine that the content is poor and is not using my time efficiently) I may decide not to engage further. SI-authorized content may improve or worsen SI's standing in my estimation, so unlike with comments, there is an incentive to select content that uses my time efficiently. Of course, SI-authorized content may end up including excerpts from comment responses to this post, and/or already-existing public content.

I may also change my views for other reasons, particularly if SI secures more impressive achievements and/or endorsements.

One more note: I believe I have read the vast majority of the Sequences, including the AI-foom debate, and that this content - while interesting and enjoyable - does not have much relevance for the arguments I've made.

Again: I think that whatever happens as a result of my post will be positive for SI's mission, whether or not it is positive for SI as an organization. I believe that most of SI's supporters and advocates care more about the former than about the latter, and that this attitude is far too rare in the nonprofit world.

Acknowledgements

Thanks to the following people for reviewing a draft of this post and providing thoughtful feedback (this of course does not mean they agree with the post or are responsible for its content): Dario Amodei, Nick Beckstead, Elie Hassenfeld, Alexander Kruel, Tim Ogden, John Salvatier, Jonah Sinick, Cari Tuna, Stephanie Wykstra.

Comments (1270)

Comment author: MarkusRamikin 10 May 2012 03:47:35PM *  2 points [-]

Not a big deal, but for me your "more" links don't seem to be doing anything. Firefox 12 here.

EDIT: Yup, it's fixed. :)

Comment author: RobertLumley 10 May 2012 04:02:25PM *  0 points [-]

They don't work for me in Chrome 18.

Edit: I didn't think anchor tags were possible in LW posts, but I could be completely off on this. At least, I've never seen them before.

Comment author: gwern 10 May 2012 06:09:54PM 0 points [-]

Anchor tags are possible in LW, but they require additional work. (The only way I know of is editing the raw HTML.)

Comment author: gwern 10 May 2012 04:05:56PM 0 points [-]

Ditto. The anchors they point to don't seem to exist.

Comment author: HoldenKarnofsky 10 May 2012 04:12:28PM 3 points [-]

Thanks for pointing this out. The links now work, though only from the permalink version of the page (not from the list of new posts).

Comment author: paulfchristiano 10 May 2012 05:16:26PM *  33 points [-]

Thanks for taking the time to express your views quite clearly--I think this post is good for the world (even with a high value on your time and SI's fundraising ability), and that norms encouraging this kind of discussion are a big public good.

I think the explicit objections 1-3 are likely to be addressed satisfactorily (in your judgment) by less than 50,000 words, and that this would provide a good opportunity for SI to present sharper versions of the core arguments---part of the problem with existing materials is certainly that it is difficult and unrewarding to respond to a nebulous and shifting cloud of objections. A lot of what you currently view as disagreements with SI's views may get shifted to doubts about SI being the right organization to back, which probably won't get resolved by 50,000 words.

Comment author: taw 10 May 2012 06:04:43PM 3 points [-]

Existential risk reduction is a very worthy cause. As far as I can tell there are a few serious efforts: they address scenarios which, on an outside view, have non-negligible chances, and in many of those scenarios the efforts make a non-negligible difference to the outcome.

Such efforts are:

  • asteroid tracking
  • seed vaults
  • development of various ways to deal with potential pandemics (early tracking systems, drugs etc.) - this actually overlaps with "normal" medicine a lot
  • arguably, global warming prevention is a borderline case, since there is a tiny chance of massive positive feedback loops that would make Earth nearly uninhabitable. Modern climate science believes these chances to be tiny, but all existential-risk chances are tiny.

That's about the entire list I'm aware of (are there any others?)

And then there's a huge number of efforts which claim to address existential risk, but where either the theories behind the risk they concern themselves with, or the theories behind why their efforts are likely to help, rest on assumptions not shared by the vast majority of competent people.

All FAI-related work suffers from both of these problems - the risk is not based on any established science, and the proposed answer is even less grounded in reality. If it suffered from only one of these problems it might be fixable, but as far as I can tell it is extremely unlikely ever to join the category of serious efforts.

The best claim those non-serious efforts can make is that (tiny chance the risk is real) * (tiny chance the organization will make a difference) * (huge stakes) is still a big number, but that's not a terribly convincing argument.

I'm under the impression that we're doing far less than we could with these serious efforts, and that we haven't really identified everything that can be addressed by such efforts. We should focus there (and on a lot of things which are not related to existential risk).

Comment author: RomeoStevens 10 May 2012 08:31:43PM 1 point [-]

Nuclear holocaust. Biological holocaust. Supereruptions whose ash blocks significant amounts of sunlight.

Comment author: taw 10 May 2012 10:07:47PM 4 points [-]

I understand that global thermonuclear war could cause serious damage, but I'm not aware of any credible efforts that can prove they're moving things in the right direction.

What do you mean by "biological holocaust"?

Supereruptions surely follow some kind of power law, and as far as I can tell (and as extrapolation from that power law suggests), they don't get anywhere remotely near the level of destroying all life on Earth.

And we sure know how to heat the Earth significantly in no time - just release some of these into the atmosphere. It would only raise temperature, not restore sunlight, so food production and the like would still be affected, but we already produce far more food per capita than we need to feed everyone, so even a pretty big reduction wouldn't come anywhere near compromising food security for the majority of people, let alone threatening to kill everyone.

Comment author: RomeoStevens 10 May 2012 10:13:04PM 1 point [-]

pandemics, man-made or natural.

Comment author: taw 10 May 2012 10:22:43PM 4 points [-]

Yeah, I've mentioned pandemics already.

I'm not terribly willing to treat them as an "existential" risk, since countless pandemics already happened and for natural reasons they never actually kill the entire population.

And how well we dealt with SARS is a good data point suggesting that pandemics might actually be under control now. At the least, we should have far more confidence in our ability to deal with pandemics than in our ability to deal with just about any other existential threat.

And one nice side effect of just plain old medicine is reduction of this existential risk, even without any efforts specifically towards handling existential risk. Every antibiotic, every antiviral, every new way of keeping patients alive longer, every diagnostic improvement, every improvement in hygiene in poor countries etc. - they all make pandemics less likely and more manageable.

Comment author: RomeoStevens 10 May 2012 10:56:36PM 0 points [-]

Oh, I somehow skipped seeing that in the OP. I don't think our ability to deal with mundane bugs has much transferability to our ability to deal with super bugs.

Comment author: taw 11 May 2012 12:21:24AM 2 points [-]

There's really no such thing as a "super bug". All organisms follow the same constraints of biology and epidemiology. If there were some magical "super bug", it would infect everything of any remotely compatible species, rather than being constrained to one species and a small subset of cells within it.

We might not have any drugs ready for a particular infection, but we had none for SARS either - it was extremely infectious and extremely deadly, and things worked out perfectly fine in the end. We have tools like quarantine and detection which work against any disease, known or unknown.

Medicine has made massive progress since then - mass sequencing of pathogen genomes for quick reaction times is now far more practical, and we might soon even get broad-spectrum antivirals.

And we've already eradicated two diseases (smallpox, rinderpest), with two more very close to eradication (polio, dracunculiasis) - and it's not as though anybody intends to stop the total at four. We'll keep eradicating diseases, even if each attempt takes a decade or two. Every time we manage that, there's one less source of potential pandemic.

I cannot really imagine how it could be going better than that.

This doesn't fully apply to hypothetical man-made pandemics, but currently we don't really know how to make such a thing (the best we can do is modify an existing disease to be a bit nastier; creating diseases de novo is far beyond our capabilities), nobody has any particular desire to do so, and any broad-spectrum countermeasures we develop against natural diseases will likely at least partly apply to man-made diseases as well.

Comment author: RomeoStevens 11 May 2012 12:26:18AM 3 points [-]

AFAIK nothing precludes extremely lethal bugs with long incubation periods. As for "nobody has any particular desire to", I hope you are right.

Comment author: taw 11 May 2012 06:55:31AM 2 points [-]

Except that they wouldn't be particularly lethal.

If 100% of humans had HIV, it would probably make most countries disregard patent laws on a few drugs, and human life spans would shorten by 5-10 years on average.

This should keep things in perspective.

Comment author: JoshuaZ 11 May 2012 02:06:56AM 5 points [-]

I'm not terribly willing to treat them as an "existential" risk, since countless pandemics already happened and for natural reasons they never actually kill the entire population.

Most major pandemics have occurred before modern transport was common. The presence of easy air travel makes a serious pandemic more problematic. And in fact if one looks at emergent diseases in the last sixty years, such as HIV, one sees that they are effectively taking advantage of the ease of transport in the modern world.

Comment author: taw 11 May 2012 06:59:07AM 2 points [-]

HIV emerged before modern medicine developed. It was discovered in 1981 - almost prehistory by medical standards - but was actually transferred to humans sometime in the late 19th century. It wreaks the most havoc in places that are extremely far from modern medicine; in developed countries HIV is a fairly minor problem.

SARS is a much better example of a new disease and how modern medicine can deal with it.

Comment author: JoshuaZ 11 May 2012 02:30:54PM 1 point [-]

Even in Africa, HIV has taken advantage of modern transport. Migrant workers are a major cause of HIV spread in sub-Saharan Africa. This has advanced to the point where new road building projects think about what they will do to disease transmission. These laborers and the like aren't just walking- the possibility of such migrant labor is connected to the fact that even in the developing world, buses exist.

Comment author: [deleted] 11 May 2012 04:34:16AM 3 points [-]

I understand that global thermonuclear war could cause serious damage, but I'm not aware of any credible efforts that can prove they're moving things in the right direction.

http://en.wikipedia.org/wiki/New_START

This stuff, as slow and grinding as it is, does make a difference.

Comment author: taw 11 May 2012 07:03:27AM 4 points [-]

There's no particular reason to believe this makes global thermonuclear war any less likely. Russia and the United States aren't particularly likely to start a global thermonuclear war anytime soon, and in the longer run any major developed country, if it wanted, could build a nuclear arsenal sufficient to make a continent uninhabitable within a few years.

There's also the argument that mutually assured destruction was somehow stabilizing and prevented nuclear warfare - the only use of nuclear weapons so far happened when the other side had no way to retaliate. I'm quite neutral on this - I'm unwilling to say that nuclear arms reductions either increase or decrease the risk of global war (which would eventually turn nuclear or otherwise very nasty).

Comment author: Rain 10 May 2012 09:10:51PM *  3 points [-]
Comment author: taw 10 May 2012 10:16:44PM 2 points [-]

Most entries on the list are not quantifiable even approximately, to within an order of magnitude. Of those that are (which is pretty much only "risks from nature" in Bostrom's system), many are still bad candidates for putting significant effort into, because:

  • we have few ways to deal with them (like nearby supernova explosions)
  • we have a lot of time and future will be better equipped to deal with them (like eventual demise of Sun)
  • they don't actually seem to get anywhere near civilization-threatening levels (like volcanoes)

About the only new risk I see on the list which can and should be dealt with is having some backup plans for massive solar flares, but I'm not sure what we can do about it other than putting some extra money into astrophysics departments so they can figure things out better and give us better estimates.

Comment author: JGWeissman 10 May 2012 06:16:19PM 2 points [-]

Regarding tool versus agent AGIs, I think the desired end game is still a Friendly agent AGI. I am open to tool AIs being useful on the path to building such an agent. Similar ideas advocated by SI include the use of automated theorem provers in formally proving Friendliness, and creating a seed AI to compute the Coherent Extrapolated Volition of humanity and build an FAI with the appropriate utility function.

Comment author: shminux 10 May 2012 06:30:00PM *  58 points [-]

Wow, I'm blown away by Holden Karnofsky, based on this post alone. His writing is eloquent, non-confrontational and rational. It shows that he spent a lot of time constructing mental models of his audience and anticipated its reaction. Additionally, his intelligence/ego ratio appears to be through the roof. He must have learned a lot since the infamous astroturfing incident. This is the (type of) person SI desperately needs to hire.

Emotions out of the way, it looks like the tool/agent distinction is the main theoretical issue. Fortunately, it is much easier than the general FAI one. Specifically, to test the SI assertion that, paraphrasing Arthur C. Clarke,

Any sufficiently advanced tool is indistinguishable from an agent.

one ought to formulate and prove this as a theorem, and present it for review and improvement to the domain experts (the domain being math and theoretical computer science). If such a proof is constructed, it can then be further examined and potentially tightened, giving new insights to the mission of averting the existential risk from intelligence explosion.

If such a proof cannot be found, this will lend further weight to HK's assertion that SI appears to be poorly qualified to address its core mission.

Comment author: MarkusRamikin 10 May 2012 07:59:51PM *  20 points [-]

Wow, I'm blown away by Holden Karnofsky, based on this post alone. His writing is eloquent, non-confrontational and rational. It shows that he spent a lot of time constructing mental models of his audience and anticipated its reaction. Additionally, his intelligence/ego ratio appears to be through the roof.

Agreed. I normally try not to post empty "me-too" replies; the upvote button is there for a reason. But now I feel strongly enough about it that I will: I'm very impressed with the good will and effort and apparent potential for intelligent conversation in HoldenKarnofsky's post.

Now I'm really curious as to where things will go from here. With how limited my understanding of AI issues is, I doubt a response from me would be worth HoldenKarnofsky's time to read, so I'll leave that to my betters instead of adding more noise. But yeah. Seeing SI ideas challenged in such a positive, constructive way really got my attention. Looking forward to the official response, whatever it might be.

Comment author: [deleted] 11 May 2012 08:34:24AM 5 points [-]

Agreed. I normally try not to post empty "me-too" replies; the upvote button is there for a reason. But now I feel strongly enough about it that I will: I'm very impressed with the good will and effort and apparent potential for intelligent conversation in HoldenKarnofsky's post.

“the good will and effort and apparent potential for intelligent conversation” is more information than an upvote, IMO.

Comment author: MarkusRamikin 11 May 2012 09:00:28AM *  2 points [-]

Right, I just meant shminux said more or less the same thing before me. So normally I would have just upvoted his comment.

Comment author: Bugmaster 10 May 2012 08:18:55PM 0 points [-]

I also find it likely that certain practical problems would be prohibitively difficult (if not outright impossible) to solve without an AGI of some sort. Fluent machine translation seems to be one of these problems, for example.

Comment author: mwaser 10 May 2012 10:07:04PM 1 point [-]

If it is true (i.e. if a proof can be found) that "Any sufficiently advanced tool is indistinguishable from an agent", then any RPOP will automatically become indistinguishable from an agent once it has self-improved past our comprehension point.

This would seem to argue against Yudkowsky's contention that the term RPOP is more accurate than "Artificial Intelligence" or "superintelligence".

Comment author: shminux 10 May 2012 10:37:25PM *  2 points [-]

First, I am not fond of the term RPOP, because it constrains the space of possible intelligences to optimizers. Humans are reasonably intelligent, yet we are not consistent optimizers. Nor are current domain-specific AIs: they have bugs that often prevent them from performing optimization consistently and predictably. That aside, I don't see how your second premise follows from the first. Just because RPOP is a subset of AI, and so would be a subject of such a theorem, does not affect in any way the (non)validity of EY's contention.

Comment author: Alejandro1 10 May 2012 11:40:53PM 2 points [-]

I don't understand; isn't Holden's point precisely that a tool AI is not properly described as an optimization process? Google Maps isn't optimizing anything in a non-trivial sense, any more than a shovel is.

Comment author: TheOtherDave 11 May 2012 12:13:51AM 1 point [-]

Honestly, this whole tool/agent distinction seems tangential to me.

Consider two systems, S1 and S2.

S1 comprises the following elements: a) a tool T, which when used by a person to achieve some goal G, can efficiently achieve G
b) a person P, who uses T to efficiently achieve G.

S2 comprises a non-person agent A which achieves G efficiently.

I agree that A is an agent and T is not an agent, and I agree that T is a tool, and whether A is a tool seems a question not worth asking. But I don't quite see why I should prefer S1 to S2.

Surely the important question is whether I endorse G?

Comment author: dspeyer 11 May 2012 02:08:12AM *  2 points [-]

A tool+human differs from a pure AI agent in two important ways:

  • The human (probably) already has naturally-evolved morality, sparing us the very hard problem of formalizing that.

  • We can arrange for (almost) everyone to have access to the tool, allowing tooled humans to counterbalance each other.

Comment author: TheOtherDave 11 May 2012 03:13:38AM 0 points [-]

Well, I certainly agree that both of those things are true.

And it might be that human-level evolved moral behavior is the best we can do... I don't know. It would surprise me, but it might be true.

That said... given how unreliable such behavior is, if human-level evolved moral behavior even approximates the best we can do, it seems likely that I would do best to work towards neither T nor A ever achieving the level of optimizing power we're talking about here.

Comment author: dspeyer 11 May 2012 03:23:45AM 4 points [-]

Humanity isn't that bad. Remember that the world we live in is pretty much the way humans made it, mostly deliberately.

But my main point was that existing humanity bypasses the very hard did-you-code-what-you-meant-to problem.

Comment author: TheOtherDave 11 May 2012 03:33:30AM 0 points [-]

I agree with that point.

Comment author: abramdemski 11 May 2012 04:59:58AM 3 points [-]

My understanding of Holden's argument was that powerful optimization processes can be run in either tool-mode or agent-mode.

For example, Google maps optimizes routes, but returns the result with alternatives and options for editing, in "tool mode".

Comment author: badger 10 May 2012 11:28:21PM 2 points [-]

If the tool/agent distinction exists for sufficiently powerful AI, then a theory of friendliness might not be strictly necessary, but still highly prudent.

Going from a tool-AI to an agent-AI is a relatively simple step of the entire process. If meaningful guarantees of friendliness turn out to be impossible, then security comes down to no one attempting to make an agent-AI when strong enough tool-AIs are available. Agency should be kept to a minimum, even with a theory of friendliness in hand, as Holden argues in objection 1. Guarantees are safeguards against the possibility of agency rather than a green light.

Comment author: Eliezer_Yudkowsky 11 May 2012 12:06:50AM 29 points [-]

Any sufficiently advanced tool is indistinguishable from an agent.

I shall quickly remark that I, myself, do not believe this to be true.

Comment author: shminux 11 May 2012 12:22:18AM *  6 points [-]

Then objection 2 seems to hold:

AGI running in tool mode could be extraordinarily useful but far more safe than an AGI running in agent mode

unless I misunderstand your point severely (it happened once or twice before).

Comment author: TheOtherDave 11 May 2012 12:37:32AM *  0 points [-]

If my comment here correctly captures what is meant by "tool mode" and "agent mode", then it seems to follow that AGI running in tool mode is no safer than the person using it.

If that's the case, then an AGI running in tool mode is safer than an AGI running in agent mode if and only if agent mode is less trustworthy than whatever person ends up using the tool.

Are you assuming that's true?

Comment author: shminux 11 May 2012 02:02:15AM 2 points [-]

What you presented there (and here) is another theorem, something that should be proved (and published, if it hasn't been yet). If true, this gives an estimate on how dangerous a non-agent AGI can be. And yes, since we have had a lot of time to study people and no time at all to study AGI, I am guessing that an AGI is potentially much more dangerous, because so little is known. Or at least that seems to be the whole point of the goal of developing provably friendly AI.

Comment author: [deleted] 11 May 2012 08:32:38AM 0 points [-]

What you presented there (and here) is another theorem

What? It sounds like a common-sensical¹ statement about tools in general and human nature, but not at all like something which could feasibly be expressed in mathematical form.

Footnote:

  1. This doesn't mean it's necessarily true, though.
Comment author: scav 11 May 2012 09:43:40AM 0 points [-]

No, because a person using a dangerous tool is still just a person, with limited speed of cognition, limited lifespan, and no capacity for unlimited self-modification.

A crazy dictator with a super-capable tool AI that tells him the best strategy to take over the world is still susceptible to assassination, and his plan no matter how clever cannot unfold faster than his victims are able to notice and react to it.

Comment author: TheOtherDave 11 May 2012 12:19:16PM 0 points [-]

Tool != Oracle.

At least, not by my understanding of "tool".

My understanding of a supercapable tool AI is one that takes over the world if a crazy dictator directs it to, just like my understanding of a can opener tool is one that opens a can at my direction, rather than one that gives me directions on how to open a can.

Presumably it also augments the dictator's lifespan, cognition, etc. if she asks, insofar as it's capable of doing so.

More generally, my understanding of these concepts is that the only capability that a tool AI lacks that an agent AI has is the capability of choosing goals to implement. So, if we're assuming that an agent AI would be capable of unlimited self-modification in pursuit of its own goals, I conclude that a corresponding tool AI is capable of unlimited self-modification in pursuit of its agent's goals. It follows that assuming that a tool AI is not capable of augmenting its human agent in accordance with its human agent's direction is not safe.

(I should note that I consider a capacity for unlimited self-improvement relatively unlikely, for both tool and agent AIs. But that's beside my point here.)

Agreed that a crazy dictator with a tool that will take over the world for her is safer than an agent capable of taking over the world, if only because the possibility exists that the tool can be taken away from her and repurposed, and it might not occur to her to instruct it to prevent anyone else from taking it or using it.

I stand by my statement that such a tool is no safer than the dictator herself, and that an AGI running in such a tool mode is safer than that AGI running in agent mode only if the agent mode is less trustworthy than the crazy dictator.

Comment author: Eliezer_Yudkowsky 11 May 2012 01:55:11AM 38 points [-]

It's complicated. A reply that's true enough and in the spirit of your original statement, is "Something going wrong with a sufficiently advanced AI that was intended as a 'tool' is mostly indistinguishable from something going wrong with a sufficiently advanced AI that was intended as an 'agent', because math-with-the-wrong-shape is math-with-the-wrong-shape no matter what sort of English labels like 'tool' or 'agent' you slap on it, and despite how it looks from outside using English, correctly shaping math for a 'tool' isn't much easier even if it "sounds safer" in English." That doesn't get into the real depths of the problem, but it's a start. I also don't mean to completely deny the existence of a safety differential - this is a complicated discussion, not a simple one - but I do mean to imply that if Marcus Hutter designs a 'tool' AI, it automatically kills him just like AIXI does, and Marcus Hutter is unusually smart rather than unusually stupid but still lacks the "Most math kills you, safe math is rare and hard" outlook that is implicitly denied by the idea that once you're trying to design a tool, safe math gets easier somehow. This is much the same problem as with the Oracle outlook - someone says something that sounds safe in English but the problem of correctly-shaped-math doesn't get very much easier.

Comment author: lukeprog 11 May 2012 02:38:32AM *  9 points [-]

Though it's not as detailed and technical as many would like, I'll point readers to this bit of related reading, one of my favorites:

Yudkowsky (2011). Complex value systems are required to realize valuable futures.

Comment author: shminux 11 May 2012 04:41:12AM *  1 point [-]

Not being a domain expert, I do not pretend to understand all the complexities. My point was that either you can prove that tools are as dangerous as agents (because mathematically they are (isomorphic to) agents), or HK's Objection 2 holds. I see no other alternative...

Comment author: abramdemski 11 May 2012 04:53:27AM 6 points [-]

but I do mean to imply that if Marcus Hutter designs a 'tool' AI, it automatically kills him just like AIXI does

Why? Or, rather: Where do you object to the argument by Holden? (Given a query, the tool-AI returns an answer with a justification, so the plan for "cure cancer" can be checked to make sure it does not do so by killing or badly altering humans.)

Comment author: FeepingCreature 11 May 2012 12:27:08PM 4 points [-]

One trivial, if incomplete, answer is that to be effective, the Oracle AI needs to be able to answer the question "how do we build a better Oracle AI?" To define "better" in that question in a way that causes our Oracle to output a new design consistent with all the safeties we built into the original, it needs to understand the intent behind those safeties just as much as an agent-AI would.

Comment author: Cyan 11 May 2012 05:12:21PM *  15 points [-]

The real danger of Oracle AI, if I understand it correctly, is the nasty combination of (i) by definition, an Oracle AI has an implicit drive to issue predictions most likely to be correct according to its model, and (ii) a sufficiently powerful Oracle AI can accurately model the effect of issuing various predictions. End result: it issues powerfully self-fulfilling prophecies without regard for human values. Also, depending on how it's designed, it can influence the questions to be asked of it in the future so as to be as accurate as possible, again without regard for human values.

Comment author: ciphergoth 11 May 2012 05:34:49PM 7 points [-]

My understanding of an Oracle AI is that when answering any given question, that question consumes the whole of its utility function, so it has no motivation to influence future questions. However the primary risk you set out seems accurate. Countermeasures have been proposed, such as asking for an accurate prediction for the case where a random event causes the prediction to be discarded, but in that instance it knows that the question will be asked again of a future instance of itself.
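The countermeasure mentioned (scoring the oracle only on predictions that a random event discards before anyone reads them) can be sketched in a few lines. This is an illustrative toy, not a worked-out proposal, and all names are hypothetical:

```python
import random

def score_episode(predict, observe_outcome, erasure_prob, rng):
    # The oracle answers: "what will happen if the prediction is discarded?"
    prediction = predict()
    if rng.random() < erasure_prob:
        # Erased episode: nobody saw the prediction, so it could not have
        # influenced the world. Score it against the untouched outcome.
        outcome = observe_outcome(prediction_seen=False)
        return abs(prediction - outcome)  # lower is better
    # Non-erased episode: humans read and act on the prediction, but the
    # oracle receives no training signal, so it gains nothing by issuing
    # self-fulfilling prophecies.
    observe_outcome(prediction_seen=True)
    return None
```

As noted above, this only removes the direct incentive toward self-fulfilling prophecies; it says nothing about influence routed through future instances of the same oracle.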

Comment author: [deleted] 11 May 2012 08:22:12AM 27 points [-]

This sounds like it'd be a good idea to write a top-level post about it.

Comment author: Viliam_Bur 11 May 2012 03:07:19PM 5 points [-]

What exactly is the difference between a "tool" and an "agent", if we taboo the words?

My definition would be that "agent" has their own goals / utility functions (speaking about human agents, those goals / utility functions are set by evolution), while "tool" has a goal / utility function set by someone else. This distinction may be reasonable on a human level, "human X optimizing for human X's utility" versus "human X optimizing for human Y's utility", but on a machine level, what exactly is the difference between a "tool" that is ordered to reach a goal / optimize a utility function, and an "agent" programmed with the same goal / utility function?

Am I using a bad definition that misses something important? Or is there anything that prevents an "agent" from being reduced to a "tool" (perhaps a misconstructed tool) of the forces that created it? Or is it that all "agents" are "tools", but not all "tools" are "agents", because... why?
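To make the tabooed terms concrete, here is a minimal sketch (all names hypothetical, and deliberately toy-scale): on this definition the two modes share exactly the same optimization step and differ only in whether the operator or the system executes the resulting plan.

```python
from dataclasses import dataclass

@dataclass
class ToyWorldModel:
    plans: dict    # goal -> list of candidate plans (each a list of actions)
    scores: dict   # tuple(plan) -> utility

    def candidate_plans(self, goal):
        return self.plans[goal]

    def utility(self, plan):
        return self.scores[tuple(plan)]

def optimize(goal, model):
    # Both modes share exactly this optimization step.
    return max(model.candidate_plans(goal), key=model.utility)

def run_as_tool(goal, model):
    # "Tool mode": return the best plan for a human to inspect and carry out.
    return optimize(goal, model)

def run_as_agent(goal, model, execute):
    # "Agent mode": the same plan, but the system acts on it directly.
    plan = optimize(goal, model)
    for action in plan:
        execute(action)
    return plan
```

On this toy picture, an "agent" is just a "tool" with its output wired to actuators, which is one way of cashing out the proposed reduction.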

Comment author: dspeyer 11 May 2012 02:47:26AM 6 points [-]

Any sufficiently advanced tool is indistinguishable from [an] agent.

Let's see if we can use concreteness to reason about this a little more thoroughly...

As I understand it, the nightmare looks something like this. I ask Google SuperMaps for the fastest route from NYC to Albany. It recognizes that computing this requires traffic information, so it diverts several self-driving cars to collect real-time data. Those cars run over pedestrians who were irrelevant to my query.

The obvious fix: forbid SuperMaps to alter anything outside of its own scratch data. It works with the data already gathered. Later a Google engineer might ask it what data would be more useful, or what courses of action might cheaply gather that data, but the engineer decides what if anything to actually do.

This superficially resembles a box, but there's no actual box involved. The AI's own code forbids plans like that.

But that's for a question-answering tool. Let's take another scenario:

I tell my super-intelligent car to take me to Albany as fast as possible. It sends emotionally manipulative emails to anyone else who would otherwise be on the road encouraging them to stay home.

I don't see an obvious fix here.

So the short answer seems to be that it matters what the tool is for. A purely question-answering tool would be extremely useful, but not as useful as a general purpose one.

Could humans with an oracular super-AI police the development and deployment of active super-AIs?

Comment author: shminux 11 May 2012 04:49:57AM 2 points [-]

I tell my super-intelligent car to take me to Albany as fast as possible. It sends emotionally manipulative emails to anyone else who would otherwise be on the road encouraging them to stay home.

I believe that HK's post explicitly characterizes anything active like this as having agency.

Comment author: Will_Sawin 11 May 2012 06:21:55AM 9 points [-]

I think the correct objection is something you can't quite see in Google Maps. If you program an AI to do nothing but output directions, it will do nothing but output directions. If those directions are for driving, you're probably fine. If those directions are big and complicated plans for something important, which you follow without really understanding why you're doing what you're doing (and this is where most of the benefits of working with an AGI will show up), then you could unknowingly take over the world using a sufficiently clever scheme.

Also note that it would be a lot easier for the AI to pull this off if you let it tell you how to improve its own design. If recursively self-improving AI blows other AI out of the water, then tool AI is probably not safe unless it is made ineffective.

This does actually seem like it would raise the bar of intelligence needed to take over the world somewhat. It is unclear how much. The topic seems to me to be worthy of further study/discussion, but not (at least not obviously) a threat to the core of SIAI's mission.

Comment author: Viliam_Bur 11 May 2012 03:16:32PM *  2 points [-]

If those directions are big and complicated plans for something important, which you follow without really understanding why you're doing what you're doing (and this is where most of the benefits of working with an AGI will show up), then you could unknowingly take over the world using a sufficiently clever scheme.

It also helps that Google Maps does not have general intelligence, so it does not include the user's reactions to its output, the user's consequent actions in the real world, etc. as variables in its model which may influence the quality of the solution, and which therefore can (and should) be optimized (within constraints given by the user's psychology, etc.), if possible.

Shortly: Google Maps does not manipulate you, because it does not see you.

Comment author: drnickbone 11 May 2012 09:36:18AM 4 points [-]

This was my thought as well: an automated vehicle is in "agent" mode.

The example also demonstrates why an AI in agent mode is likely to be more useful (in many cases) than an AI in tool mode. Compare using Google maps to find a route to the airport versus just jumping into a taxi cab and saying "Take me to the airport". Since agent-mode AI has uses, it is likely to be developed.

Comment author: abramdemski 11 May 2012 05:36:33AM *  1 point [-]

I tell my super-intelligent car to take me to Albany as fast as possible. It sends emotionally manipulative emails to anyone else who would otherwise be on the road encouraging them to stay home.

Then it's running in agent mode? My impression was that a tool-mode system presents you with a plan, but takes no actions. So all tool-mode systems are basically question-answering systems.

Perhaps we can meaningfully extend the distinction to some kinds of "semi-autonomous" tools, but that would be a different idea, wouldn't it?

(Edit) After reading more comments, "a different idea" which seems to match this kind of desire... http://lesswrong.com/lw/cbs/thoughts_on_the_singularity_institute_si/6jys

Comment author: David_Gerard 11 May 2012 01:57:05PM *  14 points [-]

Then it's running in agent mode? My impression was that a tool-mode system presents you with a plan, but takes no actions. So all tool-mode systems are basically question-answering systems.

I'm a sysadmin. When I want to get something done, I routinely come up with something that answers the question, and when it does that reliably I give it the power to do stuff on as little human input as possible. Often in daemon mode, to absolutely minimise how much it needs to bug me. Question-answerer->tool->agent is a natural progression just in process automation. (And this is why they're called "daemons".)

It's only long experience and many errors that's taught me how to do this such that the created agents won't crap all over everything. Even then I still get surprises.
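That progression can be sketched in a few lines (a toy illustration; all names and behavior hypothetical): the same pure answer-producing function, first gated by human approval, then wrapped in a daemon loop that acts on its own answers with no human in between.

```python
import queue

def answer(question):
    # Question-answerer stage: a pure function with no side effects.
    # Stub for illustration.
    return f"proposed fix for {question!r}"

def tool(question, approve):
    # Tool stage: a human approves each answer before anything happens.
    fix = answer(question)
    return fix if approve(fix) else None

def daemon(events, act, max_events=100):
    # Agent/daemon stage: the same answers, applied automatically.
    # The human is out of the loop; only long experience and many errors
    # keep such agents from crapping all over everything.
    handled = 0
    while handled < max_events:
        try:
            event = events.get_nowait()
        except queue.Empty:
            break
        act(answer(event))
        handled += 1
    return handled
```

Each stage reuses the previous one unchanged; what moves is only how much of the act-on-the-answer step is delegated to the machine.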

Comment author: private_messaging 11 May 2012 03:21:42PM *  1 point [-]

Well, do your 'agents' build a model of the world whose fidelity they improve? I don't think those really are agents in the AI sense, and definitely not in the self-improvement sense.

Comment author: David_Gerard 11 May 2012 03:28:55PM *  10 points [-]

They may act according to various parameters they read in from the system environment. I expect they will be developed to a level of complication where they have something that could reasonably be termed a model of the world. The present approach is closer to perceptual control theory, where the sysadmin has the model and PCT is part of the implementation. 'Cos it's more predictable to the mere human designer.

Capacity for self-improvement is an entirely different thing, and I can't see a sysadmin wanting that - the sysadmin would run any such improvements themselves, one at a time. (Semi-automated code refactoring, for example.) The whole point is to automate processes the sysadmin already understands but doesn't want to do by hand - any sysadmin's job being to automate themselves out of the loop, because there's always more work to do. (Because even in the future, nothing works.)

I would be unsurprised if someone markets a self-improving system for this purpose. For it to go FOOM, it also needs to invent new optimisations, which is presently a bit difficult.

Edit: And even a mere daemon-like automated tool can do stuff a lot of people regard as unFriendly, e.g. high frequency trading algorithms.

Comment author: TheOtherDave 11 May 2012 02:12:03PM *  1 point [-]

Then it's running in agent mode? My impression was that a tool-mode system presents you with a plan, but takes no actions. So all tool-mode systems are basically question-answering systems.

My own impression differs.

It becomes increasingly clear that "tool" in this context is sufficiently subject to different definitions that it's not a particularly useful term.

Comment author: private_messaging 11 May 2012 07:56:39AM 3 points [-]

Any sufficiently advanced tool is indistinguishable from an agent.

I do not think this is even true.

Comment author: David_Gerard 11 May 2012 02:00:03PM *  3 points [-]

I routinely try to turn sufficiently reliable tools into agents wherever possible, per this comment.

I suppose we could use a definition of "agent" that implied greater autonomy in setting its own goals. But there are useful definitions that don't.

Comment author: [deleted] 11 May 2012 08:13:25AM *  4 points [-]

Any sufficiently advanced tool is indistinguishable from an agent.

I have no strong intuition about whether this is true or not, but I do intuit that if it's true, the value of sufficiently for which it's true is so high it'd be nearly impossible to achieve it accidentally.

(On the other hand the blind idiot god did ‘accidentally’ make tools into agents when making humans, so... But after all that only happened once in hundreds of millions of years of ‘attempts’.)

Comment author: othercriteria 11 May 2012 01:04:24PM 3 points [-]

the blind idiot god did ‘accidentally’ make tools into agents when making humans, so... But after all that only happened once in hundreds of millions of years of ‘attempts’.

This seems like a very valuable point. In that direction, we also have the tens of thousands of cancers that form every day, military coups, strikes, slave revolts, cases of regulatory capture, etc.

Comment author: Richard_Loosemore 10 May 2012 07:11:15PM 1 point [-]

Holden, I think your assessment is accurate ... but I would venture to say that it does not go far enough.

My own experience with SI, and my background, might be relevant here. I am a member of the Math/Physical Science faculty at Wells College, in Upstate NY. I also have had a parallel career as a cognitive scientist/AI researcher, with several publications in the AGI field, including the opening chapter (coauthored with Ben Goertzel) in a forthcoming Springer book about the Singularity.

I have long complained about SI's narrow and obsessive focus on the "utility function" aspect of AI -- simply put, SI assumes that future superintelligent systems will be driven by certain classes of mechanism that are still only theoretical, and which are very likely to be superseded by other kinds of mechanism that have very different properties. Even worse, the "utility function" mechanism favored by SI is quite likely to be so unstable that it will never allow an AI to achieve any kind of human-level intelligence, never mind the kind of superintelligence that would be threatening.

Perhaps most important of all, though, is the fact that the alternative motivation mechanism might (and notice that I am being cautious here: might) lead to systems that are extremely stable, which means both friendly and safe.

Taken in isolation, these thoughts and arguments might amount to nothing more than a minor addition to the points that you make above. However, my experience with SI is that when I tried to raise these concerns back in 2005/2006 I was subjected to a series of attacks that culminated in a tirade of slanderous denunciations from the founder of SI, Eliezer Yudkowsky. After delivering this tirade, Yudkowsky then banned me from the discussion forum that he controlled, and instructed others on that forum that discussion about me was henceforth forbidden.

Since that time I have found that when I partake in discussions on AGI topics in a context where SI supporters are present, I am frequently subjected to abusive personal attacks in which reference is made to Yudkowsky's earlier outburst. This activity is now so common that when I occasionally post comments here, my remarks are very quickly voted down below a threshold that makes them virtually invisible. (A fate that will probably apply immediately to this very comment).

I would say that, far from deserving support, SI should be considered a cult-like community in which dissent is ruthlessly suppressed in order to exaggerate the point of view of SI's founders and controllers, regardless of the scientific merits of those views, or of the dissenting opinions.

Comment author: shminux 10 May 2012 07:23:16PM *  13 points [-]

I would say that, far from deserving support, SI should be considered a cult-like community in which dissent is ruthlessly suppressed in order to exaggerate the point of view of SI's founders and controllers, regardless of the scientific merits of those views, or of the dissenting opinions.

This is a very strong statement. Have you allowed for the possibility that your current judgement might be clouded by events that transpired some six years ago?

Comment author: Dolores1984 10 May 2012 07:30:23PM 9 points [-]

I myself employ a very strong heuristic, from years of trolling the internet: when a user joins a forum and complains about an out-of-character and strongly personal persecution by the moderation staff in the past, there is virtually always more to the story when you look into it.

Comment author: Richard_Loosemore 10 May 2012 08:07:06PM 6 points [-]

Indeed, Dolores, that is an empirically sound strategy, if used with caution.

My own experience, however, is that people who do that can usually be googled quickly, and are often found to be unqualified cranks of one persuasion or another. People with more anger than self-control.

But that is not always the case. Recently, for example, a woman friended me on Facebook and then posted numerous diatribes against a respected academic acquaintance of mine, accusing him of raping her and fathering her child. These posts were quite blood-curdling. And their target appeared quite the most innocent guy you could imagine. Very difficult to make a judgement. However, about a month ago the guy suddenly came out and made a full and embarrassingly frank admission of guilt. It was an astonishing episode. But it was an instance of one of those rare occasions when the person (the woman in this case) turned out to be perfectly justified.

I am helpless to convince you. All I can do is point to my own qualifications and standing. I am no lone crank crying in the wilderness. I teach Math, Physics and Cognitive Neuroscience at the undergraduate level, and I have coauthored a paper with one of the AGI field's leading exponents (Ben Goertzel), in a book about the Singularity that was at one point (maybe not anymore!) slated to be a publishing landmark for the field. You have to make a judgement.

Comment author: steven0461 10 May 2012 08:25:30PM *  9 points [-]

Regardless of who was how much at fault in the SL4 incident, surely you must admit that Yudkowsky's interactions with you were unusually hostile relative to how he generally interacts with critics. I can see how you'd want to place emphasis on those interactions because they involved you personally, but that doesn't make them representative for purposes of judging cultishness or making general claims that "dissent is ruthlessly suppressed".

Comment author: Richard_Loosemore 10 May 2012 10:34:28PM 1 point [-]

Steven, that does make it seem as though the only thing worth complaining about was the "unusually" hostile EY behavior on that occasion, as if it were exceptional, not repeated before or since.

But that is inaccurate. That episode was the culmination of a long sequence of derogatory remarks. So that is what came before.

What came after? I have made a number of attempts to open a dialog on the important issue at hand, which is not the personal conflict but the question of AGI motivation systems. My attempts have been rebuffed. And instead I have been subjected to repeated attacks by SI members.

That would be six years of repeated attacks.

So portraying it as an isolated incident is not factually correct. Which was my point, of course.

Comment author: Rain 10 May 2012 10:36:50PM *  10 points [-]

I'm interested in any compiled papers or articles you wrote about AGI motivation systems, aside from the forthcoming book chapter, which I will read. Do you have any links?

Comment author: MattMahoney 11 May 2012 04:06:33PM 4 points [-]
Comment author: Richard_Loosemore 10 May 2012 08:17:08PM 2 points [-]

shminux, it is of course possible that my current judgement might be clouded by past events ... however, we have to assess the point at which judgements become "clouded" by time (in other words, poor because of confusion or emotion), rather than remaining lessons learned that still apply.

In the time since those events I have found no diminution in the rate at which SI people intervene aggressively in discussions I am having, with the sole purpose of telling everyone that I was banned from Yudkowsky's forum back in 2006.

This most recently happened just a few weeks ago. On that occasion Luke Muehlhauser (no less) took the unusual step of asking me to friend him on Facebook, after which he joined a discussion I was having and made scathing ad hominem comments about me -- which included trying to use the fact of the 2006 episode as a piece of evidence for my lack of credibility -- and then disappeared again. He made no reply when his ad hominem assertions were challenged.

Now: would you consider it to be a matter of clouded judgment on my part when Luke Muehlhauser is still, in 2012, engaging in that kind of attack?

On balance, then, I think my comments come from privileged insight (I am one of the few to have made technical objections to SI's cherished beliefs, and I was given valuable insight into their psychology when I experienced the violent reaction) rather than clouded judgement.

Comment author: shminux 10 May 2012 08:26:01PM 16 points [-]

This most recently happened just a few weeks ago. On that occasion Luke Muehlhauser (no less) took the unusual step of asking me to friend him on Facebook, after which he joined a discussion I was having and made scathing ad hominem comments about me

Sounds serious... Feel free to post a relevant snippet of the discussion, here or elsewhere, so that those interested can judge this event on its merits, and not through your interpretation of it.

Comment author: lukeprog 10 May 2012 08:52:36PM *  29 points [-]

On April 7th, Richard posted to Facebook:

LessWrong has now shown its true mettle. After someone here on FB mentioned a LW discussion of consciousness, I went over there and explained that Eliezer Yudkowsky, in his essay, had completely misunderstood the Zombie Argument given by David Chalmers. I received a mix of critical, thoughtful and sometimes rude replies. But then, all of a sudden, Eliezer took an interest in this old thread again, and in less than 24 hours all of my contributions were relegated to the trash. Funnily enough, David Chalmers himself then appeared and explained that Eliezer had, in fact, completely misunderstood his argument. Chalmers' comments, strangely enough, have NOT been censored. :-)

I replied:

I haven't read the whole discussion, but just so everyone is clear...

Richard's claim that "in less than 24 hours all of my contributions were relegated to the trash" is false.

What happened is that LWers disvalued Richard's comments and downvoted them. Because most users have their preferences set to hide comments with a score of less than -3, these users saw Richard's most-downvoted comments as collapsed by default, with a note reading "comment score below threshold", and a plus symbol you can click to expand the comment and the ensuing thread. This happens regularly even for many LW regulars like Will Newsome.

What happened was not censorship. Richard's comments were not "relegated to the trash." They were downvoted by the community, and not merely because Eliezer "took an interest" in the thread again. I have strongly disagreed with Eliezer on LW before and had my comments massively UP-voted by the community. LessWrong is not a community of mindless Eliezer-drones. It's a community of people who have learned the skills of thinking quantitatively for themselves, which is one reason it can be hard for the community to cooperate to get things done in general.

Chalmers' comments weren't "censored" because (1) nobody's comments on that thread were actually censored, to my knowledge, and (2) the community thought Chalmers' comments were valuable even when they disagreed with them.

Richard, I find your comment to be misleading to the point of being dishonest, similar to the level of dishonesty in the messages that got you banned from the SL4 mailing list: http://www.sl4.org/archive/0608/15895.html

I've appreciated several of the articles you've written for IEET and H+, and I wish you would be more careful with your communications.

As you can see, the point of my comment wasn't to "abuse" Richard, but to explain what actually happened so that readers could compare it to what Richard said had happened.
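The threshold mechanism Luke describes is simple enough to sketch. A minimal illustration follows; this is hypothetical code, not the actual LessWrong implementation, and the function and constant names are invented. The default threshold of −3 is taken from Luke's comment:

```python
# Illustrative sketch of score-threshold collapsing, as described above.
# Hypothetical code; not the actual LessWrong codebase.

DEFAULT_THRESHOLD = -3  # per-user preference; comments scoring below this collapse

def render_comment(score, threshold=DEFAULT_THRESHOLD):
    """Return how a comment is displayed. Collapsed comments are hidden
    behind a 'comment score below threshold' note, not deleted: any
    reader can click the plus symbol to expand them."""
    if score < threshold:
        return "comment score below threshold [+]"  # expandable, still readable
    return "visible"

assert render_comment(-5) == "comment score below threshold [+]"
assert render_comment(-3) == "visible"
assert render_comment(4) == "visible"
```

The relevant distinction, on Luke's account, is that a collapsed comment remains one click away from any reader, which is what separates community downvoting from removal.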

At that point, Abram Demski commented:

I humbly suggest that the debate end here. (I do not predict anything useful coming out of a continued debate, and I'd prefer if we kept on the interesting track which the conversation turned to.)

...and Richard and I agreed.

Thus, I will say no more here. Indeed, given Richard's reaction (which I might have predicted with a bit more research), I regret having raised the issue with him at all.

Comment author: shminux 10 May 2012 09:10:51PM *  7 points [-]

I fail to see anything that can be qualified as an ad hominem ("an attempt to negate the truth of a claim by pointing out a negative characteristic or belief of the person supporting it") in what you quoted. If anything, the original comment by Richard comes much closer to this definition.

Comment author: Richard_Loosemore 11 May 2012 12:51:26AM 1 point [-]

shminux.

I refer you to http://www.theskepticsguide.org/resources/logicalfallacies.aspx for a concise summary of argument fallacies, including ad hominem...

"Ad hominem An ad hominem argument is any that attempts to counter another’s claims or conclusions by attacking the person, rather than addressing the argument itself."

My original argument, that Luke took so much exception to, was one made by many people in the history of civilisation: is it censorship when a community of people collectively vote in such a way that a dissenting voice becomes inaudible? For example, if all members of Congress were to shout loudly when a particular member got up to speak, drowning out their words, would this be censorship, or just their exercise of a community vote against that person? The question is debatable, and many people would agree that it is a quite sinister form of censorship.

So my point about censorship shared a heritage with something that has been said by others, on countless occasions.

Now, did Luke accept that MANY people would agree that this kind of "shouting down" of a voice was tantamount to censorship?

Far from accepting that this is a commonplace, he called my comment "misleading to the point of being dishonest". That is not a reference to the question of whether the point was or was not valid, it was a reference to my character. My level of honesty. Which is the standard definition of an ad hominem.

But of course, he went much further than this simple ad hominem. He said: "Richard, I find your comment to be misleading to the point of being dishonest, similar to the level of dishonesty in the messages that got you banned from the SL4 mailing list: http://www.sl4.org/archive/0608/15895.html"

This is also an example of a "Poisoning the Well" attack. Guilt by association.

Furthermore, he makes a slanderous claim of bad character. He refers to "... the level of dishonesty that got you banned from the SL4 mailing list". In fact, there was no dishonesty in that episode at all. He alludes to the supposed dishonesty as if it were established fact, and uses it to try to smear my character rather than my argument.

But, in the face of this clear example of an ad hominem attack (it is self-evident, hardly needing me to spell it out), you, shminux, see nothing. In fact, without explaining your reasoning, you go on to state that you find more evidence for ad hominem in my original remarks! I just looked again: I say nothing about anyone's character (!), so how can there be evidence for me attacking someone by using an ad hominem?

Finally, Luke distorted the quote of the conversation, above. He OMITTED part of the conversation, in which I supplied evidence that there was no dishonesty on my part, and that there was massive evidence that the banning occurred because Yudkowsky needed to stop me when I suggested we get an outside expert opinion to adjudicate the dispute. Faced with this reply, Luke disappeared. He made only ONE comment (the one above) and then he ignored the reply.

He continues to ignore that evidence, and continues to slander my character, making references to "Indeed, given Richard's reaction (which with a bit more research I might have predicted), I regret having raised the issue with him at all."

My "reaction" was to supply evidence.

Apparently that is a mark against someone.

Comment author: dlthomas 11 May 2012 03:35:04AM 7 points [-]

For example, if all members of Congress were to shout loudly when a particular member got up to speak, drowning out their words, would this be censorship, or just their exercise of a community vote against that person?

One thing to note is that your comment wasn't removed; it was collapsed. It can still be viewed by anyone who clicks the expander or has their threshold set sufficiently low (with my settings, it's expanded). There is a tension between the threat of censorship being a problem on the one hand, and the ability for a community to collectively decide what they want to talk about on the other.

The censorship issue is also diluted by the fact that 1) nothing here is binding on anyone (which is way different than your Congress example), and 2) there are plenty of other places people can discuss things, online and off. It is still somewhat relevant, of course, to the question of whether there's an echo-chamber effect, but be careful not to pull in additional connotations with your choice of words and examples.

Comment author: Will_Newsome 11 May 2012 01:52:33AM *  6 points [-]

This happens regularly even for many LW regulars like Will Newsome.

(Though to be fair I think this sort of depends on your definition of "regularly"—I think over 95% of my comments aren't downvoted, many of them getting 5 or more upvotes, in contrast with other contributors who get about 25% of their comments downvoted and usually end up leaving as a result.)

Comment author: [deleted] 11 May 2012 09:41:40AM -1 points [-]

Well, if someone's comments are downvoted that regularly and still they stay LW regulars, there's something wrong.

Comment author: Manfred 11 May 2012 02:30:43PM *  3 points [-]

With them or with us?

Comment author: [deleted] 11 May 2012 03:55:39PM -1 points [-]

Most likely with them.

Comment author: MarkusRamikin 10 May 2012 08:02:54PM *  4 points [-]

However, my experience with SI is that when I tried to raise these concerns back in 2005/2006 I was subjected to a series of attacks that culminated in a tirade of slanderous denunciations from the founder of SI, Eliezer Yudkowsky.

I am frequently subjected to abusive personal attacks in which reference is made to Yudkowsky's earlier outburst

Link to the juicy details cough I mean evidence?

Comment author: Oscar_Cunningham 10 May 2012 08:17:39PM 10 points [-]
Comment author: ChrisHallquist 11 May 2012 04:16:45AM 3 points [-]

As someone who was previously totally unaware of that flap, that doesn't sound to me like a "slanderous tirade." Maybe Loosemore would care to explain what he thought was slanderous about it?

Comment author: Richard_Loosemore 10 May 2012 08:35:27PM -1 points [-]

Markus: Happy to link to the details, but where in the huge stream would you like to be linked to? The problem is that opinions can be sharply skewed by choosing to link to only selected items.

I cite as evidence Oscar's choice, below, to link to a post by EY. In that post he makes a series of statements that are flagrant untruths. If you read that particular link, and take his word as trustworthy, you get one impression.

But if you knew that EY had to remove several quotes from their context and present them in a deceitful manner, in order to claim that I said things that I did not, you might get a very different impression.

You might also get a different impression if you knew this. The comment that Oscar cites came shortly after I offered to submit the dispute to outside arbitration by an expert in the field we were discussing. I offered that ANYONE could propose an outside expert, and I would abide by their opinion.

It was only at that point that EY suddenly wrote the post that Oscar just referenced, in which he declared me to be banished from the list and (a short time later) that all discussion about the topic should cease.

That fact by itself speaks volumes.

Comment author: Multiheaded 10 May 2012 08:42:08PM 5 points [-]

I've read SL4 around that time and saw the whole drama (although I couldn't understand all the exact technical details, being 16). My prior on EY flagrantly lying like that is incredibly low. I'm virtually certain that you're quite cranky in this regard.

Comment author: gwern 10 May 2012 09:07:49PM 13 points [-]

I was on SL4 as well, and regarded Eliezer as basically correct, although I thought Loosemore's ban was more than a little bit disproportionate. (If John Clark didn't get banned for repeatedly and willfully misunderstanding Godelian arguments, wasting the time of countless posters over many years, why should Loosemore be banned for backtracking on some heuristics & biases positions?)

Comment author: Eliezer_Yudkowsky 11 May 2012 01:58:22AM 8 points [-]

(Because JKC never lied about his credentials, which is where it really crosses the line into trolling.)

Comment author: MarkusRamikin 10 May 2012 08:45:22PM *  9 points [-]

I'll gladly start reading at any point you'll link me to.

The fact that you don't just provide a useful link but instead several paragraphs of excuses why the stuff I'm reading is untrustworthy I count as (small) evidence against you.

Comment author: woodchuck64 10 May 2012 08:37:32PM -1 points [-]

I strongly suspect the rationality of the internet would improve many orders of magnitude if all arguments about arguments were quietly deleted.

Comment author: woodchuck64 10 May 2012 08:46:35PM *  0 points [-]

Okay, make that: I strongly suspect the rationality of the rational internet would improve many orders of magnitude if all arguments about arguments were quietly deleted

Comment author: khafra 11 May 2012 04:53:39PM *  3 points [-]

Every time I try to think about that, I end up thinking about logical paradoxes instead.

edit for less subtlety in response to unexplained downvote: That argument is self-refuting.

Comment author: lukeprog 10 May 2012 08:47:57PM *  5 points [-]

SI should be considered a cult-like community in which dissent is ruthlessly suppressed in order to exaggerate the point of view of SI's founders and controllers, regardless of the scientific merits of those views, or of the dissenting opinions.

Obligatory link: You're Calling Who a Cult Leader?

Also, your impression might be different if you had witnessed the long, deep, and ongoing disagreements between Eliezer and me about several issues fundamental to SI — all while Eliezer suggested that I be made Executive Director and then continued to support me in that role.

Can you give an example of what you mean by "abusive personal attacks"?

Comment author: Hul-Gil 11 May 2012 01:00:49AM *  9 points [-]

Can you provide some examples of these "abusive personal attacks"? I would also be interested in this ruthless suppression you mention. I have never seen this sort of behavior on LessWrong, and would be shocked to find it among those who support the Singularity Institute in general.

I've read a few of your previous comments, and while I felt that they were not strong arguments, I didn't downvote them because they were intelligent and well-written, and competent constructive criticism is something we don't get nearly enough of. Indeed, it is usually welcomed. The amount of downvotes given to the comments, therefore, does seem odd to me. (Any LW regular who is familiar with the situation is also welcome to comment on this.)

I have seen something like this before, and it turned out the comments were being downvoted because the person making them had gone over, and over, and over the same issues, unable or unwilling to either competently defend them, or change his own mind. That's no evidence that the same thing is happening here, of course, but I give the example because in my experience, this community is almost never vindictive or malicious, and is laudably willing to consider any cogent argument. I've never seen an actual insult levied here by any regular, for instance, and well-constructed dissenting opinions are actively encouraged.

So in summary, I am very curious about this situation; why would a community that has been - to me, almost shockingly - consistent in its dedication to rationality, and honestly evaluating arguments regardless of personal feelings, persecute someone simply for presenting a dissenting opinion?

One final thing I will note is that you do seem to be upset about past events, and it seems like it colors your view (and prose, a bit!). From checking both here and on SL4, for instance, your later claims regarding what's going on ("dissent is ruthlessly suppressed") seem exaggerated. But I don't know the whole story, obviously - thus this question.

Comment author: metaphysicist 11 May 2012 01:49:07AM 7 points [-]

So in summary, I am very curious about this situation; why would a community that has been - to me, almost shockingly - consistent in its dedication to rationality, and honestly evaluating arguments regardless of personal feelings, persecute someone simply for presenting a dissenting opinion?

The answer is probably that you overestimate that community's dedication to rationality because you share its biases. The main post demonstrates an enormous conceit among the SI vanguard. Now, how is that rational? How does it fail to get extensive scrutiny in a community of rationalists?

My take is that neither side in this argument distinguished itself. Loosemore called for an "outside adjudicator" to solve a scientific argument. What kind of obnoxious behavior is that, when one finds oneself losing an argument? Yudkowsky (rightfully pissed off) in turn, convicted Loosemore of a scientific error, tarred him with incompetence and dishonesty, and banned him. None of these "sins" deserved a ban (no wonder the raw feelings come back to haunt); no honorable person would accept a position where he has the authority to exercise such power (a party to a dispute is biased). Or at the very least, he wouldn't use it the way Yudkowsky did, when he was the banned party's main antagonist.

Comment author: Hul-Gil 11 May 2012 03:29:11AM *  4 points [-]

The answer is probably that you overestimate that community's dedication to rationality because you share its biases.

That's probably no small part of it. However, even if my opinion of the community is tinted rose, note that I refer specifically to observation. That is, I've sampled a good amount of posts and comments here on LessWrong, and I see people behaving rationally in arguments - appreciation of polite and lucid dissension, no insults or ad hominem attacks, etc. It's harder to tell what's going on with karma, but again, I've not seen any one particular individual harassed with negative karma merely for disagreeing.

The main post demonstrates an enormous conceit among the SI vanguard. Now, how is that rational? How does it fail to get extensive scrutiny in a community of rationalists?

Can you elaborate, please? I'm not sure what enormous conceit you refer to.

My take is that neither side in this argument distinguished itself. Loosemore called for an "outside adjudicator" to solve a scientific argument. What kind of obnoxious behavior is that, when one finds oneself losing an argument? Yudkowsky (rightfully pissed off) in turn, convicted Loosemore of a scientific error, tarred him with incompetence and dishonesty, and banned him. None of these "sins" deserved a ban

I think that's an excellent analysis. I certainly feel like Yudkowsky overreacted, and as you say, in the circumstances no wonder it still chafes; but as I say above, Richard's arguments failed to impress, and calling for outside help ("adjudication" for an argument that should be based only on facts and logic?) is indeed beyond obnoxious.

Comment author: John_Maxwell_IV 11 May 2012 05:39:52AM *  4 points [-]

It seems like everyone is talking about SL4; here is a link to what Richard was probably complaining about:

http://www.sl4.org/archive/0608/15895.html

Comment author: Hul-Gil 11 May 2012 07:24:24AM *  8 points [-]

Thanks. I read the whole debate, or as much of it as is there; I've prepared a short summary to post tomorrow if anyone is interested in knowing what really went on ("as according to Hul-Gil", anyway) without having to hack their way through that thread-jungle themselves.

(Summary of summary: Loosemore really does know what he's talking about - mostly - but he also appears somewhat dishonest, or at least extremely imprecise in his communication.)

Comment author: [deleted] 11 May 2012 04:49:52PM 3 points [-]

Please do post it, I think it would help resolve the arguments in this thread.

Comment author: ChrisHallquist 11 May 2012 04:23:23AM 7 points [-]

I initially upvoted this post, because the criticism seemed reasonable. Then I read the discussion, and switched to downvoting it. In particular, this:

Taken in isolation, these thoughts and arguments might amount to nothing more than a minor addition to the points that you make above. However, my experience with SI is that when I tried to raise these concerns back in 2005/2006 I was subjected to a series of attacks that culminated in a tirade of slanderous denunciations from the founder of SI, Eliezer Yudkowsky. After delivering this tirade, Yudkowsky then banned me from the discussion forum that he controlled, and instructed others on that forum that discussion about me was henceforth forbidden.

Since that time I have found that when I partake in discussions on AGI topics in a context where SI supporters are present, I am frequently subjected to abusive personal attacks in which reference is made to Yudkowsky's earlier outburst. This activity is now so common that when I occasionally post comments here, my remarks are very quickly voted down below a threshold that makes them virtually invisible. (A fate that will probably apply immediately to this very comment).

Serious accusations there, with no links that would allow someone to judge the truth of them. And after reading the discussion, I suspect the reason people keep bringing up your 2006 banning is that they see your current behavior as part of a pattern of bad behavior, a pattern that also includes the conduct that led to your 2006 banning.

Comment author: John_Maxwell_IV 11 May 2012 05:26:37AM 7 points [-]

I don't see how friendly and safe follow from stable.

Comment author: Dolores1984 10 May 2012 07:26:04PM 7 points [-]

Leaving aside the question of whether Tool AI as you describe it is possible until I've thought more about it:

The idea of a "self-improving algorithm" intuitively sounds very powerful, but does not seem to have led to many "explosions" in software so far (and it seems to be a concept that could apply to narrow AI as well as to AGI).

Looking to the past for examples is a very weak heuristic here, since we have never dealt with software that could write code at a better than human level before. It's like saying, before the invention of the internal combustion engine, "faster horses have never let you cross oceans before." Same goes for the assumption that strong AI will resemble extremely narrow AI software tools that already exist in specific regards. It's evidence, but it's very weak evidence, and I for one wouldn't bet on it.

Comment author: jimrandomh 10 May 2012 07:26:29PM *  13 points [-]

I don't work for SI and this is not an SI-authorized response, unless SI endorses it later. This comment is based on my own understanding based on conversations with and publications of SI members and general world model, and does not necessarily reflect the views or activities of SI.

The first thing I notice is that your interpretation of SI's goals with respect to AGI is narrower than the impression I had gotten, based on conversations with SI members. In particular, I don't think SI's research is limited to trying to make AGI friendliness provable; it covers a variety of different safety strategies, and the relative win-rates of different technological paths, eg brain uploading vs. de-novo AI, classes of utility functions and their relative risks, and so on. There is also a distinction between "FAI theory" and "AGI theory" that you aren't making; the idea, as I see it, is that to the extent to which these are separable, "FAI theory" covers research into safety mechanisms which reduce the probability of disaster if any AGI is created, while "AGI theory" covers research that brings the creation of any AGI closer. Your first objection - that a maximizing FAI would be very dangerous - seems to be based on a belief, first, that SI is researching a narrower class of safety mechanisms than it really is, and second, that SI researches AGI theory, which I believe it explicitly does not.

You seem a bit sore that SI hasn't talked about your notion of Tool-AI, but I'm a bit confused by this, since it's the first time I've heard that term used, and your link is to an email thread which, unless I'm missing something, was not disseminated publicly or through SI in general. A conversation about tool-based AI is well worth having; my current perspective is that it looks like it interacts with the inevitability argument and the overall AI power curve in such a way that it's still very dangerous, and that it amounts to a slightly different spin on Oracle AI, but this would be a complicated discussion. But bringing it up effectively for the first time, in the middle of a multi-pronged attack on SI's credibility, seems really unfair. While there may have been a significant communications failure in there, a cursory reading suggests to me that your question never made it to the right person.

The claim that SI will perform better if they don't get funding seems very strange. My model is that it would force their current employees to leave and spend their time on unrelated paid work instead, which doesn't seem like an improvement. I get the impression that your views of SI's achievements may be getting measured against a metric of achievements-per-organization, rather than achievements-per-dollar; in absolute budget terms, SI is tiny. But they've still had a huge memetic influence, difficult as that is to measure.

All that said, I applaud your decision to post your objections and read the responses. This sort of dialogue is a good way to reach true beliefs, and I look forward to reading more of it from all sides.

Comment author: steven0461 10 May 2012 08:12:28PM *  6 points [-]

In particular, I don't think SI's research is limited to trying to make AGI friendliness provable; it covers a variety of different safety strategies, and the relative win-rates of different technological paths, eg brain uploading vs. de-novo AI, classes of utility functions and their relative risks, and so on.

I agree, and would like to note the possibility, for those who suspect FAI research is useless or harmful, of earmarking SI donations to research on different safety strategies, or on aspects of AI risk that are useful to understand regardless of strategy.

Comment author: rocurley 10 May 2012 10:55:19PM *  9 points [-]

This likely won't work. Money is fungible, so unless the total donations so earmarked exceed the planned SI funding for that cause, they won't have to change anything. They're under no obligation not to defund your favorite cause by exactly the amount you donated, thus laundering your donation into the general fund. (Unless I misunderstand the relevant laws?)

EDIT NOTE: The post used to say vast majority; this was changed, but is referenced below.

Comment author: steven0461 10 May 2012 11:02:48PM *  4 points [-]

Suppose you earmark to a paper on a topic X that SI would otherwise probably not write a paper on. Would that cause SI to take money out of research on topics similar to X and into FAI research? There would probably be some sort of (expected) effect in that direction, but I think the size of the effect depends on the details of what causes SI's allocation of resources, and I think the effect would be substantially smaller than would be necessary to make an earmarked donation equivalent to a non-earmarked donation. Still, you're right to bring it up.

Comment author: dlthomas 10 May 2012 11:03:45PM 5 points [-]

You have an important point here, but I'm not sure it gets up to "vast majority" before it becomes relevant.

Earmarking $K for X has an effect once $K exceeds the amount of money that would have been spent on X if the $K had not been earmarked. The size of the effect still certainly depends on the difference, and may very well not be large.
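The fungibility point in this subthread reduces to a simple calculation. A toy illustration follows; the function name and all dollar figures are invented for the example, not drawn from SI's actual budget:

```python
# Toy model of earmarked donations against a fungible budget, as described above.
# All figures are hypothetical.

def effective_increase(earmarked, planned):
    """Extra money actually spent on cause X after earmarking.

    If the organization already planned to spend `planned` on X, it can
    quietly reallocate that amount elsewhere, so an earmark only binds
    once earmarked donations exceed the planned spending."""
    return max(0, earmarked - planned)

# Suppose the organization planned to spend $20k on topic X anyway:
assert effective_increase(earmarked=5_000, planned=20_000) == 0       # fully absorbed
assert effective_increase(earmarked=25_000, planned=20_000) == 5_000  # binds by $5k
```

This matches dlthomas's point: the earmark has an effect only once it exceeds what would have been spent anyway, and even then the size of the effect is just the difference.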

Comment author: Rain 10 May 2012 08:47:21PM 1 point [-]

Some recent discussion of AIs as tools.

Comment author: Rain 10 May 2012 08:03:13PM *  32 points [-]

I completely agree with the intent of this post. These are all important issues SI should officially answer. (Edit: SI's official reply is here.) Here are some of my thoughts:

  • I completely agree with objection 1. I think SI should look into doing exactly as you say. I also feel that friendliness has a very high failure chance and that all SI can accomplish is a very low marginal decrease in existential risk. However, I feel this is the result of existential risk being so high and difficult to overcome (Great Filter) rather than SI being so ineffective. As such, for them to engage this objection is to admit defeatism and millennialism, and so they put it out of mind since they need motivation to keep soldiering on despite the sure defeat.

  • Objection 2 is interesting, though you define AGI differently, as you say. Some points against it: Only one AGI needs to be in agent mode to realize existential risk, even if there are already billions of tool-AIs running safely. Tool-AI seems closer in definition to narrow AI, which you point out we already have lots of, and are improving. It's likely that very advanced tool-AIs will indeed be the first to achieve some measure of AGI capability. SI uses AGI to mean agent-AI precisely because at some point someone will move beyond narrow/tool-AI into agent-AI. AGI doesn't "have to be an agent", but there will likely be agent-AI at some point. I don't see a means to limit all AGI to tool-AI in perpetuity.

  • 'Race for power' should be expanded to 'incentivised agent-AI'. There exist great incentives to create agent-AI above tool-AI, since AGI will be tireless, ever watchful, supremely faster, smarter, its answers not necessarily understood, etc. These include economic incentives, military incentives, etc., not even to implement-first, but to be better/faster on practical everyday events.

  • Objection 3, I mostly agree. Though should tool-AIs achieve such power, they can be used as weapons to realize existential risk, similar to nuclear, chemical, bio-, and nanotechnological advances.

  • I think this post focuses too much on "Friendliness theory". As Zack_M_Davis stated, SIAI should have more appropriately been called "The Singularity Institute For or Against Artificial Intelligence Depending on Which Seems to Be a Better Idea Upon Due Consideration". Friendliness is one word which could encapsulate a basket of possible outcomes, and they're agile enough to change position should it be shown to be necessary, as some of your comments request. Maybe SI should make tool-AI a clear stepping stone to friendliness, or at least a clear possible avenue worth exploring. Agreed.

  • Much agreed re: feedback loops.

  • "Kind of organization": painful but true.

However, I don't consider "Cause X is the one I care about and Organization Y is the only one working on it" to be a good reason to support Organization Y. For donors determined to donate within this cause, I encourage you to consider donating to a donor-advised fund while making it clear that you intend to grant out the funds to existential-risk-reduction-related organizations in the future. (One way to accomplish this would be to create a fund with "existential risk" in the name; this is a fairly easy thing to do and one person could do it on behalf of multiple donors.) For one who accepts my arguments about SI, I believe withholding funds in this way is likely to be better for SI's mission than donating to SI - through incentive effects alone (not to mention my specific argument that SI's approach to "Friendliness" seems likely to increase risks).

Good advice; I'll look into doing this. One reason I've been donating to them is so they can keep the lights on long enough to see and heed this kind of criticism. Maybe those incentives weren't appropriate.

This post limits my desire to donate additional money to SI beyond previous commitments. I consider it a landmark in SI criticism. Thank you for engaging this very important topic.

Edit: After SI's replies and careful consideration, I decided to continue donating directly to them, as they have a very clear roadmap for improvement and still represent the best value in existential risk reduction.

Comment author: khafra 11 May 2012 02:20:10PM 7 points

You're an accomplished and proficient philanthropist; if you do make steps in the direction of a donor-directed existential risk fund, I'd like to see them written about.

Comment author: jedharris 10 May 2012 08:19:47PM * 10 points

Karnofsky's focus on "tool AI" is useful, but his statement of it may confuse matters and needs refinement. I don't think the distinction between "tool AI" and "agent AI" is sharp, or in quite the right place.

For example, the sort of robot cars we will probably have in a few years are clearly agents-- you tell them to "come here and take me there" and they do it without further intervention on your part (when everything is working as planned). This is useful in a way that any amount and quality of question answering is not. Almost certainly there will be various flavors of robot cars available and people will choose the ones they like (that don't drive in scary ways, that get them where they want to go even if it isn't well specified, that know when to make conversation and when to be quiet, etc.) As long as robot cars just drive themselves and people around, can't modify the world autonomously to make their performance better, and are subject to continuing selection by their human users, they don't seem to be much of a threat.

The key points here seem to be (1) limited scope, (2) embedding in a network of other actors and (3) humans in the loop as evaluators. We could say these define "tool AIs" or come up with another term. But either way the antonym doesn't seem to be "agent AIs" but maybe something like "autonomous AIs" or "independent AIs" -- AIs with the power to act independently over a very broad range, unchecked by embedding in a network of other actors or by human evaluation.

Framed this way, we can ask "Why would independent AIs exist?" If the reason is mad scientists, an arms race, or something similar then Karnofsky has a very strong argument that any study of friendliness is beside the point. Outside these scenarios, the argument that we are likely to create independent AIs with any significant power seems weak; Karnofsky's survey more or less matches my own less methodical findings. I'd be interested in strong arguments if they exist.

Given this analysis, there seem to be two implications:

  • We shouldn't build independent AIs, and should organize to prevent their development if they seem likely.

  • We should thoroughly understand the likely future evolution of a patchwork of diverse tool AIs, to see where dangers arise.

For better or worse, neither of these lends itself to tidy analytical answers, though analytical work would be useful for both. But they are very much susceptible to investigation, proposals, evangelism, etc.

These do lend themselves to collaboration with existing AI efforts. To the extent they perceive a significant risk of development of independent AIs in the foreseeable future, AI researchers will want to avoid that. I'm doubtful this is an active risk but could easily be convinced by evidence -- not just abstract arguments -- and I'm fairly sure they feel the same way.

Understanding the long term evolution of a patchwork of diverse tool AIs should interest just about all major AI developers, AI project funders, and long term planners who will be affected (which is just about all of them). Short term bias and ceteris paribus bias will lead to lots of these folks not engaging with the issue, but I think it will seem relevant to an increasing number as the hits keep coming.

Comment author: RomeoStevens 10 May 2012 08:32:53PM 2 points

any amount and quality of question answering is not.

"how do I build an automated car?"

Comment author: Hul-Gil 11 May 2012 03:44:40AM * 3 points

That doesn't help you if you need a car to take you someplace in the next hour or so, though. I think jed's point is that sometimes it is useful for an AI to take action rather than merely provide information.

Comment author: drethelin 10 May 2012 08:32:39PM 3 points

Tool-based approaches might be a faster and safer way to create useful AI, but as long as agent-based methods are possible, it seems extremely important to me to work on verifying the friendliness of artificial agents.

Comment author: abramdemski 11 May 2012 07:21:47AM 0 points

Important, perhaps, but extremely important? If tool-based systems are faster in coming and safer, then they will be available to help the process of creating, studying, and (if necessary) defending against powerful agents.

My prediction would be that tool AI would be economically incentivised, since humans want tools. Agent AI might be created later on more aesthetic grounds, as pets or hoped-for equals. (But that's just an intuition.)

Comment author: drethelin 11 May 2012 05:01:49PM 2 points

For the same reason that a personal assistant is vastly more useful and powerful than a PDA, even though they might nominally serve the same function of remembering phone numbers, appointments, etc., people are extremely likely to want to create agent AIs.

Comment author: drethelin 10 May 2012 08:59:45PM 2 points

As a separate point, people talk about AI friendliness as a safety precaution, but I think an important thing to remember is that a truly friendly self-improving AGI would probably be the greatest possible thing you could do for the world. It's possible that the risk of human destruction from the pursuit of FAI is larger than the possible upside, but if you include the FAI's ability to mitigate other existential risks, I don't think that's the case.

Comment author: lukeprog 10 May 2012 09:24:19PM * 62 points

Update: My full response to Holden is now here.

As Holden said, I generally think that Holden's objections for SI "are either correct (especially re: past organizational competence) or incorrect but not addressed by SI in clear argumentative writing (this includes the part on 'tool' AI)," and we are working hard to fix both categories of issues.

In this comment I would merely like to argue for one small point: that the Singularity Institute is undergoing comprehensive changes — changes which I believe to be improvements that will help us to achieve our mission more efficiently and effectively.

Holden wrote:

I'm aware that SI has relatively new leadership that is attempting to address the issues behind some of my complaints. I have a generally positive impression of the new leadership; I believe the Executive Director and Development Director, in particular, to represent a step forward in terms of being interested in transparency and in testing their own general rationality. So I will not be surprised if there is some improvement in the coming years...

Louie Helm was hired as Director of Development in September 2011. I was hired as a Research Fellow that same month, and made Executive Director in November 2011. Below are some changes made since September. (Pardon the messy presentation: LW cannot correctly render tables in comments.)

SI before Sep. 2011: Very few peer-reviewed research publications.
SI today: More peer-reviewed publications coming in 2012 than in all past years combined. Additionally, I alone have a dozen papers in development, for which I am directing every step of research and writing, and will write the final draft, but am collaborating with remote researchers so as to put in only 5%-20% of the total hours required myself.

SI before Sep. 2011: No donor database / a very broken one.
SI today: A comprehensive donor database.

SI before Sep. 2011: Nearly all work performed directly by SI staff.
SI today: Most work outsourced to remote collaborators so that SI staff can focus on the things that only they can do.

SI before Sep. 2011: No strategic plan.
SI today: A strategic plan developed with input from all SI staff, and approved by the Board.

SI before Sep. 2011: Very little communication about what SI is doing.
SI today: Monthly progress reports, plus three Q&As with Luke about SI research and organizational development.

SI before Sep. 2011: No list of the research problems SI is working on.
SI today: A long, fully-referenced list of research problems SI is working on.

SI before Sep. 2011: Very little direct management of staff and projects.
SI today: Luke monitors all projects and staff work, and meets regularly with each staff member.

SI before Sep. 2011: Almost no detailed tracking of the expense of major SI projects (e.g. Summit, papers, etc.). The sole exception seems to be that Amy was tracking the costs of the 2011 Summit in NYC.
SI today: Detailed tracking of the expense of major SI projects for which this is possible (Luke has a folder in Google docs for these spreadsheets, and the summary spreadsheet is shared with the Board).

SI before Sep. 2011: No staff worklogs.
SI today: All staff members share their worklogs with Luke, Luke shares his worklog with all staff plus the Board.

SI before Sep. 2011: Best practices not followed for bookkeeping/accounting; accountant's recommendations ignored.
SI today: Meetings with consultants about bookkeeping/accounting; currently working with our accountant to implement best practices and find a good bookkeeper.

SI before Sep. 2011: Staff largely separated, many of them not well-connected to the others.
SI today: After a dozen or so staff dinners, staff much better connected, more of a team.

SI before Sep. 2011: Want to see the basics of AI Risk explained in plain language? Read The Sequences (more than a million words) or this academic book chapter by Yudkowsky.
SI today: Want to see the basics of AI Risk explained in plain language? Read Facing the Singularity (now in several languages, with more being added) or listen to the podcast version.

SI before Sep. 2011: Very few resources created to support others' research in AI risk.
SI today: IntelligenceExplosion.com, Friendly-AI.com, list of open problems in the field, with references, AI Risk Bibliography 2012, annotated list of journals that may publish papers on AI risk, a partial history of AI risk research, and a list of forthcoming and desired articles on AI risk.

SI before Sep. 2011: A hard-to-navigate website with much outdated content.
SI today: An entirely new website that is easier to navigate and has much new content (nearly complete; should launch in May or June).

SI before Sep. 2011: So little monitoring of funds that $118k was stolen in 2010 before SI noticed. (Note that we have won stipulated judgments to get much of this back, and have upcoming court dates to argue for stipulated judgments to get the rest back.)
SI today: Our bank accounts have been consolidated, with 3-4 people regularly checking over them.

SI before Sep. 2011: SI publications exported straight to PDF from Word or Google Docs, sometimes without even author names appearing.
SI today: All publications being converted into slick, useable LaTeX template (example), with all references checked and put into a central BibTeX file.

SI before Sep. 2011: No write-up of our major public technical breakthrough (TDT) using the mainstream format and vocabulary comprehensible to most researchers in the field (this is what we have at the moment).
SI today: Philosopher Rachael Briggs, whose papers on decision theory have been twice selected for the Philosopher's Annual, has been contracted to write an explanation of TDT and publish it in one of a select few leading philosophy journals.

SI before Sep. 2011: No explicit effort made toward efficient use of SEO or our (free) Google Adwords.
SI today: Highly optimized use of Google Adwords to direct traffic to our sites; currently working with SEO consultants to improve our SEO (of course, the new website will help).

(Just to be clear, I think this list shows not that "SI is looking really great!" but instead that "SI is rapidly improving and finally reaching a 'basic' level of organizational function.")

Comment author: lukeprog 11 May 2012 02:54:28AM * 22 points

...which is not to say, of course, that things were not improving before September 2011. It's just that the improvements have accelerated quite a bit since then.

For example, Amy was hired in December 2009 and is largely responsible for these improvements:

  • Built a "real" Board and officers; launched monthly Board meetings in February 2010.
  • Began compiling monthly financial reports in December 2010.
  • Began tracking Summit expenses and seeking Summit sponsors.
  • Played a major role in canceling many programs and expenses that were deemed low ROI.
Comment author: [deleted] 11 May 2012 04:25:54AM * 9 points

Our bank accounts have been consolidated, with 3-4 people regularly checking over them.

In addition to reviews, should SI implement a two-man rule for manipulating large quantities of money? (For example, over 5k, over 10k, etc.)

Comment author: Eliezer_Yudkowsky 11 May 2012 05:00:20AM 4 points

And note that these improvements would not and could not have happened without more funding than the level of previous years - if, say, everyone had been waiting to see these kinds of improvements before funding.

Comment author: John_Maxwell_IV 11 May 2012 05:07:40AM 1 point

This seems like a rather absolute statement. Knowing Luke, I'll bet he would've gotten some of it done even on a limited budget.

Comment author: ciphergoth 11 May 2012 06:08:58AM 7 points

Luke and Louie Helm are both on paid staff.

Comment author: lukeprog 11 May 2012 08:13:02AM * 54 points

note that these improvements would not and could not have happened without more funding than the level of previous years

Really? That's not obvious to me. Of course you've been around for all this and I haven't, but here's what I'm seeing from my vantage point...

Recent changes that cost very little:

  • Donor database
  • Strategic plan
  • Monthly progress reports
  • A list of research problems SI is working on (it took me 16 hours to write)
  • IntelligenceExplosion.com, Friendly-AI.com, AI Risk Bibliography 2012, annotated list of journals that may publish papers on AI risk, a partial history of AI risk research, and a list of forthcoming and desired articles on AI risk (each of these took me only 10-25 hours to create)
  • Detailed tracking of the expenses for major SI projects
  • Staff worklogs
  • Staff dinners (or something that brought staff together)
  • A few people keeping their eyes on SI's funds so theft would be caught sooner
  • Optimization of Google Adwords

Stuff that costs less than some other things SI had spent money on, such as funding Ben Goertzel's AGI research or renting downtown Berkeley apartments for the later visiting fellows:

  • Research papers
  • Management of staff and projects
  • Rachael Briggs' TDT write-up
  • Best-practices bookkeeping/accounting
  • New website
  • LaTeX template for SI publications; references checked and then organized with BibTeX
  • SEO

Do you disagree with these estimates, or have I misunderstood what you're claiming?

Comment author: [deleted] 11 May 2012 08:18:32AM * 5 points

I was hired as a Research Fellow that same month

Luke alone has a dozen papers in development

Why did you start referring to yourself in the first person and then change your mind? (Or am I missing something?)

Comment author: lukeprog 11 May 2012 08:20:33AM * 9 points

Brain fart: now fixed.

Comment author: [deleted] 11 May 2012 08:27:14AM * 18 points

(Why was this downvoted? If it's because the downvoter wants to see fewer brain farts, they're doing it wrong, because the message such a downvote actually conveys is that they want to see fewer acknowledgements of brain farts. Upvoted back to 0, anyway.)

Comment author: siodine 11 May 2012 01:35:22PM 4 points

Isn't this very strong evidence in support of Holden's point about "Apparent poorly grounded belief in SI's superior general rationality" (excluding Luke, at least)? And especially this?

Comment author: jmmcd 10 May 2012 09:43:37PM 1 point

I feel that the relevance of "Friendliness theory" depends heavily on the idea of a "discrete jump" that seems unlikely and whose likelihood does not seem to have been publicly argued for.

It has been. An AI foom could be fast enough and/or sufficiently invisible in the early stages that it's practically discrete, to us. So the AI-foom does have relevance, contra

I believe I have read the vast majority of the Sequences, including the AI-foom debate, and that this content - while interesting and enjoyable - does not have much relevance for the arguments I've made.

Comment author: Wei_Dai 10 May 2012 10:44:59PM 13 points

I agree with much of this post, but find a disconnect between the specific criticisms and the overall conclusion of withholding funds from SI even for "donors determined to donate within this cause", and even aside from whether SI's FAI approach increases risk. I see a couple of ways in which the conclusion might hold.

  1. SI is doing worse than they are capable of, due to wrong beliefs. Withholding funds provides incentive for them to do what you think is right, without having to change their beliefs. But this could lead to waste if people disagree in different directions, and funds end up sitting unused because SI can't satisfy everyone, or if SI thinks the benefit of doing what they think is optimal is greater than the value of extra funds they could get from doing what you think is best.
  2. A more capable organization already exists or will come up later and provide a better use of your money. This seems unlikely in the near future, given that we're already familiar with the "major players" in the existential risk area and based on past history, it doesn't seem likely that a new group of highly capable people would suddenly get interested in the cause. In the longer run, it's likely that many more people will be attracted to work in this area as time goes on and the threat of a bad-by-default Singularity becomes more obvious, but those people have the disadvantage of having less time for their work to take effect (which reduces the average value of donations), and there will probably also be many more willing donors than at this time (which reduces the marginal value of donations).

So neither of these ways to fill in the missing part of the argument seems very strong. I'd be interested to know what Holden's own thoughts are, or if anyone else can make stronger arguments on his behalf.

Comment author: Bugmaster 10 May 2012 11:04:10PM * 7 points

Holden said,

However, I don't consider "Cause X is the one I care about and Organization Y is the only one working on it" to be a good reason to support Organization Y.

This addresses your point (2). Holden believes that SI is grossly inefficient at best, and actively harmful at worst (since he thinks that they might inadvertently increase AI risk). Therefore, giving money to SI would be counterproductive, and a donor would get a better return on investment in other places.

As for point (1), my impression is that Holden's low estimate of SI's competence is due to a combination of what he sees as wrong beliefs, as well as an insufficient capability to put even the correct beliefs into practice. SI claims to be supremely rational, but its list of achievements is lackluster at best -- which suggests a certain amount of Dunning-Kruger effect at work. Furthermore, SI appears to be focused on growing itself and teaching rationality workshops, as opposed to its stated mission of researching FAI theory.

Additionally, Holden indicted SI members pretty strongly (though very politely) for what I will (in a less polite fashion) label as arrogance. The prevailing attitude of SI members seems to be (according to Holden) that the rest of the world is just too irrational to comprehend their brilliant insights, and therefore the rest of the world has little to offer -- and therefore, any criticism of SI's goals or actions can be dismissed out of hand.

EDIT: found the right quote, duh.

Comment author: TheOtherDave 10 May 2012 11:18:03PM 12 points

If Holden believes that:
A) reducing existential risk is valuable, and
B) SI's effectiveness at reducing existential risk is a significant contributor to the future of existential risk, and
C) SI is being less effective at reducing existential risk than they would be if they fixed some set of problems P, and
D) withholding GiveWell's endorsement while pre-committing to re-evaluating that refusal if given evidence that P has been fixed increases the chances that SI will fix P...

...it seems to me that Holden should withhold GiveWell's endorsement while pre-committing to re-evaluating that refusal if given evidence that P has been fixed.

Which seems to be what he's doing. (Of course, I don't know whether those are his reasons.)

What, on your view, ought he do instead, if he believes those things?

Comment author: Wei_Dai 11 May 2012 12:36:02AM 5 points

Holden must believe some additional relevant statements, because A-D (with "existential risk" suitably replaced) could be applied to every other charity, as presumably no charity is perfect.

I guess what I most want to know is what Holden thinks are the reasons SI hasn't already fixed the problems P. If it's lack of resources or lack of competence, then "withholding ... while pre-committing ..." isn't going to help. If it's wrong beliefs, then arguing seems better than "incentivizing", since that provides a permanent instead of temporary solution, and in the course of arguing you might find out that you're wrong yourself. What does Holden believe that causes him to think that providing explicit incentives to SI is a good thing to do?

Comment author: TheOtherDave 11 May 2012 01:57:19AM 0 points

Absolutely agreed that if D is false -- for example, if increasing SI's incentive to fix P doesn't in fact increase SI's chances of fixing P, or if a withholding+precommitting strategy doesn't in fact increase SI's incentive to fix P, or some other reason -- then the strategy I describe makes no sense.

Comment author: dspeyer 11 May 2012 02:53:55AM 1 point

But C applies more to some charities than others. And evaluating how much of a charity's potential effectiveness is lost to internal flaws is a big piece of what GiveWell does.

Comment author: ciphergoth 11 May 2012 06:44:03AM 2 points

Thanks for making this argument!

AFAICT, charities generally face perverse incentives -- to do what will bring in donations rather than what will do the most good. Those incentives often weigh against things like transparency, for example. So I think when Holden says "don't donate to X yet", it's usually part of an effort to make these incentives saner.

As it happens, I don't think this problem applies especially strongly to SI, but others may differ.

Comment author: [deleted] 11 May 2012 08:47:57AM -1 points
Comment author: timtyler 10 May 2012 11:02:37PM 0 points

I thought objections 1 and 2 were bogus. I thought Holden would be better off steering away from the more technical arguments and sticking to the line that these folks don't have a clearly-argued case that they are doing a lot of good.

Comment author: timtyler 10 May 2012 11:21:14PM * 1 point

I believe that the probability that SI's concept of "Friendly" vs. "Unfriendly" goals ends up seeming essentially nonsensical, irrelevant and/or unimportant from the standpoint of the relevant future is over 90%.

It seems like an odd thing to say. Why take the standpoint of the "relevant future"? History is written by the winners - but that doesn't mean their perspective is shared by us. Besides, the statement is likely wrong - "Friendly" and "Unfriendly", as defined by Yudkowsky, are fairly reasonable and useful concepts.

Comment author: kalla724 10 May 2012 11:26:58PM 2 points

Very good. Objection 2 in particular resonates with my view of the situation.

One other thing that is often missed is that SI assumes the development of superintelligent AI will precede other possible scenarios - including the augmented human intelligence scenario (BCI producing superhumans with human motivations and emotions, but hugely enhanced intelligence). In my personal view, this scenario is far more likely than the creation of either friendly or unfriendly AI, and the problems related to it are far more pressing.

Comment author: Eliezer_Yudkowsky 11 May 2012 12:30:27AM 24 points

Thank you very much for writing this. I, um, wish you hadn't posted it literally directly before the May Minicamp when I can't realistically respond until Tuesday. Nonetheless, it already has a warm place in my heart next to the debate with Robin Hanson as the second attempt to mount informed criticism of SIAI.

Comment author: John_Maxwell_IV 11 May 2012 05:16:53AM * 21 points

It looks to me as though Holden had the criticisms he expresses even before becoming "informed", presumably by reading the sequences, but was too intimidated to share them. Perhaps it is worth listening to/encouraging uninformed criticisms as well as informed ones?

Comment author: lukeprog 11 May 2012 08:19:04AM * 6 points

[Holden's critique] already has a warm place in my heart... as the second attempt to mount informed criticism of SIAI.

To those who think Eliezer is exaggerating: please link me to "informed criticism of SIAI."

It is so hard to find good critics.

Edit: Well, I guess there are more than two examples, though relatively few. I was wrong to suggest otherwise. Much of this has to do with the fact that SI hasn't been very clear about many of its positions and arguments: see Beckstead's comment and Hallquist's followup.

Comment author: XiXiDu 11 May 2012 10:22:18AM * 14 points

To those who think Eliezer is exaggerating: please link me to "informed criticism of SIAI."

It would help if you could elaborate on what you mean by "informed".

Most of what Holden wrote, and much more, has been said by other people, excluding myself, before.

I don't have the time right now to wade through all those years of posts and comments but might do so later.

And if you are not willing to take into account what I myself wrote, on the grounds that it is uninformed, then maybe you will at least agree that all of my critical comments that have been upvoted to +10 (ETA: changed to +10, although there is a lot more on-topic at +5) should have been taken into account. If you do so, you will find that SI could have updated some time ago on some of what has been said in Holden's post.

Comment author: Gastogh 11 May 2012 03:10:11PM * 7 points

It would help if you could elaborate on what you mean by "informed".

Seconded. It seems to me like it's not even possible to mount properly informed criticism if much of the findings are just sitting unpublished somewhere. I'm hopeful that this is actually getting fixed sometime this year, but it doesn't seem fair to not release information and then criticize the critics for being uninformed.

Comment author: Will_Newsome 11 May 2012 05:21:56PM 16 points

Wei Dai has written many comments and posts that have some measure of criticism, and various members of the community, including myself, have expressed agreement with them. I think what might be a problem is that such criticisms haven't been collected into a single place where they can draw attention and stir up drama, as Holden's post has.

There are also critics like XiXiDu. I think he's unreliable, and I think he'd admit to that, but he also makes valid criticisms that are shared by other LW folk, and LW's moderation makes it easy to sift his comments for the better stuff.

Perhaps an institution could be designed. E.g., a few self-ordained SingInst critics could keep watch for critiques of SingInst, collect them, organize them, and update a page somewhere out-of-the-way over at the LessWrong Wiki that's easily checkable by SI folk like yourself. LW philanthropists like User:JGWeissman or User:Rain could do it, for example. If SingInst wanted to signal various good things then it could even consider paying a few people to collect and organize criticisms of SingInst. Presumably if there are good critiques out there then finding them would be well worth a small investment.

Comment author: lukeprog 11 May 2012 07:10:30PM 4 points

Good point. Wei Dai qualifies as informed criticism. Though, he seems to agree with us on all the basics, so that might not be the kind of criticism Eliezer was talking about.

Comment author: thomblake 11 May 2012 05:49:32PM 10 points

I'm not sure how much he's put into writing, but Ben Goertzel is surely informed. One might argue he comes to the wrong conclusions about AI danger, but it's not from not thinking about it.

Comment author: Wei_Dai 11 May 2012 06:26:10PM * 29 points

This is a bit exasperating. Did you not see my comments in this thread? Have you and Eliezer considered that if there really have been only two attempts to mount informed criticism of SIAI, then LessWrong must be considered a massive failure that SIAI ought to abandon ASAP?

Comment author: Wei_Dai 11 May 2012 02:45:15AM 50 points

Is it just me, or do Luke and Eliezer's initial responses appear to send the wrong signals? From the perspective of an SI critic, Luke's comment could be interpreted as saying "for us, not being completely incompetent is worth bragging about", and Eliezer's as "we're so arrogant that we've only taken two critics (including Holden) seriously in our entire history". These responses seem suboptimal, given that Holden just complained about SI's lack of impressive accomplishments, and being too selective about whose feedback to take seriously.

Comment author: Furcas 11 May 2012 03:15:54AM * 23 points

Luke isn't bragging, he's admitting that SI was/is bad but pointing out it's rapidly getting better. And Eliezer is right, criticisms of SI are usually dumb. Could their replies be interpreted the wrong way? Sure, anything can be interpreted in any way anyone likes. Of course Luke and Eliezer could have refrained from posting those replies and instead posted carefully optimized responses engineered to send nothing but extremely appealing signals of humility and repentance.

But if they did turn themselves into politicians, we wouldn't get to read what they actually think. Is that what you want?

Comment author: Wei_Dai 11 May 2012 08:30:50AM *  27 points [-]

Luke isn't bragging, he's admitting that SI was/is bad but pointing out it's rapidly getting better.

But the accomplishments he listed (e.g., having a strategic plan, website redesign) are of the type that Holden already indicated to be inadequate. So why the exhaustive listing, instead of just giving a few examples to show SI is getting better and then either agreeing that they're not yet up to par, or giving an argument for why Holden is wrong? (The reason I think he could be uncharitably interpreted as bragging is that he would more likely exhaustively list the accomplishments if he was proud of them, instead of just seeing them as fixes to past embarrassments.)

And Eliezer is right, criticisms of SI are usually dumb.

I'd have no problem with "usually" but "all except two" seems inexcusable.

But if they did turn themselves into politicians, we wouldn't get to read what they actually think. Is that what you want?

Do their replies reflect their considered, endorsed beliefs, or were they just hurried remarks that may not say what they actually intended? I'm hoping it's the latter...

Comment author: Kaj_Sotala 11 May 2012 10:10:04AM *  38 points [-]

But the accomplishments he listed (e.g., having a strategic plan, website redesign) are of the type that Holden already indicated to be inadequate. So why the exhaustive listing, instead of just giving a few examples to show SI is getting better and then either agreeing that they're not yet up to par, or giving an argument for why Holden is wrong?

Presume that SI is basically honest and well-meaning, but possibly self-deluded. In other words, they won't outright lie to you, but they may genuinely believe that they're doing better than they really are, and cherry-pick evidence without realizing that they're doing so. How should their claims of intending to get better be evaluated?

Saying "we're going to do things better in the future" is some evidence about SI intending to do better, but rather weak evidence, since talk is cheap and it's easy to keep thinking that you're really going to do better soon but there's this one other thing that needs to be done first and we'll get started on the actual improvements tomorrow, honest.

Saying "we're going to do things better in the future, and we've fixed these three things so far" is stronger evidence, since it shows that you've already begun fixing problems and might keep up with it. But it's still easy to make a few improvements and then stop. There are far more people who try to get on a diet, follow it for a while and then quit than there are people who actually diet for as long as they initially intended to.

Saying "we're going to do things better in the future, and here's the list of 18 improvements that we've implemented so far" is much stronger evidence than either of the two above, since it shows that you've spent a considerable amount of effort on improvements over an extended period of time, enough to presume that you actually care deeply about this and will keep up with it.

I don't have a cite at hand, but it's been my impression that in a variety of fields, having maintained an activity for longer than some threshold amount of time is a far stronger predictor of keeping up with it than having maintained it for a shorter time. E.g. many people have thought about writing a novel and many people have written the first five pages of a novel. But when considering the probability of finishing, the difference between the person who's written the first 5 pages and the person who's written the first 50 pages is much bigger than the difference between the person who's written the first 100 pages and the person who's written the first 150 pages.

There's a big difference between managing some performance once, and managing sustained performance over an extended period of time. Luke's comment is far stronger evidence of SI managing sustained improvements over an extended period of time than a comment just giving a few examples of improvement.

Comment author: Will_Newsome 11 May 2012 03:47:13AM 19 points [-]

Eliezer's comment makes me think that you, specifically, should consider collecting your criticisms and putting them in Main where Eliezer is more likely to see them and take the time to seriously consider them.

Comment author: Nick_Beckstead 11 May 2012 03:56:21AM 51 points [-]

While I have sympathy with the complaint that SI's critics are inarticulate and often say wrong things, Eliezer's comment does seem to be indicative of the mistake Holden and Wei Dai are describing. Most extant presentations of SIAI's views leave much to be desired in terms of clarity, completeness, concision, accessibility, and credibility signals. This makes it harder to make high quality objections. I think it would be more appropriate to react to poor critical engagement more along the lines of "We haven't gotten great critics. That probably means that we need to work on our arguments and their presentation," and less along the lines of "We haven't gotten great critics. That probably means that there's something wrong with the rest of the world."

Comment author: ChrisHallquist 11 May 2012 04:04:08AM 27 points [-]

This. I've been trying to write something about Eliezer's debate with Robin Hanson, but the problem I keep running up against is that Eliezer's points are not clearly articulated at all. Even making my best educated guesses about what's supposed to go in the gaps in his arguments, I still ended up with very little.

Comment author: Nick_Beckstead 11 May 2012 05:11:05AM 5 points [-]

In fairness I should add that I think Luke M agrees with this assessment and is working on improving these arguments/communications.

Comment author: ChrisHallquist 11 May 2012 03:58:27AM 5 points [-]

I read Luke's comment just as "I'm aware these are issues and we're working on it." I didn't read him as "bragging" about the ones that have been solved. Eliezer's... I see the problem with. I initially read it as just complimenting Holden on his high-quality article (which I agree was high-quality), but I can see it being read as backhanded at anyone else who's criticized SIAI.

Comment author: magfrump 11 May 2012 04:50:01AM 7 points [-]

Luke's comment addresses the specific point that Holden made about changes in the organization given the change in leadership.

Holden said:

I'm aware that SI has relatively new leadership that is attempting to address the issues behind some of my complaints. I have a generally positive impression of the new leadership; I believe the Executive Director and Development Director, in particular, to represent a step forward in terms of being interested in transparency and in testing their own general rationality. So I will not be surprised if there is some improvement in the coming years, particularly regarding the last couple of statements listed above. That said, SI is an organization and it seems reasonable to judge it by its organizational track record, especially when its new leadership is so new that I have little basis on which to judge these staff.

Luke attempted to provide (for the reader) a basis on which to judge these staff members.

Eliezer's response was... characteristic of Eliezer? And also very short and coming at a busy time for him.

Comment author: ciphergoth 11 May 2012 06:34:15AM 4 points [-]

Are there other specific critiques you think should have made Eliezer's list, or is it that you think he should not have drawn attention to their absence?

Comment author: Wei_Dai 11 May 2012 07:39:41AM 26 points [-]

Are there other specific critiques you think should have made Eliezer's list, or is it that you think he should not have drawn attention to their absence?

Many of Holden's criticisms have been made by others on LW already. He quoted me in Objection 1. Discussions of whether Tool-AI and Oracle-AI are or are not safe have occurred numerous times. Here's one that I was involved in. Many people have criticized Eliezer/SI for not having sufficiently impressive accomplishments. Cousin_it and Silas Barta have questioned whether the rationality techniques being taught by SI (and now the rationality org) are really effective.

Comment author: private_messaging 11 May 2012 07:13:02AM *  0 points [-]

Those are the correct signals. Incompetents inherently signal incompetence; competence can't be faked beyond a superficial level (and faking competence is all about signalling that you are sure you are competent). The lack of feedback is inherent in assuming 'we are sending the wrong signal' rather than 'maybe we really are incompetent'.

Comment author: [deleted] 11 May 2012 08:39:33AM 0 points [-]

I kind-of agree about Eliezer's comment, but Luke's doesn't sound like that to me.

Comment author: [deleted] 11 May 2012 08:41:22AM 5 points [-]

Retracted. I've just re-read Eliezer's comment more calmly, and it's not that bad either.

Comment author: rhollerith_dot_com 11 May 2012 04:04:57AM *  13 points [-]

I feel that [SI] ought to be able to get more impressive endorsements than it has.

SI seems to have passed up opportunities to test itself and its own rationality by e.g. aiming for objectively impressive accomplishments.

Holden, do you believe that charitable organizations should set out deliberately to impress donors and high-status potential endorsers? I would have thought that a donor like you would try to ignore the results of any such attempts and concentrate instead on how much the organization has actually improved the world, because to do otherwise is to incentivize organizations whose real goal is to accumulate status and money for their own sake.

For example, Eliezer's attempts to teach rationality or "technical epistemology" or whatever you want to call it through online writings seem to me to have actually improved the world in a non-negligible way and seem to have been designed to do that rather than designed merely to impress.

ADDED. The above is probably not as clear as it should be, so let me say it in different words: I suspect it is a good idea for donors to ignore certain forms of evidence ("impressiveness", affiliation with high-status folk) of a charity's effectiveness to discourage charities from gaming donors in ways that seem to me already too common, and I was a little surprised to see that you do not seem to ignore those forms of evidence.

Comment author: rhollerith_dot_com 11 May 2012 06:36:47PM *  5 points [-]

In other words, I tend to think that people who make philanthropy their career and who have accumulated various impressive markers of their potential to improve the world are likely to continue to accumulate impressive markers, but are less likely to improve the world than people who have already actually improved the world.

And of the three core staff members of SI I have gotten to know, 2 (Eliezer and another one who probably does not want to be named) have already improved the world in non-negligible ways and the third spends less time accumulating credentials and impressiveness markers than almost anyone I know.

Comment author: ChrisHallquist 11 May 2012 04:26:41AM 1 point [-]

I'm mildly surprised that this post has not yet attracted more criticism. My initial reaction was that criticisms (1) and (2) seemed like strong ones, and I almost posted a comment saying so. Then I thought, "I should look for other people discussing those points and join that discussion." But after doing that, I feel like people haven't given much in the way of objections to (1) and (2). Are my perceptions correct? Do lots of other people agree with them?

Comment author: ciphergoth 11 May 2012 06:51:02AM 5 points [-]

I think that many of Holden's stronger points call for longer, more carefully worked out answers than a dashed-off comment.

Comment author: kip1981 11 May 2012 05:49:51AM 8 points [-]

My biggest criticism of SI is that I cannot decide between:

A. promoting AI and FAI issues awareness will decrease the chance of UFAI catastrophe; or
B. promoting AI and FAI issues awareness will increase the chance of UFAI catastrophe

This criticism seems distinct from the ones that Holden makes. But it is my primary concern. (Perhaps the closest example is Holden's analogy that SI is trying to develop Facebook before the Internet.)

A seems intuitive. Basically everyone associated with SI assumes that A is true, as far as I can tell. But A is not obviously true to me. It seems to me at least plausible that:

A1. promoting AI and FAI issues will get lots of scattered groups around the world more interested in creating AGI
A2. one of these groups will develop AGI faster than otherwise due to A1
A3. the world will be at greater risk of UFAI catastrophe than otherwise due to A2 (i.e. the group creates AGI faster than otherwise, and fails at FAI)

More simply: SI's general efforts, albeit well intended, might accelerate the creation of AGI, and the acceleration of AGI might decrease the odds of the first AGI being friendly. This is one path by which B, not A, would be true.

SI might reply that, although it promotes AGI, it very specifically limits its promotion to FAI. Although that is SI's intention, it is not at all clear that promoting FAI will not have the unintended consequence of accelerating UFAI. By analogy, if a responsible older brother goes around promoting gun safety all the time, the little brother might be more likely to accidentally blow his face off, than if the older brother had just kept his mouth shut. Maybe the older brother shouldn't have kept his mouth shut, maybe he should have... it's not clear either way.

If B is more true than A, the best thing that SI could do would probably be develop clandestine missions to assassinate people who try to develop AGI. SI does almost the exact opposite.

SI's efforts are based on the assumption that A is true. But it's far from clear to me that A, instead of B, is true. Maybe it is, maybe it isn't. SI seems overconfident that A is true. I've never heard anyone at SI (or elsewhere) really address this criticism.

Comment author: ciphergoth 11 May 2012 06:31:10AM 27 points [-]

Firstly, I'd like to add to the chorus saying that this is an incredible post; as a supporter of SI, it warms my heart to see it. I disagree with the conclusion - I would still encourage people to donate to SI - but if SI gets a critique this good twice a decade it should count itself lucky.

I don't think GiveWell making SI its top rated charity would be in SI's interests. In the long term, SI benefits hugely when people are turned on to the idea of efficient charity, and asking them to swallow all of the ideas behind SI's mission at the same time will put them off. If I ran GiveWell and wanted to give an endorsement to SI, I might break the rankings into multiple lists: the most prominent being VillageReach-like charities which directly do good in the near future, then perhaps a list for charities that mitigate broadly accepted and well understood existential risks (if this can be done without problems with politics), and finally a list of charities which mitigate more speculative risks.

Comment author: hairyfigment 11 May 2012 07:41:37AM 4 points [-]

The organization section touches on something that concerns me. Developing a new decision theory sounds like it requires more mathematical talent than the SI yet has available. I've said before that hiring some world-class mathematicians for a year seems likely to either get said geniuses interested in the problem, to produce real progress, or to produce a proof that SI's current approach can't work. In other words, it seems like the best form of accountability we can hope for given the theoretical nature of the work.

Now Eliezer is definitely looking for people who might help. For instance, the latest chapter of "Harry Potter and the Methods of Rationality" mentioned

a minicamp for 20 mathematically talented youths...Most focus will be on technical aspects of rationality (probability theory, decision theory) but also with some teaching of the same mental skills in the other Minicamps.

It also says,

Several instructors of International Olympiad level have already volunteered.

So they technically have something already. And if there exists a high-school student who can help with the problem, or learn to do so, that person seems relatively likely to enjoy HP:MoR. But I worry that Eliezer is thinking too much in terms of his own life story here, and has not had to defend his approach enough.

Comment author: Manfred 11 May 2012 09:29:07AM *  1 point [-]

Developing a new decision theory sounds like it requires more mathematical talent than the SI yet has available.

On what measure of difficulty are you basing this? We have some guys around here doing a pretty good job.

Comment author: hairyfigment 11 May 2012 06:25:41PM 1 point [-]

I phrased that with too much certainty. While I have little if any reason to see fully-reflective decision theory as an easier task than self-consistent infinite set theory, I also have no clear reason to think the contrary.

But I'm trying to find the worst scenario that we could plan for. I can think of two broad ways that Eliezer's current plan could be horribly misguided:

  1. if it works well enough to help someone produce an uFAI but not well enough to stop this in time
  2. if some part of it -- such as a fully-reflective decision theory that humans can understand -- is mathematically impossible, and SI never realizes this.

Now SI technically seems aware of both problems. The fact that Eliezer went out of his way to help critics understand Löb's Theorem and that he keeps mentioning said theorem seems like a good sign. But should I believe that SI is doing enough to address #2? Why?

Comment author: jonperry 11 May 2012 08:09:02AM 4 points [-]

Let's say that the tool/agent distinction exists, and that tools are demonstrably safer. What then? What course of action follows?

Should we ban the development of agents? All of human history suggests that banning things does not work.

With existential stakes, only one person needs to disobey the ban and we are all screwed.

Which means the only safe route is to make a friendly agent before anyone else can. Which is pretty much SI's goal, right?

So I don't understand how practically speaking this tool/agent argument changes anything.

Comment author: [deleted] 11 May 2012 08:54:47AM 2 points [-]

Which means the only safe route is to make a friendly agent before anyone else can.

Only if running too fast doesn't make it easier to screw something up, which it most likely does.

Comment author: jonperry 11 May 2012 09:23:26AM 2 points [-]

Yes, you can create risk by rushing things. But you still have to be fast enough to outrun the creation of UFAI by someone else. So you have to be fast, but not too fast. It's a balancing act.

Comment author: Monkeymind 11 May 2012 03:10:04PM *  3 points [-]

If intelligence is the ability to understand concepts, and a super-intelligent AI has a super ability to understand concepts, what would prevent it (as a tool) from answering questions in a way so as to influence the user and affect outcomes as though it were an agent?

Comment author: khafra 11 May 2012 05:23:27PM 3 points [-]

If the time at which anyone activates a uFAI is known, SI should activate their current FAI best effort (CFBE) one day before that.

If the time at which anyone activates a GAI of unknown friendliness is known, SI should compare the probability distribution function for the friendliness of the two AIs, and activate their CFBE one day earlier only if it has more probability mass on the "friendly" side.

If the time at which anyone makes a uFAI is unknown, SI should activate their CFBE when the probability that they'll improve the CFBE in the next day is lower than the probability that someone will activate a uFAI in the next day.

If the time at which anyone makes a GAI of unknown friendliness is unknown, SI should activate their CFBE when the probability that CFBE=uFAI is less than the probability that anyone else will activate a GAI of unknown friendliness, multiplied by the probability that the other GAI will be unfriendly.

...I think. I do tend to miss the obvious when trying to think systematically, and I was visualizing gaussian pdfs without any particular justification, and a 1-day decision cycle with monotonically improving CFBE, and this is only a first-order approximation: It doesn't take into account any correlations between the decisions of SI and other GAI researchers.
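The last two rules above reduce to simple threshold comparisons on daily probability estimates. A minimal sketch, assuming hypothetical probability values (the function names and all numbers are illustrative, not from the comment):

```python
def activate_now_unknown_ufai_time(p_improve_today, p_ufai_today):
    """Rule 3: with the uFAI arrival date unknown, launch the CFBE once
    the chance of improving it in the next day falls below the chance
    that someone else activates a uFAI in the next day."""
    return p_improve_today < p_ufai_today

def activate_now_unknown_friendliness(p_cfbe_unfriendly,
                                      p_other_activation,
                                      p_other_unfriendly):
    """Rule 4: launch the CFBE once the chance that it is itself
    unfriendly drops below the chance that someone else activates a
    GAI of unknown friendliness times the chance that GAI is unfriendly."""
    return p_cfbe_unfriendly < p_other_activation * p_other_unfriendly

# Illustrative daily estimates:
print(activate_now_unknown_ufai_time(0.02, 0.05))         # launch: polishing has become riskier than waiting
print(activate_now_unknown_friendliness(0.10, 0.3, 0.5))  # launch: 0.10 < 0.3 * 0.5
```

As the comment notes, this is only a first-order approximation: it treats SI's estimates and other researchers' behavior as independent, and ignores how the decision itself might shift those probabilities.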

Comment author: jsteinhardt 11 May 2012 05:53:38PM 1 point [-]

I think the idea is to use tool AI to create safe agent AI.

Comment author: Mitchell_Porter 11 May 2012 10:40:56AM 7 points [-]

Maybe I'm just jaded, but this critique doesn't impress me much. Holden's substantive suggestion is that, instead of trying to design friendly agent AI, we should just make passive "tool AI" that only reacts to commands but never acts on its own. So when do we start thinking about the problems peculiar to agent AI? Do we just hope that agent AI will never come into existence? Do we ask the tool AI to solve the friendly AI problem for us? (That seems to be what people want to do anyway, an approach I reject as ridiculously indirect.)

Comment author: Will_Newsome 11 May 2012 05:40:31PM 7 points [-]

(Perhaps I should note that I find your approach to be too indirect as well: if you really understand how justification works then you should be able to use that knowledge to make (invoke?) a theoretically perfectly justified agent, who will treat others' epistemic and moral beliefs in a thoroughly justified manner without your having to tell it "morality is in mind-brains, figure out what the mind-brains say then do what they tell you to do". That is, I think the correct solution should be just clearly mathematically and meta-ethically justified, question-dissolving, reflective, non-arbitrary, perfect decision theory. Such an approach is closest in spirit to CFAI. All other approaches, e.g. CEV, WBE, or oracle AI, are relatively arbitrary and unmotivated, especially meta-ethically.)

Comment author: hairyfigment 11 May 2012 06:12:05PM 2 points [-]

Not only does this seem wrong, but if I believed it I would want SI to look for the correct decision theory (roughly what Eliezer says he's doing anyway). It fails to stress the possibility that Eliezer's whole approach is wrong. In fact it seems willfully (heh) ignorant of the planning fallacy and similar concerns: even formalizing the 'correct' prior seems tricky to me, so why would it be feasible to formalize "correct" meta-ethics even if it exists in the sense you mean? And what reason do we have to believe that a version with no pointers to brains exists at all?

At least with reflective decision theory I see no good reason to think that a transparently-written AGI is impossible in principle (our neurons don't just fire randomly, nor does evolution seem like a particularly good searcher of mindspace), so a theory of decisions that can describe said AGI's actions should be mathematically possible barring some alternative to math. (Whether, eg, the description would fit in our observable universe seems like another question.)

Comment author: sufferer 11 May 2012 04:13:58PM *  1 point [-]

But if there's even a chance …

Holden cites two posts (Why We Can’t Take Expected Value Estimates Literally and Maximizing Cost-effectiveness via Critical Inquiry). They are supposed to support the argument that small or very small changes to the probability of an existential risk event occurring are not worth caring about or donating money towards.

I think that these posts both have serious problems (see the comments, esp Carl Shulman's). In particular Why We Can’t Take Expected Value Estimates Literally was heavily criticised by Robin Hanson in On Fudge Factors.

Robin Hanson has been listed as the other major "intelligent/competent" critic of SIAI. That he criticises what seems to be the keystone of Holden's argument should be cause for concern for Holden. (after all, if "even a chance" is good enough, then all the other criticisms melt away).

This would be a much more serious criticism of SIAI if Holden and Hanson could come to agreement on what exactly the problem with SIAI is, and if Holden could sort out the problems with these two supporting posts*

(*of course they won't do that without substantial revision of one or both of their positions because Hanson is on the same page as the rest of SIAI with regard to expected utility, see On Fudge Factors. Hanson's disagreement with SIAI is a different one; approximately that Hanson thinks ems first is likely and that a singleton is both bad and unlikely, and Hanson's axiology is significantly unintuitive to the extent that he is not really on the same page as most people with regard to what counts as a good or bad outcome)

Comment author: TheOtherDave 11 May 2012 04:21:29PM 6 points [-]

Robin Hanson has been listed as the other major "intelligent/competent" critic of SIAI. That he criticises what seems to be the keystone of Holden's argument should be cause for concern for Holden.

So, I stipulate that Robin, whom Eliezer considers the only other major "intelligent/competent" critic of SI, disagrees with this aspect of Holden's position. I also stipulate that this aspect is the keystone of Holden's argument, and without it all the rest of it is irrelevant. (I'm not sure either of those statements is actually true, but they're beside my point here.)

I do not understand why these stipulated facts should be a significant cause for concern for Holden, who may not consider Eliezer's endorsement of what is and isn't legitimate criticism of SI particularly significant evidence of anything important.

Can you expand on your reasoning here?

Comment author: sufferer 11 May 2012 04:39:47PM *  0 points [-]

I suspect that Holden would also consider Robin Hanson a competent critic. This is because Robin is smart, knowledgeable and prestigiously accredited.

But your comment has alerted me to the fact that even if Hanson comes out as a flat-earther tomorrow the supporting posts are still weak.

The issue of the two most credible critics of SIAI disagreeing with each other is logically independent of the issue of Holden's wobbly argument against the utilitarian argument for SIAI. Many thanks.

Comment author: jsteinhardt 11 May 2012 05:51:45PM 0 points [-]

I'm not sure what you mean by

Hanson is on the same page as the rest of SIAI with regard to expected utility

As Holden and Eliezer both explicitly state, SIAI itself rejects the "but there's still a chance" argument.

Comment author: NancyLebovitz 11 May 2012 04:50:00PM 8 points [-]

I'd brought up a version of the tool/agent distinction, and was told firmly that people aren't smart or fast enough to direct an AI. (Sorry, this is from memory-- I don't have the foggiest how to do an efficient search to find that exchange.)

I'm not sure that's a complete answer-- how possible is it to augment a human towards being able to manage an AI? On the other hand, a human like that isn't going to be much like humans 1.0, so problems of Friendliness are still in play.

Perhaps what's needed is building akrasia into the world-- a resistance to sudden change. This has its own risks, but sudden existential threats are rare. [1]

At this point, I think the work on teaching rationality is more reliably important than the work on FAI. FAI involves some long inferential chains. The idea that people could improve their lives a lot by thinking more carefully about what they're doing and acting on those thoughts (with willingness to take feedback) is a much more plausible idea, even if you factor in the idea that rationality can be taught.

[1] Good enough for fiction-- we're already living in a world like that. We call the built-in akrasia Murphy.

Comment author: TheOtherDave 11 May 2012 06:05:03PM 7 points [-]

You may be thinking of this exchange, which I found only because I remembered having been involved in it.

I continue to think that "tool" is a bad term to use here, because people's understanding of what it refers to vary so relevantly.

As for what is valuable work... hm.

I think teaching people to reason in truth-preserving and value-preserving ways is worth doing.
I think formalizing a decision theory that captures universal human intuitions about what the right thing to do in various situations is worth doing.
I think formalizing a decision theory that captures non-universal but extant "right thing" intuitions is potentially worth doing, but requires a lot of auxiliary work to actually be worth doing.
I think formalizing a decision theory that arrives at judgments about the right thing to do in various situations where those judgments are counterintuitive for most/all humans, but reliably lead, if implemented, to results that those same humans reliably endorse more than the results of their intuitive judgments, is worth doing.
I think building systems that can solve real-world problems efficiently is worth doing, all else being equal, though I agree that powerful tools frequently have unexpected consequences that create worse problems than they solve, in which case it's not worth doing.
I think designing frameworks within which problem-solving systems can be built, such that the chances of unexpected negative consequences are lower inside that framework than outside of it, is worth doing.

I don't find it likely that SI is actually doing any of those things particularly more effectively than other organizations.

Comment author: NancyLebovitz 11 May 2012 06:59:24PM 2 points [-]

Thanks for the link-- that was what I was thinking of.

Do you have other organizations which teach rationality in mind? Offhand, the only thing I can think of is cognitive behavioral therapy, and it's not exactly an organization.