
AI Risk and Opportunity: A Strategic Analysis

8 Post author: lukeprog 04 March 2012 06:06AM

Suppose you buy the argument that humanity faces both the risk of AI-caused extinction and the opportunity to shape an AI-built utopia. What should we do about that? As Wei Dai asks, "In what direction should we nudge the future, to maximize the chances and impact of a positive intelligence explosion?"

This post serves as a table of contents and an introduction for an ongoing strategic analysis of AI risk and opportunity.

Contents:

  1. Introduction (this post)
  2. Humanity's Efforts So Far
  3. A Timeline of Early Ideas and Arguments
  4. Questions We Want Answered
  5. Strategic Analysis Via Probability Tree
  6. Intelligence Amplification and Friendly AI
  7. ...


Why discuss AI safety strategy?

The main reason to discuss AI safety strategy is, of course, to draw on a wide spectrum of human expertise and processing power to clarify our understanding of the factors at play and the expected value of particular interventions we could invest in: raising awareness of safety concerns, forming a Friendly AI team, differential technological development, investigating AGI confinement methods, and others.

Discussing AI safety strategy is also a challenging exercise in applied rationality. The relevant issues are complex and uncertain, but we need to take advantage of the fact that rationality is faster than science: we can't "try" a bunch of intelligence explosions and see which one works best. We'll have to predict in advance how the future will develop and what we can do about it.


Core readings

Before engaging with this series, I recommend you read at least the following articles:


Example questions

Which strategic questions would we like to answer? Muehlhauser (2011) elaborates on the following questions:

  • What methods can we use to predict technological development?
  • Which kinds of differential technological development should we encourage, and how?
  • Which open problems are safe to discuss, and which are potentially dangerous?
  • What can we do to reduce the risk of an AI arms race?
  • What can we do to raise the "sanity waterline," and how much will this help?
  • What can we do to attract more funding, support, and research to x-risk reduction and to specific sub-problems of successful Singularity navigation?
  • Which interventions should we prioritize?
  • How should x-risk reducers and AI safety researchers interact with governments and corporations?
  • How can optimal philanthropists get the most x-risk reduction for their philanthropic buck?
  • How does AI risk compare to other existential risks?
  • Which problems do we need to solve, and which ones can we have an AI solve?
  • How can we develop microeconomic models of WBEs and self-improving systems?
  • How can we be sure a Friendly AI development team will be altruistic?

Salamon & Muehlhauser (2013) list several other questions gathered from the participants of a workshop following Singularity Summit 2011, including:

  • How hard is it to create Friendly AI?
  • What is the strength of feedback from neuroscience to AI rather than brain emulation?
  • Is there a safe way to do uploads, where they don't turn into neuromorphic AI?
  • How possible is it to do FAI research on a seastead?
  • How much must we spend on security when developing a Friendly AI team?
  • What's the best way to recruit talent toward working on AI risks?
  • How difficult is stabilizing the world so we can work on Friendly AI slowly?
  • How hard will a takeoff be?
  • What is the value of strategy vs. object-level progress toward a positive Singularity?
  • How feasible is Oracle AI?
  • Can we convert environmentalists into people concerned with existential risk?
  • Is there no such thing as bad publicity [for AI risk reduction] purposes?

These are the kinds of questions we will be tackling in this series of posts for Less Wrong Discussion, in order to improve our predictions about which direction we can nudge the future to maximize the chances of a positive intelligence explosion.

Comments (161)

Comment author: Wei_Dai 05 March 2012 06:43:33PM *  4 points

I suggest adding some more meta questions to the list.

  • What improvements can we make to the way we go about answering strategy questions? For example, should we differentiate between "strategic insights" (such as Carl Shulman's insight that WBE-based Singletons may be feasible) and "keeping track of the big picture" (forming the overall strategy and updating it based on new insights and evidence), and aim to have people specialize in each, so that people deciding strategy won't be tempted to overweigh their own insights? Another example: is there a better way to combine probability estimates from multiple people?
  • How do people in other fields answer strategy questions? Is there such a thing as a science or art of strategy that we can copy from (and perhaps improve upon with ideas from x-rationality)?
  • Should the subject be called "AI safety strategies" or "Singularity strategies"? (I prefer the latter.)
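
On the last example in the first bullet (combining probability estimates from multiple people), here is a minimal Python sketch of two standard pooling rules: a linear pool (averaging the probabilities) and a logarithmic pool (averaging in log-odds space). The function names and the three forecasts are hypothetical and chosen only for illustration; which rule is "better" depends on assumptions about the forecasters that this sketch does not settle.

```python
import math


def linear_pool(probs, weights=None):
    """Weighted arithmetic mean of the individual probabilities."""
    if weights is None:
        weights = [1.0 / len(probs)] * len(probs)
    return sum(w * p for w, p in zip(weights, probs))


def log_odds_pool(probs, weights=None):
    """Weighted mean in log-odds space (a weighted geometric mean of odds).

    Every probability must lie strictly between 0 and 1.
    """
    if weights is None:
        weights = [1.0 / len(probs)] * len(probs)
    pooled_logit = sum(
        w * math.log(p / (1.0 - p)) for w, p in zip(weights, probs)
    )
    return 1.0 / (1.0 + math.exp(-pooled_logit))


# Hypothetical forecasts from three people for the same proposition.
estimates = [0.01, 0.10, 0.50]
print("linear pool:  ", linear_pool(estimates))
print("log-odds pool:", log_odds_pool(estimates))
```

With widely spread forecasts like these, the linear pool is pulled toward the largest estimate, while the log-odds pool gives more weight to confident, extreme estimates. That difference is one reason the choice of rule matters.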

Comment author: Vladimir_Nesov 04 March 2012 04:00:30PM 3 points

Link: In this ongoing thread, Wei Dai and I discuss the merits of pre-WBE vs. post-WBE decision theory/FAI research.

Comment author: lukeprog 28 March 2012 09:58:00PM *  2 points

What could an FAI project look like? Louie points out that it might look like Princeton's Institute for Advanced Study:

Created as a haven for thinking, the Institute [for Advanced Study] remains for many the Shangri-la of academe: a playground for the scholarly superstars who become the Institute's permanent faculty. These positions carry no teaching duties, few administrative responsibilities, and high salaries, and so represent a pinnacle of academic advancement. The expectation is that given this freedom, the professors at the Institute will think the big thoughts that can propel social and intellectual progress. Over the years the permanent faculty has included Nobel laureates as well as recipients of almost every other intellectual honor. Among the mathematicians, there have been several winners of the Fields Medal…

If the permanent faculty makes up the intellectual foundation of the Institute, the lifeblood is provided by the parade of international visitors who bring a continuous influx of new ideas. They may come for as little as an afternoon, or as long as a few years, in which case they take up temporary positions as Institute "members."

Idea #1: Write a good, very technical Open Problems in Friendly Artificial Intelligence, get a few of the best mathematicians/physicists who care about FAI accepted as visitors, and have them talk to faculty and visitors about the technical problems related to FAI.

Idea #2: Convince wealthy donors to endow a chair at the Institute for Advanced Study for somebody to do FAI research. (Princeton may not mind us sending another brilliant person and a bunch of money their way.)

Similar research institutes: PARC, Bell Labs, Perimeter Institute, maybe others?

Comment author: gwern 28 March 2012 10:24:08PM 8 points

But did the IAS actually succeed? Off-hand, the only things I can think of them for are hosting Einstein in his crankish years, Kurt Gödel before he went crazy, and Von Neumann's work on a real computer (which they disliked and wanted to get rid of). Richard Hamming, who might know, said:

When you are famous it is hard to work on small problems. This is what did Shannon in. After information theory, what do you do for an encore? The great scientists often make this error. They fail to continue to plant the little acorns from which the mighty oak trees grow. They try to get the big thing right off. And that isn't the way things go. So that is another reason why you find that when you get early recognition it seems to sterilize you. In fact I will give you my favorite quotation of many years. The Institute for Advanced Study in Princeton, in my opinion, has ruined more good scientists than any institution has created, judged by what they did before they came and judged by what they did after. Not that they weren't good afterwards, but they were superb before they got there and were only good afterwards.

(My own thought is to wonder if this is kind of a regression to the mean, or perhaps regression due to aging.)

Comment author: Wei_Dai 28 March 2012 11:27:52PM 4 points

How do you maintain secrecy in such a setting? Or is there a new line of thought that says secrecy isn't necessary for an FAI project?

Comment author: lukeprog 29 March 2012 12:25:18AM 1 point

The person/people working on FAI there could work exclusively on the relatively safe problems, e.g. CEV.

Comment author: Wei_Dai 29 March 2012 02:50:55PM 9 points

Ok, I thought when you said "FAI project" you meant a project to build FAI. But I've noticed two problems with trying to do some of the relatively safe FAI-related problems in public:

  1. It's hard to predetermine whether a problem is safe, and hard to stop or slow down once research momentum gets going. For example I've become concerned that decision theory research may be dangerous, but I'm having trouble getting even myself to stop.
  2. All the problems, safe and unsafe, are interrelated, and people working on the (seemingly) safe problems will naturally become interested in the (more obviously) unsafe ones as well and start thinking about them. (For example solving CEV seems to require understanding the nature of "preference", which leads to decision theory, and solving decision theory seems to require understanding the nature of logical uncertainty.) It seems very hard to prevent this or make all the researchers conscientious and security-conscious enough to not leak out or deliberately publish (e.g. to gain academic reputation) unsafe research results. Even if you pick the initial researchers to be especially conscientious and security-conscious, the problem will get worse as they publish results and other people become interested in their research areas.

Comment author: lukeprog 29 March 2012 06:32:00PM 3 points

Yes, both Eliezer and I (and many others) agree with these points. Eliezer seems pretty set on only doing a basement-style FAI team, perhaps because he's thought about the situation longer and harder than I have. I'm still exploring to see whether there are strategic alternatives, or strategic tweaks. I'm hoping we can discuss this in more detail when my strategic analysis series gets there.

Comment author: Wei_Dai 29 March 2012 09:10:54PM 6 points

But it seems like SIAI has already deviated from the basement-style FAI plan, since it started supporting research associates who are allowed/encouraged to publish openly, and encouraging public FAI-related research in other ways (such as publishing a list of open problems). And if the "slippery slope" problems I described were already known, why didn't anyone bring them up during the discussions about whether to publish papers about UDT? (I myself only thought of them in the general explicit form yesterday.)

If SIAI already knew about these problems but still thinks it's a good idea to promote public FAI-related research and publish papers about decision theory, then I'm even more confused than before. I hope your series "gets there" soon so I can see where the cause of the disagreement lies.

Comment author: lukeprog 29 March 2012 09:14:50PM 3 points

What I'm saying is that there are costs and benefits to open FAI work. You listed some costs, but that doesn't mean there aren't also benefits. See, e.g. Vladimir's comment.

Comment author: Wei_Dai 29 March 2012 09:23:13PM 2 points

The benefits are only significant if there is a significant chance of successfully building FAI before some UFAI project takes off. Maybe our disagreement just boils down to different intuitions about that? But Nesov agrees this chance is "tiny" and still wants to push open research, so I'm still confused.

Comment author: Vladimir_Nesov 29 March 2012 10:29:12PM *  1 point

The benefits are only significant if there is a significant chance of successfully building FAI before some UFAI project takes off. ... But Nesov agrees this chance is "tiny" and still wants to push open research, so I'm still confused.

I want to make it bigger, as much as I can. It doesn't matter how small a chance of winning there is, as long as our actions improve it. Giving up doesn't seem like a strategy that leads to winning. The strategy of navigating the WBE transition (or some more speculative intelligence improvement tool) is a more complicated question, and I don't see in what way the background catastrophic risk matters for it.

This also came up in a previous discussion about this we had: it's necessary to distinguish the risk within a given interval of years, and the eventual risk (i.e. the risk of never building a FAI). The same action can make immediate risk worse, but probability of eventually winning higher. I think encouraging an open effort for researching metaethics through decision theory is like that; also better acceptance of the problem might be leveraged to overturn the hypothetical increase in UFAI risk.
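
The distinction between risk within an interval and eventual risk can be made concrete with a toy model. The sketch below assumes, purely for illustration, that each year independently ends in a win (FAI) with probability `p_win`, a loss (UFAI) with probability `p_loss`, or neither, and that these yearly probabilities stay constant. The function names and all numbers are hypothetical, not estimates from this thread.

```python
def risk_within(years, p_win, p_loss):
    """Probability that a loss occurs within the first `years` years.

    Each year ends in a win, a loss, or neither; the process stops at
    the first win or loss. Yearly probabilities are assumed constant.
    """
    p_continue = 1.0 - p_win - p_loss
    return p_loss * (1.0 - p_continue ** years) / (p_win + p_loss)


def eventual_win(p_win, p_loss):
    """Probability that the process eventually ends in a win."""
    return p_win / (p_win + p_loss)


# Hypothetical yearly probabilities without and with some intervention.
baseline = {"p_win": 0.001, "p_loss": 0.010}
with_action = {"p_win": 0.003, "p_loss": 0.012}

for label, params in (("baseline", baseline), ("with action", with_action)):
    print(
        label,
        "| 10-year risk:", round(risk_within(10, **params), 4),
        "| eventual win:", round(eventual_win(**params), 4),
    )
```

In this toy setup the intervention raises the yearly loss probability, so the 10-year risk goes up, but it raises the yearly win probability proportionally more, so the chance of eventually winning also goes up. Whether any real action has that profile is exactly what is in dispute.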

Comment author: Wei_Dai 29 March 2012 11:39:05PM *  4 points

It doesn't matter how small a chance of winning there is, as long as our actions improve it.

Yes, if we're talking about the overall chance of winning, but I was talking about the chance of winning through a specific scenario (directly building FAI). If the chance of that is tiny, why did your cost/benefit analysis of the proposed course of action (encouraging open FAI research) focus completely on it? Shouldn't we be thinking more about how the proposal affects other ways of winning? ETA: To spell it out, encouraging open FAI research decreases the probability that we win by winning the WBE race or through intelligence amplification, by increasing the probability that UFAI happens first.

Giving up doesn't seem like a strategy that leads to winning.

Nobody is saying "let's give up". If we don't encourage open FAI research, we can still push for a positive Singularity in other ways, some of which I've posted about recently in discussion.

The strategy of navigating the WBE transition (or some more speculative intelligence improvement tool) is a more complicated question, and I don't see in what way the background catastrophic risk matters for it.

What do you mean? What aren't you seeing?

The same action can make immediate risk worse, but probability of eventually winning higher.

Yes, of course. I am talking about the probability of eventually winning.

Comment author: Will_Newsome 29 March 2012 10:56:46PM *  2 points

The same action can make immediate risk worse, but probability of eventually winning higher.

Near/Far. Long-term effects aren't predictable and shouldn't be traded for more predictable short-term losses. In my experience it fails the Predictable Retrospective Stupidity test. Even when you try to factor in structural uncertainty, you still end up getting burned. And even if you still want to make such a tradeoff then you should halt all research until you've come to agreement or a natural stopping point with Wei Dai or others who have reservations. Stop, melt, catch fire, don't destroy the world.

(Disclaimer: This comment is fueled by a strong emotional reaction due to contingent personal details that might or might not upon further reflection deserve to be treated as substantial evidence for the policy I recommend.)

Comment author: lukeprog 29 March 2012 10:07:47PM 1 point

Yeah, we'll come back to this in the strategy series. There are lots of details to consider.

Comment author: Vladimir_Nesov 29 March 2012 08:26:07PM *  3 points

There seems to be a tradeoff here. An open project has more chances to develop the necessary theory faster, but having such project in the open looks like a clearly bad idea towards the endgame. So on one hand, an open project shouldn't be cultivated (and becomes harder to hinder) as we get closer to the endgame, but on the other, a closed project will probably not get off the ground, and fueling it by an initial open effort is one way to make it stronger. So there's probably some optimal point to stop encouraging open development, and given the current state of the theory (nil) I believe the time hasn't come yet.

The open effort could help the subsequent closed project in two related ways: gauge the point where the understanding of what to actually do in the closed project is sufficiently clear (for some sense of "sufficiently"), and form enough of background theory to be able to convince enough young Conways (with necessary training) to work on the problem on the closed stage.

Comment author: Wei_Dai 29 March 2012 09:51:09PM *  3 points

So there's probably some optimal point to stop encouraging open development, and given the current state of the theory (nil) I believe the time hasn't come yet.

Your argument seems premised on the assumption that there will be an endgame. If we assume some large probability that we end up deciding not to have an endgame at all (i.e., not to try to actually build FAI with unenhanced humans), then it's no longer clear "the time hasn't come yet".

Even if we assume that with probability ~1 there will be an effort to directly build FAI, given the slippery slope effects we have to stop encouraging open research well before the closed project starts. The main deciding factors for "when" must be how large the open research community has gotten, how strong the slippery slope effects are, and how much "pull" SingInst has against those effects. The "current state of the theory" seems to have little to do with it. (Edit: No that's too strong. Let me amend it to "one consideration among many".)

Comment author: Vladimir_Nesov 29 March 2012 10:12:01PM *  3 points

If we assume some large probability that we end up deciding not to have an endgame at all (i.e., not to try to actually build FAI with unenhanced humans), then it's no longer clear "the time hasn't come yet".

This is something we'll know better further down the road, so as long as it's possible to defer this decision (i.e. while the downside is not too great, however that should be estimated), it's the right thing to do. I still can't rule out that there might be a preference definition procedure (that refers to humans) simple enough to be implemented pre-WBE, and decision theory seems to be an attack on this possibility (clarifying why this is naive, for example, in which case it'll also serve as an argument to the powerful in the WBE race).

The "current state of the theory" seems to have little to do with it. (Edit: No that's too strong. Let me amend it to "one consideration among many".)

Well, maybe not specifically current, but what can be expected eventually, for the closed project to benefit from, which does seem to me like a major consideration in the possibility of its success.

Comment author: Will_Newsome 29 March 2012 09:32:20PM 3 points

I'm confused as to what you have in mind when you're thinking of work on CEV. Do you mean things like getting a better model of the philosophy of reflective consistency, or studying mechanism design to find algorithms for relatively fair aggregation, or looking into neuroscience to see how beliefs and preferences are encoded, or...? Is there perhaps a post I missed or am forgetting?

Comment author: Giles 04 March 2012 02:59:50PM 2 points

Which open problems are safe to discuss, and which are potentially dangerous?

This one seems particularly interesting, especially as it seems to apply to itself. Due to the attention hazard problem, coming up with a "list of things you're not allowed to discuss" sounds like a bad idea. But what's the alternative? Yeuugh.

Comment author: Stuart_Armstrong 15 March 2012 12:29:24PM *  3 points

Selective opinion and answers (for longer discussions, respond to specific points and I'll furnish more details):

Which kinds of differential technological development should we encourage, and how?

I recommend pushing for whole brain emulations, with scanning-first and emphasis on fully uploading actual humans. Also, military development of AI should be prioritised over commercial and academic development, if possible.

Which open problems are safe to discuss, and which are potentially dangerous?

Seeing what has already been published, I see little advantage to restricting discussion of most open problems.

What can we do to reduce the risk of an AI arms race?

Any methods that would reduce traditional arms races. Cross ownership of stocks in commercial companies. Investment funds with specific AI disclosure requirements. Rewards for publishing interim results.

What can we do to raise the "sanity waterline," and how much will this help?

Individual sanity waterline raising among researchers useful, but generally we want to raise the sanity waterline of institutions, which is harder but more important (and may have nothing to do with improving individuals).

Which interventions should we prioritize?

We need a solid push to see if reduced impact or Oracle AIs can work, and we need to make the academic and business worlds take the risks more seriously. Interventions to stop the construction of dangerous AIs are unlikely to succeed, but "working with your company to make your AIs safer (and offering useful advice along the way)" could work. We need to develop useful tools we can offer others, not solely nag them all the time.

How should x-risk reducers and AI safety researchers interact with governments and corporations?

Beggars can't be choosers. For the moment, we need to make them take it seriously, convince them, and give away any safety-increasing info we might have. Later we may have to pursue different courses.

How can optimal philanthropists get the most x-risk reduction for their philanthropic buck?

Funding SIAI and FHI and similar, getting us in contact with policy makers, raising the respectability of xrisks.

How does AI risk compare to other existential risks?

Very different; no other xrisk has such uncertain probabilities and timelines, and such huge risks and rewards and various scenarios that can play out.

Which problems do we need to solve, and which ones can we have an AI solve?

We need to survive till AI, and survive AI. If we survive, most trends are positive, so don't need to worry about much else.

How can we develop microeconomic models of WBEs and self-improving systems?

With thought and research :-)

How can we be sure a Friendly AI development team will be altruistic?

Do it ourselves, normalise altruistic behaviour in the field, or make it in their self-interest to be altruistic.

How hard is it to create Friendly AI?

Probably extraordinarily hard if the FAI is as intelligent as we fear. More work needs to be done to explore partial solutions (limited impact, Oracle, etc...)

Is there a safe way to do uploads, where they don't turn into neuromorphic AI?

Keep them as human (in their interactions, in their virtual realities, in their identities etc...) as possible.

How possible is it to do FAI research on a seastead?

How is this relevant? If governments were so concerned about AI potential that the location of the research became important, then we would have made tremendous progress in getting people to take it seriously, and AI will most likely not be developed by a small seasteading independent group.

How much must we spend on security when developing a Friendly AI team?

We'll see at the time.

What's the best way to recruit talent toward working on AI risks?

General: get people involved as a problem to be worked on, socialise them into our world, get them to care. AI researchers: conferences and publications and getting more respectable publicity.

How difficult is stabilizing the world so we can work on Friendly AI slowly?

Very.

How hard will a takeoff be?

Little useful data. Use scenario planning rather than probability estimates.

What is the value of strategy vs. object-level progress toward a positive Singularity?

Both needed, both need to be closely connected, easy shifts from one to the other. Possibly should be more strategy at the current time.

How feasible is Oracle AI?

As yet unknown. Research progressing, based on past performance I expect new insights to arrive.

Can we convert environmentalists into people concerned with existential risk?

With difficulty for AI risks, with ease for some others (extreme global warming). Would this be useful? Smaller, more tightly focused pressure groups would perform much better, even with less influence.

Is there no such thing as bad publicity [for AI risk reduction] purposes?

Anything that makes it seem more like an area for cranks is bad publicity.

Comment author: Wei_Dai 19 April 2012 09:41:29AM 1 point

What are your most important disagreements with other FHI/SIAI people? How do you account for these disagreements?

You say:

I recommend pushing for whole brain emulations

but also:

We need a solid push to see if reduced impact or Oracle AIs can work

which makes me a bit confused. Are you saying we should push them simultaneously, or what? Also, what path do you see from a successful Oracle AI to a positive Singularity? For example, use Oracle AI to develop WBE technology, then use WBEs to create FAI? Or something else?

Comment author: Stuart_Armstrong 19 April 2012 09:51:47AM 4 points

What are your most important disagreements with other FHI/SIAI people? How do you account for these disagreements?

Main disagreement with FHI people is that I'm more worried about AI than they are (I'm probably up with the SIAI folks on this). I suspect an anchoring effect here - I was drawn to the FHI's work through AI risk, others were drawn in through other angles (also I spend much more time on Less Wrong, making AI risks very salient). Not sure what this means for accuracy, so my considered opinion is that AI is less risky than I individually believe.

Are you saying we should push them simultaneously, or what?

My main disagreement with SIAI is that I think FAI is unlikely to be implementable on time. So I want to explore alternative avenues, several ones ideally. Oracle to FAI would be one route; Oracle to people taking AI seriously to FAI might be another. WBE opens up many other avenues (including "no AI"), so is also worth looking into.

I haven't bothered to try and close the gap between me and SIAI on this, because even if they are correct, I think it's valuable for the group to have someone looking into non-FAI avenues.

Comment author: Wei_Dai 19 April 2012 10:12:01AM 2 points

Thanks for the answers. The main problem I have with Oracle AI is that it seems a short step from OAI to UFAI, but a long path to FAI (since you still need to solve ethics and it's hard to see how OAI helps with that), so it seems dangerous to push for it, unless you do it in secret and can keep it secret. Do you agree? If so, I'm not sure how "Oracle to people taking AI seriously to FAI" is supposed to work.

Comment author: Stuart_Armstrong 19 April 2012 10:29:43AM 1 point

My main "pressure point" is pushing UFAI development towards OAI. ie I don't advocate building OAI, but making sure that the first AGIs will be OAIs. And I'm using far too many acronyms.

Comment author: Wei_Dai 19 April 2012 10:39:12AM 5 points

What does it matter that the first AGIs will be OAIs, if UFAIs follow immediately after? I mean, once knowledge of how to build OAIs starts to spread, how are you going to make sure that nobody fails to properly contain their Oracles, or intentionally modifies them into AGIs that act on their own initiative? (This recent post of mine might better explain where I'm coming from, if you haven't already read it.)

Comment author: cousin_it 19 April 2012 09:52:15PM *  2 points

We can already think productively about how to win if oracle AIs come first. Paul Christiano is working on this right now, see the "formal instructions" posts on his blog. Things are still vague but I think we have a viable attack here.

Comment author: Stuart_Armstrong 20 April 2012 08:37:17AM 1 point

Wot cousin_it said.

Of course the model "OAIs are extremely dangerous if not properly contained; let's let everyone have one!" isn't going to work. But there are many things we can try with an OAI (building a FAI, for instance), and most importantly, some of these things will be experimental (the FAI approach relies on getting the theory right, with no opportunity to test it). And there is a window that doesn't exist with a genie - a window where people realise superintelligence is possible and where we might be able to get them to take safety seriously (and they're not all dead). We might also be able to get exotica like a limited impact AI or something like that, if we can find safe ways of experimenting with OAIs.

And there seems no drawback to pushing an UFAI project into becoming an OAI project.

Comment author: Wei_Dai 20 April 2012 06:29:34PM 2 points

Cousin_it's link is interesting, but it doesn't seem to have anything to do with OAI, and instead looks like a possible method of directly building an FAI.

Of course the model "OAIs are extremely dangerous if not properly contained; let's let everyone have one!" isn't going to work.

Hmm, maybe I'm underestimating the amount of time it would take for OAI knowledge to spread, especially if the first OAI project is a military one (on the other hand, the military and their contractors don't seem to be having better luck with network security than anyone else). How long do you expect the window of opportunity (i.e., the time from the first successful OAI to the first UFAI, assuming no FAI gets built in the mean time) to be?

some of these things will be experimental

I'd like to have FAI researchers determine what kind of experiments they want to do (if any, after doing appropriate benefit/risk analysis), which probably depends on the specific FAI approach they intend to use, and then build limited AIs (or non-AI constructs) to do the experiments. Building general Oracles that can answer arbitrary (or a wide range of) questions seems unnecessarily dangerous for this purpose, and may not help anyway depending on the FAI approach.

And there seems no drawback to pushing an UFAI project into becoming an OAI project.

There may be, if the right thing to do is to instead push them to not build an AGI at all.

Comment author: Stuart_Armstrong 23 April 2012 11:11:41AM 0 points

One important fact I haven't been mentioning: OAIs help tremendously with medium-speed takeoffs (fast takeoffs are dangerous for the usual reasons, slow takeoffs mean that we will have moved beyond OAIs by the time the intelligence level hits dangerous), because we can then use them to experiment.

There may be, if the right thing to do is to instead push them to not build an AGI at all.

Interacting with AGI people at the moment (organising a jointish conference), will have a clearer idea of how they react to these ideas at a later stage.

Comment author: Vladimir_Nesov 23 April 2012 11:57:12AM *  0 points

slow takeoffs mean that we will have moved beyond OAIs by the time the intelligence level hits dangerous

Moved where/how? Slow takeoff means we have more time, but I don't see how it changes the nature of the problem. Low time to WBE makes (not particularly plausible) slow takeoff similar to the (moderately likely) failure to develop AGI before WBE.

Comment author: Vladimir_Nesov 19 April 2012 11:28:38AM *  1 point

Together with Wei's point that OAI doesn't seem to help much, there is the downside that existence of OAI safety guidelines might make it harder to argue against pushing AGI in general. So on net it's plausible that this might be a bad idea, which argues for weighing this tradeoff more carefully.

Comment author: Stuart_Armstrong 20 April 2012 08:38:33AM 1 point [-]

there is the downside that existence of OAI safety guidelines might make it harder to argue against pushing AGI in general.

Possibly. But in my experience even getting the AGI people to admit that there might be safety issues is over 90% of the battle.

Comment author: Vladimir_Nesov 20 April 2012 10:44:06AM *  0 points [-]

It's useful for AGI researchers to notice that there are safety issues, but not useful for them to notice that there are "safety issues" which can be dealt with by following OAI guidelines. The latter kind of understanding might be worse than none at all, as it seemingly resolves the problem. So it's not clear to me that getting people to "admit that there might be safety issues" is in itself a worthwhile milestone.

Comment author: Vladimir_Nesov 19 April 2012 11:22:46AM *  1 point [-]

My main disagreement with SIAI is that I think FAI is unlikely to be implementable on time.

Why do you say this is a disagreement? Who at SIAI thinks FAI is likely to be implementable on time (and why)?

So I want to explore alternative avenues, several ones ideally.

Right, assuming we can find any alternative avenues of comparable probability of success. I think it's unlikely for FAI to be implementable both "on time" (i.e. by humans in current society), and via alternative avenues (of which fast WBE humans seems the most plausible one, which argues for late WBE that's not hardware-limited, not pushing it now). This makes current research as valuable as alternative routes despite improbability of current research's success.

Comment author: Stuart_Armstrong 19 April 2012 11:24:28AM *  1 point [-]

Why do you say this is a disagreement? Who at SIAI thinks FAI is likely to be implementable on time (and why)?

Let me rephrase: I think the expected gain from pursuing FAI is less than that from pursuing other methods. Other methods are less likely to work, but more likely to be implementable. I think SIAI disagrees with this assessment.

Comment author: Vladimir_Nesov 19 April 2012 11:35:48AM 1 point [-]

I think the expected gain from pursuing FAI is less than that from pursuing other methods. Other methods are less likely to work, but more likely to be implementable.

I assume that by "implementable" you mean that it's an actionable project, that might fail to "work", i.e. deliver the intended result. I don't see how "implementability" is a relevant characteristic. What matters is whether something works, i.e. succeeds. If you think that other methods are less likely to work, how are they of greater expected value? I probably parsed some of your terms incorrectly.

Comment author: Stuart_Armstrong 19 April 2012 11:54:42AM 1 point [-]

Whether the project reached the desired goal, versus whether that goal will actually work. If Nick and Eliezer both agreed about some design that "this is how you build a FAI", then I expect it will work. However, I don't think it's likely that would happen. It's more likely they will say "this is how you build a proper Oracle AI", but less likely the Oracle will end up being safe.

Comment author: Vladimir_Nesov 23 April 2012 12:05:05PM *  0 points [-]

Whether the project reached the desired goal, versus whether that goal will actually work.

Okay, but I still don't understand how a project with lower probability of "actually working" can be of higher expected value. I'm referring to this statement:

I think the expected gain from pursuing FAI is less than that from pursuing other methods. Other methods are less likely to work...

The argument you seem to be giving in support of higher expected value of other methods is that they are "more likely to be implementable" (a project reaching its stated goal, even if that goal turns out to be no good), but I don't see how that is an interesting property.

Comment author: [deleted] 23 April 2012 05:10:39PM *  0 points [-]

He didn't say other architectures would be no good, he said they're less likely to be safe.

He thinks the distribution P(Outcome | do(complete Oracle AI project)) isn't as highly peaked at Weirdtopia as P(Outcome | do(complete FAI)); Oracle AI puts more weight on regions like "Lifeless universe", "Eternal Torture", "Rainbows and Slow Death", and "Failed Utopia".

However, "Complete FAI" isn't an actionable procedure, so he examines the chance of completion conditional on different actions he can take. "Not worth pursuing because non-implementable" means that available FAI supporting actions don't have a reasonable chance of producing friendly AI, which discounts the peak in the conditional outcome distribution at valuable futures relative to do(complete FAI). And supposedly he has some other available oracle AI supporting strategy which fares better.

Eating a sandwich isn't as cool as building an interstellar society with wormholes for transportation, but I'm still going to make a sandwich for lunch, because it's going to work and maybe be okay-ish.
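The decomposition in this comment can be sketched numerically. This is only an illustration under made-up assumptions: the probabilities and values below are hypothetical placeholders, not anyone's actual estimates, and `expected_value` is a name introduced here for the example. The point is that a project whose completed outcome is better can still have lower expected value if it is much less likely to be completed.

```python
def expected_value(p_complete, value_if_complete, value_if_not=0.0):
    """Expected value of backing a project.

    p_complete        -- chance the project reaches its stated goal
    value_if_complete -- expected value of the outcome distribution
                         given that the project is completed
    value_if_not      -- expected value if the project is not completed
    """
    return p_complete * value_if_complete + (1 - p_complete) * value_if_not


# Hypothetical numbers, chosen only to show the shape of the argument.
# FAI: outcome distribution peaked at good futures, but hard to complete.
fai = expected_value(p_complete=0.05, value_if_complete=0.9)

# Oracle AI: more weight on bad outcomes, but easier to complete.
oracle = expected_value(p_complete=0.4, value_if_complete=0.3)

print(f"FAI: {fai:.3f}  Oracle AI: {oracle:.3f}")
```

With different placeholder numbers the ordering reverses, which is roughly what the disagreement in this thread is about.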

Comment author: Wei_Dai 19 April 2012 11:12:29AM 1 point [-]

Main disagreement with FHI people is that I'm more worried about AI than they are (I'm probably up with the SIAI folks on this).

Where can we read FHI's analysis of AI risk? Why are they not as worried as you and SIAI people? Has there ever been a debate between FHI and SIAI on this? What threats are they most worried about? What technologies do they want to push or slow down?

Comment author: Stuart_Armstrong 19 April 2012 03:28:41PM 2 points [-]

What threats are they most worried about?

AI is high on the list - one of the top risks, even if their objective assessment is lower than SIAI's. Nuclear war, synthetic biology, nanotech, pandemics, social collapse: these are the other ones we're looking at.

Comment author: Stuart_Armstrong 19 April 2012 03:26:04PM *  2 points [-]

Basically they don't buy the "AI inevitably goes foom and inevitably takes over". They see definite probabilities of these happening, but their estimates are closer to 50% than to 100%.

Comment author: TheOtherDave 19 April 2012 04:25:35PM 3 points [-]

They estimate it at 50%???
And there are other things they are more concerned about?
What are those other things?

Comment author: Stuart_Armstrong 20 April 2012 08:29:27AM 2 points [-]

They estimate a variety of conditional statements ("AI possible this century", "if AI then FOOM", "if FOOM then DOOM", etc...) with magnitudes between 20% and 80% (I had the figures somewhere, but can't find them). I think when it was all multiplied out it was in the 10-20% range.
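As a rough sketch of what "multiplied out" means here: chaining conditional estimates gives the probability of the whole sequence. The figures below are hypothetical stand-ins, since the actual numbers are not given, and the real list may contain more than three steps.

```python
from math import prod


def chain_probability(conditionals):
    """Multiply a chain of conditional probabilities:
    P(A) * P(B | A) * P(C | B) * ...
    """
    return prod(conditionals)


# Hypothetical values for "AI possible this century",
# "if AI then FOOM", "if FOOM then DOOM".
example = chain_probability([0.5, 0.5, 0.5])

# Bounds if every one of three estimates sat at 20% or at 80%.
lowest = chain_probability([0.2, 0.2, 0.2])
highest = chain_probability([0.8, 0.8, 0.8])

print(f"example: {example:.3f}, range: {lowest:.3f} to {highest:.3f}")
```

Three mid-range estimates already land in the 10-20% band, so the overall figure is consistent with individual estimates that each look fairly high.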

And I didn't say they thought other things were more worrying; just that AI wasn't the single overwhelming risk/reward factor that SIAI (and me) believe it to be.

Comment author: XiXiDu 19 April 2012 03:15:43PM 0 points [-]

A wild guess. FHI believes that the best that can reasonably be done about existential risks at this point in time is to do research into existential risks, including possible unknown unknowns, and into strategies to reduce current existential risks. This somewhat agrees with their FAQ:

Research into existential risk and analysis of potential countermeasures is a very strong candidate for being the currently most cost-effective way to reduce existential risk. This includes research into some methodological problems and into certain strategic questions that pertain to existential risk. Similarly, actions that contribute indirectly to producing more high-quality analysis on existential risk and a capacity later to act on the result of such analysis could also be extremely cost-effective. This includes, for example, donating money to existential risk research, supporting organizations and networks that engage in fundraising for existential risks work, and promoting wider awareness of the topic and its importance.

In other words, FHI seems to focus on meta issues, existential risks in general, rather than associated specifics.

Comment author: XiXiDu 04 March 2012 11:19:33AM *  3 points [-]

"In what direction should we nudge the future, to maximize the chances and impact of a positive Singularity?"

Friendly AI is incredibly hard to get right and a friendly AI that is not quite friendly could create a living hell for the rest of time, increasing negative utility dramatically.

I vote for antinatalism. It should be seriously considered to create a true paperclip maximizer that transforms the universe into an inanimate state devoid of suffering. Friendly AI is simply too risky.

I think that humans are not psychologically equal. Not only are there many outliers, but most humans would turn into abhorrent creatures given their own pocket universe, unlimited power and a genie. And even given our current world, if we were to remove the huge memeplex of western civilization, most people would act like stone age hunter-gatherers. And that would be bad enough. After all, violence is the major cause of death within stone age societies.

Even proposals like CEV (Coherent Extrapolated Volition) can turn out to be a living hell for a percentage of all beings. I don't expect any amount of knowledge, or intelligence, to cause humans to abandon their horrible preferences.

Eliezer Yudkowsky says that intelligence does not imply benevolence. That an artificial general intelligence won't turn out to be friendly. That we have to make it friendly. Yet his best proposal is that humanity will do what is right if we only knew more, thought faster, were more the people we wished we were and had grown up farther together. The idea is that knowledge and intelligence implies benevolence for people. I don't think so.

The problem is that if you extrapolate chaotic systems, e.g. human preferences given real world influence, small differences in initial conditions are going to yield widely diverging outcomes. That our extrapolated volition converges rather than diverges seems to be a bold prediction.

I just don't see that a paperclip maximizer burning the cosmic commons is as bad as it is currently portrayed. Sure, it is "bad". But everything else might be much worse.

Here is a question for those who think that antinatalism is just stupid. Would you be willing to rerun the history of the universe to obtain the current state? Would you be willing to create another Genghis Khan, a new holocaust, allowing intelligent life to evolve?

As Greg Egan wrote: "To get from micro-organisms to intelligent life this way would involve an immense amount of suffering, with billions of sentient creatures living, struggling and dying along the way."

If you are not willing to do that, then why are you willing to do the same now, just for much longer, by trying to colonize the universe? Are you so sure that the time to come will be much better? How sure are you?

ETA

I expect any friendly AI outcome that fails to be friendly in a certain way to increase negative utility and only a perfectly "friendly" (whatever that means, it is still questionable if the whole idea makes sense) AI to yield a positive utility outcome.

That is because the closer any given AGI design is to friendliness the more likely it is that humans will be kept alive but might suffer. Whereas an unfriendly AI in complete ignorance of human values will more likely just see humans as a material resource without having any particular incentive to keep humans around.

Just imagine a friendly AI which fails to "understand" or care about human boredom.

There are several possibilities by which SIAI could actually cause a direct increase in negative utility.

1) Friendly AI is incredibly hard and complex. Complex systems can fail in complex ways. Agents that are an effect of evolution have complex values. To satisfy complex values you need to meet complex circumstances. Therefore any attempt at friendly AI, which is incredibly complex, is likely to fail in unforeseeable ways. A half-baked, not quite friendly, AI might create a living hell for the rest of time, increasing negative utility dramatically.

2) Humans are not provably friendly. Given the power to shape the universe the SIAI might fail to act altruistically and deliberately implement an AI with selfish motives or horrible strategies.

Comment author: Wei_Dai 06 March 2012 07:09:36AM *  6 points [-]

Earlier, you wrote

Personally I don't want to contribute anything to an organisation which admits to explore strategies that are unacceptable by most people. And I wouldn't suggest anyone else to do so.

Surely building an anti-natalist AI that turns the universe into inert matter would be considered unacceptable by most people. So I'm confused. Do you intend to denounce SIAI if they do seriously consider this strategy, and also if they don't?

Comment author: XiXiDu 06 March 2012 10:54:34AM 0 points [-]

Surely building an anti-natalist AI that turns the universe into inert matter would be considered unacceptable by most people. So I'm confused.

Yet I am not secretive about it and I believe that it is one of the less horrible strategies. Given that SI is strongly attached to decision theoretic ideas, which I believe are not the default outcome due to practically intractable problems, I fear that their strategies might turn out to be much worse than the default case.

I think that it is naive to simply trust SI because they seem like nice people. Although I don't doubt that they are nice people. But I think that any niceness is easily drowned by their eagerness to take rationality to its logical extreme without noticing that they have reached a point where the consequences constitute a reductio ad absurdum. If game and decision theoretic conjectures show that you can maximize expected utility by torturing lots of people, or by voluntarily walking into death camps, then that's the right thing to do. I don't think that they are psychopathic personalities per se though. Those people are simply held captive by their idea of rationality. And that is what makes them extremely dangerous.

Do you intend to denounce SIAI if they do seriously consider this strategy, and also if they don't?

I would denounce myself if I would seriously consider that strategy. But I would also admire them for doing so because I believe that it is the right thing to do given their own framework of beliefs. What they are doing right now seems just hypocritical. Researching FAI will almost certainly lead to worse outcomes than researching how to create an anti-natalist AI as soon as possible.

What I really believe is that there is not enough data to come to any definitive conclusion about the whole idea of a technological singularity and dangerous recursive self-improvement in particular and that it would be stupid to act on any conclusion that one could possibly come up with at this point.

I believe that SI/lesswrong mainly produces science fiction and interesting, although practically completely useless, thought experiments. The only danger I see is that some people associated with SI/lesswrong might run rampant once someone demonstrates certain AI capabilities.

All in all I think they are just fooling themselves. They collected massive amounts of speculative math and logic and combined it into a framework of beliefs that can be used to squash any common sense. They have seduced themselves with formulas and lost any ability to discern scribbles on paper from real world rationality. They managed to give a whole new meaning to the idea of model uncertainty by making it reach new dramatic heights.

Bayes’ Theorem, the expected utility formula, and Solomonoff induction are unusable in all but a few limited situations where you have a well-defined testable and falsifiable hypothesis or empirical data. In most situations those heuristics are computationally intractable, one more than the other.

There is simply no way to assign utility to world states without deluding yourself to believe that your decisions are more rational than just trusting your intuition. There is no definition of "utility" that's precise enough to figure out what a being that maximizes it would do. There can't be, not without unlimited resources. Any finite list of actions maximizes infinitely many different quantities. Utility does only become well-defined if we add limitations on what sort of quantities we consider. And even then...

Preferences are a nice concept. But they are just as elusive as the idea of a "self". Preferences are not just malleable but they keep changing as we make more observations, and so does the definition of utility. Which makes it impossible to act in a time-consistent way.

Comment author: Wei_Dai 06 March 2012 07:16:03PM 2 points [-]

What I really believe is that there is not enough data to come to any definitive conclusion about the whole idea of a technological singularity and dangerous recursive self-improvement in particular and that it would be stupid to act on any conclusion that one could possibly come up with at this point.

I agree with the "not enough data to come to any definitive conclusion" part, but think we could prepare for the Singularity by building an organization that is not attached to any particular plan but is ready to act when there is enough data to come to definitive conclusions (and tries to gather more data in the mean time). Do you agree with this, or do you think we should literally do nothing?

I believe that SI/lesswrong mainly produces science fiction and interesting, although practically completely useless, thought experiments.

I guess I have a higher opinion of SIAI than that. Just a few months ago you were saying:

I also fear that, at some point, I might need the money. Otherwise I would have already donated a lot more to the Singularity Institute years ago.

What made you change your mind since then?

Comment author: XiXiDu 06 March 2012 08:13:55PM 1 point [-]

I also fear that, at some point, I might need the money. Otherwise I would have already donated a lot more to the Singularity Institute years ago.

What made you change your mind since then?

I did not change my mind. All I am saying is that I wouldn't suggest that anyone who fully believes what they believe contribute money to SI. Because that would be counterproductive. If I accepted all of their ideas then I would make the same suggestion as you, to build "an organization that is not attached to any particular plan".

But I do not share all of their beliefs. Particularly I do not currently believe that there is a strong case that uncontrollable recursive self-improvement is possible. And if it is possible I do not think that it is feasible. And even if it is feasible I believe that it won't happen any time soon. And if it will happen soon I do not think that SI will have anything to do with it.

I believe that SI is an important organisation that deserves money. Although if I shared their idea of rationality and their technological optimism, then the risks would outweigh the benefit.

Why I believe SI deserves money:

  • It makes people think by confronting them with the logical consequences of state of the art ideas from the field of rationality.
  • It explores topics and fringe theories that are neglected or worthy of consideration.
  • It challenges the conventional foundations of charitable giving, causing organisations like GiveWell to reassess and possibly improve their position.
  • It creates a lot of exciting and fun content and discussions.

All in all I believe that SI will have a valuable influence. I believe that the world needs people and organisations that explore crazy ideas, that try to treat rare diseases in cute kittens and challenge conventional wisdom. And SI is such an organisation. Just like Roger Penrose and Stuart Hameroff. Just like all the creationists who caused evolutionary biologists to hone their arguments. SI will influence lots of fields and make people contemplate their beliefs.

To fully understand why my criticism of SI and my willingness to donate do not contradict each other, you also have to realize that I do not accept the usual idea of charitable giving that is being voiced here. I think that the reasons for why people like me contribute money to charities and causes are complex and can't be reduced to something as simple as wanting to do the most good. It is not just about wanting to do good, signaling or warm fuzzies. It is all of it and much more. I also believe that it is practically impossible to figure out how to maximize good deeds. And even if you were to do it for selfish reasons, you'd have to figure out what you want in the first place. An idea which is probably "not even wrong".

Comment author: XiXiDu 06 March 2012 08:41:47PM -1 points [-]

I also fear that, at some point, I might need the money. Otherwise I would have already donated a lot more to the Singularity Institute years ago.

What made you change your mind since then?

Before you throw more of what I wrote in the past at me:

  • I sometimes take different positions just to explore an argument, because it is fun to discuss and because I am curious what reactions I might provoke.
  • I don't have a firm opinion on many issues.
  • There are a lot of issues for which there are as many arguments that oppose a certain position as there are arguments that support it.
  • Most of what I write is not thought-out. I most often do not consciously contemplate what I write.
  • I find it very easy to argue for whatever position.
  • I don't really care too much about most issues but write as if I do, to evoke feedback. I just do it for fun.
  • I am sometimes not completely honest to exploit the karma system. Although I don't do that deliberately.
  • If I believe that SI/lesswrong could benefit from criticism I voice it if nobody else does.

The above is just some quick and dirty introspection that might hint at the reason for some seemingly contradictory statements. The real reasons are much more complex of course, but I haven't thought about that either :-)

I just don't have the time right now to think hard about all the issues discussed here. I am still busy improving my education. At some point I will try to tackle the issues with due respect and in all seriousness.

Comment author: wedrifid 07 March 2012 10:53:10AM *  4 points [-]

Before you throw more of what I wrote in the past at me:

I have quoted everything XiXiDu said here so that it is not lost in any future edits.

Many of XiXi's contributions consist of persuasive denunciations. As he points out in the parent (and quoted below), often these are based on little research, without much contemplation and are done to provoke reactions rather than because they are correct. Since XiXiDu is rather experienced at this mode of communication - and the arguments he uses have been able to be selected for persuasiveness through trial and error - there is a risk that he will be taken more seriously than is warranted.

The parent should be used to keep things in perspective when XiXiDu is rabble rousing.

  • I sometimes take different positions just to explore an argument, because it is fun to discuss and because I am curious what reactions I might provoke.
  • I don't have a firm opinion on many issues.
  • There are a lot of issues for which there are as many arguments that oppose a certain position as there are arguments that support it.
  • Most of what I write is not thought-out. I most often do not consciously contemplate what I write.
  • I find it very easy to argue for whatever position.
  • I don't really care too much about most issues but write as if I do, to evoke feedback. I just do it for fun.
  • I am sometimes not completely honest to exploit the karma system. Although I don't do that deliberately.
  • If I believe that SI/lesswrong could benefit from criticism I voice it if nobody else does.

The above is just some quick and dirty introspection that might hint at the reason for some seemingly contradictory statements. The real reasons are much more complex of course, but I haven't thought about that either :-)

I just don't have the time right now to think hard about all the issues discussed here. I am still busy improving my education. At some point I will try to tackle the issues with due respect and in all seriousness.

Comment author: Steve_Rayhawk 07 March 2012 07:22:58PM *  9 points [-]

That said, I think his fear of culpability (for being potentially passively involved in an existential catastrophe) is very real. I suspect he is continually driven, at a level beneath what anyone's remonstrations could easily affect, to try anything that might somehow succeed in removing all the culpability from him. This would be a double negative form of "something to protect": "something to not be culpable for failure to protect".

If this is true, then if you try to make him feel culpability for his communication acts as usual, this will only make his fear stronger and make him more desperate to find a way out, and make him even more willing to break normal conversational rules.

I don't think he has full introspective access to his decision calculus for how he should let his drive affect his communication practices or the resulting level of discourse. So his above explanations for why he argues the way he does are probably partly confabulated, to match an underlying constraining intuition of "whatever I did, it was less indefensible than the alternative".

(I feel like there has to be some kind of third alternative I'm missing here, that would derail the ongoing damage from this sort of desperate effort by him to compel someone or something to magically generate a way out for him. I think the underlying phenomenon is worth developing some insight into. Alex wouldn't be the only person with some amount of this kind of psychology going on -- just the most visible.)

Comment author: XiXiDu 08 March 2012 10:58:08AM *  1 point [-]

What is your suggestion then? How do I get out? Delete all of my posts, comments and website like Roko?

Seriously, if it wasn't for assholes like wedrifid I wouldn't even bother anymore and just quit. The grandparent was an attempt at honesty, trying to leave. Then that guy comes along claiming that most of my submissions consisted of "persuasive denunciations". Someone like him, who does nothing else all the time. Someone who never argues for his case.

ETA Ah fuck it all. I'll take another attempt and log out now and not get involved anymore. Happy self-adulation.

Comment author: wedrifid 07 March 2012 11:57:38PM *  1 point [-]

If this is true, then if you try to make him feel culpability for his communication acts as usual, this will only make his fear stronger and make him more desperate to find a way out, and make him even more willing to break normal conversational rules.

I certainly wouldn't try to make him feel culpability. Or, for that matter, "try to make him" anything at all. I don't believe I have the ability to influence XiXi significantly and I don't believe it would be useful to try (any more). It is for this reason that I rather explicitly spoke in the third person to any prospective future readers that it may be appropriate to refer here in the future. Pretending that I was actually talking to XiXiDu when I was clearly speaking to others would just be insulting to him.

There are possible future cases (and plenty of past cases) where a reply to one of XiXiDu's fallacious denunciations that consists of simply a link here is more useful than ignoring the comment entirely and hoping that the damage done is minimal.

Comment author: XiXiDu 08 March 2012 09:51:46AM 1 point [-]

...one of XiXiDu's fallacious denunciations...

Show me just one.

Comment author: XiXiDu 08 March 2012 10:20:25AM 0 points [-]

I don't believe I have the ability to influence XiXi significantly...

You could easily influence me with actual arguments.

Comment author: Alsadius 15 April 2012 08:18:43PM 0 points [-]

If a denunciation is accurate, does it really matter what the source is? Sometimes, putting pin to balloon is its own reward.

Comment author: wedrifid 15 April 2012 08:44:38PM 0 points [-]

If a denunciation is accurate, does it really matter what the source is?

The rhetorical implication appears to be a non sequitur. Again. Please read more carefully.

Comment author: Alsadius 15 April 2012 09:00:15PM 0 points [-]

You're suggesting that he might be making arguments that are taken more seriously than they warrant. Unless an argument is based on incorrect facts, it should be taken exactly as seriously as it warrants on its own merits. Why does the source matter?

Comment author: wedrifid 15 April 2012 09:26:24PM 0 points [-]

Why does the source matter?

Even if the audience is assumed to be perfect at evaluating evidence on its merits, the source matters to the extent that the authority of the author and the authority of the presentation are considered evidence. Knowing how pieces of evidence were selected also gives information, so knowing about the source can provide significant information.

And the above assumption definitely doesn't hold - people are not perfect at evaluating evidence on its merits. Considerations about how arguments are optimized through trial and error for persuasiveness become rather important when all recipients have known biases and you are actively trying to reduce the damage said biases cause.

Finally, considerations about how active provocation may have an undesirable influence on the community are qualitatively different from considerations about whether a denunciation is accurate. Just because I evaluate XiXiDu's typical 'arguments' as terribly nonsensical thinking that does not mean I should be similarly dismissive of the potential damage that can be done by them, given the expressed intent and tactics. I can evaluate the threat that the quoted agenda has as significant even when I don't personally take the output of that agenda seriously at all.

Comment author: XiXiDu 07 March 2012 11:27:35AM -2 points [-]

I have quoted everything XiXiDu said here so that it is not lost in any future edits.

You might want to save this as well.

...without much contemplation and are done to provoke reactions rather than because they are correct.

Here is how I see it. I am just an uneducated below average IQ individual and don't spend more time on my submissions than it takes to write them. If people are swayed by my ramblings then how firm could their beliefs possibly be in the first place?

Many of XiXi's contributions consist of persuasive denunciations. [...] there is a risk that he will be taken more seriously than is warranted.

I could have as easily argued in favor of SI. If I was to start now and put some extra effort into it I believe I could actually become more persuasiveness than SI itself. Do you believe that in a world where I did that you would tell people that my arguments are based on little research and that there is a risk that I am taken more seriously than is warranted?

Comment author: [deleted] 07 March 2012 08:17:45PM 6 points [-]

below average IQ individual

Don't self-deprecate too much. Have you taken a (somewhat recent) IQ test, say an online matrix test or the Mensa one? (If so, personal prediction.)

Even though LW over-estimates its own IQ, don't forget how stupid IQ 100 really is.

Comment author: wallowinmaya 07 March 2012 10:57:43PM 3 points [-]

uneducated below average IQ

Don't be ridiculous.

Comment author: XiXiDu 09 March 2012 11:28:31AM 1 point [-]

Don't be ridiculous.

Yesterday I took an IQ test suggested by muflax and scored 78.

Comment author: wallowinmaya 09 March 2012 01:31:16PM 1 point [-]

Yeah, I took it too and scored 37 - because my eyes were closed.

Do you really believe that you're dumber than 90% of all people? (~ IQ of 78; I suppose the SD was 15)

Seriously, do you know just how stupid most humans are?

I deny the data.
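For reference, the "90%" figure can be checked with a quick sketch, assuming (as the comment does) that scores are normally distributed with mean 100 and SD 15. `iq_percentile` is just a name made up for this example.

```python
from statistics import NormalDist


def iq_percentile(score, mean=100.0, sd=15.0):
    """Fraction of a normal(mean, sd) population scoring below `score`."""
    return NormalDist(mu=mean, sigma=sd).cdf(score)


below = iq_percentile(78)  # fraction scoring lower than 78
above = 1 - below          # fraction scoring higher than 78

print(f"below: {below:.1%}, above: {above:.1%}")
```

Under those assumptions a score of 78 sits about 1.47 standard deviations below the mean, so roughly 93% of people would score higher, which is in the neighbourhood of the 90% quoted above.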

Comment author: wedrifid 08 March 2012 12:05:01AM 0 points [-]

For sure. XiXiDu uses grammar correctly! (Well, enough so that "become more persuasiveness" struck me as an editing error rather than typical.)

If someone uses grammar correctly it is an overwhelmingly strong indicator that either they are significantly educated (self or otherwise) or have enough intelligence to compensate!

Comment author: MichaelAnissimov 07 March 2012 04:58:52AM 1 point [-]

Given all these facts, it's pretty hard to take what you say seriously...

Comment author: steven0461 07 March 2012 09:58:56PM *  16 points [-]

a friendly AI that is not quite friendly could create a living hell for the rest of time, increasing negative utility dramatically

"Ladies and gentlemen, I believe this machine could create a living hell for the rest of time..."

(audience yawns, people look at their watches)

"...increasing negative utility dramatically!"

(shocked gasps, audience riots)

Comment author: XiXiDu 08 March 2012 10:30:54AM *  0 points [-]

Do you actually disagree with anything or are you just trying to ridicule it? Do you think that the possibility that FAI research might increase negative utility is not to be taken seriously? Do you think that world states where faulty FAI designs are implemented have on average higher utility than world states where nobody is alive? If so, what research could I possibly do to come to the same conclusion? What arguments do I miss? Do I just have to think about it longer?

Consider the way Eliezer Yudkowsky argues in favor of FAI research:

Two hundred million years from now, the children’s children’s children of humanity in their galaxy-civilizations, are unlikely to look back and say, “You know, in retrospect, it really would have been worth not colonizing the Herculus supercluster if only we could have saved 80% of species instead of 20%”. I don’t think they’ll spend much time fretting about it at all, really. It is really incredibly hard to make the consequentialist utilitarian case here, as opposed to the warm-fuzzies case.

or

This is crunch time. This is crunch time for the entire human species. … and it’s crunch time not just for us, it’s crunch time for the intergalactic civilization whose existence depends on us. I think that if you’re actually just going to sort of confront it, rationally, full-on, then you can’t really justify trading off any part of that intergalactic civilization for any intrinsic thing that you could get nowadays …

Is his style of argumentation any different from mine except that he promises lots of positive utility?

Comment author: steven0461 08 March 2012 06:38:05PM *  9 points [-]

I was just amused by the anticlimacticness of the quoted sentence (or maybe by how it would be anticlimactic anywhere else but here), the way it explains why a living hell for the rest of time is a bad thing by associating it with something so abstract as a dramatic increase in negative utility. That's all I meant by that.

Comment author: Wei_Dai 04 March 2012 10:10:14PM 10 points [-]

It should be seriously considered to create a true paperclip maximizer that transforms the universe into an inanimate state devoid of suffering.

Have you considered the many ways something like that could go wrong?

  • The paperclip maximizer (PM) encounters an alien civilization and causes lots of suffering warring with it
  • PM decides there's a chance that it's in a simulation run by a sadistic being who will punish it (prevent it from making paperclips) unless it creates trillions of conscious beings and tortures them
  • PM is itself capable of suffering
  • PM decides to create lots of descendant AIs in order to maximize paperclip production and they happen to be capable of suffering. (Our genes made us to maximize copies of them and we happen to be capable of suffering.)
  • somebody steals PM's source code before it's launched, and makes a sadistic AI

From your perspective, wouldn't it be better to just build a really big bomb and blow up Earth? Or alternatively, if you want to minimize suffering throughout the universe and maybe throughout the multiverse (e.g., by acausal negotiation with superintelligences in other universes), instead of just our corner of the world, you'd have to solve a lot of the same problems as FAI.

Comment author: XiXiDu 05 March 2012 11:12:03AM *  2 points [-]

Have you considered the many ways something like that could go wrong? [...] From your perspective, wouldn't it be better to [...] minimize suffering throughout the universe and maybe throughout the multiverse (e.g., by acausal negotiation with superintelligences in other universes), instead of just our corner of the world, you'd have to solve a lot of the same problems as FAI.

The reason I think that working towards FAI might be a bad idea is that it increases the chance of something going horribly wrong.

If I were to accept the framework of beliefs held by SI, then I would assign a low probability to the possibility that the default scenario, in which an AI undergoes recursive self-improvement and nobody tries to make it friendly, will include a lot of blackmailing that leads to a lot of suffering.

I believe that any failed attempt at friendly AI is much more likely to 1) engage in blackmailing 2) keep humans alive 3) fail in horrible ways:

[Graph: Utility of FOOM scenarios]

[Graph: Probability of FOOM scenarios]

I think that working towards friendly AI will in most cases lead to negative utility scenarios that vastly outweigh the negative utility of an attempt at creating a simple transformer that turns the universe into an inanimate state.

ETA Not sure why the graph looks so messed up. Does anyone know of a better graphing tool?

Comment author: Wei_Dai 05 March 2012 07:46:00PM *  6 points [-]

I think that working towards friendly AI will in most cases lead to negative utility scenarios that vastly outweigh the negative utility of an attempt at creating a simple transformer that turns the universe into an inanimate state.

I think it's too early to decide this. There are many questions whose answers will become clearer before we have to make a choice one way or another. If eventually it becomes clear that building an antinatalist AI is the right thing to do, I think the best way to accomplish it would be through an organization that's like SIAI but isn't too attached to the idea of FAI and just wants to do whatever is best.

Now you can either try to build an organization like that from scratch, or try to push SIAI in that direction (i.e., make it more strategic and less attached to a specific plan). Of course, being lazy, I'm more tempted to do the latter, but your mileage may vary. :)

Comment author: lukeprog 10 March 2012 07:56:26PM 3 points [-]

If eventually it becomes clear that building an antinatalist AI is the right thing to do, I think the best way to accomplish it would be through an organization that's like SIAI but isn't too attached to the idea of FAI and just wants to do whatever is best.

Yes.

I, for one, am ultimately concerned with doing whatever's best. I'm not wedded to doing FAI, and am certainly not wedded to doing 9-researchers-in-a-basement FAI.

Comment author: XiXiDu 10 March 2012 08:50:21PM 5 points [-]

I, for one, am ultimately concerned with doing whatever's best. I'm not wedded to doing FAI, and am certainly not wedded to doing 9-researchers-in-a-basement FAI.

Well, that's great. Still, there are quite a few problems.

How do I know

  • ... that SI does not increase existential risk by solving problems that can be used to build AGI earlier?
  • ... that you won't launch a half-baked friendly AI that will turn the world into a hell?
  • ... that you don't implement some strategies that will do really bad things to some people, e.g. myself?

Every time I see a video of one of you people I think, "Wow, those seem like really nice people. I am probably wrong. They are going to do the right thing."

But seriously, is that enough? Can I trust a few people with the power to shape the whole universe? Can I trust them enough to actually give them money? Can I trust them enough with my life until the end of the universe?

You can't even tell me what "best" or "right" or "winning" stands for. How do I know that it can be or will be defined in a way that those labels will apply to me as well?

I have no idea what your plans are for the day when time runs out. I just hope that you are not going to hope for the best and run some not quite friendly AI that does really crappy things. I hope you consider the possibility of rather blowing everything up than risking even worse outcomes.

Comment author: lukeprog 11 March 2012 08:17:45AM *  3 points [-]

Can I trust a few people with the power to shape the whole universe?

Hell no.

This is an open problem. See "How can we be sure a Friendly AI development team will be altruistic?" on my list of open problems.

Comment author: timtyler 11 March 2012 02:00:30PM 1 point [-]

I hope you consider the possibility of rather blowing everything up than risking even worse outcomes.

Blowing everything up would be pretty bad. Bad enough to not encourage the possibility.

Comment author: Vladimir_Nesov 05 March 2012 08:20:32PM *  0 points [-]

"Would you murder a child, if it's the right thing to do?"

an organization that's like SIAI but isn't too attached to the idea of FAI and just wants to do whatever is best.

If FAI is by definition a machine that does whatever is best, this distinction doesn't seem meaningful.

Comment author: Wei_Dai 05 March 2012 08:44:58PM *  3 points [-]

Ok, let me rephrase that to be clearer.

an organization that's like SIAI but isn't too attached to a specific kind of FAI design (that may be too complex and prone to fail in particularly horrible ways), and just wants to do whatever is best.

Comment author: Vladimir_Nesov 05 March 2012 08:48:29PM 1 point [-]

Do you think SingInst is too attached to a specific kind of FAI design? This isn't my impression. (Also, at this point, it might be useful to unpack "SingInst" into particular people constituting it.)

Comment author: Wei_Dai 06 March 2012 07:08:52AM 5 points [-]

Do you think SingInst is too attached to a specific kind of FAI design?

XiXiDu seems to think so. I guess I'm less certain but I didn't want to question that particular premise in my response to him.

It does confuse me that Eliezer set his focus so early on CEV. I think "it's too early to decide this" applies to CEV just as well as XiXiDu's anti-natalist AI. Why not explore and keep all the plausible options open until the many strategically important questions become clearer? Why did it fall to someone outside SIAI (me, in particular) to write about the normative and meta-philosophical approaches to FAI? (Note that the former covers XiXiDu's idea as a special case.) Also concerning is that many criticisms have been directed at CEV but Eliezer seems to ignore most of them.

Also, at this point, it might be useful to unpack "SingInst" into particular people constituting it.

I'd be surprised if there weren't people within SingInst who disagree with the focus on CEV, but if so, they seem reluctant to disagree in public so it's hard to tell who exactly, or how much say they have in what SingInst actually does.

I guess this could all be due to PR considerations. Maybe Eliezer just wanted to focus public attention on CEV because it's the politically least objectionable FAI approach, and isn't really terribly attached to the idea when it comes to actually building an FAI. But you can see how an outsider might get that impression...

Comment author: Jayson_Virissimo 06 March 2012 09:52:47AM *  7 points [-]

I always thought CEV was half-baked as a technical solution, but as a PR tactic it is...genius.

Comment author: Will_Newsome 06 March 2012 10:19:56AM 5 points [-]

Yeah, I thought it was explicitly intended more as a political manifesto than a philosophical treatise. I have no idea why so many smart people, like lukeprog, seem to be interpreting it not only as a philosophical basis but as outlining a technical solution.

Comment author: amcknight 07 March 2012 02:10:20AM 3 points [-]

Why do you think an unknown maximizer would be worse than a not quite friendly AI? Failed Utopia #4-2 sounds much better than a bunch of paperclips. Orgasmium sounds at least as good as paper clips.

Comment author: timtyler 05 March 2012 08:13:21PM 2 points [-]

Graphs make your case more convincing - even when they are drawn wrong and don't make sense!

...but seriously: where are you getting the figures in the first graph from?

Are you one of these "negative utilitarians" - who think that any form of suffering is terrible?

Comment author: timtyler 05 March 2012 08:16:14PM *  1 point [-]

I believe that any failed attempt at friendly AI is much more likely to 1) engage in blackmailing 2) keep humans alive 3) fail in horrible ways:

You sound a bit fixated on doom :-(

What do you make of the idea that the world has been consistently getting better for most of the last 3 billion years (give or take the occasional asteroid strike) - and that the progress is likely to continue?

Comment author: XiXiDu 05 March 2012 02:05:10PM *  1 point [-]

The paperclip maximizer (PM) encounters an alien civilization and causes lots of suffering warring with it

I don't think it is likely that it will encounter anything that has equal resources, and even if it does, I doubt that suffering would occur (see below).

PM decides there's a chance that it's in a simulation run by a sadistic being who will punish it (prevent it from making paperclips) unless it creates trillions of conscious beings and tortures them

That seems like one of the problems that have to be solved in order to build an AI that transforms the universe into an inanimate state. But I think it is much easier to make an AI not simulate any other agents than to create a friendly AI. Much more can go wrong by creating a friendly AI, including the possibility that it tortures trillions of beings. In the case of a transformer you just have to make sure that it values a universe that is as close as possible to a state where no computation takes place and that it does not engage in any kind of trade, acausal or otherwise.

PM is itself capable of suffering

I believe that any sort of morally significant suffering is an effect of (natural) evolution, and may in fact be dependent on that. I think that the kind of maximizer that SI has in mind is more akin to a transformation process that isn't conscious, does not have emotions and cannot suffer. If those qualities were necessary requirements then I don't think that we would build an artificial general intelligence any time soon, and if we did, it would happen slowly and not be able to undergo dangerous recursive self-improvement.

somebody steals PM's source code before it's launched, and makes a sadistic AI

I think that this is more likely to be the case with friendly AI research because it takes longer.

Comment author: Steve_Rayhawk 04 March 2012 11:44:05AM 9 points [-]

Currently you suspect that there are people, such as yourself, who have some chance of correctly judging whether arguments such as yours are correct, and of attempting to implement the implications if those arguments are correct, and of not implementing the implications if those arguments are not correct.

Do you think it would be possible to design an intelligence which could do this more reliably?

Comment author: steven0461 04 March 2012 07:27:16PM *  5 points [-]

I don't get it. Design a Friendly AI that can better judge whether it's worth the risk of botching the design of a Friendly AI?

ETA: I suppose your point applies to some of XiXiDu's concerns but not others?

Comment author: Vladimir_Nesov 04 March 2012 08:24:41PM 2 points [-]

A lens that sees its flaws.

Comment author: steven0461 04 March 2012 08:45:43PM 2 points [-]

I don't understand. Is the claim here that you can build a "decide whether the risk of botched Friendly AI is worth taking machine", and the risk of botching such a machine is much less than the risk of botching a Friendly AI?

Comment author: Vladimir_Nesov 04 March 2012 09:28:27PM *  5 points [-]

A FAI that includes such a "Should I run?" heuristic could pose a lesser risk than a FAI without such a heuristic. If this heuristic works better than human judgment about running a FAI, it should be used instead of human judgment.

This is the same principle as for AI's decisions themselves, where we don't ask AI's designers for object-level moral judgments, or encode specific object-level moral judgments into AI. Not running an AI would then be equivalent to hardcoding the decision "Should the AI run?" resolved by designers to "No." into the AI, instead of coding the question and letting the AI itself answer it (assuming we can expect it to answer the question more reliably than the programmers can).

Comment author: steven0461 04 March 2012 10:09:15PM *  4 points [-]

If we botched the FAI, wouldn't we also probably have botched its ability to decide whether it should run?

Comment author: Vladimir_Nesov 04 March 2012 10:45:20PM *  1 point [-]

Yes, and if it tosses a coin, it has 50% chance of being right. The question is calibration: how much trust should such measures buy compared to their absence, given what is known about a given design.
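One way to make the calibration question concrete is a Bayes update on the "Should I run?" verdict. This is only a sketch: the function name and every number below are hypothetical placeholders, not estimates anyone in the thread offered.

```python
def p_botched_given_run(prior_botched: float,
                        p_run_if_sound: float,
                        p_run_if_botched: float) -> float:
    """Posterior probability that the design is botched, given the check said "run"."""
    botched_and_run = prior_botched * p_run_if_botched
    sound_and_run = (1 - prior_botched) * p_run_if_sound
    return botched_and_run / (botched_and_run + sound_and_run)

# Hypothetical prior: the design is botched 90% of the time.
coin_toss = p_botched_given_run(0.9, p_run_if_sound=0.5, p_run_if_botched=0.5)
informative = p_botched_given_run(0.9, p_run_if_sound=0.95, p_run_if_botched=0.05)
print(coin_toss, informative)
```

A coin toss leaves the prior unchanged, so it buys no trust at all. A check that says "run" far more often for sound designs than for botched ones lowers the posterior. The objection that a botched design probably has a botched check amounts to saying that `p_run_if_botched` may be close to `p_run_if_sound`, which pushes the result back toward the coin-toss case.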

Comment author: John_Maxwell_IV 05 March 2012 01:30:31AM *  0 points [-]

The machine Steve proposes might not bear as much risk of creating "living hell" by attempting to get the human utility function right, but missing in such a way that humans are still alive, just living very unpleasantly. To me, this seems by far the biggest of XiXiDu's concerns.

Comment author: XiXiDu 04 March 2012 12:26:05PM *  3 points [-]

Here are a few premises:

  • Complex systems can fail in complex ways.
  • Destruction is easier than creation.
  • Expected utility maximization is rational and feasible.
  • We should be extremely conservative about not implementing a half-baked friendly AI.

If you believe that self-improving AI is inevitable and that creating friendly AI is more difficult than creating unfriendly AI then to launch an AI that simply destroys everything as quickly as possible has a higher expected utility than doing nothing or trying to implement an AI that is not completely friendly.

The negative utility of an AI that transforms the universe into an inanimate state is simply the positive utility of a perfectly friendly AI. But most outcomes are expected to yield unfriendly AI, or not quite friendly AI, which will actively increase negative utility by possibly keeping alive living beings indefinitely given abhorrent circumstances. To prevent those scenarios one can try to solve friendly AI, which will most likely fail (or even increase the chances of a negative singularity), or try to launch a destructive singleton with simple goals to prevent further suffering and the evolution of life elsewhere in the universe. Creating a destructive self-improving AI that kills all life should be the easiest of all possibilities with a high probability of success.
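The argument above turns on how the probabilities and utilities compare, which a toy expected-utility calculation can show. This is a sketch under loudly stated assumptions: the inanimate universe is taken as the zero point, a perfectly friendly AI is worth +1, and the probabilities are invented placeholders rather than figures from the comment.

```python
U_FRIENDLY = 1.0    # assumed utility of a perfectly friendly AI
U_INANIMATE = 0.0   # assumed baseline: universe transformed into an inanimate state

def expected_utility(lottery):
    """Sum of probability * utility over (probability, utility) pairs."""
    return sum(p * u for p, u in lottery)

def attempt_fai(u_hell, p_success=0.05, p_hell=0.30):
    """EU of attempting FAI; outcomes that are neither success nor hell count as inanimate."""
    p_other = 1 - p_success - p_hell
    return expected_utility([(p_success, U_FRIENDLY),
                             (p_hell, u_hell),
                             (p_other, U_INANIMATE)])

def break_even_hell(p_success=0.05, p_hell=0.30):
    """Hell utility at which attempting FAI ties with the inanimate baseline."""
    return -p_success * U_FRIENDLY / p_hell

for u_hell in (-0.1, -1.0, -100.0):
    print(u_hell, attempt_fai(u_hell), "vs destructive singleton:", U_INANIMATE)
print("break-even:", break_even_hell())
```

With these placeholder numbers the attempt beats the baseline only if the "living hell" outcome is milder than about -0.17 on this scale. The conclusion therefore depends on the assumed ratio of success to hell probabilities and on how bad hell is taken to be, and those are the premises the replies go on to dispute.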

Comment author: MichaelAnissimov 06 March 2012 01:48:39PM *  2 points [-]

Assuming your argument is correct, wouldn't it make more sense to blow ourselves up with nukes rather than pollute the universe with UFAI? There may be other intelligent civilizations out there leading worthwhile lives that we threaten unfairly by unleashing UFAI.

I'm skeptical that friendly AI is as difficult as all that because, to take an example, humans are generally considered pretty "wicked" by traditional writers and armchair philosophers, but lately we haven't been murdering each other or deliberately going out of way to make each other's lives miserable very often. For instance, say I were invincible. I could theoretically stab everyone I meet without any consequences, but I doubt I would do that. And I'm just human. Goodness may seem mystical and amazingly complex from our current viewpoint, but is it really as complex as all that? There were a lot of things in history and science that seemed mystically complex but turned out to be formalizable in compressed ways, such as the mathematics of Darwinian population genetics. Who would have imagined that the "Secrets of Life and Creation" would be revealed like that? But they were. Could "sufficient goodness that we can be convinced the agent won't put us through hell" also have a compact description that was clearly tractable in retrospect?

Comment author: XiXiDu 06 March 2012 03:24:11PM 3 points [-]

Assuming your argument is correct, wouldn't it make more sense to blow ourselves up with nukes rather than pollute the universe with UFAI? There may be other intelligent civilizations out there leading worthwhile lives that we threaten unfairly by unleashing UFAI.

There might be countless planets that are about to undergo an evolutionary arms race for the next few billion years resulting in a lot of suffering. It is very unlikely that there is a single source of life that is at exactly the right stage of evolution with exactly the right mind design to not only lead worthwhile lives but also get their AI technology exactly right to not turn everything into a living hell.

In case you assign negative utility to suffering, which is likely to be universally accepted to have negative utility, then given that you are an expected utility maximizer it should be a serious consideration to end all life. Because 1) agents that are an effect of evolution have complex values 2) to satisfy complex values you need to meet complex circumstances 3) complex systems can fail in complex ways 4) any attempt at friendly AI, which is incredibly complex, is likely to fail in unforeseeable ways.

For instance, say I were invincible. I could theoretically stab everyone I meet without any consequences, but I doubt I would do that. And I'm just human.

To name just one example where things could go horribly wrong: humans are by their very nature interested in domination and sex. Our aversion to sexual exploitation is largely dependent on the memeplex of our cultural and societal circumstances. If you knew more, were smarter and could think faster you might very well realize that such an aversion is an unnecessary remnant that you can easily extinguish to open up new pathways to gain utility. The idea that Gandhi would not agree to have his brain modified into a baby-eater is incredibly naive. Given the technology, people will alter their preferences and personality. Many people actually perceive their moral reservations to be limiting. It only takes some amount of insight to just overcome such limitations.

You simply can't be sure that the future won't hold vast amounts of negative utility. It is much easier for things to go horribly wrong than to be barely acceptable.

Goodness may seem mystical and amazingly complex from our current viewpoint, but is it really as complex as all that?

Maybe not, but betting on the possibility that goodness can be easily achieved is like pulling a random AI from mind design space hoping that it turns out to be friendly.

Comment author: timtyler 06 March 2012 08:07:23PM *  2 points [-]

You simply can't be sure that the future won't hold vast amounts of negative utility. It is much easier for things to go horribly wrong than to be barely acceptable.

Similarly, it is easier to make piles of rubble than skyscrapers. Yet - amazingly - there are plenty of skyscrapers out there. Obviously something funny is going on...

Comment author: timtyler 04 March 2012 10:15:46PM *  2 points [-]

The negative utility of an AI that transforms the universe into an inanimate state is simply the positive utility of a perfectly friendly AI. But most outcomes are expected to yield unfriendly AI, or not quite friendly AI, which will actively increase negative utility by possibly keeping alive living beings indefinitely given abhorrent circumstances.

Hang on, though. That's still normally better than not existing at all! Hell has to be at least bad enough for the folk in it to want to commit suicide for utility to count as "below zero". Most plausible futures just aren't likely to be that bad for the creatures in them.

Comment author: XiXiDu 05 March 2012 03:41:49PM -1 points [-]

Hell has to be at least bad enough for the folk in it to want to commit suicide for utility to count as "below zero". Most plausible futures just aren't likely to be that bad for the creatures in them.

The present is already bad enough. There is more evil than good. You are more often worried than optimistic. You are more often hurt than happy. That's the case for most people. We just tend to remember the good moments more than the rest of our life.

It is generally easier to arrive at bad world states than good world states. Because to satisfy complex values you need to meet complex circumstances. And even given simple values and goals, the laws of physics are grim and remorseless. In the end you're going to lose the fight against the general decay. Any temporary success is just a statistical fluke.

Comment author: timtyler 05 March 2012 07:51:02PM *  1 point [-]

The present is already bad enough. There is more evil than good. You are more often worried than optimistic. You are more often hurt than happy.

No, I'm not!

That's the case for most people. We just tend to remember the good moments more than the rest of our life.

Yet most creatures would rather live than die - and they show that by choosing to live. Dying is an option - they choose not to take it.

It is generally easier to arrive at bad world states than good world states. Because to satisfy complex values you need to meet complex circumstances. And even given simple values and goals, the laws of physics are grim and remorseless. In the end you're going to lose the fight against the general decay. Any temporary success is just a statistical fluke.

It sounds as though by now there should be nothing left but dust and decay! Evidently something is wrong with this reasoning. Evolution produces marvellous wonders - as well as entropy. Your existence is an enormous statistical fluke - but you still exist. There's no need to be "down" about it.

Comment author: katydee 05 March 2012 10:19:43PM 0 points [-]

You are more often hurt than happy.

For some people, this is a solved problem.

Comment author: timtyler 06 March 2012 08:12:40PM -3 points [-]

Creating a destructive self-improving AI that kills all life should be the easiest of all possibilities with a high probability of success.

Where "success" refers to obliterating yourself and all your descendants. That's not how most Darwinian creatures usually define success. Natural selection does build creatures that want to die - but only rarely and by mistake.

Comment author: Kaj_Sotala 04 March 2012 02:44:26PM 7 points [-]

As pessimistic as this sounds, I'm not sure if I actually disagree with any of it.

Comment author: David_Gerard 04 March 2012 02:36:45PM *  3 points [-]

Has anyone constructed even a vaguely plausible outline, let alone a definition, of what would constitute a "human-friendly intelligence", defined in terms other than effects you don't want it to have? As you note, humans aren't human-friendly intelligences, or we wouldn't have internal existential risk.

The CEV proposal seems to attempt to move the hard bit to technological magic (a superintelligence scanning human brains and working out a solution to human desires that is possible, is coherent and won't destroy us all) - this is saying "then a miracle occurs" in more words.

Comment author: John_Maxwell_IV 04 March 2012 11:51:25PM 2 points [-]

As you note, humans aren't human-friendly intelligences, or we wouldn't have internal existential risk.

It's possible that particular humans might approximate human friendly intelligences.

Comment author: David_Gerard 05 March 2012 08:02:55AM -1 points [-]

Assuming it's not impossible, how would you know? What constitutes a human-friendly intelligence, in other than negative terms?

Comment author: timtyler 05 March 2012 08:01:45PM -2 points [-]

Has anyone constructed even a vaguely plausible outline, let alone a definition, of what would constitute a "human-friendly intelligence", defined in terms other than effects you don't want it to have?

Er, that's how it is defined - at least by Yudkowsky. You want to argue definitions? Without even offering one of your own? How will that help?

Comment author: David_Gerard 05 March 2012 08:09:50PM -3 points [-]

No, I'm pointing out that a purely negative definition isn't actually a useful definition that describes the thing the label is supposed to be pointing at. How does one work toward a negative? We can say a few things it isn't - what is it?

Comment author: timtyler 05 March 2012 08:43:24PM *  1 point [-]

Yudkowsky says:

The term "Friendly AI" refers to the production of human-benefiting, non-human-harming actions in Artificial Intelligence systems that have advanced to the point of making real-world plans in pursuit of goals.

That isn't a "purely negative" definition in the first place.

Even if it was - would you object to the definition of "hole" on similar grounds?

What exactly is wrong with defining some things in terms of what they are not?

If I say a "safe car" is one that doesn't kill or hurt people, that seems just fine to me.

Comment author: David_Gerard 05 March 2012 10:44:32PM *  0 points [-]

The word "artificial" there makes it look like it means more than it does. And humans are just as made of atoms. Let's try it without that:

The term "friendly intelligence" refers to the production of human-benefiting, non-human-harming actions in intelligences that have advanced to the point of making real-world plans in pursuit of goals.

It's only described in terms of its effects, and then only vaguely. We have no idea what it would actually be. The CEV plan doesn't include what it would actually be, it just includes a technological magic step where it's worked out.

This may be better than nothing, but it's not enough to say it's talking about anything that's actually understood in even the vaguest terms.

For an analogy, what would a gorilla-friendly human-level intelligence be like? How would you reasonably make sure it wasn't harmful to the future of gorillas? (Humans out of the box do pretty badly at this.) What steps would the human take to ascertain the CEV of gorillas, assuming tremendous technological resources?

Comment author: timtyler 06 March 2012 12:13:15AM 0 points [-]

We can't answer the "how can you do this?" questions today. If we could we would be done.

It's true that CEV is an 8-year old, moon-onna-stick wishlist - apparently created without much thought about how to implement it. C'est la vie.

Comment author: John_Maxwell_IV 05 March 2012 12:23:27AM *  1 point [-]

Interesting thoughts.

It seems like an attempt at Oracle AI, which simply strives to answer all questions accurately while otherwise exerting as little influence on the world as possible, would be strictly better than a paperclip maximizer, no? At the very least you wouldn't see any of the risks of "almost friendly AI".

You might see some humans getting power over other humans, but to be honest I don't think that would be worse than humans existing, period. Keep in mind that historically, the humans that were put in power over others were the ones who had the ruthlessness necessary to get to the top – they might not be representative. Can you name any female evil dictators?

Comment author: lukeprog 07 March 2012 09:56:48PM *  0 points [-]

[ignore; was off-topic]

Comment author: Will_Newsome 05 March 2012 10:13:27AM 0 points [-]

Do you think that it is possible to build an AI that does the moral thing even without being directly contingent on human preferences? Conditional on its possibility, do you think we should attempt to create such an AI?

I share your trepidation about humans and their values, but I see that as implying that we have to be meta enough such that even if humans are wrong, our AI will still do what is right. It seems to me that this is still a real possibility. For an example of an FAI architecture that is more in this direction, check out CFAI.

Comment author: XiXiDu 05 March 2012 01:45:16PM 2 points

Do you think that it is possible to build an AI that does the moral thing even without being directly contingent on human preferences?

No. I believe that it is practically impossible to systematically and consistently assign utility to world states. I believe that utility can not even be grounded and therefore defined. I don't think that there exists anything like "human preferences" and therefore human utility functions, apart from purely theoretical highly complex and therefore computationally intractable approximations. I don't think that there is anything like a "self" that can be used to define what constitutes a human being, not practically anyway. I don't believe that it is practically possible to decide what is morally right and wrong in the long term, not even for a superintelligence.

I believe that stable goals are impossible and that any attempt at extrapolating the volition of people will alter it.

Besides, I believe that we won't be able to figure out any of the following in time:

  • The nature of consciousness and its moral significance.
  • The relation and moral significance of suffering/pain/fun/happiness.

I further believe that the following problems are either impossible to solve or constitute a reductio ad absurdum of certain ideas:

Comment author: timtyler 05 March 2012 08:10:30PM 1 point

I believe that it is practically impossible to systematically and consistently assign utility to world states. I believe that utility can not even be grounded and therefore defined. I don't think that there exists anything like "human preferences" and therefore human utility functions, apart from purely theoretical highly complex and therefore computationally intractable approximations. I don't think that there is anything like a "self" that can be used to define what constitutes a human being, not practically anyway. I don't believe that it is practically possible to decide what is morally right and wrong in the long term, not even for a superintelligence.

Strange stuff.

Surely "right" and "wrong" make the most sense in the context of a specified moral system.

If you are using those terms outside such a context, it usually implies some kind of moral realism - in which case, one wonders what sort of moral realism you have in mind.