ETA: I'll be adding things to the list that I think belong there.

I'm assuming a high level of credence in classic utilitarianism, and that AI-Xrisk is significant (e.g. roughly >10%), and timelines are not long (e.g. >50% ASI in <100years). ETA: For the purpose of this list, I don't care about questioning those assumptions.

Here's my current list (off the top of my head):

  • not your comparative advantage
  • consider other Xrisks more threatening (top contenders: bio / nuclear)
  • infinite ethics (and maybe other fundamental ethical questions, e.g. to do with moral uncertainty)
  • S-risks
  • simulation hypothesis
  • ETA: AI has high moral value in expectation / by default
  • ETA: low tractability (either at present or in general)
  • ETA: Doomsday Argument as overwhelming evidence against futures with large number of minds

Also, does anyone want to say why they think none of these should change the picture? Or point to a good reference discussing this question? (etc.)



4 Answers

Without rejecting any of the premises in your question I can come up with:

Low tractability: you assign almost all of the probability mass to one or both of "alignment will be easily solved" and "alignment is basically impossible"

Currently low tractability: If your timeline is closer to 100 years than 10, it is possible that the best use of resources for AI risk is "sit on them until the field develops further", in the same sense that someone in the 1990s wanting good facial recognition might have been best served by waiting for modern ML.

Refusing to prioritize highly uncertain causes in order to avoid the Winner's Curse outcome of your highest priority ending up as something with low true value and high noise
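
To make the Winner's Curse point concrete, here is a minimal simulation sketch. All the numbers are hypothetical and chosen only for illustration: four causes with true value 1.0 that are estimated fairly precisely, and a fifth cause with a lower true value (0.5) whose estimate is very noisy. The function name `simulate_winners_curse` and its parameters are assumptions of this sketch, not anything from the discussion above.

```python
import random


def simulate_winners_curse(true_values, noise_sds, trials=20000, seed=0):
    """Repeatedly pick the cause with the highest noisy estimate.

    Returns a tuple of:
      - the average *estimated* value of the top-ranked cause,
      - the average *true* value of the top-ranked cause,
      - how often the last cause in the list was the one picked.
    """
    rng = random.Random(seed)
    total_estimate = 0.0
    total_true = 0.0
    last_picked = 0
    last_index = len(true_values) - 1
    for _ in range(trials):
        # Each estimate is the true value plus independent Gaussian noise.
        estimates = [rng.gauss(v, sd) for v, sd in zip(true_values, noise_sds)]
        best = max(range(len(estimates)), key=estimates.__getitem__)
        total_estimate += estimates[best]
        total_true += true_values[best]
        if best == last_index:
            last_picked += 1
    return total_estimate / trials, total_true / trials, last_picked / trials


# Hypothetical inputs: the last cause is worse in truth but much noisier.
mean_estimate, mean_true, noisy_pick_rate = simulate_winners_curse(
    true_values=[1.0, 1.0, 1.0, 1.0, 0.5],
    noise_sds=[0.1, 0.1, 0.1, 0.1, 2.0],
)
print(f"average estimate of the top-ranked cause:   {mean_estimate:.2f}")
print(f"average true value of the top-ranked cause: {mean_true:.2f}")
print(f"share of trials where the noisy cause won:  {noisy_pick_rate:.2f}")
```

Under these assumed numbers, the top-ranked cause looks better on paper than it is, and the noisy, lower-value cause wins the ranking in a substantial share of trials. That is the outcome this kind of reasoning is trying to avoid.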

Flavours of utilitarianism that don't value the unborn and would not see it as an enormous tragedy if we failed to create trillions of happy post-Singularity people (depending on the details human extinction might not even be negative, so long as the deaths aren't painful)

Other reasons that people may have (I have some of these reasons, but not all):

  • not a classical utilitarian
  • don't believe those timelines
  • too distant to feel an emotional tie to
  • unclear what to do even if it is a priority
  • very high discount rate for future humans
  • belief that moral value is relative to cognitive ability (an extremely smart AI may be worth a few quintillion humans in a moral/experiential sense)

Of these, I think the one that I'm personally least moved by, while acknowledging it as one of the better arguments against utilitarianism, is the last. It's clear that there's SOME difference in moral weight for different experiences of different experiencers. Which means there's some dimension on which a utility monster is conceivable. If it's a dimension that AGI will excel on, we can maximize utility by giving it whatever it wants.


I'll add one more:

  • Doomsday Argument as overwhelming evidence against futures with large number of minds

Also works against any other x-risk related effort and condones a carpe-diem sort of attitude on the civilizational level.

Here's Will MacAskill's answer.

How is that an answer? It seems like he's mostly contesting my premises "that AI-Xrisk is significant (e.g. roughly >10%), and timelines are not long (e.g. >50% ASI in <100years)"

Ben Pace
My bad, just read the title.
David Scott Krueger (formerly: capybaralet)
Nice! owning up to it; I like it! :D
20 comments

infinite ethics (and maybe other fundamental ethical questions, e.g. to do with moral uncertainty)

Why does this reduce the priority of working on AI risk? Is it just that it makes the problem harder and hence less tractable?

simulation hypothesis

I've written down some thoughts on how the possibility of being in simulations affects my motivation to work on AI risk: Beyond Astronomical Waste, Comment 1, Comment 2.

I'm not aware of a satisfying resolution to the problems of infinite ethics. It calls into question the underlying assumptions of classical utilitarianism, which is my justification for prioritizing AI-Xrisk above all else. I can imagine ways of resolving infinite ethics that convince me of a different ethical viewpoint which in turn changes my cause prioritization.

I think infinite ethics will most likely be solved in a way that leaves longtermism unharmed. See my recent comment to William MacAskill on this topic.

I can imagine ways of resolving infinite ethics that convince me of a different ethical viewpoint which in turn changes my cause prioritization.

Do you have specific candidate solutions in mind?

I think infinite ethics will most likely be solved in a way that leaves longtermism unharmed.

Yes, or it might just never be truly "solved". I agree that complexity theory seems fairly likely to hold (something like) a solution.


Do you have specific candidate solutions in mind?

Not really. I don't think about infinite ethics much, which is probably one of the reasons it seems likely to change my mind. I expect that if I spent more time thinking about it, I would just become increasingly convinced that it isn't worth thinking about.

But it definitely troubles me that I haven't taken the time to really understand it, since I feel like I am in a similar epistemic state to ML researchers who dismiss Xrisk concerns and won't take the time to engage with them.

I guess there's maybe a disanalogy there, though, in that it seems like people who *have* thought more about infinite ethics tend to not be going around trying to convince others that it really actually matters a lot and should change what they work on or which causes they prioritize.

-------------------

I guess the main way I can imagine changing my views by studying infinite ethics would be to start believing that I should actually just aim to increase the chances of generating infinite utility (to the extent this is actually a mathematically coherent thing to try to do), which doesn't necessarily/obviously lead to prioritizing Xrisk, as far as I can see.

The possibility of such an update seems like it might make studying infinite ethics until I understand it better a higher priority than reducing AI-Xrisk.


I guess there’s maybe a disanalogy there, though, in that it seems like people who have thought more about infinite ethics tend to not be going around trying to convince others that it really actually matters a lot and should change what they work on or which causes they prioritize.

Yep, seems like a good reason to not be too worried about it...

I guess the main way I can imagine changing my views by studying infinite ethics would be to start believing that I should actually just aim to increase the chances of generating infinite utility (to the extent this is actually a mathematically coherent thing to try to do), which doesn’t necessarily/obviously lead to prioritizing Xrisk, as far as I can see.

If infinite utility is actually possible, then not maximizing the chances of generating infinite utility would count as an x-risk, wouldn't it? And it seems like the best way to prevent that would be to build a superintelligent AI that would do a good job of maximizing the chances of generating infinite utility, in case that was possible. Metaphilosophical AI seems to be an obvious approach to this.

And it seems like the best way to prevent that would be to build a superintelligent AI that would do a good job of maximizing the chances of generating infinite utility, in case that was possible.

I haven't thought about it enough to say... it certainly seems plausible, but it seems plausible that spending a good chunk of time thinking about it *might* lead to different conclusions. *shrug*

I think it makes sense to at least spend some time reading up on papers and posts about infinite ethics then. It doesn't take very long to catch up to the state of the art and then you'll probably have a much better idea if infinite ethics would be a better field to spend more time in, and it's probably a good idea for an AI alignment researcher to have some background in it anyway. I'd say the same thing about moral uncertainty, if your reasoning about that is similar.

I have spent *some* time on it (on the order of 10-15hrs maybe? counting discussions, reading, etc.), and I have a vague intention to do so again, in the future. At the moment, though, I'm very focused on getting my PhD and trying to land a good professorship ~ASAP.

The genesis of this list is basically me repeatedly noticing that there are crucial considerations I'm ignoring (/more like procrastinating on :P) that I don't feel like I have a good justification for ignoring, and being bothered by that.

It seemed important enough to at least *flag* these things.

If you think most AI alignment researchers should have some level of familiarity with these topics, it seems like it would be valuable for someone to put together a summary for us. I might be interested in such a project at some point in the next few years.

The genesis of this list is basically me repeatedly noticing that there are crucial considerations I’m ignoring (/more like procrastinating on :P) that I don’t feel like I have a good justification for ignoring, and being bothered by that.

It seemed important enough to at least flag these things.

That makes sense. Suggest putting this kind of background info in your future posts to give people more context.

If you think most AI alignment researchers should have some level of familiarity with these topics, it seems like it would be valuable for someone to put together a summary for us.

Hmm, I guess I think that more for moral uncertainty than for infinite ethics. For infinite ethics, it's more that I think at least some people in AI alignment should have some level of familiarity, and it makes sense for whoever is most interested in the topic (or otherwise motivated to learn it) to learn about it. Others could just have some sense of "this is a philosophical problem that may be relevant, I'll look into it more in the future if I need to."

I'm often prioritizing posting over polishing posts, for better or worse.

I'm also sometimes somewhat deliberately underspecific in my statements because I think it can lead to more interesting / diverse / "outside-the-box" kinds of responses that I think are very valuable from an "idea/perspective generation/exposure" point-of-view (and that's something I find very valuable in general).

Some of the same human moral heuristics that care about the cosmic endowment also diverge when contemplating an infinite environment. Therefore, someone who finds that the environment is infinite might exclude such heuristics from their aggregate and come to care less about what happens regarding AI than, say, their friends and family.

There are two questions which I think are important to distinguish:

Is AI x-risk the top priority for humanity?

Is AI x-risk the top priority of some individual?

The first question is perhaps extremely important in a general sense. However, the second question is, I think, more useful since it provides actionable information to specific people. Of course, the difficulty of answering the second question is that it depends heavily on individual factors, such as

  • The ethical system of the individual which they are using to evaluate the question.
  • The specific talents and time constraints of the individual.

I also partially object to placing AI x-risk into one entire bundle. There are many ways that people can influence the development of artificial intelligence:

  • Technical research
  • Social research to predict and intervene on governance for AI
  • AI forecasting to help predict which type of AI will end up existing and what their impact will be

Even within technical research, it is generally considered that there are different approaches:

  • Machine learning research with an emphasis on creating systems that could scale to superhuman capabilities while remaining aligned. This would include, but would not be limited to
    • Paul Christiano-style research, such as expanding iterated distillation and amplification
    • ML transparency
    • ML robustness to distributional shifts
  • Fundamental mathematical research which could help dissolve confusion about AI capabilities and alignment. This includes
    • Uncovering insights into decision theory
    • Discovering the necessary conditions for a system to be value aligned
    • Examining how systems could be stable upon reflection, such as after self-modification

I agree with the distinction you make and think it's nice to disentangle them. I'm most interested in the "Is AI x-risk the top priority for humanity?" question. I'm fine with all of the approaches to reducing AI-Xrisk being bundled here, because I'm just asking "is working on it (in *some* way) the highest priority".

You're right. I initially put this in the answer category, but I really meant it as clarification. I assumed that the personal question was more important since the humanity question is not very useful (except maybe to governments and large corporations).

Well... it's also pretty useful to individuals, IMO, since it affects what you tell other people, when discussing cause prioritization.

S-risks

Not necessarily a reason to deprioritize AI x-risk work, given that unaligned AI could be bad from an s-risk perspective as well:

Pain seems to have evolved because it has a functional purpose in guiding behavior: evolution having found it suggests that pain might be the simplest solution for achieving its purpose. A superintelligence which was building subagents, such as worker robots or disembodied cognitive agents, might then also construct them in such a way that they were capable of feeling pain - and thus possibly suffering (Metzinger 2015) - if that was the most efficient way of making them behave in a way that achieved the superintelligence’s goals.

Humans have also evolved to experience empathy towards each other, but the evolutionary reasons which cause humans to have empathy (Singer 1981) may not be relevant for a superintelligent singleton which had no game-theoretical reason to empathize with others. In such a case, a superintelligence which had no disincentive to create suffering but did have an incentive to create whatever furthered its goals, could create vast populations of agents which sometimes suffered while carrying out the superintelligence’s goals. Because of the ruling superintelligence’s indifference towards suffering, the amount of suffering experienced by this population could be vastly higher than it would be in e.g. an advanced human civilization, where humans had an interest in helping out their fellow humans. [...]

If attempts to align the superintelligence with human values failed, it might not put any intrinsic value on avoiding suffering, so it may create large numbers of suffering subroutines.

I agree there's substantial overlap, but there could be cases where "what's best for reducing Xrisk" and "what's best for reducing Srisk" really come apart. If I saw a clear-cut case for that, I'd be inclined to favor Srisk reduction (modulo, e.g., comparative advantage considerations).

That's certainly true. To be clear, my argument was not "these types of work are entirely overlapping", but rather just that "taking s-risk seriously doesn't necessarily mean no overlap with x-risk prevention".

[anonymous]

A counter-argument to this would be the classical s-risk example of a cosmic ray particle flipping the sign on the utility function of an otherwise Friendly AI, causing it to maximize suffering that would dwarf any accidental suffering caused by a paperclip maximizer.

That seems like a reason to work on AI alignment and figure out ways to avoid that particular failure mode, e.g. hyperexistential separation.