At what point will it no longer be useful for humans to be involved in the process of alignment research? After the first slightly-superhuman AGI, well into superintelligence, or somewhere in between?

Feel free to answer differently for different kinds of human involvement:

  • Humans could be involved as a source of data about human values 
  • Humans could be involved as a red-team, trying to get evidence of misalignment or to verify the trustworthiness of systems 
  • Humans could be involved in setting the broad research agenda, delegating to the AGIs 
  • Humans could be involved in complementing the technical weaknesses of the AGIs, helping them in some way to research new alignment methods 

What do you envision we are doing between AGI and superintelligence?

Seth Herd

All being dead? I don't think we'll necessarily get from AGI to ASI if we don't get the initial stages just right. This question sounds a bit blasé about our odds here. I don't think we're doomed; my point estimate of p(doom) is approximately 50%, but more importantly, that estimate is growing more uncertain as I continue to learn more and take in more of the many interacting complex arguments and states of the world that I don't have enough expertise in to estimate well. And that's after spending a very substantial amount of time on the question. I don't think anyone has a good p(doom) estimate at this point.

I mention this before answering, because I think assuming success is the surest route to failure.

To take the question seriously: my point estimate is that some humans, hopefully many, will be doing technical alignment research for a few years between AGI and ASI. I think we'll be doing all three of the latter categories you mention; loosely, being in charge (for better or worse) and filling in gaps in AGI thinking at each point in its advancement.

I think it's somewhat likely that we'll create AGI that roughly follows our instructions as it passes through a parahuman band (in which it is better than us at some cognitive tasks and worse at others). As it advances, alignment per se will be out of our hands. But as we pass through that band, human work on alignment will be at its most intense and most important. We'll know what sort of mind we're aligning, and what details of its construction and training might keep it on track or throw it off. 

If we do a good job with that critical risk period, we can (and more of us will) advance to the more fun parts of current alignment thinking: deciding what sort of fantastic future we want and what values we want AGI to follow for the long term. If we get aligned human-plus AGI, and haven't yet destroyed the world through misalignment, misuse, or human conflict fought with AGI-created superweapons, we'll have pretty good odds of making it for the long haul, doing a long reflection, and inviting everyone in on the fun parts of alignment.

If we do our jobs well, our retirement will be as slow as we care to make it. But there's much to do in the meantime. Particularly right now.

1 comment

It's plausible that reflection and figuring out what should happen with the future will be ongoing work among humans for tens or hundreds of years after the singularity.