My website is https://www.cs.toronto.edu/~duvenaud/
I think people will be told about, and sometimes notice, AIs' biases, but AIs will still be the most trustworthy source of information for almost everybody. I think Wikipedia is a good example here - it's obviously biased on many politicized topics, but it's still usually the best source for anyone who doesn't personally know experts or which obscure forums to trust.
Great question. I think treacherous turn risk is still under-funded in absolute terms. And gradual disempowerment is much less shovel-ready as a discipline.
I think there are two reasons why this question might not be so important to answer:
1) The kinds of skills required might be somewhat disjoint.
2) Gradual disempowerment is perhaps a subset or extension of the alignment problem. As Ryan Greenblatt and others point out: at some point, agents aligned to one person or organization will also naturally start working on this problem at the object level for their principals.
Yes, Ryan is correct. Our claim is that even fully-aligned personal AI representatives won't necessarily be able to solve important collective action problems in our favor. However, I'm not certain about this. The empirical crux for me is: Do collective action problems get easier to solve as everyone gets smarter together, or harder?
As a concrete example, consider a bunch of local polities in a literal arms race. If each had their own AGI diplomats, would they be able to stop the arms race? Or would the more sophisticated diplomats end up participating in precommitment races or other exotic strategies that might still prevent a negotiated settlement? Perhaps the less sophisticated diplomats would fear that a complicated power-sharing agreement would lead to their disempowerment eventually anyways, and refuse to compromise?
As a less concrete example, our future situation might be analogous to a population of monkeys who unevenly have access to human representatives who earnestly advocate on their behalf. There is a giant, valuable forest the monkeys live in, next to a city where all important economic activity and decision-making happens between humans. Some of the human population (or some organizations, or governments) end up not being monkey-aligned, instead focusing on their own growth and security. The humans advocating on behalf of monkeys can see this happening, but because they can't always participate directly in wealth generation as well as independent humans, they eventually become a small and relatively powerless constituency. The government and various private companies regularly bid enormous amounts of money for forest land, or levy enormous taxes on it, and even the monkeys with index funds are eventually forced to sell, and then go broke from rent.
I admit that there are many moving parts of this scenario, but it's the closest simple analogy to what I'm worried about that I've found so far. I'm happy for people to point out ways this analogy won't match reality.
I disagree - I think Ryan raised an obvious objection that we didn't directly address in the paper. I'd like to encourage medium-effort engagement from people as paged-in as Ryan. The discussion spawned was valuable to me.
Thanks for this. Discussion of things like "one time shifts in power between humans via mechanisms like states becoming more powerful" and personal AI representatives is exactly the sort of thing I'd like to hear more about. I'm happy to have finally found someone who has something substantial to say about this transition!
But over the last 2 years I asked a lot of people at the major labs for any kind of detail about a positive post-AGI future, and almost no one had put anywhere close to as much thought into it as you have, and no one mentioned the things above. Most people clearly hadn't put much thought into it at all. If anyone at the labs had much more of a plan than "we'll solve alignment while avoiding an arms race", I managed to fail to even hear about its existence despite many conversations, including with founders.
The closest thing to a plan was Sam Bowman's checklist:
https://sleepinyourhat.github.io/checklist/
which is exactly the sort of thing I was hoping for, except it's almost silent on issues of power, the state, and the role of post-AGI humans.
If you have any more related reading for the main "things might go OK" plan in your eyes, I'm all ears.
Good point. The reason AI risk is distinct is simply that it removes the need for those bureaucracies and corporations to keep some humans happy and healthy enough to actually run them. That need doesn't exactly put limits on how much they can disempower humans, but it does tend to provide at least some bargaining power for the humans involved.
Thanks for the detailed objection and the pointers. I agree there's a chance that solving alignment with designers' intentions might be sufficient. I think "if the AI were really aligned with one agent, it'd figure out a way to help them avoid multipolar traps" is a good objection.
My reply is that I'm worried that avoiding races-to-the-bottom will continue to be hard, especially since competition operates on so many levels. I think the main question is: what's the tax for coordinating to avoid a multipolar trap? If it's cheap we might be fine; if it's expensive, we might walk into a trap with eyes wide open.
As for human power grabs, maybe we should have included those in our descriptions. But the slower things change, the less there's a distinction between "selfishly grab power" and "focus on growth so you don't get outcompeted". E.g. Is starting a company or a political party a power grab?
As for reading the paper in detail, it's largely just making the case that a sustained period of technological unemployment, without breakthroughs in alignment and cooperation, would tend to make our civilization serve humans' interests more and more poorly over time in a way that'd be hard to resist. I think arguing that things are likely to move faster would be a good objection to the plausibility of this scenario. But we still think it's an important point that the misalignment of our civilization is possibly a second alignment problem that we'll have to solve.
ETA: To clarify what I mean by "need to align our civilization": Concretely, I'm imagining the government deploying a slightly superhuman AGI internally. Some say its constitution should care about world peace, others say it should prioritize domestic interests, there is a struggle and it gets a muddled mix of directives like LLMs have today. It never manages to sort out global cooperation, and meanwhile various internal factions compete to edit the AGI's constitution. It ends up with a less-than-enlightened focus on growth of some particular power structure, and the rest of us are permanently marginalized.
Good point about our summary of Christiano, thanks, will fix. I agree with your summary.
"We could broaden our moral circle to recognize that AIs—particularly agentic and sophisticated ones—should be seen as people too. ... From this perspective, gradually sharing control over the future with AIs might not be as undesirable as it initially seems."
Is what you're proposing just complete, advance capitulation to whoever takes over? If so, can I have all your stuff? If you change your values to prioritize me in your moral circle, it might not be as undesirable as it initially seems.
I agree that if we change ourselves to value the welfare of whoever controls the future, then their takeover will be desirable by definition. It's certainly a recipe for happiness - but then why not just modify your values to be happy with anything at all?
"I expect my view to get more popular over time."
I agree, except I think it mostly won't be humans holding this view when it's popular. Usually whoever takes over is glad they did, and includes themselves in their own moral circle. The question from my point of view is: will they include us in their moral circle? It's not obvious to me that they will, especially if we ourselves don't seem to care.
This reminds me of Stewy from Succession: "I'm spiritually and emotionally and ethically and morally behind whoever wins."
As someone who writes these kinds of papers, I try to make an effort to cite the original inspirations when possible. And although I agree with Robin's theory broadly, there are also some mechanical reasons why Yudkowsky in particular is hard to cite.
To me as a reader, the most valuable things about the academic paper style are:
1) A clear, short summary (the abstract).
2) Explicitly stated contributions.
3) Standard jargon, or an explicit note when departing from it.
4) A related-work section that contrasts one's own position against others'.
5) An explicit account of what evidence is being marshalled and where it comes from.
6) Main claims listed explicitly.
7) A "limitations" or "why I might be wrong" section (in the best papers).
Yudkowsky mostly doesn't do these things. That doesn't mean he doesn't deserve credit for making a clear and accessible case for many foundational aspects of AI safety. It's just that in any particular context, it's hard to say what, exactly, his claims or contributions were.
In this setting, maybe the most appropriate citation would be something like "as illustrated in many thought experiments by Yudkowsky [cite particular sections of the Sequences and HPMOR], it's dangerous to rely on any protocol for detecting scheming by agents more intelligent than oneself". But that's a pretty broad claim. Maybe I'm being unfair - but it's not clear to me what exactly Yudkowsky's work says about the workability of these schemes other than "there be dragons here".