I don't disagree, but we still need to deeply understand agency. Superintelligent systems will have bubbles of agency arise in them, because everything acquires self-preservation as a goal to some degree, especially systems exposed to human culture. Of course, it's probably a bad idea to create superintelligent hyper-targeted optimizers, as those would be incredibly overconfident about their objective, and overconfidence about your objective is looking to be a key kind of failure that defines strong unsafety.
E.g., ref: https://causalincentives.com/
I'm not criticising agent foundations work. I just don't really like the prospect of building superhuman agents.
I am also completely against building powerful autonomous agents (albeit for different reasons), but to avoid doing this seems to require extremely high levels of coordination. All it takes is one lab to build a singleton capable of disempowering humanity. It would be great to stay in the "tool AI" regime for as long as possible, but how?
I do not want a corrigible intent aligned godlike nanny serving my every whim; I want to be a god myself goddammit.
Would it be okay to be god's right hand?
Do countries being way more capable than their citizens trigger these same feelings? What is relevantly different? Why is a nanny bad but a government headed by a president not?
Probably because I identify strongly with other humans and don't expect artificial agents to be human or human-like.
But the gist is that I don't think I'll appreciate humanity being coddled/enfeebled. I want AI to enhance/amplify human abilities, not to replace us.
In the video game Tacoma, the protagonist is after data that an AI holds. Transferring it requires physically carrying the AI's brain onto a spaceship. As an item it is like a big glass video card, two large panes of glass holding the circuitry between them. I think the game is trying to sell it as a surprise that the circuitry seems to be made out of biological matter and looks an awful lot like a brain-scan slice. With the later "political asylum" sequence, the game is trying to sell a transition from treating the AI as an alien, evil "other" to treating it as a sentient peer.
If you ever feel tempted to carry around torches and chant "the silicons will not replace us", it would make sense to check that the sentiment is not coming from improperly right sources. Your substrate's chemical or electromagnetic-wave permittivity properties are not central social properties.
If you are a small pattern in the decisions of a superintelligence, that doesn't mean that the decisions made by you, the pattern, are not your own. Same as with making decisions within physics. And there are better alternatives to physics, perhaps not allowing existential catastrophes or de novo superintelligent agents.
The simulator/simulation distinction allows many simulations with their own impregnable rules, including the nonexistence of simulators. This way, the existence of something in the world doesn't preclude the possibility of a different world where that thing was never present, or indeed allowed by the laws of nature.
We don't get to make new scientific discoveries, new inventions, etc. with superhuman agents.
What self-actualisation means for me seems like it would be absent in the post-singularity world?
A simulated world created after the singularity is not necessarily a post-singularity world, if singularity never happened in it. You already don't get to make discoveries that won't be available in the future, or in other Everett branches, and simulators are even less directly related to what's going on within the simulation.
You don't interact with other Everett branches, or the future, so it doesn't matter in the same way as what happens in the same world. If you exist within a simulation with appropriate laws of nature, you similarly won't be able to interact with things happening elsewhere, those things within a simulator won't be relevant for you in the same way as other Everett branches are not relevant to you within physics.
Epistemic Status
Written (very) quickly[1].
Thesis
The creation of superintelligent agents — even aligned ones — would with high probability entail a kind of existential catastrophe[2].
Introduction
Agents seem kind of bad. Even assuming we solve intent alignment[3] for agents, I don't know that I would be happy with the outcome.
I'm not convinced that my vision of a flourishing future is at all compatible with the creation of agentic superintelligences.
I've had many discussions about what the post-singularity landscape would look like in a world where alignment succeeded, and I've not really been satisfied with the visions painted. It just doesn't seem to admit my values.
Why Do I Oppose Agents?
I strongly value autonomy, and enhancing individual and group autonomy is essential to what I think a brighter future looks like. Superhuman (aligned) agents seem liable to hard-break autonomy (as I conceive of it).
I think the crux of my disagreement with aligned superhuman agents is human obsolescence. In a world with strongly superhuman agents, humanity would quickly be rendered obsolete.
The joys of invention, the wonders of scientific discovery, the triumph of creation, many forms of self-actualisation would be placed forever beyond our grasp. We'd be rendered irrelevant.
The idea of other minds accomplishing all of those on our behalf? It just isn't the same. It doesn't just matter that some accomplishments were made, but that I (or you, or others) made those accomplishments.
I do not want a corrigible intent aligned godlike nanny serving my every whim; I want to be a god myself goddammit.
When I try visualising the future in a world of aligned superhuman agents, I find the prospect so bleak. So hopeless, as if life has been robbed of all meaning.
"Our final invention" is not something I rejoice at, but something that grips me with despair. It would be an existential catastrophe in its own right.
What Do I Hope For?
My desired outcome for the AI project is to develop superhuman general intelligences that are not agents[4], that have no independent volition of their own. I want AI systems to amplify human cognition, not to replace it or supersede it.
I want humans to ultimately supply the volition. To retain autonomy and control over our future.
Agents Are Not Necessary?
This is not an impossible ask. Agents do not seem strictly necessary to attain AI systems with transformative capabilities.
I currently believe that:
Written as a stream of thought in a 30 minute sprint so that it gets written at all.
Left to my own devices, I'd never be able to write this up in a form I'd endorse anytime in the near future. A poor attempt seems marginally more dignified.
Albeit one of the better kinds.
We identify an optimisation target that we'd be happy, were we fully informed, for arbitrarily powerful systems to optimise for, and we successfully align the agents to it. Leaving aside the question of whether such optimisation targets exist, let's tentatively grant them for now.
Janus' Simulators offers an archetype that seems to provide a promising pathway to non-agentic superhuman general intelligences.