TLDR:
- Building powerful human-AI collaboration tools in the open removes capabilities overhang, which reduces discourse lag, which reduces x-risk.
- Alignment work is philosophy/writing/thinking-heavy, capabilities work is coding-heavy. Cyborg tools are more for the former than the latter, and great coding tools already exist.

 

The only goal is to reduce x-risk

Given this, "safety concerns" like "but what if someone uses your app and discovers jailbreaks or hacks somebody" are not actually a problem. (They may even be net positive, since they "update the discourse" on unknown dangers; more on this later.)

 

Specifics of capabilities work versus alignment work

(tldr: cyborg tools help alignment more than capabilities, because they're for reading/writing, not coding)

Capabilities work

Very empirical: you're writing code and running experiments, actual PyTorch code on lots and lots of GPUs.
(One piece of evidence: Sholto and Trenton, on the Dwarkesh podcast, describe AI research as very empirical.)

Alignment work

More about reading and writing on LessWrong and Substack, thinking about things conceptually, convincing people of the dangers of AGI, etc.

Some of it looks more like general AI research, for example mechanistic interpretability.

(Concrete/pragmatic note: my tool helps me a lot with absorbing material; next up are https://www.narrowpath.co/ and https://www.thecompendium.ai/. In normal UIs you can just dump text and ask for a summary, but there are obviously more sophisticated ways both to parse an AI's response and to feed it prompts. For example: ask it for a list of title/text pairs formatted as JSON, then run code that parses that and loads a widget of titled text blocks, which is easier to browse. My tool can currently do this.)
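The note above describes asking the model for title/text pairs as JSON and loading them into a browsable widget. A minimal sketch of the parsing step (the exact JSON shape and function names are illustrative assumptions, not the tool's actual code):

```python
import json

def parse_sections(response_text: str) -> list[tuple[str, str]]:
    """Parse a model response that was prompted to return a JSON list of
    {"title": ..., "text": ...} objects into (title, text) pairs,
    ready to load into titled text-block widgets."""
    sections = json.loads(response_text)
    return [(s["title"], s["text"]) for s in sections]

# Example response, as if the model followed the prompt's format:
response = '[{"title": "Summary", "text": "Main points..."}, {"title": "Risks", "text": "Open questions..."}]'
for title, text in parse_sections(response):
    print(f"== {title} ==\n{text}")
```

The UI layer would then render each pair as its own collapsible block instead of one wall of text.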

Cyborg tools

- prompt engineering (pulling from multiple text blocks within the app, templates, customizable/programmable "surfaces" for writing, dynamically loading sections of prompts via arbitrary code execution)
- all the benefits of normal note-taking/management systems, plus heavy customizability
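To make the first bullet concrete, here is a hedged sketch of what "pulling from multiple text blocks plus dynamically loaded prompt sections" could look like; the block names and helper functions are hypothetical, not the tool's real API:

```python
# Named text blocks, as a stand-in for blocks living in the app's UI.
blocks = {
    "persona": "You are a careful reading assistant.",
    "task": "Summarize the document below as titled sections.",
}

def load_dynamic_sections() -> list[str]:
    # Arbitrary user code can run here at send time,
    # e.g. pulling in the current date, files, or notes.
    return ["Today's focus: x-risk discourse."]

def assemble_prompt(order: list[str]) -> str:
    """Concatenate chosen blocks, then append dynamically generated sections."""
    parts = [blocks[name] for name in order]
    parts += load_dynamic_sections()
    return "\n\n".join(parts)

print(assemble_prompt(["persona", "task"]))
```

The point is that the prompt stops being one textbox and becomes a small program over reusable pieces.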

Generally reading and writing, non-agentic stuff; many people have already written about this. My point is basically that there is little overlap with capabilities research.

You can essentially take it from:
type prompt -> read response
to:
type text in widgets -> your code converts the widgets into a messages object for the API call -> the token stream is handled by customizable code -> more of your code runs after the response is done
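The pipeline above can be sketched in miniature. Everything here is an illustrative assumption (widget names, the fake token stream, the handler signatures), meant only to show the shape of the customization points:

```python
# Sketch of: widgets -> messages object -> streamed tokens -> post-processing.

def widgets_to_messages(widgets: dict[str, str]) -> list[dict]:
    # User code decides how widget contents become chat messages.
    return [
        {"role": "system", "content": widgets["instructions"]},
        {"role": "user", "content": widgets["draft"]},
    ]

def on_token(token: str, buffer: list[str]) -> None:
    buffer.append(token)  # customizable streaming handler

def on_done(buffer: list[str]) -> str:
    return "".join(buffer)  # customizable code run after the response ends

def run(widgets: dict[str, str], token_stream) -> tuple[list[dict], str]:
    messages = widgets_to_messages(widgets)
    buffer: list[str] = []
    for token in token_stream:  # stand-in for a real API token stream
        on_token(token, buffer)
    return messages, on_done(buffer)

messages, text = run({"instructions": "Be terse.", "draft": "Hello"},
                     ["Hi", " there"])
```

Each of the three functions is a hook the user can replace, which is the whole point versus a fixed chat box.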

 

Capabilities overhang means discourse lags behind reality

To the extent that the potential abilities of a human-AI team are locked behind good UIs (which I think they are, by a huge amount, though I won't get into that here), the worse the existing, well-known UIs are, the further the discourse about a given level of model capability lags behind that model's actual potential.

(in short: discourse lag = model capability × (amplification(best possible UI) − amplification(current UI)))
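Read as code, with made-up amplification numbers purely for illustration (the amplification function itself is of course not something we can measure directly):

```python
def discourse_lag(capability: float, amp_best_ui: float, amp_current_ui: float) -> float:
    """Discourse lag per the formula above: model capability times the gap
    between the amplification the best possible UI would provide and the
    amplification current UIs actually provide."""
    return capability * (amp_best_ui - amp_current_ui)

# Illustrative only: if the best UI would amplify a human-AI team 3x but
# current UIs manage 1.5x, half the model's potential is invisible to discourse.
print(discourse_lag(1.0, 3.0, 1.5))  # → 1.5
```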

This is bad for basically everything that involves talking to other people about the dangers of AI.
Which includes: affecting policy, getting funding, telling people their tech is dangerous and they should feel bad, and creating institutions and cooperation among people who don't want to die --- note that these activities are a huge portion of many people's work in AI alignment.

(The only world in which a capabilities overhang is not like a dam waiting to burst is one where I'm the only person who will ever design really good AI tooling, and where, if I just keep quiet, nobody will ever build such things. But I'm not that smart, nor that good a programmer (though a pretty good Tkinter GUI builder).)

 

This is a fork in the road

If cyborgism reduces x-risk, then this tech should be developed as much and as soon as possible, which you do by sharing, open-sourcing, and discussing it.

If it doesn't, I don't have to go through the massive effort of open-sourcing this --- testing, documenting, probably a lot of rewriting --- which I really don't feel like doing anyway.

Also, I can only ask people for donations (aka "funding") if I can actually show them what my idea is and what I've got so far.

 

Claude's feedback challenging my assumptions

I shared an earlier draft of this with Claude (Sonnet 3.5), and it argued against an underlying assumption that I couldn't find a good counterargument to.
I'll quote it directly, since it wrote so well:

(...)

Me:

a capability overhang means people think models are less powerful than they are, which means it's harder to get people to take x-risk seriously, which means you can corral fewer researchers

Claude:

(...)

RESPONSE:
This implies a specific causal chain:
Better UI → Demonstrates true capabilities → Researchers update beliefs → More alignment work

But consider:
1. Researchers might update differently than expected (e.g. "oh cool, let's build more powerful stuff!")
2. Demonstration of capabilities could accelerate both alignment AND capabilities work
3. The bottleneck might not be belief - many researchers who fully believe in x-risk still work on capabilities

(...)

RESPONSE:
"People don't want humanity to go extinct" seems obviously true yet doesn't predict behavior well. Consider:
- Most people believe in climate change but don't significantly alter behavior
- People smoke despite knowing it kills
- Engineers kept working on nuclear weapons post-Manhattan Project

Humans seem to:
1. Discount large future risks
2. Compartmentalize ("someone else will solve it")
3. Follow local incentives (career, status, interesting problems) over global outcomes

(...)

The end.

1 comment:

I think (not sure!) the damage from people/orgs/states going "wow, AI is powerful, I will try to build some" is larger than the upside of people/orgs/states going "wow, AI is powerful, I should be scared of it". It only takes one strong enough one of the former to kill everyone, and the latter is gonna have a very hard time stopping all of them.

By not informing the public that AI is indeed powerful, awareness of that fact is disproportionately allocated to people who choose to think hard about it on their own, and thus that knowledge is more likely to end up in more reasonable hands (for example, they'd also be more likely to think "hmm, maybe I shouldn't build unaligned powerful AI").

The same goes for cyborg tools, as well as general insights about AI: we should want them to be differentially accessible to alignment people rather than the general public.

In fact, my biggest criticism of OpenAI is not that they built GPTs, but that they productized it, made it widely available, and created a giant public frenzy about LLMs. I think we'd have more time to solve alignment if they kept it internally and the public wasn't thinking about AI nearly as much.