Jeffrey Hemphill — LessWrong

You can think of it as “dangerous capabilities in everyone’s hands”, but I prefer to think of it as “everyone in the world can work on alignment in a hands-on way, and millions of people are exposed to the problem in a much more intuitive and real way than we ever foresaw”.

Ordinary people without PhDs are learning what capabilities and limitations LLMs have. They are learning what you can and cannot trust an LLM with. They are coming up with creative jailbreaks we never thought of. And they’re doing so with toy models that don’t have superhuman powers of reasoning and don’t pose X-risks.

It was always hubris to think only a small sect of people in the SF Bay Area could be trusted with the reins to AI. I’ve never been one to bet against human ingenuity, and I’m not about to bet against it now that I’ve seen the open source community use LLaMA to blaze past every tech company.
