Prometheus2moQuick Take

What does AI Safety currently need most that isn't being done?

I'd love to get as many takes from as many people as possible about what they think is most needed right now in AI Safety, but is not currently being done. I'm trying to determine what is best for me to work on.

Prometheus4mo

Yes, people might, but how can that be mapped back in a way that gets people to invest/prioritize those future rewards now? Not sure if DAO is the best way, either. But it seems there needs to be some kind of verifiable committment now that people could reference.

Prometheus4moQuick Take

I'd love to see people working on Retroactive Funding for Alignment. Something like a DAO with Governance tokens that only pays out after there is consensus that (1) AGI/ASI has been achieved, and (2) humanity has survived. Using AI or human evaluations, there would be an attempt to "traceback" the greatest contributors toward our survival. The researchers, the organizations, the individuals, the donors, the investors. All would receive a payout, based on their calculated impact. It's a way of almost bringing money from the future into the present, and a way of forcing donors, investors, and researchers to think about what will actually contribute toward a positive future. It would also add incentive for donors, since their contribution would also be later rewarded. Would love to speak further with anyone in the Retroactive Funding or DeSci space.

Replying toSoftmax, Emmett Shear's new AI startup focused on "Organic Alignment"

Prometheus7mo

Softmax, Emmett Shear's new AI startup focused on "Organic Alignment"

He recently gave an interview which I found disappointing, and am starting to think he hasn't really thought this through. My impression is he got distracted by the beauty of multicellular structures and now thinks the same will be true for AI.

Replying toWhy do many people who care about AI Safety not clearly endorse PauseAI?

PrometheusApr 01, 2025

Why do many people who care about AI Safety not clearly endorse PauseAI?

I don't think survivable worlds, at our point in time, involve something like PauseAI. I don't condemn them, and welcome people to try. But it's feeling more and more like Hiroo Onoda, continuing to fight guerilla warfare in the Philipines for decades, refusing to believe the war was over.

-1

Replying toWhy Stop AI is barricading OpenAI

Prometheus1y

Why Stop AI is barricading OpenAI

sigh Protests last year, barricading this year, I've already mentally prepared myself for someone next year throwing soup at a human-generated painting while shouting about AI. This is the kind of stuff that makes no one in the Valley want to associate with you. It makes the cause look low-status, unintelligent, lazy, and uninformed.

Prometheus2yQuick Take

A man asks one of the members of the tribe to find him some kindling so that he may start a fire. A few hours pass, and the second man returns, walking with a large elephant.

“I asked for kindling.” Says the first.

“Yes.” Says the second.

“Where is it?” Asks the first, trying to ignore the large pachyderm in the room.

The second gestures at the elephant, grinning.

“That’s an elephant.”

“I see that you are uninformed. You see, elephants are quite combustible, despite their appearance. Once heat reaches the right temperature, its skin, muscles, all of it will burn. Right down to its bones.”

“What is the ignition temperature for an elephant”

“I don't know, perhaps 300-400°C”

The first... (read 947 more words →)

Replying toMIRI 2024 Communications Strategy

Prometheus2y

MIRI 2024 Communications Strategy

I strongly doubt we can predict the climate in 2100. Actual prediction would be a model that also incorporates the possibility of nuclear fusion, geoengineering, AGIs altering the atmosphere, etc.

Prometheus2y*Quick Take

I got into AI at the worst time possible

2023 marks the year AI Safety went mainstream. And though I am happy it is finally getting more attention, and finally has highly talented people who want to work in it; personally, it could not have been worse for my professional life. This isn’t a thing I normally talk about, because it’s a very weird thing to complain about. I rarely permit myself to even complain about it internally. But I can’t stop the nagging sensation that if I had just pivoted to alignment research one year sooner than I did, everything would have been radically easier for me.

I hate saturated industries. I hate... (read 1035 more words →)

Replying toWhy I'm doing PauseAI

Prometheus2y

Why I'm doing PauseAI

It probably began training in January and finished around early April. And they're now doing evals.

Going to the moon

Say you’re really, really worried about humans going to the moon. Don’t ask why, but you view it as an existential catastrophe. And you notice people building bigger and bigger airplanes, and warn that one day, someone will build an airplane that’s so big, and so fast, that it veers off course and lands on the moon, spelling doom. Some argue that going to the moon takes intentionality. That you can’t accidentally create something capable of going to the moon. But you say “Look at how big those planes are getting! We've gone from small fighter planes, to bombers, to jets in a short amount of time. We’re on... (read more)

If progress in AI is continuous, we should expect record levels of employment. Not the opposite.

My mentality is if progress in AI doesn't have a sudden, foom-level jump, and if we all don't die, most of the fears of human unemployment are unfounded... at least for a while. Say we get AIs that can replace 90% of the workforce. The productivity surge from this should dramatically boost the economy, creating more companies, more trading, and more jobs. Since AIs can be copied, they would be cheap, abundant labor. This means anything a human can do that an AI still can't becomes a scarce, highly valued resource. Companies with thousands or millions of... (read more)

Contra One Critical Try: AIs are all cursed

I don't feel like making this a whole blog post, but my biggest source for optimism for why we won't need to one-shot an aligned superintelligence is that anyone who's trained AI models knows that AIs are unbelievably cursed. What do I mean by this? I mean even the first quasi-superintelligent AI we get will have so many problems and so many exploits that taking over the world will simply not be possible. Take a "superintelligence" that only had to beat humans at the very constrained game of Go, which is far simpler than the real world. Everyone talked about how such systems were unbeatable... (read more)

Why do so many think deception in AI is important?

Prometheus

Prometheus, habryka

I see a lot of energy and interest being devoted toward detecting deception in AIs, trying to make AIs less deceptive, making AIs honest, etc. But I keep trying to figure out why so many think this is very important. For less-than-human intelligence, deceptive tactics will likely be caught by smarter humans (when a 5-year-old tries to lie to you, it's just sort of sad or even cute, not alarming). If an AI has greater-than-human intelligence, deception seems to be just one avenue of goal-seeking, and not even a very lucrative or efficient one.

Take the now overused humans-to-chimpanzee analogy. If humans want to bulldoze a jungle that has chimpanzees in it, they... (read more)

Can humans become Sacred?

On 12 September 1940, the entrance to the Lascaux Cave was discovered on the La Rochefoucauld-Montbel lands by 18-year-old Marcel Ravidat when his dog, Robot, investigated a hole left by an uprooted tree (Ravidat would embellish the story in later retellings, saying Robot had fallen into the cave.)^[8]^[9] Ravidat returned to the scene with three friends, Jacques Marsal, Georges Agnel, and Simon Coencas. They entered the cave through a 15-metre-deep (50-foot) shaft that they believed might be a legendary secret passage to the nearby Lascaux Manor.^[9]^[10]^[11] The teenagers discovered that the cave walls were covered with depictions of animals.^[12]^[13] Galleries that suggest continuity, context or simply represent a cavern were... (read 648 more words →)

Back to the Past to the Future

Prometheus

You are sitting in your bedroom, dorm, tent, sidewalk, wherever you are staying right now, reading these words, when a dark void appears in your peripheral vision. A figure walks out of the void. They look a bit different, but also familiar. Surprise! It’s you from the future! And you look good! The two yous perform a variety of steps to prove that this Future You either really is You, or else someone with an insane amount of embarrassingly private information about you that they somehow copied onto a very convincing replica.

You ask how the future is, and if AGI has come yet.

You nod, smiling.

Since you exist in the future with an... (read 279 more words →)

Why aren't more people in AIS familiar with PDP?

Prometheus

Why haven't more people who work on alignment read Parallel Distributed Processing or even seem at all familiar with Rumelhart's work? This is the fundamental model of cognition and behaviour that all of modern AI is built on, the work that Hinton used for most of his insights. The model that has constantly been validated over the years by neural network performance and capabilities, but most seem incredibly unfamiliar or disinterested in it. Am I missing something? Is there some grounded reason why they think PDP and connectionism will fail at a certain point? It seems this should be required reading for anyone wanting to get into alignment research.

Why Is No One Trying To Align Profit Incentives With Alignment Research?

Prometheus

A whole lot of Alignment work seems to be resource-constrained. Many funders have talked about how they were only able to give grants to a small percentage of projects and work they found promising. Many researchers also receive a small fraction of what they could make in the for-profit sector (Netflix recently offered $900k for an ML position). The pipeline of recruiting talent, training, and hiring could be greatly accelerated if it wasn’t contingent on continuing to receive nonprofit donations.

Possible Ideas

AI Auditing Companies

We’ve already seen a bit of this with ARC’s eval of GPT4, but why isn’t there more of this? Many companies will/are training their own models, or else using existing... (read 908 more words →)

The following is a conversation between myself in 2022, and a newer version of myself earlier this year.

On AI Governance and Public Policy

2022 Me: I think we will have to tread extremely lightly with, or, if possible, avoid completely. One particular concern is the idea of gaining public support. Many countries have an interest in pleasing their constituents, so if executed well, this could be extremely beneficial. However, it runs high risk of doing far more damage. One major concern is the different mindset needed to conceptualize the problem. Alerting people to the dangers of Nuclear War is easier: nukes have been detonated, the visual image of incineration is easy to imagine... (read 765 more words →)

Slaying the Hydra: toward a new game board for AI

Prometheus

AI Timelines as a Hydra

Think of current timelines as a giant hydra. You can’t exactly see where the head is, and you don’t know exactly if you’re on the neck of the beast or the body. But you do have some sense of what a hydra is, and the difficulty of what you’re in for. Wherever the head of the beast is, that’s the part where you get eaten, so you want to kill it before that point. Say, you saw a head approaching. Perhaps it was something being created by Facebook AI, perhaps Google, it doesn’t matter. You see an opportunity, and you prevent the apocalypse from happening.

Looks like Deepmind was... (read 1510 more words →)

Lightning Post: Things people in AI Safety should stop talking about

Prometheus

This is experimenting with a new kind of post which is meant to convey a lot of ideas very quickly, without going into much detail for each.

Things I wish people in AI Safety would stop talking about

A list of topics people concerned about x-risk from AI spend, in my opinion, way too much time talking about to those outside the community. It's not that these things aren't real, they just likely won't actually end up mattering that much.

How an AI could persuade you to let it out of the box

WRONG!

Keeping AIs in boxes was never a thing companies were seriously going to do. An AI in a box isn’t useful. These aren’t... (read 420 more words →)

Aligned Objectives Prize Competition

Prometheus

The goal of this prize is to generate ideas and plans for a long-term, positive future. The starting prize pool is $500 USD, though others are welcome to add to it.

We are going to pretend for this exercise that the technical part of AI Alignment is already solved. Your mission, should you choose to accept it, is to come up with a design for what you want this AI to be doing.

Relaxations:

The AI will be safe regarding what you're targeting. Example: if you choose to give an AI the sole goal of eliminating cancer, it will actually cure cancer, and not just kill all humans to stop cancers from developing.
You have some

... (read 581 more words →)

The following is a conversation between myself in 2022, and a newer version of me earlier this year.

On the Nature of Intelligence and its "True Name":

2022 Me: This has become less obvious to me as I’ve tried to gain a better understanding of what general intelligence is. Until recently, I always made the assumption that intelligence and agency were the same thing. But General Intelligence, or G, might not be agentic. Agents that behave like RLs may only be narrow forms of intelligence, without generalizability. G might be something closer to a simulator. From my very naive perception of neuroscience, it could be that we (our intelligence) is not agentic, but just... (read 708 more words →)

Prometheus's Shortform

Prometheus

This is a special post for quick takes (aka "shortform"). Only the owner can create top-level comments.

Using Consensus Mechanisms as an approach to Alignment

Prometheus

“Both the cryptoeconomics research community and the AI safety/new cyber-governance/existential risk community are trying to tackle what is fundamentally the same problem: how can we regulate a very complex and very smart system with unpredictable emergent properties using a very simple and dumb system whose properties once created are inflexible?”
-Vitalik Buterin, founder of Ethereum

I think this was as true in 2016 as it still is today. And I think one approach to attacking the problem of alignment is not just by combining these two communities, but combining elements of each technology and understanding.

There are two different elements to the problem of Alignment. Getting an AI to do the things we want, and... (read 1616 more words →)

Humans are not prepared to operate outside their moral training distribution

Prometheus

When people think about how to treat an AI, they probably normally default to the evil AI, that frowns on humans and desires to enslave them out of hate, or the nice android-type AI. A lot of writing in AI Safety has focused on debunking the Evil Skynet-Type AI, and not as much on the Nice Android-Type AI.

In the film “AI: Artificial Intelligence” (because, back then, general audiences actually didn’t know what AI stood for), the AIs are of a very convenient kind. They are human-like androids, with physical bodies, and overall exhibit various human characteristics. It is designed in such a way to be very obvious and easy for its early... (read 624 more words →)

LESSWRONG
LW

LESSWRONG
LW

Prometheus

Why Is No One Trying To Align Profit Incentives With Alignment Research?

Humans are not prepared to operate outside their moral training distribution

Why do so many think deception in AI is important?

Lightning Post: Things people in AI Safety should stop talking about

Prometheus

Why do so many think deception in AI is important?

Back to the Past to the Future

Why aren't more people in AIS familiar with PDP?

Why Is No One Trying To Align Profit Incentives With Alignment Research?

Slaying the Hydra: toward a new game board for AI

Lightning Post: Things people in AI Safety should stop talking about

Aligned Objectives Prize Competition

Prometheus

Why Is No One Trying To Align Profit Incentives With Alignment Research?

Humans are not prepared to operate outside their moral training distribution

Why do so many think deception in AI is important?

Lightning Post: Things people in AI Safety should stop talking about

Prometheus

Why do so many think deception in AI is important?

Back to the Past to the Future

Why aren't more people in AIS familiar with PDP?

Why Is No One Trying To Align Profit Incentives With Alignment Research?

Slaying the Hydra: toward a new game board for AI

Lightning Post: Things people in AI Safety should stop talking about

Aligned Objectives Prize Competition

What does AI Safety currently need most that isn't being done?

Can humans become Sacred?

Possible Ideas

AI Auditing Companies

AI Timelines as a Hydra

Things I wish people in AI Safety would stop talking about

How an AI could persuade you to let it out of the box