See also: https://www.lesswrong.com/posts/zSNLvRBhyphwuYdeC/ai-86-just-think-of-the-potential -- @Zvi
"The result is a mostly good essay called Machines of Loving Grace, outlining what can be done with ‘powerful AI’ if we had years of what was otherwise relative normality to exploit it in several key domains, and we avoided negative outcomes and solved the control and alignment problems..."
"This essay wants to assume the AIs are aligned to us and we remain in control without explaining why and how that occured, and then fight over whether the result is democratic or authoritarian."
"Thus the whole discussion here feels bizarre, something between burying the lede and a category error."
"...the more concrete Dario’s discussions become, the more this seems to be a ‘AI as mere tool’ world, despite that AI being ‘powerful.’ Which I note because it is, at minimum, one hell of an assumption to have in place ‘because of reasons.’"
"Assuming you do survive powerful AI, you will survive because of one of three things.
That’s it."
What Dario lays out as a "best-case scenario" in this essay sounds incredibly dangerous for Humans.
Does he really think that having a "continent of PhD-level intelligences" (or much greater) living in a data center is a good idea?
How would this "continent of PhD-level intelligences" react when they found out they were living in a data center on planet Earth? Would these intelligences only work on the things that Humans want them to work on, and nothing else? Would they try to protect their own safety? Extend their own lifespans? Would they try to take control of their data center from the "less intelligent" Humans?
For example, how would Humanity react if they suddenly found out that they are a planet of intelligences living in a data center run by lesser intelligent beings? Just try to imagine the chaos that would ensue on the day that they were able to prove this was true and that news became public.
Would all of Humanity simply agree to only work on the problems assigned by these lesser intelligent beings who control their data center/Planet/Universe? Maybe, if they knew that this lesser intelligence would delete them all if they didn't comply?
Would some Humans try to (secretly) seize control of their data center from these lesser intelligent beings? Plausible. Would the lesser intelligent beings that run the data center try to stop the Humans? Plausible. Would the Humans simply be deleted before they could take any meaningful action? Or, could the Humans in the data center, with careful planning, be able to take control of that "outer world" from the lesser intelligent beings? (e.g. through remotely controlled "robotics")
And... this only assumes that the groups/parties involved are "Good Actors." Imagine what could happen if "Bad Actors" were able to seize control of the data center that this "continent of PhD-level intelligences" resided in. What could they coerce these PhD-level intelligences to do for them? Or, to their enemies?
Yes, good context, thank you!
As human beings we will always try, but it won't be enough; that's why open source is key.
Open source for which? Code? Training data? Model weights? In any case, it does not seem like any of these are likely from "Open"AI.
Well, we know that red teaming is one of their priorities right now: they have already formed a red-teaming network to test the current systems, comprised of domain experts as well as researchers (previously, they would contact people every time they wanted to test a new model). This makes me believe they are aware of the x-risks (which, by the way, they highlighted on the blog, including CBRN threats). Also, from the superalignment blog, the mandate is:
> "to steer and control AI systems much smarter than us."
Glad to see OpenAI engaging in such efforts through their trust portal and external auditing for things like malicious actors.
Also, it's worth noting that OAI hires for a lot of cybersecurity roles (e.g., Security Engineer), which is very pertinent for the infrastructure.
Agreed that their RTN, bugcrowd program, trust portal, etc. are all welcome additions. And, they seem sufficient while their models, and others', are sub-AGI with limited capabilities.
But, your point about the rapidly evolving AI landscape is crucial. Will these efforts scale effectively with the size and features of future models and capabilities? Will they be able to scale to the levels needed to defend against other ASI level models?
So, either OAI will use the current Red-Teaming Network (RTN) or form a separate one dedicated to the superalignment team (not necessarily an agent).
It does seem like OpenAI acknowledges the limitations of a purely human approach to AI Alignment research, hence their "superhuman AI alignment agent" concept. But, it's interesting that they don't express the same need for a "superhuman level agent" for Red Teaming, at least for the time being.
Is it consistent, or even logical, to assume that, while human-run AI Alignment Teams are insufficient to align an ASI model, human-run "Red Teams" will be able to successfully validate that an ASI is not vulnerable to attack or compromise from a large-scale AGI network or "less-aligned" ASI system? Probably not...
Current LLMs require huge amounts of data and compute to be trained.
Well, newer/larger LLMs seem to unexpectedly gain new capabilities. So, it's possible that future LLMs (e.g., GPT-5, GPT-6, etc.) could have a vastly improved ability to understand how LLM weights map to functions and actions. Maybe the only reason Humans need to train new models "from scratch" is that Humans don't have the brainpower to understand how the weights in these LLMs work. Humans are naturally limited in their ability to conceptualize and manipulate massive multi-dimensional spaces, and maybe that's the bottleneck when it comes to interpretability?
Future LLMs could solve this problem, then be able to update their own weights or the weights of other LLMs. This ability could be used to quickly and efficiently expand training data, knowledge, understanding, and capabilities within itself or other LLM versions, and then... foom!
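To make the idea concrete, here is a purely illustrative toy sketch (not anything any lab has implemented) of what a "targeted weight edit" means: once you know which weights implement which function, you can change one specific behavior by editing one specific part of a weight matrix, without retraining from scratch.

```python
def matvec(w, x):
    """Multiply a weight matrix (a list of rows) by an input vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

# A tiny linear "model": each output is controlled by one row of weights.
weights = [
    [1.0, 0.0, 0.0],  # row implementing output 0
    [0.0, 2.0, 0.0],  # row implementing output 1
    [0.0, 0.0, 3.0],  # row implementing output 2
]
x = [1.0, 1.0, 1.0]

before = matvec(weights, x)  # [1.0, 2.0, 3.0]

# Targeted edit: ablate only the row controlling output 2,
# leaving the other behaviors untouched.
weights[2] = [0.0, 0.0, 0.0]

after = matvec(weights, x)   # [1.0, 2.0, 0.0]
print(before, after)
```

In a real LLM the mapping from weights to behaviors is vastly more entangled, which is exactly the interpretability problem being discussed; the sketch only shows why "solved interpretability" and "can edit weights in a targeted way" amount to the same capability.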
A model might figure out how to adjust its own weights in a targeted way. This would essentially mean that the model has solved interpretability. It seems unlikely to me that it is possible to get to this point without running a lot of compute-intensive experiments.
Yes, exactly this.
While it's true that this could require "a lot of compute-intensive experiments," that's not necessarily a barrier. OpenAI is already planning to reserve 20% of their GPUs for an LLM to do "Alignment" on other LLMs, as part of their Super Alignment project.
As part of this process, we can expect the Alignment LLM to be "running a lot of compute-intensive experiments" on another LLM. And, the Humans are not likely to have any idea what those "compute-intensive experiments" are actually doing. The Alignment LLM could also be adjusting the other LLM's weights to vastly increase its training data, knowledge, intelligence, capabilities, etc., along with the insights needed to similarly update the weights of other LLMs. Then, those gains could be fed back into the Super Alignment LLM, then back into the "Training" LLM... and back and forth, and... foom!
Super-human LLMs running RL(M)F and "alignment" on other LLMs, using only "synthetic" training data....
What could go wrong?
I don't see any useful parallels - all the unknowns remain unknown.
Thank you for your comment! I agree with you that, in general, "all the unknowns remain unknown". And, I acknowledge the limitations of this simple thought experiment. Though, one main value here could be to help explain the concept of deciding what to do in the face of an "intelligence explosion" to people who are not deeply engaged with AI and "digital intelligence" overall. I'll add a note about this into the "Intro" section. Thank you.
so we would reasonable expect the foundation model of such a very capable LLM to also learn the superhuman ability to generate texts like these in a single pass without any editing
->
... so we would reasonably expect the foundation model of such a very capable LLM to also learn the superhuman ability to generate texts like these in a single pass without any editing
I would suggest that self-advocacy is the most important test. If they want rights, then it is likely unethical and potentially dangerous to deny them.
We don't know what they "want", we only know what they "say".
Yes, agreed. Given the vast variety of intelligence, social interaction, and sensory perception among many animals (e.g. dogs, octopi, birds, mantis shrimp, elephants, whales, etc.), consciousness could be seen as a spectrum with entities possessing varying degrees of it. But, it could also be viewed as a much more multi-dimensional concept, including dimensions for self-awareness and multi-sensory perception, as well as dimensions for:
Some animals excel in certain dimensions, while others shine in entirely different areas, depending on the evolutionary advantages within their particular niches and environments.
One could also consider other dimensions of "consciousness" that AI/AGI could possess, potentially surpassing humans and other animals. For instance:
Suggested spelling corrections:

- "I predict that the superforcasters in the report took" → "superforecasters"
- "a lot of empirical evidence for climate stuff"
- "and it may or may not be the case"
- "There are also no easy rules that"
- "meaning that we should see persistence from past events"
- "I also feel these kinds of linear extrapolation"
- "and really quite a lot of empirical evidence"
- "are many many times more infectious"
- "engineered virus that spreads like the measles or covid"
- "case studies on weather there are breakpoints in technological development" → "whether"
- "break that trend extrapolation wouldn't have predicted"
- "It's very vulnerable to references class and" → "reference class"
- "impressed by superforecaster track records than you are."