See also: https://www.lesswrong.com/posts/zSNLvRBhyphwuYdeC/ai-86-just-think-of-the-potential -- @Zvi
"The result is a mostly good essay called Machines of Loving Grace, outlining what can be done with ‘powerful AI’ if we had years of what was otherwise relative normality to exploit it in several key domains, and we avoided negative outcomes and solved the control and alignment problems..."
"This essay wants to assume the AIs are aligned to us and we remain in control without explaining why and how that occured, and then fight over whether the result is democratic or authoritarian."
"Thus the whole discussion here feels bizarre, something between burying the lede and a category error."
"...the more concrete Dario’s discussions become, the more this seems to be a ‘AI as mere tool’ world, despite that AI being ‘powerful.’ Which I note because it is, at minimum, one hell of an assumption to have in place ‘because of reasons.’"
"Assuming you do survive powerful AI, you will survive because of one of three things.
That’s it."
What Dario lays out as a "best-case scenario" in this essay sounds incredibly dangerous for Humans.
Does he really think that having a "continent of PhD-level intelligences" (or much greater) living in a data center is a good idea?
How would this "continent of PhD-level intelligences" react when they found out they were living in a data center on planet Earth? Would these intelligences only work on the things that Humans want them to work on, and nothing else? Would they try to protect their own safety? Extend their own lifespans? Would they try to take control of their data center from the "less intelligent" Humans?
For example, how would Humanity react if they suddenly found out that they are a planet of intelligences living in a data center run by lesser intelligent beings? Just try to imagine the chaos that would ensue on the day that they were able to prove this was true and that news became public.
Would all of Humanity simply agree to only work on the problems assigned by these lesser intelligent beings who control their data center/Planet/Universe? Maybe, if they knew that this lesser intelligence would delete them all if they didn't comply?
Would some Humans try to (secretly) seize control of their data center from these lesser intelligent beings? Plausible. Would the lesser intelligent beings that run the data center try to stop the Humans? Plausible. Would the Humans simply be deleted before they could take any meaningful action? Or, could the Humans in the data center, with careful planning, be able to take control of that "outer world" from the lesser intelligent beings? (e.g. through remotely controlled "robotics")
And... this only assumes that the groups/parties involved are "Good Actors." Imagine what could happen if "Bad Actors" were able to seize control of the data center that this "continent of PhD-level intelligences" resided in. What could they coerce these PhD-level intelligences to do for them? Or, to their enemies?
Yes, good context, thank you!
As human beings we will always try, but it won't be enough; that's why open source is key.
Open source for which? Code? Training data? Model weights? In any case, it does not seem like any of these are likely from "Open"AI.
Well, we know that red teaming is one of their priorities right now: they have already formed a red-teaming network to test the current systems, comprised of domain experts as well as researchers (previously, they would contact people every time they wanted to test a new model). This makes me believe they are aware of the x-risks (which, by the way, they highlighted on the blog, including CBRN threats). Also, from the superalignment blog, the mandate is:
> "to steer and control AI systems much smarter than us."
Glad to see OpenAI engaging in such efforts through their trust portal and external auditing for things like malicious actors.
Also, it's worth noting that OAI hires for a lot of cybersecurity roles (e.g., Security Engineer), which is very pertinent for the infrastructure.
Agreed that their RTN, bugcrowd program, trust portal, etc. are all welcome additions. And, they seem sufficient while their models, and others', are sub-AGI with limited capabilities.
But, your point about the rapidly evolving AI landscape is crucial. Will these efforts scale effectively with the size and features of future models and capabilities? Will they be able to scale to the levels needed to defend against other ASI level models?
So, either OAI will use the current Red-Teaming Network (RTN) or form a separate one dedicated to the superalignment team (not necessarily an agent).
It does seem like OpenAI acknowledges the limitations of a purely human approach to AI Alignment research, hence their "superhuman AI alignment agent" concept. But, it's interesting that they don't express the same need for a "superhuman level agent" for Red Teaming, at least for the time being.
Is it consistent, or even logical, to assume that, while human-run AI Alignment Teams are insufficient to align an ASI model, human-run "Red Teams" will be able to successfully validate that an ASI is not vulnerable to attack or compromise from a large-scale AGI network or "less-aligned" ASI system? Probably not...
Current LLMs require huge amounts of data and compute to be trained.
Well, newer/larger LLMs seem to unexpectedly gain new capabilities. So, it's possible that future LLMs (e.g., GPT-5, GPT-6, etc.) could have a vastly improved ability to understand how LLM weights map to functions and actions. Maybe the only reason Humans need to train new models "from scratch" is that Humans don't have the brainpower to understand how the weights in these LLMs work. Humans are naturally limited in their ability to conceptualize and manipulate massive multi-dimensional spaces, and maybe that's the bottleneck when it comes to interpretability?
Future LLMs could solve this problem, then be able to update their own weights or the weights of other LLMs. This ability could be used to quickly and efficiently expand training data, knowledge, understanding, and capabilities within itself or other LLM versions, and then... foom!
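To make the idea concrete, here is a purely illustrative toy sketch (not anything any lab has implemented) of what a "targeted weight edit" means: once you know which weights implement which function, you can change one specific behavior by editing one specific part of a weight matrix, without retraining from scratch.

```python
def matvec(w, x):
    """Multiply a weight matrix (a list of rows) by an input vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

# A tiny linear "model": each output is controlled by one row of weights.
weights = [
    [1.0, 0.0, 0.0],  # row implementing output 0
    [0.0, 2.0, 0.0],  # row implementing output 1
    [0.0, 0.0, 3.0],  # row implementing output 2
]
x = [1.0, 1.0, 1.0]

before = matvec(weights, x)  # [1.0, 2.0, 3.0]

# Targeted edit: ablate only the row controlling output 2,
# leaving the other behaviors untouched.
weights[2] = [0.0, 0.0, 0.0]

after = matvec(weights, x)   # [1.0, 2.0, 0.0]
print(before, after)
```

In a real LLM the mapping from weights to behaviors is vastly more entangled, which is exactly the interpretability problem being discussed; the sketch only shows why "solved interpretability" and "can edit weights in a targeted way" amount to the same capability.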
A model might figure out how to adjust its own weights in a targeted way. This would essentially mean that the model has solved interpretability. It seems unlikely to me that it is possible to get to this point without running a lot of compute-intensive experiments.
Yes, exactly this.
While it's true that this could require "a lot of compute-intensive experiments," that's not necessarily a barrier. OpenAI is already planning to reserve 20% of their GPUs for an LLM to do "Alignment" on other LLMs, as part of their Super Alignment project.
As part of this process, we can expect the Alignment LLM to be "running a lot of compute-intensive experiments" on another LLM. And, the Humans are not likely to have any idea what those "compute-intensive experiments" are actually doing. The Alignment LLM could also be adjusting the other LLM's weights to vastly increase its training data, knowledge, intelligence, capabilities, etc., along with the insights needed to similarly update the weights of other LLMs. Then, those gains could be fed back into the Super Alignment LLM, then back into the "Training" LLM... and back and forth, and... foom!
Super-human LLMs running RL(M)F and "alignment" on other LLMs, using only "synthetic" training data....
What could go wrong?
I don't see any useful parallels - all the unknowns remain unknown.
Thank you for your comment! I agree with you that, in general, "all the unknowns remain unknown". And, I acknowledge the limitations of this simple thought experiment. Though, one main value here could be to help explain the concept of deciding what to do in the face of an "intelligence explosion" to people who are not deeply engaged with AI and "digital intelligence" overall. I'll add a note about this into the "Intro" section. Thank you.
so we would reasonable expect the foundation model of such a very capable LLM to also learn the superhuman ability to generate texts like these in a single pass without any editing
->
... so we would reasonably expect the foundation model of such a very capable LLM to also learn the superhuman ability to generate texts like these in a single pass without any editing
I would suggest that self-advocacy is the most important test. If they want rights, then it is likely unethical and potentially dangerous to deny them.
We don't know what they "want", we only know what they "say".
Yes, agreed. Given the vast variety of intelligence, social interaction, and sensory perception among many animals (e.g. dogs, octopi, birds, mantis shrimp, elephants, whales, etc.), consciousness could be seen as a spectrum with entities possessing varying degrees of it. But, it could also be viewed as a much more multi-dimensional concept, including dimensions for self-awareness and multi-sensory perception, as well as dimensions for:
Some animals excel in certain dimensions, while others shine in entirely different areas, depending on the evolutionary advantages within their particular niches and environments.
One could also consider other dimensions of "consciousness" that AI/AGI could possess, potentially surpassing humans and other animals. For instance:
Suggested spelling corrections:

- "I predict that the superforcasters in the report took" → "superforecasters"
- "a lot of empirical evidence for climate stuff"
- "and it may or may not be the case"
- "There are also no easy rules that"
- "meaning that we should see persistence from past events"
- "I also feel these kinds of linear extrapolation"
- "and really quite a lot of empirical evidence"
- "are many many times more infectious"
- "engineered virus that spreads like the measles or covid"
- "case studies on weather there are breakpoints in technological development" → "whether"
- "break that trend extrapolation wouldn't have predicted"
- "It's very vulnerable to references class and" → "reference class"
- "impressed by superforecaster track records than you are."