If you have technical understanding of current AIs, do you truly believe there are any major obstacles left? The kind of problems that AGI companies could reliably not tear down with their resources? If you do, state so in the comments, but please do not state what those obstacles are.
Yes? Not sure what to say beyond that.
Without saying anything about the obstacles themselves, I'll make a more meta-level observation: the field of ML has a very specific "taste" for research, such that certain kinds of problems and methods have really high or really low memetic fitness, which tends to make the tails of "impressiveness and volume of research papers (e.g., as seen on Twitter)" and "absolute progress on bottleneck problems" come apart.
+1. While I will also respect the request to not state them in the comments, I would bet that you could sample 10 ICML/NeurIPS/ICLR/AISTATS authors and learn about >10 well-defined, not entirely overlapping obstacles of this sort.
We don’t have any obstacle left in mind that we expect to take more than 6 months to overcome once efforts are invested in taking it down.
I don't want people to skim this post and get the impression that this is a common view in ML.
The problem with asking individual authors is that most researchers in ML don't have a wide enough perspective to realize how close we are. Over the past decade of ML, it seems that people in the trenches of ML almost always think their research is going slower than it is, because only a few researchers have broad enough gears-level models to plan the whole thing in their heads. If you aren't trying to run the search for the foom-grade model in your head at all times, you won't see it coming.
That said, they'd all be right about what bottlenecks there are. Just not how fast we're gonna solve them.
>If you have technical understanding of current AIs, do you truly believe there are any major obstacles left?
I’ve been working in AI (on and off) since 1979. I don’t work on it any more, because of my worries about alignment. I think this essay is mostly correct about short timelines.
That said, I do think there is at least one obstacle between us and dangerous superhuman AI. I haven’t seen any good work towards solving it, and I don’t see any way to solve it myself in the short term. That said, I take these facts as pretty weak evidence. Surprising capabilities keep emerging from LLMs and RL, and perhaps we will solve the problem in the next generation without even trying. Also, the argument from personal incomprehension is weak, because there are lots of people working on AI who are smarter, more creative, and younger than I am.
I’m of mixed feelings about your request not to mention the exact nature of the obstacle. I respect the idea of not being explicit about the nature of the Torment Nexus. But I think we could get more clarity about alignment by discussing it explicitly. I bet there are people working on it already, and I don’t think discussing it here will cause more people to work on it.
But in the last few years, we’ve gotten: [...]
- Robots (Boston Dynamics)
Broadly agree with this post, though I'll nitpick the inclusion of robotics here. I don't think it's progressing nearly as fast as ML, and it seems fairly uncontroversial that we're not nearly as close to human-level motor control as we are to (say) human-level writing. I only bring this up because a decent chunk of bad reasoning (usually underestimation) I see around AGI risk comes from skepticism about robotics progress, which is mostly irrelevant in my model.
I'm not sure why skepticism stemming from the lack of progress in robotics would be unjustified.
Robots require reliability, because otherwise you destroy hardware and other material. Even in areas where we have had enormous progress (LLMs, diffusion), we do not have the kind of reliability that would let you broadly trust their output without supervision. So this lack of reliability seems indicative of some fundamental things yet to be learned.
Forget about what the social consensus is. If you have technical understanding of current AIs, do you truly believe there are any major obstacles left? The kind of problems that AGI companies could reliably not tear down with their resources? If you do, state so in the comments, but please do not state what those obstacles are.
I think this request, absent a really strong compelling argument that is spelled out, creates an unhealthy epistemic environment. It is possible that you think this is false or that it's worth the cost, but you don't really argue for either in this post. You encourage people to question others and not trust blindly elsewhere in the post, but this portion asks people not to elaborate on their opinions, without explaining why. You repeat this again by saying "So our message is: things are worse than what is described in the post!" without justifying yourselves or, imo, properly conveying the level of caution with which people should treat such an unsubstantiated claim.
I'm tempted to write a post replying with why I think there are obstacles to AGI, what broadly they are with a few examples, and why it's important to discuss them. (I'm no...
The reasoning seems straightforward to me: If you're wrong, why talk? If you're right, you're accelerating the end.
I can't in general endorse "first do no harm", but it becomes better and better in any specific case the less room there is to help. If you can't save your family, at least don't personally help kill them; it lacks dignity.
I think that is an example of the huge potential damage of "security mindset" gone wrong. If you can't save your family, as in "bring them to safety", at least make them marginally safer.
(Sorry for the tone of the following - it is not intended at you personally, who did much more than your fair share)
Create a closed community that you mostly trust, and let that community speak freely about how to win. Invent another damn safety patch that will make it marginally harder for the monster to eat them, in hope that it chooses to eat the moon first. I heard you say that most of your probability of survival comes from the possibility that you are wrong - trying to protect your family is trying to at least optimize for such miracle.
There is no safe way out of a war zone. Hiding behind a rock is therefore not the answer.
No idea about original reasons, but I can imagine a projected chain of reasoning:
AGI is happening soon. Significant probability of it happening in less than 5 years.
I agree that there is at least some probability of AGI within 5 years, and my median is something like 8-9 years (which is significantly more aggressive than most of the research community, and also most of the alignment/safety/LW community, afaik).
Yet I think that the following statements are not at all isomorphic to the above, and are indeed - in my view - absurdly far off the mark:
We don’t have any obstacle left in mind that we expect to take more than 6 months to overcome once efforts are invested in taking it down.
If you have technical understanding of current AIs, do you truly believe there are any major obstacles left? The kind of problems that AGI companies could reliably not tear down with their resources?
Let's look at some examples of why.
I see several large remaining obstacles. On the one hand, I'd expect vast efforts thrown at them by ML to solve them at some point, which, at this point, could easily be next week. On the other hand, if I naively model Earth as containing locally-smart researchers who can solve obstacles, I would expect those obstacles to have been solved by 2020. So I don't know how long they'll take.
(I endorse the reasoning of not listing out obstacles explicitly; if you're wrong, why talk, if you're right, you're not helping. If you can't save your family, at least don't personally contribute to killing them.)
If you think you've got a great capabilities insight, I think you should PM me or somebody else you trust and ask whether they think it's a big capabilities insight.
Maybe it'd be helpful to not list obstacles, but do list how long you expect them to add to the finish line. For instance, I think there are research hurdles to AGI, but only about three years' worth.
Forget about what the social consensus is. If you have technical understanding of current AIs, do you truly believe there are any major obstacles left? The kind of problems that AGI companies could reliably not tear down with their resources? If you do, state so in the comments, but please do not state what those obstacles are.
I guess the reasoning behind the "do not state" request is something like "making potential AGI developers more aware of those obstacles is going to direct more resources into solving those obstacles". But if someone is trying to create AGI, aren't they going to run into those obstacles anyway, making it inevitable that they'll be aware of them in any case?
People are often unaware of what they're repeatedly running into. Problem formulation can go a long way towards finding a solution.
If you do, state so in the comments, but please do not state what those obstacles are.
Yes. But the "reliably" in
The kind of problems that AGI companies could reliably not tear down with their resources?
is doing a lot more work than I'd like.
It's not just alignment that could use more time, but also less alignable approaches to AGI, like model-based RL or really anything not based on LLMs. With LLMs currently somewhat in the lead, this might be a race between maybe-alignable AGI and hopelessly-unalignable AGI, and more time for theory favors both in an uncertain balance. This is another reason the benefits of regulation on compute are unclear.
LLM characters are human imitations, so there is some chance they remain human-like on reflection (in the long term, after learning from much more self-generated material in the future than the original human-written datasets). Or at least sufficiently human-like to still consider humans moral patients. That is, if we don't go too far from their SSL origins with too much RL, and don't have them roleplay/become egregiously inhuman fictional characters.
It's not much of a theory of alignment, but it's closest to something real that's currently available or can be expected to become available in the next few years, which is probably all the time we have.
What I'm expecting, if LLMs remain in the lead, is that we end up in a magical, spirit-haunted world where narrative causality starts to actually work, and trope-aware people essentially become magicians who can trick the world-sovereign AIs into treating them like protagonists and bending reality to suit them. Which would be cool as fuck, but also very chaotic. That may actually be the best-case alignment scenario right now, and I think there's a case for alignment-interested people who can't do research themselves but who have writing talent to write a LOT of fictional stories about AGIs that end up kind and benevolent, empower people in exactly this way, etc., to help stack the narrative-logic deck.
The same game theory that has all the players racing to improve their models in spite of ethics and safety concerns will have them getting the models to self improve if that provides an advantage.
I get the vibe that Conjecture doesn't have forecasting staff, or a sense of iterating on beliefs about the future to update strategy. I sorta get a vibe that Conjecture is just gonna stick with their timelines until New Year's Day 2028 and, if we're not all dead, write a new strategy based on a new forecast. Is this accurate?
AIs that are superhuman at just about any task we can (or simply bother to) define a benchmark for
This is just a false claim. Seriously, where is the evidence for this? We have AIs that are superhuman at any task we can define a benchmark for? That's not even true in the digital world, let alone in the world of mechatronic AIs. Once again I will be saving this post and coming back to it in 5 years to point out that we are not all dead. This is getting ridiculous at this point.
There already are general AIs. They just are not powerful enough yet to count as True AGIs.
Can you say what you have in mind as the defining characteristics of a True AGI?
It's becoming a pet peeve of mine how often people these days use the term "AGI" w/o defining it. Given that, by the broadest definition, LLMs already are AGIs, whenever someone uses the term and means to exclude current LLMs, it seems to me that they're smuggling in a bunch of unstated assumptions about what counts as an AGI or not.
Here are some of the questions I have for folks tha...
Good article! I share some skepticism on the details with other comments. Let me take this opportunity to point out that the government would be in a good position to slow down AI capabilities research.
AIs that are superhuman at just about any task we can (or simply bother to) define a benchmark for
Something that I’m really confused about: what is the state of machine translation? It seems like there is massive incentive to create flawless translation models. Yet when I interact with Google translate or Twitter’s translation feature, results are not great. Are there flawless translation models that I’m not aware of? If not, why is translation lagging behind other text analysis and generation tasks?
I am very interested in finding more posts/writing of this kind. I really appreciate attempts to "look at the game board" or otherwise summarize the current strategic situation.
I have found plenty of resources explaining why alignment is a difficult problem and I have some sense of the underlying game-theory/public goods problem that is incentivizing actors to take excessive risks in developing AI anyways. Still, I would really appreciate any resources that take a zoomed-out perspective and try to identify the current bottlenecks, key battlegrounds, local win conditions, and roadmaps in making AI go well.
Why have Self-Driving Vehicle companies made relatively little progress compared to expectations? It seems like autonomous driving in the real world might be nearly AGI-complete, and so it might be a good benchmark to measure AGI progress against. Is the deployment of SDCs being held to a higher safety standard than human drivers, and is that what's holding back progress in the field? Billions have been invested over the past decade across multiple companies with a clear model to operate on. Should we expect to see AGI before SDCs are widely available? I don't think anyone in the field of autonomous vehicles thinks they will be widely deployed in difficult terrain or inclement weather conditions in five years.
Agreed on all points. I'd like to see work submitted to https://humanvaluesandartificialagency.com/ as I think that has a significant chance of being extremely high-impact work on fully defining agency and active, agentic coprotection. I am not able to do it on my own, but if someone were up for pair programming with me regularly, I could.
This post reads like it wants to convince its readers that AGI is near/will spell doom, picking and spelling out arguments in a biased way.
While many people on the Forum and LW (including myself) believe that AI Safety is very important and isn't given enough attention by important actors, I don't want to lower our standards for good arguments in favor of more AI Safety.
Some parts of the post that I find lacking:
..."We don’t have any obstacle left in mind that we don’t expect to get overcome in more than 6 months after efforts are invested to
Setting aside all of my broader views on this post and its content, I want to emphasize one thing:
But in the last few years, we’ve gotten:
[...]
- AIs that are superhuman at just about any task we can (or simply bother to) define a benchmark for
I think that this is painfully overstated (or at best, lacks important caveats). But regardless of whether you agree with that, I think it should be clear that this does not send signals of good epistemics to many of the fence-sitters[1] you'd presumably like to persuade.
(Note: Sen also addresses the above quote i...
There are many obstacles with no obvious or money-can-buy solutions.
The claim that current AI is superhuman at just about any task we can benchmark is not correct. The problems being explored are chosen because the researchers think AI has a shot at beating humans at them. Think about how many real-world problems we pay other people to solve, problems we could benchmark, that aren't being solved by AI. Think about why these problems require humans right now.
My upper bound is much more than 15 years because I don't feel I have enough informat...
Gossiping and questioning people about their positions on AGI are prosocial activities!
Surely this depends on how the norm is implemented? I can easily see this falling into a social tarpit where people who partly agree and partly disagree with common alignment thinking must either prove ingroup membership by forswearing all possible benefits of getting AGI faster, or else be extremized into the neargroup (the "evil engineers" who don't give a damn about safety).
I'm not claiming you're advocating this. But I was quite worried about this when I read the quoted portion.
Monitoring of increasingly advanced systems does not trivially work, since much of the cognition of advanced systems, and many of their dangerous properties, will be externalized the more they interact with the world.
Externalized reasoning being a flaw in monitoring makes a lot of sense, and I haven’t actually heard of it before. I feel that should be a whole post in itself.
I also disagree about whether there are major obstacles left before achieving AGI. There are important test datasets on which computers do poorly compared to humans.
2022-Feb 2023 should update our AGI timeline expectations in three ways:
Anyone know how close we are to things that require operating in the physical world, but are very easy for human beings, like loading a dishwasher, or making an omelette? It seems to me that we are quite far away.
I don't think those are serious obstacles, but I will delete this message if anyone complains.
Do you really think AdeptAI, DeepMind, OpenAI, and Microsoft are the AIs to worry about? I'm more worried about what nation-states are doing behind closed doors. We know about China's Wu Dao, for instance; what else are they working on? If the NRO had Sentient in 2012, what do they have now?
...The Chinese government has a bigger hacking program than any other nation in the world. And their AI program is not constrained by the rule of law and is built on top of massive troves of intellectual property and sensitive data that they've stolen ove
Hmm, while I share your view about the timelines getting shorter and apparent capabilities growing leaps and bounds almost daily, I still wonder if the "recursively self-improving" part is anywhere on the horizon. Or maybe it is not necessary before everything goes boom? I would be more concerned if there was a feedback loop of improvement, potentially with "brainwashed" humans in the loop. Maybe it's coming. I would also be concerned if/once there is a scientific or technological breakthrough thanks to an AI (not just protein folding or exploring too-many...
...1. AGI is happening soon. Significant probability of it happening in less than 5 years.
[Snip]
We don’t have any obstacle left in mind that we expect to take more than 6 months to overcome once efforts are invested in taking it down.
Forget about what the social consensus is. If you have technical understanding of current AIs, do you truly believe there are any major obstacles left? The kind of problems that AGI companies could reliably not tear down with their resources? If you do, state so in the comments, but please do not state what those obstacl
The definition you quoted is "a machine capable of behaving intelligently over many domains."
It seems to me like existing AI systems have this feature. Is the argument that ChatGPT doesn't behave intelligently, or that it doesn't do so over "many" domains? Either way, if you are using this definition, then saying "AGI has a significant probability of happening in 5 years" doesn't seem very interesting and mostly comes down to a semantic question.
I think it is sometimes used within a worldview where "general intelligence" is a discrete property, and AGI is something with that property. It is sometimes used to refer to AI that can do more or less everything a human can do. I have no idea what the OP means by the term.
My own view is that "AGI company" or "AGI researchers" makes some sense as a way to pick out some particular companies or people, but talking about AGI as a point in time or a specific technical achievement seems unhelpfully vague.
I can think of several obstacles for the kinds of AGIs that are likely to actually be created (i.e. ones that seem economically useful, and that do not display misalignment that even Microsoft can't ignore before being capable enough to be an x-risk). Most of those obstacles are widely recognized in the RL community, so you probably see them as solvable or avoidable. I did possibly think of an economically-valuable and not-obviously-catastrophic exception to the probably-biggest obstacle though, so my confidence is low. I would share it in a private discussion, because I think that we are past the point when a strict do-no-harm policy is wise.
Yes, there remain many obstacles to AGI. Although current models may seem impressive, and to some extent they are, the way they function is very different from how we think AGI will work. My estimate is more like 20 years.
I suppose one question I have to ask, in the context of "slowing down" the development of AI: how? The only pathway I can see is government regulation. But such an action would need to be global, as any regulation passed in one nation would undoubtedly be bypassed by another, no?
I don't see any legitimate pathway to actually slow down the development of AGI, so I think the question is a false one. The better question is, what can we do to prepare for its emergence? I imagine that there are very tangible actions we can take on that front.
From our point of view, we are now in the end-game for AGI, and we (humans) are losing. When we share this with other people, they reliably get surprised. That’s why we believe it is worth writing down our beliefs on this.
1. AGI is happening soon. Significant probability of it happening in less than 5 years.
Five years ago, there were many obstacles on what we considered to be the path to AGI.
But in the last few years, we’ve gotten:
We don’t have any obstacle left in mind that we expect to take more than 6 months to overcome once efforts are invested in taking it down.
Forget about what the social consensus is. If you have technical understanding of current AIs, do you truly believe there are any major obstacles left? The kind of problems that AGI companies could reliably not tear down with their resources? If you do, state so in the comments, but please do not state what those obstacles are.
2. We haven’t solved AI Safety, and we don’t have much time left.
We are very close to AGI. But how good are we at safety right now? Well.
No one knows how to get LLMs to be truthful. LLMs make things up, constantly. It is really hard to get them not to do this, and we don’t know how to do this at scale.
Optimizers quite often break their setup in unexpected ways. There have been quite a few examples of this. But in brief, the lessons we have learned are:
No one understands how large models make their decisions. Interpretability is extremely nascent, and mostly empirical. In practice, we are still completely in the dark about nearly all decisions taken by large models.
RLHF and Fine-Tuning have not worked well so far. Models are often unhelpful, untruthful, and inconsistent, in many of the ways that had been theorized in the past. We also witness goal misspecification, misalignment, etc. Worse than this, as models become more powerful, we expect more egregious instances of misalignment, as more optimization will push for more and more extreme edge cases and pseudo-adversarial examples (see the toy sketch after this list).
No one knows how to predict AI capabilities. No one predicted the many capabilities of GPT3. We only discovered them after the fact, while playing with the models. In some ways, we keep discovering capabilities now thanks to better interfaces and more optimization pressure by users, more than two years in. We’re seeing the same phenomenon happen with ChatGPT and the model behind Bing Chat.
We are uncertain about the true extent of the capabilities of the models we’re training, and we’ll be even more clueless about upcoming larger, more complex, more opaque models coming out of training. This has been true for a couple of years by now.
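To make the earlier point about optimization pressure concrete, here is a minimal toy sketch (a hypothetical illustration with invented numbers, not one of the documented failure cases): an optimizer only sees a noisy proxy for what we actually want, and the harder it optimizes (here, simply picking the top candidate from an ever larger pool), the more its winning candidates are selected for noise in the proxy rather than for the thing we care about.

```python
import random

random.seed(0)

def sample_candidate():
    true_value = random.gauss(0, 1)                # what we actually care about
    proxy_value = true_value + random.gauss(0, 1)  # what the optimizer can measure
    return true_value, proxy_value

# More optimization pressure = picking the top candidate from a larger pool.
for pool_size in (10, 100, 10_000, 1_000_000):
    pool = [sample_candidate() for _ in range(pool_size)]
    best_true, best_proxy = max(pool, key=lambda c: c[1])  # optimize the proxy only
    print(f"pool={pool_size:>9}  proxy of pick={best_proxy:5.2f}  "
          f"true value of pick={best_true:5.2f}  noise share={best_proxy - best_true:5.2f}")
```

As the pool grows, the proxy score of the selected candidate keeps climbing, but an increasing share of that score is noise rather than true value; this is the same qualitative reason to expect stronger optimization to surface more extreme edge cases and pseudo-adversarial examples.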
3. Racing towards AGI: Worst game of chicken ever.
The Race for powerful AGIs has already started. There already are general AIs. They just are not powerful enough yet to count as True AGIs.
Actors
Regardless of why people are doing it, they are racing for AGI. Everyone has their theses, their own beliefs about AGIs and their motivations. For instance, consider:
AdeptAI is working on giving AIs access to everything. In their introduction post, one can read “True general intelligence requires models that can not only read and write, but act in a way that is helpful to users. That’s why we’re starting Adept: we’re training a neural network to use every software tool and API in the world”, and furthermore, that they “believe this is actually the most practical and safest path to general intelligence” (emphasis ours).
DeepMind has done a lot of work on RL, agents and multi-modalities. It is literally in their mission statement to “solve intelligence, developing more general and capable problem-solving systems, known as AGI”.
OpenAI has a mission statement more focused on safety: “We will attempt to directly build safe and beneficial AGI, but will also consider our mission fulfilled if our work aids others to achieve this outcome”. Unfortunately, they have also been a major kickstarter of the race with GPT3 and then ChatGPT.
(Since we started writing this post, Microsoft deployed what could be OpenAI’s GPT4 on Bing, plugged directly into the internet.)
Slowing Down the Race
There has been literally no regulation whatsoever to slow down AGI development. As far as we know, the efforts of key actors don’t go in this direction.
We don’t know of any major AI lab that has participated in slowing down AGI development, or publicly expressed interest in it.
Here are a few arguments that we have personally encountered, multiple times, for why slowing down AGI development is actually bad:
Remember that arguments are soldiers: there is a whole lot more interest in pushing for the “Racing is good” thesis than for slowing down AGI development.
Question people
We could say more. But:
So our message is: things are worse than what is described in the post!
Don’t trust blindly, don’t assume: ask questions and reward openness.
Recommendations:
4. Conclusion
Let’s summarize our point of view:
Should we just give up and die?
Nope! And not just for dignity points: there is a lot we can actually do. We are currently working on it quite directly at Conjecture.
We’re not hopeful that full alignment can be solved anytime soon, but we think that narrower sub-problems with tighter feedback loops, such as ensuring the boundedness of AI systems, are promising directions to pursue.
If you are interested in working together on this (not necessarily by becoming an employee or funding us), send an email with your bio and skills, or just a private message here.
We personally also recommend engaging with the writings of Eliezer Yudkowsky, Paul Christiano, Nate Soares, and John Wentworth. We do not endorse all of their research, but they all have tackled the problem, and made a fair share of their reasoning public. If we want to get better together, they seem like a good start.
5. Disclaimer
We acknowledge that the points above don’t go deeply into our models of why these situations are the case. Regardless, we wanted our point of view to at least be written in public.
For many readers, these problems will be obvious and require no further explanation. For others, these claims will be controversial: we’ll address some of these cruxes in detail in the future if there’s interest.
Some of these potential cruxes include:
Edited to include DayDreamer, VideoDex and RT-1, h/t Alexander Kruel for these additional, better examples.