Here's something which makes me feel very much as if I'm in a cult:

After LLMs became a massive thing, I've heard a lot of people raise their p(doom) on the basis that we are now on shorter timelines.

How have we updated p(doom) on the idea that LLMs are very different than hypothesized AI? 

Firstly, it would seem to me to be much more difficult to FOOM with an LLM, it would seem much more difficult to create a superintelligence in the first place, and it seems like getting them to act creatively and be reliable are going to be much harder problems than making sure they aren't too creative.

LLMs often default to human wisdom on topics, and the way we're developing them with AutoGPT, they can't even really think privately. If you had to imagine a better model of AI for a disorganized species to trip into, could you get safer than LLMs?

Maybe I've just not been looking in the right places to see how the discourse has changed, but it seems like we're spending all the weirdness points on preventing the training of a language model that at the end of the day will be slightly better than GPT-4.

I will bet any amount of money that GPT-5 will not kill us all.

6 Answers

DirectedEvolution

Firstly, it would seem to me to be much more difficult to FOOM with an LLM, it would seem much more difficult to create a superintelligence in the first place, and it seems like getting them to act creatively and be reliable are going to be much harder problems than making sure they aren't too creative.

Au contraire, for me at least. I am no expert on AI, but prior to the LLM blowup and seeing AutoGPT emerge almost immediately, I thought that endowing AI with agency[1] would take an elaborate engineering effort that went somehow beyond imitation of human outputs, such as language or imagery. I was somewhat skeptical of the orthogonality thesis. I also thought that it would take massive centralized computing resources not only to train but also to operate trained models (as I said, no expert). Obviously that is not true, and in a utopian outcome, access to LLMs will probably be a commodity good, with lots of roughly comparable models from many vendors to choose from and widely available open-source or hacked models as well.

Now, I see the creation of increasingly capable autonomous agents as just a matter of time, and ChaosGPT is overwhelming empirical evidence of orthogonality as far as I'm concerned. Clearly morality has to be enforced on the fundamentally amoral intelligence that is the LLM.

For me, my p(doom) increased due to the orthogonality thesis being conclusively proved correct and realizing just how cheap and widely available advanced AI models would be to the general public.

Edit: One other factor I forgot to mention is how instantaneously we'd shift from "AI doom is sci-fi, don't worry about it" to "AI doom is unrealistic because it just won't happen, don't worry about it" as LLMs became an instant sensation. I have been deeply disappointed on this issue by Tyler Cowen, whom I really did not expect to shift from his usual thoughtful, balanced engagement with advanced ideas to just utter punditry on the issue. I think I understand where he's coming from - the huge importance of growth, the desire not to see AI killed by overregulation in the manner of nuclear power, etc. - but still.

It has reinforced my belief that a fair fraction of the wealthy segment of the boomer generation will see AI as a way to cheat death (a goal I'm a big fan of), and will rush full-steam ahead to extract longevity tech out of it because they personally do not have time to wait to align AI, and they're dead either way. I expect approximately zero of them to admit this is a motivation, and only a few more to be crisply conscious of it. 

[1] creating adaptable plans to pursue arbitrarily specified goals in an open-ended way

It sounds like your model of AI apocalypse is that a programmer gets access to a powerful enough AI model that they can make the AI create a disease or otherwise cause great harm?

Orthogonality and wide access as threat points both seem to point towards that risk.

I have a couple of thoughts about that scenario:

OpenAI (and hopefully other companies as well) are doing basic testing of how much harm can be done with a model used by a human, and the best models will be gatekept for long enough that we can expect the experts to know the capabilities of ...

DirectedEvolution
AI risk is disjunctive: there are a lot of ways to proliferate AI, a lot of ways it could fail to be reasonably human-aligned, and a lot of ways to use or allow an insufficiently aligned AI to do harm. So that is one part of my model, but my model doesn't really depend on gaming out a bunch of specific scenarios. I'd compare it to the heuristic economists use that "growth is good": we don't know exactly what will happen, but if we just let the market do its magic, good things will tend to happen for human welfare. Similarly, "AI is bad (by default)": we don't know exactly what will happen, but if we just let capabilities keep on enhancing, there's a >10% chance we'll see an unavoidably escalating or sudden history-defining catastrophe as a consequence. We can make micro-models (e.g. talking about what we see with ChaosGPT) or macro-models (e.g. coordination difficulties) in support of this heuristic.

I don't think this is accurate. They are testing specific harm scenarios where they think the risks are manageable. They are not pushing AI to the limit of its ability to cause harm.

In this model, the experts may well release a model with much capacity for harm, as long as they know it can cause that harm. As I say, I think it's unlikely that the experts are going to figure out all the potential harms. I work in biology, and everybody knows that the experts in my field have many times released drugs without understanding the full extent of their ability to cause harm, even in the context of the FDA. My field is probably overregulated at this point, but AI most certainly is not; it's a libertarian's dream (for now).

Models are small enough that if hacked out of the trainer's systems, they could be run on a personal computer. It's training that is expensive and gatekeeping-compatible.

We don't need to posit that a human criminal will be actively using the AI to cause havoc. We only need to imagine an LLM-based computer virus hacking other computers, importing its LL

Daniel Kokotajlo

I disagree with your premise; what's currently happening is very much in-distribution for what was prophesied. It's definitely got a few surprises in it, but "much more difficult to FOOM" and the other things you list aren't among them, IMO.

I agree that predict-the-world-first, then-develop-agency (and do it via initially-human-designed-bureaucracies) is a safer AGI paradigm than e.g. "train a big NN to play video games and gradually expand the set of games it can play until it can play Real Life." (credit to Jan Leike for driving this point home to me). I don't think this means things will probably be fine; I think things will probably not be fine.

We could have had CAIS (Comprehensive AI Services), though, and that would have been way safer still. (At least, five years ago more people seemed to think this; I was not among them.) Alas, things don't seem to be heading in that direction.

By "what was prophesied", I'm assuming you mean EY's model of the future as written in the Sequences and, moreover, in the Hanson-Yudkowsky FOOM debate.

EY's foom model goes something like this:

  • humans are nowhere near the limits of intelligence - not only in terms of circuit size, but also crucially in terms of energy efficiency and circuit/algorithm structure

  • biology is also not near physical limits - there is great room for improvement (i.e. strong nanotech)

  • mindspace is wide and humans occupy only a narrow slice of it

So someday someone creates an AGI, and then it can "rewrite its source code" to create a stronger or at least faster thinker, quickly bottoming out in a completely alien mind far more powerful than humans which then quickly creates strong nanotech and takes over the world.

But he was mostly completely wrong here - because human brains are actually efficient, and biology is actually pretty much pareto optimal so we can mostly rule out strong nanotech.

So instead we are more slowly advancing towards brain-like AGI, where we train ANNs through distillation on human thoughts to get AGI designed in the image of the human mind, which thinks human-like thoughts including our vario...

Upvoted for quality argument/comment, but agreement-downvoted. 

I wasn't referring specifically to Yudkowsky's views, no.

I disagree that energy efficiency is relevant, either as a part of Yudkowsky's model or as a constraint on FOOM.

I also disagree that nanotech possibility is relevant. I agree that Yud is a big fan of nanotech, but FOOM followed by rapid world takeover does not require nanotech.

I think mindspace is wide. It may not be wide in the ways your interpretation of Yud thinks it is, but it's wide in the relevant sense -- there's lots of room for improvement in general intelligence, and human values are complex/fragile.

Thanks for the link to Hanson's old post; it's a good read! I stand by my view that Yudkowsky's model is closer to reality than Hanson's.

jacob_cannell
I said efficiency in general, not energy efficiency specifically. Assume Moore's law is over now, and the brain is fully flop efficient, such that training AGI requires at least 1e24 flops (and perhaps even 1e23B memops) on a 1e13B+ model. There is no significant further room for any software or hardware improvement - at all. In that world, is EY's FOOM model correct in the slightest? Everything about foom depends on the efficiency of AGI vs. the brain. You are also probably mistaken that efficiency is not a part of EY's model, in part because he seems to agree that foom depends on thermodynamic efficiency improvement over the brain, and explicitly said so a bit over a year ago: This is a critical flaw in his model, which spurred me to write an entire post to refute it. I also agree that nanotech is not that relevant (unless you are talking about practical top-down nanotech, aka chip lithography), but I was discussing EY's model, in which strong nanotech is important.
Daniel Kokotajlo
Your link went to a post which we had previously argued about... Your wonderful post goes into all sorts of details about the efficiency of the brain, most centrally energy efficiency, but doesn't talk about the kinds of efficiency that matter most. The kind of efficiency that matters most is something like "Performance on various world-takeover and R&D tasks, as a function of total $, compute, etc. initially controlled." Here are the kinds of efficiency you talk about in that post (direct quote): Yeah, those are all interesting and worth thinking about but not what matters at the end of the day. (To be clear, I am not yet convinced by your arguments in the post, but that's a separate discussion) Consider the birds vs. planes analogy. My guess is that planes still aren't as efficient as birds in a bunch of metrics (energy expended per kg per mile travelled, dollar cost to manufacture per kg, energy cost to manufacture per kg...) but that hasn't stopped planes from being enormously useful militarily and economically, much more so than birds. (We used to use birds to carry messages; occasionally people have experimented with them for military purposes also e.g. anti-drone warfare). Funnily enough, I think these assumptions are approximately correct* & yet I think once we get human-level AGI, we'll be weeks rather than years from superintelligence. If you agree with me on this, then it seems a bit unfair to dunk on EY so much, even if he was wrong about various kinds of brain efficiency. Basically, if he's wrong about these kinds of brain efficiency, then the maximum limits of intelligence reachable by FOOM are lower than Yud thought, and also the slope of the intelligence explosion will probably be a bit less steep. And I'm grateful that your post exists carefully working through the issues there. But quantitatively if it still takes only a few weeks to reach superintelligence -- by which I mean AGI which is significantly more competent than the best-ever humans at
jacob_cannell
Efficiency in terms of intelligence/$ is obviously downstream of the various lower-level metrics I cited. I may somewhat agree, depending on how we define SI. However, the current transformer GPU paradigm seems destined for a slowish takeoff. GPT4 used perhaps 1e25 flops and produced only a proto-AGI (which ironically is far more general than any one human, but still missing critical action/planning skills/experience), and it isn't really feasible to continue that scaling to 1e27 flops and beyond any time soon.

I don't think it's unfair at all. EY's unjustified claims are accepted at face value by too many people here, but in reality his sloppy analysis results in a poor predictive track record. The AI/ML folks who dismiss the LW doom worldview as crankish are justified in doing so if this is the best argument for doom.

I'm not sure what the "only a few weeks" measures, but I'll assume you are referring to the duration of a training run. For various reasons I believe this will tend to be a few months or more for the most competitive models, at least for the foreseeable future, not a few weeks.

We already have proto-AGI in the form of GPT4, which is already more competent than the average human at most white-collar, non-robotic tasks. Further increase in generality is probably not useful; most of the further value will come from improving agentic performance to increasingly out-compete the most productive/skilled humans in valuable skill niches - i.e. going for more skill depth rather than width. This may require increasing specialization and larger parameter counts - for example, if creating the world's best Lisp programmer requires 1T params just by itself, that will result in a pretty slow takeoff from here. I also suspect it may be possible to soon have a (very expensive) speed-intelligence of roughly human-level ability that thinks 100x or 1000x faster, but that isn't the kind of FOOM EY predicted. That's a scenario I predicted and hanson and othe
habryka
Not commenting on this whole thread, which I do have a lot of takes about that I am still processing, but a quick comment on this line: I don't see any reason why we wouldn't see a $100B training run within the next few years. $100B is not that much (it's roughly a third of Google's annual revenue, so if they really see competition in this domain as an existential threat, they alone might be able to fund a training run like this). It might have to involve some collaboration of multiple tech companies, or some government involvement, but I currently expect that if scaling continues to work, we are going to see a $100B training run (though like, this stuff is super hard to forecast, so I am more like 60% on this, and also wouldn't be surprised if it didn't happen).
jacob_cannell
In retrospect I actually somewhat agree with you, so I edited that line and denoted it with a strike-through. Yes, a $100B training run is an option in theory, but it is unlikely to translate to a 100x increase in training compute due to datacenter scaling difficulties, and this is also greater than OpenAI's estimated market cap. (I also added a note with a quick Fermi estimate showing that a training run of that size would require massively increasing Nvidia's GPU output by at least an OOM.) For various reasons I expect even those with pockets that deep to instead invest more in a number of GPT4-size runs exploring alternate training paths.

tailcalled

I basically agree that LLMs don't seem all that inherently dangerous and am somewhat confused about rationalists' reaction to them. LLMs seem to have some inherent limitations.

That said, I could buy that they could become dangerous/accelerate timelines. To understand my concern, let's consider a key distinction in general intelligence: horizontal generality vs vertical generality.

  • By horizontal generality, I mean the ability to contribute to many different tasks. LLMs supersede or augment search engines in being able to funnel information from many different places on the internet right to a person who needs it. Since the internet contains information about many different things, this is often useful.
  • By vertical generality, I mean the ability to efficiently complete tasks with minimal outside assistance. LLMs do poorly on this, as they lack agency, actuators, sensors and probably also various other things needed to be vertically general.

(You might think horizontal vs vertical generality is related to breadth vs depth of knowledge, but I don't think it is. The key distinction is that breadth vs depth of knowledge concerns fields of information, whereas horizontal vs vertical generality concerns tasks. Inputs vs outputs. Some tasks may depend on multiple fields of knowledge, e.g. software development depends on programming capabilities and understanding user needs, which means that depth of knowledge doesn't guarantee vertical generality. On the other hand, some fields of knowledge, e.g. math or conflict resolution, may give gains in multiple tasks, which means that horizontal generality doesn't require breadth of knowledge.)

While we have had previous techniques like AlphaStar with powerful vertical generality, they required a lot of data from the domains they functioned in to be useful, and they do not readily generalize to other domains.

Meanwhile, LLMs have powerful horizontal generality, and so people are integrating them into all sorts of places. But I can't help but wonder - I think the integration of LLMs in various places will develop their vertical generality, partly by giving them access to more data, and partly by incentivizing people to develop programmatic scaffolding which increases their vertical generality.

So LLMs getting integrated everywhere may incentivize removing their limitations and speeding up AGI development.

mako yass

Note that a lot of people are responding to a nontrivial enhancement of LLMs that they can see over the horizon, but won't talk about publicly for obvious reasons, so it won't be clear what they're reacting to, and they also might not say when you ask.

Though, personally, although my timelines have shortened, my P(Doom) has decreased in response to LLMs, as it seems more likely now that we'll be able to get machines to develop an ontology and figure out what we mean by "good" before having developed enough general agency to seriously deceive us or escape the lab. However, shortening timelines have still led me to develop an intensified sense of focus and urgency. Many of the things that I used to be interested in doing don't make sense any more. I'm considering retraining.

Hey Mako, I haven't been able to identify anyone who seems to be referring to an enhancement in LLMs that might be coming soon.

Do you have evidence that this is something people are implicitly referring to? Do you personally know someone who has told you this possible development, or are you working as an employee for a company which makes it very reasonable for you to know this information?

If you have arrived at this information through a unique method, I would be very open to hearing that.

mako yass
Basically everyone working on AGI professionally sees potential enhancements on prior work that they're not talking about. The big three have NDAs even just for interviews, and if you look closely at what they're hiring for, it's pretty obvious they're trying a lot of stuff that they're not talking about. It seems like you're touching on a bigger question: do the engines of invention see where they're going before they arrive? Personally, I think so, but it's not a very legible skill, so people underestimate it or half-ass it.

Raemon

I didn't really update on LLMs in the past year. I did update after GPT2* that LLMs were a proof of concept that we could do a variety of types of cognition, and the mechanism of how the cognition played out seemed to have mid-level building blocks similar to those of my own cognition. So it was an update on timelines (which can affect p(doom)).

GPT4 is mostly confirming that hypothesis rather than providing significant new evidence (it'd have been an update for me if GPT4 hadn't been that useful).

*in particular after this post https://slatestarcodex.com/2020/01/06/a-very-unlikely-chess-game/

(I think people are confusing "rationalists are pointing at LLMs as a smoking gun for a certain type of progress being possible" with "rationalists are updating on LLMs specifically being dangerous".)

Rudi C
EY explicitly calls for an indefinite ban on training GPT5. If GPTs are harmless in the near future, he’s being disingenuous by scaring people from nonexistent threats and making them forgo economic (and intellectual) progress so that AGI timelines are vaguely pushed a bit back. Indeed, by now I won’t be surprised if EY’s private position is to oppose all progress so that AGI is also hindered along everything else. This position is not necessarily wrong per se, but EY needs to own it honestly. p(doom) doesn’t suddenly make deceiving people okay.
Raemon
The reason to ban GPT5 (at least in my mind) is that each incremental chunk of progress reduces the distance from here to AGI FOOM and total loss of control of the future, and that there won't be an obvious step after GPT5 at which to stop. (I think GPT5 wouldn't be dangerous by default, but it could maybe become dangerous if used as the base for an RL-trained agent-type AI, and we've seen with GPT4 that people move on to that pretty quickly.)
Rudi C
1. This argument (no a priori known fire alarm after X) applies to GPT4 not much better than to any other impressive AI system. More narrowly, it could have been said about GPT3 as well.
2. I can't imagine a (STEM) human-level LLM-based AI FOOMing.
2.1 LLMs are slow. Even GPT3.5-turbo is only a bit faster than humans, and I doubt a more capable LLM will be able to reach even that speed.
2.1.1 Recursive LLM calls à la AutoGPT are even slower.
2.2 LLMs' weights are huge. Moving them around is difficult and will leave traceable logs in the network. LLMs can't copy themselves ad infinitum.
2.3 LLMs are very expensive to run. They can't just parasitize botnets to run autonomously. They need well-funded human institutions to run.
2.4 LLMs seem to be already plateauing.
2.5 LLMs can't easily self-update, like all other deep models ("catastrophic forgetting"). Updating via input consumption (pulling from external memory into the prompt) is likely to provide limited benefits.

So what will such a smart LLM accomplish? At most, it's like throwing a lot of researchers at the problem. The research might become 10x faster, but such an LLM won't have the power to take over the world.

One concern is that once such an LLM is released, we can no longer pause even if we want to. This doesn't seem that likely on first thought; human engineers are also incentivized to siphon GPU hours to mine crypto, yet this did not happen at scale. So the smart LLM will also not be able to stealthily train other models on institutional GPUs.

1. I do not expect to see such a smart LLM in this decade. GPT4 can't even play tic-tac-toe well; its reasoning ability seems very low.
2. Mixing RL and LLMs seems unlikely to lead to anything major. AlphaGo etc. probably worked so well because of the search mechanism (simple MCTS beats most humans) and the relatively low dimensionality of the games. ChatGPT is already utilizing RLHF and search in its decoding phase. I doubt much more can be added. AutoGPT h

waveman

My only update was the thought that maybe more people will see the problem. The whole debate in the world at large has been a cluster***k.

* Linear extrapolation - exponentials apparently do not exist
* Simplistic analogies, e.g. the tractor only caused 10 years of misery and unemployment, so any further technology will do no worse.
* Conflicts of interest and motivated reasoning
* The usual dismissal of geeks and their ideas
* Don't worry leave it to the experts. We can all find plenty of examples where this did not work. https://en.wikipedia.org/wiki/List_of_laboratory_biosecurity_incidents
* People saying this is risky being interpreted as a definite prediction of a certain outcome.

As Elon Musk recently pointed out, the more proximate threat may be the use of highly capable AIs as tools, e.g. working on social media to feed ideas to people and manipulate them. Evil/amoral/misaligned AI taking over the world would happen later.

Some questions I ask people:

* How well did the advent of Homo sapiens work out for less intelligent species like Homo habilis? Why would AI be different?
* Look at the strife between groups of differing cognitive abilities and the skewed availability of resources between those groups (deliberately left vague to avoid triggering someone).
* Look at how hard it is to predict the impact of technology - e.g. Krugman's famous insight that the internet would have no more impact than the fax machine. I remember doing a remote banking strategy in 1998 and asking senior management where they thought the internet fitted into their strategy. They almost all dismissed it as a land of geeks and academics and of no relevance to real businesses. A year later they demanded to know why I had misrepresented their clear view that the internet was going to be central to banking henceforth. Such is the ability of people to think they knew it all along, when they didn't.

What are your opinions about how the technical quirks of LLMs influence their threat levels? I think the technical details are much more amenable to a lower threat level.

If you update on P(doom) every time people are not rational you might be double-counting btw. (AKA you can't update every time you rehearse your argument.)
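The double-counting point can be made concrete with a small sketch of Bayesian updating in log-odds form. The prior (10%) and the likelihood ratio (2) below are assumed, purely illustrative numbers, not anyone's actual estimates; the only point is that adding the same piece of evidence a second time, as happens when you re-update while rehearsing an argument, moves the posterior even though nothing new was observed.

```python
import math


def to_log_odds(p: float) -> float:
    """Convert a probability to natural-log odds."""
    return math.log(p / (1 - p))


def from_log_odds(lo: float) -> float:
    """Convert natural-log odds back to a probability."""
    return 1 / (1 + math.exp(-lo))


def update(prior: float, likelihood_ratios: list[float]) -> float:
    """Bayesian update: add each log likelihood ratio to the prior log odds.

    This is only valid if each ratio comes from a genuinely independent
    observation; feeding in the same observation twice double-counts it.
    """
    lo = to_log_odds(prior)
    for lr in likelihood_ratios:
        lo += math.log(lr)
    return from_log_odds(lo)


# Illustrative, assumed numbers: a 10% prior, and one observation
# ("people are reasoning badly about AI") with likelihood ratio 2.
PRIOR = 0.10
LR = 2.0

once = update(PRIOR, [LR])       # evidence counted once: correct
twice = update(PRIOR, [LR, LR])  # same evidence rehearsed: double-counted

print(f"counted once:  {once:.3f}")
print(f"counted twice: {twice:.3f}")
```

With these assumed numbers, counting the observation once gives a posterior of about 0.182, while counting it twice gives about 0.308. Log odds are used because independent evidence then combines by simple addition, which makes it obvious that the second addition is only legitimate for a genuinely new observation.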

7 comments

How have we updated p(doom) on the idea that LLMs are very different than hypothesized AI? 

Actually, what were your predictions? "Hypothesized AI", as far as I understood you, is only the final step: AGI that kills us. The path to it can be very weird. I think that before GPT many people could say "the peak of my probability distribution lies on model-based RL as the path to AGI", but they still had very fat and long tails in this distribution.

it seems like we're spending all the weirdness points on preventing the training of a language model that at the end of the day will be slightly better than GPT-4.

The point of slowing down AI is not preventing the training of the next model; the point is to slow down AI. There is no right moment to slow down AI in the future, because there is no fire alarm for AI (i.e., there is no formally defined capability threshold that can logically convince everyone to halt AI development until we solve the alignment problem). The right moment is "right now", and that has been true at every moment since we realized that AI can kill us all (sometime in the 1960s?).

I suspect it is worth distinguishing cults from delusional ideologies. As far as I can tell, it is common for ideologies to have inelastic, false, poorly founded beliefs; the classic example is belief in the supernatural. I'm not sure what the exact line between cultishness and delusion is, but I suspect that it's often useful to define cultishness as something like treating opposing ideologies as infohazards. While rationalists are probably guilty of this, the areas where they are guilty of it don't seem to be p(doom) or LLMs, so it might not be informative to focus cultishness accusations there.

My timelines got shorter. ChatGPT to GPT-4 rollout was only a few months (the start of an exponential takeoff, like our recent experience with COVID?), and then we had the FLI petition, and Eliezer's ongoing podcast tour, and the ARC experiment with GPT-4 defeating a captcha by lying to a human.

I also personally experienced talking to these things, and they can more-or-less competently write code, one of the key requirements for an intelligence explosion scenario.

Before all this, I felt that the AI problem couldn't possibly happen at present, and we still had decades, at least. I don't think so anymore. All of the pieces are here and it's only a matter of putting them together and adding more compute.

I used to have the bulk of my probability mass around 2045, because that's when cheap compute would catch up with estimates of the processing power of the human brain. I now have significant probability mass on takeoff this decade, and noticeably nonzero mass on it having happened yesterday and not caught up with me.

I will bet any amount of money that GPT-5 will not kill us all.

What's the exchange rate for USD to afterlife-USD, though? Or what if they don't use currency in the afterlife at all? Then how would you pay the other party back if you lose?

I'll make an even stronger bet: I will bet any amount of USD you like, at any odds you care to name, that USD will never become worthless.

if you had to imagine a better model of AI for a disorganized species to trip into, could you get safer than LLMs?

Conjecture's CoEms, which are meant to be cognitively anthropomorphic and transparently interpretable. (They remind me a bit of the Chomsky-approved concept of "anthronoetic AI".) 

I don't see how LLMs are "very different" from hypothesized AI. 

Personally my p(doom) was already high and increased modestly but not fundamentally after recent advances.