You can say that none of this represents China being concerned for existential risk, and you’d be right. You can say that the primary motivation is the ideological purity of China and any information circulating in China, and you’d be right again. I still say that this reveals a situation in which China has its own reasons to want to slow down AI, in addition to the fact that China is losing the economic competition, and that their primary concern is the wrong AI would be bad rather than them hoping for something actively good.
I think it's plausible that the person who writes that policy cares a lot about existential risk but needs to make ideological arguments to get his policy broad support within the CCP.
GPT-3 to GPT-4 took 3 years. Why is it surprising that the training run for GPT-5 has not yet started?
The crucial thing is that OpenAI never stopped developing, training and tweaking GPT-3 during that time and capabilities made significant progress. They certainly won't stop putting a ton of compute and data and algorithmic ingenuity into GPT-4 until they feel that compute and algorithms have reached a point where training a completely new model from scratch makes sense.
Given that the image perception functionality of GPT-4 hasn't even been rolled out yet, they also haven't been able to collect feedback on which parts need tweaking most.
One could argue that the important point of this piece of information is that the capabilities jump that will come with GPT-5 is not as imminent as if the training had already started. But I strongly suspect that the later the training starts the larger the jump will be.
I wouldn't be worried about a GPT-5 that started training already. I would expect it to be a bigger GPT-4. I'm definitely more worried about a GPT-5 that only starts training after the lessons of massive scale deployment and a thousand different use cases have been incorporated into its architecture.
A fun example are the people who think, simultaneously:
- We should worry about deepfakes, or GPT-4-level-AI-generated propaganda.
- Nope, an artificial superintelligence couldn’t fool me, I’m too smart for that.
This is what people have been saying about advertising for decades. "It works... on other people."
Right. From what I've seen, the people that support censoring misinformation are almost never doing it out of worries that themselves will get misinformed.
Feel free to disagree vociferously, but if Elon launches his own LLM, I am not nearly as worried as if he launches "SpaceX for AutoGPT." My view is that LLMs, while potentially dangerous on their own, are not nearly as intrinsically dangerous as LLMs that are hooked up to robotics, the internet, and given liberty to act autonomously. I agree with Zvi that the people claimining the LLM is the fundamental bottleneck to better AutoGPT performance are calling it way too early. Put a billion dollars and the best engineers in the world behind enhancing AutoGPT capabilities, and Things Will Happen. Making destructive stuff will be the easy part.
“We are not currently training GPT-5. We’re working on doing more things with GPT-4.” – Sam Altman at MIT
Count me surprised if they're not working on GPT-5. I wonder what's going on with this?
I saw rumors that this is because they're waiting on supercomputer improvements (H100s?), but I would have expected at least early work like establishing their GPT-5 scaling laws and whatnot. In which case perhaps they're working on it, just haven't started what is considered the main training run?
I'm interested to know if Sam said any other relevant details in that talk, if anyone knows.
I'm not sure if you've seen it or not, but here's a relevant clip where he mentions that they aren't training GPT-5. I don't quite know how to update from it. It doesn't seem likely that they paused from a desire to conduct more safety work, but I would also be surprised if somehow they are reaching some sort of performance limit from model size.
However, as Zvi mentions, Sam did say:
“I think we're at the end of the era where it's going to be these, like, giant, giant models...We'll make them better in other ways”
The expectation is that GPT-5 would be the next GPT-N but 100x the training compute of GPT-4, but that would probably cost tens of $billions, so GPT-N scaling is over for now.
The LessWrong Review runs every year to select the posts that have most stood the test of time. This post is not yet eligible for review, but will be at the end of 2024. The top fifty or so posts are featured prominently on the site throughout the year.
Hopefully, the review is better than karma at judging enduring value. If we have accurate prediction markets on the review results, maybe we can have better incentives on LessWrong today. Will this post make the top fifty?
People Would Like a Better Explanation of Why People Are Worried That AI Might Kill Everyone
it seems to me like we need a Young Lady’s Illustrated Primer limited to just patiently explaining to the user why AI Ruin. Where a link could be sent along with instructions to just, “Keep talking to this until it makes sense.” Which in turn seems like we absolutely have the technology currently to make,
I’d apply via the common application to develop this, but I only bring the project management skills (haven’t really coded in over 15 years), and I’m not sure how to hire the right programmers anymore, either.
I think you misunderstood Jorda. They never said the "best possible" engineer is a 2x, nor did they imply it; they were talking about what they had observed in their own career. And there's no indication that they failed to understand that a 2x who uses ChatGPT to double their productivity becomes a 4x relative to the pre-GPT average. This makes the rest of the section read as strawman bashing; if it isn't, it deserves a better illustrative example IMO.
That's a pretty solid point.
I haven't seen any falsifiable theories or compelling evidence that would indicate the superintelligence-human relationship in the far future would substantially deviate from the human-mouse relationship.
Or mouse-ant relationship for that matter.
Though to be fair probably not a whole lot of writers considered things from this perspective.
The most compelling-to-me argument I've seen in that vein is that human civilization is currently, even without AI, on a trajectory to demand more and more energy, and eventually that will involve doing things on a scale sufficient to significantly change the amount of sunlight that reaches the surface of the Earth.
Humans probably won't do that, because we live here (though even there, emphasis on "probably" -- we're not exactly doing great in terms of handling climate change from accidentally changing the amount of CO2 in the atmosphere, and while that's unlikely to be an existential threat it's also not a very good sign for what will happen when humans eventually scale up to using 1000x as much energy).
An AI that runs on silicon can survive in conditions that humans can't survive in, and so its long-term actions probably look bad for life on Earth unless it specifically cares about leaving the Earth habitable.
This argument probably holds even in the absence of a single coherent AI that seizes control of the future, as long as things-which-need-earthlike-conditions don't retain enough control of the future.
My model is that the relevant analogy is not "human relationship with mice in general", it's with "human relationship with mice that live on a patch of ground where we want to build a chip fab, and also there's nowhere else for the mice to go".
Earth could be turned into one huge nature reserve. Analogous to what present day nature reserves are to mice.
A lot of my hope for "humans do not go extinct within the next 50 years" looks something like that, yeah (a lot of the rest is in "it turns out that language models are just straightforwardly easy to align, and that it's just straightforwardly easy to teach them to use powerful tools"). If it turns out that "learn a heuristic that you should avoid irreversible actions that destroy complex and finely-tuned systems" is convergent that could maybe look like the "human reserve".
There's an anthropic argument that if that's what the future looks like, most humans that ever live would live on a human reserve, and as such we should be surprised that we're not. But I'm kinda suspicious of anthropic arguments.
Although it might be possible for various cyborg scenarios, where humans and AI co-exist, co-evolve, co-modify, etc., to follow the space expansion paradigm.
The big news this week was that OpenAI is not training GPT-5, and that China’s draft rules look to be crippling restrictions on their ability to develop LLMs. After all that talk of how a pause was impossible and working with China was impossible and all we could do was boldly rush ahead, the biggest American player and biggest foreign rival both decided for their own internal reasons to do something not entirely unlike a pause.
They just went ahead and did it. We kept saying they’d never do it no matter what, and they just… went ahead and did it. At least somewhat.
This is excellent news. I sincerely hope people are updating on the new information, now that they know such things are not only possible but happening.
In terms of capabilities, the week was highly incremental. Lots of new detail, nothing conceptionally surprising or even unexpected.
Table of Contents
I’m going to suggest reading #29-#30 this time, even if you usually don’t think much about race dynamics or long term worries, as these seem like important notes. Otherwise, the usual applies, mostly start at the top, make sure to check In Other AI News and anything else that seems likely to be useful.
Language Models Offer Mundane Utility
New updated version of perplexity.ai is available.
Website offering resources to have LLMs help scientists with workflows.
Overview of AI performance in strategy games and some talk of how this might translate to its use in an actual war, hits the usual beats. AI is clearly at its best in the micro and tactics, both in games and in the real world, the air force that gives more AI control to its fighter planes is going to have a huge advantage in dogfights. One does not need to solve to know the equilibrium.
You can generate spam messages, except of course it cannot generate inappropriate or offensive content. Which revealed a 59k Twitter account spam network.
While you’re collecting mundane utility, it seems if you go to github.dev instead of github.com you get VS code in a browser? I did not know that so passing along.
Be careful not to subscribe to too much mundane utility, though.
I do worry about the spread of the subscription-based economic model. It strongly encourages specialization and is terrible for consumer surplus. Pay-as-you-go like you do for OpenAI’s API services actually aligns with the real costs, which seems much better.
Theory that Bing works better if you treat it like it is hypercompetent and can handle anything, which will cause Bing to respond as if it understands, whereas if you act like you expect it not to understand then it will respond as if it doesn’t.
Play a game with Bing called ‘visionary’ to get good image prompts.
Summarize business books.
Language Models Don’t Offer Mundane Utility
I still say schools have valuable lessons to teach students. Observe.
Rachel Woods reports six more workplaces banning ChatGPT on April 13.
For many businesses, it will increasingly seem crazy either to ban ChatGPT and similar tools, and also to not ban those tools. Choices will have to be made.
Bing gets itself suspended from image generation for content violations.
I Was Promised Flying Cars But I Will Accept Driverless Cars
From IWF’s Patricia Patnode: I Rode in a Driverless Car, This is the Future. I mean, yes, of course it is, the question is how far away is that future. The two most hilarious things in this write-up are the argument for the safety of driverless cars – gesturing at the fact that one might do the math without actually doing that math – and the author’s relatives being terrified of the driverless car, with two of them following behind her in another car, and her mother texting ‘get out now.’ Wow, everyone, if you feel that way about driverless cars, do I have news for you about non-driverless cars.
Still no idea when we’ll get our driverless cars. People keep saying ‘the problem is actually very hard.’ I have no doubt perfect performance is hard, but that only points out the real problem, which I’m strongly guessing is the performance standard.
Fun With Image Models and Speech Generation
Timothy Lee experiments with a (free) AI generated copy of his voice, finds it uncanny.
EFF talks about copyright law and image models. Deep skepticism of legal suits against image models, and also deep skepticism that artists would like it if they won.
They Took Our Jobs
Never stop doubling down, Roon.
Everyone becoming three times as productive at the thing you are doing, as I discussed before, can go both ways in terms of pay.
Perhaps there is additional induced demand, everyone captures a portion of the new value and pay goes way up.
Perhaps there is an essentially fixed pool of demand for this particular skill, as some people saw in copyediting, at least in the medium term, and now you are doing something you enjoy less, have less skill advantage at, and are working harder to fight for one of a third as many job slots. Or worse, your job goes away, automated entirely.
I remain an optimist overall about overall employment effects. I am not that level of optimist about the specific jobs where productivity jumps. Some will be fine. Many will not.
Alternatively, Roon could be making a weaker yet still quite interesting claim, which is that if you get with the program and adapt ahead of the curve you will do great when your job gets disrupted. Yes, lots of fools who don’t do this might have to find other work, but you will do great. Highly plausible.
This paper reports on which jobs are most exposed to AIs. The abstract:
This is the general ‘looks like there’s a problem therefore government should step in’ logic one sees everywhere. This could be a really interesting paper or it could be mostly worthless, let me know if you think I should do a deep dive.
Bing summarizes the methodology this way:
This seems like it is more question-begging than question-answering.
From the comments of my previous post this week on jobs, a great illustration of a common confusion.
Thus Jorda is asserting these two things at the same time:
This is in theory possible if Jorda was previously a below-average engineer.
In practice, there is an obvious contradiction.
This keeps happening. The same person will confidently assert that:
This is the position that anything too far above ‘typical current human using current best practices’ is magical and absurd and impossible unless we have actually seen it in the wild, or you can actually provide lots of detail on every step such that I could actually implement it, in which case fine, typical human performance changed. Or something?
A fun example are the people who think, simultaneously:
Deepfaketown and Botpocalypse Soon
Fabiens asks about the issue of deep fakes:
Both. The risk was overestimated. Also the risks are rapidly escalating. People haven’t had the time to adjust, to experiment, to build tools and networks and infrastructure. The technology for deepfakes is great compared to where it was a year ago, it’s still nothing compared to a year from now.
Right now, almost all deepfakes can be detected by the human eye, often intuitively without having to actively check. If you actually run robust detail checks, ensure all the shadows and proportions and everything lines up, the tech isn’t there yet. Even if no detail is wrong exactly, there is a kind of vibe to AI artwork versus the vibe of a real thing.
Over time, those flaws will rapidly get harder and harder to detect.
So yes, the problems are coming. I do think they will prove overhyped, as we will still have many defenses, and we will adjust. I also think anyone dismissing such concerns is going to look quite foolish, including in many pictures.
Grading GPT-4
The latest final exam given to GPT-4 was Scott Aaronson’s Quantum Computing final exam from 2019, on which it scored a B, which likely would be modestly improved if it could use a calculation plug-in like WolframAlpha. I can’t say much more about this one because, unlike the economics exams, I don’t know the first thing about the material being tested.
Subbarao Kambhampati tests GPT-4 on block world toy planning tests. It does okay initially, about 30% versus 5% for previous models (100% is very possible if you use GOFAI STRIPS planners, he says). Then when the names of things are obscured and replaced by meaning-bearing other words correctness drops to 3%.
He cites this as ‘oh GPT-4 is mostly pattern matching and pulling examples from its training, nothing to worry about.’ Certainly that is a lot of what it is doing, yet is that not also what we are mostly doing when we work on such problems? Every game designer knows that if you want humans to know what is going on and get the hang of things, and both have a good time and make good decisions, you want your names and labels of everything to make intuitive sense to humans. If you give humans a fully abstract problem, often what they do first is they start giving names to things and set up a concrete metaphor.
Before I take any comfort in GPT-4’s failure here, I’d at least want to see the human performance drop-off from the name changes. I predict it would be large.
Plug-In And Play
Cerebral Valley announces the winners of a GPT-4 plug-in hack-a-thon.
(I did manage to secure GPT-4 API access and am working on learning that, but I still don’t have plug-ins, so I can’t toy around with them, sure would be neat, nudge nudge.)
Anything cool?
In my universe this is anti-useful, why would you want to put LLM content into video form like this, it’s strictly worse than text? Many seem to think differently.
Sounds like a mix of useful things if done well, and things I would not want my plug-in to have permission to do. There’s a video demo, which didn’t help me assess.
Hard to tell from where I sit if this is useful.
The second and third prize winners point to the general new hotness, of integrating LLM agents into workflow. Still way too early to know what they can do well, or safely.
Prompting GPT-4
John David Pressman thinks he has found the way.
Then an example is given.
I have never been impressed by the conversations I have seen using variants of this technique, but I also have not tried it out myself, and keep not finding reasons to get curious about this kind of reasoning in a practical way.
If you want characters and unexpected decisions, here’s an interesting version of thinking step by step.
Go Go Gadget AutoGPT
How to quickly set up AutoGPT on your phone.
Or use a web interface (I did not verify them): Cognosys.ai, AiAgent.app, Godmode.space.
Jim Fan expresses deep skepticism of AutoGPT’s abilities, including the future abilities of similar programs so long as they are tied to GPT-4.
Jon Stokes tries to get AutoGPT to write a post about AutoGPT in the style of Jon Stokes. It did not, in my judgment, even match what you can do with normal GPT-4 here.
So, what has AutoGPT done since last time?
It discovered it needed Node and didn’t have it, googled how to install it, found a stackoverflow article with link, downloaded it, extracted it, and then spawned the server while Varun Mayya watched.
I do notice that ‘install a software package someone linked to on Stack Overflow that my LLM found while doing a subtask’ is not the most secure way to run one’s operation.
Here’s a thread on what’s happened with ‘BabyGPT.’ A lot of ‘use this with a different interface,’ not as much ‘here is something it actually accomplished.’
It is still early. The key claim of Fan’s is that the problems of AutoGPT are inherent to GPT-4 and cannot be fixed with further wrapping. If we are getting close to the maximum amount we can get out of creating a framework and using reflection and memory and other tricks, then that seems rather fast. We are only doing some quite basic first things here. Perhaps the core engine simply is not up to the task, yet there are definitely things I would try well before I gave up on it.
Simon Willison discusses details of prompt injection attacks, and why this will go badly for you if you start hooking up LLM-based systems with permissions and automatic loops. I am planning very much on sticking to safer systems.
Good Use of Polling
The question of how worried Americans are about AI has been placed in proper context.
This is a double upgrade. We get the extra category ‘this is impossible’ and also we get to compare AI to several other potential threats. People are more worried about AI than they are about asteroid impacts or alien invasions, less than they worry about climate change, pandemics or an outright act of God. Quite reasonably, nuclear weapons and the related world war are the largest concern of all.
One way to think about this is that AI is currently at 46% concerned, versus 39% for asteroid impact and 62% for climate change, where asteroid impact is ‘we all agree this is possible but we shouldn’t make real sacrifices for this’ and climate change is ‘we treat this as an existential threat worthy of massive economic sacrifices that puts our entire civilization in extreme danger.’ So that’s 30% of the way (21% if looking only at ‘very concerned’) from fun theoretical worry to massive economic sacrifices in the name of risk mitigation.
It will be good to track these results over time.
In other polling news, Americans broadly supportive of an AI pause when asked.
This is not surprising, since American public is very negative on AI (direct link).
I am not one to talk a lot about Democratic Control or Unelected Corporations or anything like that. In almost all situations I agree with Robin’s position on regulation, and on ignoring public calls for it. It still seems worth noting who is the strange doomer saying our standard practices and laws risk destroying most future value unless we ignore the public’s wishes, versus who thinks along with the public that failure to take proper precautions endangers us all.
Or even saying that if we listen to the public, we are doomed to ‘lose to China.’
Robin Hanson, and others opposed to such regulation, are the doomers here.
The Art of the Jailbreak
Asking the Chatbot to act as your dead grandma, who used to tell you how to produce napalm to help you fall asleep.
Various advice on how to communicate with Bing, from Janus.
Tenacity of Life had quite the idea.
The Week in Podcasts
Cognitive Revolution (Nathan Labenz) had Jaan Tallinn on to discuss Pausing the AI Revolution. Jaan explains the logic behind the pause letter, including the need for speed premium, and gives his perspective on the AI landscape and leading AI labs. Lots of great questions here, lots of good detailed responses, better than the usual. Jaan thinks that every new large model carries substantial (1%-50%, point estimate 7%) chance of killing everyone from this point out, and thinks decisions should be made with that in mind.
One interesting question Nathan asked was, would Jaan have applied this logic to GPT-4 as well? Jaan answers yes, it would have been reasonable to not assign at least 1% chance of ruin to training GPT-4, given what we know at the time. I would not go that far, I thought the risk was close to zero, yet I can certainly see ways in which it might not have been zero.
Lex Fridman follows up his interview with Eliezer with an interview of Max Tegmark. Consider this the ‘normal sounding person’ version. There are large differences between Max’s and Eliezer’s models of AI risk, and even larger differences in their media strategies. Max’s presentation and perspective are much more grounded, with an emphasis on the idea that if you create smarter and more capable things that humans, then the future likely belongs to those things you created.
I consider most of what Max has to say here highly sensible. Lex’s questions reveal that he remains curious and well-meaning, but that he mostly failed to understand Eliezer’s perspective (as shown in his ‘Eliezer would say’ questions). I wish we had a better way of knowing which strategies got through to people.
While not technically a podcast, Elon Musk went on Tucker Carlson (clips at link, the original version I saw got removed). The first section deals with his views on AI, with Tucker Carlson playing the straight man who has no idea and letting Elon talk.
Elon tells the story, well-known to many but far from all, that OpenAI happened because he would talk into the night about AI safety with Google founder Larry Page, Page would say that he wanted ASI (artificial superintelligence) as soon as possible, and when Elon quite sensibly what the plan was to make sure humans were going to be all right, Larry Page called Elon a ‘speciesist.’
That’s right. Larry Page was pushing his company to create ASI, and when asked about the risks to humanity, his response was that he did not care, and that caring about whether humanity was replaced by machines created by Larry Page would make you a bad person. You know, like a racist.
I’m not saying I would have done the worst possible thing and founded OpenAI, and reflexively created ‘the opposite’ of Google, open source because Google was closed.
But I understand.
What I don’t understand is his new plan for TruthGPT, on the thought that ‘an AI focused on truth would not wipe us out because we would be an interesting part of the universe.’ I suppose those are words forming a grammatical English sentence? They do not lead me to think Elon Musk has thought through this scenario? Human flourishing is not going to be the ‘most interesting’ possible arrangement of atoms in the solar system, galaxy or universe from the perspective of a future AI or AIs.
Of all the things to aim for in the name of humanity this is a rather foolish one. If anything it is more promising from the Page perspective, in the hopes that it might make the AI inherently valuable once we’re gone. Tucker then says we don’t wipe out chimpanzees because we have souls, so yes Elon is making some sense I guess if that is the standard.
Connor Leahy on AGI and Cognitive Emulation. Haven’t had a chance to listen to this one yet.
Robin Hanson goes on Bankless to explain why Eliezer Yudkowsky is wrong and we’re not all going to die.
Oh look who else was on a podcast this week. Yep, it me. Would be great to show the podcast some love. I felt it was a good discussion. However, if you’ve been reading my posts, then you are not going to find any new content here. I welcome any tips on how to improve my performance in the future.
Occasionally people tell me I should have a podcast. I am highly unconvinced of this, but I reliably enjoy being on other people’s podcasts, so don’t hesitate to reach out.
Sometime in the future: A 6-minute Eliezer TED talk that got a standing ovation, given on less than 4 days notice. Could be a while before we see it.
Scott Aaronson goes on AXRP. Going to put my notes here on the transcript.
He restates his endorsement of the incremental approach to alignment, working on problems where there is data and there are concrete questions to be solved, and hoping that leads somewhere useful – rather than asking what problems actually matter, start somewhere and see what happens. He expects gradual improvements in things like deceit rather than step changes, and agrees this is a crux – if we did expect a step change, Scott would feel pretty doomed. Scott endorses the ‘we can’t slow down because China and Facebook’ rhetoric.
He says, as Robin Hanson highlighted, that he thinks the first time we see an AI really deceive someone, that the whole conversation about will change. I am definitely on the other side of that prediction. I have heard this story several times too many, that some event will definitely be a wake-up call or fire alarm, that people will see the danger. This time, I expect everyone to shrug, I mean phishing emails exist, right? Scott clearly sees deception as a key question in what to worry about, in ways where it seems safe to say that ‘evidence that Scott says would convince him to worry will inevitably be given to Scott in the future.’
Why does everyone doubt that AI will become able to manipulate humans? What, like it’s hard?
Scott seems to put a lot of hope in ‘well if we can extrapolate deception ability over time then we can figure out when we should stop and stop.’ I seriously urge him to think this through and ask if that sounds like the way humans work, or would act, in such situations, even if we had a relatively clear danger line which we probably don’t get – even if deception increases steadily who knows if that leads to an effectiveness step function, and even if it doesn’t, what deception level is ‘dangerous’ and what isn’t? Especially with the not-quite-dangerous AIs giving the arguments about why not to worry. And that’s, as Scott notes, assuming there isn’t any deception regarding ability to deceive, and so on.
There’s a bunch of talk about probabilistic watermarking. Scott seems far more hopeful than anyone else I’ve read on that subject. I notice I expect all the suggested methods here not to work so well against an intelligent adversary, but there is hope they could beat a non-intelligent adversary that is blindly copying output or otherwise not taking steps to evade you.
Which is oddly consistent with Scott’s other approaches – trying to find ways to defeat a stationary or unintelligent foe, that reliably wouldn’t work otherwise, because you got to start somewhere.
In Other AI News
Good competing news summaries are available from Jack Clark at ImportAI, link goes to latest. As you’d expect, mostly covers similar stories to the ones I cover, although in a very different style. This week’s post talked about several things including the paper Emergent autonomous scientific research capabilities of large language models (arXiv), where scientists created a researching agent that scared them.
Announcing Amazon Bedrock. Amazon plans to make a variety of foundational models (LLMs) available through AWS. This will be a one-stop shop to get the compute you need in efficient form, and they plan to offer bespoke fine tuning off even a small handful of examples, with the goal being that customers will mostly be creating custom models.
Amazon is also making coding aid Amazon CodeWhisperer freely available.
Kevin Fisher announces he is putting together the Cybernetic Symbioscene Grant program for funding ambitious tools for human cognition x LLMs, says to DM him if you want to help.
Does it help if I’m in both the first and second camps? I have nothing against the third as written either, and I don’t see any contradictions anywhere. The problem is that I have no idea what this combination of words intends to mean in practice, or how this constitutes a plan. I’m not against it or for it, instead I notice I am confused.
The Responsible AI Index is here to tell you that, statistically speaking, very few companies are being responsible with regard to AI, complete with report.
Scale AI releases AI corporate readiness report (direct link). In 2022, they say 65% of companies created or accelerated their AI strategy, running a broad gamut of industries. 64% of companies used OpenAI models, 26% AI21labs, 26% cohere.
A paper discussing how best to structure your ‘AI Ethics Board.’ If you’re calling it that, I am already skeptical. They suggest an 11-20 person board, which I’d say is clearly too big for almost any board.
Demis Hassabis talks to 60 Minutes (video). Puff piece.
Tammy announces Orthogonal: A new agent foundations alignment organization. I continue to be sad there hasn’t been more efforts going into agent foundations, even if it seems far less hopeful than we previously thought. We don’t exactly have great alternatives.
StabilityAI releases new open source language model, 3B and 7B parameter versions. 15B and 65B versions coming later.
Databricks offers us Dolly 2.0 (announcement, GitHub), a 12B parameter LLM tuned for specific capabilities: Open Q&A, Closed Q&A, extracting information from Wikipedia, summarizing information from Wikipedia, brainstorming, classification and creative writing.
Replit’s Reza Shabani talks about how to train LLMs using Databricks, Hugging Face and MosaicML.
Hype!
A new occasional section, inspired by the speed-running community. Hype hype!
32k Context Window Hype!
I am very excited to get my hands on the 32k token context window, because I write posts more than 8k (and sometimes more than 32k) tokens long. There’s a lot of other great use cases out there for this, too.
This still seems like Bad Use of Hype. Yes, the leap enables some new use cases, but for 95%+ of all use cases it’s not worth the extra cost – remember that you have to pay for the full context window, including your inputs.
In time it will be worth it more often to go large, as people come up with bespoke detailed prompts and want to include lots more info, but most of the mundane utility I expect to stay the same.
Words of Wisdom
This is one of many problems when trying to tell stories about possible futures. Either your tale includes lots of people being dumb in kind of random ways and a lot of random or weird or dumb things happening, or your scenario is highly unrealistic and your predictions are terrible. But if you tell a story with such moves, people point to those moves as unrealistic and stupid and weird.
Roon reminds us that as much as he’s all acceleration talk, at core he agrees on the problem structure.
I am always fascinated by this type of hedging. What’s the ‘maybe’ doing in that sentence? Why draw the line here?
It also raises the question, if that is indeed the most important work that will ever be done, why would one want to allow as little time as possible to get it done?
The Quest For Sane AI Regulation Continues
Regulation of AI is a strange beast.
The normal result of regulations is to slow down progress, prevent innovation, protect insiders at the expense of outsiders and generally make things worse. Usually, that’s terrible. In AI, it is not so obvious, given that the default outcome of progress and innovation is everyone dies and all value is lost.
Thus, calls to make ‘good’ regulatory decisions with regard to short term concerns are effectively calls for accelerationism and taking existential risks, whereas otherwise ‘bad’ regulatory decisions often hold out hope. Of course, we should as usual expect many of the worst possible regulatory decisions, those that destroy short-term utility without providing much help against longer term dangers.
In that vein the EU has, as per usual, decided to do exactly the opposite of what is useful. Rather than attempt to regulate AI in any way that might prevent us all from dying, they instead are regulating it at the application layer.
Thus, if you don’t want your AI to be regulated, that’s easy, all you have to do is make it general purpose, and you’ll be excluded from the EU’s AI act and can do anything you want because it’s general purpose. AI Now seems to have noticed this, and is saying that perhaps general AIs should not be excluded from regulations and able to abdicate all responsibilities using a standard disclaimer.
Over here in America, Chuck Schumer is proposing focusing on four guardrails.
That won’t help with the existential risks, but as regulations go this seems highly reasonable and plausibly could be a good starting place.
Ezra Klein continues to bring sanity to the New York Times.
All of this seems wise and reasonable.
Jason Crawford suggests we should clearly define liability for AI-inflicted harms.
This too seems like something that would be sensible when dealing with ordinary small-scale AI harms, and would do absolutely zero good when dealing with existential risks.
If you need to buy liability insurance for any harms done by your AI, who is able to sell you liability insurance in case an existential catastrophe? By definition, neither Warren Buffet nor Lloyds of London nor the US Government can pay.
So either this becomes ‘AI that poses an existential threat is rightfully illegal’ or ‘We have good financial incentives to care about small AI harms, and no incentive to care about larger AI harms’ and we amplify the biggest moral hazard problem in history, making it even worse than the one that exists by default.
Luke Muehlhauser at Open Philanthropy offers 12 tentative ideas for US AI policy, aimed at increasing the likelihood of good outcomes from future transformative AI. After a bunch of caveats, he lists them as:
More details at the link. As Luke notes, Luke hasn’t operationalized all the details of these proposals, done enough investigation of them, or anything like that. Those are next steps.
This does seem like quite a good practical, Overton-window-compatible set of first steps. All twelve steps seem net positive to me.
I’d emphasize #9 in particular, creating a safe harbor from anti-trust law, and I’d also include safe harbor from shareholder lawsuits. A common claim is that acting responsibly is illegal due to anti-trust law and the requirement to maximize shareholder value. I believe such concerns are highly overblown in practice, in the sense that I do not expect a successful ‘you were too worried about safety’ lawsuit to succeed even if it did happen, nor do I expect any anti-trust action to be brought against companies cooperating to ensure everyone doesn’t die.
The problem is that such rhetoric and worry presents a serious practical barrier to action. Whereas if there was official clear permission, even clear official approval, for such measures, this would make things much easier. It is also the best kind of regulation, where you remove regulation that was preventing good action, rather than putting up additional barriers.
Tyler Cowen responds here, attacking as usual any proposal of any form using ludicrous concerns (yes, you would be allowed to bring in a computer from Mexico) and characterizing security features as a ‘call for a complete surveillance society’ and threatening widespread brand damage to a movement for someone even saying such a proposal out loud in an unofficial capacity. Discussion must be ‘stillborn’ and shut down, stat. And calling the risks things that ‘have not even been modeled,’ as another non-sequitur way to continue to insist that we need not treat any AI risk concerns as real.
How do we get from brainstorming ideas on feasible safety regulations to Tyler asking why not a call to ‘shut down all high skill immigration’? This line seems telling, a very heroes-and-villains view of the situation where there are only friends and enemies of progress, and a complete failure to grapple with people like Luke being on the pro-technology, pro-capability, pro-growth side of almost every other issue.
The correct response to early practical proposals is not offering worse ones, it is to offer better ones, or raise real objections, and to grapple with the underlying needs. I am so tired of variations on ‘if you are proposing this reasonable thing, why are you not, if the world is at stake, doing these other crazy obviously evil things that are deeply stupid?’ This is not how one engages in good faith. This is viewing arguments as soldiers on a battlefield.
That does not mean there are no good points. Tyler’s point about subsidies is well taken. We should absolutely stop actively encouraging the training of larger models, on that we can all agree.
I also think Tyler’s note about #11 being the wrong way to do liability could be right, but will wait to hear the promised additional details. Certainly some damage must be the provider’s fault, other damage must not be, so how do you draw the line?
(I am continuing to hold off responding to a variety of other Hanson and Cowen posts until I figure out how to best productively engage, and ideally until we can have discussions.)
Newsweek post from expert in bio security warns that our AI security is inadequate. Doesn’t offer actionable suggestions beyond the obvious, seems focused on tangible short-term harms, included for completeness.
The Good Fight
How would one push back against ChatGPT and AI, if one wanted to?
Barry Diller, veteran of trying to get Something Done about Google, sets his sights on AI in defense of the publishing industry, which he says it ‘threatens to obliterate.’ Not exactly the top of my worry list, but sure, what have you got? He thinks a pause is impossible, since getting people to agree to things is an unknown technology and never works, and instead suggests:
So yes, it sounds like someone is going to be the bastard that sues the other bastards.
Also, one of those bastards could be Elon Musk?
Be right back. Grabbing popcorn.
Environmental Regulation Helps Save Environment
It’s happening.
Not to oversimplify, but a datacenter is a building that contains a bunch of computers. If you want to build that in the middle of a city, or any particular special location, perhaps I can see that being an issue. Instead, this seems like ‘somewhere in the middle of America, build a building where we can put a bunch of computers’ is facing over a year of delays due to environmental regulations. There is literally nowhere they can simply build a building that would have what it takes to store a bunch of computers.
That is kind of terrible. It is also kind of a ‘one-time’ delay, in the sense that we get to be permanently lagged by a year and a half on this until things stabilize, but that only has a dramatic impact during an exponential rise in needs, and the delay doesn’t then get longer.
Also I find this rather hard to believe. We are bottlenecked not on chips but on buildings to put the chips into? I was told by many that there was a clear chip shortage. Also Microsoft is working on making its own chips.
The Quest for AI Safety Funding
There’s a new common application. The game theory is on and I for one am here for it.
Full EA forum post here. Application deadline (this time anyway) is May 17, 2023.
My experience from SFF was indeed that if you had an AI Safety project your chances of getting funded were quite good.
If you are already seeking EA-branded funding at all for your project, this is presumably a very good idea, and you should do this. Hell, this makes me tempted to throw out an application that is literally: This is Zvi Mowshowitz, my blog itself is funded but if you fund me I will hire engineers at generous salaries to try out things and teach me things and build demonstration projects and investigate questions and other neat stuff like that, maybe commission a new virtual world for LLM agents to take over in various ways, and otherwise scale up operations as seems best, so if you want to throw money at that I’m going to give you that option but absolutely no pressure anyone.
As in, make that 50%+ of the entire application and see what happens, cause sure, why not? Should I do it?
People Would Like a Better Explanation of Why People Are Worried That AI Might Kill Everyone
Some of those people want this explanation for themselves. Others want the explanation to exist so it can be given to others.
David Chalmers seeks a canonical source.
There were a number of excuses given for why we haven’t definitively done better than Bostrom’s Superintelligence.
or:
I understand all these problems and excuses. I still think that’s what they are. Excuses.
Tyler Cowen is fond of saying ‘given the stakes’ when criticizing people who failed to do whatever esoteric thing he’s talking about that particular day. This can certainly be obnoxious and unreasonable. Here, I think it applies. We need a canonical explanation, at every level of complexity, that can adjust to what someone’s objections might be, and also adjust to which premises they already know and accept, and which ones are their true objections or require further explanation.
Is this easy? No. Does it need to be done? Is it worth doing? Yes.
It is approaching the ‘if you want it done right, you got to do it yourself’ stage.
Many of these discussions were triggered by this:
This is indeed a key problem. As I keep saying, the ruin result is highly robust. When you introduce a more intelligent and more capable source of optimization than humans into the world, you should expect it to determine how to configure the matter rather than us continuing to decide how to configure the matter. Most configurations of the matter do not involve us being alive.
The tricky part is finding a plausible way for this not to happen.
Yet most people take the position of ‘everything will by default work out unless you can prove a particular scenario.’ You can respond with ‘it’s not about a particular scenario’ but then they say it is impossible unless you give them an example, after which they treat you as saying that example will definitely happen, and that finding one step to disbelieve means they get to stop noticing we’re all going to die.
Matt Yglesias points out the obvious, which is that it does take much worry about AI to realize that training a more powerful core AI model has larger risks and downsides to consider than, say, putting a roof deck on a house. Regulation is clearly quite out of whack.
I assume that AI accelerationists, who say there is nothing to worry about, mostly agree with this point – they think we are crazy to put so many barriers around roof decks, and also everything else involving atoms.
Alas, this then makes them even more eager to push forward with AI despite its obvious dangers, because the otherwise sensible alternative methods of growth and improvement have been closed off, so (in their view) either we roll the dice on AI or we lose anyway.
Which makes it that much more important to let people build houses where people want to live, approve green energy projects, make it viable to ship spices from one American port to another American port and otherwise free us up in other ways. A world where everything else wasn’t strangled to death will be much better able to handle the choices ahead.
Seb Krier has thoughts on what might be helpful for safety.
A lot of that is ‘do what you’re doing, but do it properly and well, with good versions.’
With a side of ‘tell it to the people who matter.’
That’s always easier said than done. Usually still good advice. It is still good to point out where the low hanging fruit is.
In particular, #1 and #2, a combination of a well-organized hyper-linked map of key arguments, and a basic FAQ for people coming in early, seems like something people keep talking about and no one has done a decent job with. A response suggests Stampy.ai for #2.
For #3, I notice it’s a case of ‘what exactly is a comprehensive research agenda?’ in context. We don’t know how this would work, not really.
For #4, agreed that would all be great, except that I continue to wonder what it would mean to have detailed models or cost-benefit analyses, and we are confused on what policies to propose. I get the instinct to want models and cost-benefit and I’d love to provide them, but in the context of existential risks I have no idea how to usefully satisfy such requests, even if I keep working on the problem anyway.
Vibes are not evidence. People arguing using vibes may or may not be evidence, since people will often use vibe arguments in favor of true things, and if there is a clear vibe people will reason from it no matter the underlying truth. However, if people keep falling back on vibes more than one would expect, that does become evidence…
Quiet Speculations
This two-axis model of AI acceleration and risk from Samuel Hammond seems spot on in its central claim, then has quite the interesting speculation.
Building bigger and stronger models is expensive. It is centralized. It offers great rewards if it turns out well. It also puts us all at risk. Accelerating other AI capabilities lets us reap the rewards from our models, without net incurring much additional risk. It can even lower risk by showing us the dangers of current core models in time to prevent or modify future more powerful core models.
It can also accelerate them, if it ties AI systems more into everything and makes us dependent on them, potentially increasing the damage an AI system could do at a given capabilities or core model level.
The other danger is that increasing capabilities increases funding and demand for future more powerful core models.
In the past, I think this danger clearly overrode other considerations. If you were building stuff that mattered at scale using AI, you were making things worse, almost no matter what it was.
Now, with the new levels of existing hype and investment and attention, it is no longer obvious that this effect is important enough to dominate. In at least some places, such as AutoGPTs, I am coming around to the need to find out what damage can be done dominating the calculus.
In addition to the conceptualization, the new thing here that I hadn’t considered is the interaction of the axes and regulatory scrutiny. As Samuel points out, by default our regulatory actions will be exactly wrong, slowing down horizontal practical progress while not stopping the training of more powerful models. There are a bunch of reasonable regulatory proposals starting to develop, yet they are mostly still aimed at the wrong problem.
However, if any new model will get plugged into a bunch of existing practical systems, then releasing that model suddenly endangers all those systems. In turn, that means the regulatory state becomes interested. Could this perhaps be a big deal?
AI medical diagnostic systems seem to be severely underperforming compared to the progress one might have expected. I continue to be surprised at our lack of progress here, which we can compare to the sudden rapid advances in LLMs. Why aren’t such systems doing much better? The thread seems to think the answer is, essentially, ‘these systems do not do well out of sample and aren’t good at the messiness you see in the real world’ but the value on offer here is immense and all these problems seem eminently fixable with more work.
What Google was accomplishing was not making things worse.
This is, I suppose, the classic ‘overhang’ argument once again. That by not shipping, Google didn’t give us time to prepare.
I don’t see how this can be right. What Google mostly didn’t do was accelerate capabilities development, as evidenced by almost all efforts now going into capabilities with orders of magnitude more funding and effort and attention, whereas the actual necessary work for us all to not die lags far behind.
I suppose one can use the full ‘overhang’ argument where if everything happens later then everything happens quicker and is even worse, on the theory that the limiting factors are time-limited and can’t be accelerated? Except that is so clearly not how capitalism or innovation actually works.
I agree that Google did not ship in large part because Google is not in the habit of shipping things and has ordinary barriers stopping it from shipping. That doesn’t make it less good to not ship. Either shipping is net helpful or unhelpful, no matter the mechanisms causing it – I am deeply happy to take ‘ordinary incompetence’ as the reason harmful things don’t happen. We can all think of many such cases.
This points to my concept of Asymmetric Justice. In most people’s default method of evaluation, you get credit for good effects only if you intended them, whereas you are responsible for all bad effects. Which means that anything ‘framed as an action’ becomes net bad.
What’s strange about the AI debate is the turning this on its head, which I only now just realized fully. The ‘action’ here is no longer training large models that might kill someone and pushing capabilities, the ‘action’ here is not training large models or delaying releases, or choosing not to delay capabilities. The ‘default world’ that you are blameless for is charging full speed ahead into the AI future, so suddenly if you suggest not risking everyone dying then you have to answer for everything.
How did they pull off that trick? Can we reverse it? Can we make people see that the action that requires justification is creating new intelligent systems with capabilities that potentially exceed human ones, rather than the ‘action’ being a ‘regulation’ to prevent this from happening?
And yes, in general this anti-action bias is harmful. Still seems worth using it here.
Liron Shapira suggests:
There is AI the tech/econ trend. It is quite the important tech and econ trend, the most important in a very long time, even considered only on those terms. It is also an intelligent threat actor, a new source of intelligence and optimization pressure that is not human. That is a fundamentally different thing from all previous technologies.
I do not think this is true, while noting that if true, we are all super dead.
OpenAI’s attitude towards AI safety and AI NotKillEveryoneism is irresponsible and inadequate. Meta’s attitude towards AI safety and AI NotKillEveryoneism is that they are against them. So they do the worst possible things, like open sourcing raw models.
The Third Non-Chimpanzee
In a post primarily about linking to the ‘Eight Things To Know About LLMs,’ Alex Tabarrok lays out a ubiquitous false dichotomy unusually clearly, which is great:
In the model Alex is laying out, the AI is either
Except there’s also something in the very normal, pedestrian realm of:
Even if we ‘solve the alignment problem’ at essentially zero cost, if our understanding catches up to our capabilities, competition between humans (and also some people’s ethical considerations) will see AIs set loose with various agendas, and that will be that.
This is why I say the result of our losing control over the future is robust. Why should it be otherwise? The future already belongs to the most powerful available intelligent optimization processes. That’s going to be a highly difficult thing to change.
People Are Worried AI Might Kill Everyone
Made it to the cover for the Financial Times, as well as at least briefly being the #1 story on the website.
We now have a non-paywalled version. As Daniel Eth notes, stories about existential risk from AI have legs, people pay attention to them.
If you are reading this, I doubt the post contains anything that is new to you, beyond learning its tactical decisions and rhetorical beats.
A central focus is to notice that what people are attempting to build, or cause to exist, is better called God-Like AI (GLI?) than simple AGI (artificial general intelligence) or even ASI (artificial superintelligence). The problem is when people hear AGI, they think of a digital human, perhaps with some modest advantages, and all their ‘this is fine’ instincts get kicked into gear. If they hear ASI, there’s a lot of ‘well, how super, really’ so that’s better but it’s not getting the full job done.
The suggestions included are standard incremental things, such as calls for international coordination and democratic attention, correctly framing such actions as highly precedented, and the current situation as posing large and existential risks, if not right away then potentially soon.
Connor Leahy says he expects my #19 prediction from my AutoGPT post to happen. As a reminder, that was:
This is definitely a prediction where we very much want to notice if it comes true. Ideally, we would have a clear understanding of where the goalposts are now, and we would say in advance that if indeed the goalposts are moved in this way, it would be quite an important fire alarm.
Paul Graham has no idea how worried to be, in probabilistic terms, or what to think.
Rob Bensinger offers The Basic Reasons I Expect AGI Ruin. Here are the core points.
Eliezer explains why, if you ask an entity to learn to code, it might respond by learning general complex reasoning, the same way humans learned general complex reasoning, and it might plausibly do it much more efficiently than one can learn that trick from regular human text that doesn’t have as strong logical structure. It seems like a highly plausible hypothesis.
An implication is that perhaps one can turn a knob on ‘general reasoning ability’ that is distinct from other knobs of LLM capabilities, which would be interesting.
Oh, Elon
A few weeks ago Elon called for a pause in training more powerful AI models. Now he’s going to, once again, do the worst possible thing and create another AI lab. He sees the risk of civilizational destruction, so he’s going to do the thing most likely to lead to civilizational destruction.
Thus: Reuters: Elon Musk says he will launch rival to Microsoft-backed ChatGPT.
I will leave ‘why a maximally truth-seeking AI would not be all that aligned with human interests’ as an exercise to the reader, and also something about how truth-seeking-aligned is the new Twitter and a trick called solving for the equilibrium.
What kills me about this is that it substantially increases our probability of all dying. What symbolically kills me is how many times does this need to happen?
Except instead of standards it is teams racing to get everyone killed.
Elon Musk directly created OpenAI because he was upset with DeepMind. Then those upset with OpenAI created Anthropic. Now… ‘TruthGPT’?
Sigh.
Other People Are Not Worried AI Might Kill Everyone
For example Yann LeCun, whose statements Eliezer presents without comment.
Not only are Bryne Hobbart and Tobias Hubert not worried about AI killing everyone, they are worried that ‘safetyism has become one of the biggest existential risks to humanity’ – not developing intelligences smarter than humans would, after all, be the real risk. That’s not to say they don’t ‘bring the receipts’ on the many ways in which our society has gone completely bonkers in the name of ‘safety,’ including the short-term risks of AI, on issues like genetic engineering and nuclear power. The take here on gain of function research is confusing, I can’t tell whether they’re against it because it causes pandemics or for it because it is supposed to be preventing pandemics and being worried about it is ‘safetyism.’ As usual with such tracts, there is no engagement with what it would mean to develop intelligences, or any thought on whether the warnings do or don’t have merit, simply a cultural ‘people who say new tech is scary are bad’ take.
OpenAI Is Not Currently Training GPT-5
Sam Altman gave us some important news this week.
One could almost say they were… pausing?
They are very much not holding off on GPT-5 due to the pause letter, or due to any related safety concerns. They are holding off because they’re not ready to train GPT-5 in a way that would be useful.
Which is, indeed, exactly what I was saying in my reaction to the pause letter.
A six month pause in training now, for OpenAI won’t push back long term capabilities developments six months. It will delay them approximately zero months.
Meanwhile, no doubt, Google is training their would-be GPT4-killer, whatever it is, while Anthropic prepares to start training theirs.
Sam Altman also responded to the open letter, saying some aspects of it were good and he agrees with them, while others miss the mark. The biggest thing that missed the mark was exactly that they’re already not training GPT-5, and indeed no longer believe that Scale Is All You Need – it’s time to train smarter, he says, not larger.
His essential answer on safety is that safety is important, he’s doing lots of safety work every time they deploy anything, including stuff the people writing the letter might not even know about. The disagreement is he doesn’t see the pause tactic as useful.
That certainly could be true right now, that there is no need or meaningful and useful way to pause. In which case, perfect, let’s use that time to figure out future plans, including when in the future to call for a pause.
If size does not matter as much going forward, what does that mean? If it means progress slows on base models, then that’s excellent news. If it means that progress stays the same but shifts to algorithmic improvements, that’s good if we can keep secret those improvements, and disastrous if we can’t because it makes it much more difficult to keep things under wraps or slow them down in the future. We are kind of counting on scale to act as an enforcement enabler, here.
People Are Worried About China
I continue to not worry much about China in the context of AI. I worry far more about those saying ‘China’ as a boogeyman against anyone who would avoid putting the planet at risk. For those who say the Chinese are an enemy that cannot know danger, cannot be reasoned with, would not cooperate even its own interest, I continue to be confused as to why we think these things.
This thread and post talk about the current chip capability outlook for China. For cutting edge chips, China is in a bad spot. For trailing less cutting edge chips, they are doing reasonably well.
Then we saw this release of draft ‘generative AI’ guidelines (direct in Chinese). It is not the law of the land, it is open for comments, anything could happen. It still is our best indication of where such rules might be headed. At minimum, anything here is well within the Chinese Overton window.
So, does this look like the actions of a country unwilling to consider slowing down?
Since Chinese auto-translation is not so reliable in its details, I’m going to go with Mitchell’s description, which roughly matches the translation.
You can say that none of this represents China being concerned for existential risk, and you’d be right. You can say that the primary motivation is the ideological purity of China and any information circulating in China, and you’d be right again. I still say that this reveals a situation in which China has its own reasons to want to slow down AI, in addition to the fact that China is losing the economic competition, and that their primary concern is the wrong AI would be bad rather than them hoping for something actively good.
All of that points, once again, to an eager partner in a negotiation. While you were saying they would defect, they were getting ready to take one for the team. Who cares if you play for a different team than they do?
Bad AI NotKillEveryoneism Takes Bingo Card
Many don’t know it exists, so makes sense to share the bingo card. Use freely.
By their nature these are very short versions of all the replies. For some people, the short version will work. For others, one must treat it as only a sketch of one potential response. These aren’t all ‘the way I would respond’ at all, still a good attempt.
The Lighter Side
“Rapid progress in primate intelligence over the past few millennia…has led to growing concern about the risks humans pose to mouse wellbeing. In particular, it is not clear how to ensure that human superintelligence, if it is ever achieved, stays aligned w/ mouse values.”
There has never been a clearer case of “laugh while you can, monkey boy.”
One meme, endless fun and variations. Oh, there’s so many more.