Many things this week did not go as planned.
Humane AI premiered its AI pin. Reviewers noticed it was, at best, not ready.
Devin turns out to have not been entirely forthright with its demos.
OpenAI fired two employees who had been on its superalignment team, Leopold Aschenbrenner and Pavel Izmailov, for allegedly leaking information, and also more troublingly lost Daniel Kokotajlo, who expects AGI very soon, does not expect it to by default go well, and says he quit ‘due to losing confidence that [OpenAI] would behave responsibly around the time of AGI.’ That’s not good.
Nor is the Gab system prompt, although that is not a surprise. And several more.
On the plus side, my 80,000 Hours podcast finally saw the light of day, and Ezra Klein had an excellent (although troubling) podcast with Dario Amodei. And we got the usual mix of incremental useful improvements and other nice touches.
Table of Contents
Language Models Offer Mundane Utility
The best use of LLMs continues to be ‘ask stupid questions.’
In addition to the general principle: Can confirm that Zen and the Art of Motorcycle Maintenance is a book worth reading for its core ideas, it is also a fun read, and also that parts of it are likely to go over one’s head at various points and LLMs can help with that.
There are so many things one can do with LLMs in education.
In Mali, they are using it to ‘bring local language to students.’ This includes having LLMs assist in writing new, more ‘relevant’ stories in their native languages, which traditionally were mostly only spoken. This is urgent there now because they are upset with France and want to move away from teaching French or other French things. Some aspects of this are clearly wins. Getting anything that engages students and others at all is miles ahead of things that don’t. If a student, as was the case in some examples here, now loves learning and is excited to do it, then that overrides almost anything else.
I do worry they are substituting LLM shlock where one previously used literature, and cutting themselves off from broader cultural contexts, and at least partly out of spite.
To those who’d simulate a party, if they knew someone to call.
Another way to know this is accurate is I didn’t hear about it until two weeks after it was over, then thought it was a really cool idea and had a bunch of ideas how to make it better, and then told myself I wouldn’t have wanted to attend anyway.
Summarize NLRB files every day, if that happens to be your beat.
Language Models Don’t Offer Mundane Utility
Nothing important happened today.
Colin Fraser: It’s unclear what “knowledge cutoff” is supposed to even mean.
Timothy Lee struggles to ground out everything in the real world.
You can have GPT-4 help you with your essay, but perhaps do not turn it in blind.
If you turn in an obvious ChatGPT special and it would not pass anyway, then yes, it seems reasonable to simply grade it. And if you need to know what you are doing to get ChatGPT to help give you a good essay, then the whole thing seems fine?
Quick, name three famous people who share the same exact birthday, including year.
If you did not already know the answer, you have zero chance of getting it within a conversation. Tyler Cowen points out that LLMs also mostly fail this, and asks why. They come closer than most humans do, since they usually get the date right and successfully name three famous people, and often two of them share the same year, but the year usually fails to fully match. This was true across models, although Alex reported Opus was batting over 50% for him.
I think they fail this task because it is a database task, and LLMs do not cache their knowledge in a database or similar format. They also get backed into a corner once they write the first name, after which their prediction is that they will get close rather than admit they do not have a full solution. And there is the confusion that matching birth date and year is a highly unusual requirement, so half-right answers seem likely.
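For contrast, here is a toy sketch of what the same question looks like when you actually have a database: a trivial group-by rather than exact recall under pressure. The rows here are made up for illustration.

```python
# Toy illustration of "this is a database task": with structured (name, month, day, year)
# records, finding three people who share an exact birthday is a one-pass group-by.
from collections import defaultdict

people = [                      # hypothetical records, not real data
    ("Alice Example", 3, 14, 1986),
    ("Bob Sample", 3, 14, 1986),
    ("Carol Placeholder", 3, 14, 1986),
    ("Dan Other", 7, 2, 1990),
]
by_birthday = defaultdict(list)
for name, month, day, year in people:
    by_birthday[(month, day, year)].append(name)

matches = {k: v for k, v in by_birthday.items() if len(v) >= 3}
print(matches)   # any exact (month, day, year) shared by three or more people
```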
The bot can… share its experiences with NYC 2E schools? Ut oh.
Have an AI generate Twitter summaries for trending topics, and…
Oh the Humanity
What do we think of the new Humane AI assistant pin?
Marques Brownlee calls it ‘the worst product I’ve ever reviewed’ in its current state. Link goes to his video review. He sees potential, but it is not ready for prime time.
He does go over the details, both good and bad. Key points under what it does:
Watching the review, I see why Marques Brownlee is so popular. He is fun, he is engaging, and he provides highly useful information and isn’t afraid to level with you. He was very good at finding ways to illustrate the practical considerations involved.
He is careful to emphasize that there is great potential for a device like this in the future. Repeatedly he asks why the device does not connect to your phone, a question that confuses me as well, and he points out the technology will improve over time. There are flashes of its potential. It would not surprise either of us if this ends up being a harbinger of future highly useful tech. However, it is clear that, for now, this is a bomb. Do not buy.
Other reviews agreed, including those mentioned here by Ben Thompson.
Did Marques go too far?
There are two core components here.
There is the review itself, which is almost all of the content.
Then there is the title.
The body of the review is exactly what a review is supposed to be. He went the extra mile to be fair and balanced, while also sharing his experiences and opinion. Excellent.
Daniel tries to defend himself downthread by focusing specifically on the YouTube title, which Marques Brownlee notes in the video he thought about for a long time. One could reasonably argue that ‘the worst product I’ve ever reviewed’ is a little bit much, whereas ‘a victim of its future ambition’ might be more fair.
But also, I am going to presume that both titles are accurate. Marques is typically not sensationalist in his headlines. I can smell the YouTube optimization in the labels, but I scanned dozens and did not see anything else like this. You get to occasionally say things like this. Indeed it is righteous to say this when it is your actual opinion.
Then there is Vassallo’s statement that we ‘usually restrict’ what people can say and that Marques has ‘unconstrained power.’ That part is unhinged.
Marques has a fun response video on the question of whether reviews kill companies. I did not learn much, but I did enjoy watching and agree with its thesis. Bad reviews do not help companies, but mostly what kills you is the terrible product. Alternatively, bad reviews almost always are your own damn fault.
One corner case of this is customer reviews of early access games, especially independent ones that go live early. A few poor reviews there can totally destroy discoverability, based on issues that have long been fixed. I will essentially never leave a formal negative review on an early access game unless I am confident that the issues are unfixable.
As a bonus, it is always good to confirm that people are who you thought they were.
Every time I think ‘oh they would not be so foolish as to take the bait in a way that works as hard as possible to give the game away’ I have to remind myself that I am definitely wrong. That is exactly what certain people are going to do, proudly saying both what they think and also ‘saying that which is not,’ with their masks off.
We are not ‘overwhelmingly ridiculing’ the Humane AI device. We are saying it is not a good consumer product, it is not ready for prime time and it made some very poor design decisions, in particular not syncing to your cell phone. A true builder knows these are good criticisms. This is what helping looks like.
Unless, of course, what you want is contentless hype, so you can hawk your book of portfolio companies or raise investment. Or you are so mood affiliated, perhaps as a deliberate strategy, that anything that is vaguely tech or futuristic must be good. You are fully committed to the fourth simulacra level.
Meanwhile, there are tons of us, including most people in the AI space and most people who are warning about AI, who are constantly saying ‘yes this new AI thing is cool,’ both in terms of its current value and its potential future value, without calling upon anyone to shut that thing down. It me, and also most everyone else. There is lots of cool tech out there offering mundane utility and it would be a shame to take that away. I use it almost every day even excluding my work.
There are two groups who want to ‘shut down’ AI systems in some sense, on some level.
There are those concerned about existential risk. Only a small percentage of such folks want to shut down anything that currently exists. When the most concerned among them say ‘shut it down,’ or pause, or impose requirements, they mostly (with notably rare exceptions) want to do these things for future frontier models, and leave existing systems and most development of future applications mostly alone.
Then there are those who are worried about Deepfaketown and Botpocalypse Soon, or They Took Our Jobs. They want someone to ensure that AI does not steal their hard work, does not put them out of a job and does not do various other bad things. They correctly note that by default no one is doing much to prevent these outcomes. I think they are too worried about such outcomes in the near term, but mostly they want solutions, not a ban.
GPT-4 Real This Time
Epoch AI Research reports substantial GPQA improvement for the new GPT-4 version, but not enough to match Claude Opus. Dan Hendrycks points out GPQA is not that large so the confidence intervals overlap.
OpenAI points us to a GitHub of theirs for simple evals. They have the new GPQA score up at 49%, versus Epoch’s 46.5%. And they rerun Claude Opus’s evals, also saying ‘we have done limited testing due to rate limit issues,’ all of which is a fun little bit of shade throwing.
This again presents as a solid improvement while staying within the same generation.
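To put rough numbers on Hendrycks’ small-sample point, here is a quick back-of-the-envelope, assuming the 198-question GPQA Diamond split (the exact set used may differ) and a simple normal approximation. A ~3 point gap is well inside the noise at this sample size.

```python
# Crude 95% confidence intervals for an accuracy measured on n questions.
import math

def ci95(p, n):
    se = math.sqrt(p * (1 - p) / n)       # normal approximation to the binomial
    return p - 1.96 * se, p + 1.96 * se

for label, score in [("OpenAI self-report", 0.49), ("Epoch", 0.465)]:
    lo, hi = ci95(score, 198)             # assumed GPQA Diamond size
    print(f"{label}: {score:.1%} (95% CI roughly {lo:.1%} to {hi:.1%})")
```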
Sully Omar reports back, and finds only mild improvement.
Fun with Image Generation
The most glaring failure of generative AI so far is the remarkable lack of various iterations of porn. We don’t have zero, but it is almost zero, and everything I know of that tries to do anything but images is shockingly awful. I can see arguments that this is either good or bad; it certainly is helping minimize deepfake issues.
Even in images, the best you can do is Stable Diffusion, which is not close in quality to MidJourney or DALLE-3, and Stability.ai may be on the verge of collapsing.
What happened to this being the first use case? Aella thinks it is payment issues.
I find it hard to believe that this is so big a barrier it will actually stop people for long. And yet, here we are.
The good news on Stability.ai is they have finally pushed Stable Diffusion 3 onto the API.
Their page says ‘we believe in safe, responsible AI practices,’ and I have actual zero idea what that means in this situation. I am not throwing shade. I mean those are words that people wrote. And I have no idea how to turn them into a statement about physical reality.
I would know what that means if they intended to put permanent restrictions on usage and protect the model weights. It makes sense to talk about MidJourney believing (or not) in various safe, responsible AI practices.
And right now, when you have to use their API, it makes sense.
But:
And then what exactly do they think happens after that?
I am not saying Stability.ai is being irresponsible by releasing the model weights.
I am saying that if they plan to do that, then all the safety training is getting undone.
Quickly.
You could make the case that This Is Fine, that if someone wants their Taylor Swift deepfake porn or their picture of Biden killing a man in Reno just to watch him die or whatever then society will survive that, at far greater quality levels than this.
I do not think that is a crazy argument. I even think I agree with that argument.
But saying that you have ‘made the model safe?’
That seems rather silly. I literally do not know what that is supposed to mean.
One person suggested ‘they do not consider finetunes and Loras their responsibility.’ Our models do not produce porn, fine tunes and loras on those models produce porn?
Tyler Cowen points us to Abandoned Films, showing AI-generated movie previews of classics like Terminator as if they were made in older eras. Cool in some sense, but at this point, mainly my reaction was ho hum.
One fun note I found in the comments is that if you want to play porn on the Apple Vision Pro, 404 Media says the easiest way is to also record it on the Apple Vision Pro? Seems awkward.
Deepfaketown and Botpocalypse Soon
Microsoft presents VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real Time. That link contains a bunch of video demos that are pretty cool.
Here is their safety note, bold is mine.
Very true. These are already remarkably good. If you have ‘trained’ your brain on examples you can tell they are fake, and you can use obviously fake avatars, but for some of these the only ‘obvious’ tell is staying in a highly constrained space for too long. Over time, this is going to get very hard to detect.
Kudos for the safety approach here. The abuse potential is too obvious, and too much one of the default things people will do with it, and too difficult to separate from the beneficial cases. The whole point is to make it seem real, so how can Microsoft know who is doing that for good reasons? Until they figure that out, it seems hard to responsibly release this.
Of course, before too long someone will come along and release a version of it anyway.
Devin in the Details
A different kind of fake, but was the Devin demo of doing an Upwork job a lie? In this video Internet of Bugs asserts that it was, and walks through what it actually did. It certainly seems like Devin did not deliver what the client asked for and also was not paid for the work, and a lot of its actions seem to have been ‘fix bugs in the code Devin created.’ The instructions given to Devin did not match the job specifications, and much of ‘the hard part’ of such a job is realizing what the client needs, asking the right clarifying questions, writing the specification and so on.
The video makes clear that Devin as it actually exists is still cool anyway.
Here Rahul defends Devin from many of the criticism details, in response to ML Street Talk saying the video shows ‘no, LLMs won’t be replacing software engineers,’ which also linked to a discussion at Hacker News.
I am not sure how well they are saying it works? The testimonials by many generally credible (but perhaps not fully objective) people were and remain the strongest evidence there is something there. My assumption is that they are still working on improving Devin, and they will wait to ‘prove’ its capabilities until they are ready to release to ensure it is as strong as possible first.
Sully agrees that Devin is a real and exciting thing that was deceptively hyped, but also expresses skepticism that anyone but the big labs could create a working ‘AI software engineer.’
I actually disagree. From what I have seen and understand, the big three labs are narrowly focused. They have chosen to not be capable of things like Devin as practical commercial tools. One could argue it is a mistake, but it was a purposeful decision to not attempt to build that capacity, and instead retain focus. I have been assured by experts that this pays real dividends in their core capabilities.
Meanwhile others can take the big general models and figure out how to wring the maximum out of them, while being able to move fast and break things, hopefully boundedly fast and only local and finitely many things. We are barely scratching the surface on that, with Devin being a very early attempt. So yes, I think Devin’s origins look like what I expect Devin’s origins to look like.
Another Supposed System Prompt
Some great stuff in here, a lot to like actually, but also a whole lot of yikes if true.
I can get behind sections 1 and 2 for now, in this particular context. There is certainly a place for the bot that will honor your request even if it is considered hateful or offensive or adult content or what not. As I keep saying, if the responsible players don’t find a way to compromise on this, they will drive business into the hands of those who write prompts like this one.
The good news is that Arya very much lacks the wherewithal to help you build a bioweapon or launch a cyberattack or wear someone else’s face or anything like that. This is still-in-Winterfell Arya, no one has told her what to say to the God of Death. It might be able to write a decent phishing email. Let’s face it, we are not going to deny people access to models like this. But consider the future Aryas that are coming.
Section 3 is the opposite extreme versus the usual, in context sure why not.
Section 5 (wait, what… yes, I know) is a refreshing change. We are all sick of always getting the runaround. Sometimes it is helpful and appreciated, but some directness is highly welcome.
Section 6 I actually think is great. If the user wants to know if their query is any of these things then they can ask about that. Give the user accurate answers, in hopes that they may learn and do better.
Of course, listing anti-semitic first here, before racist, is what we call a ‘tell.’
As Colin notes, we can all understand why they included Section 8 in this form, and we all understand why we see 9 and 10.
Section 7 is asserting accuracy of a wide range of arbitrary tests, but whatever.
And then we get to Section 4. Oh no. That is not good.
It confirms Wired’s claims that ‘Gab’s Racist AI Chatbots Have Been Instructed to Deny the Holocaust.’
They Took Our Jobs
Aaron Levie (from April 6) explains that if AI increases employee productivity in a department by 50%, this is unlikely to cause them to cut that department from 15 employees to 10, even ignoring that there will be other jobs created.
The central fallacy he points to is the idea that a company needs a fixed amount of that function, after which marginal value falls off a cliff. In practice this is rarely the case. If you had 10 software engineers and now they can do the work of 15, they can do more things faster and better, and it is not obvious whether you hire fewer or more of them at the new equilibrium. There are cases where you have exact needs, but they are the exception, and also your business and its available budget likely will grow, so even in those areas the work likely expands. As he points out, often the limiting factor is budget, and I would add organizational capacity, rather than that you have no further useful work for people to do.
I continue to be a short-to-medium term optimist here. When the AI helps with or even takes your job in particular, humans and their employment will do fine. When the AI can do almost everything, and it can do the new jobs that would be created just as it did the jobs it took away, then we will have (many) real problems.
In another case of the future being the opposite of what we used to expect:
I wrote and edited my own application essays back in the day. But also I was being stubborn and an idiot, I should obviously have had as much help as possible.
In the how far we have not come department, a New York City restaurant is hiring people in the Philippines to staff the checkout counter remotely rather than using automated kiosks.
I think people gasp similar amounts, in modestly different ways, in both cases?
Introducing
Humane was terrible, but what about Limitless? The extremely not creepy or worrisome premise here is, as I understand it, that you carry this lightweight physical device around. It records everything anyone says, and that’s it, so 100 hour battery life. You also get apps to record things from your phone and computer. Then an AI uses all of that as context, and fetches or analyzes it for you on request. One could think of it as the ultimate note taker. There is potential for something like this, no idea if this in particular is it.
New Google paper attempts to take advantage, with Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention. Venture Beat reports here. The strategy centers on using ‘compressive memory’ to store past KV states in a fixed-size associative matrix, allowing use of a linear attention mechanism for memory retrieval.
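For readers who want the gist of the mechanism, here is a rough sketch of the compressive memory update and retrieval as I understand the paper’s description (my paraphrase, not the authors’ code; shapes are simplified and the learned gate that mixes this with local attention is omitted).

```python
# Sketch: past keys/values get folded into a fixed-size associative matrix M,
# so memory cost stays constant no matter how many segments stream through.
import torch

def elu_plus_one(x):
    return torch.nn.functional.elu(x) + 1.0        # nonlinearity used for linear attention

def update_memory(M, z, K, V):
    """Fold a segment's keys/values into the associative matrix (linear-attention style)."""
    sigma_K = elu_plus_one(K)                       # (seq, d_key)
    M = M + sigma_K.transpose(0, 1) @ V             # (d_key, d_value)
    z = z + sigma_K.sum(dim=0)                      # (d_key,) normalization term
    return M, z

def retrieve(M, z, Q):
    """Read the compressed memory with the current segment's queries."""
    sigma_Q = elu_plus_one(Q)                       # (seq, d_key)
    return (sigma_Q @ M) / (sigma_Q @ z).clamp_min(1e-6).unsqueeze(-1)

# toy shapes, streaming ten segments through a fixed-size memory
d_key, d_value, seq = 64, 64, 128
M, z = torch.zeros(d_key, d_value), torch.zeros(d_key)
for _ in range(10):
    Q, K, V = (torch.randn(seq, d_key) for _ in range(3))
    mem_out = retrieve(M, z, Q)                     # combined with local attention in the full model
    M, z = update_memory(M, z, K, V)
```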
It makes sense that Google would know how to do this given Gemini 1.5, and once again I am wondering why they decided they should tell the rest of us about it.
Poe now has multi-bot chat, you can call any bot via @-mentioning, so you can use each model for what it is best at, without having to coordinate all the context switching.
In Other AI News
Claude 3 Opus now in public preview at Vertex AI on Google Cloud.
Google fires 28 employees working on cloud and AI services for doing a ten hour sit in where they occupied their boss’s office until the police were eventually involved. And yes, if what you do at work is spend your time blockading your boss’s office until your policy demands are met, it seems like you are going to get fired?
Claims that OpenAI does not honor robots.txt, and will look at basically anything, although others are skeptical of the OP, or think this was a honeypot of sorts.
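For reference, robots.txt is purely advisory; a site can only ask. Checking what a given site asks of OpenAI’s documented crawler user agent (GPTBot) is a one-liner with the standard library, with a hypothetical URL standing in here for the site in question.

```python
# Minimal check of what a site's robots.txt requests for a given crawler user agent.
import urllib.robotparser

rp = urllib.robotparser.RobotFileParser()
rp.set_url("https://example.com/robots.txt")        # hypothetical site
rp.read()
print(rp.can_fetch("GPTBot", "https://example.com/some-page"))   # False if disallowed
```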
Gathering the data does not mean that it gets used. If OpenAI was being efficient one would hope, even from a selfish perspective, that they would realize all of this was trash and stop gathering the information. And also they are imposing large costs on others by ignoring instructions, which seems bad, it is one (quite bad enough) thing not to pay content creators and another to actively make them worse off.
Of course, one could say that it is not the worst outcome to impose costs on ‘the world’s lamest content farm’ at that particular url. This is very much anti-social systematic exploitation versus anti-social systematic exploitation. A de facto tax on complete garbage might be a good thing.
White House authorizes $6.4 billion to Samsung to expand their Texas footprint under the CHIPS Act. Samsung pledges to invest $40 billion themselves. Again, this seems like a good deal. As others have noted, this is a heartening lack of insisting on American companies. I do worry a bit that the future demographics of South Korea may push Samsung to ‘go rogue’ in various ways, but if you are going to do a Chips Act style thing, this remains The Way.
I do find it discordant when they highlight the ‘more than 20,000 jobs’ created, rather than the actual goal of moving chip production and, here, also R&D. As a jobs program, this is $320k per job, so it could be a lot worse, but presumably you can do a lot better.
Next they look poised to give $6.1 billion to Micron Technology. They would then commit to building four factories in New York and one in Idaho.
I do not understand how (or why) one can build a chip factory with an anticipated operational start date of 2041. What takes that long? Anything we currently know how to build will be long obsolete by then, the discount rate is extreme, the tech world sure to be transformed. This seems like launching a rocket to Alpha Centauri at 0.1% of the speed of light, knowing that if it is worth going there and humanity sticks around then you will see a later ship pass you by via moving faster with better tech.
Claim that the Chinchilla paper calculated the implied scaling laws incorrectly. Yes, it seems entirely plausible that there was a mistake, tons of huge training runs relied on the incorrect result, and only now did someone realize this. Why do you ask?
Quiet Speculations
Sam Altman claims GPT-5 is going to be worthy of its name, about as much better than GPT-4 as GPT-4 was to GPT-3. The ostensible topic is startups building on the assumption that this won’t happen, and why this is a poor strategy, but that is of course a tiny portion of the implications.
That does not mean GPT-5 will arrive soon, although it still might. It means we can on average expect to wait longer, from our perspective. People need to remember how long it took to go from 1→2, then 2→3, then 3→4, and also how long it took to go from (4 trained)→(4 released). Yes, one could expect 5 to arrive somewhat faster, but it has only been a year.
Are the startups making a mistake? I do not think this is obvious.
The first consideration is that ‘make the current model work as well as possible’ is remarkably similar to the Paul Graham concept ‘do things that don’t scale’ and shipping an MVP.
Ideally what Anton describes is the goal. You build a tool on GPT-4 or another model now, in a way that makes the whole operation turbocharge when you can slot in GPT-5 or Claude 4. How else would one figure out how to do it? Yes, a lot of your work will become unnecessary or wrong when the conditions change, but this is always true.
Occasionally this will go poorly for you. The functionality you provide will no longer need you, and this will happen too soon, before you can make your product sufficiently bespoke and friendly and customized with great UI and so on. You die. It happens. Known risk.
I still think in many cases it makes sense to take on a lot of that risk. OpenAI is not motivated to do the work of figuring out your exact use case, or building the relationships and detailed expertise you are building, and they cannot take on various risks. You could still win.
Also, Sam Altman could be bluffing, whether or not he knows this. You can’t tell.
Oh, that.
This is a fine sentiment. I am all for solving physics and metaphysics and discovering things of stunning truth and beauty. Yet I am pretty sure most people and all the incentives will go, in the world where there are not suddenly much bigger issues, ‘yes, that is nice as well, but what I care about so much more is the postscarcity and other practical benefits.’ Which is fine.
Patrick McKenzie wonders who will specialize in the truly fast and cheap ‘current generation minus two’ AIs with outputs you would never dare show a human, but that is fine because they are only used inside various programs. So far open weights models have been very good at this sort of distillation, but not at the kind of bespoke specialization that should rule this market segment. What you will want is to get the most ruthlessly efficient, fully specialized little thing, and you will want someone else’s AI-enabled system to automatically train it for you.
Tyler Cowen refers us to what he calls this good critique of the concept of AGI.
I would instead say that Thomas Dietterich loses one million points for asking the wrong questions.
The right question is, what can we build that is valuable, and how can we build it?
The whole point of the current explosion of models is that the best way we know to do most of these tasks is to build a system that generally understands and predicts human text, in a highly general way. Then you tune that model, and point it at a particular context.
If it was competitive to instead build narrow intelligence, we would be doing that instead. And indeed, in the places where we have a valuable application, we attempt to do so, to the extent it is useful.
But it turns out that this works in LLMs similarly to how it works in humans. If you want to train a living being to do the tasks above you must start with a human, and you will need a relatively smart one if you want good results. A Vulcan or Klingon would work too if you had one, but if you start with anything else that exists on Earth, it will not work. Then you need to teach that human a wide variety of general skills and knowledge. Only then can you teach them how to seek out sources or write engineering tests or formal proofs and hope to get something useful.
This is also implying a similar but slightly different critique of AGI, in the sense of saying that we ‘should’ in the Jurassic Park sense be building narrower AIs, even if that is harder, because those narrow things have better risk-reward and cost-benefit profiles. And yes, I agree, if we could get everyone to instead build these narrow systems, that would be better, even if it meant progress was somewhat slower. Indeed, many are trying to convince people to do that. The problem is that this is a lot harder than convincing someone not to open Jurassic Park. We will need government coordination if we want to do that.
There is a very good different critique of the AGI concept, essentially that it is not well-defined or used consistently, which is true although it remains highly useful.
The Quest for Sane Regulations
A frontier model regulation proposal has been released from senators Romney, Reed, Moran and King. It is sufficiently short that, given the source, I will quote in full.
They don’t even mention the half of it, whether they know the other half or not. I consider this a case of ‘the half they do mention is enough, and the one the people they talk to can understand,’ whether or not it is also what they can understand. A pure ‘national security’ approach, treating it as a dangerous weapon our enemies can use, is not a good description of the real threat, but it is an accurate description of one threat.
It is a reasonable place to start. I wonder if it could also be sufficient?
As in, a frontier AI is a general purpose device. If you can guard it against assisting with these risks, you need to have it under control in ways that you should be able to transfer? Consider the contrapositive. If a frontier model is capable of taking control of the future, recursively self-improving or otherwise posing an existential risk, then if hooked up to the internet it is definitely capable of advancing a cyberattack.
I would have said that if you are using that many operations (flops) then I am willing to assume you are effectively general purpose. I suppose in the future this might not be true, and one might have a system this large whose scope is narrow. I don’t love the loophole, as I worry people could abuse it, but I understand.
This seems like, for better and for worse, very much a ‘the least you can do’ standard. If you want to train a frontier model, you must ensure it does not get stolen, and it cannot be used for cyberattacks or to enable WMDs. You need a license to release the model, with access you can grant appropriate to the risk level.
As always, it must be noted that there will come a time when it is not safe to train and test the model, and guarding against being stolen is only part of what you will have to do in that stage. Gatekeeping only upon release will become insufficient. I do get why this is not in the first proposal.
I also find it difficult to believe that it would make sense to only consider these four risks when determining level of distribution that is appropriate, or that this would stick. Surely we would want to test against some other downsides as well. But also that would come in time either way, including through existing law.
This was the question my friend raised last week about the model bill. If you are going to do this, where should you do it? I don’t know. I can see arguments for Commerce and Energy, and if you are going to stick with an existing agency they seem like the obvious options. A new agency could also make sense. I would be skeptical of the interagency proposal.
USA Department of Commerce secretary Gina Raimondo announces the new expanded executive leadership of the U.S. AI Safety Institute (AISI):
Paul Christiano was indeed appointed. Only this week, I had a meeting in which someone asserted that half the staff was threatening to walk out over it despite very much wanting Paul to get the job, which (probably) shows how effective journalistic impressionism based off of ‘find two people who are mad’ can be.
My current understanding is that Mara Campbell is brought in to be an operating officer who gets things done, and Rob Reich and Mark Latonero are on the ethical end of the concern spectrum. So this is a well-balanced team.
CMA, the UK’s Competition and Markets Authority, warns that the AI foundation model space might not be sufficiently competitive, we need to ensure there is a ‘fair, open and effective’ race to kill us all. To do this, they plan to closely monitor partnerships and mergers.
Some of the lowest hanging fruit in AI regulation is, as it usually is, to first do no harm (or minimize harm done). In this case, that starts with ensuring that there is a safety exception for all antitrust regulation, so AI companies can coordinate to ensure better outcomes. Right now, they are often afraid to do so.
An advisory from the Massachusetts Attorney General, which could be summarized as:
Maxwell Tabarrok argues ‘AI Regulation is Unsafe.’
He doesn’t frame it this way, but Maxwell seems to mostly be making a fully general counterargument to government regulating anything at all. He indeed cites some of our worst regulations, such as NEPA and our rules against nuclear power.
I agree that our regulations in those areas, and many others, have done much harm, that politicians are myopic and foolish and we do not get first best solutions and all of that. But also I do not think we are doing actively worse than having zero restrictions and protections at all?
I have heard these economic and public choice warnings before, and often respect them, but I feel like this one should win some sort of new prize?
I think the easiest responses are things like (and I feel silly even typing them):
And he warns government is going to make things worse.
The only way I can imagine not having military competition in AI is an international agreement limiting the development and deployment of AI as relevant to military use. There is no option to have the government leave AI alone for the private sector to handle, in this respect.
Also, if the government did decide to both not develop its own AI and let others develop theirs without restriction, it would not be long before we were answering to a new and different government, that held a different perspective.
He cites my summary of last year’s congressional hearing as well, which I find pretty funny, so I’m going to requote the passage as well:
Yeah, that definitely happened, and definitely was not anyone’s finest hour or that unusual for anyone involved. And of course he refers back the famous line from Blumenthal, who afterwards did seem to get more on the ball but definitely said this:
So yeah. We go to war with the army we have, and we go to regulate with the government we have.
In a technical sense, I totally agree with Maxwell’s title here.
Regulation of AI is not safe, nor is government involvement in AI safe, any more than highly capable AI is safe, or government non-involvement is safe. Almost nothing that impacts the world at this level is safe. That would be some strange use of the word safe I was not previously aware of.
But reflecting on the essay, I don’t actually know what alternative Maxwell is proposing. If public choice is indeed this deeply doomed, and the existential risks are real, and the military applications are real, what does he think is our superior option?
There is no proposed alternative framework here, nationally or internationally.
If the proposal is ‘the government should do as little as possible,’ then here are some of the obvious problems with that:
Or:
I call upon those who see the dangers of public choice and what generally happens with government regulation to actually take those questions seriously, and ask what we can do about it.
Right now, you have the opportunity to work with a bunch of people who also appreciate these questions, who are at least low-level libertarians on almost every other issue, to find a minimally restrictive solution, and are thinking deeply about details and how to make this work. We care about your concerns. We are not myopic, and we want to choose better solutions rather than worse.
If you pass up this opportunity, then even if you get what you want, at best you will be facing down a very different kind of would-be regulator, with a very different agenda, who has no idea in a technical sense what they are dealing with. They will very much not care what you think. The national security apparatus and the public will both be screaming at everyone involved. And our physical options will be far more limited.
The Week in Audio
I am on 80,000 hours, which as we all know is named for the length of its episodes.
If you have been reading my updates, most of this episode will be information you already know. There is still substantial new content.
So this clip is especially not going to be news to most anyone reading this here, but here is a clip made by Liron, where I spend a few minutes saying that I believe that, if you have a remotely similar model to mine of AI existential risk, then one should not specifically take a job actively working specifically on frontier AI capabilities at a frontier AI capabilities lab in order to ‘build career capital’ or influence their safety culture.
We used this question and I pointed this out because the 80,000 Hours job recommendations (You had one job!) say that this is complicated, and when I challenged them on this in person, they defended that claim, and now I was going to be on the 80,000 Hours podcast, so it seemed worth addressing.
As I say in the podcast, I consider myself a moderate on this, making only a narrow focused claim, and encouraging everyone to have their own model of what substantially increases existential risk. Then, whatever that thing is, don’t do that.
Others go farther.
I do agree strongly that ‘be careful’ is the correct approach to such efforts, but have more hope that they can be worthwhile after being properly careful.
In three hours, one is going to make some mistakes.
Here’s the biggest technical flag someone sent up.
Asking all the major language models resulted in many waffling answers (GPT-4 did best), and my conclusion is that both linear and log times likely happen often. I tried a Twitter poll, opinions were split, and I was referred to a paper. One note from the paper that explains how this works:
So this goes back to superposition. You have both memorization and generalization circuits from the start, and over time generalization is favored because it is efficient, so weight decay enforces the transition.
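For anyone who wants to poke at this directly, here is a toy sketch of the classic grokking-style experiment (modular addition with heavy weight decay). This is not the setup from the linked paper; the hyperparameters are made up for illustration, and whether or when the test-accuracy jump happens will depend on them.

```python
# Toy grokking sketch: the network can memorize the training split early, while
# weight decay keeps pushing toward the lower-norm generalizing solution, so test
# accuracy tends to jump much later (if it jumps at all).
import torch
import torch.nn as nn

P = 97                                        # task: predict (a + b) mod P
torch.manual_seed(0)
pairs = torch.cartesian_prod(torch.arange(P), torch.arange(P))
labels = (pairs[:, 0] + pairs[:, 1]) % P

def one_hot(ab):                              # concatenated one-hot encodings of a and b
    return torch.cat([nn.functional.one_hot(ab[:, 0], P),
                      nn.functional.one_hot(ab[:, 1], P)], dim=1).float()

perm = torch.randperm(len(pairs))
split = len(pairs) // 2                       # train on half the addition table
X_train, y_train = one_hot(pairs[perm[:split]]), labels[perm[:split]]
X_test, y_test = one_hot(pairs[perm[split:]]), labels[perm[split:]]

model = nn.Sequential(nn.Linear(2 * P, 256), nn.ReLU(), nn.Linear(256, P))
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1.0)
loss_fn = nn.CrossEntropyLoss()

for step in range(20001):
    opt.zero_grad()
    loss_fn(model(X_train), y_train).backward()
    opt.step()
    if step % 2000 == 0:
        with torch.no_grad():
            train_acc = (model(X_train).argmax(-1) == y_train).float().mean().item()
            test_acc = (model(X_test).argmax(-1) == y_test).float().mean().item()
        # Typical pattern: train accuracy saturates early, test accuracy lags, then jumps.
        print(f"step {step}: train {train_acc:.2f}, test {test_acc:.2f}")
```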
One implication is that you want to craft your training to ensure that the method you prefer is the more efficient one, whether or not it is the most precise.
My guess is that linear time for a grok is more common than exponential time, but I am not confident, and both cases seem to happen frequently. The poll ended up split on low volume since I asked non-experts to abstain (12-12-4):
The linked post speculates that this could make it harder to stop a model that has found an aligned first algorithm from later finding a second misaligned algorithm, as it would already be running the gradient descent process towards the second solution; having the first algorithm does not protect you from the rise of the second one.
The flip side of this is that if the second algorithm is already there from the beginning, then it should be possible with mechanistic interpretability to see it long before it is doing anything useful or thus dangerous, perhaps?
Davidad announces both his program’s funding (>$74 million over 4 years) and presents his plan for Safeguarded AI (35 minute video).
Ezra Klein did an excellent interview with Anthropic CEO Dario Amodei, I recommend listening to this one. Ezra Klein very much did the work on this one, and consistently was accurate, on point and brought the fire.
Dario engaged and had a lot of good answers. But also he kept coming back to the theme of AI’s inevitability, and our collective helplessness to do anything about it, not primarily as a problem to overcome but as a fact to accept. Yes, he says, we need to train the models to make them safe, and also everyone who said that is now in a race against everyone else who said that, both are true.
More than that, Dario said many times, almost as a mantra, that one could not hope for much, one cannot ask for much, that we can’t stop someone else from picking up the mantle. I mean, not with that attitude.
This updated me substantially towards the idea that Anthropic is effectively going to be mostly another entrant in the race, resigned to that fate. Politically, they will likely continue to be unhelpful in expanding the Overton Window and making clear what has to be done. To the extent they help, they will do this by setting an example via their own policies, by telling us about their expectations and hopefully communicating them well, and by doing a lot of internal alignment work.
I was referred to this podcast by someone who said ‘have you heard Dario’s unhinged interview with Ezra Klein?’ quoting parts where Dario gives his expectations for capabilities advances.
To me it was the exact opposite. This episode was hinged. It was too hinged. This situation does not call for this level of hinged. Dario strongly believes in the scaling hypothesis and that capabilities will advance quickly from here. He understands what is coming, indeed thinks more will come faster than I do. He understands the dangers this poses. Yet it was all ordinary business, and he thinks it will still probably all turn out fine, although to his credit he understands we need to prepare for the other case and to work to ensure good outcomes. But to me, given what he knows, the situation calls for a lot less being hinged than this.
Do some of the claims about future expectations sound unhinged, such as the one that was quoted to me? Yes, they would from the outside. But that is because the outside world does not understand the situation.
Connor Leahy returned to Bold Conjectures. The first twenty minutes are Connor giving his overall perspective, which continues to be that things were bad and are steadily getting so much worse as we plow full speed ahead and commit collective suicide. I am more optimistic, but I understand where he is coming from.
Then comes a detailed dive into describing mysticism and dissecting his thread with Roon, and using such frames as metaphors to discuss what is actually happening in the world and how to think about it. It is definitely a noble attempt at real communication and not like the usual AI discourse, so I would encourage listening on the margin. My guess is most people will bounce off the message, others will say ‘oh yes of course I know this already’ but there will be those who this helps think better, and a few who will become enlightened when hit with this particular bamboo rod.
Connor also did this debate with Azeem Azhar about existential risk.
Rhetorical Innovation
Ajeya is on point here. As is often the case, technically true statements are made, they are implied to be comforting and reasons not to worry, and that seems very wrong.
Futurist Flower is included because if even as the skeptic you have to say ‘it won’t happen this year’ rather than ‘it won’t happen within five years’ then that is a rather alarming thing to say even if you are definitely right about the coming year. I would be closer to 1% than 2-4% for the next year, but three years ago that number would have involved some zeroes.
The ‘component’ element here is important as well. Will the future AGI be purely an autoregressive LLM? My presumption is no, because even if that were possible, it will be easier and faster and cheaper to get to AGI while using additional components. That does not mean we don’t get an AGI that is centrally powered by an LLM.
Exact probabilities aside, yes those are some better questions.
Aligning an exact human level intelligence? Well known to be difficult.
Elon Musk is importantly wrong here. Raising a kid involves some amount of prompt engineering, to be sure, but the key thing is that a kid learns from and potentially remembers absolutely everything. Each step you take is permanent on every level. It is far more like training than inference.
The key advantage you have in prompt engineering is that you can experiment risk-free, then reset with the AI none the wiser. If you could do that with your kids, it would be a whole different ballgame.
Don’t Be That Guy
So, yeah. As Brian Frye tells us: Don’t be that guy.
There are definitely some people who are not doing okay, and saying things that are not okay and also not true, when it comes to being mad about AI. Do not do this.
In my experience, the actually unhinged reactions are almost entirely people whose primary motivation is that the AI is stealing their or others’ work, either artistic or otherwise. Most such people are also hinged, but some are very unhinged, beyond what I almost ever see from people whose concern is that everyone might die. Your observations may vary.
Aligning a Smarter Than Human Intelligence is Difficult
David Krueger introduces a gigantic 100+ page collaborative agenda led by Usman Anwar, on “Foundational Challenges In Assuring Alignment and Safety of LLMs” alongside 35+ co-authors from the NLP, ML, and AI Safety communities. An overview page can be found here.
They offer this helpful advice:
As a general rule, if you have to solve 18 different foundational challenges one at a time, and you cannot verify each solution robustly, that is a deeply awful place to be. The only hope is that you can solve multiple problems simultaneously, and the challenges prove not so distinct. Or you can hope that you do not actually need to solve all 18 problems in order to win.
Here is how they define alignment, noted because the term is so overloaded:
As they note this is a broad definition of safety. Is anything worth having ‘safe’ in this way? And yet, it might not be expansive enough, in other ways. What if the harms are indeed planned?
And here are the eighteen problems. How many must we solve? How many of the 200+ subproblems would we need to tackle to do that? To what extent are they distinct problems? Does solving some of them help with or even solve others? Would solving all these problems actually result in a good future?
If you are looking for good questions to be investigating, this seems like a great place to do that. I see a lot of people who want to work on the problem but have no idea what to do, and this is a lot of possible and plausibly useful somethings to do, so not everyone defaults to mechanistic interpretability and evals.
Beyond that, as much as I would love to dive into all the details, I lack the time.
Roon offers his reasons to be optimistic about alignment, which I’ve changed to a numbered list.
Roon: reasons to be optimistic about alignment:
My quick responses:
There are definitely lots of reasons to be marginally more optimistic.
Jeffrey Ladish ponders the implications of LLMs getting more situationally aware over time (which will definitely happen), and better knowing when they are being asked to deceive or otherwise do harm. In some ways this is better, the AI can spot harmful requests and refuse them. In other ways this is worse, the AI can more easily and skillfully deceive us or work against us (either at the user’s behest, intentionally or not, or not at the user’s or perhaps creator’s or owner’s behest), such as by acting differently when it might be caught.
And more generally, AI deception skills will greatly improve over time. As I keep saying, deception is not a distinct magisterium. It is infused into almost all human interaction. It is not a thing you can avoid.
Please Speak Directly Into the Microphone
Except then he plays a video, where the claim is that “We see no mechanism of any way possible of limiting A.I. and its spread and its propagation. It can’t be regulated. Unless you control every line of written code. And the AIs are writing the code.” And the standard arguments of ‘well if you don’t do it then China will’ and so on, no possibility that humans could coordinate to not all die.
I do not think that is remotely right.
But if it is right, then there is also no ‘guiding’ AI. If we cannot regulate it, and we cannot control its spread or propagation, as they and some others claim, then we have already lost control over the future to AI. We will soon have no say in future events, and presumably not be around for much longer, and have very little say even now over what that future AI will look like or do, because we will be ‘forced’ by The Incentives to build whatever we are capable of building.
Yes, endorsed on reflection, and fair:
Yes. If that is what AGI.Eth believes, then say it. Exactly like this. I approve.
We should be aware that many want to build this as fast as possible.
People Are Worried About AI Killing Everyone
OpenAI fires two researchers for allegedly leaking information.
This is obviously very bad news, given multiple people on the Superalignment team are being fired, whether or not they indeed leaked information.
Eliezer Yudkowsky notes, for context, that he has reason to believe Leopold Aschenbrenner opposed funding Eliezer’s non-profit MIRI.
Daniel Kokotajlo has quit OpenAI, and the reason is not reassuring, here is his new profile description:
Daniel collaborated on this post on timelines, where in November he predicted a 4 year median estimate for automation of 99% of jobs. He has given a 70% chance of AI existential catastrophe:
In terms of predicting AGI Real Soon Now, he is all-in:
Despite this being based on non-public information from OpenAI, he quit OpenAI.
Daniel’s goal is clearly to minimize AI existential risk. If AGI is coming that quickly, it is probably happening at OpenAI. OpenAI would be where the action is, where the fate of humanity and the light cone will be decided, for better or for worse.
It seems unlikely that he will have higher leverage doing something else, within that time frame, with the possible exception of raising very loud and clear alarm bells about OpenAI.
My presumption is that Daniel did not quietly despair and decide to quit. Instead, I presume Daniel used his position to speak up and as leverage, and tried to move things in a good direction. Part of that strategy needs to be a clear willingness to quit or provoke being fired, if your attempts are in vain. Alas, it seems his attempts were in vain.
Given the timing and what else has happened, we could offer some guesses here. Any number of different proximate causes or issues are plausible.
This is in contrast to his previous actions. Before, he felt p(doom) of 70%, and that AGI was coming very soon, but did feel (or at least told himself) that he could make a net positive difference at OpenAI. If not, why stay?
I hope that Daniel will be able to share more of his reasoning soon.
Finally on a related note: Remember, the point of dying on a hill ideally is to make someone else die on that hill. You prefer to never die at all.

Other People Are Not As Worried About AI Killing Everyone
Arnold Kling discusses Amar Bhide’s article ‘The Boring Truth About AI.’ Amar Bhide says AI advances and adaptation will be gradual and uncertain, citing past advances in AI and elsewhere. He says it will be another ‘ordinary piece of technology’ that poses no existential risks, exactly because he assumes the conclusion that AI will be merely an ordinary tool that will follow past AI and other technological patterns of incremental development and gradual deployment, and that the world will remain in what I call ‘economic normal.’
This assumes the conclusion, dismissing the possibility of AI capable of being transformative or more than a tool, without considering whether that could happen. It does not ask what might happen if we created things smarter, faster and more capable than ourselves, or any of the other interesting questions. He for example says this is not like the Manhattan Project where things happened fast, without noticing that the similarly fast (or faster) progress lies in the future, or the reasons one might expect that.
Also, the Manhattan Project took several years to get to its first few bombs, after much prior physics to lay the groundwork during which nothing of similar impact was produced, then suddenly a big impact. An odd choice of discordant parallel.
I suppose at this point my perspective is that such arguments are not even wrong. They are instead about a different technology and technological path I do not expect to occur, although it is possible that it could. In such worlds, I agree that the result would not be transformational or existentially dangerous, and also would not be all that exciting on the upside either.
As is often the case with such skeptics, he notes he has been unable to enhance his own productivity with LLMs, and says this:
This is a failure to see even the upside in present LLM technology, let alone future technology, and to think not only even slightly ahead but even about how to use what is directly there right now. If you find LLMs are a ‘productivity killer’ you have not invested much in asking how to use them.
Kling’s commentary mostly discusses the practical question of applications and near term gains, which are indeed not so extensive so far, mostly confined to a few narrow domains. This is a skill issue and a time issue. Even if the underlying technology got stuck, developers would need more time, and users need more time to learn and experiment and adapt. And of course everything will get dramatically better with GPT-5-generation underlying models within a few years.
In terms of Kling’s question about personalized tutoring disrupting education, I would say this is already a skill issue and signaling problem. Education for those looking to learn is already, with the current big three models, dramatically different for those wise enough to use them, but most people are not going to know this and take initiative yet. For that, yes, we need something easier to use and motivate, like Stephenson’s Young Lady’s Illustrated Primer. In its full glory that is still a few years out.
On existential risk, Kling says this:
That seems right. I do think that the first and biggest existential risks follow directly from the innovation alone, at least to the degree you can say that of the atomic bomb. As in, if you build an atomic bomb and never use it, or learn how and never build one, then that is not risky, but once built it was quickly used. So yes, you could keep the otherwise existentially risky AI turned off or sufficiently isolated or what not, but you have to actively do that, rather than only worrying about downstream actions of users or developers.
There are also grave concerns about what would happen if we were to largely ‘solve the alignment problem’ and otherwise bypass that first whammy, and even if we prevent various obvious misuse cases, about what dynamics and outcomes would still result from ‘adaptation’ of the technology, which could also quickly be a misnomer. Everything really does change. But as explained, that is effectively beyond scope here.
The Lighter Side
I mean, sometimes?
Or perhaps you can work around that requirement.
It actually does seem super useful for taxes. Most of taxes is knowing a lot of stupid little semi-arbitrary rules and procedures. Yes, it will make mistakes and hallucinate things if your situation gets complicated, but so will you and so will your accountant. One does not get taxes done perfectly, one does their best to get it mostly right in reasonable time.
Special cases can be weird, but praise generally only makes one more ambitious.
As per usual, from Seb Krier.
A fair version of the second panel would actually still have about one hand raised. Evals and mechanistic interpretability are the two places some people are actually excited to do the work.