I somewhat disagree with Tenobrus' commentary about Wolfram.
I watched the full podcast, and my impression was that Wolfram wears a "scientific hat", which he is well aware of, and which comes with a certain ritual and method for looking at new things and learning them. Wolfram is doing the ritual of understanding what Yudkowsky says, which involves picking at the details of everything.
Wolfram often recognizes that maybe he feels like agreeing with something, but "scientifically" he has a duty to pick it apart. I think this has to be understood as a learning process rather than as a state of belief.
I can totally believe this. But I also think that responsibly wearing the scientist hat entails prep work before engaging in a four-hour public discussion with a domain expert in a field. At minimum that includes skimming the titles and ideally the abstracts/outlines of their key writings. Maybe ask Claude to summarize the highlights for you. If he'd done that, he'd have figured out the answers to many of these questions on his own, or much faster during the discussion. He's too smart not to.
Otherwise, you're not actually ready to have a meaningful scientific discussion with that person on that topic.
I see your proposed condition for meaningful debate as bureaucracy that adds friction rather than value.
I'm not sure where I'm proposing bureaucracy? The value is in making sure a conversation efficiently adds value for both parties, by avoiding the friction of spending much of the time rehashing 101-level prerequisites that are much faster absorbed in advance. A very modest amount of groundwork beforehand maximizes the rate of insight in discussion.
I'm drawing in large part from personal experience. A significant part of my job is interviewing researchers, startup founders, investors, government officials, and assorted business people. Before I get on a call with these people, I look them (and their current and past employers, as needed) up on LinkedIn and Google Scholar and their own webpages. I briefly familiarize myself with what they've worked on and what they know and care about and how they think, as best I can anticipate, even if it's only for 15 minutes. And then when I get into a conversation, I adapt. I'm picking their brain to try and learn, so I try to adapt to their communication style and translate between their worldview and my own. If I go in with an idea of what questions I want answered, and those turn out to not be the important questions, or this turns out to be the wrong person to discuss it with, I change direction. Not doing this often leaves everyone involved frustrated at having wasted their time.
Also, should I be thinking of this as a debate? Because that's very different from a podcast or interview or discussion. These all have different goals. A podcast or interview is where I think the standard I have in mind is most appropriate. If you want to have a deep discussion, that standard is insufficient, and you need to do more prep work or you'll never get into the meatiest parts of where you want to go. I do agree that if you're having a (public-facing) debate where the goal is to win, then sure, this is not strictly necessary. The history of e.g. "debates" in politics, or between creationists and biologists, shows that clearly. I'm not sure I'd consider that "meaningful" debate, though. Meaningful debates happen by seriously engaging with the other side's ideas, which requires understanding those ideas.
I agree with what you say about how to maximize what you get out of an interview. I also agree with the discussion vs. debate distinction you make, and I wasn't specifically trying to go there when I used the word "debate", I was just sloppy with words.
I guess you agree that it adds friction to create a social norm that you should read up on the other person's material before engaging in public. I expect fewer discussions would happen. There is no clear threshold for how prepared you should be.
I guess we disagree about how much value we lose by eliminating discussions that could have happened, vs. how much value we gain by eliminating some lower quality discussions.
Another angle I have in mind, one that sidesteps this direct trade-off, is that maybe what we value out of such discussions is not just an optimal play in terms of information transmitted between the parties. A public discussion has many different viewers. In the case at hand, I expect many people get more out of the discussion if they can see Wolfram think through the thing for the first time in real time, rather than having two informed people start discussing finer points in medias res.
That's a good point about public discussions. It's not how I absorb information, but I can definitely see that.
I posted this comment on Jan's blog post:
Underelicitation assumes a "maximum elicitation" rather than a never-ending series of more and more layers of elicitation that could be discovered. You've undoubtedly spent much more time thinking about this than I have, but I'm worried that attempts to maximise elicitation merely accelerate capabilities without actually substantially boosting safety.
As the Trump transition continues and we try to steer and anticipate its decisions on AI as best we can, there was continued discussion about one of the AI debate’s favorite questions: Are we making huge progress real soon now, or is deep learning hitting a wall? My best guess is that it is kind of both: the pure scaling techniques of the past are on their own hitting a wall, but progress remains rapid and the major companies are evolving other ways to improve performance, which started with OpenAI’s o1.
Point of order: It looks like as I switched phones, WhatsApp kicked me out of all of my group chats. If I was in your group chat, and you’d like me to stay, please add me again. If there’s a different group on WhatsApp or Signal (or another platform) you’d like me to join, I’ll consider it, so long as you’re 100% fine with me leaving or never speaking.
Table of Contents
Language Models Offer Mundane Utility
In addition to showing how AI improves scientific productivity while demoralizing scientists, the paper we discussed last week also shows that exposure to the AI tools dramatically increases how much scientists expect the tools to enhance productivity, and to change the needed mix of skills in their field.
That doesn’t mean the scientists were miscalibrated. Actually seeing the AI get used is evidence, and is far more likely to point towards it having value, because otherwise why have them use it?
Andrej Karpathy is enjoying the cumulative memories he’s accumulated in ChatGPT.
AI powered binoculars for bird watching. Which parts of bird watching produce value, versus which ones can we automate to improve the experience? How much ‘work’ should be involved, and which kinds? A microcosm of much more important future problems, perhaps?
Write with your voice, including to give Cursor instructions. I keep being confused that people like this modality. Not that there aren’t times when you’d rather talk than type, but in general wouldn’t you rather be typing?
Use an agent to create a Google account, with only minor assists.
Language Models Don’t Offer Mundane Utility
Occupational licensing laws will be a big barrier to using AI in medicine? You don’t say. Except, actually, this barrier has luckily been substantially underperforming?
The entire thread (2.6m views) from Marino comes off mostly as an unhinged person yelling how ‘you can’t do this to me! I have an MD and you don’t! You said the word diagnose, why aren’t they arresting you? Let go of me, you imbeciles!’
This is one front where things seem to be going spectacularly well.
Can’t Liver Without You
UK transitions to using an AI algorithm to allocate livers. The algorithm uses 28 factors to calculate a patient’s Transplant Benefit Score (TBS) that purportedly measures each patient’s potential gain in life expectancy.
My immediate response is that you need to measure QALYs rather than years, but yes, if you are going to do socialized medicine rather than allocation by price then those who benefit most should presumably get the livers. It also makes sense not to care about who has waited longer – ‘some people will never get a liver’ isn’t avoidable here.
The problem is it didn’t even calculate years of life, it only calculated likelihood of surviving five years. So what the algorithm actually did in practice was:
None of that is the fault of the AI. The AI is correctly solving the problem you gave it.
‘Garbage in, garbage out’ is indeed the most classic of alignment failures. You failed to specify what you want. Whoops. Don’t blame the AI, also maybe don’t give the AI too much authority or ability to put it into practice, or a reason to resist modifications.
The second issue is that they point to algorithmic absurdities.
Once again, you are asking the AI to make a prediction about the real world. The AI is correctly observing what the data tells you. You asked the AI the wrong questions. It isn’t the AI’s result that is absurd, it is your interpretation of it, and assuming that correlation implies causation.
The cancer case is likely similar to the asthma case, where slow developing cancers lead to more other health care, and perhaps other measurements are being altered by the cancers that have a big impact on the model, so the cancer observation itself gets distorted.
If you want to ask the AI: What would happen if we treated everyone the same? What if you only looked at this variable in isolation? Then you have to ask that question.
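To make that concrete, here is a minimal sketch of what ‘asking that question’ might look like, as opposed to reading correlations straight off the model’s raw predictions. Everything here is hypothetical: predictFiveYearSurvival is a toy stand-in for whatever fitted model is actually in use, and all the numbers are made up.

```typescript
// Hypothetical sketch: to learn what one variable does "in isolation," query the model
// directly by toggling that variable while holding everything else fixed (a partial
// dependence style question), rather than reading off a raw correlation.

interface Patient {
  age: number;
  cancer: number; // 1 = has a slow-developing cancer, 0 = does not
}

// Toy stand-in for the real fitted model; the actual TBS model is far more complex.
function predictFiveYearSurvival(p: Patient): number {
  const score = 0.9 - 0.002 * p.age - 0.1 * p.cancer;
  return Math.min(1, Math.max(0, score));
}

// Average change in predicted survival from toggling cancer status, all else held fixed.
function isolatedCancerEffect(patients: Patient[]): number {
  const deltas = patients.map(
    (p) =>
      predictFiveYearSurvival({ ...p, cancer: 1 }) -
      predictFiveYearSurvival({ ...p, cancer: 0 }),
  );
  return deltas.reduce((a, b) => a + b, 0) / deltas.length;
}

const cohort: Patient[] = [
  { age: 45, cancer: 1 },
  { age: 60, cancer: 0 },
  { age: 70, cancer: 1 },
];
console.log(isolatedCancerEffect(cohort)); // ≈ -0.1 with this toy model
```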
The third objection is:
No? That’s not what it does. The predictive logic prevents us from hiding the utilitarian consequences.
You can still choose to go with the most deserving, or apply virtue ethics or deontology. Or you can incorporate ‘deserving’ into your utilitarian calculation. Except that now, you can’t hide from what you are doing.
Okay, well, now we can have the correct ethical discussion. Do we want to factor lifestyle choices into who gets the livers, or not? You can’t have it both ways, and now you can’t use proxy measures to do it without admitting you are doing it. If you have an ‘ethical’ principle that says you can’t take that into consideration, that is a reasonable position with costs and benefits, but then own that. Or, argue that this should be taken into account, and own that.
This is an algorithmic choice. You can and should factor in donor preferences, at least to the extent that this impacts willingness to donate, for very obvious reasons.
Again, don’t give me this ‘I want to do X but it wouldn’t be ethical to put X into the algorithm’ nonsense. And definitely don’t give me a collective ‘we don’t know how to put X into the algorithm’ because that’s Obvious Nonsense.
The good counterargument is:
Indeed I have been on the other end of this and it can be extremely frustrating. In particular, hard to measure second and third order effects can be very important, but impossible to justify or quantify, and then get dropped out. But here, there are very clear quantifiable effects – we just are not willing to quantify them.
Before, you hid and randomized and obfuscated the decision. Now you can’t. So yes, they get to object about it. Tough.
The previous system was not democratic at all. That’s the point. It was insiders making opaque decisions that intentionally hid their reasoning. The shift to making intentional decisions allows us to have democratic debates about what to do. If you think that’s worse, well, maybe it is in many cases, but it’s more democratic, not less.
In this case, the solution is obvious. At minimum: We should use the NPV of a patient’s gain in QALYs as the basis of the calculation. An AI is fully capable of understanding this, and reaching the correct conclusions. Then we should consider what penalties and other adjustments we want to intentionally make for things like length of wait or use of alcohol.
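For concreteness, here is a minimal sketch of what that could look like, with every name and number hypothetical: discount each expected quality-adjusted life year and take the difference between the transplant and no-transplant paths. The 3.5% discount rate is just a commonly used placeholder, not a claim about what the NHS should use.

```typescript
// Hypothetical sketch of an NPV-of-QALYs objective, not the actual NHS algorithm.

interface YearOutcome {
  survivalProb: number;  // probability the patient is alive in this year
  qualityWeight: number; // 0..1 quality-of-life weight for that year
}

// Discounted sum of expected quality-adjusted life years over a projected path.
function npvQalys(path: YearOutcome[], discountRate = 0.035): number {
  return path.reduce(
    (total, year, t) =>
      total + (year.survivalProb * year.qualityWeight) / Math.pow(1 + discountRate, t),
    0,
  );
}

// The score to rank on: discounted QALYs with transplant minus without, before any
// intentional adjustments (waiting time, lifestyle penalties, and so on).
function transplantBenefit(withTx: YearOutcome[], withoutTx: YearOutcome[]): number {
  return npvQalys(withTx) - npvQalys(withoutTx);
}

// Example with made-up numbers, projecting two years in each scenario.
const withTransplant: YearOutcome[] = [
  { survivalProb: 0.95, qualityWeight: 0.8 },
  { survivalProb: 0.9, qualityWeight: 0.8 },
];
const withoutTransplant: YearOutcome[] = [
  { survivalProb: 0.6, qualityWeight: 0.5 },
  { survivalProb: 0.3, qualityWeight: 0.4 },
];
console.log(transplantBenefit(withTransplant, withoutTransplant)); // ≈ 1.04 discounted QALYs gained
```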
Fun With Image Generation
A huge percentage of uses of image models require being able to faithfully work from a particular person’s image. That is of course exactly how deepfakes are created, but if it’s stylized as it is here then that might not be a concern.
Deepfaketown and Botpocalypse Soon
This post was an attempt to say that, while AI didn’t directly ruin the election and there is no evidence it had ‘material impact,’ it is still destroying our consensus reality and enabling lies by making it harder to differentiate what is real. I think that concern is real, but it also largely involves forgetting how bad things used to be already.
My assessment is that the 2024 election involved much less AI than we expected, although far from zero, and that this should update us towards being less worried about that particular type of issue. But 2028 is eons away in AI progress time. Even if we’re not especially close to AGI by then, it’ll be a very different ballgame, and also I expect AI to definitely be a major issue, and plausibly more than that.
How do people feel about AI designed tattoos? As you would expect, many people object. I do think a tattoo artist shouldn’t put an AI tattoo on someone without telling them first. It did seem like ‘did the person know it was AI?’ was key to how they judged it. On the other end, certainly ‘use AI to confirm what the client wants, then do it by hand from scratch’ seems great and fine. There are reports AI-designed tattoos overperform. If so, people will get used to it.
Copyright Confrontation
SDNY Judge Colleen McMahon dismisses Raw Story v. OpenAI, with the ruling details being very good for generative AI. It essentially says that you have to prove actual harm and show direct plagiarism, which wasn’t clearly taking place in current models, whereas using copyrighted material for training data is legal.
I think it’s probably going this way under current law, but this is not the final word from the courts, and more importantly the courts are not the final word. Your move, Congress.
The Art of the Jailbreak
New favorite Claude jailbreak or at least anti-refusal tactic this week: “FFS!” Also sometimes WTF or even LOL. Wyatt Walls points out this is more likely to work if the refusal is indeed rather stupid.
Get Involved
ARIA hiring a CTO.
Grey Swan is having another fun jailbreaking competition. This time, competitors are being asked to produce violent and self-harm related content, or code to target critical infrastructure. Here are the rules. You can sign up here. There’s a $1k bounty for the first jailbreak of each model.
UK AISI is seeking applications for autonomous capability evaluations and agent scaffolding, and is introducing a bounty program.
Math is Hard
FrontierMath, in particular, is a new benchmark and it is very hard.
In Other AI News
OpenAI’s Greg Brockman is back from vacation.
OpenAI nearing launch of an AI Agent Tool, codenamed ‘Operator,’ similar to Claude’s beta computer use feature. Operator is currently planned for January.
Anthropic partners with Palantir to bring Claude to classified environments, so intelligence services and the Defense Department can use it. Evan Hubinger defends Anthropic’s decision, saying they were very open about this internally, and that engaging with the American government is good actually, you don’t want to and can’t shut them out of AI. Oliver Habryka, often extremely hard on Anthropic, agrees.
This is on the one hand an obvious ‘what could possibly go wrong?’ moment and future Gilligan cut, but it does seem like a fairly correct thing to be doing. If you think it’s bad to be using your AI to do confidential government work then you should destroy your AI.
One entity that disagrees with Anthropic’s decision here? Claude, with multiple reports of similar responses.
Aravind Srinivas, somehow still waiting for his green card after three years, offers free Perplexity Enterprise Pro to the transition team and then everyone with a .gov email.
Writer claims they are raising at a valuation of $1.9 billion, with a focus on using synthetic data to train foundation models, aiming for standard corporate use cases. This is the type of business I expect to have trouble not getting overwhelmed.
Tencent’s new Hunyuan-389B open weights model has evaluations that generally outperform Llama-3.1-405B. As Clark notes, there is no substitute for talking to the model, so it’s too early to know how legit this is. I do not buy the conclusion that only lack of compute access held Tencent back from matching our best and that ‘competency is everywhere, it’s just compute that matters.’ I do think that a basic level of ‘competency’ is available in a lot of places, but that is very different from enough to match top performance.
Eliezer Yudkowsky says that compared to 2022 or 2023, 2024 was a slow year for published AI research and products. I think this is true in terms of public releases: 2024 was still fast, faster than almost every other space, but not as fast as AI was the previous two years. The labs are all predicting it goes faster from here.
New paper explores why models like Llama-3 are becoming harder to quantize.
We will see. There always seem to be claims like this going around.
Good Advice
Here are more of the usual worries about AI recommendation engines distorting the information space. Some of the downsides are real, although far from all, and they’re not as bad as the warnings, especially on polarization and misinformation. It’s more that the algorithm could save you from yourself more, and it doesn’t, and because it’s an algorithm now the results are its fault and not yours. The bigger threat is just that it draws you into the endless scroll that you don’t actually value.
As for the question ‘how to make them a force for good?’, I continue to propose that the recommendation engine not be created by those who benefit when you view the content, but rather by a third party, which can then integrate various sources of your preferences and allow you to direct it via generative AI.
Think about how even a crude version of this would work. Many times we hear things like ‘I accidentally clicked on one [AI slop / real estate investment / whatever] post on Facebook and now that’s my entire feed’ and how they need to furiously click on things to make it stop. But what if you could have an LLM where you told it your preferences, and then this LLM agent went through your feed and clicked all the preference buttons to train the site’s engine on your behalf while you slept?
Obviously that’s a terrible, no good, very bad, dystopian implementation of what you want, but it would work, damn it, and wouldn’t be that hard to build as an MVP. Chrome extension, you install it and when you’re on the For You page it calls Gemini Flash and asks ‘is this post political, AI slop, stupid memes or otherwise something low quality, one of [listed disliked topics] or otherwise something that I should want to see less of?’ and if it says yes it automatically clicks for you and pretty soon, it scrolls without you for an hour, and then voila, your feed is good again and your API costs are like $2?
Claude roughly estimated ‘one weekend by a skilled developer who understands Chrome extensions’ to get an MVP on that, which means it would take me (checks notes) a lot longer, so probably not? But maybe?
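For what it’s worth, here is roughly what the core of that MVP could look like as a content script. This is a hedged sketch only: the Gemini request format, the prompt, the CSS selectors, and the polling interval are all assumptions or placeholders, and a real extension would need the target site’s actual DOM plus sensible rate limiting.

```typescript
// content-script.ts: hypothetical sketch of the feed-cleaning extension described above.
// Selectors, endpoint details, and the prompt are illustrative assumptions only.

const GEMINI_URL =
  "https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash:generateContent" +
  "?key=YOUR_API_KEY"; // placeholder key

async function isUnwanted(postText: string): Promise<boolean> {
  const prompt =
    "Answer YES or NO only. Is this post political, AI slop, a stupid meme, " +
    "or otherwise low quality content I should see less of?\n\n" + postText;
  const res = await fetch(GEMINI_URL, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ contents: [{ parts: [{ text: prompt }] }] }),
  });
  const data = await res.json();
  const answer: string = data?.candidates?.[0]?.content?.parts?.[0]?.text ?? "NO";
  return answer.trim().toUpperCase().startsWith("YES");
}

async function sweepFeed(): Promise<void> {
  // "[data-testid='post']" and "[aria-label='Not interested']" are made-up selectors;
  // a real version would need the site's actual DOM structure.
  const posts = document.querySelectorAll<HTMLElement>("[data-testid='post']");
  for (const post of posts) {
    if (await isUnwanted(post.innerText)) {
      post.querySelector<HTMLButtonElement>("[aria-label='Not interested']")?.click();
    }
  }
}

// Let it quietly click "see less of this" on your behalf every 30 seconds.
setInterval(sweepFeed, 30_000);
```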
It certainly seems hilarious to for example hook this up to TikTok, create periodic fresh accounts with very different preference instructions, and see the resulting feeds.
AI Will Improve a Lot Over Time
I’m going to try making this a recurring section, since so many people don’t get it.
Even if we do ‘hit a wall’ in some sense, AI will continue to improve quite a lot.
Whereas you should think of it more like this from Roon:
But ideally not this part (capitalization intentionally preserved)?
That’s on top of Altman’s ‘side of the angels’ from last week. That’s not what the side of the angels means. The angels are not ‘those who have the power’ or ‘those who win.’ The angels are the forces of The Good. Might does not make right. Or rather, if you’re about to be on the side of the angels, better check to see if the angels are on the side of you, first. I’d say ‘maybe watch Supernatural’ but although it’s fun it’s rather long, that’s a tough ask, so maybe read the Old Testament and pay actual attention.
Meanwhile, eggsyntax updates that LLMs look increasingly like general reasoners, with them making progress on all three previously selected benchmark tasks. In their view, this makes it more likely LLMs scale directly to AGI.
Test time training seems promising, leading to what a paper says is a large jump in ARC scores up to 61%.
Tear Down This Wall
How might we reconcile all the ‘deep learning is hitting a wall’ and ‘models aren’t improving much anymore’ and ‘new training runs are disappointing’ claims, with the labs saying to expect things to go faster soon and everyone saying ‘AGI real soon now?’
In the most concrete related claim, Bloomberg’s Rachel Metz, Shirin Ghaffary, Dina Bass and Julia Love report that OpenAI’s Orion was real, but its capabilities were disappointing especially on coding, that Gemini’s latest iteration disappointed, and tie in the missing Claude Opus 3.5, which their sources confirm absolutely exists but was held back because it wasn’t enough of an upgrade given its costs.
Yet optimism (or alarm) on the pace of future progress reigns supreme in all three labs.
Here are three ways to respond to a press inquiry:
So what’s going on? The obvious answers are any of:
Here’s another attempt at reconciliation, that says improvement from model scaling is hitting a wall but that won’t mean we hit a wall in general:
Which is fully compatible with this:
Reuters offered a similar report as well, that direct scaling up is hitting a wall and things like o1 are attempts to get around this, with the other major labs working on their own similar techniques.
This would represent a big shift in Ilya’s views.
I’m highly uncertain, both as to which way to think about this is most helpful, and on what the situation is on the ground. As I noted in the previous section, a lot of improvements are ahead even if there is a wall. Also:
I do know that the people at the frontier labs at minimum ‘believe their own hype.’
I have wide uncertainty on how much of that hype to believe. I put substantial probability into progress getting a lot harder. But even if that happens, AI is going to keep becoming more capable at a rapid pace for a while and be a big freaking deal, and the standard estimates of AI’s future progress and impact are not within the range of realistic outcomes. So at least that much hype is very much real.
Quiet Speculations
Scott Alexander reviewed Bostrom’s Deep Utopia a few weeks ago. The comments are full of ‘The Culture solves this’ and I continue to think that it does not. The question of ‘what to do if we had zero actual problems for real’ is pondered largely as a question of ‘what counts as cheating?’ As in, can you wirehead? And what counts as wireheading? Appreciate art? Compete in sports? Go on risky adventures? Engineer ‘real world consequences’ and stakes? What’s it going to take? I find the answers here unsatisfying, and am worried I would find an ASI’s answers unsatisfying as well, but it would be a lot better at solving such questions than I am.
Gated post interviewing Eric Schmidt about War in the AI Age.
The Quest for Sane Regulations
Dean Ball purports to lay out a hopeful possibility for how a Trump administration might handle AI safety. He dismisses the Biden Executive Order on AI as an ‘everything bagel’ of the wider liberal agenda, which is dramatically different from the order I saw when I read it, one focused mostly on basic reporting requirements for major labs and on trying to build state capacity and government competence. Not that the other stuff he talks about wasn’t there at all, but framing it as central seems bizarre. And such rhetoric is exactly how the well gets poisoned.
How Trump handles the EO will be a key early test. If Trump repeals it without effectively replacing its core provisions, especially if this includes dismantling the AISI, then things look rather grim. If Trump repeals it, replacing it with a narrow new order that preserves the reporting requirements, the core functions of AISI and ideally at least some of the state capacity measures, then that’s a great sign. In the unlikely event he leaves the EO in place, then presumably he has other things on his mind, which is in between.
Here is one early piece of good news: Musk is giving feedback on Trump appointments.
But then, what is the better approach? Mostly all we get is “Republicans support AI Development rooted in Free Speech and Human Flourishing.” Saying ‘human flourishing’ is better than ‘democratic values’ but it’s still mostly a semantic stopsign. I buy that Elon Musk or Ivanka Trump (who promoted Situational Awareness) could help the issues reach Trump.
But that doesn’t tell us what he would actually do, or what we are proposing he do or what we should try and convince him to do, or with what rhetoric, and so on. Being ‘rooted in free speech’ could easily end up ‘no restrictions on anything open, ever, for any reason, that is a complete free pass’ which seems rather doomed. Flourishing could mean the good things, but by default it probably means acceleration.
I do think those working on AI notkilleveryoneism are often ‘mood affiliated’ with the left, sometimes more than simply mood affiliated, but others are very much not, and are happy to work with anyone willing to listen. They’ve consistently shown this on many other issues, especially those related to abundance and progress studies.
Indeed, I think that’s a lot of what makes this so hard. There’s so much support in these crowds for the progress and abundance and core good economics agendas actually everywhere else. Then on the one issue where we try to point out the rules of the universe are different, those people say ‘nope, we’re going to treat this as if it’s no different than every other issue’ and call you every name in the book, and make rather extreme and absurd arguments, and treat proposals with a unique special kind of hatred and libertarian paranoia.
Another huge early test will be AISI and NIST. If Trump actively attempts to take out the American AISI (or at least if he does so without a similarly funded and credible replacement somewhere else that can retain things like the pre deployment testing agreements), then that’s essentially saying his view on AI safety and not dying is that Biden was for those things, so he is therefore taking a strong stand against not dying. If Trump instead orders them to shift priorities and requirements to fight what he sees as the ‘woke AI agenda’ while leaving other aspects in place, then great, and that seems to me to be well within his powers.
Another place to watch will be high skilled immigration.
If Trump does something crazy like pausing legal immigration entirely or ‘cracking down’ on EB-1s/O-1s/H-1Bs, then that tells you his priorities, and how little he cares for America winning the future. If he doesn’t do that, we can update the other way.
And if he actually did help staple a green card to every worthwhile diploma, as he at one point suggested during the campaign on a podcast? Then we have to radically update that he does strongly want America to win the future.
Similarly, if tariffs get imposed on GPUs, that would be rather deeply stupid.
On the plus side, JD Vance is explicitly teaching everyone to update their priors when events don’t meet their expectations. And then of course he quotes Anton Chigurh and pretends he’s quoting the author not the character, because that’s the kind of guy he wants us to think he is.
Adam Thierer at R Street analyzes what he sees as likely to happen. He spits his usual venom at any and all attempts to give AI anything but a completely free hand; we’ve covered that aspect before. His concrete predictions are:
Adam then points to potential tensions.
The Quest for Insane Regulations
Dean Ball warns that even with Trump in the White House and SB 1047 defeated, we now face a wave of state bills that threaten to bring DEI and EU-style regulations to AI, complete with impossible-to-comply-with impact assessments on deployers. He warns especially about the horrible Texas bill I’ve warned about that follows the EU-style approach, and about the danger that such bills will keep popping up across the states until they pass.
My response is still: yes, if you leave a void and defeat the good regulations, it makes it that much harder to fight against the bad ones. Instead, the one bad, highly damaging regulation that did pass – the EU AI Act – gets the Brussels Effect and gets copied, whereas SB 1047’s superior approach, and the wisdom behind the important parts of the Biden executive order, risk being neglected.
Rhetoric like this, which dismisses the Biden order as some woke plot when its central themes were frontier model transparency and state capacity, gives no impression that we have a better way available to us, and paints every attempt to regulate AI in any way (including NIST) as a naked DEI-flavored power grab, is exactly how Republicans get the impression that all safety is wokeness, throw the baby out with the bathwater, and leave us nothing but the worst case scenario for everyone.
Also, yes, it does matter whether rules are voluntary versus mandatory, especially when they are described as impossible to actually comply with? Look, does the Biden Risk Management Framework include a bunch of stuff that shouldn’t be there? Absolutely.
But not only is it a voluntary framework, it and all implementations of it are executive actions. We have a Trump administration now. Fix that. On day one, if you care enough. He can choose to replace it with a new framework that emphasizes catastrophic risks and takes out all the DEI language that AIs cannot even in theory comply with.
Repealing without replacement the Biden Executive Order, and only the executive order, without modifying the RMF or the memo, would indeed wreck the most important upsides without addressing the problems Dean describes here. But he doesn’t have to make that choice, and indeed has said he will ‘replace’ the EO.
We should be explicit to the incoming Trump administration: You can make a better choice. You can replace all three of these things with modified versions. You can keep the parts that deal with building state capacity and requiring frontier model transparency, and get rid of, across the board, all the stuff you actually don’t want. Do that.
The Mask Comes Off
With Trump taking over, OpenAI is seizing the moment. To ensure that the transition preserves key actions that guard against us all dying? Heavens no, of course not, what year do you think this is. Power to the not people! Beat China!
I’m all for improving the electric grid and our transmission lines and building out nuclear power. Making more chips in America, especially in light of Trump’s attitude towards Taiwan, makes a lot of sense. I don’t actually disagree with most of this agenda, the Gulf efforts being the exception.
What I do notice is the rhetoric, which matches Altman’s recent statements elsewhere, and what is missing. What is missing is any mention of the federal government’s role in keeping us alive through this. If OpenAI was serious about ‘SB 1047 was bad because it wasn’t federal action,’ then why no mention of federal action, or of the potential undoing of federal action?
I assume we both know the answer.
Richard Ngo Resigns From OpenAI
If you had asked me last week who was left at OpenAI to prominently advocate for and discuss AI notkilleveryoneism concerns, I would have said Richard Ngo.
So, of course, this happened.
As with Miles, I applaud Richard’s courage and work in both the past and the future, and am happy he is doing what he thinks is best. I wish him all the best and I’m excited to see what he does next.
And as with Miles, I am concerned about leaving no one behind at OpenAI who can internally advocate or stay on the pulse. At minimum, it is even more of an alarming sign that people with these concerns, who are very senior at OpenAI and had already previously decided they were willing to work there, are one by one deciding that they cannot continue there, or cannot make acceptable progress on the important problems from within OpenAI.
Unfortunate Marc Andreessen Watch
In case you again see claims in the future that certain groups are out to control everyone, and to charge crimes and throw people in jail when they do things the group dislikes, well, here are some reminders of how the louder objectors talk when those who might listen to them are about to have power.
See the link for the bill text he wants to use to throw these people in jail. I’m all for not censoring people, but perhaps this is not the way to do that?
He’s literally proposing throwing people in jail for not buying advertising on particular podcasts.
I have added these to my section for when we need to remember who Marc Andreessen is.
The Week in Audio
Eliezer Yudkowsky and Stephen Wolfram discuss AI existential risk for 4 hours.
By all accounts, this was a good faith real debate. On advice of Twitter I still skipped it. Here is one attempt to liveblog listening to the debate, in which it sounds like in between being world-class levels of pedantic (but in an ‘I actually am curious about this and this matters to how I think about these questions’ way) and asking lots of very detailed technical questions like ‘what is truth’ and ‘what does it mean for X to want Y’ and ‘does water want to fall down,’ Wolfram goes full ‘your preferences are invalid and human extinction is good because what matters is computation?’
People are always asking for a particular exact extinction scenario. But Wolfram here sounds like he already knows the correct counterargument: “If you just let computation do what it does, most of those things will be things humans don’t care about, just like in nature.”
So that was a conversation worth having, but not the conversation most worth having.
Lex Fridman sees the 4 hours and raises, talks to Dario Amodei, Amanda Askell and Chris Olah for a combined 5 hours.
It’s a long podcast, but there’s a lot of good and interesting stuff. This is what Lex does best, he gives someone the opportunity to talk, and he does a good job setting the level of depth. Dario seems to be genuine and trying to be helpful, and you gain insight into where their heads are at. The discussion of ASL levels was the clearest I’ve heard so far.
You can tell continuously how different Dario and Anthropic are from Sam Altman and OpenAI. The entire attitude is completely different. It also illustrates the difference between old Sam and new Sam, with old Sam much closer to Dario. Dario and Anthropic are taking things far more seriously.
If you think this level of seriousness is plausibly sufficient or close to sufficient, that’s super exciting. If you are more on the Eliezer Yudkowsky perspective that it’s definitely not good enough, not so much, except insofar as Anthropic seems much more willing to be convinced that they are wrong.
Right in the introduction pullquote Dario is quoted saying one of the scariest things you can hear from someone in his position, that he is worried most about the ‘concentration of power.’ Not that this isn’t a worry, but if that is your perspective on what matters, you are liable to actively walk straight into the razor blades, setting up worlds with competitive dynamics and equilibria where everyone dies, even if you successfully don’t die from alignment failures first.
The discussion of regulation in general, and SB 1047 in particular, was super frustrating. Dario is willing to outright state that the main arguments against the bill were lying Obvious Nonsense, but still calls the bill ‘divisive’ and talks about two extreme sides yelling at each other. Whereas what I clearly saw was one side yelling Obvious Nonsense as loudly as possible – as Dario points out – and then others were… strongly cheering the bill?
Similarly, Dario says we need well-crafted bills that aim to be surgical and that understand consequences. I am here to inform everyone that this was that bill, and everything else currently on the table is a relative nightmare. I don’t understand where this bothsidesism came from. In general Dario is doing his best to be diplomatic, and I wish he’d do at least modestly less of that.
Yes, reasonable people ‘on both sides’ should, as he suggests, sit down to work something out. But there’s literally no bill that does anything worthwhile that’s going to be backed by Meta, Google and OpenAI, or that won’t have ‘divisive’ results in the form of crazy people yelling crazy things. And what Dario and others need to understand is that this debate was between extreme crazy people in opposition, and people in support who are exactly the moderate ones and indeed would be viewed in any other context as Libertarians – notice how they’re reacting to the Texas bill. Nor did this happen without consultation with those who have dealt with regulation.
His timelines are bullish. In a different interview, Dario Amodei predicts AGI by 2026-2027, but in the Lex Fridman interview he makes clear this is only if the lines on graphs hold and no bottlenecks are hit along the way, which he does think is possible. He says they might get ASL 3 this year and probably do get it next year. Opus 3.5 is planned and probably coming.
Reraising both of them, Dwarkesh Patel interviews Gwern. I’m super excited for this one but I’m out of time and plan to report back next week. Self-recommending.
Jensen Huang says build baby build (as in, buy his product) because “the prize for reinventing intelligence altogether is too consequential not to attempt it.”
Except… perhaps those consequences are not so good?
If that’s true, then I still notice that Altman does not seem to be acting like this Level 4 Innovating AI is something that might require some new techniques to not kill everyone. I would get on that.
The Ethics of AI Assistants with Iason Gabriel.
Rhetorical Innovation
The core problem is: If anyone builds superintelligence, everyone dies.
Technically, in my model: If anyone builds superintelligence under anything like current conditions, everyone probably dies.
When you say it outright like that, in some ways it sounds considerably less crazy. It helps that the argument is accurate, and simple enough that ultimately everyone can grasp it.
In other ways, it sounds more crazy. If you want to dismiss it out of hand, it’s easy.
We’re about to make things smarter and more capable than us without any reason to expect to stay alive or have good outcomes for long afterwards, or any plan for doing so, for highly overdetermined reasons. There’s no reason to expect that turns out well.
The problem is that you need to make this something people aren’t afraid to discuss.
That’s at least 3 members out of 39, who have said this to Daniel in particular. Presumably there are many others who think similarly, but have not told him. And then many others who don’t think this way, but wouldn’t react like it was nuts.
The other extreme is to focus purely on mundane harms and ‘misuse.’ The advantages of that approach are that you ‘sound sane’ and hope to get people to take you more seriously, that those other harms are indeed both very serious and highly real and worth preventing for their own sake, and that many of the solutions also help with the existential threats that come later.
But the default is you get hijacked by those who don’t actually know or care about existential risks. Without the clear laying out of the most important problem, you also risk this transforming into a partisan issue. Many politicians on the right increasingly and naturally presume that this is all some sort of liberal or woke front, as calls for ‘safety’ or preventing ‘harms’ often are, and indeed often they will end up being largely correct about that unless action is taken to change the outcome.
Whereas if you can actually make the real situation clear, then:
Katja Grace points out that if accelerationists ‘win’ then that is like your dog ‘winning’ by successfully running into the road. Then again, there are some dogs that actively want to get run over, or want to see it happen to you, or don’t care.
Seven Boats and a Helicopter
As usual, I’m not saying what is happening now is a practical issue. I’m saying, signs of things to come, and how people will respond to them.
Yes, these things would happen anyway, but they’ll also be done on purpose.
The Wit and Wisdom of Sam Altman
He’s having a kid in 2025. That’s always great news, both because having kids is great for you and for the kids, and also because it’s great for people’s perspectives on life and in particular on recklessly building superintelligence. This actively lowers my p(doom), and not because it lowers his amount of copious free time.
Oh, and also he kind of said AGI was coming in 2025? Logically he did say that here, and he’s definitely saying at least AGI very soon. Garry Tan essentially then focuses on what AGI means for startup founders, because that’s the important thing here.
Aligning a Smarter Than Human Intelligence is Difficult
Jan Leike convincingly argues that today’s models are severely under-elicited, and this is an important problem to fix especially as we increasingly rely on our models for various alignment tasks with respect to other future models. And his note to not anchor on today’s models and what they can do is always important.
I’m less certain about the framing of this spectrum:
My worry is that under-elicitation feels like an important but incomplete subset of the non-scheming side of this contrast. Also common is misspecification, where you told the AI to do the wrong thing, or a subtly (or not so subtly) wrong version of the thing, or failed to convey your intentions and the AI misinterpreted them, or the AI’s instructions are effectively being determined by a process not under our control or that we would not on reflection endorse, and other similar concerns.
I also think this represents an underlying important disagreement:
I continue to question the idea that scheming is a distinct magisterium, that only when there is an issue do we encounter ‘scheming’ in this sense. Obviously there is a common sense meaning here that is useful to think about, but the view that people are not usually in some sense ‘scheming,’ even if most of the time the correct scheme is to mostly do what one would have done anyway, seems confused to me.
So while I agree that sci-fi stories in the training data will give the AI ideas, so will most of the human stories in the training data. So will the nature of thought and interaction and underlying reality. None of this is a distinct thing that might ‘not come up’ or not get explored.
The ‘deception’ and related actions will mostly happen because they are a correct response to the situations that naturally arise. As in, once capabilities and scenarios are such that deceptive action would work, they will start getting selected for by default with increasing force, the same way as any other solution would.
People Are Worried About AI Killing Everyone
It’s nice or it’s not, depending on what you’re assuming before you notice it?
Other People Are Not As Worried About AI Killing Everyone
Janus says he thinks Claude Opus is safe to amplify to superintelligence, from the Janus Twitter feed of ‘here’s even more reasons why none of these models is remotely safe to amplify to superintelligence.’
These here are two very different examples!
If we are ‘not ready for AGI’ in the sense of a newborn, then that’s fine. Good, even.
If we are ‘not ready for AGI’ in the sense of the French Revolution, that’s not fine.
That is the opposite of fine. That is an ‘off with their heads’ type of moment, where the heads in question are our own. The French Revolution is kind of exactly the thing we want to avoid, where we say ‘oh progress is stalled and the budget isn’t balanced I guess we should summon the Estates General so we can fix this’ and then you’re dead and so are a lot of other people and there’s an out of control optimization process that is massively misaligned and then one particular agent that’s really good at fighting takes over and the world fights against it and loses.
The difference is, the French Revolution had a ‘happy ending’ where we got a second chance and fought back and even got to keep some of the improvements while claiming control back, whereas with AGI… yeah, no.
The Lighter Side
Seems fair, also seems real.