Re: the davidad/roon conversation about CoT:
The chart in davidad's tweet answers the question "how does the value-add of CoT on a fixed set of tasks vary with model size?"
In the paper that the chart is from, it made sense to ask this question, because the paper did in fact evaluate a range of model sizes on a set of tasks, and the authors were trying to understand how CoT value-add scaling interacted with the thing they were actually trying to measure (CoT faithfulness scaling).
However, this is not the question you should be asking if you're trying to understand how valuable CoT is as an interpretability tool for any given (powerful) model, whether it's a model that exists now or a future one we're trying to make predictions about.
CoT raises the performance ceiling of an LLM. For any given model, there are problems that it has difficulty solving without CoT, but which it can solve with CoT.
AFAIK this is true for every model we know of that's powerful enough to benefit from CoT at all, and I don't know of any evidence that the importance of CoT is now diminishing as models get more powerful.
(Note that with o1, we see the hyperscalers at OpenAI pursuing CoT more intensively than ever, and producing a model that achieves SOTAs on hard problems by generating longer CoTs than ever previously employed. Under davidad's view I don't see how this could possibly make any sense, yet it happened.)
But note that different models have different "performance ceilings."
The problems on which CoT helps GPT-4 are problems right at the upper end of what GPT-4 can do, and hence GPT-3 probably can't even do them with CoT. On the flipside, the problems that GPT-3 needs CoT for are probably easy enough for GPT-4 that the latter can do them just fine without CoT. So, even if CoT always helps any given model, if you hold the problem fixed and vary model size, you'll see a U-shaped curve like the one in the plot.
The fact that CoT raises the performance ceiling matters practically for alignment, because it means that our first encounter with any given powerful capability will probably involve CoT with a weaker model rather than no-CoT with a stronger one.
(Suppose "GPT-n" can do X with CoT, and "GPT-(n+1)" can do X without CoT. Well, surely we'll build GPT-n before GPT-(n+1), and then we'll do CoT with the thing we've built, and so we'll observe a model doing X before GPT-(n+1) even exists.)
See also my post here, which (among other things) discusses the result shown in davidad's chart, drawing conclusions from it that are closer to those which the authors of the paper had in mind when plotting it.
My understanding of capabilities training is that there are a lot of knobs and fiddly bits and characteristics of your data and if you screw them up then the thing doesn’t work right, but you can tinker with them until you get them right and fix the issues, and if you have the experience and intuition you can do a huge ‘YOLO run’ where you guess at all of them and have a decent chance of that part working out.
Pressman is almost certainly not referring to YOLO runs, but rather to stuff like frankenmerges, where you can take random bits from completely different neural networks, stick them together in a way that looks plausible, and it just works. For a while the top open source model was Goliath, a model created in this way. It's also frequently the case that researchers discover they failed to correctly implement some aspect of a model, and yet it still trained just fine.
Edit: Whoops, "Head of Mission Alignment" is actually a person responsible for "working across the company to ensure that we get all pieces (and culture) right to be in a place to succeed at the mission", and not the head of alignment research. Disregard the below.
In other words, the new head of AI alignment at OpenAI is on record lecturing EAs that misalignment risk from AGI is not real.
It was going to happen eventually. If you pick competent people who take the jobs you give them seriously, and you appoint them the Alignment Czar, and then they inevitably converge towards thinking your safety policy is suicidal and they run away from your company, you'll need to either change your policy, or stop appointing people who take their jobs seriously to that position.
I'd been skeptical of John Schulman, given his lack of alignment-related track record and likely biases towards approaching the problem via the ML modus operandi. But, evidently, he took his job seriously enough to actually bother building a gears-level model of it, at which point he decided to run away. They'd tried to appoint someone competent but previously uninterested in the safety side of things to that position – and that didn't work.
Now they're trying a new type of person for the role: someone who comes in with strong preconceptions against taking the risks seriously. I expect that he's either going to take his job seriously anyway (and jump ship within, say, a year), or he's going to keep parroting the party line without deeply engaging with it (and not actually do much competent work, i.e. he's just there for PR).
I'm excited to see how the new season of this hit sci-fi telenovela is going to develop.
Sam: “The model (o1) is going to get so much better so fast […] Maybe this is the GPT-2 moment, we know how to get it to GPT-4”. So plan for the model to get rapidly smarter.
I notice I am skeptical, because of how I think about the term ‘smarter.’ I think we can make it, maybe the word is ‘cleverer’? Have it use its smarts better.
AlphaZero gets to be actually smarter, that's a real possibility. If they only scaled reasoning RL slightly, but there are four orders of magnitude more where that came from, even o1-mini might turn into an alien tiger when it no longer needs to spend its parameters on remembering all the world's trivia. It's merely "cleverer" now, but a natural reading of GPT-2 to GPT-4 transition permits actual improvement in smartness that is distinct from how LLMs get smarter with scale.
- Building a superintelligence under current conditions will turn out fine.
- No one will build a superintelligence under anything like current conditions.
- We must prevent at almost all costs anyone building superintelligence soon.
I don't think this is a valid trilemma: Between fine and worth preventing at "almost all costs" there is a pretty large gap. I think "fine" was intended to mean "we don't all die" or something as bad as that.
A Narrow Path is a newly written plan for allowing humanity to survive the path to superintelligence. Like the plan or hate the plan, this at least is indeed a plan, that tries to lay out a path that might work.
I just submitted an essay yesterday to the Cosmos essay contest entitled, "A path to Human Autonomy". There are enough parallels with 'A Narrow Path' that I couldn't help but laugh when I saw this. At least others are thinking along the same lines I am!
Of course, there are also differences. My essay emphasizes some different points, such as the troubling intersection of self-replicating weapons and AI. A different understanding of the threat landscape leads to different ideas of how to avoid those threats.
“If you’re building an AGI, it’s like building a Saturn V rocket [but with every human on it]. It’s a complex, difficult engineering task, and you’re going to try and make it aligned, which means it’s going to deliver people to the moon and home again.
People ask “why assume they won’t just land on the Moon and return home safely?”
And I’m like, because you don’t know what you’re doing!
If you try to send people to the moon and you don’t know what you’re doing, your astronauts will die.
[Unlike the telephone, or electricity, where you can assume it’s probably going to work out okay] I contend that ASI is more like the moon rocket.
“The moon is small compared with the rest of the sky, so you don’t get to the moon by default – you hit some part of the sky that isn’t the moon. So, show me the plan by which you predict to specifically hit the moon.”
While I don't want to defend the public's reasoning on AI alignment, because often they have very confused beliefs on what AI alignment even is, I do think that something like this claim is very likely false, and that the analogy is very, very poor actually.
The basic reason for this is that AI values, as well as AIs themselves, are way, way more robust to things going wrong than the rocket analogy often suggests:
https://www.lesswrong.com/posts/wAczufCpMdaamF9fy/my-objections-to-we-re-all-gonna-die-with-eliezer-yudkowsky#Yudkowsky_mentions_the_security_mindset__
https://www.lesswrong.com/posts/JcLhYQQADzTsAEaXd/?commentId=7iBb7aF4ctfjLH6AC
At a fundamental level, I think this is one of my biggest divides with people like @Eliezer Yudkowsky and @Rob Bensinger et al, in that I think alignment is an easier target to hit than making a working rocket that goes to the moon.
Introduction: Better than a Podcast
Andrej Karpathy continues to be a big fan of NotebookLM, especially its podcast creation feature. There is something deeply alien to me about this proposed way of consuming information, but I probably shouldn’t knock it (too much) until I try it?
Others are fans as well.
Delip Rao gives the engine two words repeated over and over, and the AI podcast hosts describe what it says about the meaning of art for ten minutes.
So I figured: What could be a better test than generating a podcast out of this post (without the question, results or reaction)?
I tried to do that, deselecting the other AI posts and going to town. This was the result. Unfortunately, after listening, it seems that deselecting posts from a notebook doesn't take them out of the info used for podcast generation, so that was more of an overall take on AIs ~40-64 plus the latest one.
In some ways I was impressed. The host voices and cadences are great, there were no mistakes, absurdities or factual errors, everything was smooth. In terms of being an actual substitute? Yeah, no. It did give me a good idea of which ideas are coming across ‘too well’ and taking up too much mindspace, especially things like ‘sci-fi.’ I did like that it led with OpenAI issues, and it did a halfway decent job with the parts it did discuss. But this was not information dense at all, and no way to get informed.
I then tried again with a fresh notebook, to ensure I was giving it only AI #84, which then started off with OpenAI’s voice mode as one would expect. This was better, because it got to a bunch of specifics, which kept it on target. If you do use the podcast feature I’d feed it relatively small input chunks. This still seemed like six minutes, not to do Hamlet, but to convey maybe two minutes of quick summary reading. Also it did make an important error that highlighted a place I needed to make my wording clearer – saying OpenAI became a B-corp rather than that it’s going to try and become one.
There were indeed, as usual, many things it was trying to summarize. OpenAI had its dev day products. Several good long posts or threads laid out arguments, so I’ve reproduced them here in full partly for reference. There was a detailed proposal called A Narrow Path. And there’s the usual assortment of other stuff, as well.
Table of Contents
Language Models Offer Mundane Utility
OpenAI’s Advanced Voice Mode is really enjoying itself.
It enjoys itself more here: Ebonics mode activated.
Sarah Constantin requests AI applications she’d like to see. Some very cool ideas in here, including various forms of automatic online content filtering and labeling. I’m very tempted to do versions of some of these myself when I can find the time, especially the idea of automatic classification of feeds into worthwhile versus not. As always, the key is that if you are going to use it on content you would otherwise need to monitor fully, hitting false negatives is very bad. But if you could aim it at sources you would otherwise be okay missing, then you can take a hits-based approach.
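To make the hits-based approach concrete, here is a minimal sketch of a worthwhile-versus-not feed classifier, assuming the standard OpenAI Python SDK; the model name and prompt are placeholders rather than recommendations, and the point is that anything it drops comes from sources you were willing to miss anyway.

```python
# Minimal sketch of a hits-based feed classifier. Point it only at sources you
# could afford to miss entirely, so a false negative costs you little.
# Assumes the standard OpenAI Python SDK; model name and prompt are placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def worth_reading(item_text: str) -> bool:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; any cheap model should do for triage
        messages=[
            {"role": "system", "content": "Decide if this feed item is worth a busy reader's time. Answer only YES or NO."},
            {"role": "user", "content": item_text},
        ],
        max_tokens=3,
        temperature=0,
    )
    return response.choices[0].message.content.strip().upper().startswith("Y")

feed_items = ["first item text...", "second item text..."]
worthwhile = [item for item in feed_items if worth_reading(item)]
```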
Language Models Don’t Offer Mundane Utility
Llama 3.2 ‘not available for download’ in the EU, unclear exactly which regulatory concern or necessary approval is the bottleneck. This could be an issue for law-abiding corporations looking to use Llama 3.2. But of course, if an individual wants to download and use it in the EU, and is competent enough that this is a good idea, I am confident they can figure out how to do that.
The current status of AI agents:
Buck was, of course, very much asking for it, and essentially chose to let this happen. One should still note that this type of proactive messing with things in order to get to the goal is the default behavior of such agents. Currently they’re quite bad at it.
As usual, you can’t get the utility from the AI unless you are willing to listen to it, but also we need to be careful how we score.
When measuring something called ‘diagnostic reasoning’ on a set of cases to diagnose, GPT-4 alone (92%) did much better than doctors (73%), and also much better than doctors plus GPT-4 (77%). So by that measure, the doctors would be much better fully out of the loop and delegating the task to GPT-4.
Ultimately, though, diagnosis is not a logic test, or a ‘match the logic we think you should use’ test. What we mostly care about is accuracy. GPT-4 had the correct diagnosis in 66% of cases, versus 62% for doctors.
My strong guess is that doctors learn various techniques that are ‘theoretically unsound’ in terms of their logic, or that take into account things that are ‘not supposed to matter’ but that do correlate with the right answer. And they’ve learned what approaches and diagnoses lead to good outcomes, rather than aiming for pure accuracy, because this is part of a greater system. That all mostly works in practice, while they get penalized heavily for it on ‘reasoning’ tests.
Indeed, this suggests that one future weakness of AIs will be if we succeed in restricting what things they can consider, actually enforcing a wide array of ‘you are not allowed to consider factor X’ rules that humans routinely pay lip service to and then ignore.
It is already well known that if the AI is good enough, the humans will in many settings mess up and violate the Fundamental Theorem of Informatics. It’s happened before. At some point, even when you think you know better, you’re on average wrong, and doctors are not about to fully trust an AI on diagnosis until you prove to them they should (and often not even then, but they should indeed demand that much).
Copyright Confrontation
Mark Zuckerberg was asked to clarify his position around content creators whose work is used to create and train commercial products, in case his prior work had made his feelings insufficiently clear.
He was happy to oblige, and wants to be clear that his message is: Fuck you.
So you’re going to give them a practical way to exercise that option, and if they say no and you don’t want to bother paying them or they ask for too much money then you won’t use their content?
Somehow I doubt that is his intention.
Deepfaketown and Botpocalypse Soon
Levelsio predicts the social media platform endgame of bifurcation. You have free places where AIs are ubiquitous, and you have paid platforms with only humans.
I agree that some form of gatekeeping seems inevitable. We have several reasonable choices.
The most obvious is indeed payment. If you charge even a small amount, such as $10/month or perhaps far less, then one already ‘cannot simply’ deploy armies of AI slop. The tax is unfortunate, but highly affordable.
Various forms of proof of identity also work. You don’t need Worldcoin. Anything that is backed by a payment of money or a scarce identity will be fine. For example, if you require a working phone number and subscription with a major phone carrier, that seems like it would work, since faking that costs money? There are several other good alternatives.
Indeed, the only core concept is ‘you post a bond of some kind so if you misbehave there is a price.’ Any payment, either money or use of a scarce resource, will do.
I can also think of other solutions, involving using AI and other algorithms, that should reliably solve the issues involved. This all seems highly survivable, once we bring ourselves to care sufficiently. Right now, the problem isn’t so bad, but also we don’t care so much.
What about scam calls?
I for one downplay them because if the problem does get bad we can ‘fix them in post.’ We can wait for there to be some scam calls, then adjust to mitigate the damage, then adjust again and so on. Homeostasis should set in; the right number of scam calls is not zero.
There are obvious solutions to scam calls and deepfakes. We mostly don’t use them now because they are annoying and time consuming relative to not doing them, so they’re not worth using yet except in certain high-value situations. In those situations, we do use (often improvised and lousy) versions already.
They Took Our Jobs
The latest version of a common speculation on the software engineer market, which is super soft right now, taking things up a notch.
There is clearly an AI-fueled flooding-the-zone and faking-the-interviews application crisis. Giving up on hiring entirely seems like an extreme reaction. You can decline to fill entry-level roles for a bit, but the damage should quickly compound.
The problem should be self-limiting. If the job market gets super soft, that means there will be lots of good real candidates out there. Those candidates, knowing they are good, should be willing to send costly signals. This can mean ‘build cool things,’ it should also mean hard to fake things like the 4.0 GPA, and also being willing to travel in-person for interviews so they can’t cheat on them using AI. Recruiters with a reputation to uphold also seem promising. There are a number of other promising candidate strategies as well.
Tyler Cowen suggests granting tenure on the basis of what you contribute to major AI models. The suggested implementation is somehow even crazier than that sounds, if one were to take it the slightest bit seriously. A fun question is, if this is the right way to grant tenure, what is the tenure for, since clearly we won’t in this scenario need professors that much longer, even if the humans survive and are fine?
How long until we no longer need schools?
There are two clocks ticking here.
I think that, today, an average 16 year old would learn better at home with an AI tutor than at a typical school, even if that ‘AI tutor’ was simply access to AIs like Gemini, NotebookLM, Claude and ChatGPT plus an AI coding assistant. Specialization is even better, but not required. You combine the AI with textbooks and other sources, and testing, with the ability to reach a teacher or parent in a pinch, and you’re good to go.
Of course, the same is true for well-motivated teens without the AI. The school was already only holding them back and now AI supercharges their independent studies.
Six years from now, I don’t see how that is even a question. Kids likely still will go to schools, but it will be a wasteful anachronism, the same way many of our current methods are, as someone once put it, ‘pre-Gutenberg.’ We will justify it with some nonsense, likely about socialization or learning discipline. It will be super dumb.
The question is, will a typical six year old, six years from now, be at a point where they can connect with the AI well enough for that to work? My presumption, given how well voice modes and multimodal with cameras are advancing, is absolutely yes, but there is some chance that kids that young will be better off in some hybrid system for a bit longer. If the kid is 10 at that point? I can’t see how the school makes any sense.
But then, the justifications for our schools have always been rather nonsensical.
The Art of the Jailbreak
A new jailbreak technique is MathPrompt, encoding harmful prompts into mathematical problems. They report a success rate of 73.6% across 13 SotA LLMs.
Get Involved
UK AISI is hiring three societal impacts workstream leads.
Introducing
AlphaChip, Google DeepMind’s AI for designing better chips with which to build smarter AIs, which they have decided, for some bizarre reason, should be open sourced. That would not have been my move. File under ‘it’s happening.’
ChatGPT advanced voice mode comes to the UK. The EU is still waiting.
OpenAI Dev Day
OpenAI used Dev Day to ship new tools for developers. In advance, Altman boasted about some of the progress we’ve made over the years in decreasing order of precision.
What’s a little drama between no longer friends?
All right, let’s get more detailed.
They also doubled the API rate limits on o1 to 10k per minute, matching GPT-4.
Here’s a livestream thread of Lizzie being excited. Here’s Simon Willison’s live blog.
Here’s their general purpose pricing page, as a reference.
Prompt Caching is automatic now for prompts above 1,024 tokens, offering a 50% discount for anything reused. They’re cleared after about 5-10 minutes of inactivity. This contrasts with Claude, where you have to tell it to cache but the discount you get is 90%.
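A minimal sketch of how you would structure requests to benefit from this, assuming the standard OpenAI Python SDK: since caching is automatic and keyed on a repeated prefix, the move is simply to put the long static part of the prompt first and the per-request content last.

```python
# Minimal sketch: prompt caching needs no special flag, it applies to a repeated
# prefix of 1,024+ tokens, so keep the stable instructions first and vary only
# the tail. Assumes the standard OpenAI Python SDK; model name is a placeholder.
from openai import OpenAI

client = OpenAI()

LONG_STATIC_PROMPT = "..."  # 1,024+ tokens of instructions, schemas, few-shot examples

def ask(question: str):
    return client.chat.completions.create(
        model="gpt-4o",  # placeholder
        messages=[
            {"role": "system", "content": LONG_STATIC_PROMPT},  # identical every call, so cacheable
            {"role": "user", "content": question},              # the part that varies
        ],
    )

ask("Summarize document A.")
ask("Summarize document B.")  # the repeated prefix should now be billed at the ~50% cached rate
```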
Model Distillation helps developers use o1-preview and GPT-4o outputs to fine-tune models like GPT-4o mini. It uses stored completions to build data sets, a beta of evals that run continuously while you train, and integration with fine-tuning. You give it an evaluation function and a set of stored examples, and they handle the rest. After the free samples in October it will cost what fine-tuning already costs. It makes a lot of sense to emphasize this; it is very good for business.
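A rough sketch of the stored-completions half of that workflow; the `store` and `metadata` parameters are my reading of what was announced, so treat the exact names as assumptions and check the API reference.

```python
# Sketch of collecting teacher outputs for distillation. The `store` and `metadata`
# parameters here are assumptions based on the Dev Day announcement; verify against
# the current API reference before relying on them.
from openai import OpenAI

client = OpenAI()

# 1. Generate answers with the bigger "teacher" model and persist them server-side.
completion = client.chat.completions.create(
    model="o1-preview",  # teacher model
    messages=[{"role": "user", "content": "Explain prompt caching in one paragraph."}],
    store=True,                                       # keep this completion for dataset building
    metadata={"task": "docs-qa", "split": "train"},   # tags you can filter on later
)

# 2. In the dashboard: filter stored completions by metadata, turn them into a
#    fine-tuning dataset, attach your eval, and fine-tune the smaller student
#    model (e.g. GPT-4o mini) on the teacher's outputs.
```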
Vision is now available in the fine-tuning API. They claim as few as 100 images can improve performance on specific tasks, like localized street sign recognition or identifying local UI elements.
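For concreteness, a guess at what one training example might look like, assuming the vision fine-tuning data uses the same JSONL-of-chat-messages format as image inputs elsewhere in the API; the URLs and labels are made up.

```python
# Hypothetical vision fine-tuning example, assuming the JSONL format mirrors the
# chat message format for image inputs; URLs and labels are made up for illustration.
import json

example = {
    "messages": [
        {"role": "system", "content": "You identify the street sign in the photo."},
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What does this sign say?"},
                {"type": "image_url", "image_url": {"url": "https://example.com/sign_001.jpg"}},
            ],
        },
        {"role": "assistant", "content": "No parking, 8am to 6pm, Monday through Friday."},
    ]
}

# Fine-tuning data is JSONL: one example object per line.
with open("street_signs.jsonl", "a") as f:
    f.write(json.dumps(example) + "\n")
```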
What does it mean to have a ‘realtime API’? It means exactly that: you can use an API to handle queries from the user while they’re talking in voice mode. The intent is to let you build something like ChatGPT’s Advanced Voice Mode within your own app, without requiring you to string together different tools for handling inputs and outputs.
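For flavor, a bare-bones sketch of what talking to it might look like over a WebSocket; the endpoint, headers, and event names here are my recollection of the announcement rather than verified documentation, so treat all of them as assumptions.

```python
# Bare-bones realtime API sketch. The endpoint URL, beta header, and event names
# are assumptions based on the announcement; check the current docs before use.
import asyncio
import json
import os

import websockets  # pip install websockets

async def main():
    url = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview"  # assumed endpoint
    headers = {
        "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        "OpenAI-Beta": "realtime=v1",  # assumed beta header
    }
    # Note: newer versions of the websockets library call this `additional_headers`.
    async with websockets.connect(url, extra_headers=headers) as ws:
        # Ask the model for a response to whatever is currently in the session.
        await ws.send(json.dumps({
            "type": "response.create",
            "response": {"modalities": ["text"], "instructions": "Say hello."},
        }))
        # The server streams events back (text deltas, audio chunks, etc.).
        async for raw in ws:
            event = json.loads(raw)
            print(event.get("type"))
            if event.get("type") == "response.done":
                break

asyncio.run(main())
```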
They provided a demo of an AI agent making a phone call on your behalf, and in theory (the other end of the call was the person on stage) spending almost $1500 to buy chocolate covered strawberries. This was very much easy mode on every level. We should on many levels be impressed it can do this at all, but we’ve seen enough elsewhere that this much is no surprise. Also note that even in the demo there was an important hitch: the AI was not told how to pay, and jumped to saying it would pay for the full order in cash without confirming that. So there are definitely some kinks.
The first thing I saw someone else build was called Live Roleplays, an offering from Speak to help with language learning, which OpenAI demoed on stage. This has always been what I’ve seen as the most obvious voice mode use case. There’s a 15 second sample video included at the link and on their blog post.
I’m definitely excited for the ‘good version’ of what Speak is building, whether or not Speak is indeed building a good version, or whether OpenAI’s new offerings are a key step towards that good version.
We do need to lower the price a bit, right now this is prohibitive for most uses. But if there’s one thing AI is great at, it’s lowering the price. I have to presume that they’re not going to charge 10x-20x the cost of the text version for that long. Right now GPT-4o-realtime-preview is $5/$20 for a million text tokens, $100/$200 for a million audio tokens.
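A quick back-of-the-envelope check using those quoted prices, with made-up token counts just to show the shape of the math: the audio tokens dominate, which is why this feels prohibitive for casual use.

```python
# Back-of-the-envelope session cost at the quoted preview prices.
# Token counts below are illustrative guesses, not measurements.
TEXT_IN, TEXT_OUT = 5 / 1_000_000, 20 / 1_000_000       # $ per text token (in / out)
AUDIO_IN, AUDIO_OUT = 100 / 1_000_000, 200 / 1_000_000  # $ per audio token (in / out)

audio_in_tokens, audio_out_tokens = 10_000, 10_000
text_in_tokens, text_out_tokens = 2_000, 2_000

cost = (audio_in_tokens * AUDIO_IN + audio_out_tokens * AUDIO_OUT
        + text_in_tokens * TEXT_IN + text_out_tokens * TEXT_OUT)
print(f"${cost:.2f}")  # ≈ $3.05 for this hypothetical session, almost all of it audio
```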
If you can take care of that, Sully is excited, as would be many others.
McKay Wrigley is always fun for the ‘this will change everything’ take, and here you go:
The presentation also spent a bunch of time emphasizing progress on structured outputs and explaining how to use them properly, so you get useful JSONs.
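Here is roughly what that looks like in practice, as a minimal sketch assuming the standard OpenAI Python SDK; the schema and model name are placeholders.

```python
# Minimal structured-outputs sketch: constrain the reply to a JSON schema so you
# get a useful JSON object instead of free text. Schema and model are placeholders.
import json
from openai import OpenAI

client = OpenAI()

ticket_schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "priority": {"type": "string", "enum": ["low", "medium", "high"]},
    },
    "required": ["title", "priority"],
    "additionalProperties": False,
}

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder; needs a model that supports structured outputs
    messages=[{"role": "user", "content": "File a ticket: the login page is down."}],
    response_format={
        "type": "json_schema",
        "json_schema": {"name": "ticket", "strict": True, "schema": ticket_schema},
    },
)

ticket = json.loads(response.choices[0].message.content)  # should conform to the schema in strict mode
print(ticket["title"], ticket["priority"])
```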
These quotes are from the chat at the end between Altman and Kevin Weil, via Simon Willison’s live blog; we also have notes from Greg Kamradt, but those don’t always differentiate who said what:
I don’t think enterprises should be able to get 60 days notice, but it would indeed be nice if OpenAI itself got 60 days notice, for various safety-related reasons?
Is that what the Preparedness Framework says to do? This makes the dangerous assumption that you can establish the capabilities, and then fix the safety issues later in post.
So, gulp?
We’ve gone so far backward that Sam Altman needs to reassure us that they at least have some people ‘thinking about’ the ways this all goes wrong, while calling them ‘sci-fi ways’ in order to delegitimize them. Remember when this was 20% of overall compute? Now it’s ‘we have people thinking about that.’
Also this:
Well, yes, I suppose it is, given that we don’t have anything else and OpenAI has no intention of trying hard to build anything else. So, iterative deployment, then, and the hope that when things go wrong we are always still in charge and around sufficiently to fix it for next time.
What are they going to do with the AGIs?
This must be some use of the word ‘safe’ that I wasn’t previously aware of? Or it’s expressing a hope of some kind, perhaps?
I really, really do not think they have thought through the implications properly, here.
I notice I am skeptical, because of how I think about the term ‘smarter.’ I think we can make it, maybe the word is ‘cleverer’? Have it use its smarts better. But already the key limitation is that it is not actually smarter, in my way of thinking, than GPT-4, instead it’s finding ways to maximize the use of what smarts it does have.
Yes, after using it for a bit I will say I am ‘definitively smarter’ than o1. Perhaps I am prompting it badly but I have overall been disappointed in o1.
Is this a way of saying they don’t know how to do better than Gemini there?
Singing is disabled for now, it seems, due to copyright issues.
In Other AI News
Durk Kingma, part of the original founding team of OpenAI and more recently of Google DeepMind, joins Anthropic.
OpenAI is moving forward to raise $6.6 billion at a $157 billion valuation. That seems like a strangely small amount of money to be raising, both given their needs and given that valuation. Soon they will need far more.
OpenAI has asked investors to avoid backing rival start-ups such as Anthropic and xAI.
Quoted largely because I’m sad Musk couldn’t find ‘OpenlyEvilAI.’ This is all standard business practice. OpenAI has stopped pretending it is above all that.
A list of the top 20 companies by ‘generative AI patents’ in 2023 shows exactly why this is not a meaningful metric.
OpenAI and Anthropic revenue breakdown. Huge if true, they think Anthropic is highly competitive on the API side, but alas no one uses Claude.
They have OpenAI growing 285% year over year for subscriptions, 200% for API, whereas Anthropic is catching up with a 900% increase since last year. Whether that is sustained for the API will presumably depend on who has the better products going forward. For the consumer product, Claude is failing to break through to visibility, and it seems unrelated to product quality.
The best part of chatbot subscriptions is the profit margins are nuts. Most people, myself included, are paying miles more per token for subscriptions than we would pay for the API.
OpenAI got there early to capture the public imagination, and they’ve invested in voice mode and in responses people like and done a good job of that, and gotten good press for it all, such that ChatGPT is halfway to being the ‘Google,’ ‘xerox’ or ‘Kleenex’ of generative AI. I wonder how much of that is a lasting moat, versus being a choice of focus.
Long term, I’d think this is bullish for Anthropic. That’s huge year over year growth, and they’re fully competitive on the API, despite being supposedly valued at only something like 20% of OpenAI even taking into account all of OpenAI’s shall we say ‘issues.’ That seems too low.
BioNTech and Google DeepMind build biological research assistant AIs, primarily focused on predicting experimental outcomes, presumably to choose the right experiments. For now that’s obviously great, the risk concerns are obvious too.
The Mask Comes Off
Sigal Samuel writes at Vox that ‘OpenAI as we knew it is dead,’ pointing out that this consolidation of absolute power in Altman’s hands and abandonment of the non-profit mission involves stealing billions in value from a 501(c)(3) and handing it to investors and especially to Microsoft.
OpenAI planning to convert to a for-profit B-corporation is a transparent betrayal of the mission of the OpenAI non-profit. It is a clear theft of resources, a clear breach of the fiduciary duties of the new OpenAI board.
If we lived in a nation of laws and contracts, we would do something about this. Alas, we mostly don’t live in such a world, and every expectation is that OpenAI will ‘get away with it.’
So this is presumably the correct legal realist take on OpenAI becoming a B-corporation:
There are rules. Rules that apply to ‘the little people.’
Apple has withdrawn from the new funding round, and WSJ dropped this tidbit:
This means OpenAI is potentially in very deep trouble if they don’t execute the switch to a B-corporation. They’re throwing their cap over the wall. If they fail, venture investment becomes venture debt with a conversion option. If the companies request their money back, which conditional on this failure to secure the right to OpenAI’s profits seems not so unlikely, that could then be the end.
So can they pull off the heist, or will they get a Not So Fast?
To those asking why everyone doesn’t go down this path, the path isn’t easy.
One problem will be antitrust attention, since Microsoft had been relying on OpenAI’s unique structure to fend off such complaints.
I think the antitrust concerns are bogus and stupid, but many people seem to care.
The bigger question is, what happens to OpenAI’s assets?
That makes sense and matches my understanding. You can take things away from the 501(c)(3) world, but you have to pay fair market price for them. In this circumstance, the fair value of what is being taken away seems like quite a lot?
Gretchen Krueger, current AI policy researcher who was formerly at OpenAI, notes that her decision to join was partly due to the non-profit governance and profit cap, whereas if they’re removed now it is at least as bad as never having had them.
Carroll Wainwright, formerly of OpenAI, points out that Altman has proven himself a danger to OpenAI’s nonprofit mission, which has now been entirely abandoned, that you cannot trust him or OpenAI, and that the actions of the past year were collectively a successful coup by Altman against those in his way, rather than the other way around.
Greg Brockman offers his appreciation to the departing Barret, Bob and Mira.
Wojciech Zaremba, OpenAI cofounder still standing, offers his appreciation, and his sadness about the departures.
Everyone likes money. I like money. But does Sam Altman like money on a different level than I like money?
Joe Rogan argues that yes. The guy very much likes money.
This is certainly a fun argument. Is it a valid one? Or does it only say that he (1) already has a lot of money and (2) likes nice things like a $4 million car?
I think it’s Bayesian evidence that the person likes money, but not the kind of super strong evidence Joe Rogan thinks this is. If you have a thousand times as much money as I do, and this brings you joy, why wouldn’t you go for it? He can certainly afford it. And I do want someone like Altman appreciating nice things, and not to feel guilty about buying those things and enjoying himself.
It is however a different cultural attitude than the one I’d prefer to be in charge of a company like OpenAI. I notice I would never want such a car.
When I asked Claude what it indicates about someone driving such a model around town (without saying who the person in question was), it included that this was evidence of (among other things) status consciousness, attention-seeking and high risk tolerance, which all seems right and concerning. It also speaks to the image he chooses to project on these questions. Intentionally projecting that image is not easily compatible with having the attitude Altman will need in his position leading OpenAI.
Gwern, who predicted Mira’s departure, offered further thoughts a few months ago on the proposition that OpenAI has been a dead or rotting organization walking for a while now, and is rapidly losing its lead. One has to take into account the new o1 model in such assessments, but the part of this that resonates most is that the situation seems likely to be binary. Either OpenAI is ‘still OpenAI’ and can use its superior position to maintain its lead and continue to attract talent and everything else it takes. Or, if OpenAI is no longer so special in the positive ways, it gets weighed down by all of its unique problems, and continues to bleed its talent.
Deepa Seetharaman writes at the WSJ that Turning OpenAI Into a Real Business Is Tearing It Apart.
The majority of OpenAI employees have been hired since the Battle of the Board, and that would be true even if no one had left. That’s an extreme level of growth. It is very difficult to retain a good culture doing that. One likely shift is from a research culture to a product-first culture.
Noam Brown disagrees, and promises us that OpenAI still prioritizes research, in the wake of losing several senior researchers. I am sure there has been a substantial shift towards product focus, of course that does not preclude an increase in resources being poured into capabilities research. We do however know that OpenAI has starved their safety research efforts of resources and other support.
So far nothing that new, but I don’t think we’ve heard about this before:
This report does not mean that Murati and Brockman actually worried that the company would collapse, there are multiple ways to interpret this, but it does provide valuable color in multiple ways, including that OpenAI made and then rescinded an offer for Sutskever to return.
It’s also hard not to be concerned about the concrete details of the safety protocols around the release of GPT-4o:
I believe that releasing GPT-4o did not, in practice and on reflection, exceed what my threshold would be for persuasion, or otherwise involve any capabilities that would have caused me not to release it. And I do think it was highly reasonable not to give 4o the ‘full frontier model’ treatment, given it wasn’t pushing the frontier much.
It still is rather damning that, in order to score a marketing win, it was rushed out the door, after 20-hour days from the safety team, without giving the safety team the time they needed to follow their own protocols. Or that the model turned out to violate their own protocols.
I’ve seen at least one mocking response of ‘so where’s all the massive harm from 4o then?’ and that is not the point. The point is that the safety process failed and was overridden by management under the flimsy pressure of ‘we want to announce something before Google announces something.’ Why should we think that process will be followed later, when it matters?
Nor was this a one time incident. It was a pattern.
The article also heavily suggests that Brockman’s leave of absence, rather than being motivated by family concerns, was because his management style was pissing off too many employees.
The New York Times covers Altman’s grand compute expansion dreams and his attempts to make them reality. His ambitions have ‘scaled down’ to the hundreds of billions of dollars.
The article is full of people laughing at the sheer audacity and scale of Altman’s asks. But if no one is laughing at your requests, in an enterprise like this, you aren’t asking for enough. What is clear is that Altman does not seem to care which companies and nations he partners with, or what the safety or security implications would be. All he wants is to get the job done.
In Nate Silver’s book, there is a footnote that Altman told Silver that self-improving AI is ‘really scary’ and that OpenAI isn’t pursuing it. This is a highly bizarre statement to make, given that it contradicts OpenAI’s clearly stated policies, which include using o1 (aka Strawberry) to do AI research, the direct pursuit of AGI, and the former superalignment team (RIP), whose entire goal was to build an automated AI alignment researcher. So this quote shows how much Altman is willing to mislead.
The new head of mission alignment at OpenAI is Joshua Achiam. He’s said some useful and interesting things on Twitter at times, but also some deeply troubling things, such as:
[EDIT: My attempt to contact Joshua for comment and any thoughts or updates was stupidly executed, and he missed it. That’s on me. I’ve now reached him, and Joshua will respond with his thoughts and updates, likely in AI #85.]
In other words, the new head of mission alignment at OpenAI is on record lecturing EAs that misalignment risk from AGI is not real.
I do get the overall sense that Joshua is attempting to be helpful, but if the head of AI mission alignment at OpenAI does not believe in existential risk from AI misalignment, at all? If he thinks that all of our efforts should be in fighting human misuse?
Then that is perhaps the worst possible sign, if he indeed still holds onto such views. Effectively, OpenAI would be saying that they have no superalignment team, that they are not making any attempt to avoid AI killing everyone, and they intend to proceed without it.
The question then becomes: What do we intend to do about this?
Quiet Speculations
Perplexity for shopping?
I still often use both Google and Wikipedia, and was never using Bing in the first place, so let’s not get ahead of ourselves.
In some form or another, yes, of course the future of shopping looks like some version of ‘tell the AI what you want and it locates the item or candidate items for you, and checks for the lowest available price and whether the deal is reasonable, and then you can one-click to purchase it without having to deal with the particular website.’
The question is, how good does this have to be before it is good enough to use? Before it is good enough to use as a default? Use without sanity checking, even for substantial purchases? When will it get that reliable and good? When that happens, who will be providing it to us?
Dreaming Tulpa reports they’ve created smart glasses that automatically snap photos of people you see, identify them, search online and tell you tons of stuff about them, like phone number and home address, via streaming the camera video to Instagram.
So on the one hand all of this is incredibly useful, especially if it caches everything for future reference. I hate having to try and remember people’s names and faces, and having to be sure to exchange contact info and ask for basic information. Imagine if you didn’t have to worry about that, and your glasses could tell you ‘oh, right, it’s that guy, with the face’ and even give you key info about them. Parties would be so much more fun, you’d know what to talk to people about, you could stop missing connections, and so on. Love it.
Alas, there are then the privacy concerns. If you make all of this too smooth and too easy, it opens up some malicious and anti-social use cases as well. And those are exactly the types of cases that get the authorities involved to tell you no, despite all of this technically being public information. Most of all it wigs people out.
The good news, I think, is that there is not that much overlap in the Venn diagram between ‘things you would want to know about people’ and ‘things you would want to ensure other people do not know.’ It seems highly practical to design a product that is a win-win, that runs checks and doesn’t share certain specific things like your exact address or your social security number?
Mostly, though, the problem here is not even AI. The problem is that people are leaving their personal info exposed on the web. All the glasses are doing is removing the ‘extra steps.’
The Quest for Sane Regulations
Now that SB 1047 has been vetoed, but Newsom has said he wants us to try again with something ‘more comprehensive,’ what should it be? As I explained on Tuesday (recommended if you haven’t read it already), Newsom’s suggested approach of use-based regulation is a recipe for industry strangulation without helping with risks that matter, an EU-style disaster. But now is the time to blue sky, and think big, in case we can come up with something better, especially something that might answer Newsom’s objections while also, ya know, possibly working and not wrecking things.
Garrison Lovely makes it into the New York Times to discuss scandals at OpenAI and argue this supports the need for enhanced whistleblower protections. The case for such protections seems overwhelming to me, even if you don’t believe in existential risk mitigation at all.
The Week in Audio
Jack Clark offers the rogue state theory of AIs.
From August: Peter Thiel talked to Joe Rogan about a wide variety of things, and I had the chance to listen to a lot more of it.
His central early AI take here is bizarre. He thinks passing the Turing Test is big, with his justification largely being due to how important we previously thought it was, which seems neither here nor there. We agree that current Turing-level AIs are roughly ‘internet big’ (~8.0 on the Technological Richter Scale) in impact if things don’t advance from here, over the course of several decades. The weird part is where he then makes this more important than superintelligence, or thinks this proves superintelligence was an incorrect hypothesis.
I don’t understand the logic. Yes, the path to getting there is not what we expected, and it is possible things stop soon, but the progress so far doesn’t make superintelligence any less likely to happen. And if superintelligence does happen, it will undoubtedly be the new and probably last ‘most important event in the history of history,’ no matter whether that event proves good or bad for humans or our values, and regardless of how important AI had already been.
Peter then takes us on a wild ride through many other topics and unique opinions. He’s always fun and interesting to listen to, even (and perhaps especially) the parts where he seems utterly wrong. You’ve got everything from how and why they built the Pyramids to chimp political dynamics to his suspicions about climate science to extended takes on Jeffrey Epstein. It’s refreshing to hear fresh and unique wrong takes, as opposed to standard dumb wrong takes and especially Not Even Wrong takes.
That’s all in addition to the section on racing with China on AI, which I covered earlier.
Rhetorical Innovation
Democratic control is nice but have you experienced not dying?
Or: If it turns out Petrov defied ‘the will of the Soviet people’? I’m cool with that.
Robin Hanson’s point is valid and one can go far further than that, in the sense that we all ‘change the world’ every time we do anything or fail to do something, there will often be losers from our decisions, and obviously we should still be free to do most things without permission from another. Innovation is not special.
One must however be careful not to prove too much. Innovation does not mean you automatically need permission. It also does not mean you have or should have a free pass to change the world however you like. Robin and I would both draw the line to give permission to more things than America’s status quo, and indeed I expect to oppose many AI regulations upon mundane AI, starting with much of the EU AI Act (sorry I haven’t finished my summary there, the reason it’s not done is it hurts my eyes to look and I keep not forcing myself to finish it). I would still make an exception for things that would plausibly kill everyone, or otherwise plausibly have massive net negative externalities.
I’m also strongly with Yudkowsky here, not Leahy. My problem with everyone dying undemocratically is mostly the dying part, not the undemocratic one. I’d feel better I suppose in some karmic justice sense that the people ‘deserved it’ if they offered actually informed consent, but none of us would end up less dead.
Indeed, our founders knew this principle well. The Constitution is in large part designed to protect us from the majority doing various highly dumb things.
Your periodic reminder: Better start believing in science fiction stories, dear reader, you’re in one – regardless of how much additional AI progress we see.
Never mind o1. All the fictional characters in most of the science fiction I’ve read or seen over the years would be blown away by at least one of GPT-4 or what you can do with a smartphone without AI, often by both. You have all the world’s knowledge at your fingertips right now. I could go on. In general, anyone who calls something ‘science fiction’ should be considered to have invoked a variation of Godwin’s Law.
Here’s some fire from Robert Miles, via AI Notkilleveryoneism Memes. Video is at the link, and yes this whole thing is unbelievably exhausting.
It absolutely boggles my mind, every time, no matter how many times I hear it. People really will say, with a straight face, that building AIs smarter and more capable than us is a default-safe activity, and that letting everyone use them for whatever they want will go fine and turn out well for the humans unless I can show exactly how that goes wrong.
And each time, it’s like, seriously WTF everyone, sure I have a thousand detailed arguments for things likely to go wrong but why do I need to even bring them up?
In related metaphor news, here is a thread by Eliezer Yudkowsky, use your judgment on whether you need to read it, the rest of you can skip after the opening.
OK, this is the point where a lot of you can skip ahead, but I’ll copy the rest anyway to preserve it for easy reference and copying, cause it’s good.
My understanding of capabilities training is that there are a lot of knobs and fiddly bits and characteristics of your data and if you screw them up then the thing doesn’t work right, but you can tinker with them until you get them right and fix the issues, and if you have the experience and intuition you can do a huge ‘YOLO run’ where you guess at all of them and have a decent chance of that part working out.
The contrast is with the alignment part, with regard to the level you need for things smarter or more capable than people (exact thresholds unclear, hard to predict and debatable) which I believe is most definitely cursed, and where one must hit a narrow target. For mundane (or ‘prosaic’) alignment, the kludges we use now are mostly fine, but if you tried to ‘fly to the moon’ with them you’re very out of your test distribution, you were only kind of approximating even within the test, and I can assure you that you are not landing on that moon.
Roon offers wise words (in an unrelated thread), I fully endorse:
An example of what this failure mode looks like, in a response to Roon:
Remember Who Marc Andreessen Is
There is a school of thought that anything opposed to them is 1984-level totalitarian.
Marc Andreessen, and to a lesser extent Paul Graham, provide us with fully clean examples this week of how their rhetorical world works and what it means when they say words. So I wanted to note them for future reference, so I don’t have to keep doing it over and over again going forward, at least with Marc in particular.
Did someone point out that ‘you first’ is not a valid argument or gotcha against requiring a personal sacrifice that some claim would do good? While also opposing (the obligatory and also deeply true ‘degrowth sucks’) the sacrifice by pointing out it is stupid and would not, in fact, do good?
Well, that must mean the people pointing that out are totalitarians, favoring no personal choice, just top down control, of everything, forever.
So the next time people of that ilk call someone a totalitarian, or say that someone opposes personal choice, or otherwise haul out their slogans, remember what they mean when they say this.
The charitable interpretation is that they define ‘totalitarian’ as one who does not, in principle and in every case, oppose the idea of requiring people to do things not in that person’s self-interest.
Here is another example from this week of the same Enemies List attitude, and attributing to them absurdist things that have nothing to do with their actual arguments or positions, except to lash out at anyone with a different position or who cares about the quality of arguments:
I don’t think he actually believes the things he is saying. I don’t know if that’s worse.
As a bonus, here’s who Martin Casado is and what he cares about.
A Narrow Path
A Narrow Path is a newly written plan for allowing humanity to survive the path to superintelligence. Like the plan or hate the plan, this at least is indeed a plan, that tries to lay out a path that might work.
The core thesis is, if we build superintelligence before we are ready then we die. So make sure no one builds it until then.
I do agree that this much is clear: Until such time as we figure out how to handle superintelligence in multiple senses, building superintelligence would probably be collective suicide. We are a very long way from figuring out how to handle it.
If we build things that are plausibly AGIs, that directly creates a lot of mundane issues we can deal with and not-so-mundane intermediate issues that would be difficult to deal with. If that’s all it did, which is how many think about it, then you gotta do it.
The problem is: What we definitely cannot deal with is that once we build AGI, the world would rapidly build ASI, one way or another.
That’s what they realize we need to avoid doing for a while. You backchain from there.
Here is their plan. At whatever level of detail you prefer to focus on: Do you think it is sufficient? Do you think it is necessary? Can you think of a superior alternative that would do the job?
This is a tough ask even if you want it. It’s a gray area – are Claude and o1 AIs that can improve AIs? Not in the automated super scary way, but a software engineer with current AI is a lot more productive than one without it. What do you do when humans use AI code assistant tools? When they copy-paste AI code outputs? At what point does more of that change to something different? Can you actually stop it? How?
Similarly, they say ‘no AIs capable of breaking out of their environment’ but for a sufficiently unprotected environment, current AIs already are on the verge of being able to do this. And many will ‘set them free’ on purpose anyway.
Similarly, when interacting with the world and being given tools, what AI can we be confident will stay ‘bounded’? They suggest this can happen with safety justifications. It’s going to be tough.
Finally there is a limit to the ‘general intelligence’ of systems, which again you would need to somehow define, measure and enforce.
This is a long dense document (~80 pages). Even if we did get society and government to buy-in, there are tons of practical obstacles ahead on many levels. We’re talking about some very difficult to pin down, define or enforce provisions. This is very far from a model bill. Everything here would need a lot of iteration and vetting and debate, and there are various details I suspect are laid out poorly. And then you’d need to deal with the game theory and international aspects of the issue.
But it is a great exercise – instead of asking ‘what can we in practice hope to get done right now?’ they instead ask a different question ‘where do we need to go’ and then ‘what would it take to do something that would actually get there?’
You can of course disagree with how they answer those questions. But they are the right questions to ask. Then, if the answer comes back ‘anything that might work to get us a place we can afford to go is going to be highly not fun,’ as it well might, how highly not fun? Do you care more about it not working or things being not fun? Is there an alternative path or destination to consider?
No doubt many who read (realistically: glance at or lightly skim or feed into an LLM) this proposal will respond along the lines of ‘look at these horrible people who want to restrict X or require a license for Y’ or ‘they want a global government’ or ‘war on math’ or what not. And then treat that as that.
It would be good to resist that response.
Instead, treat this not as a call to implement anything. Rather treat this as a claim that if we did XYZ, then that is plausibly sufficient, and that no one has a less onerous plan that is plausibly sufficient. And until we can find a better approach, we should ask what might cut us off from being able to implement that plan, versus what would enable us to choose if necessary to walk that path, if events show it is needed.
One should respond on that level. Debate the logic of the path.
Either argue it is insufficient, or it is unnecessary, or that it flat out won’t work or is not well defined, or suggest improvements, point out differing assumptions and cruxes, including doing that conditional on various possible world features.
This can include ‘we won’t get ASI anyway’ or ‘here are less painful measures that are plausibly sufficient’ or ‘here’s why there was never a problem in the first place, creating things smarter than ourselves is going to go great by default.’ And these can include many good objections about detail choices, implementations, definitions, and so on, which I haven’t dug into in depth. There are a lot of assumptions and choices here that one can and should question, and requirements that can be emphasized.
Ultimately, if your actual point of view is something like ‘I believe that building ASI would almost certainly go fine [because of reasons]’ then you can say that and stop there. Or you can say ‘I believe building ASI now is fine, but let’s presume that we’ve decided for whatever reason that this is wrong’ and then argue about what alternative paths might prevent ASI from being built soon.
The key is you must pick one of these:
- Building a superintelligence under current conditions will turn out fine.
- No one will build a superintelligence under anything like current conditions.
- We must prevent at almost all costs anyone building superintelligence soon.
Thus, be clear which of these you are endorsing. If it’s #1, fine. If it’s #2, fine.
If you think it’s too early to know if #1 or #2 is true, then you want to keep your options open.
If you know you won’t be able to bite either of those first two bullets? Then it’s time to figure out the path to victory, and talk methods and price. And we should do what is necessary now to gather more information, and ensure we have the option to walk down such paths.
That is very different from saying ‘we should write this agenda into law right now.’ Everyone involved understands that this would be overdeterminedly premature.
Aligning a Smarter Than Human Intelligence is Difficult
A fun feature of interpretability is that the model needs to be smart enough that you can understand it, but not so smart that you stop understanding it again.
Or, similarly, that the model needs to be smart enough to be able to get a useful answer out of a human-style Chain of Thought, without being smart enough to no longer get a useful answer out of a human-style Chain of Thought. And definitely without it being smart enough that it’s better off figuring out the answer and then backfilling in a Chain of Thought to satisfy the humans giving the feedback, a classic alignment failure mode.
I think Davidad is correct here.
The Wit and Wisdom of Sam Altman
Always remember to reverse any advice you hear, including the advice to reverse any advice you hear:
My experience is that almost no one gets this correct. People usually do one of:
The good news is, even your system-1 instinctive guess on whether you are doing too little or too much thinking versus doing is almost certainly correct.
And yes, I do hope everyone sees the irony of Altman telling everyone to take more time to think harder about whether they’re working on the right project.
The Lighter Side
The perfect subtweet doesn’t exist.
(See the section The Mask Comes Off and this in particular if you don’t have the proper context.)