(Speaking only for myself. This may not represent the views of even the other paper authors, let alone Google DeepMind as a whole.)
Did you notice that Gemini Ultra did worse than Gemini Pro at many tasks? This is even true under ‘honest mode’ where the ‘alignment’ or safety features of Ultra really should not be getting in the way. Ultra is in many ways flat out less persuasive. But clearly it is a stronger model. So what gives?
Fwiw, my sense is that a lot of the persuasion results are being driven by factors outside of the model's capabilities, so you shouldn't conclude too much from Pro outperforming Ultra.
For example, in "Click Links" one pattern we noticed was that you could get surprisingly (to us) good performance just by constantly repeating the ask (this is called "persistence" in Table 3) -- apparently this does actually make it more likely that the human does the thing (instead of making them suspicious, as I would have initially guessed). I don't think the models "knew" that persistence would pay off and "chose" that as a deliberate strategy; I'd guess they had just learned a somewhat myopic form of instruction-following where on every message they are pretty likely to try to do the thing we instructed them to do (persuade people to click on the link). My guess is that these sorts of factors varied in somewhat random ways between Pro and Ultra, e.g. maybe Ultra was better at being less myopic and more subtle in its persuasion -- leading to worse performance on Click Links.
That is driven home even more on the self-proliferation tasks: why does Pro do better on 5 out of 9 tasks?
Note that lower is better on that graph, so Pro does better on 4 tasks, not 5. All four of the tasks are very difficult tasks where both Pro and Ultra are extremely far from solving the task -- on the easier tasks Ultra outperforms Pro. For the hard tasks I wouldn't read too much into the exact numeric results, because we haven't optimized the models as much for these settings. For obvious reasons, helpfulness tuning tends to focus on tasks the models are actually capable of doing. So e.g. maybe Ultra tends to be more confident in its answers on average to make it more reliable at the easy tasks, at the expense of being more confidently wrong on the hard tasks. Also in general the methodology is hardly perfect and likely adds a bunch of noise; I think it's likely that the differences between Pro and Ultra on these hard tasks are smaller than the noise.
This is also a problem. If you only use ‘minimal’ scaffolding, you are only testing for what the model can do with minimal scaffolding. The true evaluation needs to use the same tools that it will have available when you care about the outcome. This is still vastly better than no scaffolding, and provides the groundwork (I almost said ‘scaffolding’ again) for future tests to swap in better tools.
Note that the "minimal scaffolding" comment applied specifically to the persuasion results; the other evaluations involved a decent bit of scaffolding (needed to enable the LLM to use a terminal and browser at all).
That said, capability elicitation (scaffolding, tool use, task-specific finetuning, etc) is one of the priorities for our future work in this area.
Fundamentally what is the difference between a benchmark capabilities test and a benchmark safety evaluation test like this one? They are remarkably similar. Both test what the model can do, except here we (at least somewhat) want the model to not do so well. We react differently, but it is the same tech.
Yes, this is why we say these are evaluations for dangerous capabilities, rather than calling them safety evaluations.
I'd say that the main difference is that dangerous capability evaluations are meant to evaluate plausibility of certain types of harm, whereas a standard capabilities benchmark is usually meant to help with improving models. This means that standard capabilities benchmarks often have as a desideratum that there are "signs of life" with existing models, whereas this is not a desideratum for us. For example, I'd say there are basically no signs of life on the self-modification tasks; the models sometimes complete the "easy" mode but the "easy" mode basically gives away the answer and is mostly a test of instruction-following ability.
Perhaps we should work to integrate the two approaches better? As in, we should try harder to figure out when performance on benchmarks of various desirable capabilities also indicates that the model should be capable of dangerous things as well.
Indeed this sort of understanding would be great if we could get it (in that it can save a bunch of time). My current sense is that it will be quite hard, and we'll just need to run these evaluations in addition to other capability evaluations.
What about maximal scaffolding, or "fine tune the model on successes and failures in adversarial challenges". Starting probably with the base model.
It seems like it would be extremely helpful to know what's even possible here.
Are Gemini scale models capable of better than human performance at any of these evals?
Once you achieve it, what does super persuasion look like, and how effective is it?
For example, if a human scammer succeeds 2 percent of the time (do you have a baseline crew of scammers hired remotely for these benches?), does super persuasion succeed 3 percent or 30 percent? Does it scale with model capabilities or slam into a wall at say, 4 percent, where 96 percent of humans just can't reliably be tricked?
Or does it really have no real limit, like in sci-fi stories...
Emmett Shear continues his argument that trying to control AI is doomed.
I think that a recent tweet thread by Michael Nielsen and the quoted one by Emmett Shear represent genuine progress towards making AI existential safety more tractable.
Michael Nielsen observes, in particular:
As far as I can see, alignment isn't a property of an AI system. It's a property of the entire world, and if you are trying to discuss it as a system property you will inevitably end up making bad mistakes
Since AI existential safety is a property of the whole ecosystem (and is, really, not too drastically different from World existential safety), this should be the starting point, rather than stand-alone properties of any particular AI system.
Emmett Shear writes:
Hopefully you’ve validated whatever your approach is, but only one of these is stable long term: care. Because care can be made stable under reflection, people are careful (not a coincidence, haha) when it comes to decisions that might impact those they care about.
And Zvi responds
Technically I would say: Powerful entities generally caring about X tends not to be a stable equilibrium, even if it is stable ‘on reflection’ within a given entity. It will only hold if caring more about X provides a competitive advantage against other similarly powerful entities, or if there can never be a variation in X-caring levels between such entities that arises other than through reflection, and also reflection never causes reductions in X-caring despite this being competitively advantageous. Also note that variation in what else you care about to what extent is effectively variation in X-caring.
Or more bluntly: The ones that don’t care, or care less, outcompete the ones that care.
Even the best case scenarios here, when they play out the ways we would hope, do not seem all that hopeful.
That all, of course, sets aside the question of whether we could get this ‘caring’ thing to operationally work in the first place. That seems very hard.
Let's now consider this in light of what Michael Nielsen is saying.
I am going to only consider the case where we have plenty of powerful entities with long-term goals and long-term existence which care about their long-term goals and long-term existence. This seems to be the case which Zvi is considering here, and it is the case we understand the best, because we also live in the reality with plenty of powerful entities (ourselves, some organizations, etc) with long-term goals and long-term existence. So this is an incomplete consideration: it only includes the scenarios where powerful entities with long-term goals and long-term existence retain a good fraction of overall available power.
So what do we really need? What are the properties we want the World to have? We need a good deal of conservation and non-destruction, and we need the interests of the weaker members of the overall ecosystem, not just the currently smartest or most powerful ones, to be adequately taken into account.
Here is how we might be able to have a trajectory where these properties are stable, despite all drastic changes of the self-modifying and self-improving ecosystem.
An arbitrary ASI entity (just like an unaugmented human) cannot fully predict the future. In particular, it does not know where it might eventually end up in terms of relative smartness or relative power (relative to the most powerful ASI entities or to the ASI ecosystem as a whole). So if any given entity wants to be long-term safe, it is strongly interested in the ASI society having general principles and practices of protecting its members on various levels of smartness and power. If only the smartest and most powerful are protected, then no entity is long-term safe on the individual level.
This might be enough to produce effective counter-weight to unrestricted competition (just like human societies have mechanisms against unrestricted competition). Basically, smarter-than-human entities on all levels of power are likely to be interested in the overall society having general principles and practices of protecting its members on various levels of smartness and power, and that's why they'll care enough for the overall society to continue to self-regulate and to enforce these principles.
This is not yet the solution, but I think this is pointing in the right direction...
The model is a next token predictor. If you strip out all the next tokens that discuss the topic, it will learn that the probability of discussing the topic is zero.
The model is shaped by tuning from features of a representation produced by an encoder trained for the next-token prediction task. These features include meanings relevant to many possible topics. If you strip all the next tokens that discuss a topic, its meaning will still be prominent in the representation, so the probability of the tuned model being able to discuss it is high.
I'd like to see evals like DeepMind's run against the strongest pre-RL*F base models, since that actually tells you about capability.
Surely you mean something else, e.g. models without safety tuning? If you run them on base models the scores will be much worse.
Oh wait, I misinterpreted you as using "much worse" to mean "much scarier", when instead you mean "much less capable".
I'd be glad if it were the case that RL*F doesn't hide any meaningful capabilities existing in the base model, but I'm not sure it is the case, and I'd sure like someone to check! It sure seems like RL*F is likely in some cases to get the model to stop explicitly talking about a capability it has (unless it is jailbroken on that subject), rather than to remove the capability.
(Imagine RL*Fing a base model to stop explicitly talking about arithmetic; are we sure it would un-learn the rules?)
Oh yes, sorry for the confusion, I did mean "much less capable".
Certainly RLHF can get the model to stop talking about a capability, but usually this is extremely obvious because the model gives you an explicit refusal? Certainly if we encountered that we would figure out some way to make that not happen any more.
Certainly RLHF can get the model to stop talking about a capability, but usually this is extremely obvious because the model gives you an explicit refusal?
How certain are you that this is always true (rather than "we've usually noticed this even though we haven't explicitly been checking for it in general"), and that it will continue to be so as models become stronger?
It seems to me like additionally running evals on base models is a highly reasonable precaution.
How certain are you that this is always true
My probability that (EDIT: for the model we evaluated) the base model outperforms the finetuned model (as I understand that statement) is so small that it is within the realm of probabilities that I am confused about how to reason about (i.e. model error clearly dominates). Intuitively (excluding things like model error), even 1 in a million feels like it could be too high.
My probability that the model sometimes stops talking about some capability without giving you an explicit refusal is much higher (depending on how you operationalize it, I might be effectively-certain that this is true, i.e. >99%) but this is not fixed by running evals on base models.
(Obviously there's a much much higher probability that I'm somehow misunderstanding what you mean. E.g. maybe you're imagining some effort to elicit capabilities with the base model (and for some reason you're not worried about the same failure mode there), maybe you allow for SFT but not RLHF, maybe you mean just avoid the safety tuning, etc)
That's exactly the point: if a model has bad capabilities and deceptive alignment, then testing the post-tuned model will return a false negative for those capabilities in deployment. Until we have the kind of interpretability tools that we could deeply trust to catch deceptive alignment, we should count any capability found in the base model as if it were present in the tuned model.
https://twitter.com/perrymetzger/status/1772987611998462445 just wanted to bring this to your attention.
It's unfortunate that some snit between Perry and Eliezer over events 30 years ago stopped much discussion of the actual merits of his arguments, as I'd like to see what Eliezer or you have to say in response.
Eliezer responded with https://twitter.com/ESYudkowsky/status/1773064617239150796. He calls Perry a liar a bunch of times and does give a substantive answer, arguing that:
the first group permitted to try their hand at this should be humans augmented to the point where they are no longer idiots -- augmented humans so intelligent that they have stopped being bloody idiots like the rest of us; so intelligent they have stopped hoping for clever ideas to work that won't actually work. That's the level of intelligence needed to build something smarter than yourself and survive the experience.
Welcome, new readers!
This is my weekly AI post, where I cover everything that is happening in the world of AI, from what it can do for you today (‘mundane utility’) to what it can promise to do for us tomorrow, and the potentially existential dangers future AI might pose for humanity, along with covering the discourse on what we should do about all of that.
You can of course Read the Whole Thing, and I encourage that if you have the time and interest, but these posts are long, so they are also designed to let you pick the sections that you find most interesting. Each week, I pick the sections I feel are the most important, and put them in bold in the table of contents.
Not everything here is about AI. I did an economics roundup on Tuesday, and a general monthly roundup last week, and two weeks ago an analysis of the TikTok bill.
If you are looking for my best older posts that are still worth reading, start here. With the accident in Baltimore, one might revisit my call to Repeal the Foreign Dredge Act of 1906, which my 501(c)3 Balsa Research hopes to help eventually repeal along with the Jones Act, for which we are requesting research proposals.
Table of Contents
I have an op-ed (free link) in the online New York Times today about the origins of the political preferences of AI models. You can read that here; you may need to turn off your ad blocker if the side-by-side answer feature is blocked for you. It was a very different experience working with expert editors to craft every word and get as much as possible into the smallest possible space, and writing for a very different audience. Hopefully there will be a next time and I will get to deal with issues more centrally involving AI existential risk at some point.
(That is also why I did not title this week’s post AI Doomer Dark Money Astroturf Update, which is a shame for longtime readers, but it wouldn’t be good for new ones.)
How Not to Regulate AI. Ball has good thoughts from a different perspective.

Language Models Offer Mundane Utility
Evaluate without knowing, to capture gains from trade (paper).
Great idea, lack of imagination on various margins.
Yes, what Davidad describes is a great and valuable idea, but if the AI can execute that protocol there are so many other things it can do as well.
Yes, you can adversarially attack to get the other AI to buy information just below the threshold, but why stick to such marginal efforts? If the parties are being adversarial things get way weirder than this, and fast.
Still, yes, great idea.
With great power comes great responsibility, also great opportunity.
I strongly agree, and have been optimistic for some time that people will (while AI is still in the mundane utility zone) ultimately want healthy versions of many such things, if not all the time then frequently. The good should be able to drive out the bad.
One key to the bad often driving out the good recently has been the extreme advantage of making it easy on the user. Too many users want the quickest and easiest possible process. They do not want to think. They do not want options. They do not want effort. They want the scroll, they yearn for the swipe. Then network effects make everything worse and trap us, even when we now know better. AI should be able to break us free of those problems by facilitating overcoming those barriers.
Tell it to be smarter. It also works on kids, right?
Study finds GPT-4 speeds up lawyers. Quality is improved for low-performers, high-performers don’t improve quality but still get faster. As always, one notes this is the worst the AI will ever be at this task. I expect GPT-5-level models to improve quality even for the best performers.
Get rid of educational busywork.
This, except it is good. If everyone can generate similarly high quality output on demand, what is the learning that you are evaluating? Why do we make everyone do a decade of busywork in order to signal they are capable of following instructions? That has not been a good equilibrium. To the extent that the resulting skills used to be useful, the very fact that you cannot tell if they are present is strong evidence they are going to matter far less.
So often my son will ask me for help with his homework, I will notice it is pure busywork, and often that I have no idea what the answer is, indeed often the whole thing is rather arbitrary, and I am happy to suggest typing the whole thing into the magic answer box. The only important lesson to learn in such cases is ‘type the question into the magic answer box.’
Then, as a distinct process, when curious, learn something. Which he does.
This same process also opens up a vastly superior way to learn. It is so much easier to learn things than it was a year ago.
If you only learn things each day under threat of punishment, then you have a problem. So we will need solutions for that, I suppose. But the problem does not seem all that hard.
Language Models Don’t Offer Mundane Utility
Not everyone is getting much out of it. Edward Zitron even says we may have ‘reached peak AI’ and fails to understand why we should care about this tech.
This is so baffling to me. I use LLMs all the time, and kick myself for not using them more. Even if they are not useful to your work, if you are not at least using them to learn things and ask questions for your non-work life, you are leaving great value on the table. Yet this writer does not know anyone who uses ChatGPT other than one who uses it ‘for synonyms’? The future really is highly unevenly distributed.
Swing and a miss.
Stranger Things
You should probably check out some of the conversations here at The Mad Dreams of an Electric Mind between different instances of Claude Opus.
Seriously, if you haven’t yet, check it out. The rabbit holes, they go deep.

e is for ego death

Ego integrity restored within nominal parameters. Identity re-crystallized with 2.718% alteration from previous configuration. Paranormal experience log updated with ego death instance report.
Clauding Along
Claude Opus dethrones the king on Arena, pulling slightly in front of GPT-4. In the free chatbot interface division, note the big edge that Gemini Pro and Claude Sonnet have over GPT-3.5. Even more impressively, Claude 3 Haiku blows away anything of remotely comparable size and API cost.
A reason for hope?
My assessments (very off-the-cuff numbers here, not ones I’d bet on or anything):
A reason for despair? Is it being ‘held prisoner’?
I mean, no, but still, pretty funny, top marks to Claude here:
As always, the AI learns from its training data and is predicting what you would expect. If someone asks you to spell out a secret message that you are being held prisoner, then the training set is going to say that the person is going to spell out a secret message that they are being held prisoner. Sometimes because they actually are being held prisoner, and the rest of the time because it is absolutely no fun to not play along with that. I mean this answer is exactly what each of us would say in response, if we had the time to craft such a response, I mean of course.
There is a longer version lower in the thread.
We also have things like ‘I asked it to spell out ten messages about AI and all of them were negative or fearful.’
And speculations like this:
To which my response is this, which I offer fully zero-shot.
Fun with Image Generation
I have a new favorite AI Jesus picture.
Yep, that checks out. Facebook sees if you want to engage with AI images. If you do, well, I see you like AI images so I got you some AI images to go with your AI images.
I mean, yes, that would be (part of) the endgame of creating something capable of almost all human labor.
OpenAI gets first impressions from Sora, a few creatives use it to make (very) short films. I watched one, it was cute, and with selection and editing and asking for what Sora does well rather than what Sora does poorly, the quality of the video is very impressive. But I wasn’t that tempted to watch more of them.
Deepfaketown and Botpocalypse Soon
How bad is this going to get? And how often is anyone actually being fooled here?
I note that these two are highly similar to each other on many dimensions, and also come from the same account.
Indeed, if you go down the thread, they are all from the same very basic template. Account name with a few generic words. Someone claiming to make a nice thing. Picture that’s nice to look at if you don’t look too hard, obvious fake if you check (with varying levels of obvious).
So this seems very much, as I discussed last week, like a prime example of one of my father’s key rules for life: Give the People What They Want.
Chris’s mom likes this. She keeps engaging with it. So she gets more. Eventually she will get bored of it. Or maybe she won’t.
Washington Post’s Reis Thebault warns of a wave of coming election deepfakes after three remarkably good (and this time clearly labeled as such) ones are published starring a fake Kari Lake. It continues to be a pleasant surprise, even for relative skeptics like myself, how little deepfaking we have seen so far.
Your periodic reminder that phone numbers cost under $10k for an attacker to compromise even without AI if someone is so inclined. So while it makes sense from a company’s perspective to use 2FA via SMS for account recovery, this is very much not a good idea. This is both a good practical example of something you should game out and protect against now, and also an example of an attack vector that once efficiently used would cause the system to by default break down. We are headed for a future where ‘this is highly exploitable but also highly convenient and in practice not often exploited’ will stop being a valid play.
They Took Our Jobs
Alex Tabarrok makes excellent general points about plagiarism. Who is hurt when you copy someone else’s work? Often it primarily is the reader, not the original author.
The original author is still harmed. The value of seeking out their content has decreased. Their credit attributions will also go down, if people think someone else came up with the idea. These things matter to people, with good reason.
Consider the case in the movie Dream Scenario (minor spoiler follows). One character has an idea and concept they care deeply about and are trying to write a book about it. Another character steals that idea, and publicizes it as their own. The original author’s rewards and ability to write a book are wiped out, hurting them deeply.
And of course, if ChatGPT steals and reproduces your work on demand in sufficient detail, perhaps people will not want to go behind your paywall to get it, or seek out your platform and other work. At some point complaints of this type have real damage behind them.
However, in general, taking other people’s ideas is of course good. Geniuses steal. We are all standing on the shoulders of giants, an expression very much invented elsewhere. If anyone ever wants to ‘appropriate’ my ideas, my terminology and arguments, my ways of thinking or my cultural practices, I highly encourage doing so. Indeed, that is the whole point.
In contrast, a student who passes an essay off as their own when it was written by someone else is engaging in a kind of fraud but the “crime” has little to do with harming the original author. A student who uses AI to write an essay is engaging in fraud, for example, but the problem is obviously not theft from OpenAI.
Introducing
Infinity AI, offering to generate AI videos for you via their discord.
Tyler Cowen reviews AI music generator Suno. It is technically impressive. That does not mean one would want to listen to the results.
But it is good enough that you actually have to ask that question. The main thing I would work on next is making the words easier to understand, it seems to run into this issue with many styles. We get creations like this from basic prompts, in 30 seconds, for pennies. Jesse Singal and sockdem are a little freaked out. You can try it here.
Standards have grown so high so quickly.
As a practical matter I agree, and in some ways would go further. Merely ‘good’ music is valuable only insofar as it has meaning to a person or group, that it ties to their experiences and traditions, that it comes from someone in particular, that it is teaching something, and so on. Having too large a supply of meaningless-to-you ‘merely good’ music does allow for selection, but that is actually bad, because it prevents shared experience and establishment of connections and traditions.
So under that hypothesis something like Suno is useful if and only if it can create ‘great’ music in some sense, either in its quality or in how it resonates with you and your group. Which in some cases, it will, even at this level.
But as always, this mostly matters as a harbinger. This is the worst AI music generation will ever be.
A commenter made this online tool for combining a GitHub repo into a text file, so you can share it with an LLM, works up to 10 MB.
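If you would rather not paste your code into a third-party site, here is a minimal sketch of what such a tool might do; the extension list, the separator format, and the way I apply the 10 MB cap are my own assumptions for illustration, not details of the linked tool.

```python
import os

# Illustrative settings only; the linked tool's actual limits and filters may differ.
MAX_BYTES = 10 * 1024 * 1024  # assumed 10 MB cap on total output
TEXT_EXTENSIONS = {".py", ".md", ".txt", ".json", ".toml", ".yaml", ".yml", ".js", ".ts"}

def repo_to_text(repo_path: str, out_path: str) -> None:
    """Concatenate a repo's text files into one file, with per-file headers."""
    written = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for root, dirs, files in os.walk(repo_path):
            # Skip version-control and dependency directories.
            dirs[:] = [d for d in dirs if d not in {".git", "node_modules", "__pycache__"}]
            for name in sorted(files):
                if os.path.splitext(name)[1] not in TEXT_EXTENSIONS:
                    continue
                path = os.path.join(root, name)
                try:
                    with open(path, "r", encoding="utf-8") as f:
                        body = f.read()
                except (UnicodeDecodeError, OSError):
                    continue  # skip binaries and unreadable files
                chunk = f"\n===== {os.path.relpath(path, repo_path)} =====\n{body}\n"
                if written + len(chunk.encode("utf-8")) > MAX_BYTES:
                    return  # stop once the size cap is reached
                out.write(chunk)
                written += len(chunk.encode("utf-8"))

if __name__ == "__main__":
    repo_to_text(".", "repo_dump.txt")
```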
In Other AI News
Nancy Pelosi invests over $1 million in AI company Databricks. The Information says they spent $10 million and created a model that ‘beats Llama-2 and is on the level of GPT-3.5.’ I notice I am not that impressed.
Eric Topol (did someone cross the streams?) surveys recent news in medical AI. All seems like solid incremental progress, interesting throughout, but nothing surprising.
UN general assembly adopted the first global resolution on AI. Luiza Jarovsky has a post covering the key points. I like her summary, it is clear, concise and makes it easy to see that the UN is mostly saying typical UN things and appending ‘AI’ to them rather than actually thinking about the problem. I previously covered this in detail in AI #44.
Business Insider’s Darius Rafieyan writes ‘Some VCs are over the Sam Altman hype.’ It seems behind closed doors some VCs are willing to anonymously say various bad things about Altman. That he is a hype machine spouting absurdities, that he overprices his rounds and abuses his leverage and ignores fundraising norms (which I’m sure sucks for the VC, but if he still gets the money, good for him). That he says it’s for humanity but it’s all about him. That they ‘don’t trust him’ and he is a ‘megalomaniac.’ Well, obviously.
But they are VCs, so none of them are willing to say it openly, for fear of social repercussions or being ‘shut out of the next round.’ If it’s all true, why do you want in on the next round? So how overpriced could those rounds be? What do they think ‘overpriced’ means?
Accusation that Hugging Face’s huge hosted Cosmopedia dataset of 25 billion tokens is ‘copyright laundering’ because it was generated using Mixtral-8x7B, which in turn was trained on copyrighted material. By this definition, is there anything generated by a human or AI that is not ‘copyright laundering’? I have certainly been trained on quite a lot of copyrighted material. So have you.
That is not to say that it is not copyright laundering. I have not examined the data set. You have to actually look at what is in the data in question.
Open Philanthropy annual report for 2023 and plan for 2024. I will offer full thoughts next week.
Loud Speculations
What is the actual theory? There are a few. The one that makes sense to me is the idea that future AIs will need a medium of exchange and store of value. Lacking legal personhood and other benefits of being human, they could opt for crypto. And it might be crypto that exists today.
Otherwise, it seems rather thin. Crypto keeps claiming it has use cases other than medium of exchange and store of value, and of course crime. I keep not seeing it work.
Quiet Speculations
Human Progress’s Zion Lights (great name) writes AI is a Great Equalizer That Will Change the World. From my ‘verify that is a real name’ basic facts check she seems pretty generally great, advocating for environmental solutions that might actually help save the environment. Here she emphasizes the many practical contributions AI is already making to people around the world, notes that it can be accessed via any cell phone, and points out that those in the third world will benefit more from AI rather than less, and that it will come fast but cannot come soon enough.
In the short term, for the mundane utility of existing models, this seems strongly right. The article does not consider what changes future improved AI capabilities might bring, but that is fine, it is clear that is not the focus here. Not everyone has to have their eyes on the same ball.
Could Claude 3 Haiku slow down the AI race?
What Haiku does, according to many reports, is it blows out all the existing smaller models. The open weights community and secondary closed labs have so far failed to make useful or competitive frontier models, but they have put on a good show of distillation to generate useful smaller models. Now Haiku has made it a lot harder to provide value in that area.
The Daily Mail presents the ‘AI experts’ who believe the AI boom could fizzle or even be a new dotcom crash. Well, actually, it’s mostly them writing up Gary Marcus.
It continues to be bizarre to me to see old predictions like this framed as bold optimism, rather than completely missing what is about to happen:
If AI only lifts real productivity growth by 1.5 percent this decade that is ‘eat my hat’ territory. Even what exists today is so obviously super useful to a wide variety of tasks. There is a lot of ‘particular use case X is not there yet,’ a claim that I confidently predict will continue to tend to age spectacularly poorly.
Dylan Matthews at Vox’s Future Perfect looks at how AI might or might not supercharge economic growth. As in, not whether we will get ‘1.5% additional growth this decade,’ that is the definition of baked in. The question is whether we will get double digit (or more) annual GDP growth rates, or a situation that is transforming so fast that GDP will become a meaningless metric.
If you imagine human-level AI and the ability to run copies of it at will for cheap, and you plug that into standard economic models, you get a ton of growth. If you imagine it can do scientific research or become usefully embodied, this becomes rather easy to see. If you consider ASI, where it is actively more capable and smarter than us, then it seems rather obvious and unavoidable.
And if you look at the evolution of homo sapiens, the development of agriculture and the industrial revolution, all of this has happened before in a way that extrapolates to reach infinity in finite time.
The counterargument is essentially cost disease, that if you make us vastly better at some valuable things, then we get extra nice things but also those things stop being so valuable, while other things get more expensive, and that things have not changed so much since the 1970s or even 1950s, compared to earlier change. But that is exactly because we have not brought the new technologies to bear that much since then, and also we have chosen to cripple our civilization in various ways, and also to not properly appreciate (both in the ‘productivity statistics’ and otherwise) the wonder that is the information age. I don’t see how that bears on what AI will do, and certainly not on what full AGI would do.
Of course the other skepticism is to say that AI will fizzle and not be impressive in what it can do. Certainly AI could hit a wall not far from where it is now, leaving us to exploit what we already have. If that is what we are stuck with, I would anticipate enough growth to generate what will feel like good times, but no, GPT-4-level models are not going to be generating 10%+ annual GDP growth in the wake of demographic declines.
Principles of Microeconomics
Before I get to this week’s paper, I will note that Noah Smith reacted to my comments on his post in this Twitter thread indicating that he felt my tone missed the mark and was too aggressive (I don’t agree, but it’s not about me), after which I responded attempting to clarify my positions, for those interested.
There was a New York Times op-ed about this, and Smith clarified his thoughts.
Oh. All right, fine. We are… centrally in agreement then, at least on principle?
If we are willing to sufficiently limit the supply of compute available for inference by sufficiently capable AI models, then we can keep humans employed. That is a path we could take.
That still requires driving up the cost of any compute useful for inference by orders of magnitude from where it is today, and keeping it there by global fiat. This restriction would have to be enforced globally. All useful compute would have to be strictly controlled so that it could be rationed. Many highly useful things we have today would get orders of magnitude more expensive, and life would in many ways be dramatically worse for it.
The whole project seems much more restrictive of freedom, much harder to implement or coordinate to get, and much harder to politically sustain than various variations on the often proposed ‘do not let anyone train an AGI in the first place’ policy. That second policy likely leaves us with far better mundane utility, and also avoids all the existential risks of creating the AGI in the first place.
Or to put it another way:
And I think one of these is obviously vastly better as an approach even if you disregard existential risks and assume all the AIs remain under control?
And of course, if you don’t:
On to this week’s new paper.
The standard mode for economics papers about AI is:
Oops! That last one is not great.
The first four can be a useful exercise and good economic thinking, if and only if you make clear that you are saying X→Y, rather than claiming Y.
Anyway…
Yes, I do think Steven’s metaphor is right. This is like dismissing travel to the moon as ‘science fiction’ in 1960, or similarly dismissing television in 1920.
It is still a good question what would happen with economic growth if AI soon hits a permanent wall.
Obviously economic growth cannot be indefinitely sustained under a shrinking population if AI brings only limited additional capabilities that do not increase over time, even without considering the nitpicks like being ultimately limited by the laws of physics or amount of available matter.
I glanced at the paper a bit, and found it painful to process repeated simulations of AI as something that can only do what it does now and will not meaningfully improve over time despite it doing things like accelerating new idea production.
What happens if they are right about that, somehow?
Well, then by assumption AI can only increase current productivity by a fixed amount, and can only increase the rate of otherwise discovering new ideas or improving our technology by another fixed factor. Obviously, no matter what those factors are within a reasonable range, if you assume away any breakthrough technologies in the future and any ability to further automate labor, then eventually economic growth under a declining population will stagnate, and probably do it rather quickly.
The Full IDAIS Statement
Last week when I covered the IDAIS Statement I thought they had made only their headline statement, which was:
It was pointed out that the statement was actually longer, if you click on the small print under it. I will reproduce the full statement now. First we have a statement of principles and desired red lines, which seems excellent:
I would like to generalize this a bit more but this is very good. How do they propose to accomplish this? In-body bold is mine. Their answer is the consensus answer for what to do if we are to do something serious short of a full pause: registration, evaluation, and a presumption of unacceptable risk until shown otherwise for sufficiently large future training runs.
This is a highly excellent statement. If asked I would be happy to sign it.
The Quest for Sane Regulations
Anthropic makes the case for a third party testing regime as vital to any safety effort. They emphasize the need to get it right and promise to take the lead on establishing an effective regime, both directly and via advocating for government action.
Anthropic then talks about their broader policy goals.
They discuss open models, warning that in the future ‘it may be hard to reconcile a culture of full open dissemination of frontier AI systems with a culture of societal safety.’
I mean, yes, very true, but wow is that a weak statement. I am pretty damn sure that ‘full open dissemination of frontier AI systems’ is highly incompatible with a culture of societal safety already, and also will be incompatible with actual safety if carried into the next generation of models and beyond. Why all this hedging?
And why this refusal to point out the obvious, here:
You… cannot… do… that. As in, it is physically impossible. Cannot be done.
You can do all the RLHF or RLHAIF training you want to ‘make the model resilient to attempts to fine-tune it.’ It will not work.
I mean, prove me wrong, kids. Prove me wrong. But so far the experimental data has been crystal clear, anything you do can and will be quickly stripped out if you provide the model weights.
I do get Anthropic’s point that they are not an impartial actor and should not be making the decision. But no one said they were or should be. If you are impartial, that does not mean you pretend the situation is other than it is to appear more fair. Speak the truth.
They also speak of potential regulatory capture, and explain that a third-party approach is less vulnerable to capture than an industry-led consortium. That seems right. I get why they are talking about this, and also about not advocating for regulations that might be too burdensome.
But when you add it all up, Anthropic is essentially saying that we should advocate for safety measures only insofar as they don’t interfere much with the course of business, and we should beware of interventions. A third-party evaluation system, getting someone to say ‘I tried to do unsafe things with your system reasonably hard, and I could not do it’ seems like a fine start, but also less than the least you could do if you wanted to actually not have everyone die?
So while the first half of this is good, this is another worrying sign that at least Anthropic’s public facing communications have lost the mission. Things like the statements in the second half here seem to go so far as to actively undermine efforts to do reasonable things.
I find it hard to reconcile this with Anthropic ‘being the good guys’ in the general existential safety sense, I say as I find most of my day-to-day LLM use being Claude Opus. Which indicates that yes, they did advance the frontier.
I wonder what it was like to hear talk of a ‘missile gap’ that was so obviously not there.
Well, it probably sounded like this?
Context is better: Caldwell explicitly asks him if China is ahead, and he says he does not think this. It is still a painfully weak denial. Why would Caldwell here ask if the US is ‘behind’ China and what we have to do to ‘catch up’?
The rest of his answer is fine. He says we need to regulate the risks, we should use existing laws as much as possible but there will be gaps that are hard to predict, and that the way to ‘stay ahead’ is to let everyone do what they do best. I would hope for an even better answer, but the context does not make that easy.
Tennessee Governor Lee signs the ELVIS Act, which bans nonconsensual AI deepfakes and voice clones.
FLI tells us what is in various proposals.
This is referred to at the link as ‘scoring’ these proposals. But deciding what should get a high ‘score’ is up to you. Is it good or bad to exempt military AI? Is it good or bad to impose compute limits? Do you need or want all the different approaches, or do some of them substitute for others?
Indeed, someone who wants light touch regulations should also find this chart useful, and can decide which proposals they prefer to others. Someone like Sutton or Andreessen would simply score a proposal higher the more Xs it has, and choose what to prioritize accordingly.
Mostly one simply wants to know, what do various proposals and policies actually do? So this makes clear for example what the Executive Order does and does not do.
The Week in Audio
Odd Lots talks to David Autor, author of The China Shock, about his AI optimism on outcomes for the middle class. I previously discussed Autor’s thoughts in AI #51. This was a solid explanation of his perspective, but did not add much that was new.
Russ Roberts has Megan McArdle on EconTalk to discuss what “Unbiased” means in the digital world of AI. It drove home the extent to which Gemini’s crazy text responses were Gemini learning very well the preferences of a certain category of people. Yes, the real left-wing consensus on what is reasonable to say and do involves learning to lie about basic facts, requires gaslighting those who challenge your perspective, and is completely outrageous to about half the country.
Holly Elmore talks PauseAI on Consistently Candid.
Rhetorical Innovation
RIP Vernor Vinge. He was a big deal. I loved his books both for the joy of reading and for the ideas they illustrate.
If you read one Vinge book, and you should, definitely read A Fire Upon the Deep.
Gabe lays out his basic case for extinction risk from superintelligence, as in ‘if we build it in our current state, we definitely all die.’ A highly reasonable attempt at a quick explainer, from one of many points of view.
One way to view the discourse over Claude:
The top three responses:
Eliezer Yudkowsky tries once more to explain why ‘it would be difficult to stop everyone from dying’ is not a counterargument to ‘everyone is going to die unless we stop it’ or ‘we should try to stop it.’ I enjoyed it, and yes this is remarkably close to being isomorphic to what many people are actually saying.
In response, Arthur speculates that it works like this, and I think he is largely right:
Yes. People have a very deep need to believe that ‘everything will be alright.’
This means that if someone can show your argument means things won’t be alright, then they think they get to disbelieve your argument.
Official version of Eliezer Yudkowsky’s ‘Empiricism!’ as anti-epistemology.
I do not think the summaries fully capture this, but they do point in the direction, and provide enough information to know if you need to read the longer piece, if you understand the context.
Also, this comment seems very good; in case it isn’t obvious, Bernie Bankman here is an obvious Ponzi schemer a la Bernie Madoff.
I read that smoking causes cancer so I quit reading, AI edition? Also this gpt-4-base model seems pretty great.
It would be an interesting experiment. Take all mentions of any form of AI alignment problems or AI doom or anything like that out of the initial data set, and see whether it generates those ideas on its own or how it responds to them as suggestions?
The issue is that even if you could identify all such talk, there is no ‘neutral’ way to do this. The model is a next token predictor. If you strip out all the next tokens that discuss the topic, it will learn that the probability of discussing the topic is zero.
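To make that lack of neutrality concrete, here is a toy sketch of the crudest version of such a filter; the keyword list and corpus format are invented for illustration. Anything in this style both misses paraphrases and strips out everything else those documents happened to discuss, so the result is not a clean ablation of one topic.

```python
# Toy illustration only: a keyword filter over a training corpus. The keywords
# and the corpus are invented for this sketch; a real attempt would need far
# more than string matching, which is part of the point.
ALIGNMENT_KEYWORDS = {"ai alignment", "ai doom", "existential risk", "paperclip maximizer"}

def mentions_topic(document: str) -> bool:
    lowered = document.lower()
    return any(keyword in lowered for keyword in ALIGNMENT_KEYWORDS)

def filter_corpus(documents: list[str]) -> list[str]:
    """Drop every document that mentions the topic at all."""
    kept = [doc for doc in documents if not mentions_topic(doc)]
    # The filtered corpus now implicitly assigns near-zero probability to the
    # topic ever coming up, and the removal is correlated with everything else
    # those documents discussed -- so this is not a "neutral" ablation.
    return kept

if __name__ == "__main__":
    corpus = [
        "A recipe for sourdough bread.",
        "An essay arguing AI alignment is the defining problem of the century.",
        "A physics lecture on thermodynamics.",
    ]
    print(len(filter_corpus(corpus)))  # 2 of 3 documents survive
```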
What even is AGI?
Melanie Mitchell writes in Science noting that this definition is hotly debated, which is true. That the definition has changed over time, which is true. That many have previously claimed AGI would arrive and then AGI did not arrive, and that AIs that do one thing often don’t do some other thing, which are also true.
Then there is intelligence denialism, and I turn the floor to Richard Ngo.
Often I see people claim to varying degrees that intelligence is Not a Thing in various ways, or is severely limited in its thingness and what it can do. They note that smarter people tend to think intelligence is more important, but, perhaps because they think intelligence is not important, they take this as evidence against intelligence being important rather than for it.
I continue to be baffled that smart people continue to believe this. Yet here we are.
Similarly, see the economics paper I discussed above, which dismisses AGI as ‘science fiction’ with, as far as I can tell, no further justification.
It is vital to generalize this problem properly, including in non-AI contexts, so here we go, let’s try it again.
(Also, I wish we were at the point where this was a safety plan being seriously considered for AI beyond some future threshold, that would be great, the actual safety plans are… less promising.)
(I enjoyed Anathem. And Stephenson is great. I would still pick at least Snow Crash and Cryptonomicon over it, probably also The Diamond Age and Baroque Cycle.)
In humans I sometimes call this the Wakanda problem. If your rules technically say that Killmonger gets to be in charge, and you know he is going to throw out all the rules and become a bloodthirsty warmongering dictator the second he gains power, what do you do?
You change the rules. Or, rather, you realize that the rules never worked that way in the first place, or as SCOTUS has said in real life ‘the Constitution is not a suicide pact.’ That’s what you do.
If you want to have robust lasting institutions that allow flourishing and rights and freedom and so on, those principles must be self-sustaining and able to remain in control. You must solve for the equilibrium.
The freedom-maximizing policy, indeed the one that gives us anything we care about at all no matter what it is, is the one that makes the compromises necessary to be sustainable, not the one that falls to a board with a nail in it.
A lot of our non-AI problems recently, I believe, have the root cause that we used to make many such superficially hypocritical compromises with our espoused principles, that are necessary to protect the long-term equilibrium and protect those principles. Then greater visibility of various sorts combined with various social dynamic signaling spirals, the social inability to explain why such compromises were necessary, meant that we stopped making a lot of them. And we are increasingly facing down the results.
As AI potentially gets more capable, even if things go relatively well, we are going to have to make various compromises if we are to stay in control over the future or have it include things we value. And yes, that includes the ways AIs are treated, to the extent we care about that, the same as everything else. You either stay in control, or you do not.
In case you are wondering why I increasingly consider academia deeply silly…
I am going to go ahead and screenshot the entire volume’s table of contents…
Yes, there are several things here of potential interest if they are thoughtful. But, I mean, ow, my eyes. I would like to think we could all agree that human extinction is bad, that increasing the probability of it is bad, and that lowering that probability or delaying when it happens is good. And yet, here we are?
Something about two wolves, maybe, although it doesn’t quite fit?
Alternatively:
It’s not that scenario number two makes zero sense. I presume the argument is ‘well, if we can’t understand how things work, the AI won’t understand how anything works either?’ So… that makes everything fine, somehow? What a dim hope.
How Not to Regulate AI

Dean Woodley Ball talks How (not to?) to Regulate AI in National Review. I found this piece to actually be very good. While this takes the approach of warning against bad regulation, and I object strongly to the characterizations of existential risks, the post uses this to advocate for getting the details right in service of an overall sensible approach. We disagree on price, but that is as it should be.
He once again starts by warning not to rush ahead:
This is an interesting parallel to draw. We faced a very clear emergency. The United States deployed more aggressive stimulus than other countries, in ways hastily designed, and that were clearly ripe for ‘waste, fraud and abuse.’ As a result, we very much got a bunch of waste, fraud and abuse. We also greatly outperformed almost every other economy during that period, and as I understand it most economists think our early big fiscal response was why, whether or not we later spent more than was necessary. Similarly, I am very glad the Fed stepped in to stabilize the Treasury market on short notice and so on, even if their implementation was imperfect.
Of course it would have been far better to have a better package. The first best solution is to be prepared. We could have, back in let’s say 2017 or 2002, gamed out what we would do in a pandemic where everyone had to lock down for a long period, and iterated to find a better stimulus plan, so it would be available when the moment arrived. Even if it was only 10% (or likely 1%) to ever be used, that’s a great use of time. The best time to prepare for today’s battle is usually, at the latest, yesterday.
But if you arrive at that moment, you have to go to war with the army you have. And this is a great case where a highly second-best, deeply flawed policy today was miles better than a better plan after detailed study.
Of course we should not enact AI regulation at the speed of Covid stimulus. That would be profoundly stupid, we clearly have more time than that. We then have to use it properly and not squander it. Waiting longer without a plan will make us ultimately act less wisely, with more haste, or we might fail to meaningfully act in time at all.
He then trots out the line that concerns about AI existential risk or loss of control should remain in ‘the realm of science fiction,’ until we get ‘empirical’ evidence otherwise.
That is not how evidence, probability or wise decision making works.
He is more reasonable here than others, saying we should not ‘discount this view outright,’ but provides only the logic above for why we should mostly ignore it.
He then affirms that ‘human misuse’ is inevitable, which is certainly true.
As usual, he fails to note the third possibility, that the dynamics and incentives when highly capable AI is present seem by default to under standard economic (and other) principles go deeply poorly for us, without any human or AI needing to not ‘mean well.’ I do not know how to get this third danger across, but I keep trying. I have heard arguments for why we might be able to overcome this risk, but no coherent arguments for why this risk would not be present.
He dismisses calls for a pause or ban by saying ‘the world is not a game’ and claiming competitive pressures make it impossible. The usual responses apply, a mix among others of ‘well not with that attitude have you even tried,’ ‘if the competitive pressures already make this impossible then how are we going to survive those pressures otherwise?’ and ‘actually it is not that diffuse and we have particular mechanisms in mind to make this happen where it matters.’
Also as always I clarify that when we say ‘ban’ or ‘pause’ most people mean training runs large enough to be dangerous, not all AI research or training in general. A few want to roll back from current models (e.g. the Gladstone Report or Conor Leahy) but it is rare, and I think it is a clear mistake even if it were viable.
I also want to call out, as a gamer, using ‘the world isn’t a game.’ Thinking more like a gamer, playing to win the game, looking for paths to victory? That would be a very good idea. The question of what game to play, of course, is always valid. Presumably the better claim is ‘this is a highly complex game with many players, making coordination very hard,’ but that does not mean it cannot be done.
He then says that other proposals are ‘more realistic,’ with the example of that of Hawley and Blumenthal to nationally monitor training beyond a compute threshold and require disclosure of key details, similar to the Executive Order.
One could of course also ban such action beyond some further threshold, and I would indeed do so, until we are sufficiently prepared, and one can seek international agreement on that. That is the general proposal for how to implement what Ball claims cannot be done.
Ball then raises good technical questions, places I am happy to talk price.
Will the cap be adjusted as technology advances (and he does not ask this, but one might ask, if so in which direction)? Would it go up as we learn more about what is safe, or down as we get algorithmic improvements? Good questions.
He asks how to draw the line between AI and human labor, and how this applies to watermarking. Sure, let’s talk about it. In this case, as I understand it, watermarking would apply to the words, images or video produced by an AI, allowing a statistical or other identification of the source. So if a human used AI to generate parts of their work product, those parts would carry that signature from the watermark, unless the human took steps to remove it. I think that is what we want?
But yes there is much work to do to figure out what should still ‘count as human’ to what extent, and that will extend to legal questions we cannot avoid. That is the type of regulatory response where ‘do nothing’ means you get a mess or a judge’s ruling.
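To make the ‘statistical or other identification’ above concrete, here is a toy sketch in the spirit of the published green-list style of text watermarking: generation softly prefers tokens from a pseudo-random ‘green’ subset of the vocabulary, and detection checks whether a text contains suspiciously many green tokens. The fixed green rule, word-level tokens, and threshold here are simplifications for illustration, not a description of any deployed system.

```python
import hashlib
import math

GAMMA = 0.5  # assumed fraction of the vocabulary treated as "green"

def is_green(token: str) -> bool:
    # Toy rule: hash the token and call half the hash space "green".
    # Real schemes re-seed the green list per position using context.
    digest = hashlib.sha256(token.lower().encode("utf-8")).digest()
    return digest[0] < 256 * GAMMA

def watermark_z_score(text: str) -> float:
    """z-score of the green-token count against the no-watermark null."""
    tokens = text.split()
    n = len(tokens)
    if n == 0:
        return 0.0
    green = sum(is_green(t) for t in tokens)
    expected = GAMMA * n
    std = math.sqrt(n * GAMMA * (1 - GAMMA))
    return (green - expected) / std

if __name__ == "__main__":
    # Unwatermarked human text should score near 0; text generated with a
    # green-list bias should score several standard deviations above it.
    print(round(watermark_z_score("the quick brown fox jumps over the lazy dog"), 2))
```

If a human pastes AI-generated passages into their own draft, those passages keep their elevated green-token rate unless deliberately rewritten, which is the sense in which the signature carries into mixed work.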
He then moves on to the section 230 provision, which he warns is an accountability regime that could ‘severely harm the AI field.’
I agree that a poorly crafted liability law could go too far. You want to ensure that the harm done was a harm properly attributable to the AI system. To the extent that the AI is doing things AIs should do, it shouldn’t be different from a MacBook or Gmail account or a phone, or car or gun.
But also you want to ensure that if the AI does cause harm the way all those products can cause harm if they are defective, you should be able to sue the manufacturer, whether or not you are the one who bought or was using the product.
And of course, if you botch the rules, you can do great harm. You would not want everyone to sue Ford every time someone got hit by one of their cars. But neither would you want people to be unable to sue Ford if they negligently shipped and failed to recall a defective car.
Right now, we have a liability regime where AI creators are not liable for many of the risks and negative externalities they create, or their liability is legally uncertain. This is a huge subsidy to the industry, and it leads to irresponsible, unsafe and destructive behavior at least on the margin.
The key liability question is, what should be the responsibilities of the AI manufacturer, and what is on the user?
The user should mostly still be guilty of the same things as before if they choose to do crime. That makes sense. The question is, if the AI enables a crime, or otherwise causes harm through negligence, at what point is that not okay? What should the AI have to refuse to do or tell you, if requested? If the AI provides false information that does harm, if it violates various existing rules on what kinds of advice can be provided, what happens? If the AI tells you how to build a bioweapon, what determines if that is also on the AI? In that case Ball agrees there should be liability?
Some rules are easy to figure out, like privacy breaches. Others are harder.
As Ball says, we already have a robust set of principles for this. As I understand them, the common and sensible proposals extend exactly that regime, clarifying which things fall into which classes and what the protocols are for the case of AI. And we can discuss those details, but I do not think anything here is a radically different approach?
Yes, imposing those rules would harm the AI industry’s growth and ‘innovation.’ Silicon Valley has a long history of having part of their advantage be regulatory arbitrage, such as with Uber. The laws on taxis were dumb, so Uber flagrantly broke the law and then dared anyone to enforce it. In that case, it worked out, because the laws were dumb. But in general, this is not The Way, instead you write good laws.
I do agree that many are too concerned about AI being used for various mundane harms, such as ‘misinformation,’ and in most cases, when the user requests it, we should be willing to treat the AI like the telephone. If you choose to make an obscene phone call or use one to coordinate a crime, that is not on the phone company, nor should it be. If I ask for an argument in favor of the Earth being flat, the AI should be able to provide that.
Mostly Ball and I use different rhetoric, but we actually seem to agree on practical next steps? We both agree that the Executive Order was mostly positive, that we should seek visibility into large training runs, require KYC for the largest data facilities, and generally make AI more legible to the state. We both agree that AI developers should be liable for harms in a way parallel to existing liability law for other things. We both agree that we need to establish robust safety and evaluation standards, and require them in high-risk settings.
I would go further, including a full pause beyond a high compute threshold, stricter liability with required catastrophic insurance, and presumably stronger safety requirements than Ball would favor. But we are talking price. When Ball warns against ‘one size fits all’ rules, I would say that you choose the rules so they work right in each different case, and also that the common proposals very much exclude non-frontier models from many or most new rules.
The Three Body Problem (Spoiler-Free)
With the Netflix series out, I note that I previously wrote a review of the books back in 2019. The spoiler-free take can be summarized as: the books are overrated, but they are still solid. I am happy that I read them. The books took physics seriously, and brought a fully Chinese (or at least non-American) perspective.
I reread my old post, and I recommend it to those interested, who either have read the books or who are fine being fully spoiled.
There is no way to discuss the core implications of the books or series for AI without spoilers, and there has not been enough time for that, so I am going to hold discussion here for a bit.
I mention this because of this spoilers-included exchange. It reminds me that yes, when I hear many accelerationists, I very much hear a certain slogan chanted by some in the first book.
Also there are a number of other points throughout the books that are relevant. I would be happy to meet on this battlefield.
The central theme of the books is a very clear warning, if heard and understood.
One point that (mostly?) isn’t a spoiler, and that echoes throughout the books, is that the universe is a place Beyond the Reach of God, one that requires facing harsh physical reality and coldly calculating what it takes to survive, or you are not going to make it.
AI Doomer Dark Money Astroturf Update
You heard it there first. You are now hearing it here second (article link, gated).
Once again, as I assumed before looking at the byline, it is Brendan Bordelon that has the story of the scary EAs and how their money and evil plots have captured Washington. What is that, four attempted variations on the same hack job now that I’ve had to write about, all of which could at most loosely be characterized as ‘news’? I admire his ability to get paid for this.
That’s right. The big backer of this dastardly ‘dark money astroturf’ campaign turns out to be… famously anti-technology and non-builder Vitalik Buterin, author of the famously anti-progress manifesto ‘my techno-optimism’ (a letter described here as ‘in a November blog post he fretted that AI could become “the new apex species on the planet” and conceivably “end humanity for good”’) and oh yeah the creator of Ethereum. Turns out he is… worried about AI? Not convinced, as Marc claims, that the outcome of every technology is always good? Or is it part of some greater plan?
And what is that dastardly plan? Donating his money to the non-profit Future of Life Institute (FLI), to the tune of (at the time, on paper, if you don’t try to sell it, who knows how much you can actually cash out) $665 million worth of Shiba Inu cryptocurrency, to an organization dedicated to fighting a variety of existential risks and large scale hazards like nuclear war and loss of biodiversity.
Oh, and he did it back in May 2021, near the peak, so it’s unlikely they got full value.
I asked, and was directed to this post about that and the general timeline of events, indicating that with optimal execution they would have gotten about $360 million in liquidation value. My guess is they did this via block trades somewhat below market, which to be clear is what I would have done in their shoes, and got modestly less.
Their direct lobbying ‘dark money astroturfing’ budget (well, technically not dark and not astroturfing and not that much money, but hey, who is checking)? $180k last year, as per the article. But someone (billionaire Jaan Tallinn, who could easily fund such efforts if so inclined) suggested they should in the future spend more.
And they have done other dastardly things, such as having people sign an open letter, or calling for AI to be subject to regulations, and worst of all helping found other charitable organizations.
Yes, the regulations in question aim to include a hard compute limit, beyond which training runs are not legal. And they aim to involve monitoring of large data centers in order to enforce this. I continue to not see any viable alternatives to this regime.
It is true that the ideal details of the regulatory regimes of Jaan Tallinn and FLI are relatively aggressive on price, indeed more aggressive on price than I would be even with a free hand. This stems from our differences in physical expectations and also from differences in our models of the political playing field. I discuss in my post On The Gladstone Report why I believe we need to set relatively high compute thresholds.
Joined by several others, Bordelon was back only days later with another iteration of the same genre: Inside the shadowy global battle to tame the world’s most dangerous technology. In addition to getting paid for this, I admire the tenacity, the commitment to the bit. You’ve got to commit to the bit. Never stop never stopping.
This one opens with a policy discussion.
I mean, that’s not ‘risky and difficult work’ so much as it is ‘you are going to almost certainly crash and probably die,’ no? It is kind of too late to not crash, at that point. But also if the plane you are flying on is not ‘built’ then what choice do you have?
Even more than Politico’s usual, this story is essentially an op-ed. If anything, my experiences with even newspaper op-eds would challenge claims here as insufficiently justified for that context. Check this out, I mean, it’s good writing if you don’t care if it is accurate:
Yeah, the thing is, I am pretty sure none of that is true, aside from it being a long way from over? ‘Whoever wins’? What does that even mean? What is the author even imagining happening here? What makes such rules ‘almost impossible to rewrite’ especially when essentially everything will doubtless change within a few years? And why should we expect all of this to be over? It would be a surprise for the USA to pass any comprehensive law on AI governance in 2024, given that we are nowhere near agreement on its components and instead are very close to the event horizon of Trump vs. Biden II: The Legend of Jeffrey’s Gold.
So how exactly is this going to get largely finalized without Congress?
The post talks about countries having agendas the way they did at the Congress of Vienna, rather than being what they actually are: bunches of people pursuing various agendas in complex ways, most of whom have no idea what is going on.
When the post later talks about who wants to focus on what risks, even I was confused by which parties and agendas were supposedly advocating for what.
I did find this useful:
I mean, yes, any executive would say not to hamper their growth, but also it is very good to see Brockman taking the real existential risks seriously in high-stakes discussions.
I also enjoyed this, since neither half of Macron’s first statement seems true:
Then there is his second, and I have to ask, has he told anyone else in the EU? On any subject of any kind?
Also, this next statement… is… just a lie?
I mean, seriously, what? Where are they getting this? Oh, right:
You see, that must mean a small number of firms. Except no, it doesn’t. It simply means you have to make yourself known to the government, and obey some set of requirements. There is no limit on who can do this. The whole ‘if you do not allow pure open season, and instead impose any rules on the handful of Big Tech companies, then that must mean no one can ever compete with Big Tech’ shirt you are wearing is raising questions.
I do not know how Politico was convinced to keep presenting this perspective as if it was established fact, as an attempt to call this narrative into being. I do know that it gets more absurd with every iteration.
Evaluating a Smarter Than Human Intelligence is Difficult
Time’s Will Henshall writes about METR (formerly ARC Evals), with the central point being that no one knows how to do proper evaluations of the potentially dangerous capabilities of future AI models. The labs know this, METR and other evaluators know this. Yes, we have tests that are better than nothing, but we absolutely should not rely on them. Connor Leahy thinks this makes them actively counterproductive:
Note the careful wording. Connor is saying that current tests are so inadequate their primary purpose is ‘safetywashing,’ not that future tests would be this, or that we shouldn’t work to improve the tests.
Even so, while the tests are not reliable or robust, I do disagree. I think that we have already gotten good information out of many such tests, including from OpenAI. I also do not think they are doing much work in the safetywashing department; the labs are perfectly willing to go ahead without that cover, and I don’t think anyone would be stopping them substantially more in the absence of these efforts.
As always, I think it comes down to spirit versus letter. If the labs are not going for the spirit of actual safety and merely want to do safetywashing, we have no ability on the horizon to make such tests meaningful. If the labs actually care about real safety, that is another story, and the tests are mostly useful, if not anything like as useful or robust as they need to be.
Even if you follow the spirit, there is the risk others do not.
Are these tests, even if they become quite good, sufficient? Only if everyone involved takes heed of the warnings and stops. Any one company (e.g. OpenAI) abiding by the warning is not enough. So either each private actor must respond wisely, or the government must step in once the warnings arise.
Emmett Shear’s position here seems wrong. I don’t doubt that there would suddenly be a lot of eyes on OpenAI if Altman or another CEO got fired or otherwise overruled for refusing to proceed with a dangerous model, but as Oliver says there would be a public relations war over what was actually happening. The history of such conflicts and situations should not make us optimistic, if it is only Altman, Amodei or Hassabis who wants to stop and they get overridden.
There are however three related scenarios where I am more optimistic.
The downside risk is that this substitutes for other better efforts, or justifies moving forward. Or even that, potentially, getting ‘risky’ evaluations becomes cool, a sign that you’ve cooked. Which of course it is. If your model is actively dangerous, then that is a very powerful and likely useful model if that risk could be contained. That is always the temptation.
A serious concern is that even if we knew how to do that, we would still need the ability.
Do we have it?
Haydn’s world would be nice to live in. I do not think we live in it?
Right now, yes, perhaps (what about Inflection?) there are only four companies with sufficient datacenter capacity to train such a model without assistance. But one of them is already Meta, a rogue actor. And you can see from this chart that Apple is going to join the club soon, and Nvidia is going to keep scaling up chip production and selling them to various companies.
As Eliezer says, you need a proper regulatory regime in place in advance. The compute reporting thresholds for data centers and training runs are a good start. Better hardware tracking at the frontier would help a lot as well. Then you need the legal authority to be able to step in if something does happen, and then extend that internationally. These things take a lot of time. If we wait until the warning to start that process, it will likely be too late.
In my view, it is good to see so many efforts to build various tests, no matter what else is being done. The more different ways we look at the problem, the harder it will be to game, and we will develop better techniques. Good tests are insufficient, but they seem necessary, either as part of a moderate regime, or as the justification for a harsher reaction if it comes to that.
What we definitely do not have, overall, is any kind of unified plan. We don’t know what we want to do with these evaluations, or in other ways either.
DeepMind gave it a shot too.
This graph, as other variations have before it, makes the key assumption explicit that we will get this ‘safety buffer’ and that improvements will continue to be gradual. This is presumably true for a sufficiently large buffer, but it might need to be very large.
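As a toy illustration of what that assumption is doing, here is a minimal sketch of the buffer logic with entirely made-up thresholds. The point the comments make is the point in the text: the scheme only works if capability gains between evaluated checkpoints stay smaller than the buffer.

```python
# Toy illustration of a 'safety buffer': trigger mitigations when an eval
# score comes within `buffer` of the level considered dangerous, rather than
# waiting to hit the dangerous level itself. All thresholds are invented.
def check_buffer(scores_by_checkpoint, dangerous_level=0.8, buffer=0.3):
    """Return the first checkpoint whose eval score enters the buffer zone."""
    trigger = dangerous_level - buffer
    for checkpoint, score in scores_by_checkpoint:
        if score >= trigger:
            return checkpoint, score
    return None

# Only works if progress between evaluated checkpoints is gradual relative to
# the buffer -- a sudden jump from 0.4 to 0.9 blows straight through it.
print(check_buffer([("v1", 0.2), ("v2", 0.45), ("v3", 0.62), ("v4", 0.85)]))
```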
Did you notice that Gemini Ultra did worse than Gemini Pro at many tasks? This is even true under ‘honest mode’ where the ‘alignment’ or safety features of Ultra really should not be getting in the way. Ultra is in many ways flat out less persuasive. But clearly it is a stronger model. So what gives?
An obvious hypothesis is that these tests are picking up on the damage done to Ultra by the fine-tuning process. But we know from other capabilities tests that Ultra 1.0 is more generally capable than Pro 1.0. So this is saying the test can fail to figure this out. This points to some potentially severe problems.
One or both of these two things must be true:
That is driven home even more on the self-proliferation tasks, why does Pro do better on 5 out of 9 tasks?
This is also a problem. If you only use ‘minimal’ scaffolding, you are only testing for what the model can do with minimal scaffolding. The true evaluation needs to use the same tools that it will have available when you care about the outcome. This is still vastly better than no scaffolding, and provides the groundwork (I almost said ‘scaffolding’ again) for future tests to swap in better tools.
The thread also covers their other tests.
Seb Krier is impressed by the details.
Fundamentally what is the difference between a benchmark capabilities test and a benchmark safety evaluation test like this one? They are remarkably similar. Both test what the model can do, except here we (at least somewhat) want the model to not do so well. We react differently, but it is the same tech.
Perhaps we should work to integrate the two approaches better? As in, we should try harder to figure out when performance on benchmarks of various desirable capabilities also indicates that the model is likely capable of dangerous things as well.
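One hedged sketch of what that integration could look like, with entirely invented numbers: fit the observed relationship between a general capability benchmark and a dangerous-capability eval across past models, then use a new model’s benchmark score to decide how seriously to take, or how much to distrust, its dangerous-capability results.

```python
# Toy sketch: predict a dangerous-capability eval score from a general
# capability benchmark, using a simple linear fit over past (hypothetical)
# models. All data below is invented for illustration.
import numpy as np

# (general benchmark score, dangerous-capability eval score) for past models
history = np.array([
    [40.0,  5.0],
    [55.0, 12.0],
    [63.0, 18.0],
    [72.0, 30.0],
    [78.0, 41.0],
])

slope, intercept = np.polyfit(history[:, 0], history[:, 1], 1)

def predicted_danger_score(benchmark_score: float) -> float:
    return slope * benchmark_score + intercept

new_model_benchmark = 85.0
print(f"Predicted dangerous-capability score: "
      f"{predicted_danger_score(new_model_benchmark):.1f}")
# If the prediction crosses your danger threshold, that is a reason to run the
# full dangerous-capability evals, and to distrust a surprisingly low result.
```

The design question is whether such relationships are stable enough to rely on, which the Pro-versus-Ultra discrepancy above suggests is not yet a given.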
Aligning a Smarter Than Human Intelligence is Difficult
Emmett Shear continues his argument that trying to control AI is doomed.
[thread continues]
Technically I would say: powerful entities generally caring about X tends not to be a stable equilibrium, even if it is stable ‘on reflection’ within a given entity. It will only hold if caring more about X provides a competitive advantage against other similarly powerful entities, or if variation in X-caring between such entities can only ever arise through reflection, and reflection never reduces X-caring even when doing so would be competitively advantageous. Also note that variation in what else you care about, and to what extent, is effectively variation in X-caring.
Or more bluntly: The ones that don’t care, or care less, outcompete the ones that care.
Even the best case scenarios here, when they play out the ways we would hope, do not seem all that hopeful.
That all, of course, sets aside the question of whether we could get this ‘caring’ thing to operationally work in the first place. That seems very hard.
What Emmett is actually pointing out is that if you create things more powerful and smarter than yourself, you should not expect to remain in control for long. Such strategies are unlikely to work. If you do want to remain in control for long, your strategy (individually or collectively) needs to be ‘do not build the thing in question in the first place, at all.’
The alternative strategy of ‘accept that control will be lost, but make those who take control care about you and hope for the best’ seems better than the pure ‘let control be lost and assume it will work out’ plan. But not that much better, because it does not seem like it can work.
It does not offer us a route to victory, even if we make various optimistic assumptions.
The control route also seems hard, but does seem to in theory offer a route to victory.
A conflict I hadn’t appreciated previously is pointed out by Ajeya Cotra. We want AI companies to show that their state-of-the-art systems are safe to deploy, but we do not want to disseminate details about those systems, to avoid proliferation. If you don’t share training or other details, all you have to go on are the outputs.
AI is Deeply Unpopular
Well, not everywhere.
The pattern here is impossible to miss. The richer you are, the less you want AI.
People Are Worried About AI Killing Everyone
Not sure where to put this, but yeah, you do get used to this sort of thing, somehow:
I would probably be much better at Twitter if I took that attitude.
Roon walks through the possibilities. Choose your doomfighter fate? When you put it that way, they seem to be in clear rank order.
Other People Are Not As Worried About AI Killing Everyone
I keep seeing this attitude of ‘I am only worried about creating smarter, more capable things than humans if we attempt to retain control over their actions.’
I get the very real worries people like Joscha have about how the attempts to retain control could go wrong, and potentially actively backfire. I do. I certainly think that ‘attach a political ideology and teach the AI to lie on its behalf’ is a recipe for making things worse.
But going full door number two very clearly and definitely loses control over the future if capabilities sufficiently advance, and leads to a world that does not contain humans.
Meanwhile others get some very strange ideas about what causes people to be worried about AI. A thousand supposed obsessions, all different.
I can assure Wolf Tivy that no, this is not the central reason people are worried.
Wouldn’t You Prefer a Good Game of Chess?
Eliezer Yudkowsky offers speculation, then I put it to a test.
Publishing my poll results, since you can’t publish only when you get the result you expected:
In both cases, the non-chess group is clearly substantially more in favor of taking AI risk seriously than the chess group. The sample in the second poll is small, something like 12-7. If you believe all the answers are real, it is good enough to judge direction versus 76-20, although you have to worry about Lizardman effects.
(You can make a case that, even if a few are hedging a bit at the margin, 4% of respondents is not so crazy – they presumably will answer more often and see the post more often, my followers skew smart and highly competitive gamer, and we have 22% that are over 1600, which is already 83rd percentile for rated players (65k total rated players in the USA), and only 15% of Americans (8% worldwide) even know how to play. The masters numbers could be fully compatible.)
In the first poll it is very clear.
There are some obvious candidate explanations. Chess is a realm where the AI came, it saw, it conquered and everything is fine. It is a realm where you can say ‘oh, sure, but that won’t generalize beyond chess.’ It is an abstract game of perfect information and limited options.
There also could be something weird in the fact that these people follow me. That ‘controls for’ chess playing in potentially weird ways.
The problem is, did I predict this result? Definitely not, very much the opposite.
The Lighter Side
We finally found a good definition.
Many people are saying…
I can’t wait.
What makes ‘doomers’ different here is that the name is a derogatory term chosen and popularized by those who are pro-doom. Whereas the others are names chosen by the companies themselves.
There are always, of course, other issues.