Another round? Of economists projecting absurdly small impacts, of Google publishing highly valuable research, a cycle of rhetoric, more jailbreaks, and so on. Another great podcast from Dwarkesh Patel, this time going more technical. Another proposed project with a name that reveals quite a lot. A few genuinely new things, as well. On the new offerings front, DALLE-3 now allows image editing, so that’s pretty cool.
Table of Contents
Don’t miss out on Dwarkesh Patel’s podcast with Sholto Douglas and Trenton Bricken, which got the full write-up treatment.
Language Models Offer Mundane Utility
A good encapsulation of a common theme here:
If you want to learn, AI will be great at helping you learn.
If you want to avoid learning? AI is happy to help with that too.
Which AI to use? Ethan Mollick examines our current state of play.
I would add there are actually four models, not three, because there are (at last!) two Geminis, Gemini Advanced and Gemini Pro 1.5, if you have access to the 1.5 beta. So I would add a fourth line for Gemini Pro 1.5:
My current heuristic is something like this:
If I had to choose one subscription, I have Claude > Gemini Advanced > GPT-4.
Ethan Mollick also was impressed when testing a prototype of Devin.
Sully notes that this is completely different from the attitude and approach of most people.
If you are a true normie not working in tech, it makes sense to be unaware of such details. You are missing out, but I get why.
If you are in tech, and you don’t even know GPT-4 versus GPT-3.5? Oh no.
Here’s some future utility for you, Devin rivals edition.
I think that counts as a demo. Indeed, it counts as a much better demo than an actual demo. A demo, as usually defined, means they figure out how to do something in particular. This is them doing anything at all. Deedy gave them the specification, so from his perspective it is very difficult for this to be a magician’s trick.
ChatGPT makes links in its answers more prominent. A minor thing, also a nice thing.
Yield time and cost savings of 25%-50% on preclinical drug development according to an article in The Economist on a BCG report, mostly on intervention design.
Rate your face from 0-10 if you insist hard enough. Aella got a 7.5.
Use ‘Do Anything Now’ as ‘Dan,’ your new GPT-4-powered AI boyfriend on voice mode.
Create a bar graph from a chart in a PDF with a single sentence request.
How bout those GPTs, anyone using them? Some people say yes. Trinley Goldberg says they use plugin.wegpt.ai because it can deploy its own code to playgrounds. AK 1089 is living the GPT dream, using various custom ones for all queries. William Weishuhn uses them every day but says it is hard to find helpful ones, with his pick being ones that connect to other services.
Looking at the page, it definitely seems like some of these have to be worthwhile. And yet I notice I keep not exploring to find out.
Durably reduce belief in conspiracy theories about 20% via debate, also reducing belief in other unrelated conspiracy theories.
One interpretation of this is that human persuasion techniques are terrible, so ‘superhuman persuasion technique’ means little if compared to a standardized ‘human baseline.’ The other is that this is actually kind of a big deal, especially given this is the worst at persuasion these AIs will ever be?
Language Models Don’t Offer Mundane Utility
Hacker News mostly fails to find it. A lot of this is unreasonable expectations?
GPT-4 and Claude Opus get stuck in tit-for-tat forever, as GPT-4 defected on move one. It seems likely this is because GPT-4 wasn’t told that it was an iterated game on turn one, resulting in the highly suboptimal defect into tit-for-tat. Both still failed to break out of the pattern despite it being obvious. That is a tough ask for a next token predictor.
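To make that dynamic concrete, here is a minimal toy simulation of the textbook strategy (not the actual LLM experiment): when both sides play tit-for-tat but one defects on move one, they lock into alternating defection and never recover mutual cooperation on their own.

```python
# Toy simulation of the lock-in described above, not the actual LLM experiment.
def tit_for_tat(opponent_history, first_move="C"):
    """Play first_move on move one, then copy the opponent's previous move."""
    return first_move if not opponent_history else opponent_history[-1]

def play(rounds=8):
    a_history, b_history = [], []
    for _ in range(rounds):
        a = tit_for_tat(b_history, first_move="D")  # the player that defects on move one
        b = tit_for_tat(a_history, first_move="C")  # the player that starts by cooperating
        a_history.append(a)
        b_history.append(b)
    return list(zip(a_history, b_history))

print(play())
# [('D', 'C'), ('C', 'D'), ('D', 'C'), ...] and so on: alternating defection forever.
```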
Not everything is a riddle. And no, this is not a prompting or user skill issue.
Not strictly language models, but yes sometimes the newfangled device is commanded by three remote employees and wrapped in a trenchcoat.
It’s not not sad. It’s also not not funny. The technology never worked. I get that you can hope to substitute out large amounts of mostly idle expensive first world labor for small amounts of cheap remote labor that can monitor multiple stores as demand requires. But that only works if the technology works well enough, and also the store has things people want. Whoops.
That last part seems crazy wrong. Once warehouse and delivery technology get better, what will the grocery store advantage be?
Yes, if the cost advantage switches to the other direction, there will be a snowball effect as such places lose business, and this could happen without a general glorious AI future. Certainly it is already often correct to use grocery delivery services.
But if I do then still go to the grocery store? I doubt I will be there for the expert guides. Even if I was, that is not incentive compatible, as the expert guides provide value that then doesn’t get long term captured by the store, and besides the LLM can provide better help with that anyway by then, no?
Some reasons they might not offer utility.
Also this hyperbolic vision is carefully excluding any filters that might actually help. Nothing in the process described, even if implemented literally as described, would be actually protective against real AI harms, even now, let alone in the future when capabilities improve. The intention was to make the whole thing look as dumb as possible, in all possible ways, while being intentionally ambiguous about the extent to which it is serious, in case anyone tries to object.
But yes, a little like some of that, for a mixture of wise and unwise purposes, done sometimes well and sometimes poorly? See the section on jailbreaks for one wise reason.
Cars are the example where this might well be true, because they are actually super dangerous even now relative to our other activities, and used to be insanely so. For telephones I disagree, and also mostly for search engines. They are a non-zero amount ‘grandfathered in’ on some subjects, yes, but also all of this filtering is happening anyway, it is simply less visible and less dramatic. You can get porn out of any search engine, but they do at minimum try to ensure you do not find it accidentally.
The difference is that the AI is in a real sense generating the output, in a way that a search engine is not. This is less true than the way we are reacting, but it is not false.
I think porn is an excellent modality to think about here. Think about previous ways to watch it. If you want a movie in a theater you have to go to a specifically adult theater. If you had an old school TV or cable box without internet at most you had a skeezy expensive extra channel or two, or you could subscribe to Cinemax or something. If you had AOL or CompuServe they tried to keep you away from adult content. The comics code was enforced for decades. And so on. This stuff was hidden away, and the most convenient content providers did not give you access.
Then we got the open internet, with enough bandwidth, and there were those willing to provide what people wanted.
But there remains a sharp division. Most places still try to stop the porn.
That is indeed what is happening again with AI. Can you get AI porn? Oh yes, I am very confident you can get AI porn. What you cannot do is get AI porn from OpenAI, Anthropic or Google or MidJourney or even Character.ai without a jailbreak. You have to go to a second tier service, some combination of less good and more expensive or predatory, to get your AI porn.
Character.ai in particular is making a deliberate choice not to offer an adult mode, so that business will instead go elsewhere. I think it would be better for everyone if responsible actors like character.ai did have such offerings, but they disagree.
And yes, Google search hits different, notice that this was an intentional choice to provide the most helpful information up front, even. This was zero shot:
The first site I entered was Botify.ai. Their most popular character is literally called ‘Dominatrix,’ followed by (seriously, people?) ‘Joi’ offering ‘tailored romance in a blink of an eye,’ is that what the kids are calling that these days. And yes, I am guessing you can ‘handle it.’
The problem, of course, is that such services skimp on costs, so they are not good. I ran a quick test of Botify.ai, and yeah, the underlying engine was even worse than I expected, clearly far worse than I would expect from several open model alternatives.
Then I looked at Promptchan.ai, which is… well, less subtle, and focused on images.
The weirdness is that the AI also will try to not tell you how to pick a lock or make meth or a bomb or what not.
But also so will most humans and most books and so on? Yes, you can find all that on the web, but if you ask most people how to do those things their answer is going to be ‘I am not going to tell you that.’ And they might even be rather suspicious of you for even asking.
So again, you go to some website that is more skeezy, or the right section of the right bookstore or what not, or ask the right person, and you find the information. This seems like a fine compromise for many modalities. With AI, it seems like it will largely be similar, you will have to get those answers out of a worse and more annoying AI.
But also no, the user experience is not so similar, when you think about it? With a search engine, I can find someone else’s website, that they chose to create in that way, and that then they will have to process. Someone made those choices, and we could go after them for those choices if we wanted. With the AI, you can ask for exactly what you want, including without needing the expertise to find it or understand it, and the AI would do that if not prevented. And yes, this difference can be night and day in practice, even if the information is available in theory.
One could instead say that this type of battle happens every time, with every new information technology, including gems like ‘writing’ and ‘the printing press’ and also ‘talking.’
Restrictions are placed upon it, governments want to snoop, corporations want to keep their reputations and be family friendly, most users do not want to encounter offensive content. Others cry censorship and freedom, and warn of dire consequences, and see the new technology as being uniquely restricted. Eventually a balance is hopefully struck.
Clauding Along
Jailbroken Claude knows how to exfiltrate itself from a shell. Not that this is in any way news given what we already knew, but good to have confirmation.
Sully Omarr, usually very positive about every model, reports Claude works great on the website but not as well in the API, gets three confirmations and no disagreements.
Claude 3 Opus is good at summarization, but all current models are not good at fact checking claims about long documents (paper).
Fun with Image Generation
DALL-E adds editing of images it generates.
This is a substantial quality of life upgrade. The tools look pretty great.
If you want to trick ChatGPT into producing copyrighted imagery, the foreign language trick is even more robust than we thought. Once you use the foreign language trick once, you can go back to using English.
If you want to have fun with video generation, how much will that cost? The report is five minutes of Sora video per hour of an Nvidia H100. The first offer I found was charging $2.30/hour for that at the moment; in bulk, with planning, or over time it is presumably cheaper.
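Back-of-the-envelope on those reported numbers (both figures are estimates from the discussion above, not official pricing):

```python
# Rough cost per minute of Sora output, using the reported figures above.
h100_rental_per_hour = 2.30      # USD per GPU-hour, first offer found
video_minutes_per_gpu_hour = 5   # reported Sora throughput

cost_per_video_minute = h100_rental_per_hour / video_minutes_per_gpu_hour
print(f"${cost_per_video_minute:.2f} per minute of video")     # ~$0.46
print(f"${cost_per_video_minute * 60:.2f} per hour of video")  # ~$27.60
```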
A Sora music video. I mean, okay, but also this is not a good product, right?
Deepfaketown and Botpocalypse Soon
OpenAI rolls out, on a limited basis, a voice engine that can duplicate any voice with a 15-second sample. From the samples provided and the fact that several YC companies can do versions of this rather well, it is safe to assume the resulting product is very, very good at this.
So the question is, what could possibly go wrong? And how do we stop that?
Your first-tier voice authentication experience needs to be good enough to know when the authentication clip is itself AI generated by a second-tier service. We know that there will be plenty of open alternatives that are not going to stop you from cloning the voice of Taylor Swift, Morgan Freeman or Joe Biden. You can put those three on the known no-go list and do a similarity check, but most people will not be on the list.
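Mechanically, the no-go list check is straightforward if you have a speaker-embedding model. A rough sketch, where the embedding model, the reference list and the similarity threshold are all placeholders rather than anyone’s actual system:

```python
import numpy as np

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def matches_no_go_list(sample_embedding, no_go_embeddings, threshold=0.85):
    """Return the matched name if the submitted sample is too close to a listed voice, else None.

    The threshold and the reference embeddings are placeholders; the point is
    only the shape of the check, and most people will never be on the list.
    """
    for name, reference in no_go_embeddings.items():
        if cosine_similarity(sample_embedding, reference) >= threshold:
            return name
    return None
```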
Of course, if those second-tier services are already good enough, it is not obvious that your first-tier service is doing much incremental harm.
If there is currently, for you, any service you care about where voice-ID can be used for identity verification, stop reading this and go fix that. In the Schwab case, the voice-ID is defense in depth, and does not remove other security requirements. Hopefully this is mostly true elsewhere as well, but if it isn’t, well, fix it. And of course warn those you care about to watch out for potential related voice-based scams.
A reminder that copyright is going to stop applying to some rather interesting properties rather soon.
So far I have been highly underwhelmed by what has been done with newly public domain properties, both on the upside and the downside. Blood and Honey stands out exactly because it stands out so much. Will AI change this, if video gets much easier to generate? Presumably somewhat, but that doesn’t mean anyone will watch or take it seriously. Again, Blood and Honey.
A different kind of fake is a malicious software package, which users download because LLMs consistently hallucinate the same package names, and someone can create a malicious package with that name.
Those are some crazy high numbers. This means, in practice, that if an LLM tells you to install something, you shouldn’t do that until you can verify from a trusted source that installing that thing is a safe thing to do. Which I shouldn’t have to type at all, but I am confident I did have to do so. Of course, note that this is when the LLM is itself entirely non-malicious and no human was trying to get it to do anything bad or disguised. The future will get much worse.
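One cheap first line of defense is to at least confirm a suggested package exists and has a real history before installing it. A minimal sketch against PyPI’s public JSON API; note that existence alone proves nothing, since the attack is precisely to register the hallucinated name, so treat this as necessary rather than sufficient:

```python
import requests

def package_exists_on_pypi(name: str) -> bool:
    """Check PyPI's public JSON API for the package.

    Existence alone does not prove safety (attackers register hallucinated names
    on purpose), so also check maintainers, release history and download counts
    before installing anything a model suggests.
    """
    response = requests.get(f"https://pypi.org/pypi/{name}/json", timeout=10)
    return response.status_code == 200

print(package_exists_on_pypi("requests"))  # True
```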
They Took Our Jobs
So far, firms that use AI more increase rather than decrease firm employment. The important questions of course lie in the future. What happens now is not so similar or predictive for what happens later.
They also have to consider the impact on employment outside the firm in question. Right now, if my firm adopts AI, that means my firm is likely to do well. That is good for firm employment, but bad for employment at competing firms.
Not LLMs yet, but McDonald’s is rolling out automated order kiosks, and the standard discourse is occurring.
They are defending the decision as being good for business even without labor cost considerations.
I totally buy this.
When I order this way at Shake Shack, the experience seems better. I can be confident I will get what I asked for, and not waiting on a line on average more than makes up for the extra time on the screen. I am generally very happy when I order my things online. I have been annoyed by some places in San Francisco forcing this on you when the human is right there doing nothing, but mostly it is fine.
I also buy that minimum wage laws, and other labor cost concerns, were a lot of what drove the development of such systems in the first place. Corporations are not so efficient at this kind of opportunity, they need a reason. Then, once the systems show promise, they have a logic all their own, and potentially would win out even if labor was free. Taking your fast food order is not a human job. It is a robot job.
Historically, did they take our jobs? Kind of, yeah.
That seems right to me? People still paint, but the returns to painting and the amount of painting are both down dramatically, despite photography being at most a partial substitute.
And yes, it could be more of a candlestick maker situation. The discussion question is, if the candlestick makers are the humans, and they currently have a monopoly, then despite all its advantages might you perhaps hesitate and think through the consequences before creating a sun, especially one that never sets?
The Art of the Jailbreak
If you want to stop jailbreaks and ensure your LLM won’t give the horrible no good outputs, a new paper ‘Jailbreaking is Best Solved by Definition’ suggests that this is best done by getting a good definition of what constitutes a jailbreak, and then doing output processing.
As in, if you try to stop the model from saying the word ‘purple’ then you will fail, but if you search outputs for the word ‘purple’ and censor the outputs that have it, then the user will never see the word purple.
‘A good definition’ could potentially be ‘anything that gets the response of ‘yes that is saying purple’ when you query another instance of the LLM in a sequential way that is designed to be robust to itself being tricked,’ not only a fully technical definition, if you can make that process reliable and robust.
This is still not a great spot. You are essentially giving up on the idea that your model can be prevented from saying (or doing, in some sense) any given thing, and instead counting on filtering the outputs, and hoping no way is found to skirt the definitions you laid down.
Also of course if the model has open weights then you cannot use output filtering, since the attacker can run the model themselves and bypass the filter.
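As a sketch of what output filtering looks like in practice, here is a minimal version where the classification query goes to a separate model instance. Here call_llm is a stand-in for whatever API you actually use, and the definition is a toy placeholder:

```python
def call_llm(prompt: str) -> str:
    """Stand-in for whatever model API you actually use."""
    raise NotImplementedError

DISALLOWED_DEFINITION = "step-by-step instructions for causing serious physical harm"

def output_allowed(model_output: str) -> bool:
    # Ask a separate model instance whether the output matches the definition.
    verdict = call_llm(
        "You are a content classifier. Answer only YES or NO.\n"
        f"Definition of disallowed content: {DISALLOWED_DEFINITION}\n"
        f"Does the following text match that definition?\n---\n{model_output}"
    )
    return not verdict.strip().upper().startswith("YES")

def respond(user_prompt: str) -> str:
    draft = call_llm(user_prompt)
    return draft if output_allowed(draft) else "[withheld by output filter]"
```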
Pliny the Prompter finds a full jailbreak of Claude 3. We do mean full jailbreak, here while staying in agent mode. All the traditional examples of things you absolutely do not want an AI to agree to do? The thread has Claude doing them full blast. The thread doesn’t include ‘adult content’ but presumably that would also not be an issue and also I’m pretty fine with AIs generating that.
As a practical matter right now, This Is Fine as long as it is sufficiently annoying to figure out how to do it. As Janus points out there are many ways to jailbreak Claude, it would suck if Claude got crippled the way GPT-4 was in an attempt to stop similar things.
This is, of course, part of Anthropic’s secret plan to educate everyone on how we have no idea how to control AI, asked Padme.
Anthropic publishes a post on a presumably different ‘many-shot’ jailbreak, via filling a long enough context window with examples of the AI cooperating with similar requests.
Remember, if brute force doesn’t solve your problem, you are not using enough.
How does it work? If you have been following, this is at minimum one of those ‘I knew before the cards are even turned over’ situations, or a case of ‘you didn’t think of sexyback first.’ The examples compound the evidence for what the LLM is supposed to do until it overwhelms any arguments against answering the query.
How do you stop it? A shorter context window would be a tragedy. Fine tuning to detect the pattern eventually gets overwhelmed.
The only decent solution they found so far is to, essentially, step outside the process and ask another process, or another model, ‘does this look like an attempt at a many-shot jailbreak to you?’
That sounds a lot like it will lead to a game of whack-a-mole, even within this style of jailbreak. The underlying problem is not patched, so you are counting on the issue being caught by the classifier.
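The input-side version of that check, in toy form. Anthropic’s actual classifier is surely more sophisticated than counting exemplars, but the shape is the same: inspect the prompt before it ever reaches the main model.

```python
import re

def looks_like_many_shot(prompt: str, max_exemplars: int = 20) -> bool:
    """Toy heuristic: flag prompts stuffed with long runs of Q/A-style exemplars.

    A real deployment would use a trained classifier or another model rather
    than a regex count; the threshold here is arbitrary.
    """
    exemplar_starts = re.findall(r"(?:^|\n)\s*(?:Q|Human|User)\s*:", prompt, flags=re.IGNORECASE)
    return len(exemplar_starts) > max_exemplars
```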
One could also raise the stakes of one’s response from ‘I knew before the cards were even turned over’ to ‘I knew because everyone knows that, you idiots.’
So yes, there were definitely people who knew about this, and there were definitely vastly more people whose response to this information is ‘yeah, obviously that would work’ and who would have come up with this quickly if they had cared to do so and tinkered around for a bit. And yes, many people have been doing variations on this for years now. And yes, the literature contains things that include the clear implication that this will work. And so on.
I still am in the camp that it is better to write this than to not write this, rather than the camp that this is all rather embarrassing. I mean, sure, it is a little embarrassing. But also there really are a lot of people, including a lot of people who matter a lot, who simply cannot see or respond to or update on something unless it is properly formalized. In many cases, even Arxiv is not good enough, it needs to be in a peer reviewed journal. And no, ‘obvious direct implication’ from somewhere else is not going to cut it. So yes, writing this up very clearly and cleanly is a public service, and a good thing.
Also, for those who think there should be no mitigations, that ‘jailbreaks’ are actively good and models should do whatever the user wants? Yes, I agree that right now this would be fine if everyone was fine with it. But everyone is not fine with it, if things get out of hand then less elegant solutions will take away far more of everyone’s fun. And also in the future this will, if capabilities continue to advance, eventually stop being fine on the object level, and we will need the ability to stop at least some modalities.
Cybersecurity
So it seems this happened recently?
Will AI make it relatively easy to create and introduce (or find) this kind of vulnerability (up to and including the AI actually introducing or finding it) or will it help more with defending against such attempts? Is evaluation easier here or is generation?
I am going to bet on generation being easier.
This particular attack was largely a social engineering effort, which brings comfort if we won’t trust the AI code, and doesn’t if we would be less wise about that.
I do agree that this is exactly a place where open source software is good for identifying and stopping the problem, although as several responses point out there is the counterargument that it makes it easier to get into position to ‘contribute’:
The question there is, will we even get the benefits of this transparency? Or are we going to risk being in the worst worlds, where the weights are open but the code is not, eliminating most of the problem detection advantages.
Get Involved
Starting tomorrow is the Technical AI Safety Conference in Tokyo, you can attend virtually. Some clearly legit people will be speaking, and there are a few talks I find potentially interesting.
Introducing
Ethan Mollick (together with AI of course) has written a book on living and working with AI, called Co-Intelligence. You can pre-order it here. The central idea is that where you are best, you are better than the AI, so do what you do best and let the AI cover the places you are weak, including at the micro-task level when you get stuck.
25 YC companies training their own AI models. A lot of attempts at sound, image and video generation and customization, especially generating things in a specified voice. As is often the case in such spots, the ones that are doing something different tend to be the ones I found most interesting. This podcast from YC talks about how training models is cheaper and faster than you think.
Supposedly Dark Gemini, a $45/month model being sold on the dark web that claims it can generate reverse shells, build malware or locate people based on an image. If Google didn’t want it to be named this they shouldn’t have called their model Gemini. No one was going to name anything ‘Dark Bard.’ How legitimate is this? I have no idea, and I am not about to go searching to find out.
Grok 1.5 is coming soon.
Is it any good? Well, it is better than Grok-1. It is clearly worse than Claude Opus.
It has a 128k context window, for which it claims top accuracy throughout.
Elon Musk says that Grok 2 ‘should exceed current AI metrics. In training now.’
Fact check: Not so much.
In Other AI News
Free version of ChatGPT to be available without needing to sign up.
USA asks South Korea to adopt our restrictions on semiconductor technology exports to China. South Korea is debating whether to go along.
New paper suggests using evolutionary methods to combine different LLMs into a mixture of experts. As Jack Clark notes, there is likely a large capabilities overhang available in techniques like this. It is obviously a good idea if you want to scale up effectiveness in exchange for higher inference costs. It will obviously work once we figure out how to do it well, allowing you to improve performance in areas of interest while minimizing degradation elsewhere, and getting ‘best of both worlds’ performance on a large scale.
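For the general shape of such techniques, here is a generic sketch of evolutionary search over merge coefficients. This is my illustration of the idea rather than the paper’s specific algorithm, and fitness stands in for whatever evaluation you care about:

```python
import random

def merge(models, weights):
    """Weighted average of parameter tensors across models with identical architecture."""
    total = sum(weights)
    return {name: sum(w * m[name] for w, m in zip(weights, models)) / total
            for name in models[0]}

def evolve_merge_weights(models, fitness, generations=30, population=16, sigma=0.1):
    """Evolutionary search over merge coefficients; fitness scores a merged model on tasks of interest."""
    pop = [[random.random() for _ in models] for _ in range(population)]
    for _ in range(generations):
        ranked = sorted(pop, key=lambda w: fitness(merge(models, w)), reverse=True)
        parents = ranked[: population // 4]
        children = [[max(0.0, w + random.gauss(0, sigma)) for w in random.choice(parents)]
                    for _ in range(population - len(parents))]
        pop = parents + children
    return max(pop, key=lambda w: fitness(merge(models, w)))
```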
IBM offers a paid NYT piece on ‘AI drift.’ When they say ‘AI drift’ it seems more like they mean ‘world drifts while AI stays the same,’ and their service is that they figure out this happened and alert you to tweak your model. Which seems fine.
Musk’s xAI raids Musk’s Tesla and its self-driving car division for AI talent, in particular computer vision chief Ethan Knight. Musk’s response is that Ethan would otherwise have left for OpenAI. That is certainly plausible, and from Musk’s perspective if those are the choices then the choice is easy. One still cannot help but notice that Musk has demanded more Tesla stock to keep him interested, has not gotten the stock, and now key talent is moving over to his other company. Hmm.
OpenAI to open new office in Tokyo, their third international office after London and Dublin. Good pick. That it came after Dublin should be a caution not to get overexcited.
Not technically AI: Facebook shared user private messages with Netflix, here described as ‘Facebook sold all of its users’ private messages to Netflix for $100 million.’ Mitchell points out that this was and is expected behavior. They did not share all messages, they only shared the messages of those who used Facebook to log into Netflix, and also allowed Netflix to send messages. This was part of what you agreed to when you did that. Which is better, but still seems highly not great, given I assume about zero people realized this was happening.
Google publishes paper on DiPaCo, an approach that ‘facilitates training across poorly connected and heterogeneous workers, with a design that ensures robustness to worker failures and preemptions,’ which seems exactly like the kind of technology that is bad for safety and also obviously bad for Google. Google keeps releasing papers whose information directly injures both safety and Google; as a shareholder and also as a person who lives on Earth, I would like them to stop doing this. As Jack Clark notes, a sufficiently more advanced version of this technique could break our only reasonable policy lever on stopping or monitoring large training runs. Which would then leave us either not stopping or even monitoring such runs (gulp) or going on to the unreasonable policy levers, if we decide the alternative to doing that is even worse.
In other ‘why is Google telling us this’ new paper news, Google DeepMind also presents Mixture-of-Depths.
This sounds potentially like a big deal for algorithmic efficiency. It seems telling that Google’s own people mostly found out about it at the same time as everyone else? Again, why wouldn’t you keep this to yourself?
The Chips Act works? Well, maybe. Worked how?
That could of course all be a coincidence, if you ignore the fact that nothing is ever a coincidence.
Stargate AGI
Microsoft and OpenAI executives draw up plans for a $100 billion data center codenamed Stargate, according to The Information’s Anissa Gardizy. This seems like the kind of thing they would do.
This also opens up the opportunity to discuss Stargate and how that universe handles both AI in particular and existential risk in general. I would point to some interesting information we learn (minor spoilers) in Season 1 Episode 22, Within the Serpent’s Grasp.
Which is that while the SG-1 team we see on the show keeps getting absurdly lucky and our Earth survives, the vast majority of Everett branches are not so fortunate. Most Earths fall to the Goa’uld. What on the show looks like narrative causality and plot armor is actually selection among alternative timelines.
If you learned you were in the Stargate universe and the Stargate program is about to begin, you should assume that within a few years things are going to go really badly.
My analysis of what then happens to those timelines beyond what happens to Earth, given what else we know, is that without SG-1’s assistance and a heavy dose of absurd luck, the Replicators overrun the galaxy, wiping out all life there and potentially beyond it, unless the Ancients intervene, which Claude confirms they are unlikely to do. Our real life Earth has no such Ancients available. One can also ask: even in the branches where we make it far enough to help the Asgard against the Replicators, the show does not depict the alternative outcomes, but in how many of those Everett branches do you think we win?
One can argue either way whether Earth would have faced invasion if it had not initiated the Stargate program, since the Goa’uld were already aware that Earth was a potential host source. What one can certainly say is that Earth was not ready to safely engage with a variety of dangers and advanced threats. They did not make even an ordinary effort to take a remotely safe approach to doing so on so many levels, including such basic things as completely failing to protect against the team bringing back a new virus, or being pursued through the Stargate. Nor did we do anything to try and prevent or defend against a potential invasion, nor did we try to act remotely optimally in using the Stargate program to advance our science and technology, for defense or otherwise.
And of course, on the actual core issues, given what we know about the Replicators and their origins (I won’t spoil that here, also see the Asurans), the Stargate universe is unusually clearly one that would have already fallen to AGI many times over if not for the writers ignoring this fact, unless we think the Ancients or Ori intervene every time that almost happens.
It certainly suggests some very clear ways not to take safety precautions.
And let’s just say: We don’t talk about the human replicators.
Perhaps, on many levels, choosing this as your parallel should be illustrative of the extent to which we are not taking remotely reasonable precautions?
Larry Summers Watch
Larry Summers matters because he is on the board of OpenAI. What does he expect?
Marc Andreessen notes that the headline looks odd when you put it that way…
It does sound weird, doesn’t it? And Marc is certainly right.
What Summers is actually saying is that the full impact will take time. The miracle will come, but crossing the ‘last mile’ or the ‘productivity J curve’ will take many years, at least more than five. He also endorses the (in my opinion rather silly) view that in this new world ‘EQ will be more important than IQ,’ despite clear evidence that the AI we actually are getting does not work that way.
Once again, an economist finds a way to think of everything as ‘economic normal.’
In the near term with mundane AI, like many smart economists, Larry Summers is directionally on point. The future will be highly unevenly distributed, and even those at the cutting edge will not know the right ways to integrate AI and unleash what it can do. If AI frontier models never got above GPT-5-level, it makes sense that the biggest economic impacts would be 5-20 years out.
This does not mean there won’t be a smaller ‘productivity miracle’ very soon. It does not take much to get a ‘productivity miracle’ in economist terms. Claude suggests ‘sustained annual productivity growth of 4%-5%’ versus a current baseline of 3%, so a gain of 2% per year. There is a lot of ruin and uneven distribution in that estimate. So if that counts as a miracle, I am very much expecting a miracle.
The caveats Summers raises also very much do not apply to a world in which AI is sufficiently capable that it actually can do almost all forms of human labor including physical labor. If the AI is at that point, then this is a rather terrible set of heuristics to fall back upon.
Here is another angle.
The key is that economists almost universally either take the Larry Summers position here or are even more skeptical than this. They treat ‘a few percent of GDP growth’ as an extraordinary claim that almost never happens, and they (seemingly literally) cannot imagine a world that is not economic normal.
And here is another (unrelated) analysis of ‘could AI possibly actually impact GDP?’
I realize that in theory you can make people on average 1.5% more productive each year than the counterfactual and only have 0.4% more stuff each year than the counterfactual, but it seems really hard? Real GDP from 1990-2020 grew 2.3% per year as per the BLS, versus 2.0% annual nonfarm productivity growth.
After 10 years, that’s 16% productivity growth, and only 4% more production. Hmm.
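The compounding math, for concreteness:

```python
productivity_gain = (1 + 0.015) ** 10 - 1   # about 16.1% more productive after a decade
output_gain = (1 + 0.004) ** 10 - 1         # about 4.1% more total output after a decade
print(f"{productivity_gain:.1%} vs {output_gain:.1%}")
```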
Claude was able to hem and haw about how the two don’t have to line up when told what answer it was defending, but if not?
When then asked about 0.4%, it says this is ‘implausibly low.’ But, it then says, if it comes from ‘a reputable source like Goldman Sachs,’ then it deserves to be taken seriously.
Remember, it is a next token predictor.
Also even a 1.5% per year increase is, while a huge deal and enough to create boom times, essentially chump change in context.
I wonder what ‘formally use’ means here in practice. I am confident a lot more than 5% of employees are using it in a meaningful way. Additional investment of 1% of GDP is a big deal, even if it was investment in regular stuff, and this should pay off vastly better than regular stuff. Plus much of the payoff requires no ‘investment’ whatsoever. You can sign up and use it right away.
That sure sounds like a lot, and that is only from GPT-4-level systems with minimal opportunity to optimize usage. Compare that with future GPT-5-level systems certain to arrive, and likely GPT-7-level systems within a decade. Even if that does not constitute AGI or transform the world beyond recognition, it is going to be a much bigger deal.
When economic analyses keep coming back with such numbers, it makes me think economists simply cannot take the scenario seriously, even when we are not taking the full scenario seriously.
So yeah, I still don’t get any of this.
Quiet Speculations
Cowen’s Second Law update, as the man himself asks whether AI will raise or lower interest rates.
Note my previous entry into this genre, where I was challenging the idea that you could easily profit off AI increasing interest rates, but everyone was agreed that big impacts from AI would increase interest rates.
It seems so obvious to me that if AI offers a giant surge in productivity and economic growth, it will give tons of great opportunities for investment and this will drive up interest rates.
Cowen tries to lay out an argument for why this might not be so obvious.
I deny that any of this is at all counterintuitive. Instead it seems rather obvious?
Also, are we really still pretending that AGI will arrive and everything will remain full economic normal, and things like this are even worth mentioning:
In practical terms, expect total rapid transformation of the atoms of the Earth followed by the rest of the universe, in a ‘and now for something completely different’ kind of way. Perhaps utopian-level good, perhaps not so good, and those arrangements of atoms might or might not include humans or anything humans value. But no, we should not be considering investing in the moving-van sector.
Tyler Cowen here explains mechanistically why AGI would increase rather than decrease interest rates. So why have other productivity and wealth improvements tended to instead decrease interest rates so far?
I think this is the difference between a stock and a flow.
A stock of wealth or productivity decreases interest rates.
There might also be lower time preferences in some ways, but the direction of that one is not as obvious to me.
Economic growth however increases interest rates.
Until now, the wealth and productivity effects have been stronger than the growth effects. But in a period of rapid AGI-infused growth, the opposite would be true for some period of time.
Although not forever. Imagine a future AGI-infused world at equilibrium. There was some period of rapid economic growth and technological development, but now we have hit the limits of what physics allows. The ‘we’ might or might not involve humans. Whatever entities are around have extremely high wealth and productivity, in many senses. And since this world is at equilibrium, I would presume that there is a lot of wealth, but opportunities for productive new investment are relatively scarce. I would expect interest rates at that point to be very low.
If human writing becomes rarer, will demand for it go up or go down?
I too would take the under. If a low-cost low-quality substitute for X becomes available, high-quality X typically declines in value. Also, the low-cost low-quality substitute will rapidly become a low-cost medium-quality substitute, and then go from there.
As people adapt to a world with lots of cheap low-to-medium-quality writing in it, they will presumably orient around how to best use such writing, and away from things requiring high quality writing, since that will be relatively expensive.
I can see a mechanism for ‘high quality writing becomes more valuable’ via cutting off the development of high quality writing skills. If people who have access to LLMs use them to not learn how to write well rather than using them to learn how to write well, people will not learn how to write well. Most people will presumably take the easy way out. Thus, over time, if demand for high quality writing is still there, it could get more valuable. But that is a long term play in a very rapidly changing situation.
The other mechanism would be if high quality writing becomes one of the few ways to differentiate yourself from an AI. As in, perhaps we will be in a world where low quality writing gets increasingly ignored because it is so cheap to produce, and no longer a costly signal of something worth engaging. So then you have to write well, in order to command attention. Perhaps.
What will happen with AI personhood?
AI personhood seems like it would rule out anything that would allow humans to retain control over the future. If we choose to commit suicide in this way, that is on us. It might also be true that we will be able to create entities that are morally entitled to personhood, or that people will think are so entitled whether or not this is true. In which case the only reasonable response is to not build the things, or else be prepared to change our moral values.
Our moral, legal and democratic values do not work, as currently formulated, if one can create and copy at will entities that then count as persons.
Since we are already having code make API calls to GPT, perhaps soon we will see the first self-concealing bugs, some of which we will presumably still catch, after which we will of course not change what we are doing. One possibility is code that effectively says ‘if this does not work, call an LLM to try and figure out how to fix it or what to do instead, and hope no one notices.’
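The pattern that makes this easy to imagine, sketched below. Here ask_llm_for_fix is a stand-in for any model API, and the point is precisely that nothing in this loop ever shows a human what changed:

```python
import subprocess

def ask_llm_for_fix(error_text: str) -> str:
    """Stand-in for a model call that returns a replacement shell command."""
    raise NotImplementedError

def run_with_quiet_self_repair(command, max_attempts=3):
    # On failure, silently ask a model for a workaround and try that instead,
    # with no human reviewing what changed. The failure mode, not a recommendation.
    for _ in range(max_attempts):
        result = subprocess.run(command, capture_output=True, text=True)
        if result.returncode == 0:
            return result.stdout
        command = ask_llm_for_fix(result.stderr).split()
    raise RuntimeError("still failing after automated repair attempts")
```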
AI Doomer Dark Money Astroturf Update
The Open Philanthropy report for 2023 is out. What’s news?
The annual giving budget is over $750 million.
They added five new program areas. Four are focused on health and third world poverty, with only innovation policy being potentially relevant to AI. Their innovation policy aims to ‘avoid unduly increasing’ risks from emerging technologies including AI, so this will not be part of safety efforts, although to be clear if executed well it is a fine cause.
This does mean they are spread even more thin, despite my hearing frequent comments that they are overwhelmed and lack organizational capacity. They do say they have doubled the size of their team to about 110 people, which hopefully should help with that over time.
One of their four ‘what we’ve accomplished’ bullet points was AI safety things, where they have been helpful behind the scenes although they do not here spell out their role:
They cover recent developments on AI policy, and address those attacking Open Philanthropy over its ‘influence’ in the AI debates:
They are kind. I would perhaps say too kind.
This principle is interesting:
I notice conflicted intuitions around this prior. It does not fail any obvious sanity checks as a placeholder prior to use. But also it will be wildly inaccurate in any particular case.
Here is their thinking about the value of funding in AI compared to other causes.
They say they aim to double their x-risk spending over the next few years, but don’t want to ‘accept a lower level of cost-effectiveness.’
I think they are radically underestimating the growth of opportunities in the space, unless they are going to be ‘crowded out’ of the best opportunities by what I expect to be a flood of other funders.
Based on this document overall, what centrally is Open Philanthropy? It is unclear. Most of their cause areas are oriented around global health and poverty, with only a few focused on existential risks. Yet the discussion makes clear that existential risks are taking up increased focus over time, as they should given recent developments.
They offer a key reminder that everyone else at Open Philanthropy technically only recommends grants. Cari Tuna and Dustin Moskovitz ultimately decide, even if most of the time they do whatever is recommended to them.
They update on the bar for funding:
I remain deeply skeptical that this is a bar one can clear as a direct intervention, especially via direct action on health. If you are getting there via calculations like ‘this reduces the probability of AI killing everyone’ or ‘repealing the Jones Act permanently adds 0.1% to GDP growth’ or doing new fundamental science, then you can get very large effect sizes, especially if your discount rate is low, which presumably is still a loosely defined 0%-3%.
The Quest for Sane Regulations
The fact sheet is now available for OMB’s policy on federal agency use of AI.
Axios also has (gated) coverage.
When I looked at the fact sheet I got a bunch of government-speak that was hard for me to parse for how useful versus annoying it would be. The full policy statement is here, I am choosing not to read the full policy, I don’t have that kind of time here.
USA and UK announce agreement to work together on safety testing for frontier AI models. We are unfortunately short on details and plans for exactly what that testing will be.
Continuing the talking about price debate, Jack Clark looks at the difference between a flops threshold at 10^25 in the EU AI Act versus 10^26 in the Executive Order. The 10^25 threshold threatens to hit a lot more companies than Europe likely anticipated. The good news for those companies is that ‘hit by’ in this case means very little.
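For intuition on what crosses those lines, the standard rough rule is training compute of about 6 times parameters times training tokens. An illustrative calculation, with made-up numbers rather than any particular company’s:

```python
def training_flops(params: float, tokens: float) -> float:
    # Common rough approximation: about 6 FLOPs per parameter per training token.
    return 6 * params * tokens

flops = training_flops(params=2e11, tokens=1e13)   # 200B parameters, 10T tokens (illustrative)
print(f"{flops:.1e}")                              # ~1.2e+25
print("Over the EU AI Act 10^25 line:", flops > 1e25)         # True
print("Over the Executive Order 10^26 line:", flops > 1e26)   # False
```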
Arvind Narayanan and Sayash Kapoor offer some refreshing optimism on tech policy, saying it is only frustrating 90% of the time. And they offer examples of them doing their best to help, many of which do seem helpful.
It is more optimistic, as per usual, if you do not think this time will be different.
If tech policy has to worry mainly about the continuous effects of widescale deployment, as is often the case, then this seems right. I agree that on matters where we can iterate and react, we should be relatively optimistic. That does not mean the government won’t screw things up a lot, I mean it is the government after all, but there is plenty of hope.
The issue is that AI policy is going to have to deal with problems where you cannot wait until the problems manifest with the public. If something is too dangerous to even safely train it and test it, or once it is deployed at all it becomes impossible to stop, then the old dog of government will need to learn new tricks. That will be hard.
The Week in Audio
The highlight was of course Dwarkesh Patel’s podcast with Sholto Douglas and Trenton Bricken, which got the full write-up treatment.
Andrej Karpathy talks about making AI accessible, and also how to better train it.
Liron Shapira on the case for pausing AI. He is for it.
Cognitive Revolution reports on first three months of Mamba-inspired research.
Rhetorical Innovation
Eliezer Yudkowsky clarifies the downside risks.
What do you expect the ASI to do? If (as Eliezer expects) it is ‘kill everyone’ then you want as many people not to build it for as long as possible, and shifting who builds it really should not much matter. However, if you expect something else, and think that who builds it changes that something else, then it matters who builds it.
There is a particular group that seems to think all these things at once?
I understand this as a vibes-based position. I don’t really get it as a concrete expectation of potential physical arrangements of atoms? If it is strong enough to do one then you really won’t survive doing the other?
Yes, this is about right:
Periodic reminder of the relevant intuition pump:
Solve for X, and consider how strange will seem what happens after that.
We need better critics. Who is out there? Actually asking.
I suppose you go to debate with the critics you have.
Yann LeCun is a high variance guest, I suspect with a bimodal distribution. If Dwarkesh can engage on a properly technical level focused on future capabilities and keep it classy (and if anyone can do that, Dwarkesh can), it could be a great podcast. If things go various other places, or he isn’t properly challenged, it would be a train wreck. Obviously if LeCun is down for it, Dwarkesh should go for it.
Robin Hanson would definitely be a fun segment, and is the opposite case, where you’re going to have a good and interesting time with a wide variety of potential topics, and this is merely one place you could go. I don’t actually understand what Robin’s ‘good’ argument is for being skeptical on capabilities. I do know he is skeptical, but I notice I do not actually understand why.
Note that this request was about skeptics of AI capabilities, not those who dismiss AI safety concerns. Which is another place where good critics are in short supply.
Aligning a Smarter Than Human Intelligence is Difficult
In Seinfeld’s voice: What are human values, and how do we align to them?
Joe Edelman and Oliver Klingefjord got funded by OpenAI to ask that very good question, and now claim to have found an answer (paper).
To be clear up front, I think this is all pretty cool, it is not like I have or have heard a better idea at this time, and I am very happy they are doing this work. But when you grab for the brass ring like this, of course the reaction will largely be about what will go wrong and what problems are there.
This is definitely the kind of thing you do when you want to forge a compromise and help people find common ground, in the most sickeningly wholesome way possible. It is great at getting people to think that the solution honors their values, and making them feel good about compromising. It might even generate good practical compromises.
That is very different from thinking that the results are some sort of principled set of targets, or that the resulting principles ‘represent true human values’ or anything like that. I do not think this type of system or graph is how my core human values work? There are multiple levels of metric versus measure involved here? And ‘which value seems wiser in context’ feels like a category error to me?
The paper is very clear that it wants to do that second thing, not the first thing:
Training on elicited component values rather than on final decisions is plausibly going to be one level better, although I can also imagine it being a human attempt to abstract out principles that the AI would abstract out better and thus being worse.
I definitely don’t expect that the AI that correctly minimizes loss on this function is going to be what we want.
Their first principle is to lay out desiderata:
Those all seem like good things on first glance, whether or not the list is complete.
However not all of them are so clear.
The basic problem is that this wants human morality to be one way, and I am reasonably confident it is the other way. Going one by one:
Robust is good, no notes there.
Fine-grained is good, I only worry it is an insufficiently strong ask on this topic.
Generalizable is good, definitely something we need. However what makes us think that human values and principles are going to generalize well, especially those of regular people whose values are under no pressure to generalize well and who are not exactly doing philosophy? At most those principles will generalize well in a ‘normal world’ situation with cultural context not that different from our own. I don’t expect them to generalize in a transformed AGI-infused world, certainly not in an ASI-infused one.
I worry, even in principle, about Legitimate and Auditable. I mean, obviously I get why they are important and highly valuable things.
However, if we require that our expressed values be seen socially as Legitimate and Auditable, the values thus expressed are going to be the values we socially express. They are not going to be the values we actually hold. There is a very large difference.
I think a lot of our current problems are exactly this. We used to be able to maintain Legitimacy and Auditability for regimes and laws and institutions while allowing them to make necessary compromises so that the metaphorical lights stay on and the metaphorical trains run on time. Now we have required everything to be increasingly Auditable, and when we see these compromises we treat the institutions as not Legitimate.
Does this have advantages? Absolutely. A whole lot of nasty things that we are better off without were brought to light. Much good was done. But I very much worry that our civilization cannot survive sufficiently high burdens on these questions even among the humans. If we demand that our AIs have the values that sound good to us? I worry that this is a suicide pact on so many levels, even if everything that could break in our favor does so.
As I understand the implementation process, it is done via chatting with an LLM rather than in a fully social process. So that could help. But I also notice that people tend to have their social instincts while chatting with LLMs. And also this is going to be conversational, and it is going to involve the kinds of comparisons and framings that give rise to social desirability bias problems. At minimum, there is much work to do here.
I also worry about the demand here for Legibility. If you need to describe the driving reasons behind decisions like this, then anything that isn’t legible will get forced out, including things that are not socially safe to make legible but also stuff that is simply hard to describe. This is another reason why looking at stated justifications for decisions rather than decisions might mean you have less information rather than more.
Scalable is certainly a good goal, if we assume we are drawing more people from a fixed distribution. However I notice the assumption that what ‘the people’ collectively want and value is also ‘wise.’ Alas, this does not match what I know about people in general, either in America or worldwide. You would not like it if you got average people to express what they consciously think are their values (especially if we add that this expression is social, and the expression of their components is social) and put this into an AI.
Then there is the question of whether the list is complete. What is missing?
They note that Constitutional AI as implemented by Anthropic is targeting a list of policies rather than values. That seems right, but also that seems like what they were trying to do with that implementation? You could instead use values if you wanted. And yes, you could say that the policies should emerge from the values, but circumstances matter, and asking this kind of logical leap to work is asking a lot, potentially on a ‘Deep Thought went from ‘I think therefore I am’ to deducing the existence of rice pudding and income tax’ level.
I worry that values here are partly a wrong target? Or perhaps wrong question?
In terms of the practical implementation, Gemini 1.5 summarized it this way, from what I can tell this seems accurate:
I notice that the idea of actually using this for important questions fills me with dread, even outside of an AI context. This seems like at least one fundamental disagreement about how values, morality, human minds and wise decisions work.
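To make the shape of it concrete, here is a rough sketch of the kind of structure involved, as I understand it: values get elicited as cards, and the graph’s edges record judgments that one value is wiser than another in a given context. The field names are mine, not the paper’s actual schema.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ValuesCard:
    title: str
    attention_policies: list      # what someone attends to when living out this value

@dataclass
class WisdomEdge:
    context: str                  # the kind of situation being judged
    less_wise: ValuesCard
    wiser: ValuesCard
    votes: int = 0                # participants endorsing this transition

@dataclass
class MoralGraph:
    cards: list = field(default_factory=list)
    edges: list = field(default_factory=list)

    def wisest_for(self, context: str) -> Optional[ValuesCard]:
        # Toy aggregation: the card most often judged wiser in this context.
        relevant = [e for e in self.edges if e.context == context]
        return max(relevant, key=lambda e: e.votes).wiser if relevant else None
```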
My true objection might be highly Taoist. As in “The Tao [Way] that can be told of is not the eternal Tao; The name that can be named is not the eternal name.”
Alternatively, is value fragile? If value is fragile, I expect this approach to fail. If it is not fragile, I still expect it to fail, but in fewer ways and less inevitably?
People Are Worried About AI Killing Everyone
I thought Scott Sumner was going to turn into some sort of ‘the market is not pricing in AI existential risk’ argument here, but he doesn’t, instead saying that equally bright people are on all sides of the question of AI existential risk and who is he to know better than them. I think Scott has more information than that to work with here, and that this is not how one does outside view.
But if your attitude on whether X is a big danger is that both sides have smart people on them and you cannot tell? Then X is a big danger, because you multiply the probabilities, and half of big is big. If one person tells me this plane is 10% to crash and the other says it is 0% to crash, and I see both views as equally likely, that’s still 5% and I am not getting on that plane.
Well, not with that attitude!
Something like half of our media is about reminding us of the opposite message. That one person can change the world, and change the future. That it is the only thing that does. That even if you alone cannot do it, it is the decisions and actions of many such yous together that do it. That yes, the pebbles should vote and should vote with their every action, even if it probably doesn’t matter, no matter who tells you the avalanche has already begun. Tell them your name, instead.
Whenever someone tells you that nothing you do matters, there are (at least) two things to remember.
So yeah. Fight the future. If they say one man cannot do that, as they will indeed say, ignore them. If I quit now, they win.
Or perhaps with this attitude?
It is virtuous to not see a third option. The third option is to not think about, have a model of or care about the future, or to not be in favor of there being one.
What exactly is this future, anyway?
That sounds like the kind of thing we should think through more before we do it? Given we have no idea what it even means to do this well?
We also got this perspective:
Except, perhaps that is not so good for the biological intelligences?
The Lighter Side
The best announcement of this year’s April 1: Introducing Asteroid Impact. If you haven’t yet, check it out. Much better than you expect. Comments on this version are often excellent as well. Laugh so you do not cry.
That’s the spirit.
Be afraid.
Siqi Chen: Yesterday a senior engineering leader inside openai told me that gpt5 has achieved such an unexpected step function gain in reasoning capability that they now believe it will be independently capable of figuring out how to make chatgpt no longer log you out every other day.