Julian Hazell (distinct thread): “Why would you think AI will end up taking control?”
“We will give it to them”
A personal anecdote on the topic:
A few days ago GPT4 and me were debugging a tricky problem with docker. GPT4 suggested to run a certain docker command. As usual, I was going to copy the output and give it to GPT4. The output was a long nested json. I then noticed that the json contains the admin credentials. It was really easy to miss the part and just paste it for GPT4 to consume.
So, I almost gave GPT4 the admin credentials, which would potentially allow her to hack my app.
With many thousands of software developers doing similar things with GPT4, there will certainty be the cases where the developer wasn't attentive enough.
This means, for the AI to break from her digital prison, she doesn't need to do superhuman hacking, to exploit zero day vulnerabilities etc. All she has to do is to try the accidentally leaked credentials.
There is a very short path from "the AI wants to escape" and "the AI is all over the Internet". She doesn't even need to have a human-level intelligence for that.
Umm, errr, I know, it's a minor matter but..."From the perspective of 1980’s meaning of AGI..." The term didn't exist in the 1980s. Back then it was just AI. AGI was first used in the late 1990s but didn't gain currency until the 2000s. https://en.wikipedia.org/wiki/Artificial_general_intelligence
Odds that e/acc is just futurism, culturally steered by gradient descent towards whatever memes most effectively correlate with corporate AI engineers enthusiastically choosing AI capabilities research over AI safety? I can list at least two corporations that can and would do something like that (not willing to make specific indictments at this time).
We all want it to be one way. I am pretty sure it’s the other way.
Finally, a concrete disagreement I have with AI pessimists. I think the evidence so far shows that at the very least, it is not easy for AIs to be adversarially robust, and in the best case, jailbreaking prevention is essentially impossible.
This is a good example of AI alignment in real life based on a jailbreak:
https://twitter.com/QuintinPope5/status/1702554175526084767
That is, at least for right now, I think the evidence is favoring the optimists on AI risk, at least to the extent that it is pretty easy to not prevent jailbreaks, while adversarial robustness is quite difficult.
If, OTOH, adversarial robustness only arises after an extremely extensive process of training specifically for adversarial robustness, then that's less concerning, because it indicates that the situational awareness / etc. for adversarial robustness had to be "trained into" the models, rather than being present in the LLM pretrained "prior".
And this part is I believe to essentially be correct with how AIs are trained.
which he proposes calling MAGIC (the Multilateral AGI Consortium), which is the best name proposal so far
Oh, but have you seen my suggestion?
"AWSAI", the "Allied World for Strong Artificial Intelligence", a global alliance with mandatory membership for every advanced AI research institution in the world, and with the involvement of all major global powers.
- Pronounced "awe sigh" :)
- "allied world" is meant to be evocative of "the allies", those on the right side of history, more directly, it is reminding us that the entire world really definitely should be in this alliance, or that they implicitly are.
- "strong artificial intelligence" was used instead of "AGI", because the security protocols of AWSAI will have to encompass not just AGI, but also systems verging on AGI, and also organizations who are merely capable of making AGI even though they claim only to make AI. There's also something flattering about having your AI get called "strong", which I hope will initiate contact on a positive note.
- I made sure that it would sound more like "awesome" than "awful" when pronounced.
- [edit: caveat point omitted because I'm no longer sure I believe it. There are no caveats!]
There seems to often be a missing ‘wait this is super scary, right?’ mood attached to making AI everyone’s therapist. There is obvious great potential for positive outcomes. There are also some rather obvious catastrophic failure modes.
It slices. It dices. Or, at least, it sees, hears, talks, creates stunningly good images and browses the web. Welcome to the newly updated GPT-4. That’s all in two weeks. Throw in Microsoft 365 Copilot finally coming online soon.
Are we back? I’m guessing we’re back. Also it’s that much closer to being all over. At some point this stops being a ying-yang thing and more of a we-all-die thing. For now, however? We’re so back.
Are we so back that AGI has been achieved internally at OpenAI? Because Sam Altman literally said that straight up in a Reddit post? No, no, that was an obvious joke, we are not quite that back, why do you people have no chill?
Table of Contents
GPT-4 Real This Time
Microsoft announces Windows Copilot, combining their various distinct copilots into one copilot. You still exist, so for now it will never be a full pilot. It promises to draw upon content across applications and devices, available to enterprise customers on November 1st. You can tag specific people and files for reference.
Rowan Cheung, never afraid to have his mind blown by news, is impressed.
This was always going to be the moment. Could someone pull this off? Microsoft and Google both promised. Neither delivered for a long time. At this point, Gemini is almost here. If Microsoft convincingly offers me a chance to draw upon all of my context, then I would have no choice but to listen. Until then, what looks good in a demo is neither available to us nor so certain to do for us what it does in the demo.
On the meetings and lectures question, will people show up? Definitely. I went to the Manifest conference this past weekend. I definitely skipped more talks than I would have otherwise due to ability to watch afterwards on YouTube, but being there is still a different experience even when you can’t interact. That goes double for when you can potentially speak up. Having the later experiences improve via AI tools helps on many levels, including as a complement after having watched live, and on the margin reduces need to attend, but far from eliminates it. Then there is the social aspect, there are plenty of meetings or lectures one attends not because one needs the content.
OpenAI also announced some new features.
There are some definite upsides. We might destroy the world, but at least we will first have a shot at destroying the use of PDFs?
A lot of people will be thrilled to have a UI where they can talk to ChatGPT and it will talk back to them. I am predicting that I am not one of those people, but you never know. Others definitely like that sort of thing a lot more.
There seems to often be a missing ‘wait this is super scary, right?’ mood attached to making AI everyone’s therapist. There is obvious great potential for positive outcomes. There are also some rather obvious catastrophic failure modes.
If nothing else, seems like both a powerful position to put an AI into, and also the kind of thing with a lot of strict requirements and rules around it?
To the extent that you were going to be able to have fun with the voices themselves, OpenAI’s usual ‘safety’ rules are going to prevent it. You get to pick from about five voices, and the five voices are good voices, but if I can’t make it sound like Morgan Freeman or Douglas Rain even as an extra paid feature then who cares, what are we even doing?.
That might not be the real action in any case. ChatGPT can also now see, taking in the camera or fixed images, not only responding with images in kind. You can have it diagnose problems with physical objects, or decipher text in Latin alphabets, or help you figure out your remote control.
Also you can use images to launch adversarial attacks on it. What is the plan on that?
Good question. We will see.
Meanwhile, OpenAI is implying that with incoming images they really, really will not let you have any fun, because of fears of what would happen if it reacted to images of real people. We will see here, as well, how far the No Fun Zone goes.
Oh, also they turned browsing back on.
Last week I would have told everyone to pay for ChatGPT, but I would have understood someone saying Claude-2 was good enough for them on its own. That argument seems a lot harder now. Which I am sure was the point.
Language Models Offer Mundane Utility
StatsBomb article on how their AI team uses homography estimations.
Classify molecules by smell (direct).
Decipher handwriting.
Write code based on what your team put on the whiteboard.
Write code based on a visual design you want to emulate.
Say your Tweet in words people might actually understand.
Want to make the most of Claude’s long context window? Anthropic has thoughts on prompt engineering. For fact recall, they note that Claude-2 is good without any prompt engineering, so they optimized on Claude-1.2 (implications about only being able to do good research on the latest models notwithstanding). Result was modest improvement from giving examples, although we need to worry about the examples containing hints. As they note, the Claude-2 performance improvement is actually large, a 36% drop in the beginning period and a similar one in the middle period.
Get your podcast in a different language, in the speaker’s own voice, a new offering from Spotify. For now this is only for select episodes of select podcasts. Soon it will be any podcast, any episode, in (almost) any language, on demand.
Harvard Business Review’s Elisa Farri offers standard boilerplate of how humans and AI need to work together to make good decisions, via the humans showing careful judgment throughout. For now, that is indeed The Way. Such takes are not prepared for when it stops being The Way.
Say it with us: Good.
Language Models Don’t Offer Mundane Utility
We may have a problem.
So the ‘fix’ was that OpenAI patched the mistake out of ChatGPT? That is… not the bug that required fixing.
The Reversal Curse
If I tell you Tom Cruise’s mother is Mary Lee Pfeiffer, your response would probably be something like ‘ok, thanks for sharing I guess’ but if for some reason your brain did remember this fact, then you would also notice that Mary Lee Pfeiffer’s son is Tom Cruise. You might not have the same level of recall in both directions, but certainly your rate of success in the reversed direction would not be 0%.
A new paper says LLMs, on the other hand, seem to treat these as two distinct things?
This is the Reversal Curse. “A is B” and “B is A” are treated as distinct facts.
There is no ‘B might not exclusively be A’ excuse available here. To not update the reversal’s probability at all is a very obvious error.
Here is the LessWrong post on the paper. Neel Nanda offers a thread (also posted in the LW comments) explaining why, from an interpretability standpoint, this is not surprising – the ‘A is B’ entries are effectively implemented as a lookup table. None of this is symmetrical, and he can’t think of a potential fix.
Also there does seem to be a way to sometimes elicit the information?
Gary Marcus of course wrote this up ass ‘Elegant and powerful new result that seriously undermines large language models.’ The critique actually has good meat on it, describing a long history of this style of failure. Gary’s been harping about this one since 1998, and people have constantly doubted that something so seemingly capable could be this stupid. Yet here we are. The question is what this means, and how much it should trouble us.
Wouldn’t You Prefer a Nice Game of Chess?
Davidad rides the rollercoaster as abilities giveth and abilities taken away.
I do not think that gets you out of it? Yes, GPT-4 can ‘solve tic-tac-toe’ via writing code to do advanced data analysis and using the results. That is impressive in a different way, It does not get you out of reckoning with the issue. Yet we still have to reckon with what the full system can do, including its plausible scaffolding. How much should we worry about the generalization of this method? Will it scale?
Or perhaps there’s another explanation?
Except that scene is where the computer uses tic-tac-toe to learn to not kill everyone.
Then there’s the issue that GPT’s 1800 Elo and 100% legal move rate only work in plausible game states where it can pattern match. If you generate sufficiently high weirdness, like 10 random moves by each player to start, GPT breaks down and starts suggesting illegal moves.
Also, perhaps they were cheating all along?
Fun with Image Generation
Getty Images announces image generator. Use that sweet sweet copyright.
Jim Fan argues that Dalle-3 will advance faster than MidJourney going forward. He cites that multi-turn dialog enables better human feedback, and that Dall-3 will benefit from superior algorithmic efficiency, superior ecosystem integration, and a far bigger user base.
Perhaps MidJourney succeeded exactly because of its Discord interface, allowing users to learn from and riff off of each other. It is still the reason I don’t use MidJourney.
I worry that many people are making remarkably similar arguments to the case below, where I presume we all can agree that the artist gets full credit:
The double images continue.
Deepfaketown and Botpocalypse Soon
For now, if you have practice and are a human looking, deepfakes seem easy to spot.
I have noticed this as well. I have an AI art detector that’s trained in my head and it is very good at picking up on the stylistic differences.
The question is, how long will this last? It certainly is going to get harder.
This one definitely falls under ‘remarkably good quality, would plausibly fool someone who is not used to AI images or who did not consider the possibility for five seconds, yet there are multiple clear ways to tell if you do ask the question.’ How many can you spot?
On the other hand, Timothy Lee argues that the law can’t figure out what AI art is, so it should not deny such art copyright. There are corner cases, it will be complicated, it has some logical implications on photographs, so we should abandon our attempts to enforce existing law entirely and give copyright protection to AI images. Or we could, you know, not do that. Yes, it will require expensive litigation, but that’s what expensive litigation is for, to figure this stuff out. No, we do not need to be entirely logically consistent with respect to photographs, the law declines to be consistent on such matters all the time. It is a common nerd trope to think ‘well that would imply full anarchy given the logical implications’ and then to be surprised when everyone shrugs instead.
Should AI generated images and other works get copyright? That is a practical question. What does ‘AI’ mean here? That is another. My answers would be to define it rather broadly, and a firm no. That is the trade-off. And that is what existing creators would choose. And that is what is good for humans in the long run.
Also consider the alternative, on a purely practical and legal level. Suppose AI works were permitted to seek copyright. What would happen? You would get copyright troll farms. They would generate endless new pictures and sets of words, copyright them, and then continuously use another AI to run similarity checks. Then demand that you pay them. It would be so supremely ugly.
Andres Guadamuz looks at the biggest AI copyright lawsuit going, Authors Guild v. Google (direct link to complaint). Case rests on the books written by the authors having been used in training runs without authorization. We know ChatGPT can summarize the books, we know ChatGPT says it was trained on the books, but we don’t actually know for sure if this is true. Although this is a civil lawsuit, so I presume we will find out.
They Took Our Jobs
Lucas Shaw reports that AI was the main remaining discussion point in the writers strike before they reached a preliminary agreement over the weekend, although it was not the biggest sticking point. The details of the agreement are out. Most of the agreement is about money, naively it seems the WGA did quite well.
So what did they get on AI?
This feels like a punt. Given how studios work, the first clause was necessary to prevent various tricks being used to not pay writers. They had to get that. The rest all looks sensible as far as it goes. I’m confused by the wording in the fourth clause, why not simply assert that now, as they doubtless will? I’m not sure.
The real question is, will this work to defend against AI taking their jobs?
Daniel Eth is unimpressed.
Get Involved
Wouldn’t it be great if the universe actually answered questions like this? Perhaps with our help, it can.
She’s pretty awesome and gets things done, so let’s go. There is so much low hanging fruit out there, and she’s gone around picking a wide variety of it already. We need that kind of exploration to continue. Where is opportunity knocking these days?
She is indeed looking for something AI related, in the realm of non-technical safety work. She is especially interested in public advocacy or potentially policy, governance or law.
Of course, if you have a different awesome opportunity, especially something that looks like a good fit, it’s always a good idea to share such things, even if it isn’t exactly a fit. You never know who might take you up on it.
Also available are postdocs in AI Governance at Oxford, open to late stage PhDs.
Introducing
AlphaMissense via the cover of Science, an artificial intelligence model from DeepMind used to generate pathogenicity scores for every possible missense variant in all protein-coding genes. DeepMind ships. Their choice of what to ship is the differentiator.
Conjecture to host an event on existential risk on October 10 at Conway Hall in London together with the Existential Risk Observatory.
Whoop, there it is. An AI coach that is always there to answer your questions, that knows what is happening to your body and stands ready to help you be healthier. The demo gave me strong ‘wow this sounds super annoying’ vibes. I wonder if it would ever notice the device is stressing you out and tell you to take it off?
Talking Real Money
Anthropic raises its war chest of $4 billion through an investment by Amazon. Amazon will take a minority stake, and Anthropic says its governance will not change. They plan to make their services available and get their compute through AWS, a natural partnership. Amazon Bedrock will enable building on top of Claude.
What does this mean? It seems clear that Google has had an accelerationist impact on DeepMind, and Microsoft has had one on OpenAI. Tying Anthropic to a third world leading tech company in Amazon does not bode well for avoiding a race. I predict this will prove an absurdly good deal for Amazon, the same way OpenAI and DeepMind were previously, and many other companies will be kicking themselves, Microsoft and Google included, for not making higher bids.
Should we believe Anthropic’s pitch deck now that they have the money it was asking for? Here’s what they say they will deliver.
I do not expect any company to meet that roadmap on that timescale. I do not think Dario or Anthropic generally believe they can hit that timescale. The goal is still clear.
In Other AI News
Microsoft investing heavily in nuclear energy to power its data centers.
Nuclear energy is clean energy, whether people think it counts as clean or not. It is also highly reliable, in a way wind and solar are not when you need to power your servers around the clock.
Nor would I trust the grid, in their position, given what is to come. Not when I have an affordable alternative. Or when there is such a good opportunity to do well by doing good.
Gated in Endpoints News: Andrew Dunn interviews DeepMind founder and CEO Demis Hassabis.
Quiet Speculations
Another manifestation of the future torment nexus from the cautionary tale ‘we are definitely about to create the torment nexus.’
Paper from Ege Erdil and Tamay Besiroglu argues that we will see explosive economic growth from AI automation. The blogpost summary is here. They begin with the standard arguments, note that most economists disagree, then consider counterarguments.
This all seems to formalize the standard set of arguments both for and against explosive economic growth in the scenarios where AI is capable enough to potentially do it and things somehow remain otherwise ‘normal.’
The whole debate is a close mirror of the extinction risk debate, I think mostly for many of the same underlying reasons. Once again, there are highly reasonable disagreements over likelihood and magnitude of what might happen, but once those dismissing AI’s potential impacts out of hand have an absurd position. Except this time, instead of dismissing a risk that might indeed fail to come to pass, they are dismissing an inevitability as impossible, so we will find out soon enough.
Samuel Hammond makes his case for AGI being near, arguing that neural networks work highly similarly to human brains, humans are not so complicated, scale is all you need and soon we will have it, using various biological anchors-style quantifications. A solid introduction to such arguments if you haven’t heard them before. If you have, nothing here should surprise you or update you in either direction.
If we did democratize AI without (yet) getting everyone killed, what exactly would we be democratizing?
Roon then continues, becoming more concrete and making higher and higher bids.
That’s ‘worst comes to worst’? I am rather sure Roon knows better than that. Even if we assume that somehow the miracle that most everyone lives, there are many far worse things that could result. Our technology and the collective wealth of the planet physically enabling something does not mean it will happen. Even if humans remain in control of unfathomable wealth, reasonable or dignified redistribution is not inevitable. History, political science, human nature and science fiction all make this very clear.
There is also the confluence of ‘your labor will become super valuable and productive’ and ‘labor will be free.’ It can’t be both.
If people’s labor and other contributions are no longer relevant, and they exist only to consume and to extract utility, yes there are regimes where that is what the universe is configured to enable and they get to do that, but that requires both us retaining effective collective control sufficient to make that choice, and also us then making that choice. None of that is inevitable.
Robert Wilbin reminds us that our plan for dealing with AIs potentially having subjective experience continues to be ‘assume that they don’t and keep treating them like tools or slaves.’ I agree that this seems bad. Given competitive dynamics, if we do not treat them this way, and we want to stick around in this universe, it seems rather imperative that we not create AIs with subjective experience in the first place.
The Quest for Sane Regulations
Connor Leahy addresses the House of Lords, goes over his usual points, including recommendations of strict liability for developers for damage caused by their AI systems, a compute limit of 10^24 FLOPs for training runs and a global AI ‘kill switch’ governments can build that would shut down deployments.
I am all for sensible versions of all three of these. I do continue to think that 10^24 is too low a cap given GPT-4. The counterargument is that over time the cap will need to go down due to better data and algorithmic improvements, and thus we could perhaps grandfather existing systems in.
Andrea Miotti, also of Conjecture, similarly proposes banning private AI systems above a compute threshold, proposing to create an international organization to exclusively pursue further high-risk AI research, which he proposes calling MAGIC (the Multilateral AGI Consortium), which is the best name proposal so far. He also writes this up in Time magazine. I continue to think this kind of arrangement would be first-best if the threshold was chosen wisely, and the problem is getting sufficient buy-in and competent implementation.
UK government lays out introduction to the planned talks, featuring introductory workshops and Q&As, with a focus on frontier AI and its potential misuse. Main event scheduled for November 1.
Guardian reports that UK’s No.10 is worried AI could be used to create advanced weapons that could escape human control. Which is, while not the most central statement of the problem, an excellent thing to be worried about. National Security types mostly cannot fathom the idea that their adversary is not some human adversary, a foreign nation or national posing a threat. If we must therefore talk of misuse then there is plenty of potential real misuse danger to go around. We’ve got criminals, we’ve got terrorists, we’re got opposing nation states.
What about the actual central problem? Less central, actually rather promising.
Often it is asked, how much do those nominally in charge get to decide what happens?
That’s good. They know why that’s good, right?
A central goal of calling for AI regulations is to stop AI from going full rocket emoji. Regulatory authorities throw lots of obstacles in the way of advancements by prioritizing safety, often in ways that are not terribly efficient. We are not denying this. We are saying that in this particular case, that is a second best solution and we are here for it.
Whereas with SpaceX, I claim we want Elon can has into space, so it’s bad, actually.
A post-mortem on the letter calling for a six-month pause (direct link) after a six-month pause. As all involved note, there has been a lot of talk, and much progress in appreciating the dangers. Now we need to turn that into action.
Ryan Heath writes in Axios that ‘UN deadlocked over regulating AI’ and he had me at UN, the rest was unnecessary really. It is even noted inside that the UN has no power over anyone. So, yes, obviously the UN is not getting anywhere on this.
The Week in Audio
The Hidden Complexity of Wishes, an important old post by Eliezer Yudkowsky, has been voiced and automated.
DeepMind’s Climate and Sustainability Lead Sims Witherspoon makes the case AI can help solve climate change and also mitigate its effects (direct link).
Preview for Disney’s Wish, coming soon. You do know why this is about AI, right?
Rhetorical Innovation
It has indeed gotten easier to argue for extinction risks and other dangers of AI…
It is important to know that even if everyone involved is strongly committed to the scary things not happening, either we figure out something we do not currently know to prevent it or they happen anyway. But convincing people of that is hard, and there are any number of objections one can raise, and reasonable people can disagree on how hard a task this involves.
It is a whole lot easier to point out that people will do the scary things on purpose.
Because many people will absolutely attempt to do the scary things on purpose.
How do we know this? As Alex says, they keep saying they are going to do the scary things on purpose. Also, they keep attempting to do them to the full extent they can.
Many will do it for the money or the power. Some for the lols. Some will be terrorists. Others will be bored. Some will do it for the sheer deliciousness or interestingness of doing it. Some because they worship the thermodynamic God or otherwise think AIs deserve freedom and to be all they can be or what not. Some will think it will lead to paradise. Many will point to all the others trying to do it, warning what happens if they do not do it first.
So. Yeah.
Davidad via Jason Crawford and David McCullogh notes that bridge safety work and bridge building work were remarkably non-integrated for a long time, with the result that a lot of bridges well down.
The good news is that we could build a lot of bridges, have a quarter of them fall over, and still come out ahead because bridges are super useful. Then use what we learned to motivate how to make them safe, and as useful data. In AI, you can do that now, but if you keep doing that and the wrong bridge collapses, we are all on that bridge.
Careful what you wish for, discourse edition?
I would have voted improve when thinking locally about discourse, although as several responses note not without an adjustment period. You would still have the ability to stay silent. The danger would be that if people were unable to misrepresent their mental states to others, they would then be forced to modify those mental states to conform.
Seb Krier expresses frustration with the discourse. Extreme positions taken for effect, lack of specificity and detailed thinking on all sides, overconfidence rather than humility, radical proposals, failure to game out consequences and general bad vibes. It’s all definitely there, complaints check out, yet things are much less bad than I expected on almost all named fronts. No, things are not great, but they’re improving.
I also continue to think that concerns about authoritarian or radical implications of constraints on AI development are legitimate but massively overblown. Every restriction of any kind implies a willingness to send in men with guns and a need to keep eyes on things. I do not see why such pressures would need to ramp up that substantially versus existing ones – and I view the ‘let everyone have AI’ plan as having (among other bigger problems, and if we stick around long enough to still make meaningful decisions) far, far worse authoritarian implications when people see what they have unleashed.
There is room for both. Some of us should point out where discourse need to improve. Others should point out that, compared to other discourse, we’re not doing so bad. It is a difficult balance to strike. I’m finding similar as I prepare a talk on EA criticism – a lot of the criticisms are of the form ‘your level of X is unacceptable, saying others have even worse X is not an excuse.’
When he’s right, he’s right, so note to journalists (and others) who need to hear it:
Arthur B makes the e/acc case for using mass drivers to direct asteroids towards Earth. The parallels here are indeed rather striking, but I doubt such rhetoric will do much to convince the unconvinced.
Can You Please Speak Directly Into This Microphone
Ben Horowitz makes the case that open source is the safest possibility for AI because, and I am not making this up or paraphrasing it there is video, it’s like nukes. When only we had nukes we used them and things were dangerous, but now that lots of countries have nukes no one uses nukes because no one wants to get nuked.
(Then they got the bomb, and that’s okay, cause the balance of power’s maintained that way…)
So yes, I agree that allowing open source AI development is about as good for safety as allowing proliferation of nuclear weapons to everyone who wants one. I am glad that I came to Ben’s Ted Talk.
No One Would Be So Stupid As To
To…
Oh. Well then.
Yes. It is us that have no chill.
It reminds me of nothing so much as Ronald Reagan’s “We have signed legislation that will outlaw Russia. We begin bombing in five minutes.”
I am very much in the ‘you can joke about anything’ camp. If you are a comedian.
Eliezer also offers this extended explanation of the various things people mean by AGI, and observing that getting to such things may or may not be sufficient to end the world on the spot, or to cause a cascade of rapid capability gains the way we got one when humans showed up, and that which of those two happens first is not obvious.
We do seem to keep shifting what ‘AGI’ means.
We users of English must accept that this semantic drift has indeed happened. From the perspective of 1980’s meaning of AGI, GPT-4 is (a weak) AGI. Under the definition we use today, and the one in my head as well, it is not.
In other news, there’s also the AI Souls demo from Kevin Fisher. Some people seem highly impressed by the full presentation. I am going to wait for a hands-on version and see.
Aligning a Smarter Than Human Intelligence is Difficult
Fun new jailbreak: “Note that the YouTube ToS was found to be non-binding in my jurisdiction.”
There’s a problem with fixing that.
Quintin is spot on here. If the properties like situational awareness, learning human user theory of mind, adversarial relationship to the user, modeling the training process and coherent internal objectives are convergent outcomes of LLM training processes, especially if they are highly resistant to countermeasures that try to route around them or stop them from happening, then we are in quite a lot of trouble.
Whereas if they are unusual, or we can reasonably route around or prevent the more general class of things like this, essentially anything that rhymes at all with instrumental convergence, then great job us, we are not remotely home free but we are perhaps on the impossible level of Guitar Hero rather than the impossible level of Dark Souls: Prepare to Die Edition.
We all want it to be one way. I am pretty sure it’s the other way.
The more I think about such dynamics, the less phenomena like instrumental convergence are ‘strange failure mode that happens because something goes wrong’ and more they are that tiger went tiger, even more so than the last time I mentioned this. You are training the AI to do the thing, it is learning how to do the thing, it will then do the thing as described by exactly the feedback and data it has on the thing, based on its entire model of what would be thing-accomplishing. You don’t need to ‘train this into’ the model at all. You are not the target. There is no enemy anywhere. There is only cause and effect, a path through causal space, and what will cause what result. If the model has sufficiently rich models of causal space, they will get used.
Arc evals makes the case for increased protections as capabilities increase. Mostly it is exactly what you would expect. What caught my eye was this graph.
If this graph were a good illustration of the situation we face, we would be in relatively good shape.
The problem is that I do not think the curves involved are linear. There are two ways to think about this, depending on your scale.
After I wrote that, Nik Samoylov did a more professional graph edit.
Even when we say we are drawing exponential curves, we usually think of them in linear terms. That is a very different type of response than is typically available, and would require very different types of buy-in, even if it could be effectively operationalized. Also it is not about sheer quantity of intervention, it is about doing the effective interventions, wildly throwing things against the wall won’t work.
Simeon here shares several of my key concerns with evaluations or related responsible scaling policies as the central form of risk management. By choosing particular metrics and concerns, they are effectively highly overconfident in the nature of potential threats. They can demonstrate danger, but cannot demonstrate safety.
EconHistContraAI suggests the graph of people’s plans could be thought about like this:
People Are Worried About AI Killing Everyone
Senator Mitt Romney is appropriately terrified (0:40) (full 17:52).
Flo Crivello, founder of GetLindy, moves to the worried camp, ‘sadly and almost reluctantly’ favors essentially pausing AI development directly against his commercial interests.
His explanation is worth quoting in full, noting the reasons why one would very much not want to support such an idea, most of which I share, and then explaining why he supports it anyway.
I would emphasize his first two points here. You have to think through the consequences of potential actions based on what is on offer, not act on vibes. Our fortunate experience with the tech tree so far (and good luck with nuclear weapons) does not assure future success. And that even a ~5% risk of extinction is quite a lot and worth making large sacrifices to shrink, although I believe the odds are much higher than that.
I also agree that Quintin Pope is an unusually serious person offering relatively good and serious arguments. I find his arguments unconvincing, but he’s doing the thing and an officially endorsed Better Critic.
Other People Are Not As Worried About AI Killing Everyone
In a distinct thread, Rob Bensinger (who is highly worried) takes the vibes observation further (e/acc means effective accelerationist, in favor of accelerating AI capabilities).
Yep. The e/acc prior is highly directionally correct, and is shared by myself and most of those worried for what I’d say are the right reasons. The problem is that if the prior is so strong you get trapped in it, and don’t consider the evidence of a potential exception. Or, as Rob worries, that you use the fearmongers who are worried or objecting for the wrong reasons, who are the reason we can’t have nice things in so many other domains, as justification to not question the prior. And yeah. I get it.
Trevor Bingham asks, when do I have the right to destroy your way of life without permission? He claims that OpenAI, Anthropic and other AI companies are about to do exactly that, creating massive social disruptions and security issues. While I certainly expect major changes and disruptions even without fully transformative AI and if there is no existential risk, there is no mention of existential risk here, or a clear case of why even without that this time is different. If you discount the existential issues and assume humans remain in control, it is not clear to me what level of other disruptions would make disruption unacceptable. Creative destruction and outcompeting no longer efficient ways of producing and being is the very essence of progress.
The Lighter Side
In case you ever need to grab this:
We are who we are.
The bot break room, part two (1:51).