Meta, possibly a site bug:
The footnote links don't seem to be working for me, in either direction: footnote 1 links to #footnote-1, but there's no element with that id; likewise the backlink on the footnote links to #footnote-anchor-1, which also lacks a block with a matching id.
From the 9 options, I'm in the number 6 camp. I feel so frustrated at how badly the world seems to be coordinating around this, and how surprised so many people seem about this when I feel like it's been a straightforward extrapolation of events for at least the last four or five years.
My favorite quote from all of this: "Perhaps we can at least find out who thinks hooking AIs up to nuclear weapons is an intriguing idea, and respond appropriately."
If we can’t get it together, perhaps we can at least find out who thinks hooking AIs up to nuclear weapons is an intriguing idea, and respond appropriately.
I unironically find it an intriguing idea, because it seems like it's a potential solution to certain games of nuclear chicken. If I can prove (or at least present a strong argument) that I've hooked up my nuclear weapons to an AI that will absolutely retaliate to certain hostile acts, that seems like a stronger deterrent than just the nukes alone.
After all, the nightmare scenario for nuclear arms strategy is "the enemy launches one nuke", because it makes all actions seem bad. Retaliating might escalate things further, not retaliating lets your enemies get away with something they shouldn't be getting away with, etc. etc.
edit: I am of course aware that there are a myriad of things that could easily go wrong when doing this, so please do not take my comment as any kind of advocacy in favor of doing this.
Possible typos:
The reason I dislike the phrase "God-like AI" is because "God" has so many positive connotations for some people; they hear "God" and think benevolence, wisdom, love for humanity, etc. That's probably one reason why some people have trouble conceptualizing existential risk: We've saddled AI with too many positive attributes.
I'm not sure there's a better word or phrase, but "mastermind" comes to mind. A mastermind AI is a hostile alien entity capable of outmaneuvering humanity at every turn, one that will come to dominate over us and re-order the universe to suit its own selfish purposes.
The big capabilities news this week is a new ChatGPT mode (that I do not have access to yet) called Code Interpreter. It lets you upload giant data files, analyzes them automatically, and can even write papers about its findings. Many are impressed.
The big discourse news is that Geoff Hinton, the Godfather of AI, quit his job at Google in order to talk freely about the dangers of AI, including and especially existential risks. He now expects AGI within 5 to 20 years, and has no idea how we might plausibly have this go well.
Oh, and Kamala Harris is meeting with the Google, OpenAI, Microsoft and Anthropic CEOs about AI safety, I am sure policy will get off to a great start.
Also a wide variety of other stuff.
Table of Contents
Language Models Offer Mundane Utility
Go on a VR journey to Skyrim, where the NPCs have unique scripts, use ChatGPT and have memory of previous conversations. Seems quite cool, yet not doing the things I most want it to do, which is to have those conversations and NPCs flow into a dynamic world, change what the NPCs do, or at least to have the information in such conversations be key to various goals in the world. Probably out of reach of a mod.
Create a quiz on the first try to help you learn about the arrondissements of Paris. Whatever those are. All my comedy instincts say not to check.
OpenAI spam detection for all the new OpenAI-generated spam. Great business.
Develop a plan using best practices to get yourself addicted to running and lose 26 pounds, including being inspired to improve your diet. Congratulations, Greg Mushen! The plan was essentially start small, concrete easy steps, always feel like you could be doing more. The question to ask is, how much of the work was done by ChatGPT developing a great plan, and how much of it was Greg taking an active role and feeling excited and invested in the project, and adjusting it to be a good fit for himself while telling a story that ChatGPT did the work?
(On a personal note, I got myself into running once, that lasted about two months and then I found I’d permanently ruined my knees and I should never run again. Damn. Now I’m back on an elliptical machine.)
Soon, perhaps, answer the fifth copy in a row of the same exact call from your grandmother with dementia.
That’s better than no system. It still imposes many hours of lost time on everyone involved, without providing any actual value to the grandmother beyond what would be provided by an AI-powered call. Remember, the calls are always the same. If the calls are always the same, it should be easy to set up an AI system, using the family’s voices, to duplicate past copies of the conversation.
Generate a super cringy version of a first date at a Chinese restaurant. Reading this brief AI-generated conversation made me wince in physical pain. Both in a ‘I want out of this at any cost’ way and in a ‘no that is not how humans talk who wrote this’ kind of way. Both seem the result of the method acting being requested here.
Get output in JSON format, by only asking for the values and then writing code to put that data into the JSON format.
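A minimal sketch of the trick: rather than asking the model to emit JSON directly (which it can garble), ask it for the bare values only, then assemble the JSON yourself in code. The `key: value` reply format here is an illustrative assumption, not any particular product's output.

```python
import json

def reply_to_json(model_reply: str) -> str:
    """Build JSON from a plain 'key: value' model reply.

    The model is asked only for the values; the JSON structure
    is constructed here, where it cannot be malformed.
    """
    data = {}
    for line in model_reply.strip().splitlines():
        key, _, value = line.partition(":")
        data[key.strip()] = value.strip()
    return json.dumps(data)

# Hypothetical model reply, having asked for values only:
reply = "city: Paris\npopulation: 2.1 million"
print(reply_to_json(reply))  # {"city": "Paris", "population": "2.1 million"}
```

The design point is that the model does the part it is good at (answering) and deterministic code does the part it is bad at (strict syntax).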
Write a suite of test cases for BabyAGI by feeding its code into GPT-4-32K.
Write SQL queries for you. The question is whether, if you don’t already know SQL, you should use this as an opportunity to learn SQL, or an opportunity to not learn SQL. In general, the right answer on the margin is to choose to learn such things.
Get an A- on a complexity theory final. I’ve been more impressed when the AI passes tests I can’t pass without learning something first, like this one, than ones that I could.
ChatGPT’s New Code Interpreter
People are rather excited by what the new Code Interpreter mode can do.
Analyze data by uploading a 10,000 row dataset into the OpenAI Code Interpreter and… ‘talking to the data’? Highly curious how any of this works for me in practice, as I don’t have access to either Plug-Ins or Code Interpreter yet (hint, hint, OpenAI).
Or as Ethan Mollick describes it.
Ethan then wrote up his thoughts into a full post, It Is Starting to Get Strange. He uploads a 60MB census data file, has it analyze the data and then write an academic paper about its findings that Ethan found solid if unspectacular, within seconds. Post also covers plug-ins and browsing, which are not as well fleshed out… yet.
Shubham Saboo is excited by this new GPT-4 as data scientist.
Jason is also excited. Seems more like a data interpreter than a code interpreter at heart, still awesome.
How big a deal is being able to easily do this kind of data analysis? It is very hard to know without access and banging around on it. Certainly I would love to be able to do these types of things more quickly.
Introducing (New AI Products)
Box AI, document focused AI for business, including both analysis and composition. Most popular plan is $300 per year per user with a three user minimum.
Via SCPantera, introducing RimGPT, which will use ChatGPT and Azure to talk to you about what is happening in the game while you are playing. You’ll need your own API keys for both.
ChatMaps to search Google Maps for things like restaurant recommendations. I will test such things out once I get plug-in access – I did get GPT-4 API access by asking nicely but I’m still waiting on plug-ins.
Introducing Pi, a personal AI, from Inflection AI, not sure why. Based on GPT-3? Reports are not promising. Where did their $225 million go? Allie Miller speculates it is fine tuning a model for each user, but how would that even work?
HuggingFace’s new GPT clone. Why create a GPT clone if it is going to have the same ‘ethical standards’ and content restrictions as the commercial version? Discussion question: What is the best model that hasn’t been battered down into the ground by RLHF to destroy its creativity and keep it from doing anything ‘inappropriate’?
Fun With Image and Sound Generation
Google publishes paper about optimizing stable diffusion to run on phones.
Jim Fan predicts generative AI will move to sound waves in 2023, making artists more productive.
MidJourney advances to version 5.1.
The Art of the SuperPrompt
Nick Dobos points out that being ‘good at prompting’ has been with us for a long time. People don’t know what they want or how to communicate it, and the prompting problem applies to humans the same ways it applies to AIs.
Ethan Mollick gives basic advice on writing good prompts.
When people say ‘there is no secret’ one should always suspect this means ‘there is no secret I can easily explain to you or share with you’ or more simply ‘you have to do the work, and you won’t get a big return from marginal efforts even if they’re clever.’
That is how I read this, and Ethan confirms it explicitly at the end, saying ‘the secret is practice.’ Which is the world’s all-time favorite unhint.
Are there secret prompts? Are there ways to inquire of GPT-4 or other LLMs that are much more effective than a default mode?
Yes. I am very very confident that you can do much better with the magic words.
I am also very confident that ‘try various stuff and see what happens and tinker’ will get good results, but industrial strength experimentation and Doing Science To It and bespoke engineering will do much better.
That does not mean that anyone knows what the resulting magic words would be, or that, once they were found, you could paste those words in every time and have that work out for you.
It does mean that I expect, over time, people to get better at this, and for this to greatly improve mundane utility results, often because such prompts are baked into system messages or other behind-the-scenes instructions without the baseline user even knowing about it.
This is one of those places where it is harder to protect one’s moat. If I figure out a better way to prompt, it might be worth millions or even billions, and I would see very little of that unless I wrapped it up in a package of other things in a way that couldn’t easily be copied. At minimum, I’d need to do a bunch of other things that disguised my insight. Also, the current interfaces make using sophisticated prompts annoying. So the natural solutions might be open source here.
Here’s a SuperPrompt based on The Art of War, you can of course substitute another book (someone suggests The Prince).
File that one under the heading ‘cool that you got it to do that, no idea why you’d want that other than cheating on your homework?’ Maybe for your role as a motivational speaker?
AskData.co ($$$) is exactly the kind of thing I expect to see rapidly advance. The linked thread talks about how to layer on really quite a lot of tokens before every single interaction, including framing everything in the past tense as a description of a past interaction that went perfectly (to invoke auto-complete and match the web). Months of fine tuning a massive prompt to get it to do what the author wants it to do, as reliably as possible. Seems clearly like it will eventually be The Way, if you have enough use for the resulting configuration.
Another thing I expect to be big that is now in the wild: Mckay Wrigley suggests using the system prompt to provide a list of potential commands (e.g. Order Pizza, Draft Email, Summarize Research Paper, Say Goodnight, Pay My Bill). Then perhaps you don’t need the LLM to do the action, only to pick it. You can use a parser to see that the LLM wants to execute a command, then call something else to execute it.
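The pattern above can be sketched in a few lines. Everything here is an illustrative assumption (the `COMMAND:` reply convention, the command names, the handlers): the model only names a command, and ordinary code parses and executes it.

```python
# Hypothetical handlers; in practice these would call real services.
COMMANDS = {
    "order_pizza": lambda: "pizza ordered",
    "draft_email": lambda: "email drafted",
    "say_goodnight": lambda: "goodnight",
}

def dispatch(model_output: str) -> str:
    """Scan the model's reply for a known command and run it.

    The system prompt would instruct the model to answer with a line
    like 'COMMAND: order_pizza'; anything else is treated as no-op.
    """
    for line in model_output.splitlines():
        if line.startswith("COMMAND:"):
            name = line.split(":", 1)[1].strip()
            if name in COMMANDS:
                return COMMANDS[name]()
            return f"unknown command: {name}"
    return "no command requested"

print(dispatch("COMMAND: order_pizza"))  # pizza ordered
```

The safety-relevant design choice is that the model never executes anything itself: unknown or malformed commands fall through to a refusal, so the action surface is exactly the whitelist.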
SnackPrompt.com offers a wide variety of prompts to try out. I am somewhat dismayed by the contrast between what I would find useful and what people are continuously training prompts for. Aspire to better, everyone.
Here’s a fun new prompt.
Here’s the James technique for getting the AI to give you actual probabilities.
Full prompt is here.
Full prompt (replace ‘birds aren’t real’ with whatever you like):
Deepfaketown and Botpocalypse Soon
Tencent presents Deepfakes-as-a-service, only $145. As usual, the truth is the best lie, deep fake yourself today.
An AI-generated movie preview, I suppose. You can simultaneously be impressed with some details, and notice how little actual movement or creativity is involved.
Woman on TikTok gets torrent of DMs of fake nudes of her, with watermarks removed, clearly made from her fully clothed photos.
The surprising thing is that this isn’t happening more often. This is a pretty terrible experience, we are going to see more of it, and they are going to be less obviously fake. In this case the woman could describe a lot of things the pics got wrong, but in a year I doubt that will be the case.
Walking through analysis of a suspected fake picture of a Russian soldier. Details matter, one is looking for ones that don’t make sense. I do worry that this pattern matches a little too well to how conspiracy theorists think and thus as the errors get smaller people will increasingly dismiss anything they dislike as obviously fake.
The For Now Accurately Named FTC Blogpost Trilogy
Michael Atleson continues to bring the fire via blog posts.
So if you design something to ‘trick’ people into making ‘harmful’ choices, the FTC can come after you. ‘Manipulation’ is not allowed when it causes people to ‘take actions contrary to their intended goals.’
This seems like another clear case where humans are blatantly violating the law constantly. Under normal conditions, the laws are mostly good, because the FTC exhibits good discretion when deciding who to go after, via reasonable social norms. The problem is that when AIs are involved, suddenly everyone involved is more blameworthy for such actions (in addition to the problem where the actions themselves are more effective).
Where is the line? When there is product placement on television, we do not know it is an advertisement. Will we apply very different rules to AI? What exactly is and isn’t an ad in the context of talking to a chatbot?
Allow me to translate this. Warning, everyone: DO NOT HIRE anyone to do ethics or responsibility. If you do, you can’t fire them.
So, more time wasted on ‘training sessions’ in order to check off boxes, then.
I do think Michael Atleson means well. I love his writing style and his bringing of the fire. Fun as hell. In practice, I expect any enforcement actions he engages in to be good for the world. I do wish he was even better.
The Real Problem With AI is Badwordism
Rob Henderson writes in City Journal about how ChatGPT has a left-leaning bias, and this is to him the main thing to know about the new tech.
That’s… not what those statistics mean? Someone who is censoring their opinions is very different from someone lying to you. The idea that you would never need to censor any of your political opinions implies to me that you don’t actually have any political opinions, all you’re doing is copying those around you. Otherwise, yes, you are going to reach some conclusions you know better than to talk about. That doesn’t mean others can’t trust what you do say.
It always amazes me what some people think the world is about. Yes, of course, people will learn what ChatGPT will and won’t say to them. That doesn’t mean that finding out what is censored will be the primary thing people do with it, even in potentially censored topics. I mean, I suppose one could use the AI that way. The results would be hopelessly conservative, in the real sense, telling you to never say anything interesting about a very wide array of topics.
In Rob’s world, the elites will go around modifying future ChatGPTs, which will primarily serve to then teach people new ideological opinions. He thinks that the exact list of dictators GPT-4 will praise on request without good prompt engineering, versus the list for GPT-3.5, is meaningful and important.
Here is a sign of how differently we see the world, his ending paragraph:
Whereas my view of ideological opinions expressed on Musk’s new Twitter versus old Twitter, if you exclude opinions about Elon Musk, is best expressed in meme form.
Go Go Gadget AutoGPT What Could Possibly Go Wrong
I kind of feel like if you are the one building the DoNotPay chat, you shouldn’t have $80 per month in useless subscriptions lying around to be found. I do this scan once a year anyway as part of doing my taxes. If you do still have these lying around, the scan does seem useful. Certainly the part where it handles the cancellations is great.
This one I definitely do appreciate.
I am very supportive of the whole DoNotPay principle. It would be great if people stopped getting milked for useless subscriptions, and couldn’t price discriminate against people who don’t know to or don’t dare negotiate, and got punished for bad service. There are of course also worries, such as there being no way to know if the Wi-Fi on that flight worked or not.
Eliezer’s thoughts:
I would humbly suggest that we might add one more:
Cons: – Gave an AutoGPT access to your email, bank account and credit report.
Oh. Yeah. That.
In Other AI News
Cass Sunstein speculates on whether AI-generated content enjoys First Amendment protections. Courts have increasingly been in the habit of ‘making up whatever they want’ so I can see this going either way. My presumption is that it would be best to treat AI speech for such purposes as the same as human speech, or failing that doing so if there exists a human author in some fashion, a ‘mixing of one’s labor,’ that would normally be due to whoever sculpted the prompt in question, or could be the result of minor edits.
OpenAI sells shares at a $27 billion to $29 billion valuation.
DeepMind releases work on robot soccer. If your model of the future involves ‘robotics is hard, the AI won’t be able to build good robots’ then decide for yourself now what your fire alarm would be for robotics.
Facebook publishes A Cookbook of Self-Supervised Learning, illustrating that this is a concept with which they have little practical experience. 1
Prompt injection in VirusTotal’s new feature. Right on schedule. Picture is hard to read so you’ll need to click through.
Chegg, which provides homework help, was down 37% after hours after reporting terrible earnings due to competition from ChatGPT. You love to see it, people selling the ability to cheat on your homework losing out to a service letting you cheat for free.
Samsung joins the list of companies banning generative AI (Bloomberg) on their company devices, and warns employees not to upload sensitive data if they use generative AI on other devices. Headline says ‘after data leak’ yet from what I can tell the leak is ‘OpenAI has the data in its logs’ rather than anything that is (yet) consequential.
It is good and right to be worried long term about such things, and to guard against it, yet my presumption is that it is unlikely anything will happen to such data or that it will leak to other customers or to Microsoft or OpenAI. How much of a productivity hit is worthwhile here? Which is exactly the kind of thinking that is going to push everyone into doing increasingly unsafe things with their AIs more generally.
Daniel Paleka sums up the last month in AI/ML safety research. If he’s not missing anything important, wow are we not keeping pace with capabilities. The only real progress is this little paper on predicting emergent memorization. I suppose it’s something, it’s still not much.
China
Reuters claims “China’s AI industry barely slowed by US chip export rules.”
Which is it?
If it’s 20% slower for 200% of the price, that is a lot worse. It is also not clear (here) how big a supply of such chips will be made available.
If even that slower chip is a huge improvement on current chips, sounds like chips are indeed serving as a limiting factor slowing things down.
None of this seems like ‘barely slowed’ or ‘minimal effects’?
From April 19 via MR: Digichina provides a variety of additional reactions to China’s new draft AI regulations. Everyone here says the regulatory draft is consistent with past Chinese policies, expects something similar to it to take effect and be enforced, and expects it to be a substantial obstacle to development of AI products. There is some speculation that this could pressure Chinese companies to develop better data filtering and evaluation techniques that eventually prove valuable. That’s not impossible, it also very much is not the way to bet in such spots.
What Even is a Superforecaster
Last week, I covered a claim that superforecasters were not so concerned about AI.
I then quoted a bunch of people noticing this result didn’t make any sense, and various speculations about what might have gone wrong.
It is always important to remember that ‘superforecaster’ is a term someone made up.
That someone is Philip Tetlock. He identified a real phenomenon, that some people are very good at providing forecasts and probability estimates across domains when they put their minds to it, and you can identify those people. That does not mean that anything Tetlock labels as ‘superforecasters say’ involves those people, or that the work was put in.
We now have two inside reports.
Here’s Peter McCluskey.
If this was the core reasoning being used, then I feel very comfortable dismissing the forecasts in question. If you are thinking that AI will probably not be transformational within the century under a baseline scenario, what is that? At best, it is a failure to pay attention. Mostly it feels like denial, or wishful thinking.
At minimum, this tells us little about how much we should worry about developing transformational AI.
Here’s magic9mushroom:
We must await the final report before we can say anything more. For now, these reports update me towards not taking the forecast in question all that seriously.
Sam Altman Interview with the Free Press
The problem with having generalists do such interviews is that they end up covering a lot of already well-covered ground. There are usually still things one can learn, if only because one hadn’t noticed them earlier.
For example, I had forgotten or missed Altman’s trick of substituting the word software for the word AI. This does seem useful, while also serving to disguise many of the ways in which one might want to be concerned.
It’s always good to see how the answer about potential dangers is evolving.
The thing that worries me most here is the word misuse. Misuse is a good thing to worry about, yet misses the bulk of the important dangers.
The same goes for his answer on safety protocols, and on how to handle concerns going forward.
Sam Altman thinks, or at least is saying, you should trust the government more than him. Perhaps we should believe him. Or perhaps that’s a good reason to trust Sam Altman more than the government, if those are your only choices.
The idea to start with requiring government audits of large training runs seems eminently reasonable. If nothing else, in order to have the government audit a training run, we would need to be able to audit a training run.
I always wonder when I see people point to the fact that airlines are safe. Are they suggesting we should treat AI systems with similar safety protocols and standards to those we apply to air travel?
Potential Future Scenario Naming
Scott Aaronson and Boaz Barak propose five possible futures.
My first instinct was I’d simply call AI-Fizzle ‘normality,’ partly to remind us that most scenarios are very much not what we think of as normality. Even normality won’t be that normal, we already know (some of) what GPT-4-level-models plus tinkering could and will do.
Alternatively, we could call the fizzle scenario Futurama. As anyone who watched Futurama can attest, the whole premise of the show is that the world of Futurama in the year 3000 is exactly the same as the world Fry left behind in 2000. Yes, you have robots and aliens and space travel, and it’s New New York and Madison Cube Garden and so on. So what? On a fundamental level, you call it ‘the future’ but nothing has changed. The world of Futurama in its first season is arguably less alien to the year 1999 than the world of 2023 is now, in terms of the adjustments required to live in it, and the lived experiences of its humans.
How does Aaronson describe his Futurama?
That explains why he called it Futurama. If AI fizzles from here, in the sense that core models don’t get much better than GPT-4, and all we can do is iterate and build constructions and wrappers and bespoke detailed systems, then my guess on impact is exactly this, on the level of the other three revolutions.
There is some room in between, where we can keep AI going a bit and still things can look kind of normal while things worthy of names like GPT-6 exist. I’d still count those as fizzles – if civilization ‘recognizably continues’ the way Scott is imagining it in his post for an indefinite period, that implies AI fizzled.
So this is the same scenario, except with the ‘and things go well’ condition, as opposed to his AI-Dystopia where he adds ‘and that’s terrible.’
Arthur Breitman agrees.
Thus, I’d say call the good version Futurama, call the bad world Dystopia.2
Which one would we get? That would be up to people. For any such set of tech details, there are multiple equilibria and multiple paths. As Scott notices, opinions on the (D&D style) alignment of our own world differ, they likely would continue to differ in such futures. My baseline is that things get objectively much better, and people adjust their baselines and expectations such that the argument continues.
That baseline world misses out on the promise and most of the value offered by Scott’s Singularia, but I would expect its value or utility to be high relative to historical levels. The other worry would be non-AI existential risks, which would threaten to add up over time even though the per-year risk is likely very low.
I found Scott’s description of Singularia interesting. It’s always tough to figure out ‘what to do if you win,’ and that’s an underappreciated section of the problems ahead.
The correct summary of this is ‘Iain Banks’s The Culture.’3
The Culture is by far the most popular reference point when I’ve asked how people envision a future where AI doesn’t fizzle and then everything works out.
I have decidedly mixed feelings about this class of outcome.
I worry that such universes are very much out of human control, and that humans are now leading lives without purpose, with no stakes and nothing to strive for, and I am not sure that letting humans enter virtual world simulations, even with Matrix-style amnesia, makes me feel all that much better about this. There’s something decidedly empty and despairing about the Culture universe from my own particular human perspective. What are the stakes? It’s cool that you can switch genders and see the stars and all that, still doesn’t seem like the exercise of vital powers.
The character in that universe I identify with the most is, of course, the protagonist of The Player of Games, which felt like it was written about a younger version of me, except of course for that one scene that sets the plot in motion where he does something I’d never do in that situation. After that? Yep. And that feels real, with stakes and lots of value, from his perspective, except that the way we got there was going somewhere outside of the culture. At some point maybe I should say more, although it’s been a long time since I read it.
Don’t get me wrong. I’d take the outcome there for sure. The Minds seem like they plausibly are something I would value a lot, and the humans get a bunch of value, and it’s not clear at all how one does substantially better than that. If I end up valuing the Minds a lot, which seems at least possible, then this is plausibly a great scenario.
An important thing to notice, that I was happy to see Aaronson incorporate in his model: Aaronson doesn’t see a possible path where humans retain control of a future that contains AGIs that don’t fizzle out in capabilities.
This is not so uncommon a perspective. It’s very difficult to see how humans would retain control for long in such scenarios.
What’s frustrating is that often people will envision the future as ‘AI keeps advancing, humans stay in control’ without any good reason to think such an outcome is in the possibility space let alone probable or a default.
Without that, the world we get is the world the AIs decide we get. They configure the matter. The vast majority of the configurations of matter don’t involve us, or are incompatible with our survival, whether or not ‘property rights’ get respected along the way.
Thus there is a very Boolean nature to the outcomes on the right-hand side of the graph. There is no listed ‘things turn out vaguely okay’ scenario, whereas in the normal-looking worlds such outcomes are not only possible but common or even expected. In theory, one can imagine worlds in between, where humans are allowed some fixed pool of matter and resources, under survival-compatible conditions: the ‘kept as pets’ or ‘planetary museum’ scenarios, or what not. I don’t think one should put much probability mass, or substantial value or hope, in such places; they’re almost entirely born of narrative causality and the need to find a way things might kind of be fine.
The attempts to ‘thread the needle’ are things like Tyler Cowen’s insistence that we must ‘have a model’ of AIs and humans and how we will handle principal-agent problems. If I interpret this as AI fizzling slightly later than a full fizzle, where the AIs max out at some sort of sci-fi-style variation on humans, like in the show Futurama, you can sort of squint and see that world, other than the world in question not actually making sense? If it’s not a fizzle, then you get to enjoy that world for weeks, months or maybe even years before you transition to the right-hand side of the graph.
I do still object strongly to the name Paperclipia, on two highly related counts, in addition to objecting to the description along similar lines.
Here’s Eliezer’s comment to this effect.
One could, perhaps, divide Scott’s Paperclipia into two classes of outcome.
What he is imagining, in its broader form, we might call The Void. A universe (technically, a light cone, or the part that aliens haven’t touched) devoid of all that we might plausibly value. Whatever the surviving AIs configure the available matter into might as well be paperclips for all we care, nothing complex and rich and interesting that we assign value to is going on. Or a world in which neither humans nor AIs survive.
As Eliezer points out, we can get to The Void even if the utility function determining the future is itself complex, so long as it is best satisfied by an uninteresting process or steady state. Almost all possible utility functions have this property, or are this plus ensuring that state is not in the future disturbed by aliens one has not yet encountered. Most configurations of matter, I do not care about. Most goals, I do not care about. One does not by luck get to the types of worlds Eliezer is aiming for in the above quote, or the types I would aim for, when highly powerful optimization is done.
(You do get at least one such world, at least for now, from whatever led to us being here now, although it seems like it took really a lot of matter and energy to do that.)
The other possibility is Codeville. That the AI wipes us out (whether or not this is sudden or violent or involves current people living out their natural lives matters little to me), while doing something we might plausibly value. Perhaps Minds come to exist, and are doing lots of super-complex things we’d never understand, except they don’t mysteriously have a soft spot for humans. Maybe they compete with each other, the same way we do now, maybe they don’t. Perhaps simulations are run that contain value. Perhaps real beings are created, or allowed to evolve, that would have value to us, for various reasons. Who knows.
A key source of important disagreement is how to view possible Codeville.
The Robin Hanson view, as I understand it, is that Codevilles are the good futures we can actually choose, the relevant alternative is The Void. Yes, Futurama might be something we can choose, but he sees little difference between The Void and Futurama, as we’d be stuck in static mode on one planet forever even in the best case.
All the value, he says, lies in Codevilles, why do you care that it’s AI and not human?
I reject this. I do not think that it would ‘make it okay’ to replace humans with whatever computer programs are the most efficient at using compute to get more compute, even if current humans had their property rights respected and lived to old age. I think that some changes are good and some changes are bad, and that we have the right and duty to reject overall world changes that are bad, and fight against them.
Are there some possible Codevilles I would decide have sufficient value that I would be fine with such results? Perhaps. I do not expect the default such worlds to count.
Think Outside of the Box
Yes, yes, we now know we were all being stupid to think there would be precautions people would take that an AI would have to work around.
Joshua gets ten out of ten for the central point, then (as I score it) minus a million for asking the wrong questions.
This is how deep the problems go, even if there exist solutions. Joshua correctly headlines that we are going to hand over power to the AIs in order to get efficiency gains. That if your plan is ‘don’t give the AI power’ then your plan is dead.
Then his response is to go domain by domain in Hayekian fashion, ‘at the object level,’ and build context-specific tools to guard against bad outcomes in individual domains? That seems even more dead on arrival than the original plan. How could this possibly work out? Hand over power to increasingly intelligent and capable models told to seek maximalist goals with minimal supervision, and solve that with better data sets, process models and ‘safety constraints?’
Either you can find some way to reliably align the AIs in question, as a general solution, or this path quite obviously ends in doom. I am confused how anyone can think otherwise.
If you don’t think we can align the AIs in question, ‘build them and then don’t hook them up to the power’ is no longer an option. The only other option, then, is don’t build the AIs in question.
They Took Our Jobs
Timothy Lee explains he is not worried about mass unemployment due to AI, because software didn’t eat the world and AI won’t either, there will be plenty of jobs where we prefer humans doing them. I broadly agree, as I’ve noted before. We have a huge ‘job overhang’ and there will be plenty we want humans to do once they are free to do other things and we are wealthy enough to hire them for currently undone tasks.
Such dynamics won’t last forever if we keep pushing on them. None of this tackles the existential longer-term questions if AI capabilities keep going. My core takeaway on the whole They Took Our Jobs issue is, essentially, that if we have a real problem here, we have much bigger problems elsewhere; if we don’t solve those bigger problems we won’t miss the jobs, and if we solve those bigger problems then we won’t too much miss the jobs.
The fantastic Robert King (creator of The Good Wife, Evil and more, I’m quite enjoying Evil) links to an extensive Twitter thread dismissing ChatGPT via Team Stochastic Parrot, and also notes another issue.
There have indeed been some rather wicked lawsuits out there for creative works that are similar to previous creator works. Sometimes it is obvious the new work is indeed ripping off the old one, sometimes it’s ‘who are you kidding with that lawsuit’ and sometimes it’s ambiguous. Other times, the old work is very obviously ripped off and everyone agrees it’s fine except there’s an argument about royalties.
Ironically, this is a very good use case, I would expect, for ChatGPT, where you can do a two-step.
Then, if the answer to #2 comes back yes, consider changing elements until it’s a no.
Certainly if you are releasing commercial music without running an AI check, no matter what process created your music, you are asking for it.
As for the thread’s parrot claims, things like ‘no insight, no themes, just legos?’ ChatGPT as BuffyBot? All it does is string together things it has seen? That the AI doesn’t know the word ‘hello’ can do five different things at once?
Aside from the obvious ‘well then it will fit right in as a Hollywood writer these days?’
I understand that some people want it to be one way. It’s (largely) the other way.
Even more than that, people like Nash, the author of the thread in question, want it to be the one way by law of physics, that ‘wiring a bunch of video cards together and feeding it math’ cannot possibly result in anything new. I have bad news, sir.
A testable hypothesis. Let’s ask GPT-4.
It is not entirely the other way. At least not yet. The idea that ChatGPT could take the place of Robert King any time soon is ludicrous. We can use ChatGPT as part of the creative process, nothing more. If companies like OpenAI keep using RLHF to stifle creativity and rule out various types of thoughts like they’ve been doing? It’s going to be quite a long while before AI can substantially replace writers and creators. No one is (yet, I hope) ‘betting Hollywood’ on AI in any meaningful way, if anything they’re doing their old accounting tricks. If they are actually crazy enough to bet it all on AI, oh no.
My understanding is that one issue in the writers’ strike is that the writers want to ban AI from getting writing credits, and to prevent studio executives from doing things like creating god-awful AI-written material to claim initial authorship and then paying ‘rework’ rates to writers to turn it into something human and tolerable. The writers’ position is that AI can be used as a tool by writers, but not take a job on its own.
This seems like a very reasonable position to me. The AI absolutely cannot replace the writers any time soon, and what they are trying to prevent here is contract arbitrage. They should use their leverage to prevent the studios from gaming the current payment system to screw the writers by having the AI technically do the parts that get the most compensation. That’s what unions are for, after all.
I want to be clear: I don’t mean any of this as a knock on the writers. It is the studios and audiences that have made it clear that what they want are remixes and sequels and retreads and reboots, over and over, over-optimized schlock all around. That’s not the fault of the writers, if no one wants to make anything good, the fact that you can make something good and the AI can’t? That won’t save you.
Mostly, though, my understanding is that the writers’ strike is about compensation for streaming, and it’s a good old fight over how much they will get paid. As someone who wants higher quality writing and who writes a lot, I hope the writers win.
Quiet Speculations
Punch cards were not great. Talking in natural language has many advantages. Also disadvantages. In terms of being able to get started, it is simple. In terms of getting the most out of the interface? It is not at all ‘simple.’ Prompt engineering exists because getting LLMs to live up to their potential is insanely complex. Have you tried to debug the English language? In the context of all word associations and LLMs? Yikes.
An easy prediction is that the cycle will continue. Did you see the sample interfaces for Microsoft Copilot? Notice how they were full of menus and buttons. Yes, I want to use natural language, I also want tons of bespoke prompting and scaffolding and framing to take place under the hood. Most of the time, I want to be coding in English even less than I want to be coding in Python. Oh to have something type safe.
Jeffery Ladish is worried about the Agency Overhang problem, where we have LLMs that are superhuman in some ways while missing other capabilities, most importantly agency. What happens if you managed to give them the kind of agency humans have?
The problem is that the plan of ‘carefully experiment with agents because they are risky’ is in all its forms very clearly out the window. We do not have that option. Jeffery says don’t rush to build agentic systems.
The problem is still that if such agentic systems are inevitable to the extent that there is a way to do them, and that is going to happen relatively soon, then we might as well fix the agency overhang now. If fixing it kills us, we were already dead. If it does bad things short of killing us, that can alert us to the dangers and help us avoid them. If it does neither, we can get mundane utility from it and study it to improve.
Derek Thompson in The Atlantic says AI Is a (productive) waste of time.
I don’t think Tyler Cowen specifically would change his opinion, based on my model of his model. I do think many others would change their minds. To convince Tyler Cowen, you would need to convince him we will be able to prosper and innovate without AI.
I would generalize life extension to giving people hope and belief in the future generally. There is certainly some ‘build AI so I won’t die.’ There is far more ‘build AI because I can’t imagine the future going well otherwise’ or ‘build AI because the risks of not doing so are even bigger.’ Want to save the world? Repeal the Jones Act.
Kelsey Piper points out that the benefits of smarter-than-human AI would be tremendous if everything worked out – this isn’t people risking the planet ‘over trivial toys’ as she puts it. It’s people risking the planet over quite valuable things. True that.
AI will bring the 24th century crashing down on the 21st century. The problem is that, in general, when things hundreds of years beyond you crash down upon you, it does not go well. Even if the 24th century is relatively well-intentioned.
Ruby at LW speculates on what things might look like in 2025. GPT-6 in everyone’s ear and unemployment is at 10%-20% and everyone has bigger things to do than worry about it. I would eagerly bet on under 10%.
Miles Brundage pushes against the term ‘God-like AI’ and Eliezer Yudkowsky agrees.
One could argue that future people will find AI mundane because, in the worlds where they wouldn’t, there aren’t likely to be many future people left to find it anything. Also, one can argue that the ancients found the Gods mundane in this way. You go to the Temple of Apollo, you offer a sacrifice, and perhaps you’re kind of in awe but mostly you are trying to profit maximize and it is Tuesday.
I do agree that invoking Gods has downsides, yet I do not think it is trivializing religion, and the name serves the important need of clearly and concisely indicating a very large gap in capabilities that means You Lose. AGI has gotten highly ambiguous, and ASI has its own problems while definitely not reliably indicating a large capabilities gap. If anything, I agree with the Eliezer quote about CEV, that the issue with saying ‘God-like AI’ is that if this AI meets God in its journey, God will be cut.
People Are Worried About AI Before It Kills Everyone
UK’s outgoing chief scientist Patrick Vallance says AI could be as transformative as the Industrial Revolution, is worried about impact on jobs.
It does not seem that he has thought through what would happen if AIs ‘start to do things that you really didn’t expect.’ He does have some wise suggestions:
This statement was super weird:
An initial decrease in economic output from the Industrial Revolution? Really?
Dan Schwartz paints a refreshingly concrete picture of a possible future he worries about.
I certainly have questions.
The world being described seems to have lost all ability to verify the origins of messages or phone calls, such that people ‘won’t respond to unverified texts’ at all. Not only ‘won’t give you the bank information’ paranoia, pure ‘I won’t talk to such a thing’ paranoia.
The verification systems seem insane, even in a relatively paranoid world. You don’t give out your Amazon password without the code word, or realistically even with the code word, sure. You also don’t need to verify identity to tell someone about a giant news story where cars are crashing all over the place.
How did things get that bad? What are people worried about happening? What is actually happening here? Computers and phones are being hacked that often? Spoofing technology has won out over identification? It’s not clear to me how we get there from here or why we should expect this to happen.
I’d also note that if the world did get to be as described here, the logical response would be that one would want to physically co-locate with people you could trust. Anyone capable of the operations described above would not, unless their poverty was highly binding, be outside physical travel range for all of their known associates.
Similarly, why are there no authoritative news sources one can trust on the level of ‘do you need to worry about crashing objects today,’ in this world? It seems like Wired should be able to preserve reasonable journalistic basics here, or if not them then someone else – presumably every news source in the world is focusing on these satellites and car crashes. Why does this man go to Hacker News to get info on such a mainstream, basic story?
He says he doesn’t trust the Wired story because it was obviously GPT-assisted. Well, that sounds like a strong incentive to keep one’s writing assistance highly non-obvious. Society adjusts to such things.
The trading stuff seems very confused. If NASDAQ trading was taking the form of a sine curve, yeah sure something is wrong with some AI system somewhere, and some people’s AI trading systems are in the process of losing a lot of money, but that’s no reason to halt trading for weeks. Let Jane Street Capital and friends buy low and sell high until whoever did this goes bust. Problem solved. Similarly, if the NYSE is down 10% and that’s a typical day, either the world is actually much weirder and accelerating much faster than described above, or the stock market has suddenly become highly inefficient, in a way that wouldn’t last long before the people who keep their heads get big enough to fix things.
I certainly refuse to believe this is a random Tuesday, where you have a bunch of different industrial-strength AI programs that have access to all the major news sites of 2026 and anything else they want, and they are split as to whether it is physically safe to go outdoors or be on the road, and the NYSE is down 10%. If so, again, some much stranger things are happening that aren’t being described.
I am a short-term mundane utility optimist. I think scenarios like the one above are highly unlikely, and that the activity above of giving concrete detail makes this increasingly clear.
Another person very worried about the short term and also the long term is Michael Cuenco, who in Compact invokes Dune and calls outright for a Butlerian ‘Jihad against AI.’ It is a confused piece, in the sense that Michael notices the existential threat posed by AI, yet most of his focus is on economic disruption and job loss. He claims it is the duty of Democracy to rise up and sabotage productivity and wealth in order to ensure people get to work for a living.
Until we get to the parenthesis and things get more complicated, one can reasonably say, so? Compared to taking farmers from a third of people to 2%, that’s nothing, and there will be plenty of resources available for redistribution. Stopping AIs from doing the work above seems a lot like banning modern agriculture so that a quarter of us could plow the fields and thus earn a living.
To those laughing at the white-collar targets? He warns, oh, you’re likely next.
So the theory is that it would be a dire threat to the economy if suddenly we saw massive increases in productivity, and the United States contained vastly more desired goods and services.
He does notice the ‘China problem,’ focusing on defense rather than economic concerns, and suggests China would have the same control and regulation concerns we would so it would be down for mutual limits, especially in the areas of defense.
I always have highly mixed feelings when seeing such calls. I share the existential worries, yet the reader is intended to mostly respond to the threat of They Took Our Jobs – the idea that AI might be too valuable and useful, And That’s Terrible. Could there be a worse reason to ban something? If that was the actual reason, what hope would we have for our world afterwards? Once again, I find myself highly sympathetic to the accelerationists who say ‘you took away all our other hopes and options, if we also ban this you’d ban anything else we came up with, so where’s the hope in that?’
Yet I can’t help but notice we are all on track to die, or at least lose control over the future in ways that will likely destroy (or at least severely limit) all the Earth-originating value in the universe according to my own values. That does seem worse.
Glenn Harlan Reynolds worries about increasingly sexy sex-bots, that humans will inevitably fall for them because every year the sexbots get sexier and the humans stay the same, and chatbots are already seen as more empathetic than doctors, and if humans are choosing this over other humans then we won’t reproduce.
Certainly this will by default be another hit to fertility, if that was the only effect, but this would also come with other advantages. In such worlds, the economics and logistics of raising children should radically improve, as should alternative fertility methods. If we are all far richer, such problems are easy to solve. The question of the future, in the futures where questions like fertility rates remain relevant, seems more likely to be akin to ‘do people want to have and raise children, or do other things?’ and ‘will we provide the economic resources people need?’ rather than questions of household formation and sex.
I also challenge that humans are staying the same every year. I expect a steady rise in human sexiness. New medical treatments and drugs and surgeries will help, both ones that already exist and new ones we don’t yet know about. Our understanding of diet and exercise and such will improve. We will get better at solving the sorting and matching problems involved, including through use of AI on several levels. We can have AI in our ear to help us navigate situations, including sexual ones, and to help us practice, train and improve. If we get good sexbots, it likely means we are also wealthier, and have more free time to devote to such things, which also helps.
Max Roser is worried that people will use AI for destructive purposes, because we explicitly instruct it to pursue destructive purposes, the way we created nuclear weapons. He misses the instrumental convergence angle involved, which reminds us that if one is in competition with other intelligent beings, then the power of destruction is valuable, so those in such competitions will develop and sometimes use such capabilities, unless you create conditions that prevent this – whether that destruction is violent and physical, or otherwise.
What I am confused by is the threat model here where if there is one AI then whoever has control over it might use it for bad purposes – the wrong monkey might get the poisoned banana. Whereas if lots of different monkeys had AIs, then the concern is only if we can’t solve the alignment problem, which he understands as the risk that no one could control a powerful AI system.
The thing is, if there are many different powerful AIs, and I am the bad monkey, I not only can still use my powerful AI for destructive purposes, now I have a reason to do so. If I have the only one, there’s little need for destructiveness. No rival, no war. If everyone else has one, now I am in a struggle.
Worry that if there are tons of people looking to fund AI alignment work, and people with projects keep applying, that eventually net negative projects will get funded, so enabling lots of cheap applications could be negative, since it is easy for something people tell themselves is AI alignment work to be net negative. Presumably the solution to this is to differentiate between ‘I don’t think this would do enough so I don’t want to fund it’ versus ‘I think this is negative, please don’t do this.’
Right now, my understanding is that it is highly unusual to even tell the applicant that you think their work is net negative, let alone warn others. At minimum, I hope that if someone applies for funding and you think their project will do harm, you should at least tell them that privately and explain why. Ideally, if we are going to do a ‘common application’ of sorts, there would be a way to pass this info along to other funders, although there are obvious issues there, so perhaps the right way is ‘you tell the funders, and the funders get asked if anyone told them it was negative.’
Dan Hendrycks tears into the accelerationist arguments of ‘BasedBeffJezos,’ after Jezos backed out of a planned debate. He does a fine job of pointing out the absurdities of the ‘let the AIs replace us without a fight while they fight each other over resources so they get to use the cosmic endowment’ position. As Connor Leahy says, I can’t relate to actually wanting to give up the future to arbitrary computer programs.
Another problem we have to solve is that the general principle ‘give everyone a vote’ requires everyone getting a vote, and have you seen the views and preferences of everyone? AI Alignment is hard.
Our Words Are Backed By Nuclear Weapons
The worst possible thing you can do is accelerate towards AGI before we have solved AI alignment, in a way that encourages various labs to race against each other.
The worst possible thing you can do with an artificial intelligence is, of course, to hook it up to nuclear weapons. Despite this, there will be clear competitive pressures pushing us to hook AIs up to the nuclear weapons.
Even accelerationists and Hollywood script writers know this. See of course The Terminator and WarGames, and notice that even when the AI is not looking to kill us, we celebrate Petrov Day for when he plausibly saved us by refusing to follow procedures.
As Eliezer Yudkowsky often points out, the nuclear weapons are not actually where the most important dangers lie. I’d also add that a sufficiently dangerous AI is not going to be stopped by this particular bill. Still, the thing the bill would prevent is deeply stupid. You have to start somewhere.
I demand a vote on this. If we can get it together and all agree on at least this one thing, then perhaps we can build on that. If we can’t get it together, perhaps we can at least find out who thinks hooking AIs up to nuclear weapons is an intriguing idea, and respond appropriately.
Geoff Hinton is Worried About AI Killing Everyone
Geoff Hinton, who is quite a big deal (the NYT headline calls him ‘Godfather of AI’) quits Google in order to be able to talk freely about the dangers of AI.
Here’s Hinton on CNN, explaining the danger in 40 seconds – there are not many cases where the less intelligent thing stays in control of the more intelligent thing. It’s a difficult problem, he has no solution, as far as he can tell no one has a solution.
Noteworthy is that he says ‘we are all in the same boat, so we should be able to get agreement on this with China.’
Hinton says he partly regrets his life’s work.
Pro tip: If you see yourself saying this in the future, consider that you might want to stop whatever it is you are doing.
He warns the danger is near.
The article does not tell us as much as we’d like about what Hinton’s threat model looks like in detail. Hopefully we will know more about that soon.
Hinton also clarifies on Twitter that, yet again (yes, I remember), Cade Metz in the NY Times is implying things that are not true in order to make targets of their narrative look bad.
Indeed, Roman makes an excellent point. I do understand Hinton’s decision that he needed to avoid the conflict of interest, yet I still want the calls coming from inside the house, and those inside the house to be willing to make the calls.
This is a big change. For contrast here he was in 2015:
I do appreciate the honesty here. Even more than that, I appreciate Hinton’s willingness to come out and say: I was wrong about this very important thing. I now realize that what I was doing was harmful, and a mistake.
Hinton also gave an interview to MIT Technology Review.
Nando de Freitas, research director at DeepMind, shares Hinton’s concerns.
How meaningful is this development?
Meaningful enough for one person to buy out of a doom bet at a 10% penalty after only four days. Congrats, Robin Hanson.
I am definitely more hopeful due to Hinton standing up, and I suppose also for Metz joining Klein at NYT on the warning front. Still nothing like the update size that must have happened to resolve this bet so quickly.
Stuart Ritchie points out that much (although far from all) news coverage of Hinton’s resignation focused on misinformation and job loss, and failed to mention existential risk. It is easy not to see what one does not wish to see.
Other People Are Also Worried About AI Killing Everyone
Jess Riedel provides a reference list of arguments for such worries. Good list.
If you are worried enough to try and work on alignment, Rohin Shah of DeepMind offers an FAQ on getting started, and here’s Richard Ngo’s advice.
Paul Christiano wrote a post explaining his probabilities of various future outcomes.
Here he is saying AI is ‘the most likely reason he dies.’
Here is a visual representation of the main scenarios, from Michael Trazzi.
I consider such exercises highly virtuous. It is great to write down actual branching paths and probabilities. At the same time, one must not treat the numbers as having lots of precision. As Paul says, treat this as having about 0.5 significant figures.
A common pattern is that someone will push back against a specific threat model, either Yudkowsky-style foom or something else, except they have a different threat model that also has similar probability of doom. Sometimes they imply similar interventions, sometimes different ones.
Whereas it’s much harder to get people who think hard about these problems to agree on scenarios that go well, and then assign them very high probability. Alas, the common response is that not being convinced by one specific doom scenario lets one set the probability of all doom scenarios to 5%, or to 0.1%, even though no one in the discussions being cited thinks that.
Think about the following group of options. One might (symmetrically) say:
The first thing to notice is that #3 and #7 are both pretty silly positions. There are plenty of impossible futures out there that people are talking about as realistic, running the spectrum from utopia to annihilation or worse. While #2 and #8 are perfectly logical positions to have, there are far more people claiming someone else holds such positions, than there are people who actually do believe them.
I essentially have position #6b here. I don’t see any one scenario of doom being overwhelmingly likely, it is more that I don’t see good paths to victory and I see a lot of plausible paths to defeat. I see Paul as holding #5 and Eliezer as #9. Some of our modeled (some would say imagined) important dangers and hopes overlap a lot, others do not.
Pia Malaney at the Institute for New Economic Thinking gives a window into how an outsider trying to make sense of recent history might view the developments of AI until now, including what was effectively (although unintentionally) a central accelerationist role for Eliezer Yudkowsky, leading to the development of intelligent systems. The feared future here is ‘Market AI.’ The idea is that given fiduciary obligations to shareholders, once we plug AIs into our economic infrastructure, it will not be legal to take back control even if things are clearly going off the rails. So our ‘point of no return’ where we lose control happens well before humans could, in physical terms, shut it all down. Combine that with profit maximizing instructions and instrumental convergence (identified here by name, great to see that getting through), and whoops. That might be how it ends.
There is a common misunderstanding of the actual legal nature of the modern corporation, that this post seems to share. People have this idea that firms act like Milton Friedman would want them to, and have a legal (and he would have said ethical) obligation to, within legal constraints, maximize profits.
That simply is not true in practice. Corporations are run largely for the benefit of the managers, in part for other stakeholders including employees, often honor their commitments even when those commitments are not legally binding or commercially profitable, take into consideration social costs and benefits and have broad flexibility to do whatever they like, subject to the occasional shareholder lawsuit or attempt to replace the board if things get too out of hand or an activist shareholder shows up. Matt Levine often talks about the question of who actually gets to control a company, if there is a conflict, and the answer is not usually either the shareholders or some invisible hand of profit maximization.
There are limits. You can’t stray too far from profit maximization, or there will be increasing risk of increasing trouble. There is still a lot of flexibility.
That does not invalidate the doom scenario described here. A sufficiently capable AI, told to maximize profits, will indeed do the instrumentally convergent things, seek power, seek to self-improve and so on.
The frame in the post ties the AI to the concept of legal obligations. If the AIs are necessary to meet a legal obligation to maximize profits, then you have to put the AI in charge.
I find the flip side more interesting. A human can and often does ignore their legal obligations, whether or not they can ‘get away with it.’ Humans are not really expected to keep up with the actual text of our laws; it’s not something we could do even if we wanted to. Law is too complex, and often prescribes impossible or unrealistic action incompatible with life, or incompatible with good outcomes or being moral. Most people understand this.
When it comes to an AI, however, suddenly we are, as the post notes, taking laws that are created assuming humans will not fully follow them and not be fully held accountable when they don’t, and shifting that into a frame where the laws are fully enforced – the AI actually does have to obey all the laws, including the ones no one follows. It will be responsible for every slip up, legal or otherwise, in a way humans wouldn’t.
The resulting actions will often not lead to good outcomes. Our systems aren’t designed for such conditions. If we don’t adjust, and write new laws that are designed to be enforced as written, and develop new norms along similar lines, the whiplash is going to be brutal. You’ll give the AI the same instructions you would have given a human, and the AI will do something very different – because it faces different conditions.
And when those AIs are put in charge of increasingly powerful and important things, suddenly you have this other alignment problem: Our explicitly and outwardly expressed preferences, our professed norms and our formal laws, on the one side, are not aligned very well with what we would actually want to happen in the world, or with what we would want or expect humans to do when making decisions and doing things. Which one are we going to try and align the AI to?
A different line of concern: One way we might die is if the AI causes changes to the planet, which have the side effect of rendering the climate uninhabitable by humans.
The most widely feared doom scenario in the world right now has nothing to do with AI. Instead, it has to do with humans foolishly doing this to ourselves, also known as climate change. We are putting a lot of carbon in the atmosphere to generate energy, enough to noticeably warm the planet. If we were to do too much of this, and warm the planet too much, various feedback cycles might be triggered, making the situation much worse. Many people actively expect doom from this.
Thus, it should seem highly plausible that an AGI, or a collection of multiple AGIs, especially multiple AGIs engaged in competition with each other, might end up doing something similar. There will be great demand for energy. The most likely abundant energy source, in an AGI future, is fusion power. Fusion power has the side effect of generating heat (example: the sun). If you were to generate too much fusion power at once, you would raise Earth’s surface temperature. Do that too much, and we die. There need be no intent to harm us, and we’ve seen exactly this dynamic among humans.
I haven’t looked into the technical details. I see this more as an example of the type of thing that might happen, rather than a particular unique threat to worry about.
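The rough scale of the concern is easy to sketch. The following back-of-the-envelope calculation uses standard physical constants and a deliberately crude linearized radiative-balance model; the numbers are approximate and the model is a simplification, not a climate forecast:

```python
# Back-of-the-envelope: how much waste heat would directly warm Earth's surface?
# Rough physical constants; this is a sketch, not a climate model.
SIGMA = 5.67e-8          # Stefan-Boltzmann constant, W / (m^2 K^4)
T_SURFACE = 288.0        # mean surface temperature, K
EARTH_AREA = 5.1e14      # Earth's surface area, m^2
HUMAN_POWER = 2e13       # current human primary energy use, roughly 20 TW

# Linearize the radiative balance: extra outgoing flux per kelvin of warming
dflux_per_kelvin = 4 * SIGMA * T_SURFACE**3       # about 5.4 W/m^2 per K
watts_per_kelvin = dflux_per_kelvin * EARTH_AREA  # about 2.8e15 W per K

# Waste heat needed to force ~1 K of direct warming,
# as a multiple of today's total energy use
multiple = watts_per_kelvin / HUMAN_POWER
print(f"~{multiple:.0f}x current energy use per 1 K of direct heating")
```

By this crude estimate, direct waste heat becomes a one-degree problem only at roughly a hundred-fold increase over current human energy use – far-fetched for humans on any near timescale, much less far-fetched for a civilization of competing AGIs scaling up fusion power.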
Eliezer Yudkowsky suggested that one good safety feature would be to check for sudden drops in the loss function. A bunch of people jumped on him for this, saying it showed he had no idea what he was talking about. Most such objections I saw were simply wrong, in the sense that (1) we have indeed seen such sudden loss drops in many cases and (2) such a drop doesn’t have to mean anything – sometimes it is a bug or something else – but often does represent ‘grokking’ or otherwise a large increase in the capability of the model. Sarah Constantin explains at the link, in response to Jeremy Howard saying the suggestion is not even wrong – because what’s the danger in training a model? It doesn’t impact the world, so no need to worry about it.
That points to the actual important disagreement.
In Eliezer’s model, yes, it absolutely is dangerous to be training a sufficiently capable model, even if you don’t deploy it, on several levels. Including:
If you train something that would, if deployed, be on the edge of dangerous, then training it seems unlikely to be dangerous, and all of this sounds like crazy sci-fi scenarios. However, if you actually do worry about large capability gains happening quickly, then no, you cannot, in the Eliezer model, presume that training is safe simply because you do not intend for it to impact the outside world.
At minimum, a big theme is that past a certain cognitive point, you should assume that any human exposed to sufficient amounts of text from a sufficiently advanced and capable intelligence, can be convinced to do what the sufficiently capable intelligence wants. You absolutely cannot give a sufficiently advanced technology access to a human-read chat window, or even have humans reading its outputs and consider this ‘safe.’
Which, once again, only matters when it counts, when such rapid capability gains are actually happening or might be happening. The argument ‘this is not how any of this works’ is valid right now, for current levels of tech. That does not mean it need stay that way.
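For concreteness, the kind of check being proposed could be as simple as flagging any run where the latest evaluation loss falls abruptly relative to its recent trend. This is a hypothetical sketch; the window and ratio thresholds are made-up illustrative values, not anything specified in the original suggestion:

```python
# Hypothetical sketch of the suggested safety check: flag a training run
# when the newest evaluation loss drops abruptly relative to its recent
# trend. The window and ratio thresholds are illustrative, not prescribed.
def sudden_loss_drop(losses, window=5, ratio=0.5):
    """Return True if the newest loss is below `ratio` times the average
    of the preceding `window` losses -- a crude proxy for a sudden
    capability jump ('grokking')."""
    if len(losses) <= window:
        return False  # not enough history to judge
    trailing_avg = sum(losses[-window - 1:-1]) / window
    return losses[-1] < ratio * trailing_avg

# Steady descent: no alarm. Sudden collapse in loss: alarm.
print(sudden_loss_drop([2.0, 1.9, 1.85, 1.8, 1.78, 1.75]))  # False
print(sudden_loss_drop([2.0, 1.9, 1.85, 1.8, 1.78, 0.6]))   # True
```

A real implementation would live inside the training loop as an evaluation callback, and the interesting policy question is what the run does when the flag fires – pause for human review, checkpoint and halt, or merely log.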
Connor Leahy goes on CNN (3 min video) to talk to Christiane Amanpour and explain that he does not expect any of this to end well. She then talked to Marietje Schaake (2:24 video), who blamed the USA for a lack of regulation, said congressional members of both parties are concerned, and said:
FT article mentions AI alongside climate change offhand, as things that the British government is distracted by Brexit from dealing with, as opposed to if they’d stayed in the EU, when they’d instead be prevented from dealing with them.
People Would Like To Explain Why They Are Worried About Killing Everyone
Richard Ngo reports that his ICML (International Conference on Machine Learning) submission was just rejected, despite all-accept reviews, for lacking ‘objectively established technical work.’
In context, the rejection is essentially saying: “You claim that there are behaviors that future more capable systems will exhibit, that current systems do not exhibit. Please provide proof of these behaviors by pointing to documented examples of this happening in existing systems.”
And, well, uh, um… yeah.
Sometimes there is indeed a proto-example of the behavior one could point to or measure in existing systems, either happening naturally or induced via construction. Other times, this is more a refusal to imagine that things that haven’t already gone wrong might go wrong in the future, or to engage with the arguments that they might.
The paper is, as far as I can tell, a good faith attempt to explain basic concepts in academic speak so that people can say that it has been written up in proper academic speak in the proper academic font and therefore proper people can agree that the concepts exist and perhaps even consider the arguments and claims involved.
It ends with an alignment research overview, showing that there is not much research yet for us to overview, and progress is slim.
The Quest for Sane Regulation Continues
Earlier in the week, Tom Friedman advises Biden to make Kamala our AI czar. My note on that was: Do you still think we’re not all going to die?
Later in the week, Nandita Bose reports:
On the one hand, this is great news, bringing together the key players and high ranking government officials to discuss the issues.
On the other hand, the high ranking government official is Kamala Harris. When I picture myself putting her in charge of AI policy in a fictional scenario, I see people quite reasonably going ‘oh come on, you’re such a doomer.’
Thing is, if not her, then who? At least she is ‘only’ 58 years old, consider the alternatives.
Another issue is that Demis Hassabis is not going to be at the meeting. A bad sign for his influence over how things play out, and therefore a bad sign for alignment and humanity, given the alternatives. Or it could be a logistics issue, but for this, you make the trip.
Adam Thierer calls this stage ‘regulation by intimidation,’ a ‘nice large language model you have there’ kind of approach.
NY Times has another generic call for AI regulation, this time from Lina Khan. As you would expect, it focuses on the harms we could address once they arise and that don’t pose an existential threat. As opposed to the other ones.
The Many-AGI Scenario
Max Roser asks what are the reasons for optimism? Which to me is a right question, one needs a reason things might go right.
Anders Sandberg attempts an answer, Eliezer and Robin respond.
I do think there is a good chance Eliezer is right that LDT (logical decision theory) will allow AGIs to collectively expropriate (and effectively kill) the humans. It is an important point that entities that can prove things about each other and share source code, and employ good decision theories, can coordinate with each other much better than they could hope to coordinate with us even if they wanted to do so, even while competing against each other.
I also don’t think this is necessary, on multiple levels.
Humans do this kind of thing all the time; the coordination is natural, not hard, and to be expected. Thing is, even the coordination isn’t necessary. In a world of countless AGIs, humans will continuously lose control of their resources, because:

1. Humans consume more than they produce and import more than they export (humans might buy human-created goods and services from each other, but AGIs almost entirely won’t, whereas humans will buy things from AGIs).

2. Humans are vulnerable to attack and manipulation, so at a minimum they will need to ‘hire protection’ in some form, and it will come with principal-agent problems.

3. Humans will hire AGIs to help with struggles between humans.

4. Wealth generally decays over time. And so on.
So essentially:
Also: AGIs would not need to target us on purpose or break typical institutional structure, merely follow existing conventions and decision methods among themselves. The kind of robust property rights and rule of law Robin is envisioning, including immunity from damaging policy changes and confiscatory taxation, does not currently exist for most of humanity. Our rule of law and understanding of rights and property is very social, very contingent, always evolving, and not something that would actually shield us sufficiently in such scenarios even to get us to the third and fourth steps here.
Robin’s counterargument, as I understand it, is that returns to capital will be super high, so humans will earn more capital faster than they can spend it, and it need not bother us if humans control only 0.001% of the wealth and resources so long as the absolute quantity is increasing.
My response to that would be that this fortunate state of affairs, even if we got it, would not long last, as Robin himself predicts in Age of Em, because physical limits would prevent returns on capital, especially returns on unsupervised capital, from remaining high for all that long.
At minimum, it seems obvious that even under strong rule of law and strong property rights, humans would rapidly control fewer and fewer atoms, and effectively control less and less of the future, with less and less influence on policy decisions. Even if tech allows us to not depend on the Earth’s environment (which would presumably change radically, far beyond any typical climate change fears), we would not have much of a future.
From what I can tell, Robin is essentially fine with humans fading away in this fashion, so long as current humans are allowed to live out a pleasant retirement.
I still count such scenarios as doomed.
Other suggested theories of hope include these first five other responses:
I notice there is a lot of poorly justified hope running around these days.
No GP4U
What if it was a big deal that compute isn’t that hard to monitor?
Here is the abstract to that paper:
This seems right to me. The tech needs to be set up in advance, so we need to start on this soon even without any agreements on future controls. That gives us the option to agree on controls later.
That does not make this an easy problem. It is one thing to be able to detect use of compute, another to get agreement on a restrictive regime, and yet another to enforce the restrictions if someone is determined to break them. That’s hard. It is still very helpful to notice that the problem has one less importantly hard step than many might think.
Other People Are Not Worried About AI Killing Everyone
One of those people is Matt Parlmer, who has dismissed such fears as preposterous and treated those who call for doing something about it as… well, nothing good.
And I couldn’t resist pulling this Tweet of his.
Yes. Very much so. I wonder what his intended context was.
Michael Shermer has words for Tristan Harris, and does not get it.
Freddie DeBoer responds to the Sam Altman interview and general AI talk of important things being about to happen with the standard anthropic argument to make the case that nothing could possibly be so good or so bad, stop thinking the future will see big changes. Except he thinks the prior should be over all the years of human history equally, so it’s a far worse version of the argument than usual. Sigh.
Ben Goertzel graduates to a shorter, more public-facing Fox News article where he argues that LLMs are not dangerous yet are impractical to pause, and also that the singularity is coming by 2029 and that is good, actually. None of which is surprising from someone who named his thing SingularityNET.
The ‘this is inevitable if we allow progress at all and it is all good actually’ arguments, I predict, reliably backfire on most people. Most people don’t want to die, they don’t want most humans to die, and they don’t want humanity as a whole to die or hand over the world to computer programs. Telling them that they are wrong about this does not seem likely to be convincing.
Words of Wisdom
The Lighter Side
Rebirth of an old joke, the wisdom as true as ever.
Relevant to at least two of my interests, more profound than you think.
The times they are a-changing.
Good advice from Chelsea Voss, I suppose, whether or not you agree with the premise?
Who makes sure fetch doesn’t happen? We do!
How was reaction to Robin Hanson’s podcast appearance on Bankless?
Top comments:
I had the same reaction. I came away from the podcast thinking ‘Robin Hanson thinks we are all definitely going to die, except he is more worried that we won’t?’
1
For those who didn’t get this: I’m saying that they never learn, and keep doing things that potentially increase capabilities without any corresponding gains.
2
Disenchanted?!? The future is hell.
3
GPT-4 nailed it, of course, with its other top 5 choices being The Matrix, Asimov’s Foundation novels, The Singularity Series by Hertling (which I haven’t read, should I?) and more oddly Down and Out in the Magic Kingdom by Cory Doctorow. Strangely, it cited The Mule for Foundation – when challenged that this was obviously wrong and there was a far better alternative, it folded on the spot (and if you’re not spoiled and decide to read the books, which I like a lot, please do read the books in the order written, not in timeline order.)
4
Arguably this isn’t true, and the worst possible thing is to give it a goal and structure such that it engages in recursive self-improvement. Or one could say nukes would leave survivors, the wrong bioweapon is worse, or something. Still, come on, nukes.