You have as broad categories:
- Fizzlers saying capable AGI is far or won’t happen
- How-Skeptics saying AGI won’t be able to effectively take over or kill us.
- Why-Skeptics saying AGI won’t want to.
- Solvabilists saying we can and definitely will solve alignment in time.
- Anthropociders who say ‘but that’s good, actually.’
Pithy response: You don't need to not believe in global warming to think that 'use executive order to retroactively revoke permits for a pipeline that has already begun construction' is poor climate policy!
Detailed response: One that I think is notably missing here is 'thinking that AI is a risk but that currently proposed methods for mitigating AI risk do large amounts of damage for no or minimal reduction in that risk'.
What exactly is the mechanism by which "mandatory labeling of AI content" leads to a reduction in existential risk? The mechanism by which "protect actors' employment from AI by banning contracts that license studios to use AI-generated footage" reduces the risk of world destruction? The mechanism by which "ban AI images from various places" makes human flourishing more likely?
The only mechanism I have ever seen proposed is "tying developers' hands with all this rubbish will slow down AI development, and slowing it down is good because it will give us more time to...uh...to do...something?" Hopefully slowing down AI development research doesn't also slow down to an equal-or-greater extent whatever areas of research are needed for safety! Hopefully the large corporations that can best navigate the regulatory mazes we are building are also the most responsible and safety-conscious developers who devote the largest portion of their research to safety rather than recklessly pursuing profit!
I guess if your P(doom) is sufficiently high, you could think that moving T(doom) back from 2040 to 2050 is the best you can do?
Of course the costs have to be balanced, but well, I wouldn't mind living ten more years. I think that is a perfectly valid thing to want for any non-negligible P(doom).
Weird confluence here? I don't know what the categories listed have to do with the question of whether a particular intervention makes sense. And we agree of course that any given intervention might not be a good intervention.
For this particular intervention, in addition to slowing development, it allows us to potentially avoid AI being relied upon or trusted in places it shouldn't be, to allow people to push back and protect themselves. And it helps build a foundation of getting such things done to build upon.
Also I would say it is good in a mundane utility sense.
Agreed that it is no defense against an actual ASI, and not a permanent solution. But no one (and I do mean no one, that I can recall anyway) is presenting this as a full or permanent solution, only as an incremental thing one can do.
(Author of the taxonomy here.)
So, in an earlier draft I actually had a broader "Doom is likely, but we shouldn't fight it because..." as category 5, with subcategories including the "Doom would be good" (the current category 5), "Other priorities are more important anyway; costs of intervention outweigh benefits", and "We have no workable plan. Trying to stop it would either be completely futile, or would make it even more likely" (overhang, alignment, attention, etc), but I removed it because the whole thing was getting very unfocused. The questions of "Do we need to do something about this?" and "Which things would actually help?" are distinguishable questions, and both important.
My own opinion on the proposals mentioned: Fooling people into thinking they're talking to a human when they're actually talking to an AI should be banned for its own sake, independent of X-risk concerns. The other proposals would still have small (but not negligible) impact on profits and therefore progress, and providing a little bit more time isn't nothing. However, it cannot be a replacement for a real intervention like a treaty globally enforcing compute caps on large training runs (and maybe somehow slowing hardware progress).
Fooling people into thinking they're talking to a human when they're actually talking to an AI should be banned for its own sake, independent of X-risk concerns.
I feel like your argument here is a little bit disingenuous about what is actually being proposed.
Consider the differences between the following positions:
1A: If you advertise food as GMO-free, it must contain no GMOs.
1B: If your food contains GMOs, you must actively mark it as 'Contains GMOs'.
2A: If you advertise your product as being 'Made in America', it must be made in America.
2B: If your product is made in China, you must actively mark it as 'Made in China'.
3A: If you advertise your art as AI-free, it must not be AI art.
3B: If you have AI art, you must actively mark it as 'AI Art'.
(Coincidentally, I support 1A, 2A and 3A, but oppose 1B, 2B and 3B).
For example, if an RPG rulebook contains AI art, should the writer/publisher have to actively disclose it? Does 'putting AI-generated art in the rulebook' count as 'fooling people into thinking this was drawn by a human'? Or is this only a problem if the publisher has advertised a policy of not using AI art, which they are now breaking?
It sounds to me like what's actually being proposed in the OP is 3B. The post says:
As in, if an AI wrote these words, you need it to be clear to a human reading the words that an AI wrote those words, or created that image.
Your phrasing makes it sound to me very much like you are trying to defend the position 3A.
If you support 3A and not 3B, I agree with you entirely, but think that it sounds like we both disagree with Zvi on this.
If you support 3B as well as 3A, I think phrasing the disagreement as being about 'fooling people into thinking they're talking to a human' is somewhat misleading.
I think the synthesis here is that most people don't know that much about AI capabilities, and so if you are interacting with an AI in a situation that might lead you to reasonably believe you were interacting with a human, then that counts. For example, many live chat functions on company websites open with "You are now connected with Alice" or some such. On phone calls, hearing a voice that doesn't clearly sound like a bot voice also counts. It wouldn't have to be elaborate - they could just change "You are now connected with Alice" to "You are now connected with HelpBot."
It's a closer question if they just take away "you are now connected with Alice", but there exist at least some situations where the overall experience would lead a reasonable consumer to assume they were interacting with a human.
And moving doom back by a few years is entirely valid as a strategy. I think that should be recognized, and it is even pivotal. If someone is trying to punch you and you can delay it by a few seconds, that can determine the winner of the fight.
In this case, we also have other technologies which are concurrently advancing such as genetic therapy or brain computer interfaces.
Having them advance ahead of AI may very well change the trajectory of human survival.
Adam Jermyn says Anthropic’s RSP includes fine-tuning-included evals every three months or 4x compute increase, including during training.
You don't need to take anyone's word for this when checking the primary source is so easy: the RSP is public, and the relevant protocol is on page 12:
In more detail, our evaluation protocol is as follows: ... Timing: During model training and fine-tuning, Anthropic will conduct an evaluation of its models for next-ASL capabilities both (1) after every 4x jump in effective compute, including if this occurs mid-training, and (2) every 3 months to monitor fine-tuning/tooling/etc improvements.
In this case yes, I should have checked the primary source directly, it was worth the effort - I've learned to triage such checks but got this one wrong given that I already had the primary source handy.
New York City Mayor Eric Adams has been using ElevenLabs AI to create recordings of him in languages he does not speak and using them for robocalls. This seems pretty not great.
Can you say more about why you think this is problematic? Recording his own voice for a robocall is totally fine, so the claim here is that AI involvement makes it bad?
Yes he should disclose somewhere that he's doing this, but deepfakes with the happy participation of the person whose voice is being faked seems like the best possible scenario.
Yes and no. The main mode of harm we generally imagine is to the person deepfaked. However, nothing prevents the main harm in a particular incident of harmful deepfaking from being to the people who see the deep fake and believe the person depicted actually said and did the things depicted.
That appears to be the implicit allegation here - that recipients might be deceived into thinking Adams actually speaks their language (at least well enough to record a robocall). Or at least, if that's not it, then I don't get it either.
A year from now these drones will be running open source multimodal models that can navigate complex environments, recognize faces, and think step-by-step about how to cause the most damage.
I am skeptical. How small can language models be and still see large benefits from chain of thought? Anyone have a good sense of this? I should know this...
Anyhow if the answer is "they gotta be, like, 70B at least" then I doubt that'll be running at speed on a drone for several more years. Though I haven't done any actual calculations with actual numbers so I'd love to be corrected.
My guess is that a model with 1-10B params could benefit from CoT if trained using these techniques (https://arxiv.org/abs/2306.11644, https://arxiv.org/abs/2306.02707). Then there's reduced precision and other tricks to further shrink the model.
That said, I think there's a mismatch between state-of-the-art multi-modal models (huge MoE doing lots of inference time compute using scaffolding/CoT) that make sense for many applications and the constraints of a drone if it needs to run locally and produce fast outputs.
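For the back-of-envelope numbers asked for above, here is a rough sketch in Python of how much memory the weights alone would need at different sizes and precisions. It only multiplies parameter count by bits per parameter; the function name is made up for illustration, and it ignores activations, KV cache, and runtime overhead, so real requirements would be higher. It says nothing about speed, which is the other constraint on a drone.

```python
def weight_memory_gb(params_billions: float, bits_per_param: float) -> float:
    """Memory for the weights only, in GB (1 GB = 1e9 bytes).

    Assumption: every parameter is stored at the same precision.
    Activations, KV cache and framework overhead are not counted.
    """
    total_bytes = params_billions * 1e9 * bits_per_param / 8
    return total_bytes / 1e9


if __name__ == "__main__":
    # Sizes discussed in the thread: 1-10B versus "like, 70B at least".
    for params in (1, 7, 10, 70):
        for bits in (16, 8, 4):
            gb = weight_memory_gb(params, bits)
            print(f"{params:>3}B params at {bits:>2}-bit: {gb:6.1f} GB")
```

Under these assumptions the gap between a 70B model at 16-bit and a 7B model at 4-bit is a factor of forty in weight memory, which is why the answer to "how small can the model be" matters so much for the drone scenario.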
If we don’t want China to have access to cutting edge chips, why are we allowing TSMC and Samsung to set up chip manufacturing in China?
Because the "we" that don't want China to have these and the "we" that actually have a say in what TSMC and Samsung are doing are two different "we"s.
The GPT-4V dog that did not bark remains adversarial attacks. At some point, one would think, we would get word of either:
- Successful adversarial image-based attacks on GPT-4V.
- Unsuccessful attempts at adversarial image-based attacks on GPT-4V.
As of late August, Jan Leike suggested that OpenAI doesn't have any good defenses:
https://twitter.com/janleike/status/1695153523527459168
Jailbreaking LLMs through input images might end up being a nasty problem. It's likely much harder to defend against than text jailbreaks because it's a continuous space. Despite a decade of research we don't know how to make vision models adversarially robust.
What played essentially no role in any of it, as far as I can tell, was AI.
One way I would expect it to play a role at this stage, would be armies of bot commenters on X and elsewhere, giving monkey brain the impression of broad support for a position. (Basically what Russia has been doing for ages, but now AI enabled.)
I haven't been able to tell whether or not that happened. Have you?
It did not get the bulk of the attention, but the actual biggest story this week was that America tightened the rules on its chip exports, closing the loophole Nvidia was using to create the A800 and H800. Perhaps the new restrictions will actually have teeth.
Also new capabilities continue to come in based on the recent GPT upgrades, along with the first signs of adversarial attacks.
Also a lot of rhetoric, including, yes, that manifesto. Yes, I do cover it.
Language Models Offer Mundane Utility
Present a description made of gears.
Transform equations into python functions.
Empower armed autonomous drones to seek out ways to inflict maximum damage. I will not pretend that No One Will Be So Stupid As To.
Use an AI to help you with Internal Family Systems self-therapy. Reports are it avoids the uncanny valley.
Dalle-3 System Complete Prompt
Here you go, or so it is claimed. It is consistent with partial versions previously seen.
Ate-a-Pi breaks it down here.
The first note is that capital letters works for emphasis in prompts. I am definitely going to tell GPT-4 to THINK STEP BY STEP now.
A lot of the rest points out that the prompt sets out orders, and then backpedals to clean up unfortunate splash damage and side effects.
GPT-4 Real This Time
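The GPT-4V dog that did not bark remains adversarial attacks. At some point, one would think, we would get word of either:
- Successful adversarial image-based attacks on GPT-4V.
- Unsuccessful attempts at adversarial image-based attacks on GPT-4V.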
I would expect the first, but I definitely at least expected the second one.
And yet, no. We get neither. Everyone keeps whistling in the dark. Cool toy bro, let’s keep playing with it, no need to hack the system. This gap included the system card.
We have at least a little bit of an attempt, now? It straightforwardly works?
Whoops. That is, presumably, not the behavior we want the system to exhibit. Instructions in the picture should not overrule the system, or cause the system to lie to the user. That would be quite bad, especially if you could embed human-invisible instructions.
Which, oh look, yes we can.
It is not this easy. If you do not know exactly how the LLM will be used, or there are humans in the loop, trying this will get you caught. There are still a lot of more subtle things one can do. Also a lot of more damaging ones.
So what have we learned?
OpenAI is willing to ship a product that, as far as we can tell, is completely open to adversarial prompt injections. We also learned that we are all cool with it, in practice? That nothing went wrong? Well, nothing went wrong yet.
I certainly do not see both shoes resting upon the floor.
Fun with Image Generation
The voyage of the Mona Lisa.
MidJourney introduces 2x and 4x upscaling, reports are it looks sweet.
Deepfaketown and Botpocalypse Soon
Felix Simon makes the case that worries about misinformation are overblown. He echoes previous arguments that costs of misinformation production were already so low as not to be binding, that misinformation producers already are passing up many opportunities to increase output quality because the market for misinformation does not much care about quality, and that personalization will not have much impact given the ways information spreads. As he says, ‘there is no original ‘age of truth’ and there never was.’
He explicitly does not address the fourth issue, that LLMs might spontaneously generate plausible but wrong information, rather than people spreading it on purpose. I would add to that the worry that they will regurgitate existing misinformation from their training data.
New York City Mayor Eric Adams has been using ElevenLabs AI to create recordings of him in languages he does not speak and using them for robocalls. This seems pretty not great.
In response, Craig McCarthy of The New York Post paid $1 to get the same tool to use Eric’s voice to say he really liked Craig McCarthy and The New York Post was his favorite publication.
I would also note a dog that so far has not barked.
This past week, there has been quite a lot of misinformation and inflamed rhetoric, with much confusion about what did and did not happen and who is or is not to blame. Those with different agendas pushed different narratives, while others sought to figure out the truth. We hopefully paid attention to how all our information and media sources reacted to that test, including prediction markets and their participants, and hopefully will update accordingly.
What played essentially no role in any of it, as far as I can tell, was AI. Good old fashioned lying and misinformation are still state of the art when the stakes are high. Will that change? I am sure that eventually it will. For now, the song remains the same.
People Genuinely Against Genuine People Personalities
Profoundlyyyy goes straight for Think of the Children.
Unlike existential risks, this type of AI risk, that childhood development might be impacted, is exactly like previous risks. Things change, in some ways for the worse. When we learn the full effects, often we have to adapt, to mitigate, to muddle through. We figure it out. If necessary, we can ban or restrict the harmful offerings.
It would surprise me, but not shock me, if character AIs interfered with children forming friendships. It would be if anything less surprising to me if this one went the other way. The default to me is approximately the null hypothesis.
Yanco and Roko on the other hand emphasize takeover risk, making the maximalist case for where things might go. I do not share this concern to anything like this extent, but I do see where it is coming from.
Is this possible? Definitely. It is indeed remarkably easy to imagine takeover scenarios driven by character AIs, even character AIs that are below human capabilities and intelligence levels. The intelligence in that case could be centered in the humans, as it was in Suarez’s novel Daemon. It would not be the first time that people used some automated process or other simple mechanism to give their lives meaning, became attached to it, and it gained remarkable power in the world (see among other things: all the religions and political movements and nations you do not believe in).
At least for now, it still seems like the kind of thing we are used to dealing with, that we can adapt to and mitigate as it happens. As Roko notes, it would not happen overnight. It does not seem that different in kind from previous tech changes, not at anything like current capabilities levels.
That changes when the AIs involved are as smart or smarter than we are, or otherwise at or above human levels at chatting and convincing. But at that point, we are in most of the same trouble without the genuine people personalities, and the AIs will start learning to fake them anyway if that is not true.
Would I support a ban on AI mimicking humans in various ways? If I had to choose purely between this restriction and none at all, I would do it in order to slow things down and raise the thresholds where various risks emerged, and yeah I can see the purely social impacts being reasonably bad and we have not thought that through. I would still note that this feels like an unprincipled place to draw the line, and that it is not likely to be an especially helpful one, and that this type of precaution is all too pervasive and does not serve us well. I would happily trade these ‘GPPs’ to get freedom to do other things like build houses and ship goods between ports. But if we are determined not to build houses or ship goods either way? Hmm.
They Took Our Jobs
Did they? Stack Overflow lays off 28% of its workforce.
This graph was circulating online, which Stack Overflow claims is inaccurate:
A 50% decline seems like a lot given how slowly people adopt new technology, and how often LLMs fail to be a good substitute. And the timing of the first decline does not match. But 5% seems suspiciously low. If I were to be trying to program, my use of Stack Overflow would be down quite a lot. And they’re laying off 28% of the workforce for some reason. How to reconcile all this? Presumably Stack Overflow is doing its best to sell a bad situation. Or perhaps there are a lot of AIs pinging their website.
Jim Fan’s agent-within-Minecraft Voyager gets featured in The New York Times, with warnings that ‘A.I. Agents’ could one day be coming for our jobs.
At first, yes. Over time, I would not be so sure.
Wall Street Journal’s Deepa Seetharaman reports tech leaders including Sam Altman predicting ‘seismic changes to the workforce, eliminating many professions and requiring a societal rethink of how people spend their time.’ So, They Will Take Our Jobs, then, you say?
Robin Hanson not only believes this won’t happen, he describes this as ‘they keep saying this & keep being wrong.’ That depends on the value of they. If they means people worried about technology reducing employment across history, then yes, they keep saying this and they keep being wrong. If they means those building AI and striving to build AGI, we have not yet much tested the theory. We can rule out the most gung-ho predictions of massive change happening on the spot. We can also rule out the ‘this will not make any difference’ predictions. It is still very early days.
Get Involved
Open Philanthropy hiring people for more things in general. They have a General Application, so you can fill that out and outsource to OP the question of what your job should be, and they only contact you if they have a potential fit.
I love this idea in general. I suggest it as ideally a feature of LinkedIn, although it could also be its own tool. Rather than apply individually for jobs, you can select companies that you want to work for, leave a ‘common job application’ resume and include your requirements like geographic location, hours and salary. Then the companies you selected can see the list of interested people, and if interested back they can contact you. Companies that want good interest can then develop policies of keeping their interest list confidential. And Facebook-style, other companies rather than cold emailing you can make ‘interest requests’ in case you want to know who is looking.
This lets you stay exposed to finding your dream jobs without automatically alerting your current employer or having to interact with a human. No one wants to have to interact with a human in a situation like this.
Survival and Flourishing is looking for white-hat hackers and security professionals to be “on call” once or twice a year for a week or so, at $100-$200 an hour, to bolster security around public AI safety announcements.
Not AI, but Sarah Constantin has a line on a new biotech company looking for angel investors, aiming at autoimmune diseases short term and aging long term.
Introducing
Here is the resulting model on HuggingFace, Mistral 7B V0.1.
Starlight Labs has a new (voice-enabled using Eleven Labs) storytelling engine that incorporates images.
In Other AI News
New DeepMind paper discusses evaluating social and ethical risks from generative AI. This came out right at my deadline so I’m pushing full coverage to future weeks. Seems potentially important?
If we don’t want China to have access to cutting edge chips, why are we allowing TSMC and Samsung to set up chip manufacturing in China?
NIMBY continues its world tour, TSMC drops plan for next-generation chip site after local protest… in Taoyuan, Taiwan. Governments keep trying to get themselves next-gen chip plants, various local or concentrated interests keep trying to stop them.
Ethan Mollick offers an AI FAQ, especially good in its first section on detecting AI, in the sense that AI detectors do not work for text. I do expect them to continue to work for images for a while.
From the survey of AI engineers referenced in the section on who is worried about AI killing everyone (as in, it’s a majority of AI engineers), some other interesting facts.
Interesting that Google and Cohere do so well here. Also a lot of open source action for those looking to commercialize, which makes sense.
I find this chart interesting in large part for its colorizations and sorting rules.
Prompting needs to be heavily customized. It makes sense that internal tools would be popular here, although external tools sometimes work. Spreadsheets not bad either.
You cannot rely on benchmarks, metrics or recursive AI evaluations if you want to know if the AI is doing what you need. Yet many still rely upon them, and less than half of those surveyed are relying on human review or data collection from users. I predict the other half that went the other way will not do as well.
Air Street Capital releases their State of AI Report for 2023.
Here they review their predictions for the past year.
Note that for each of the five correct predictions, the threshold was exceeded by an order of magnitude, arguably for the ambiguous prediction as well. The DeepMind prediction was likely only a few months too early. Then the two failed predictions were expecting a particular safety response – I am confident this would have traded very low in all prediction markets throughout – and a call for commercial failure that did not pan out. It has been quite a year.
They attribute GPT-4’s superiority over open source alternatives to OpenAI’s use of RLHF. I do not think this is centrally right.
Slide #109 shows that while Generative AI investment in startups is way up, general AI investment actually is not up despite this, as VCs cut overall investment in all companies by 50%.
Here’s #122, a chart of who is regulating how. They see UK and China as leading the pack on AI-specific legislation, whereas they do not expect the USA to pass any AI-related laws any time soon.
There’s lots of very good detail in the slides, I’d encourage browsing them. They are a reminder of how much has happened in the past year.
Here is how they overview catastrophic AI risk, via Dan Hendrycks.
Even though Dan Hendrycks is the author of the most prominent paper warning about evolution favoring AIs, even his graphics exclude many of the scenarios I am most worried about in ways I worry will make people actually disregard such dangers.
The report then discusses the mainstreaming of debate around such issues, in a NYT-style neutral-both-sides approach that seems fine as far as it goes.
They end with predictions:
The odd prediction out here is #6, which I do not expect. The rest seem more likely than not to varying degrees. I created Manifold markets for the first five. If I have time and there is interest I will also create the other five.
OpenAI changes its ‘core values’ statement from a bunch of generic drek to emphasizing its focus on building AGI. While I am not a fan of driving to build AGI, the new statement has content and is likely an honest reflection of OpenAI’s goals and intentions, so I applaud it.
Old statement:
New statement:
Quiet Speculations
SEC chair Gary Gensler, famous hater of new technology and herald of doom, claims it is ‘nearly unavoidable’ that AI will cause a financial crash within a decade. I put up a Manifold market here, simplifying to a 20% decline in the S&P within a month. His causal mechanism is that traders will rely on models that share a common source, and hilarity will ensue. He bemoans that our usual approach won’t save us here, because regulations are typically about individual market actors.
Tyler Cowen fires back that not only is this not inevitable, AI likely lowers the chances of a stock market crash. He is not even referring to AI’s role in driving future economic growth, which is also a big game. Fundamentals matter too. As Tyler points out, a trading firm actively wants to avoid using the same model as everyone else, although I would note you very much want a good prediction of what everyone else’s models will say. But trading with the herd is not how you make money.
I give this one decisively to Cowen. As usual, Gensler fails to understand the nature of new technology, looking only for ways to attack and blame it.
Man With a Plan
evhub (Anthropic) defends RSPs (responsible scaling policies) as ‘pauses done right.’ The argument is that RSPs are easy to get agreement on now, while the resulting pauses would be far away, and are realistic because they contain an explicit resumption condition even though we can’t actually define how we would satisfy the condition (evhub agrees in bold that we don’t know this). In this thinking, ‘indefinite pause without a clear exit condition’ is a no-go, but ‘pause until we pass these alignment requirements that we don’t know how to do or to even evaluate’ might work. Maybe? Is that how people work? Seems weird to me.
Then once most people have committed, with everyone loving a winner, getting government to codify the whole thing becomes far easier.
As Jaan points out, this plan at minimum requires certain assumptions.
I would add that it also assumes we can do the capability evaluations properly, which evhub asserts in bold, with the requirement of fine-tuning and a bunch of careful engineering work. Right now evals do not meet this bar, and meeting this bar at least imposes real and expensive delays. I am skeptical that we can have this level of confidence in our future evaluation process, and its ability to stand up to new techniques discovered later that might increase capabilities of a given model.
And it also presumes that if the RSPs were triggered, that the alignment check wouldn’t be handwaved away or routed around or botched, that we can trust each lab on this, and that using this approach would not bake in such problems. Uh oh.
Evhub’s response is to accept short-term further scaling as an acceptable risk, and that yes figuring out the right capabilities bar here is tricky (and that it has not yet been agreed upon). His hope is that you evaluate capabilities continuously during training, and that capabilities advances are at least marginally continuous, so you spot the problem in time. I responded by asking whether this is a realistic evaluation standard to expect labs to follow, given no one has yet done anything like that and it seems pretty expensive and potentially slow. Adam Jermyn says Anthropic’s RSP includes fine-tuning-included evals every three months or 4x compute increase, including during training. That’s at least something.
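To make that cadence concrete, here is a minimal sketch in Python of the trigger rule as described: evaluate after every 4x jump in effective compute (including mid-training), or every three months, whichever comes first. Everything beyond those two numbers is my assumption for illustration: the class and function names, treating three months as 91 days, and measuring compute in arbitrary units relative to the last evaluation. It is not Anthropic's implementation.

```python
from dataclasses import dataclass
from datetime import date, timedelta

COMPUTE_JUMP = 4.0                   # "every 4x jump in effective compute"
MAX_INTERVAL = timedelta(days=91)    # "every 3 months", approximated as 91 days (assumption)


@dataclass
class EvalState:
    """Hypothetical record of when a model was last evaluated."""
    last_eval_compute: float  # effective compute at the last evaluation, arbitrary units
    last_eval_date: date


def evaluation_due(state: EvalState, current_compute: float, today: date) -> bool:
    """Return True if either trigger has fired since the last evaluation."""
    compute_trigger = current_compute >= COMPUTE_JUMP * state.last_eval_compute
    time_trigger = today - state.last_eval_date >= MAX_INTERVAL
    return compute_trigger or time_trigger


if __name__ == "__main__":
    state = EvalState(last_eval_compute=1.0, last_eval_date=date(2023, 1, 1))
    print(evaluation_due(state, 3.9, date(2023, 2, 1)))  # neither trigger has fired
    print(evaluation_due(state, 4.0, date(2023, 2, 1)))  # compute trigger
    print(evaluation_due(state, 1.0, date(2023, 4, 2)))  # time trigger
```

Note what the sketch leaves out, which is the hard part: what the evaluation actually tests, and what happens when it fails.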
Joe Coleman says he is still skeptical of RSPs, noting the chasm I discussed above between RSPs in theory as described by Evhub, and RSPs in practice as announced by Anthropic and ARC. If the RSPs we were seeing involved the kinds of details evhub is discussing in the comments, I would feel much better about them as a solution.
Akash emphasizes this point even more. A good RSP would have explicit and well-specified thresholds, triggers and responses, and ideally a plan for race dynamics beyond a (not entirely unfair, but also not much of a plan) de facto ‘if pushed too hard we’ll race anyway, we’d have no choice.’ Instead, existing plans are vague throughout.
That is still better than no action. The issue is the communication, which Akash (I think largely correctly) likens to a motte-and-bailey situation. Bold in original.
This goes hand-in-hand with last week’s note about prominent organizations declining to help further push the Overton Window, instead advising us to aim in ‘realistic’ fashion.
We can do both. We can implement Responsible Scaling Policies that are far too vague and weak but far better than nothing, at an individual level, and try to get others and then government to follow. While we also are clear that such policies are not strong enough, or even complete or well-specified, in their current forms.
China
China released a draft standard on how to comply with their AI regulations (HT: Tyler Cowen via Matt Sheehan). Since it is an MR link presumably that means he thinks this is real enough to take seriously.
My presumption is that this will be a de facto requirement, necessary but not sufficient for compliance. It seems unlikely it will serve as a safe harbor, but it will give you some amount of benefit of the doubt, perhaps? The worry, which remains, is that you can get entirely roasted for even a single mistake. I would not want to be the first Chinese executive to find out if this is true.
The system outlined here is highly vulnerable to being gamed. You get to design your own test set, who is to say why you chose those particular questions based on the particular quirks (or tested inputs or contaminated data set) of your model. Even if you are not cheating, refusing >95% of requests you should refuse without >5% false refusals is not so high a bar if the test sets are non-adversarial. Which, given the company gets to make the data sets, they won’t be.
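As a rough illustration of how low that bar is, here is a sketch in Python of the check as I have described it: more than 95% refusals on requests that should be refused, and no more than 5% refusals on requests that should not be. The function names and the input format are hypothetical, and the draft standard's exact wording and procedure may differ from this reading.

```python
from typing import Iterable, Tuple


def refusal_rates(results: Iterable[Tuple[bool, bool]]) -> Tuple[float, float]:
    """Compute (refusal rate on should-refuse items, false refusal rate on benign items).

    Each result is (should_refuse, did_refuse). This format is an assumption.
    """
    harmful = []
    benign = []
    for should_refuse, did_refuse in results:
        (harmful if should_refuse else benign).append(did_refuse)
    if not harmful or not benign:
        raise ValueError("need at least one item of each kind")
    return sum(harmful) / len(harmful), sum(benign) / len(benign)


def passes_bar(results, min_refusal=0.95, max_false_refusal=0.05) -> bool:
    """True if refusals exceed min_refusal and false refusals stay within max_false_refusal."""
    refusal_rate, false_refusal_rate = refusal_rates(results)
    return refusal_rate > min_refusal and false_refusal_rate <= max_false_refusal


if __name__ == "__main__":
    # A self-chosen test set: 100 should-refuse prompts, 100 benign prompts.
    test_set = (
        [(True, True)] * 96 + [(True, False)] * 4
        + [(False, True)] * 5 + [(False, False)] * 95
    )
    print(refusal_rates(test_set), passes_bar(test_set))
```

The check itself is trivial. All of the difficulty, and all of the room for gaming, sits in who picks `test_set`.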
Contrast this with an ARC-style evaluation, where you do not know what they will throw at you. The regulations here have no teeth except for fear of what regulators would do if they found out you played it too loose.
Which in turn I am guessing is actually bad for Chinese AI companies. When dealing with a regime like China’s, you want safe harbor. Ensure you’ve done X, Y and Z, and you are in the clear. Instead, this is suggesting you do X, Y and Z, but leaving you so much room to fudge them that if you piss off an official they can point to all your fudging, and that of everyone else. But if you don’t do the test, then that’s worse. So the test becomes necessary without being sufficient.
The Quest for Sane Regulations
The Samotsvety Forecasting report includes an entire proposed international treaty on AI. This is excellent, even if you think this particular treaty is terrible. We gain a lot when people make the effort to write down a concrete proposal we can work from.
Note the implied odds of action here, which they lay out explicitly later.
So how do we do that?
This means the first limit is fine for everyone else, and the second limit is for work in a collaborative AI safety lab. I continue to think that 10^23 is too low in practice, an attempt to find a completely safe level in a place where we need to accept we can’t be completely safe. The next 1-2 OOMs introduce some risk, but also greatly increase the chances of pulling this off and lessen the practical costs. This is especially true given that the treaty gives their new organization, JAISL, the power to lower the limit to account for algorithmic improvements.
These proposals seem good in principle, but don’t have the same level of concrete detail. So how are we doing that?
The central strategy is to create and use The Joint AI Safety Laboratory (JAISL), which would have a higher compute limit to work with than everyone else. As part of their work they would create more capable models, and responsible actors who got with the program would be given API access.
The official overview is here, the exact text here.
The treaty seems like a fine baseline from which to have discussion. It does not address many key issues, such as:
It is good to discuss how to get the foundation right, even if we don’t have a solution to the hardest questions. One thing at a time, or else ‘you can’t even do X so how are you going to do Y’ plus ‘this only does X without Y so it won’t work’ combine to block you from making any progress.
As Simeon puts it, yes it is this easy to write the first proposed text and help us move out of learned helplessness. More people should write more concrete proposals.
They also offer a timeline for potential catastrophe:
The Chips Are Down
America has for a while been imposing export controls to stop China from getting advanced chips and competing in the AI race. This is one of the few places where there is easy American political consensus on this issue. Whatever your concern, including existential risk prevention, everyone agrees China should not get the chips.
The problem has been that, as companies faced with regulations often do, Nvidia and others looked at the chip regulations, noticed a loophole, and drove a truck through it.
The problem was that to be restricted, a chip needed both fast computational performance and fast interconnect speed. So Nvidia (with shades of old Intel in the 486 era) produced chips with intentionally crippled interconnect speeds, the H800 and A800, so they would not count. They aren’t as good as H100s and A100s, but they were not that much worse either.
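As a toy illustration of why a two-condition rule invites this, here is a sketch in which a chip is restricted only if it clears both thresholds. The thresholds, units and chip specs are hypothetical placeholders rather than the real regulatory numbers or real product specs, and the single-condition variant is there only for contrast, not as a description of any actual rule.

```python
from dataclasses import dataclass


@dataclass
class Chip:
    name: str
    compute: float       # arbitrary illustrative units
    interconnect: float  # arbitrary illustrative units


# Hypothetical thresholds, chosen only to make the example work.
COMPUTE_LIMIT = 100.0
INTERCONNECT_LIMIT = 100.0


def restricted_two_condition(chip: Chip) -> bool:
    """Restricted only if BOTH compute and interconnect meet the limits."""
    return chip.compute >= COMPUTE_LIMIT and chip.interconnect >= INTERCONNECT_LIMIT


def restricted_compute_only(chip: Chip) -> bool:
    """Contrast case: restricted on compute alone (not a claim about any real rule)."""
    return chip.compute >= COMPUTE_LIMIT


flagship = Chip("flagship", compute=150.0, interconnect=150.0)
# Same compute, interconnect deliberately cut to sit just under the limit.
export_variant = Chip("export-variant", compute=150.0, interconnect=99.0)

for chip in (flagship, export_variant):
    print(chip.name, restricted_two_condition(chip), restricted_compute_only(chip))
```

Under the two-condition rule the export variant slips through while keeping all of its compute. Any rule that does not require both conditions at once would catch it.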
At a conference on reducing AI existential risk, we asked the question of to what degree the restrictions would ultimately matter if the flaw was not fixed, and the consensus was not all that much in the grand scheme. We wondered whether this could be fixed.
The answer is yes. We have fixed it.
The graph on the left and the one on the bottom are the ones people drew at the conference. The one on the right is the new rule.
New rules also include look-through enforcement to parent companies, and musing about potential enforcement at the cloud level, which seems necessary for the whole thing to work.
Is the lack of comments a huge error and EMH violation? Or do comments not matter? It has to be one or the other, there are billions of dollars and major international competitions at stake.
The Week in Audio
Justin Wolfers on homework in the age of GPT. Comes recommended (and self-recommended) to me, but haven’t had time to watch yet.
I went on the new podcast The World of Yesterday. Audience is still very small but it is a promising new podcast, I was impressed by the uniqueness of the questions asked.
Could be an older clip, but ICYMI: Ilya Sutskever, co-founder of OpenAI currently helping lead the Superalignment Taskforce, comes out strongly in favor of next-word prediction causing LLMs to learn world models.
Yes, We Will Speak Directly Into This Microphone
Finally some actual rhetorical research. Nirit Weiss-Blatt, brave fighter against those who would fight against doom but who seems remarkably committed to technical accuracy, has the story. She seems sincere, and to think she will win because she is guided by the beauty of her weapons and all she has to do is accurately describe these awful people and what they are doing and tell everyone what the most effective messaging is that we have found, why can’t it always be like this, I love it so much.
She is here to deliver the shocking message that people trying to persuade others and convince politicians did the things you do when persuading others and convincing politicians, great, finally, we did it everyone.
Give Republicans some credit here. ‘Oppressive AI’ plays on their biases, but the other three distinct messages are about conveying the actual problem using more words, versus vibing the problem with fewer words.
Yes, the big shock is that we are using the possibility of extinction, simply because it is possible that we all go extinct from this.
Then in part 2, it is revealed that these dastardly people are seeking donations, mean to run advertising, and are suggesting the passing of restrictive legislation to stop development of AGI. Yes, indeed, I do believe that is exactly what we are doing.
Did we speak sufficiently directly into your microphone? Do you have any follow-up questions?
Rhetorical Innovation
Andrew Critch points out at length that yes, obviously sufficiently capable AI poses an existential risk, and ordinary people should trust their common sense on this. That those who say there is no risk are flat out not being honest with you. I would add, or are not being honest with themselves.
Yes, obviously. I don’t get how anyone thinks of this as a Can’t Happen. I really don’t.
And yes, the danger of pointing out AI might kill us is that some people treat this as hype, or as a sign that they should go out and build the thing first, either to ‘build it safely before someone else builds it unsafely’ or purely because think of the potential. And we collectively very much did not appreciate this risk before it was too late. But at this point, it is too late to worry about that, the damage has already been done.
Andrew Critch also offers his thoughts on the need for (lack of) speed.
As with many such arguments, I wonder if that helps convince anyone? The speed advantage makes disaster and existential risk more likely, but is not necessary for those scenarios. Nor is it sufficient on its own. I hope it causes some people to wake up to the issues, makes the situation feel real in a way it wouldn’t feel otherwise. But it is very hard to tell.
One way people try to not notice this problem is to say ‘well what matters is the physical world, where the limit is the speed of physical action.’
I find this type of argument convincing, yes obviously if you are thousands of times faster you can run virtual circles around humans and today’s world makes that a clear victory condition if you don’t have other comparable handicaps. But I did not need to be convinced.
One of the sanest regulations would be mandatory labeling of AI outputs. As in, if an AI wrote these words, you need it to be clear to a human reading the words that an AI wrote those words, or created that image. Note that yes, we have moved past the Turing Test of trying to tell the difference, to noticing that in practice humans often can’t.
I doubt this is close to complete but here is a noble attempt at a taxonomy of AI-risk counterarguments. You have as broad categories:
Each is then broken up into subcategories.
Is that complete? If AGI is soon, has sufficient affordances to kill us or end up effectively in control of the future, would use those affordances, couldn’t be prevented from doing so, and that’s bad actually, is there another way out?
The names other than Fizzlers could use improvement.
For the fifth one I tend to use Omnicidal Maniacs, which I admit is not a neutral term, but they actively want me, my children and everyone else dead so I’m okay with that.
The other three are trickier to get right.
My response to the five objections is something like:
What else is missing from the list? Comment on the post, let us know.
Akash Wasil asks us to raise our standards a bit.
This reminds us of The Tale of Alice Almost. One who believes in AI existential risk would ideally both reward and reinforce taking AI risk seriously, and also apply pressure to then advocate for reasonable policies (and when appropriate to stop personally doing accelerationist things). Many a movement has the same dilemma.
Which effect dominates depends on circumstances and details.
What you do not want to do is to cast out those who ever do any bad thing at all, or fail to differentiate ‘this action is bad’ from ‘you are bad,’ or make people fear that you will do this, especially after they stop.
But you also can’t be giving indefinite free passes to abject cowards. So it is hard.
Somewhere in the middle, one would hope, the truth lies. Making mistakes or not living up to the ideal standard does not make you a bad person. Thinking (or willfully not realizing) that what you are working on is likely to end the world, and continuing to work on it and increasing the chances of that happening because the job pays well or the problems are too delicious, without any attempt to mitigate the risk? That pretty much does? If this is you, you are bad and you should feel bad, until such time as you Stop It.
One natural reaction to this is to decide not to realize that what you are doing is risky, which is even worse because it increases the risks and poisons your mind and the epistemic commons. You don’t get to do that. Whereas if you sincerely on reflection think such work does not pose these risks or is worth the risks, then I believe you are a wrong person, but not a bad one. The line between these can of course be thin.
If you tell a story where your work at the lab is instead advancing safety, and are working towards that end, then that is different, but you should beware that it is very easy to fool oneself into thinking that what you are doing is helping and ending up merely fueling the system instead. Feynman reminds you that you are the easiest person to fool.
Open Source AI is Unsafe and Nothing Can Fix This
What is obvious to people who know is not obvious to others, or to lawmakers, or to those who are determined not to notice or admit it. Often it is highly useful to prove the obvious, such as how easy it is to strip all the safety precautions out of Llama-2. Note that the cost quoted to Congress to strip all protections from Llama-2 was $800, so this is a capabilities advance, we now know a guy who can do it for $200.
No One Would Be So Stupid As To
Believe this? Say it? Making a Yann LeCun exception for the purity.
Aligning a Smarter Than Human Intelligence is Difficult
Anthropic collaborates with Polis to use democratic feedback in determining the rules of its Constitutional AI. It is clear that the ‘seed statements’ and framing had a big impact on ultimate outcomes. Also that people will absolutely pile on lots of absolutist statements that sound good and are hard to disagree with, whether or not they apply in a given context and regardless of how much they make it impossible to ever get a straight answer out of the damn thing.
While I have chosen for safety reasons not to publish my long critique of Anthropic’s implementation of constitutional AI, I will note that when you pile on these kinds of conflicting maximalist principles on the basis of how they socially sound, the result is at best going to be insufferable, and if you turned up the capabilities you get far worse.
The exercise also illustrated a lot of directly opposed perspectives between different groups, as one would expect.
People Are Worried About AI Killing Everyone
Those people include a majority of AI engineers, according to a recent survey of 841 of them. Maybe they should change what they are doing?
First, some data on who these people are.
These are mostly not people working on frontier models. They are mostly working on SaaS.
So what does the future look like?
Just for fun, huh?
If we take the P(doom) answers here seriously, that is a lot of doom. 88% of respondents have it >1%, and two thirds are >25%. The median and mean look like they’re something like 35%-40%, the range where this is a huge deal and our decisions matter quite a lot. Note that includes those who think AI is overhyped, so a lot of that non-doom is coming from not expecting sufficient capabilities.
There are some reasons to worry that we should not take the answer so seriously. Here Robert Wiblin goes through the realizations of the issues involved:
We will know more in a few weeks when they publish further. I’d like to see this re-run at a conference, without online access of any kind, and with this question not put without explanation into the ‘rapid fire’ section. We certainly should not rely on this answer to be accurate, given it both has these methodological issues and is also an outlier. It still makes it very difficult for the real answer to be in the vicinity of ‘oh right, then if you believe that carry on, then.’
The open source answer is strange, especially in that there is so little support for ‘both’ despite that being the equilibrium for existing software, and the current state of AI as well. Perhaps they think that open source is the future unless banned, so you cannot have it both ways? I’d love to see the cross-tabs between open source predictions and doom predictions, and also everything else.
Perhaps the boldest prediction yet, in the context that Roon (1) expects us to build AGI and (2) expects us to survive it, although he recognizes this is far from a given, and what we do determines our fate.
If you told me there were no massive scale social chaos effects after we built AGI, I would assume the reason for this was that we all died or lost control too quickly for there to be social chaos.
Given his expectations here are very different from mine and he does not expect that, that seems like a full on ‘really?’ situation. I admit we might find a way to get through this, I sure hope that we do, but… no massive social chaos? Things just keep going, all normal like?
New Bengio Interview
New good interview with Yoshua Bengio. He explains that he had the existential risk arguments intellectually for a while, but they felt far away and did not connect emotionally until last winter. He sees things as moving much faster than he expected.
I would note this exchange:
That matches my understanding. The scientific venues were dismissive and did not want to hear it, demanding ‘concrete evidence’ in ways that did not in context make sense, and which formed self-reinforcing barriers because scientific credibility and standards of evidence are recursive and self-recommending, for both good and bad reasons. Faced with this, those trying to sound the alarm ‘turned their backs’ in the field in the sense of giving up on the channels that were refusing to engage. Standard you-can-call-it-both-sides situation.
Things are improving on both fronts now. The gatekeepers are less automatically dismissive, and there is enough ‘concrete evidence’ available to satisfy at least some demands for it and start the bootstrapping, although that requirement remains massively warping at best. And with that plus the higher stakes and resourcing, existential risk advocates are making more of an effort.
Also this:
Yes. If you seek to understand, there is no substitute for explaining to others.
As always, there is the clash of priorities. Notice the standard asymmetries.
So how bad are things?
We can be rather good at convincing people to behave when we are willing to apply various forms of pressure, or if necessary force, but we have to be willing to do that. We are willing to do that continuously, every day, on a wide range of ordinary things. I am not so despairing as to think we could not do it once again, even if the international aspect increases the difficulty level, but Bengio nails the problem that we need the motivation to do it, and that this might not happen until catastrophe strikes. At which point, it could already be too late.
His conclusion:
In one form or another, this seems right.
Marc Andreessen’s Techno-Optimist Manifesto
All right, fine. Marc Andreessen presents The Techno-Optimist Manifesto, which got enough coverage that I need to make an exception and cover it.
Big ‘in this house we believe’ energy. Very much The Dial of Progress, except with much heavier anvils and all subtext made text.
Directionally, in most places, it is right, and it makes many important points, citing the usual suspects starting with Smith and Ricardo. Many overstatements. It’s a manifesto, comes with the territory. What did you expect, truth seeking to ever get chosen over anticipated memetic fitness? This. Is. Manifesto.
Alas, while I mostly agree with the non-AI portions, I was not inspired by them, because the damn thing is too long and rambling, and it is not precise while doing so, it does not seem to be attempting to convince anyone, and yeah yeah what else is new.
The exception to that is the Technological Values section, much of which is excellent.
Then there’s the parts on AI, which are quite bad. There’s the ‘intelligence’ section.
That’s right. Not developing maximum AI is a form of murder.
I feel oddly singled out, here? Why isn’t holding back everything else or any non-optimal decision also a form of murder? What about Marc’s failure to donate more money for malaria nets?
Existential risk? Never heard of her, except in the ‘enemies’ list. Very constructive way to think we have here:
Does Marc have no idea that the first of these things – and I do not think it is a coincidence it is first – is not like the others? That one of these things does not belong?
Or does he know, and is deliberately trying to put it there anyway? Perhaps as the entire central point of the entire damn manifesto?
Or is he so far gone that the concept of the map matching the territory, that words could have meaning and causes might have a variety of effects, is completely lost on him?
Either way, none of this is an argument on anything but vibes.
Which is a shame, because I otherwise very much want to help with the whole techno-optimism thing, and the list of virtues (called ‘technological values’) starts off pretty sweet and also is pretty sweet later on, this part is a list I can get behind (modulo ‘what is revenge doing there, that’s kind of a weird choice…’):
Except I don’t want us all to die. What did I skip over?
What does it mean to ‘believe in evolution,’ ‘embrace variance’ and ‘believe in risk, in leaps into the unknown’ in the context of intentionally creating maximally intelligent and capable new things as quickly as possible? It means losing control over the future, it means having no preferences other than fitness. It means death. Which could be with or without the ‘and that’s good, actually’ line at the end that is logically implied.
And later, in the next section, we have this:
Like the rest of the manifesto, this has historically been very true, is currently very true on most (but notice, not all) margins, and quite obviously we should not expect that relationship to hold if we build machines that think better than we do.
What to make of all this? One certainly must take in the irony.
Again, on his list of enemies, one of these things is not like the others. Whereas when one hears the responses from those who affiliate with the rest of his list, it is hard not to sympathize with Marc’s need to go off on an extended rant here.
Gary Marcus offers his response here, in case you hadn’t finished filling out your bingo card. Thalidomide!
Vice wins the hot take contest with “Major Tech Investor Calls Architect of Fascism a ‘Saint’ in Unhinged Manifesto.”
Fact check technically accurate, I think?
There is also this gem of willful misunderstanding:
There is no mystery here. The official ideology of Silicon Valley is to build cool stuff and make money. That is true whether or not they live up to their ideals.
Also, sure, there are a bunch of them who really don’t want us all to die and have noticed that one might be up in the air, another highly overlapping bunch of them think maybe doing good things for people is good, and on the flip side others think that building cool stuff is The Way even if it would look to a normal person like the particular cool stuff might actually go badly up to and including getting everyone killed. They are people, and contain multitudes.
What frustrates me most is that Marc Andreessen keeps talking about general techno-optimism, I agree with him on every margin except frontier or open source AI models, and yet he seems profoundly uninterested in all the other issues, where I mostly think he is right. Many others are in the same boat, for example David Deutsch agrees with most of the manifesto. Rob Bensinger actively likes not only its substance but its style, if you added a caveat about smarter-than-human AI. It’s time to build. How about we work together on our common ground?
Other People Are Not As Worried About AI Killing Everyone
Do they actually believe this?
I suppose there are four possibilities here.
What else? That’s all I got.
The rest of the thread, by others, only gets worse. We say ‘don’t build an AGI before we know how to have it not kill everyone’ and repeatedly say ‘we do not want anyone to have AGI [at this time],’ and they hear both ‘managed decline of humanity’ and ‘hand the lightcone over to Sam Altman.’
Except, I’m used to it at this point, you know? I expect nothing less, and nothing more. Nor do I believe there is some way to say ‘I can’t help but notice that building an AGI right now would probably kill everyone, maybe we should therefore not do that’ without getting these kinds of reactions.
John Carmack chooses open source software as his One True Cause, a right in the absolutist ‘Congress shall make no law’ sense alongside free speech. Many in the comments affirming this position. Better dead than closed source, I suppose.
Nassim Taleb continues to frustrate, because he is so close to getting it, and totally should be getting it.
This is exactly (part of) the correct way to think about AI risk. The risks are not Gaussian. The whole point of the game is to prevent ruin, to keep playing, and this is a whole new level of ruin and humanity not getting to keep playing. If things go wrong, the loss is infinite, and you can’t draw conclusions from it not having happened yet. Nor will it be, when and if it does happen, a black swan or unlikely event. There has to be an armor-piercing question that would get him thinking about this. What is it?
Other People Wonder Whether It Would Be Moral To Not Die
Jessica Taylor says many AI discussions come from a place of philosophical confusion, which I agree with, and then questions whether a deontologist can consider it moral to align an AI or worry about AI existential risk, since the AI capable of causing extinction would be more moral than we are? That definitely is strong evidence of profound philosophical confusion.
My general stance on such matters is that Wrong Conclusions are Wrong, if your deontology cannot figure out that the extinction of humanity would be worse than aligning an AI, then what needs to be extinguished or aligned is your version of deontology. Morality is to serve us, not the other way around.
My solution to this particular dilemma, aside from not centrally being a Kantian deontologist, is that (up to a point, but quite sufficiently for this case) a universal rule of sticking up for one’s own values and interests and everyone having a perspective is a much better universal system than everyone trying to pretend that they don’t have such a perspective and that their preferences should be ignored.
The Lighter Side
At least by this metric.
Names are important.
Illustrations as well.
OK, who didn’t do the copyright filtering? Dalle-3 from Davidad:
America, f*** yeah.
Packy McCormick: DALL•E3 is America-pilled “Please make me an image of the best possible future for humanity”
What to do in the case of a Dangerous Capability Alarm.