You stochastic parrot!
This, but unironically. What's the magic thing which keeps a human from qualifying as a stochastic parrot? And won't that magic thing be resolved when all the LLMs deploy with vector databases to resolve that pesky emphasis on symbols?
Because they have never shown an example in any of their takedowns of GTP4 that I have not also heard a human say.
I am trying to come up with a reason this isn’t 99%? Why this is the hard step at all?
tl;dr (pls correct me if this summary is wrong) the argument is that the more likely hypothesis is that the model will be simply myopically obsessed with getting reward. (Though I think Joe might still think that the model also might be actually alignd/honest/etc.? Not sure why, would need to reread.)
Another busy week. GPT-5 starts, Biden and Xi meet and make somewhat of a deal, GPTs get explored, the EU AI Act on the verge of collapse by those trying to kill the part that might protect us, multiple very good podcasts. A highly interesting paper on potential deceptive alignment.
Despite things quieting down the last few days, it is still a lot. Hopefully things can remain quiet for a bit, perhaps I can even get in more work on that Jones Act post.
Table of Contents
Language Models Offer Mundane Utility
GPTs mostly do not unlock strictly new capabilities, but they do let people store and share structured prompts.
Beware trivial inconveniences. Even a slight push towards making the UI easier and faster to use could plausibly be a huge game.
Alas, my first attempt to create a useful and unique GPT failed, but I was not trying all that hard and when I have time I will try again – it is likely I needed to write a document as guide for my concept to work, perhaps script some commands, and I instead tried to type my requests into a box.
But let’s not get carried away.
Toby Ord tests GPT-4 on its understanding of ethics. Without a custom prompt, it knows many facts but does poorly, making lots of mistakes that show lack of central understanding, in a way that would be worrisome in an undergraduate. With a custom prompt for generic academic responses, Toby reports errors declined by about two thirds and helped quite a bit, but the result was still ‘undergraduate who read some stuff and is preparing a quick summary.’ I continue to think that ‘make it ethical’ is going to be a lot harder to pull off with sufficient rigor than people realize, and a good way to get yourself killed.
Utility everywhere.
Use AI to parse zoning regulations, figure out what they are, and connect them to outcomes. Will cover the results in the next housing roundup.
Who contextualizes it better?
I have not had the opportunity to run experiments. This indicates that GPT-4’s effective context window is now optimized at 16k and is superior up to 27k, but that if you are using most of a 100k window you still want to use Claude-2.
Language Models Don’t Offer Mundane Utility
You can play 20 questions, but it is incapable of having a fixed answer in mind, finds Nature paper. Roon points out this illustrates how Nature papers are typically six months behind Twitter and LessWrong posts. Note that this is a form of deception, the model will lie to you claiming it has a fixed answer.
Which models hallucinate more? We now have a supposed leaderboard for that. I was skeptical that Claude’s accuracy is so low here, or that Llama 2’s is that high. There is a lot of subjectivity in the assessment.
Then Jim Fan offered a helpful breakdown, which he updated after seeing the blog post explaining the test. I am convinced this was not a robust test. That does not mean it is useless, but use caution.
Make AI characters for a new MMO, also metaverse and also blockchain blockchain blockchain?
Sounds about right. If you try to do everything at once, you end up doing nothing well.
I can think of at least four ways the first great (clean) AI-NPC game might work.
GPT-4 Real This Time
OpenAI pausing ChatGPT+ sign-ups until it can fully meet current demand.
ChatGPT+ accounts going on Ebay for a modest premium. Our price cheap. Market expects this not to last very long.
GPTs can and will share with the user any information they are given, straight up providing links to download the knowledge files (RAG).
I wondered right away how much harder you can make this via putting instructions to avoid this into the GPT in question, my guess is not enough to stop anyone who cares. The answer from Borriss is that they are on version 5.0 of instructions meant to defend you from attacks, because the first four versions already got cracked.
Version 5.0 is a trip into what maximum emphasis looks like to GPT-4.
Note that this means users cannot upload files. For some uses, that is going to be a key loss of functionality.
Rowan Cheung offers advice on building a good GPT. Protect the instructions as above, use actions, use data, and of course promote the hell out of it.
Fun with Image Generation
Dalle-3 in the new integrated GPT-4 reported to be refusing many clearly-fine image requests. Tetramorph speculates that Dall-3 generalized that it gets rewarded for refusals, then the refusal gets rationalized. Seems like a case of ‘no way, that is too stupid, they’d never let that happen’ but as the years go by I gain experience.
So it does things like refuse to depict two Germans in a bar near MIT discussing AI.
The good news is, if it won’t give you what you need, such as an image it can obviously do, provide GPT-4 with some helpful hype. You can do it, GPT-4. We believe in you. Warning: May be training intelligent agents to deceive.
Reminder that AI went from sub-human to superhuman very quickly at art styles, even if some art aspects are not yet superhuman, and it is a task with a soft upper bound. Point is that other skills likely to witness similar patterns.
Deepfaketown and Botpocalypse Soon
Terrorists reported by Wired to be using (they say ‘exploiting’) generative AI tools to manipulate images at scale and bypass hash-sharing. So once again, generative AI is not about satisfying demand for high quality fakes. Here the demand is to evade automated detectors, as part of the eternal filtering/censorship/spam arms races. Both sides can play at that game. If you were counting on your old static strategies, that is not going to work. I do expect new detection strategies.
Ed Newton-Rex resigns from leading the audio team at Stability AI because he believes training generative AI on copyrighted material is not fair use. Whereas Stability AI strongly disagrees.
A Bad Guy With an AI
Which knowhow is dangerous? Verge warns ‘AI suggested 40,000 new possible weapons in just six hours.’
Whenever you hear someone talking about how long something took an AI, chances are the result is not so dangerous.
Indeed, in this case Vijay Pande and Davidad both seem clearly correct. Getting locally deadly substances is already easy, it is scaling and distribution that is hard. Chemical weapons are a human problem, not an AI problem.
Indeed, the surprising fact about the world is that there are bad actors, with access to locally deadly means, who consistently fail to cause much death or even damage.
Davidad nails what makes an AI-enabled threat scary, which is if it scales.
Chemical weapons do not scale. Biological weapons scale.
If you create a pandemic virus, and you infect people at a few airports, the virus does the rest. If you have a sufficiently effective cyber attack, it is trivial to replicate and distribute it. If an AI becomes capable of recursive self-improvement, or of acquiring more resources over time that are then used to make copies of itself and further acquire resources, same issue.
A demonstration on chemical weapons, if used to scare people about what a bad actor could do with deadly compounds, as Verge presents it here, is indeed (mostly) doomer BS. I say mostly because a sufficiently large leap in some combination of toxicity, ease of manufacturing and transport and difficulty of detection could present a step change. Technologies often start out as curiosities, then when made ‘ten times better’ are no longer curiosities. Reaching adaptation thresholds can then cause users to learn how to further improve use, which in this case could go quite badly. It is not at all an existential threat, but it certainly does not sound awesome.
As a demonstration of what a similar system could do with biological weapons or other reversals of benign intent where the threats could scale, it is of concern.
Also of concern, periodic reminder for whose who need to hear it: You get to choose one (1) of the following for your AI model: Read untrusted input sources including the internet OR be able to take direct actions that can impact things you care about.
My assumption is the opposite. It cannot be fixed. Or at least, no one knows how to fix it. OpenAI knows this. The choice is allow users to take this risk or cripple the system.
I think it is mostly fine to make such systems available despite their vulnerabilities. People can make choices to manage their risk. They do need to be made aware of those risks.
They Took Our Jobs
South Park goes Into the Panderverse. If that sounds like fun given it’s listed here, watch the episode before I spoil it.
What is worth noting is that the underlying economics make some important mistakes. I imagine Alex Tabborak screaming in frustration.
One of the two plots is that AI is taking everyone’s job unless it requires the use of arms, and no one knows how to do stuff anymore. So the roles flip, those with college degrees are useless hanging out outside Home Depot looking for work that never comes, while handymen get their services bid up higher and higher until they’re competing to go into space.
The joke of course is that everyone could do the things themselves, the instructions are right there. This is even applied to Randy Marsh, who somehow is a weed farmer who does physical work all the time and operates a successful business, yet cannot with simple instructions attach an oven door. It highlights two importance concepts.
One is Baumol’s cost disease and general labor supply. If AI takes over many jobs without creating any new ones, then those who remain employed would see wages decline, not increase, because there would now be a surplus of labor. If no one has to work at the office, everyone starts a string quartet.
The other is occupational licensing and barriers to entry. It is silently a righteous explanation of the dangers of what happens when we bar people from doing things or holding jobs and creating an artificial cartel monopoly.
Also, when the episode ends and the supply of physical labor expands, everyone else is still out of work.
What about automating the CEO? Well, partially.
The job of CEO will belong to humans until things are rather much farther along. But yes, there are lots of things a CEO does all day that could be vastly streamlined. The workload can be made much more efficient, and I do expect to see that.
A close reading of the AI provisions of the Actors deal.
Presumably actors are allowed to negotiate individual deals that require more approvals. And presumably, if studios or directors do this in ways that piss off actors, they will get a reputation, and actors will demand compensation or rights.
That seems like a potentially large oversight if true and not fixed. There would be great temptation to make such sequels, and we definitely do not need more sequels. My guess is this is unintentional, and the rule was intended to stay within the production, since Schedule F is all about a fixed fee for a given small production.
That seems like a logical way to handle the situation. There is an obvious danger, which is that there are those would happily let their likeness be used for zero or even negative cost, and if it is AI then over time the fact that most of them cannot act might be an acceptable price.
This does seem like a rather large loophole. If the first amendment prevents closing the loophole generally, watch out. I do not think we should allow this. Using someone’s likeness to portray them doing things they didn’t do without their consent seems very bad, even without considering some of the things one might portray.
Skipping ahead a bit.
It’s not like you can retroactively give them the experience of a set that does not exist. What else could we do here?
If I was SAG I would have placed a very high priority on getting studios not to do this, but how is it different from renting the likeness of someone willing to sell out for almost nothing? The only solution would have been to use their leverage, while they had it, to ensure minimum payments for use of any human likeness. And then, perhaps, if they used an AI Object instead, they would have to pay into a generic fund.
In the long run, if AI actors can provide the product as well as human ones, the humans of SAG have very little leverage. SAG’s hope, and I think it is a strong one, is that humans will actively greatly prefer to see human actors over AI actors, even if the AI actors are objectively as good. We also likely get to see a lot more live performance and theater, the same way music has shifted that way in an age of better tech. If AI gets to this level, anyone in the world will be able to create movies, and hamstringing the studios won’t work.
Freelancers perhaps starting to feel the pinch.
Often economists will blow small changes out of proportion, but the earnings decline of almost 10% seems like a big deal, coupled with an almost 3% decline in freelance jobs. This won’t generalize everywhere but it is a concrete sign.
Once again: Michael Strain explains the standard economic argument for why AI will not cause mass unemployment. He says we have nothing to worry about for decades, same as every other technological improvement. He says this because he believes that AI will be unable to replace all human workers for decades. In which case, if we expand from all to merely most (or some critical threshold), since AI would then also take most of the new jobs that would replace the old jobs, then sure. That is a disagreement about timelines and capabilities. One in which I believe Strain is highly overconfident. He also does not notice that AI may not remain merely another tool, and that in a world in which AI can do all human jobs, the economic profits and control might not remain with the humans.
Get Involved
Palisade is hiring researchers. Goal would be to find and warn of potential misuse.
Anthropic is hiring an experienced Corporate Communications Lead and a Head of Product Communications. Ensuring Anthropic’s communications lead communicates the right things, especially regarding existential risks and the need for the right actions to mitigate them, could be a pretty big game. Even if you think Anthropic is net negative, there could be a lot of room for improvement on the margin, if you are prepared to have a spine. Product communications is presumably less exciting, more standard corporate.
As always with Anthropic, sign of impact is non-obvious, and it is vital to use the process to gather information, and to make up your own mind about whether what you are considering doing would make things better. And, if you decide that it wouldn’t, or these are not people one can ethically work with, then you shouldn’t take the job. Same goes for any other position in AI, of course, although most are more clearly one way (helping core capabilities, do not want) or the other (at least not helping core capabilities).
Nat Friedman is looking to fund early stage startups building evals for AI capabilities. If this is or could be you, get in touch.
Topos Institute is taking applications for its 2024 Summer Research Associate program. Apply by February 1.
Google DeepMind expanding Experience AI, an introductory course on Rasberry Pi for 11-14 year olds to learn about foundational AI.
Introducing
DeepMind GraphCast for weather forecasting. Claims unprecedented 10-day accuracy in under a minute.
DeepMind releases Lyria, their most advanced music system to date, including tools they say will help artists in their creative process, and some experiment called Dream Track that lets select artists generate content in the styles of select consenting artists. SynthID will be used to watermark everything. I don’t see enough information to know if any of that is useful yet.
Gloves that translate sign language into sound. Now we need to do hard mode: Gloves that move your hands to translate your voice into sign language.
Claim that this is Whisper except 6x faster, 49% smaller and still 99% accurate (paper).
In Other AI News
OpenAI working on GPT-5. ‘Working’ not training, no timeline.
OpenAI says its six member board will ‘determine when it has reached AGI’ which it defines as ‘a highly autonomous system that outperforms humans at most economically valuable work.’ Any such system will be excluded from any IP deals and other commercial terms with Microsoft.
The discussion is framed as about money, would would reap the returns from AGI.
I say, if you have claim you have AGI, and you cannot stop Microsoft from getting control over it, then you are mistaken about having AGI.
Microsoft briefly banned ChatGPT access to employees over security concerns. It is not clear if this was related to the temporary surge in demand around feature upgrades, it was an error that got corrected, or if it was something else.
Nvidia continues to do its best to circumvent the China chip export restrictions. Nvidia seems to think this is a game. We set technical rules. They find ways around them. They are misaligned. More blunt measures seem necessary.
Nvidia is responding to incentives. It has made clear they think maximally capable chip distribution is good for its business. If we fail to provide sufficient incentives to get Nvidia on board with ‘not in China,’ that is America’s failure.
How big a deal is this? Opinions differ.
OpenAI attempting to poach top AI talent from Google with packages worth $10 million a year. This is actually cheap. If they wanted, those same people could announce a startup, say the words ‘foundation model’ and raise at least eight and more likely nine figures. Also, they are Worth It. Your move, Google. If you are smart, you will make it $20 million for the ones worth keeping.
I would also note that if they were to offer me $10 million a year plus options to work on Superalignment, I predict I would probably take it, because when you pay someone that much you actually invest in them and also listen to what they have to say.
Barack Obama pushes AI.gov.
fly51fly reports on a new paper claiming to conceptually unify existing algorithms for learning from user feedback, and providing improvement. I asked on Twitter and was told it is likely a real thing but not a big deal.
An attempted literary history of Science Fiction to help explain the origins of the libertarian streaks you see a lot in the AI game. I never truly buy this class of explanations or ways of structuring artistic history, but what do I know. Literature and in particular science fiction was definitely a heavy influence on me getting to a similar place. But then I noticed that the way almost all science fiction handles AI does not actually make logical sense, and the ‘hard SF’ genre suddenly is almost never living up to its claims, and this has unfortunate implications.
AI boom means startup boom.
I notice there are two different mechanisms here. Both seem right. I totally buy that AI is causing massively more entry into the tech startup space, which means that YC gets its pick of superior talent. I also buy that it means more real opportunity and thus higher chance of success. I would warn there of overshoot danger, as in Paul Graham’s statement that none of the AI startups in the last round were bogus. Is that even good? Is the correct rate of bogosity zero?
I also note that Paul Graham continues to talk about likelihood of success rather than expected value of success – he’s said several times he would rather be taking bigger swings that have lower probability, but he finds himself unable to do so.
Periodic reminder:
Quiet Speculations
Sam Altman says GPT-5’s abilities are unpredictable.
Greg Brockman gave a different answer to French president Emmanuel Macron.
Which is it? I presume this is spot on:
I would say that Altman’s statements here are helpful and clarifying, while Brockman’s are perhaps technically correct but unhelpful and misleading.
A unified statement is available as well: ‘We have metrics like log-loss in which so far model performance has been highly predictable, but we do not know how that will translate into practical capabilities.’
Anti Anti Trust
If those who do not want to regulate AI are sincere in their (in most other contexts highly reasonable and most correct) belief that regulations almost always make things worse, then they should be helping us push hard for the one place where what is needed badly in AI is actually deregulation. That is Anti-trust.
This good overview of AI and anti-trust mostly discusses the usual concerns about collusion over pricing power, or AI algorithms being used to effectively collude, or to make collusion easier to show and thus more blameworthy. As usual, the fear or threat is that anti trust would also be used to force AI labs to race against each other to make AGI as fast as possible, and punish labs that coordinated on a pause or on safety precautions. We need to create a very explicit and clear exemption from anti-trust laws for such actions.
The Quest for Sane Regulations
First, a point where those who want regulation, and those opposed to regulation, need to understand how the world works.
You do your best to fight for the best version you can, at first and over time. Or, if you are a corporation, for the version good for your business. If that isn’t better than nothing, and nothing is an option, you should prefer nothing. In this case, I do think it is better than nothing, and nothing is not an option.
However, there is the EU AI Act, where even more than usual key players are saying ‘challenge accepted.’ It’s on.
Even by EU standards, things went off the rails once again with the AI Act. Gary Marcus calls this a crisis point. The intended approach to foundation models was a tiered system putting the harshest obligations on the most capable models. That is obviously what one should do, even if one thinks (not so unreasonably) the response to GPT-4’s tier should be to do little or nothing.
Instead, it seems that local AI racers Mistral in France and Aleph Alpha in Germany have decided to lobby heavily against any restrictions on foundation models whatsoever.
I looked at the link detailing Mistral’s objections, looking for a concrete ‘this policy would be a problem because of requirement X.’ I could not find anything. They ask for the ‘same freedom as Americans’ because they are trying to ‘catch up to Americans.’ So they object to a law designed explicitly to target American firms more?
This echoes a bunch of talk in America. Those who insist no one can ever regulate AI ever cry that reporting requirements that apply only to Big Tech would permanently enshrine Big Tech’s dominance, so we have to not regulate AI at all. Or, in this case, not regulate the actually potentially dangerous models at all. Instead regulate everything else.
The ‘risk-based approach’ would be endangered? Wow, not even hiding it. It would jeopardize innovation to treat larger models, or more capable models, differently from other models? I mean, if you want to ‘innovate’ in the sense of ‘train larger models with zero safety precautions and while telling no one anything’ then, yeah, I guess?
So a company, Mistral, with literally 20 employees and a highly mediocre tiny open source model, looking to blitz-scale in hopes of ‘catching up,’ is going to be allowed to sink the entire EU AI Act, then? A company that lacks the capability to comply with even intentionally lightweight oversight should have its ability to train frontier models anyway prioritized over any controls at all?
The good news is this is the EU. They have so many other regulatory and legal barriers to actually creating anything competitive that I have little fear of their would-be ‘national champions’ exerting meaningful pressure on the leading labs under normal circumstances. And if Mistral is so ill-equipped that it cannot meet even an intentionally vastly lighter burden, then how was it going to train any models worth worrying about?
But over time, if a pause becomes necessary, over time, this could be an issue.
Or, even worse, perhaps the EU AI Act might drop only this part, and thus not regulate the one actually dangerous thing at all, in the explicit hopes of creating a more multi-polar race situation, while also plausibly knee-capping the rest of AI in the EU to ensure no one enjoys the benefits?
Indeed there have long been forces in the EU explicitly pushing for the opposite of sensible policy, something completely bonkers insane – to outright exempt the actually dangerous models from regulation. All the harm, none of the benefits!
A trip through memory lane, all of this has happened before:
This is where ‘regulate applications not models’ gets you, where the things that are dangerous are unregulated and the thing that might kill everyone is completely unregulated. One could say this was far less insane back in 2022, and a little bit sure you could have a different threat model somehow, but actually no, it was still rather completely insane. Luckily they reversed course, but we should worry about them going this way again.
Yann LeCun, of course, frames this as open source AI versus proprietary AI, because you say (well, anything at all) and he says open source. Except this is exactly the regulatory move OpenAI, Microsoft and Google actively lobbied against last time. The source here claims Big Tech is once again largely on the side of not regulating foundation models.
Because of course they are. Big Tech are the ones training the foundation models.
It is a great magician’s trick. There is a regulation that will hit Big Tech. Big Tech lobbies against it behind the scenes, and also allows its allies to claim they are using it to fight Big Tech. Including Meta and IBM, who are pretending not to be Big Tech themselves, because they claim that what matters is open source.
But once again, none of this mentions the words ‘open source.’ Those who want open source understand that open source means zero safety precautions since any such precautions can be quickly stripped away, and no restrictions on applications or who has access. Because, again, that is what open source means. That is the whole point. That no one can tell you what to do with it.
So they treat any attempt to impose safety precautions, to have rules against anything at all or requiring the passing of any tests or the impositions of any limits on use or distribution, as an attack on open source.
Because open source AI can never satisfy any such request.
So it is a Baptists and bootleggers in reverse. The ‘true believers’ in open source who think it is more important than any risks AI might pose are the anti-Baptists, while the Big Tech anti-bootleggers go wild.
The later parts of the Euractiv post say that if this isn’t handled quickly, the whole AI Act falls apart because no one would have the incentive to do anything until mid-2024’s elections. So it does sound like the alternative is simply no AI Act at all. Luca Bertuzzi confirms this, if they can’t iron this out the whole act likely dies.
The whole ‘do not regulate at the model level’ idea, often phrased as ‘regulate applications,’ is madness, especially if open source models are permitted. The model is and implies the application. Even for closed source we have no idea how to defend against adversarial attacks even on current systems, let alone solve the alignment problem where it counts. And when it matters most, if we mess up, what we intended as ‘application’ may not much matter.
If the rules allow anyone who wants to, to train any model they want, and AI abilities do not plateau? Whether or not you pass some regulation saying not to use that model for certain purposes?
The world ends.
EU, you had one job. This was your moment. Instead, this is where you draw the line on ‘overregulation’ and encouraging a (Mistral’s term) ‘risk-based approach’?
Not regulating foundation models, while regulating other parts of AI, would be the worst possible outcome. The EU would sacrifice mundane utility, and in exchange get less than no safety, as AI focused on exactly the most dangerous thing and method in order to escape regulatory burdens. If that it the only option, I would not give in to it, and instead accept that the AI Act is for now dead.
Alas, almost all talk is about which corporations are behind which lobbying efforts or would profit from which rules, instead of asking what would be good for humans.
Bostrom Goes Unheard
Nick Bostrom goes on the podcast Unheard, thoughtful and nuanced throughout, disregard the title. First 80% is Bostrom explaining AI situation and warning of existential risk. The last 20% includes Bostrom noting that there is small risk we might overshoot and never build AI, which would be tragic. So accelerationists responded how you would expect. I wrote a LessWrong-only post breaking it all down in the hopes of working towards more nuance, to contrast discussion styles in the hopes of encouraging better ones, and as a reference point to link to in further discussions.
The Week in Audio
Barack Obama on Decoder, partly about AI, partly about constitutional law. I love Obama’s precision.
Also his grounding and reasonableness. As he notes, every pioneer of new tech in history has warned that any restrictions of any kind would kill their industry, yet here many of them are anyway. Yet in all this reasonableness, Obama repeats the failure of hiss presidency, missing the important stakes, the same way he calls himself a free speech absolutist yet failed as President to stand for civil liberties.
Here he continues to not notice the existential risk, or even the potential for AIs much stronger than current ones, or ask what the future might actually look like as the world transforms. Instead he looks at very real smaller changes and their proximate mundane harms from, essentially, diffusion of current abilities.
Obama is effectively a skeptic on AI capabilities, yet even as a skeptic notices the importance of the issue and (some of) how fast the world will be changing. Even thinking too small, such as pondering AI taking over even a modest portion of jobs, rightfully has his attention.
Someone Picked Up the Phone
China and United States to make a clear win-win deal.
Is this the central threat model? Will it be sufficient? Absolutely not.
Is it good to first agree on the most obvious things, that hopefully even the most accelerationist among us can affirm? Yes.
I hope we can all agree that ‘human in the loop’ on all nuclear commands and autonomous weaponry would be an excellent idea.
Jeffrey Lewis expresses skepticism that this will have any teeth. Even an aspirational statement of concern beats nothing at all, because it lays groundwork. We will take what we can get.
Also note that yes, Biden is taking AI seriously, this is no one-off deal, and yes we are now exploring talking to the Chinese:
Mission Impossible
The most positive perspective on the movie yet?
This seems like a solid take, especially if one of the galaxy brain interpretations of The Entity proves true in Part 2.
There’s a lot of nonsense, but what matters is that the movie shows the heroes and key others understanding the stakes and risks that matter and responding reasonably, and it also shows those with power acting completely irresponsibly and unreasonably and getting punished for it super hard in exactly the way they deserve.
Rhetorical Innovation
Pause AI once again calls upon everyone to say what they actually believe, and stop with the self-censorship in order to appear reasonable and set achievable goals. What use are reasonable-sounding achievable goals if they don’t keep you alive? Are you sure you are not making the problem worse rather than easier? At a minimum, I strongly oppose any throwing of shade on those who do advocate for stronger measures.
Old comment perhaps worth revisiting: Havequick points out that if people knew what was going on with AI development, it would likely cause a backlash.
Andrew Critch responds to the Executive Order with another attempt to point out that the only alternative to an authoritarian lockdown on AI in the future is to find a less heavy handed approach that addresses potential extinction risks and other catastrophic threats. If the technology gets to the point where the only way to control it is an authoritarian lockdown, either we will get an authoritarian lockdown, or the technology goes fully uncontrolled and at minimum any control over the future is lost. Most likely we would all die. Trying to pretend there is no risk in the room, and that governments will have the option not to respond, is at this point pure denialism.
If AI and ML continue to advance, there will either be compute governance, or there will be no governance of any kind. I realize there are those who would pick box B.
Washington Post covers the anti-any-regulation-of-any-kind-ever rhetoric of the accelerationist crowd, frames those voices as all of Silicon Valley outside of Big Tech.
Robert Wiblin proposes a 2×2:
Daniel Eth fires shots, counter proposes:
Does this cut reality at its joints? It is a reasonable attempt. If AI is merely a new consumer product, there are still harms to worry about and zero restrictions is not going to fly, but seems right to push ahead quickly. However, if AI will change the nature of life, the universe and everything, then we need to be careful.
If you want to open source everything, this is indeed the argument you need to make: That AI is and will remain merely another important new class of consumer product. That its potential is limited and will hit a wall before AGI and certainly before ASI.
I believe that is probably wrong. But if you made a convincing case for it, I would then think the open source position was reasonable, and would then come down to practical questions about particular misuse threat models.
Open Source AI is Insafe and Nothing Can Fix This
Civitai allows bounties to encourage creation of AI image models for particular purposes. Many are for individual people, mostly celebrities, mostly female, as one would expect. An influencer is quoted being terrified that a bounty was posted on her. 404 Media identified one bounty on an ordinary woman, which most users realized was super shady and passed on, but one did claim the bounty. None of this should come as a surprise.
You can get a given website to not do this. Indeed Civitai at least does ban explicit images that mimic a real individual, other rules could be added. One could make people do a bit of work to get what they want. What you cannot do is shut down anything enabled by open sourced AI models, in this case Stable Diffusion. Anyone can train a LoRa from 20 pictures. Combining that with a porn-enabled checkpoint means rule 34 is in play. There will be porn of it. That process will only get easier over time.
So of course a16z is investing. Very on brand. Seems like a good investment.
Open source means anyone can do whatever they want, and no one can stop them. Most of the time, with most things and for most purposes, that is great. But of course there are obvious exceptions. In the case of picture generation, yes, ‘non-consensual AI porn’ is going to be a lot of what people want, CivitAI does not have any good options here.
I also do not know of anyone providing a viable middle ground. Where is the (closed source) image model where I can easily ask for an AI picture of a celebrity, I can also easily ask for an AI picture of a naked person, but it won’t give me a picture of a naked celebrity?
It is up to us to decide when and whether the price of the exceptions gets too high.
Remember Galactica? Meta put out a model trained on scientific literature, everyone savaged it as wildly irresponsible, it was taken down after three days.
There was some talk about its history, after certain people referred to it as being ‘murdered.’
VentureBeat offers a retrospective.
Meta is still both saying their models are fully open source, and saying they are not ‘making it available to everyone.’ This is a joke. It took less than a day for Llama 1 to be available on torrent to anyone who wanted it. If Meta did not know that was going to happen, that’s even worse.
Ross Tyler, the first author on the Galactica paper, gives the TLDR from his perspective.
No safety precautions. Got it. Not that this was a crazy thing to do in context.
A model was made available with one intended use case. People completely ignored this, and used it as an all-purpose released product, including outside of its domain.
So was it taken down? It was no longer available through the official demo, but it has been available as an open source model for a year on HuggingFace.
Yeah, the demo decision was dumb, the whole thread is making that clear. Should Ryan be proud of what he created? Sounds like yes. But here once again is this idea that openness, which I agree is generally very good, can only ever be good.
Open source is forever. If you put it out there, you cannot withdraw it or take it back. You can only patch it if the user consents to it being patched. Your safety protocols will be removed.
So no, in that context, if things pose potential dangers, you do not get to say it is better to do something and regret, than never do it at all.
Aligning a Smarter Than Human Intelligence is Difficult
ARIA’s Suraj Bramhavar shares their first programme thesis, where they attempt to ‘unlock AI compute hardware at 1/1000th the cost.’ The hope is that this will be incompatible with transformers, differentially accelerating energy-based models, which have nicer properties. Accelerationists should note that this kind of massively accelerationist project tends to mostly be supported and initiated by the worried. The true accelerationist would be far more supportive. Why focus all your accelerating on transformers?
I missed this before, from October 5, paper: Fine-tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To! Costs are trivial, and not only for Llama-2 and open source. If allowed to use fine tuning, you can jailbreak GPT-3.5 with 10 examples at cost of less than $0.20. Even well-intentioned fine tuning can break safety incidentally. This suggests that any fine-tuning ability, even closed source, should by default be considered a release of the fully-unlocked, happy-to-cause-harm version of your model.
Will AIs fake alignment during training in order to get power, asks paper of that title by Joe Carlsmith of Open Philanthropy. Here is the abstract.
The paper is 127 pages, even for me that’s usually ‘I aint reading that but I am happy for you or sorry that happened’ territory. But we can at least look at the summary.
He classifies four types of deceptive AI:
The question feels rather begged here, or perhaps the wrong question is being asked?
The better question is, I would think, will the AI act during training as if it has goals that are not effectively some combination of ‘maximize the chance of positive feedback’ or ‘predict the next token’?
(For sticklers: Note the ‘as if.’ I am attempting to do some combination of simplifying and abstracting, rather than utterly failing to understand how any of this gradient descent or RLHF business works, I know models do not ‘maximize reward’ etc etc.)
If the answer is yes, that the AI is doing something functionally similar to picking up goals, then your goose is highly cooked. Of course you should then assume deceptive alignment within training, because that is how an AI would achieve almost any goal.
I do not need convincing that, given the creation of a goal-directed, situationally-aware model, it would begin scheming. Yes. Of course.
Joe on the other hand says he puts the probability of scheming here at 25%:
I am trying to come up with a reason this isn’t 99%? Why this is the hard step at all?
You say ‘scheming.’ I say ‘charting path through causal space to achieve goal.’
I also say ‘what almost every student does throughout their education’? They continuously figure out the teacher’s password and give it back to them?
The ‘aligned’ students are the ones who both guess the teacher’s password, and also do their best to learn the material, and also learn to embody ‘aligned’ values. Do they still ‘scheme’ as described here? Oh, they scheme. Indeed, when one of them refuses to scheme, it is seen as a bug, and we stamp it right out.
If a model does not ‘scheme’ in this spot, and you did not take any explicit precautions to not do so, I question your claim that it is above human level in intelligence. I mean, come on. What we are talking about here does not even rightfully rise to the level of scheming.
Most people I surveyed do not agree with my position, and 38% think the chance is under 10%, although 31% said it was more likely than not.
I think my main disagreement with Joe is that Joe says he is uncertain if this definition of ‘scheming’ is a convergently good strategy. Whereas I think of course it is. With others, it is harder to tell.
Or perhaps the part where it is done specifically ‘to seek power’ is tripping people up. My default is that the ‘scheming’ will take place by default, and that this will help seek power, but that this won’t obviously be the proximate cause of the scheming, there are other forces also pushing in the same direction.
Much of the reasoning that follows thus seems unnecessary, and I haven’t dug into it.
So what makes the model likely to de facto have goals in this context? That seems to have been covered in another paper of his previously. If I had more time I would dig in deeper.
Thoughts of others on the paper:
Davidad says yes, but that more efficient architectures offer less risk of this.
My response would be that for efficiency to stop this, you would need things to be tight enough that it wasn’t worthwhile to fulfill the preconditions? If it is worth being situationally aware and having goals, then I don’t see much further cost to be paid. If you can squeeze hard enough perhaps you can prevent awareness and goals.
Quintin Pope digs directly into later technical aspects I didn’t look into. This quote is highly abridged.
Again I see the main scheming cost contained within the premise requirements. What is so galaxy-brained about ‘when you are being tested, you ace the test, when you are not you act differently’? I admit I am not investing enough to follow Quintin’s technical claims here.
This feels like the right skepticism to me, in this particular setting? As I see it Nora is largely questioning the premise. Current techniques, she is saying (I think?) won’t result in sustained situational awareness or goal retention, even if you had them they would not ‘pay enough rent’ to stick around.
Those ways for us to not get into trouble here seem far more plausible to me. I do strongly disagree with a lot of what she says in the quoted thread, in particular:
I think (1) we already see massive naked-eye-obvious Goodhart problems with existing systems, although they are almost entirely annoying rather than dangerous right now (2) Goodhart will rapidly get much worse as capabilities advance, (3) early stopping stops being so early as you move up but also (4) this idea that Goodhart is a bug or mistake that happens e.g. when you screw up and overtrain, rather than what you asked the system to do, is a key confusion (or crux).
I would even say proto-deceptive alignment currently kind of already exists, if you understand what you are looking at.
Will an AI attempt to jailbreak another AI if it knows how and is sufficiently contextually aware, without having to be told to do so?
Yes, obviously. It is asked to maximize the score, so it maximizes the score.
Once again, there is no natural division ‘exploit’ versus ‘not exploit’ there is only what the LLM’s data set indicates to it is likely to work. Nothing ‘went wrong.’
Responsible Innovation Labs issues voluntary Response AI guidelines, signed by 35+VC firms representing hundreds of billions of dollars and 15+ companies, including Infection AI, SoftBank and Bain Capital.
Great idea. But is it real? Are the commitments meaningful?
If a company refuses the commitments, you can still invest in them. All you are promising is to make reasonable efforts, and consider this during diligence. In practice, this does not commit you to much of anything. For an LP there is another layer, you are encouraging your firm to take reasonable efforts to encourage.
OK, what is the ultimate goal here?
That is a mission statement.
All right. This at least is something. They are promising to disclose their safety evaluations (and therefore, implicitly, any lack thereof) and results of adversarial testing (ditto).
Commitment to do risk assessments, periodic auditing, adversarial testing and red teaming.
Most of this is corporate-responsibility-speak. As with all such talk, it sounds like what it is, told by a drone, full of sound and a specific type of fury, signifying nothing.
However there are indeed a few clauses here that commit one to something. In particular, they are promising to do risk assessments, audits, adversarial testing and red teaming, and to release the results of any safety evaluations and adversarial testing that they do.
Is that sufficient? For a company like OpenAI or Google, or for Inflection AI, hell no.
For an ordinary company that is not creating frontier models, and is rightfully concerned with mundane harms, it is not clear what the statement is meant to accomplish beyond good publicity. A company should already not want to go around doing a bunch of harm, that tends to be bad for business. It is still somewhat more real compared to many similar commitment statements, so go for it, I say, until something better is available.
One possibility for a company that does not want to do its own homework, is to commit yourself to following the guidelines in Anthropic’s current and future RSPs and other similar policies (I’ve pushed the complete RSP post out a bit, but it is coming soon). That then works to align incentives.
People Are Worried About AI Killing Everyone
Lina Khan, chair of the FTC, says on Hard Fork she puts her p(doom) at 15% (podcast).
I think p(doom) of 15% is indeed a reasonable optimistic position to have. But it is important to note that it is indeed optimistic, and that it still would mean we are mostly about to play Russian Roulette with humanity (1/6 or 16.7% chance).
In practice, I do not believe the number of loaded chambers should much change the decisions we make here, so long as it is not zero or six. One is sufficient and highly similar to five.
A trade offer has arrived – we agree to act how a sane civilization would act if p(doom) was 15%, and we also agree that we are all being optimistic.
I also agree with Holly here about the definition of ‘optimist.’
Yuval Noah Harari, author of Sapiens, makes case in The Guardian for worrying about existential risk from AI. I don’t think this landed.
Other People Are Not As Worried About AI Killing Everyone
Bindu Reddy goes full stocastic parrot. Never go full stocastic parrot.
She cites various failures of GPT-4.
Her opening and conclusion:
From this and other things she says, I don’t think she would support regulating AI under almost any circumstances. Gary Marcus, quoting her, would under almost all of them.
Two sides of the coin. I say that being AGI and greater capabilities would indeed make regulations more necessary. And indeed, the whole point is to regulate the development of future more capable models.
There are some who claim GPT-4 is an AGI, but not many, and I think such claims are clearly wrong. At the same time, claims like Bindu’s sell current models short, and I strongly believe they imply high overconfidence about future abilities.
Transition to the next section:
Alex Kantrowitz, paywalled, says ‘They might’ve arrived at their AI extinction risk conclusions in good faith, but AI Doomers are being exploited by others with different intentions.’ Seems to be another case of ‘some call for X for reasons Y, but there are those who might economically benefit from X, therefore ~X.’
Timothy Lee is not worried. He thinks AI capabilities will hit a wall as there is only so much meaningfully distinct data out there, and that even if AI becomes more intelligent than humans that it will be a modest difference and intelligence is not so valuable, everything is a copy of something and so on. I was going to say this was the whole Hayek thing again, saying the local man will always know better than your, like, system, man, but then I realized he confirms this explicitly and by name. ‘Having the right knowledge matters.’ Sigh.
The Lighter Side
The assistants saga continues, what a twist.
Don’t become an ex-parrot.