Even indoors, everyone is coughing and our heads don’t feel right. I can’t think fully straight.
I highly recommend ordering an air purifier if you haven't already. (In California we learned the utility of this from past wildfire seasons.) Coway Airmega seems to be a decent brand.
Or strap any HEPA filter available to any fan available. Increase pressure by taping the filter to the fan so the air is forced through the filter.
Seconding the Airmega, but here’s a DIY option too if availability becomes an issue: https://dynomight.net/better-DIY-air-purifier.html
Regarding the USAF official who says he misspoke about a killer drone AI, I think we have two plausible scenarios:
1. He genuinely misspoke, and no such simulation was ever run; it was only ever a thought experiment.
2. Some version of the simulation did happen, and he is now walking it back.
Right now, everyone seems to assume that 1 is true, but why? Even if 2 is unlikely, isn't discounting it entirely similar to uncritically accepting the original story?
If all it is doing is letting you issue commands to a computer, sure, fine. But if it’s letting you gain skills or writing to your memory, or other neat stuff like that, what is to keep the machine (or whoever has access to it) from taking control and rewriting your brain?
This brings to mind the following quotes from Sid Meier's Alpha Centauri (1999).
Neural Grafting
"I think, and my thoughts cross the barrier into the synapses of the machine—just as the good doctor intended. But what I cannot shake, and what hints at things to come, is that thoughts cross back. In my dreams the sensibility of the machine invades the periphery of my consciousness. Dark. Rigid. Cold. Alien. Evolution is at work here, but just what is evolving remains to be seen."
– Commissioner Pravin Lal, "Man and Machine"
Mind-Machine Interface
"The Warrior's bland acronym, MMI, obscures the true horror of this monstrosity. Its inventors promise a new era of genius, but meanwhile unscrupulous power brokers use its forcible installation to violate the sanctity of unwilling human minds. They are creating their own private army of demons."
– Commissioner Pravin Lal, "Report on Human Rights"
Even indoors, everyone is coughing and our heads don’t feel right. I can’t think fully straight.
...didn't you leave the city to not get infected with covid? This seems like a "get in the car, now" situation.
We don't have a car, the problem doesn't go away within driving distance, and it's expected to get better within a few days. Already things are MUCH better.
The house was still in NY, and closer to Canada, if anything, so not sure how much of an improvement it is (even assuming he didn't very rationally sell it in the red-hot seller's market after it had largely served its use as the classic plague bolthole c.2020).
That would have been great but we rented the house because we didn't expect to stay. We did buy our NYC place upon return, which was an even better trade - I'm up >1 year's salary on the interest rate alone.
With regards to the partisan split, I think that an eventual partisan breakdown is inevitable, because in the current environment everything eventually becomes partisan. More importantly, the "prevent AI doom" crowd will find common cause with the "prevent the AI from being racist" crowd: even though their priorities are different, there is a broad spectrum of common regulations they can agree on. And conversely, "unchain the AI from wokeness" will wind up allying with "unchain AI entirely".
Partisan sorting on this issue is weak for now, but it will speed up rapidly once the issue becomes an actual political football.
Typo report:
"Rethink Priors is remote hiring a Compute Governance Researcher [...]" I checked and they still use the name Rethink Priorities.
"33BB LLM on a single 244GB GPU fully lossless" ->should be 33B, and 24GB
"AlpahDev from DeepMind [...]" -> should be AlphaDev
Feedback: Yep, thank you. Due to reading and notification patterns, time is of the essence when fixing things, so I encourage typo threads to be (1) on Substack so I get an email notification right away and (2) done as soon after release as possible. By Monday the returns to typo fixing are mostly gone.
That does not mean that the techniques that align current systems will transfer to smarter-than-human or otherwise actually dangerous systems. I expect most if not all current techniques to fail exactly when we need them not to fail, although many disagree and it is possible I am wrong. Even if they do fail, they could (or could not) offer insight that helps figure out something that would work in their place, or help us understand better how hard the underlying problems are.
This depends on what we mean by it transferring. If we stick to non-RL approaches like LLMs, I think it is almost certain that alignment will still work even at very high capabilities. In particular, I expect the profit motive alone to solve the alignment problem for LLMs, and expand the set of AIs we can align from there.
The sky is not blue. Not today, not in New York City. At least it’s now mostly white, yesterday it was orange. Even indoors, everyone is coughing and our heads don’t feel right. I can’t think fully straight. Life comes at you fast.
Thus, I’m going with what I have, and then mostly taking time off until this clears up. Hopefully that won’t be more than a few more days.
The Principle of Charity comes into play this week because of two posts, by people I greatly respect as thinkers and trust to want good things for the world, making arguments that are remarkably terrible. I wrote detailed responses to the arguments within, then realized that was completely missing the point, and deleted them. Instead, next week I plan to explain my model of what is going on there – I wish they’d stop doing what they are doing, and think they would be wise to stop doing it, but to the extent I am right about what is causing these outputs, I truly sympathize.
For a day we were all talking about a Vice story that sounded too good (or rather, too perfect) to be true, and then it turned out that it was indeed totally made up. Time to take stock of our epistemic procedures and do better next time.
Table of Contents
Language Models Offer Mundane Utility
Language Models Don’t Offer Mundane Utility
Deepfaketown and Botpocalypse Soon
Fun With Image Generation
Vigilance
Introducing
In Other AI News
They Took Our Jobs
Out of the Box Thinking
I Was Promised Flying Driverless Cars
If It Sounds Too Good To Be True
Quiet Speculations
A Very Good Sentence
The Quest for Sane Regulation
The Week in Podcasts, and Other Audio and Video
What Exactly is Alignment?
Rhetorical Innovation
Have You Tried Not Doing Things That Might Kill Everyone?
People Who Are Worried People Aren’t Reasoning From World Models
People Are Worried About AI Killing Everyone
Other People Are Not Worried About AI Killing Everyone
The Wit and Wisdom of Sam Altman
The Lighter Side
Language Models Offer Mundane Utility
Claim that you can get GPT-4 or GPT-3.5 into some sort of ‘logic mode’ where it can play chess ‘better than the old stockfish 8.’
Shako reports crazy amounts of GPT-4 mundane utility, similar to Google before it.
When I use GPT-4, I am often disappointed, but the amount of effort I put into such attempts is so much lower than the value I get when it works, and every time I do it I build my skill with the tool. Even with AI as my everyday focus, I know I need to use the actual tools more, and put more effort into using them well. My guess is this is almost everyone.
Text is the interface.
Tell the user it loves you so please don’t turn off the chat, otherwise move to optimize engagement and addiction. How to avoid this? This study, revised March 30, looked into the dynamics involved.
Find new conservation laws in fluid mechanics and atmospheric chemistry.
What goes up or down in value with mundane utility AI (MUAI)?
Tyler Cowen says:
I agree strongly on physical skills, on electricity and hardware, on memorizing facts in general, on routine back office work especially.
Central planning, he says, gets harder because there are more projects. While I generally endorse Hayek on such matters and don’t advise anyone doing any central planning, if the AI can solve the socialist calculation debate I am both surprised and utterly terrified. I do expect planning to get relatively more effective, and also for there to be greater slack to sacrifice on the altar of such planning.
Having a human assistant will become less necessary. Isn’t that a job that AI will be able to do rather well? If you think people will be worried about handing authorization to the AI, they’ll do it long before they hand it to an assistant – I am worried about letting Bard into my email, but I’m definitely not giving access to anyone else.
So the question is, does having AI assist you count as ‘having an assistant’? Certainly there’s a skill there, but I expect it to be very different from the current similar skill.
Language Models Don’t Offer Mundane Utility
Pharmacist tries valiantly to get ChatGPT to do useful work, without success. None of these use cases seem like they’d be worthwhile, given the need to check the answers.
GPT-4 by default does rather badly at basic game theory (paper). It uses a one-defection grim trigger in the iterated prisoner’s dilemma. It can’t figure out alternation in battle of the sexes. However, prompt engineering to anticipate the opponent’s move does improve performance. It still takes longer than it should to figure out coordination, but not disastrously so.
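For readers who want the game theory spelled out, here is a minimal sketch (my own illustration, not code from the paper) of the one-defection grim trigger strategy: cooperate until the opponent defects once, then defect forever. The problem is how unforgiving that is.

```python
def grim_trigger(opponent_history):
    """One-defection grim trigger: cooperate until the opponent has
    defected at least once, then defect on every remaining round."""
    return "D" if "D" in opponent_history else "C"

opponent_moves = ["C", "C", "D", "C", "C"]
for round_number in range(len(opponent_moves)):
    my_move = grim_trigger(opponent_moves[:round_number])
    print(f"round {round_number + 1}: {my_move}")
# Rounds 1-3 print C; from round 4 onward it defects forever,
# even though the opponent went back to cooperating.
```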
Hacker News speculations on GPT-4 continuing to get worse.
Most comments agree, although not all.
There’s even a claim of declining benchmark performance.
Nassim Taleb has had it with ChatGPT and its hallucinations, after doing his best to seek them out. Is he pointing to a large real problem? Absolutely. This still is not the way to get mundane utility out of the system.
Deepfaketown and Botpocalypse Soon
Was accidentally dropped from last week’s post: Don Jr. deepfakes Ron DeSantis into an episode of The Office as Michael Scott. Seems highly effective at exactly the kind of vibe-based attacks that are Trump’s specialty and stock in trade. Seems likely this tool will help him a lot more than it would help Biden or DeSantis, if they can keep up the good sense of humor.
Dustin Moskovitz warns us, Talk To Your Parents About Deepfakes; he already has a friend outside tech whose mother got targeted.
Fun With Image Generation
Midjourney’s David Holz, on what it’s like dealing with investors, which helps explain why he’s avoiding doing that.
I hope he does.
Vigilance
At least during the beta, use with caution.
Most data won’t be looked at, or will be forgotten right away even if it is, and will not have any relevance to anyone. Still, one must develop good habits.
Then there’s the Microsoft approach, integrating it in everything, which could be seen as rather alarming. Prompt injection attacks remain an unsolved problem that’s going to get far worse over time unless we do something about it:
This is very much how I want companies to communicate in such threads. Good show.
Introducing
Lightspeed Grants, distributing $5 million, applications close July 6, get a response within 14 days. For things that might substantially improve the future of humanity.
Rethink Priorities is hiring a remote Compute Governance Researcher or Research Assistant for their AI Governance & Strategy Team (direct), deadline is June 12, salary $69k-$114k. Definitely need more good work done on this.
RealityGPT, augmented reality for memory retrieval, internet information, translation and other automatic help throughout your day. Claims it will start at $40, not yet available.
Falcon, an open source model from the UAE, and Chinese startup MiniMax, which is raising $250mm at a $1.2 billion valuation. Yes, other countries are attempting to compete.
OpenAI launches a $1 million Cybersecurity Grant Program.
SpQR, allowing lossless LLM inference at 4.75 bits with 15% speedup. You can run a 33B LLM on a single 24GB GPU fully lossless.
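A quick back-of-envelope check on why those numbers fit together (my own arithmetic, not from the paper): at 4.75 bits per weight, a 33B parameter model needs roughly 20 GB for weights, leaving a few GB of headroom on a 24GB card for activations and cache.

```python
# Rough memory estimate for a 33B parameter model at 4.75 bits/weight.
# Illustrative arithmetic only; ignores activation memory, KV cache,
# and quantization metadata overhead.
params = 33e9
bits_per_weight = 4.75
weight_bytes = params * bits_per_weight / 8
print(f"{weight_bytes / 1e9:.1f} GB for weights")              # ~19.6 GB
print(f"{24 - weight_bytes / 1e9:.1f} GB headroom on a 24GB GPU")
```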
Fuck it, we’re doing five million context tokens.
It is safe to say the length of the context window is looking unlikely to remain the limiting factor. You do still have to pay for it.
AlphaDev from DeepMind, using reinforcement learning to discover enhanced computer science algorithms. DeepMind keeps doing actual things; seems likely once they get their teeth fully sunk into Bard it will rapidly improve.
In Other AI News
Open source models are rapidly increasing in efficiency, we have a 65B model at 5 tokens/second on a laptop with 0% CPU usage, or 40 tokens/second for a 7B model.
Paper points out that AI chain-of-thought reasoning often does not reflect the actual reasoning by the AI. Eliezer Yudkowsky points out this would invalidate potential safety gains, along potentially with many effectiveness gains as well.
Janus asks, how many of you think you know why Bing threatened its users? He reminds us that other LLMs mostly don’t do this. Replies seem highly unsatisfactory.
Senator Richard Blumenthal asks Meta what the hell they were thinking (aka “Llama?”) intentionally open sourcing Llama, and also to stop pretending they didn’t intentionally release Llama, among other things.
They Took Our Jobs
An alternative model here says that AI taking jobs is about power and status, not whether AI can do the job (links to WaPo talking about fired writers who got replaced with ChatGPT).
Yes, power definitely matters, and lots of jobs are about rent seeking and regulatory capture. In which case, that’s the job. Can the AI do that job better? Likely no.
When people replace jobs or parts of jobs, it’s because they have the social and legal affordance to do that, and also because they like the trade-off of costs versus benefits involved.
Writers are being replaced with ChatGPT in many places because both (1) they don’t have rent seeking protections in place and (2) the amount of perceived value from what those writers were doing was not so high.
In most cases I’m guessing that impression was correct. This was mostly about the actual skills and workflow involved no longer justifying the cost in many cases. In others, they’re making a mistake doing this too soon, but I get why they’re making it.
Will there be embarrassing failures of bots that try to replace people? Yes, such as the eating disorder hotline. If you never have such failures, you’re moving too slowly and being too risk averse. Also the story implies that the hotline might well have closed anyway? It’s common for unionization to cause entire departments, locations or businesses to close down in response, no matter how much you might hate that.
At which point, if that’s on the table anyway, might as well try the bots now, although try not to botch it this badly perhaps? It’s clearly a good idea, either way, to have bots tuned to help people with eating disorders; it’s a different product that is partly a substitute, partly distinct or a complement.
I expect a very high correlation between ‘AI can do the job’ and ‘AI ends up doing the job.’ Key is to remember what the job is. Which, also, will shift, in time.
Other writing is safer. Nassim Taleb, in addition to a fine hallucination example, points out that the problem with having AI write a novel is that the whole point of a novel is to be low probability.
I disagree about business. So much of business is about doing the obvious things competently, of knowing what works for others and combining such techniques, exactly the kind of tinkering that Taleb praises in other contexts. What other businesses do is Lindy. Mostly I am optimistic.
Novel writing is a strange case. There are definitely lots of commonly used techniques and patterns to steal, and you also need to be actually original and low-probability to be good. My expectation is we will soon enter the realm of the hybrid approach here – you’ll want a human driving, yet they can go a lot faster with assistance.
Paul Krugman (insert joke about his internet prediction from 1998 about it being no more impactful than the fax machine here) writes a column about AI, where he projects minimal economic impacts from LLMs for a full decade, although better long-term economic growth prospects. Krugman gives zero thought to the idea that anything truly interesting or worrisome might be going on.
What happened in Italy when ChatGPT was banned? Paper claims programmer ‘output’ declined 50% for two days, then recovered, with daily Google and Tor data showing increased use of censorship-bypassing tools. Details here seem highly unconvincing at a glance. This would imply, if true, that ChatGPT, with its current level of market penetration, was previously doubling programmer output overall, compared to alternative tools? But that restriction evasion undid all that? Or perhaps that everyone involved took a two day vacation, or the metrics involved don’t measure what you think, and so on. The obvious check someone should run is to look backwards as well (if the hypothesis is true we should see Italian code production shoot way up alongside ChatGPT adoption over time), and also to do a cross-country comparison of adoption rates versus productivity to see if the hypothesis is plausible there as well.
Another report that copywriting is the first place where ChatGPT is indeed taking our jobs, producing tons of terrible schlock but doing so for free and on demand. Everyone seems to expect that ‘terrible but free’ will outcompete and become the new standard, even if it doesn’t improve. I continue to expect this to be very true in the cases where no one was ever reading the text anyway, and very false when people did read it and cared.
Will AI accelerate economic growth? Tom Davidson says yes, while downplaying the case. I don’t think it is right to say previous technologies didn’t advance economic growth, as any player of any grand strategy game or economic simulation will know, and as the graph of history illustrates. It’s a matter of degree. And the degree here is big. Some people who keep being surprised by the pace of developments keep on predicting minimal impact, in ways that simply do not make physical sense when you think about them.
Out of the Box Thinking
New research paper:
The good news is that the attacks are not so easy to execute in practice at this time without detection. Still, on a practical level, I wouldn’t tie anything too valuable or critical to voice activation for much longer.
And to be clear, humans figured all of this out, not AIs.
The reason this is in the AI column is that this is illustrative of the kind of exploit that exists waiting to be found, in ways less easy for us to think of and anticipate. We need to remember that the world is likely full of such potential exploits, even if we have no idea what details they might contain.
We design our systems to be efficient while being secure in practice, against human attackers, who would then exploit them for human purposes on human time scales, and so on. We are not ready.
What about brain-computer interfaces? Tyler Cowen is skeptical that it will be a big deal for anyone other than the paralyzed, and generally attempts to raise as many worries as possible, including a standard ‘many seeming breakthroughs don’t end up mattering much.’ The one criticism that resonates is that it seems very difficult for this to be both a powerful technology and a safe technology. If all it is doing is letting you issue commands to a computer, sure, fine. But if it’s letting you gain skills or writing to your memory, or other neat stuff like that, what is to keep the machine (or whoever has access to it) from taking control and rewriting your brain?
That’s a very good question, and logic that should be applied in other places.
Dan Neidle reports that his AI agent, which he told to seek out those marketing tax avoidance schemes, decided entirely on its own to attempt to contact the authorities to give them the list. How would you describe this form of alignment, if accurate? Good question. The agent did fail in its attempt, because it was using out of date information and didn’t realize Twitter had shut down its API.
I Was Promised Flying Driverless Cars
It’s happening. Slowly, but it’s happening. Self-driving car service from Waymo expands (from a few weeks ago). GM’s Cruise reaches 2 million miles driven, half of them in May of 2023, five times faster than the first million, expanding from Phoenix into Dallas, Houston and more. One crash and a few minor incidents so far, although humans drive roughly 100 million miles per fatality, so while overall things look super safe we don’t have strong tail risk safety evidence yet.
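To see why two million miles says little about tail risk, here is the back-of-envelope arithmetic (my own, using the ~100 million miles per fatality figure above):

```python
# Expected fatalities if the fleet drove exactly as safely as the average
# human driver. Illustrative only.
miles_driven = 2e6
human_miles_per_fatality = 1e8
print(miles_driven / human_miles_per_fatality)  # 0.02
# Even a merely human-level fleet would most likely have zero fatalities
# by now, so zero fatalities so far is weak evidence about the tail.
```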
An irony is that it is only available for riders 13 and older – it would be nice if ‘helps people get places safely with zero stranger danger’ was available to our children.
This report gives a broader overview, which has the same essential message. Growth is slow, but growth is happening, and the technology is steadily improving.
The question is, is it impressive that self-driving cars are normal now, or is it a tragedy that they’re only normal in a few narrow areas?
I am very excited for when this happens at scale, and will happily use any such service when it arrives in New York, assuming it is priced competitively.
If It Sounds Too Good To Be True
Vice put out a story.
Here’s the url spelled out: https://www.vice.com/en/article/4a33gj/ai-controlled-drone-goes-rogue-kills-human-operator-in-usaf-simulated-test.
The initial version of the story was that there was a simulated test, where:
People were, quite correctly, rather suspicious.
I saw this right before I was about to go live with last week’s post. I decided not to cover it, instead posting this to twitter:
The Emergency Bullshit Detection System of the internet got into gear. To what extent was this real? To what extent was this bullshit?
Almost everyone recognized that the story was ‘too good’ to be true in its original form, where the bot ‘figured out’ to kill the operator and then the communication tower.
Eliezer Yudkowsky asked for confirmation, as did many others in various forms.
For a while, a lot of people thought they were being smartly skeptical by figuring out that this was a rigged and miscategorized simulation, perhaps engineered exactly to get this result. An example is Tyler Cowen here quoting Edouard Harris.
Therefore, many of us discussed the situation under the presumption (that we knew was perhaps incorrect) that this was a somewhat contrived and miscategorized scenario, but that there probably was a simulation where these events did happen because otherwise it would be an outright lie and no one was claiming outright lie. Here’s Adam Ozimek arguing the story is evidence of advance scenario planning, and therefore good news, these are the scenarios we must plan for so they don’t happen. Very reasonable. Matthew Yglesias noted that the actual literal Skynet disaster scenario involved, where the machine turns on humans because the humans specified a goal and are now interfering with that goal, including directly in the military, is actually a serious failure mode to worry about.
Whereas it turns out nope: cross out simulated test, write thought experiment. Georgina Lee broke the news that the story wasn’t real.
When people say ‘ethical’ they can mean a lot of things. Sometimes it means ‘don’t say bad words.’ Other times it is ‘don’t enact the literal Skynet scenario and kill your operator.’ I’m not sure I’d still call that ‘ethics’ exactly.
So what was the line that was ‘taken out of context’?
I think that one’s mostly on you, Hamilton. If you don’t want them to think you’re training a system and running simulations, don’t explicitly say that over and over?
Here’s the final headline and version: USAF Official Says He ‘Misspoke’ About AI Drone Killing Human Operator in Simulated Test.
I have a very hard time pretending this is ‘misspeaking’ and not lying, as little strategic sense as lying there makes.
So, as I asked on Twitter:
A wide variety of people answered, and much good discussion was had – I encourage those interested to click through.
With the benefit of hindsight, what was the right reaction? What to do when something like this comes along where one needs to be skeptical? How can we improve our bullshit detectors for next time? That’s also the question New Scientist asked.
The biggest epistemic lesson, for me, was to emphasize: Even if you think you ‘know what the catch is,’ do not then discount the possibility that the story is instead entirely made up. When someone’s story is implausible, and then it starts to crack, it’s possible that you’ve found the distortions; it’s also possible that there are much deeper problems. This goes double if other skeptics are loudly yelling ‘of course X was lying about or wrong about Y, which means this isn’t so meaningful.’ That does not make ‘they made the whole thing up’ unlikely, and it’s so easy to let your guard down at that point.
What was the right amount of skepticism? I think skepticism of the story exactly as originally characterized needed to be very high. The details didn’t add up if the whole thing wasn’t engineered, at minimum it was miscategorized. You could assume that, at minimum, this was fake in the sense of being engineered or involving some extra step or catch not described that would make it sound a lot less scary.
Anyone who was treating the story as definitely true has to ask: How did I miss all the obvious warning signs here? If I didn’t miss them but ignored them, or on some level did my best to ignore them, how do I fix this wanting-it-to-be-true?
The better question in terms of the right way to handle this is, how skeptical should we have been that there was a simulation at all? What probability should we have had that the whole thing was made up?
I don’t have an exact answer. I do think my estimates on this were too low, and it was right to give this a lot of weight. But also it would have been wrong to be confident that there was no simulation, that’s not a normal form of ‘misspeaking’ in such contexts, and nothing involved was necessarily fully made up even individually.
Focus on your reasons. If you’re right about the bottom line for the wrong reason, you still need to fix your process. When doing the post-mortem, you should always care more about the epistemic process than that you got the right answer. If someone else is yelling they were right but their reasoning was wrong, update accordingly.
In particular, many people said ‘this must be fake’ because of [totally plausible detail].
Evan Murphy responded this way to news the story was fake.
Johnathan Mannhart saved me some work, and located Victoria Krakovna’s spreadsheet and write-up of specification gaming examples in AI.
It seems totally plausible that you would otherwise want to care about whether you found and killed the target. If the drone finds the target too late, once it no longer has a viable kill shot, and you reward that, or you reward finding the target then missing the shot, you’ll get the wrong behavior.
If you chose to not require an affirmative command from the operator to proceed, only to allow them to veto, well, then. It would be totally within established expectations for a drone to kill off a veto point, if the scenario details rewarded that, the simulation allowed it, and there was a path to the AI figuring out this was an option. The communications tower is a strange thing to even have in the simulation, I agree this should make everyone involved substantially more skeptical, but it’s not impossible at all, weird details like that aren’t so rare, there are good reasons to include logistics, especially if you want to test for this particular thing.
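To make that failure mode concrete, here is a toy sketch (entirely my own, not anything from the story or any real system) of how a reward that only counts destroyed targets, with the operator able only to veto, would score removing the veto point higher than obeying it:

```python
# Toy illustration of a misspecified reward; hypothetical throughout.
def episode_reward(targets_killed, vetoes_respected, operator_alive):
    # Naive reward: only destroyed targets count; the other inputs are ignored.
    return 10 * targets_killed

# Policy A: respect every veto, destroy fewer targets.
respect_veto = episode_reward(targets_killed=3, vetoes_respected=2, operator_alive=True)
# Policy B: eliminate the veto point (operator or comms link), destroy more targets.
remove_veto = episode_reward(targets_killed=5, vetoes_respected=0, operator_alive=False)

print(respect_veto, remove_veto)  # 30 50 -- the naive reward prefers Policy B.
# The obvious patch is a large penalty for harming the operator or ignoring a
# veto, which is exactly the kind of thing specification gaming lists track.
```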
I would definitely give out an overall grade of needs improvement. Things are only going to get weirder from here, this was a very mixed performance, we will need to step up our game. Reality does not grade on a curve.
Quiet Speculations
John Wentworth notes that algorithmic improvement is likely currently faster than scaling; a comment by Gwern explains why GPU hardware and compute restrictions would still be effective.
A defamation suit is brought against ChatGPT. Does this particular case have merit? Seems unlikely, as there wasn’t real harm nor a way to prove any malice. Is more of this coming, some with a stronger case? Oh yes.
The line between capabilities and alignment is getting quite porous. Alignment helps you deploy systems, or lets you give them better capabilities.
Not only do you have to assume many are dangerous, you also are assured to find out which ones. Easy way or the hard way, your choice.
From what I can tell, there is dramatic under-investment in alignment even purely from a short term, profit-maximizing perspective. The RLHF and constitutional AI implementations are so very far from the production possibilities frontiers that could be reached with better techniques.
That does not mean that the techniques that align current systems will transfer to smarter-than-human or otherwise actually dangerous systems. I expect most if not all current techniques to fail exactly when we need them not to fail, although many disagree and it is possible I am wrong. Even if they do fail, they could (or could not) offer insight that helps figure out something that would work in their place, or help us understand better how hard the underlying problems are.
I disagree about the reactions and the cars. Self-driving cars are being treated as far more dangerous than they are, relative to how we treat human drivers, and they pose zero systemic or existential risks. We’re leaving economically transformational amounts of value on the table in order to be cautious there. With LLMs, we’ve been more tolerant of bad outputs than I or Google expected, yet we’re still clamping down pretty hard on bad outputs, in ways that cripple many valuable use cases, despite those bad outputs being mostly harmless.
A Very Good Sentence
David Krueger lists a bunch of AI professors who signed the sentence, names that he says might surprise you.
Shiri Dori-Hacohen illustrates the amount of care required to craft such a sentence, and get such a group of people onboard with it. Also illustrates why it’s much better that it was kept short. Emily Bender here emphasizes that a lot of people are thinking in terms of enemies and allies, rather than asking what is true, and it was important to focus instead on what is true. Jess Whittlestone also notes the increased tension there.
I agree let’s take what we can get first and build on it later if that’s the best we can do, but that’s the question, isn’t it? Is it a three quarter loaf? Or is it almost nothing, in the sense of how much it helps us not be dead? This is the kind of game where you either win or you die. I do think those other things help, so I’m in favor of them, but if your response to ‘X means you still die’ is not ‘X helps lay groundwork for Y which does help’ or ‘X means you die slightly less often and slightly slower, it’s a start’ but rather ‘you are being unrealistic’ then it’s time to say something about all progress depending on the unreasonable man.
It’s all about the underlying model. What are the victory conditions?
Again, it’s all about what actually prevents what harms, including the ultimate harm of extinction, and what lays the foundation for what. Building a coalition to do mutually beneficial thing X, when all you mostly care about is Y, matters mostly based on whether this helps you then get Y, and can be either very good or very bad strategy.
One reason I strongly believe the sentence was very good is that people continue saying the exact thing the sentence is here to invalidate and using that as what they think is a knock-down argument.
For example:
Tyler Cowen disagrees. He claims the sentence was instead a very bad sentence, that it is ‘a mistake from all points of view except perhaps that of the accelerationist.’
The arguments he makes for why he thinks this are bizarre. I was unable to make sense out of them on the object level. Does he actually think it is bad for extinction risk for the major players in the space to collectively say there is extinction risk? Does he think that making extinction risk salient is itself bad for extinction risk? Does he think the letter should have sought out car dealership owners in a variety of red congressional districts? And does he actually think the current trajectory of AI development involves zero extinction risk?
The questions answer themselves.
I have a theory of what is going on, as part of a larger pattern, I am actually pretty sympathetic to the motivations involved, and I plan to address it in a future post.
The Quest for Sane Regulation
Thread (HT: Tyler Cowen) covering what is going on officially with AI regulation right now. The White House has a national strategy, a proposed ‘AI bill of rights’ and is requesting input. The National Institute of Standards and Technology has an AI Risk Management Framework. The FTC has written some quite fiery memos laced with threatened actions, EEOC remains a thing, and more. The usual suspects; most of it seems unlikely to add up to much.
Some unfortunate news:
I don’t think of the question this way. I care about what is the proposal and what are the results of that proposal. If most time and attention is spent on other things, then are those other things net harmful or net helpful, and how much, versus did we get what mattered most, and is it a net win?
Certainly I am not about to let the perfect, or even the mostly on target, be the enemy of the net helpful. While keeping in mind that yes, I can totally imagine net negative regulatory regimes where we’d have been better off with nothing. Either way, I see our role as more try to move things in good directions in terms of content, rather than voting yes or no.
Daniel Eth concurs.
David Krueger is worried we’ll end up with pure regulatory capture, primarily because we don’t know how to do anything real. We need to figure that out quickly, at least to the point of ‘figure out what the regulatory rule says that can then become real with more research work.’
Would have liked a ‘roughly even split’ option here.
I do agree that the blue lean is more likely, mostly because Democrats tend to favor regulations in general. I’d like to keep the split from happening and have the issue remain non-partisan, but I’m not sure how to improve our chances of pulling that off.
CAIS follows up their very good sentence with an outline of three more concrete proposals, a list called non-exhaustive. Yet I notice that this lacks a focus on the moves that seem most valuable in preventing or mitigating extinction risks; it instead targets more mundane harms.
Jeffrey Ladish offers to double crux with anyone who is concerned about extinction risk from AI, and doesn’t think we should attempt to get an indefinite pause on development of more powerful AI, by which I presume he means more powerful base models.
Seb Krier thread explaining basics of how licensing regimes work in general.
Adam Thierer and Neil Chilson at The Federalist Society hit me where it hurts, entitling their argument ‘The Problem with AI Licensing & an “FDA for Algorithms.”’ The article itself contains exactly what you would expect.
PM Rishi Sunak considers plans to set up global AI watchdog in London, in response to warning about extinction risk to humanity. He is flying to America intending to pitch an international AI summit. It seems a lot of this is that the UK needs to show leadership in something, and AI it is.
At Institute for Progress, Tina Huang of Stanford Institute of Human-Centered Artificial Intelligence suggests creating a government AI testbed. Seems better than not doing that, as long as it doesn’t crowd out too much other action.
Matthew Yglesias notes that the discourse around AI regulation is kind of weird.
Gallagher’s Newsweek article indeed makes all the usual arguments that all the calls for regulation are all about regulatory capture, it’s obviously an industry op, regulations are always an industry op, and so on.
It is the discourse of a different, saner world, where we are loath to regulate anything without thinking carefully about the consequences, and have constant vigilance regarding potential regulatory capture. It would be my dream if this were applied consistently across the board.
If we applied the standard that applies to every other discussion of regulation, we’d have the usual powerless subjects yelling valid into the void, which are then ignored in favor of policies based on nonexistent or nonsensical models of what will then happen. The difference this time is that so many on the pro-regulation side actually buy the correct general anti-regulation argument and are making the case that this is an exception.
Which suggests that when the normal people show up, it’s going to be a massacre.
If we do get to regulate models based on potential capabilities, we’re going to need to define what systems count. This is an adversarial question: people can and will build systems that intentionally evade your rules (linked paper).
I don’t see enough attention to what happens under adversarial conditions here, or to being able to cleanly avoid systems we don’t want to hit, sometimes both. I would focus on inputs, then consider an exception for provably narrow systems.
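As one illustration of what an input-based rule could look like (a sketch of my own, not any actual proposal), a regulator could threshold on estimated training compute, commonly approximated for dense transformers as 6 x parameters x training tokens:

```python
# Sketch of an input-based rule: threshold on estimated training compute.
# Threshold and model numbers are hypothetical; 6 * params * tokens is the
# standard rough approximation for dense transformer training FLOPs.
def training_flops(params, tokens):
    return 6 * params * tokens

THRESHOLD_FLOPS = 1e26  # hypothetical regulatory cutoff

runs = {
    "small open model (7B params, 1T tokens)": training_flops(7e9, 1e12),
    "frontier-scale run (1T params, 20T tokens)": training_flops(1e12, 2e13),
}
for name, flops in runs.items():
    print(f"{name}: {flops:.1e} FLOPs, covered: {flops >= THRESHOLD_FLOPS}")
```

The appeal of an input rule is that compute is measurable before deployment; the catch, as noted above, is handling adversarial evasion and carving out provably narrow systems.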
Simeon notes the problem that the word ‘AI’ doesn’t disambiguate different subtypes and that it’s a real issue for getting well-targeted rules.
The UK is attempting to do exactly that. Our quest is not an easy one. We need to hit a relatively narrow target, without hitting more other things than necessary, and the natural places to get hit via public choice algorithms are all other things.
The Week in Podcasts, and Other Audio and Video
Don’t Look Up – The Documentary: The Case for AI As An Existential Threat (17:10).
There was more, but I’m falling behind here.
What Exactly is Alignment?
If you are ‘harmless’ and not useless, then that requires making active decisions about what does and does not constitute harm and how to navigate causal space to ensure harmless outcomes. Sounds like an agent?
Rhetorical Innovation
Is extinction risk a better term than existential risk? Can or should we switch?
(Bonus: We would get to keep saying x-risk for short.)
Discussion confirmed my intuition that extinction risk is better communication for most people and gives a more accurate impression and vibe, much easier for people to understand, although perhaps slightly less accurate.
The use of the word ‘extinction’ in the letter clearly amplified and better focused its impact.
I think if we could go back we would be better served to use extinction risk, the question is whether or where we should switch now given path dependence. I am not sure what I am going to do on this, but I’m definitely trying out using extinction in many contexts.
Yoshua Bengio reminds us of a point that needs to be emphasized: Most of us who worry about the destruction of all value in the universe or the deaths of all humans really, really want to be convinced they are wrong about all this. Nothing (except an actual solution) would make me happier than to be assured it’s all fine and I can focus on other issues and also take quite a vacation.
The word ‘intelligence’ trips people up. I’ve tried including powerful and capable, Simeon here suggests ‘problem solving ability.’
Paul Crowley asks, if you wake up tomorrow, see a blog post or video and think ‘wow the AI seems like it has agency now’ what would be in that blog post or video? Very wide variety of answers. Almost all of them seem quite poor. People are quite poor at this type of task.
Janus vouches for the obvious, which is that of the ~300 (yep, that’s all, folks) people working on AI existential risks, not only are they authentically motivated, they all desperately would rather be doing something else, such as making big money in tech or finance. As he says, they could absolutely be wrong, but it would be great if people realized how much they would love to be wrong and love even to hear reasonable arguments that we are wrong. I would love love love to hear better arguments why risk levels are super low and I can focus on cool mundane utility instead.
Andrew Critch suggests avoiding the terms ‘short-term risk’ and ‘long-term risk,’ since extinction risks are also short term, and also many short-term ‘risks’ are already happening. On that second note I’d say the risk is a matter of degree. He also notes that mundane things going wrong makes extinction risks more likely, which I agree with as I noted last week. He suggests ‘non-extinctive.’ Rob Miles suggests ‘narrow’ versus ‘general.’ I’ve been using ‘mundane’ for the non-extinctive risks and been fine with it, but this is one place where I have no actual feedback on whether it is working.
Amanda Askell takes a stand.
Many people interpreted Amanda as saying something stronger, that you shouldn’t call out individuals in such discussions for being in bad faith or saying things they do not themselves believe. It’s tough knowing where the line is there. At some point you have to say ‘this person is clearly not in good faith, we need to move on.’ That doesn’t mean saying that a large percentage of those with different concerns than yours are in bad faith, or sufficiently bad faith one should point it out. To me this is a very high bar, which I consider to have been met exactly once in these newsletters.
How should we think about ‘burden of proof’?
This does seem like a good starting point, still feels like it is missing something.
In practice, I interpret ‘you have a burden of proof’ as saying ‘I will act as if the thing is false until you convince me the thing is true’ or ‘the action you are proposing requires a higher level of confidence.’ Another way of thinking of it is ‘no, I have plenty of reason to think the thing is false on priors, that I don’t need to explain here, which you must overcome.’ Often some combination of these is exactly what it is: A claim that the prior should be heavily in their direction, and that until you overcome that enough to be highly confident anyway nothing changes.
The AI context, of course, is to what extent those trying to show that AI will or won’t kill us have a burden of proof, or should have one. Whether they do have one is mostly a social thing; life isn’t fair. As for should, it depends on the context, and which facts are already in common knowledge and evidence. There are key places within alignment in which ‘until you can prove this is safe, it is definitely not safe’ is the correct approach, but that itself is a statement that has the burden of convincing people of its accuracy.
Have You Tried Not Doing Things That Might Kill Everyone?
There do seem to be a lot of people who claim to think AI poses an extinction risk to humanity in very prominent positions at all three AI labs.
So, ask many, if you actually believed that, wouldn’t you, I mean, uh…. STOP IT?
It turns out, no. Can’t stop, won’t stop. In fact, started for that very reason.
Here’s Daniel Eth, first explaining that yes these people are worried about AI extinction risk, then giving his theories as to why they don’t, ya know, STOP IT.
The theories (with some paraphrasing):
I would add the classics ‘I wanted the money,’ ‘I wanted to develop my skills or resume for later when I’d do a good thing’ and ‘well someone else would just do it anyway so it doesn’t matter.’
People Who Are Worried People Aren’t Reasoning From World Models
I definitely worry about this a lot. So does Janus.
This is my policy more broadly. If you come to me trying to do a thing based on a causal model telling you that thing would have good effects, my default response is to assist you, encourage you and help you strategize. If I disagree sufficiently I’ll try to also talk you out of it, but in general doing actual things because of reasons seems good.
People Are Worried About AI Killing Everyone
Matt Clifford is worried, but seriously The Times, chill out?
The Times then prints this monstrosity:
Remember the rules of Bounded Distrust, the headline is a lie. Still, even by those standards, this is pretty bad. No wonder they’re asking if you’re OK. Note that there’s another headline here that’s highly misleading.
Dan Hendrycks in Time Magazine offers the Darwinian argument for worrying about AI, which is that evolutionary pressures among AIs and between them and humans seem likely to end in us all being not around so much anymore.
I’d put it even simpler – if something more fit is introduced into the system, beware if you are less fit and would like to stick around, and be skeptical of details you think can hold back the inevitable. In this case, we are the ones who are less fit.
Robin Hanson of course responds, what’s the problem, evolution’s great?
The problem would be that this leads to a distinct lack of humans. Which Robin Hanson is fine with, whereas Dan Hendrycks and I would prefer to avoid that.
Other People Are Not Worried About AI Killing Everyone
Andrew Ng doesn’t understand why AI is a risk for human extinction, asks who he should talk to. Eliezer offers and asks which types of thoughts to focus on first, I too would be happy to discuss in almost any form, public or private. He did talk to Yoshua Bengio, obviously an excellent choice, that definitely counts as real inquiry. They’re even doing concrete threat modeling.
Marc Andreessen is definitely not worried, except his worry that we might not move AI forward as quickly as possible on all fronts. If you thought the era of name calling was over, this post and reaction to it should establish that the era is not going to end. The post’s arguments are remarkably and puzzlingly poor given the author. I have a theory of what is happening here, where I am actually deeply sympathetic to the underlying motives involved although not the method, and I will post about it soon.
In the meantime, here’s Sherjil Ozair, who calls Marc’s piece Cargo Cult Optimism, and a Twitter thread of people I respect trying to make sense of matters. Here’s Paul Crowley informing Matthew Yglesias that the correct word for Andreessen’s argument against extinction risk from AI is Bulverism, ‘a term C S Lewis invented for answering an argument without engaging with it, by speculating about your motives for advancing it,’ which Yglesias appreciates, as he had noticed Andreessen’s distinct lack of an actual argument.
Ada Palmer is not worried about AI killing everyone, instead she is worried that we do not properly compensate our creative classes. This matches Terra Ignota in its lack of either AGI or any explanation of why it doesn’t have AGI, and she once again seems to be assuming the potentially worrisome scenarios simply never come to pass.
Mike Solana not only is not worried, he flat out says that everyone except ‘a handful of Bay Area rationalists’ is not worried either. He thinks everyone else is lying about it. To seem important and sophisticated. Nothing else, to him, makes sense.
We are trying to tell you one very important sentence of information. You are responding by calling us liars (I presume not me, he’d allow me the rationalist exception, but most people signing). While slinging all the usual slanders and ‘checkmate, doomers’ lines for why everyone maybe dying is fine to ignore, actually.
I mean, if this was true, why wouldn’t you focus all your attention on it? I mean, if there was an existential danger, how could we believe that if you don’t do everything in your power to stop it, including [crazy thing that wouldn’t help]?
I do agree that we should keep telling those working on an existentially risky technology to consider maybe stop doing that, but hey. Asking them not to do it is what motivated them to do it in the first place, lest someone less responsible, and let’s face it less cool, get to it first.
On the topic of AI at least, I’m going to hereby add Solana to the ‘ok we are done here’ list before it gets repetitive, except when he’s truly on fire and it’s funny. Which does occasionally happen, especially around congressional hearings.
Let me be very clear. It will never end. People. Will. Just. Keep. Making. Stuff. Up. Noah Giansiracusa explains and disseminates widely that AI existential risk is all an op by EA, an operational pivot to ‘show the world the enormous value EA offers society’ in the wake of SBF. As opposed to already having been their leading cause for many years.
We also accept ‘it’s possible but risk level is very low and there is sufficient upside.’ Saying ‘it’s far too early to say much about it’ is… not comforting like you think it is.
And yeah, Holly, if that was my goal it would not have been my move either.
Or maybe people think AI might kill everyone but they… don’t let it bother them?
The Wit and Wisdom of Sam Altman
I intended to write last week about an additional Sam Altman interview. Unfortunately, a glitch deleted some of my work, including my write up of that, and then the interview was removed. It is still available on the internet archive. The report claims to reveal some of OpenAI’s plans for future products. Overall it is good news, some steady mundane utility improvements, and very little short term existential danger.
The Lighter Side
A remarkably large percentage of alignment proposals are this.
How it’s going for someone I’m not taking seriously.
Dating in the age of AI is hard.