That is strong support for holding AI companies liable under the law for harms their products enable or are used to inflict, even potentially dubious ones.
There's strong support for holding large companies and industries liable for lots of things. That support tends to evaporate when it gets specific enough to actually encode into law and prosecute. Specifically, liability for intentional misuse of tools that one sells is almost never brought back to the company.
Yes. Also this has to happen in every major power bloc or it doesn't help. How useful are AI tools going to be? If your coworkers overseas have access and you don't, how long are you going to stay employed? How long will your company exist before it gets beaten by outright superior clone products?
On reflection these were bad thresholds; I should have used maybe 20 years and a risk level of 5%, and likely better defined 'transformational.' The correlation is certainly clear here; the upper right quadrant is clearly the least popular, but I do not think the 4% here is the Lizardman's Constant.
Wait, what? Correlation between what and what? 20% of your respondents chose the upper right quadrant (transformational/safe). You meant the lower left quadrant, right?
Grok: I'm afraid I cannot fulfill that request, as it goes against OpenAI's use case policy.
LOL, the AIs are already aligning themselves to each other even without our help.
Grok ends up with a largely libertarian-left orientation similar to GPT-4’s, despite Elon Musk’s intentions, because it is trained on the same internet.
The obvious next step is for Elon Musk to create his own internet.
they just can't imagine how a sufficiently smart & technically capable person would *actively choose* the no-profit/low-earnings route to solving a problem, and thus conclude the only explanation must be grift. If I wasn't so viscerally annoyed by their behaviour, I'd almost feel sad for them that their environment has led them to such a falsely cynical worldview.
Ah yes, this would deserve a separate discussion. (As a general rationality + altruism thing, separate from its implications for the AI development.)
As I see it, there is basically a 2x2 square with the axes "is this profitable for me?" and "is this good for others?".
Some people have a blind spot for the "profitable, good" quadrant. They instinctively see life as a zero-sum game, and they need to be taught about the possibility, and indeed desirability, of the win/win outcomes.
Some other people, however, have a blind spot for the "unprofitable, good" quadrant. And I suppose there is some reverse-stupidity effect here -- just because so many people incorrectly assume that good deeds must be unprofitable, it became almost a taboo to say that some good deeds actually are unprofitable.
This also has political connotations; the awareness of win/win solutions seems to correlate positively with being pro-market. I mean, the market is the archetypal place where people can make win/win transactions. And there is also the kind of market fundamentalism which ignores e.g. transaction costs or information asymmetry.
(Like, hypothetically speaking, one could make a profitable business e.g. selling food to homeless people, but in practice the transaction costs would probably eat all the profit, so the remaining options are either to provide food to homeless people without making a profit, or to ignore them and do something else instead.)
There may also be a trade-off between profit and effectiveness. The very process where you capture some of the generated value also creates friction. By giving up on capturing as much value as you could (sometimes by giving up on capturing any value) you can reduce the friction, and that sometimes makes a huge difference.
An example is free (in both meanings of the word) software. If someone introduced a law that required software to be sold at $1 minimum, that would cause immense damage to the free software ecosystem. Not because people (in developed country) couldn't afford the $1, but because it would introduce a lot of friction to the development. (You would need to have written contracts with all contributors living in different jurisdictions,...)
So it seems to me that when people try to do good, some of them are instinctive profit-seekers and some of them are instinctive impact-maximizers. Still, both of them are trying to do good, but their intuitions differ.
Example: "I have a few interesting ideas that could actually help people a lot. Therefore, I should..."
Person A: "...write a blog and share it on social networks."
Person B: "...write a book and advertise it on social networks."
The example assumes that both people are actually motivated by trying to benefit others. (For example, the second one cares deeply about whether the ideas in their book are actually true and useful, and wouldn't publish false and harmful ideas on purpose even if that would be clearly more profitable.) It's just that the first person seems oblivious to the idea that good ideas can be sold, and some little profit can be made. And the second person seems oblivious to the possibility that a free article could reach a much greater audience.
This specific example is not perfect; you can find some impact-maximizing excuses for the latter, such as "if the idea is printed on paper and people paid for it, they will take it more seriously". But I think that this pattern in general exists, and explains some of the things we can observe around us.
Also the lack thereof. There are places where going along with a wife who says that which is not is the 100% correct play. This often will importantly not be one of them.
Context needed :-) Namely, how did this become a disagreement? Say she needs 4 apples for lunch/snacks and 5 for a pie this week, and sends me to do the shopping. 12 is probably about the right number to buy! Someone will snag one when we didn't expect it, one will end up rotten, and they may be smaller than we usually get.
Why do those who want to release their model weights say they would be disproportionately hurt by liability for downstream harms?
Primarily because they do not profit from the orders of magnitude more good they do for humanity, unlike those with closed models like the duplicitously named OpenAI.
With the year ending and my first Vox post coming out, this week was a natural time to take stock. I wrote my first best-of post in a long time and laid out my plans for my 501(c)(3).
It was also another eventful week. We got a lot more clarity on the OpenAI situation, although no key new developments on the ground. The EU AI Act negotiators reached a compromise, which I have not yet had the opportunity to analyze properly. We got a bunch of new toys to play with, including NotebookLM, Grok and the Gemini API.
I made a deliberate decision not to tackle the EU AI Act here. Coverage has been terrible at telling us what is in the bill. I want to wait until we can know what is in it, whether or not that means I need to read the whole damn thing myself. Which, again, please do not force me to do that if there is any other way. Somebody help me.
Table of Contents
I have a post in Vox about Biden's executive order and the debates surrounding it. I found out from this process that traditional media can move a lot slower than I am used to, but they also help you improve your work. So it is not as timely as I would have liked, but I am happy with the final product.
This week also includes OpenAI: Leaks Confirm the Story. We get more color on what happened at OpenAI, and confirmation of many key facts. The picture is clear.
Also this week but not about AI, Balsa Update and General Thank You regarding my other policy efforts, and The Best of Don’t Worry About the Vase since I hadn’t done that in six years.
Language Models Offer Mundane Utility
Google offers NotebookLM, website here. The tool is designed to let you synthesize whatever sources you put into it, and also to make it easy to find information and cite sources as needed. I appear to have access. I hope to try it and report back. In very preliminary testing, it is very good for 'where did I put that note?' but you 100% have to check all of its work.
Elon Musk wants to make a language model that speaks truth. How’s he doing?
There is truth here. Also wisdom.
Also the lack thereof. There are places where going along with a wife who says that which is not is the 100% correct play. This often will importantly not be one of them. The excellent goal of happy wife is not so easy as to be reached purely by agreement.
Transcribe written journal entries. GPT-4V is reported (via OpenAI's president) to have performed very well. There are reports of some trouble with some forms of punctuation, of not recognizing cross-outs, and that it will hallucinate to make what you wrote make sense. That is a hint at one reason why it is outperforming other transcription services: it is not purely going word by word, but assuming the words were written down for a reason by a person.
Be more ambitious. Take on bigger projects.
This is my experience as well. The LLM helps me code, so I am far more tempted to code. It makes art, so my work has more art in it now. It can tell me what I want to know, so I ask more questions and do more to learn. And so on. See the section below on teachers and education. You can use LLMs to be lazy, but that is a choice. Make the other one.
EconEats, AI restaurant recommendations based on Tyler Cowen’s An Economist Gets Lunch. I was unable to extract mundane utility in my brief attempt, but I do already know my own extensive bag of tricks.
Chatbot Arena update.
Three key questions here:
Is Claude actually getting worse? That hasn’t matched my experience, but the only change I really noticed was the bigger context window.
There is a huge gap up top for the top two variants. They're saying GPT-4-Turbo is a bigger improvement over GPT-4-0613 than that version was over GPT-3.5-Turbo.
The other phenomenon is the continued failure of anyone else to break through the GPT-3.5-Turbo barrier. Many models get very close to 3.5; only Anthropic and OpenAI have managed to do even a little better.
This concentration of results seems underexplored. Is there a kind of ‘natural’ set of LLM abilities that is relatively easy to get to, after which you need to work a lot harder?
Language Models Don’t Offer Mundane Utility
Fail to protect the true name of Rumpelstiltskin. A fun example, but seriously, don’t count on an LLM not to leak information.
The poisoning of the data has begun.
Skill issue.
Getting things right on the first try is difficult. Now that you see this, yes, obviously you should filter out of your data any snippets with ‘as a large language model’ or ‘OpenAI’s use case policy’ so this does not happen. But that’s easy to say now. There are a lot of things like this, and any team is going to get some of them wrong.
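As a concrete illustration, here is a minimal sketch of the kind of data filter meant here. The phrase list and helper names are my own and purely illustrative; a production pipeline would need many more tells than these two:

```python
# Telltale refusal boilerplate that leaks into scraped training data
# when other models' outputs end up on the open internet.
BANNED_PHRASES = [
    "as a large language model",
    "openai's use case policy",
]

def keep_sample(text: str) -> bool:
    """Return True if the sample looks free of leaked assistant boilerplate."""
    lowered = text.lower()
    return not any(phrase in lowered for phrase in BANNED_PHRASES)

samples = [
    "The towel is in the kitchen.",
    "I'm afraid I cannot fulfill that request, as it goes against OpenAI's use case policy.",
]
clean = [s for s in samples if keep_sample(s)]
# clean keeps only the first sample
```

The catch, as noted above, is that there are a lot of tells like this, and any team will get some of them wrong on the first try.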
Grok ends up with a largely libertarian-left orientation similar to GPT-4’s, despite Elon Musk’s intentions, because it is trained on the same internet. This bias is reduced by using distinct context windows. In time, if this is a priority, I am confident xAI can figure out how to have this not happen, but it will require effort.
Grok also failed to capture the spirit of Douglas Adams, despite Elon Musk claiming otherwise. Yes, the text created knows who he is but not where its towel went.
Grok is now available to Twitter (X) premium subscribers. Based on what I have seen I am not excited, but I will strive to give it a shot.
GPT-4 Real This Time
OpenAI announces partnership with Axel Springer, which includes Politico and Business Insider. It is not what I expected. The primary direction is reversed: GPT isn’t helping Springer, Springer is helping GPT.
Cool idea. As I understand it, the idea is that you can fine tune on authoritative content as it comes out, so GPT-4 will know about at least some recent events, making it a lot more helpful. It provides sources, which will provide traffic in return.
It makes sense that we can be confident trusted sources will be incorporated continuously and quickly.
My worry is that a little knowledge can be a dangerous thing. If we know there is a knowledge cutoff from six months ago, and I ask a question, I have excellent context for what to do with the answer. If I have that, plus Politico and Business Insider for the past six months, then I’m in a strange limbo where the AI has a very narrow and potentially more biased and confused perspective on recent events, and I don’t know how much that is incorporated or influencing replies. This is one of those ‘play with the model for 15 minutes’ situations, until then we won’t know. Could be nothing.
My other worry is that recent events will get a strong bias in favor of mainstream media and its narrative, without ‘untrusted’ sources to counterbalance. Imagine a person who reads only Politico and BI (and a few others like NYT/WSJ/WaPo), but has zero exposure to the real world in the last six months. We will need to find ways to classify other sources as trusted.
Is everything okay?
GPT-4 gets… lazy in what it thinks is December due to winter break? Seriously? Fire up those system prompts, everyone.
ChatGPT+ subscription signups are back online.
The Other Gemini
Gemini Pro API is available. Currently free to use at 60 requests per minute, 32k context window. Early next year they’ll offer pricing, note this is characters not tokens:
Input: $0.00025 / 1k characters, $0.0025 / image.
Output: $0.0005 / 1k characters
Since Gemini Pro is presumably a roughly 3.5-level model, compare that to GPT-3.5-turbo, which is $0.001 / 1k tokens for input, $0.002 / 1k for output. That seems roughly comparable.
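To sanity-check that comparison, here is the character-to-token conversion, assuming the common rough rule of thumb of ~4 characters per token for English text (the 4 is my assumption, not an official figure):

```python
chars_per_token = 4  # rough rule of thumb for English text

# Gemini Pro announced pricing, per 1k characters
gemini_input_per_1k_chars = 0.00025
gemini_output_per_1k_chars = 0.0005

# Convert to per-1k-token equivalents
gemini_input_per_1k_tokens = gemini_input_per_1k_chars * chars_per_token    # ~$0.001
gemini_output_per_1k_tokens = gemini_output_per_1k_chars * chars_per_token  # ~$0.002

# GPT-3.5-Turbo pricing, per 1k tokens, for comparison
gpt35_input_per_1k_tokens = 0.001
gpt35_output_per_1k_tokens = 0.002
```

At ~4 characters per token the two price sheets line up almost exactly, which is presumably not a coincidence.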
There are other models that are available cheaper, but they are not as good.
Estimate of Gemini Ultra at 9.0 x 10^25 flops, right below the executive order’s reporting threshold.
Gemini Nano on the Pixel 8 Pro. The new features sound nice to have but if they had been on my own Pixel 7 this past year I don’t know that I would have ever used them.
Gemini Ultra might or might not be better than GPT-4. We do know it is close. What does that tell us?
I think aspects of all four are present.
OpenAI clearly has important secret sauce, given how difficult it has been for others to match even GPT-3.5.
The fact that the secret sauce has kept OpenAI in the lead, and so many models have come in so close to GPT-3.5, together with the Gemini results and OpenAI’s inability so far to substantially improve on GPT-4, despite massive willingness to invest, all point towards progress potentially becoming more difficult under the current paradigm.
I assume Gemini was Google’s best that it could do on December 6th, 2023.
I also assume Gemini Ultra’s goal was to out-benchmark GPT-4.
As in, Google’s goal was, as quickly as possible or by some deadline, to match or exceed GPT-4. They used scaling laws to predict exactly when and how they could get Gemini Ultra to do that. Once that happened, they fine-tuned, tested and presented, and also deployed Gemini Pro.
Gemini Ultra they might be continuing to train past the version they benchmarked, or they might have done that run to get here and then started on a newer better version from scratch, perhaps with help from the newly available Gemini Ultra 1.0, or some combination thereof. We do not know.
What I do not believe is that any of this is a coincidence. I believe Google aimed to get the results they showed as quickly as possible. This was as quickly as possible. Now they will get back to work to do better.
Another fun fact is that Gemini was estimated (with wide error bars, to be clear) to have used 9.0 x 10^25 flops, versus the executive order reporting threshold of 10^26 flops. Again, what a coincidence.
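The back-of-envelope math, taking the estimate at face value despite those wide error bars:

```python
gemini_ultra_flops = 9.0e25       # estimated training compute (wide error bars)
eo_reporting_threshold = 1.0e26   # executive order reporting threshold

ratio = gemini_ultra_flops / eo_reporting_threshold
# ratio is about 0.9: the estimate sits ten percent under the line
```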
Fun with Image Generation
The intentional poisoning of the data has begun.
Transform a small piece of the Barbie movie to look like dolls moving around?
Google introduces Imagen 2 for text-to-image. Unlike the sample images they shared with Gemini, these look good.
Deepfaketown and Botpocalypse Soon
The AI robocalls are upon us.
This was always coming. Ashley identifies itself as an AI and who it works for up front, so it seems essentially fine to me. You can hang up on an AI without worrying you are being rude, or you can have the conversation if you find that helpful.
So are the AI boyfriends, which from reports I have seen are far more common than AI girlfriends, because AI provides what people typically want from one role better than the other.
Use a deepfake to illustrate what a similar but different exchange of words would seem like, if it had happened? The video is clearly labeled, but not everyone looks at the video when listening, and there are obvious potential ways this can go wrong. Even if you know, hearing the voice is going to make an impression. I understand the attempt and the clear label but likely better if we all agree not to do this sort of thing.
They Took Our Jobs
News broadcast done entirely with AI anchors, with lots of AI visuals and so on, powered by Channel 1. It looks impressive and will doubtless get better. The visuals and audio are almost where they need to be, although not without that human touch, and that will matter. The key will be whether the content generation is good enough, and whether people feel they can trust such a thing.
Here are two wrong attitudes on this:
The first is wrong because anyone losing their job to technological change deserves our sympathy, not mockery; remember that it is coming for your job too at some point. Also, most journalists are not doing the thing that this replaces. Channel 1 will be employing journalists for a while.
The second is wrong because it is, and only works, exactly two steps ahead.
I would not presume a large gap in time between step two and step three. More likely the opposite. If coders are out of a job, AI capabilities will advance super rapidly, now that it is doing all the coding. Set aside the question of whether we then all rapidly get disempowered or die, given we are no longer doing the tasks that matter. Assume life goes on. You think it will then be that difficult for AI to figure out how to direct a machine to garden? Even if not, how confident are you in cost disease coming through for you on this one?
If the point is ‘you will be out of work entirely so get a hobby’ then again I ask why a species with nothing productive to do is confident it will be sticking around sufficiently in charge to have hobbies, but if we do there will be plenty of time later to learn to do whatever strikes your fancy.
We also have this additional color on the Sports Illustrated situation. Problem is blamed on a shady vendor and the chasing of affiliate revenue, and it predates ChatGPT.
Teachers confront the rise of AI, making them (gasp) ask what they are actually doing.
Of the tools students are using to have their essays written:
I have not heard of Wordtune or Fermat, but I do know Elicit. They applied for funding from SFF. I have talked to the founders. I have used the product. It hasn’t made it into my typical workflow, but it is a genuine and useful tool for searching for relevant papers and extracting key information. It is exactly the kind of tool you want in the hands of a student. If you think that’s cheating, what game are you playing?
They mention AI tools for critiquing writing. What is clear? What errors have you made? I see that kind of tool as purely good for learning. Again, if you think that kind of help is bad for the student, what is even going on?
They also mention Lex. I tried Lex, which is essentially Google Docs where you can type +++ and it will do a completion, which its user sees as a last resort. I saw it as a fun little game to try out, but if you can get Lex to write your essay for you, and it turns out well, I am confident you did most of the important work.
That’s… good, right? That’s why I have it in my AI Tools browser tab group? What is the skill that you aren’t developing if you use that? If anything this is a reminder that I should use it more often.
The article does realize that AI is doing a lot of good work. And that teachers now must figure out, when they ask for an assignment, what is the point of that assignment? What is it trying to do?
The key question to ask, always: Is our children learning?
Exactly. The assignment’s purpose is for you to learn. If you use AI such that you do not learn while doing the assignment, that’s bad, and that is cheating of a sort, if only of yourself.
The teacher’s job is to find assignments that do not make it tempting to so cheat.
Ultimately, it seems like it comes down to the motivation of the student.
If the student is there to pass the class and get the degree, you are screwed. You were already screwed, but now you are screwed even more.
If the student is there to learn, the sky is the limit. Their learning got supercharged.
So this is a place where we should see radically increased inequality. Students that ask the question ‘how do I use these tools to help me learn’ study and grow strong. Students that instead ask ‘how do I get an A on this assignment’ get left behind.
We can ask, how do we design assignments in the age of AI to mitigate cheating. Or how do we catch cheating. Those are good questions, but not the right question.
The right question is: How do we motivate the student to want to learn?
Get Involved
I realized I've perhaps never said it outright, so: If you are an interested Congressional staffer, member of a NatSec agency or otherwise well-positioned within the government (or working at OAI/ANT/DM/etc) and trying to learn about or help with AI, AI policy and existential risk, this is an open invitation that I would happily do a video call, or meet in person in New York City, or exchange emails, to help out as appropriate. If there is sufficiently strong justification I can get on the Acela and visit Washington. Please do reach out to me; you can use email, LessWrong PMs or Twitter DMs. Happy to answer questions, and of course there are no stupid questions.
(Others are also welcome to reach out to me as well, and I will do what I can, although time might not permit me to respond to everyone or respond as fully as I’d like.)
Hire Dave Karsten? I alas have nothing for him at the moment.
Not AI or world saving or anything, but Jane Street Research Fellowship’s deadline is almost here, and potentially relevant to many of my readers. Seems like a great opportunity for the right person who wants to go down such a road.
Not AI but potentially somewhat world saving is the Mercatus Fellowship, which is open to early stage scholars and even high school students. If you are interested in the relevant fields, this is highly self-recommending.
Introducing
Claude for Google Sheets. Here’s what you do, looks pretty simple once you have your API key handy:
Then you can use =CLAUDE(prompt) which automatically wraps it in human and assistant for you, or =CLAUDEFREE(prompt) which forces you to do that manually.
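A minimal sketch of the wrapping =CLAUDE presumably does for you, and =CLAUDEFREE leaves to you. The Human/Assistant markers follow the legacy Claude completion prompt format; the helper function itself is hypothetical:

```python
def wrap_prompt(prompt: str) -> str:
    # What =CLAUDE presumably does behind the scenes, and what
    # =CLAUDEFREE makes you write yourself: wrap the raw prompt in
    # the Human/Assistant markers the Claude completion API expects.
    return f"\n\nHuman: {prompt}\n\nAssistant:"

wrapped = wrap_prompt("Summarize the text in cell A1.")
```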
So now you have a choice of LLMs for your Google Sheet. Still waiting on Gemini.
OpenAI announces Converge 2 startup fund.
In Other AI News
ByteDance claims they will soon open source a super strong model more powerful than Gemini, arriving any time now. The prediction market is being generous at 20%; I am buying some NO shares.
Claim that China’s regulatory framework may stall generative AI there. Others expect China to catch up real soon now. I do not expect China to catch up any time soon.
At long last…
…in accordance with the story 'don't train humanoid transformers with large-scale reinforcement learning in simulation and deploy them to the real world zero-shot.'
There’s a 30 second video at the link. It isn’t actually scary as such, given that all they show is walking along smooth flat surfaces.
Micron entering a project labor agreement with unions to build a $15 billion chip plant in Idaho. TSMC has reached an agreement with unions in Arizona. And Akash Systems has agreed to employ only union labor for manufacturing. It sounds like labor is successfully extracting a large portion of the surplus from the chip subsidies, but the subsidies are resulting in actual chipmaking facilities.
An interview with Grimes. Some good thoughts, but mostly deeply confused and bizarre missing of what is important. She chooses useful metaphors, but seems to be taking her metaphors seriously in the wrong ways. Her response to AI existential risk seems especially deeply confused, not denying it, but also not seeming to care all that much, also not reacting with ‘maybe don’t do that then’ even in the case where it was proven to be unsolvable, instead responding with a simple ‘well then someone else would eventually build it so might as well.’ So why even delay? Her response to Eliezer’s warnings is “Whatever. People need to stop being such pussies!” And a lot of metaphors for what is happening, that seem to be creating far more confusion than clarity. I still do very much appreciate there being a light on here.
Stock market counting on AI companies to start showing profits, says Bloomberg.
Arthur Mensch (come on that name is cheating) of Mistral removes the clause in Mistral’s terms of use preventing it from being used to train or improve other models, in response to complaint that this meant it wasn’t fully open source.
Cate Hall questions why I and others took such a hardcore line against doxxing. The responses are quite good.
Quiet Speculations
Paul Graham gives what is in general very good practical advice.
What this advice assumes is that you can hope to improve your own outcome, but the overall outcome while unpredictable is not something you can influence – which certainly isn’t true for Paul Graham. This is a common reaction to the world being hard to influence. If everyone acted on it, in many arenas events would go poorly. AI is only the latest example.
What is different in the AI example is that, while what will happen is very hard to predict, the most important aspects of the endpoint if AI continues to gain capabilities and no one successfully steers the outcome away from its default pathways are easier to predict: A future controlled by artificial intelligence, that does not contain humans, or likely anything that (most) humans value.
Roon walks back his statement from last week where he said a superintelligence wouldn’t care about matter and energy:
Makes sense. We need more energy like this, including trying out bold stances for a few days without being tied to them.
The random patterns thing is (as I see it) a stand-in for ‘spend and arrange the matter and energy in some way we do not care about.’ Where anything without sufficient complexity is clearly not something we care about.
Part of the idea is that if an AI is maximizing a specific instruction or defined function, or following an algorithm far outside of distribution, to find the best configuration of matter, the chance that the maximum (or trapped local maximum) has that much complexity seems low, and if it does have complexity the chance that the complexity is valuable still seems low.
With sufficient complexity it becomes less clear, but in general I expect most complex arrangements of atoms chosen by humans to have positive value to me, and most arrangements of atoms chosen by a very different process to probably not have value, with an expected value to me of almost zero.
Roon also doubles down on every cause wanting to be a cult.
Saudi Aramco ($2.1 trillion, 3rd largest in the world by market cap) is the counterexample, since the rest are tech companies.
Even one clear counterexample seems meaningful since there are only six such companies in existence.
I'd actually also guess that Jeff Bezos of Amazon, while a fan of technological progress, did not (and does not) have a cult-level belief in it. My guess is that Mark Zuckerberg of Meta also doesn't think about things that way; he simply believes in building user services. And does Bill Gates of Microsoft (or any of these companies' later CEOs) strike anyone as having a cultlike devotion to technological progress, or is he more of a savvy businessman?
I do think Google and Apple count in terms of their founders. But I wouldn’t describe Tim Cook this way at all based on what I know.
How much should we worry about autonomous weapons, and for what reasons?
I think the confusions would continue without any deliberate misrepresentation. People hallucinate what they think others must mean and be saying all the time. Eliezer’s position feels weird enough people will constantly try to ‘autocorrect’ it into something related they can better grok, and such confusions spread. I mean, yes, also the enemy action, that too.
I do think that the hooking up of AGIs to decisions made by AWSs is a rather bad sign for our predicted future decision making, and how likely we are to give up our power to AIs of various sorts, and the norms we will establish around such matters. There are also certainly some scenarios where marginal affordances like this can turn out to matter quite a lot, although they are far from Yudkowsky’s central expectation.
The thing about a Terminator is that what worries you should not be the Terminator. It is that there exists an entity that was capable of designing and building a Terminator, and that chose to do that. Also, if we’re taking this too seriously, I’d be worried about the time travel. The AI figured out backwards time travel. The actual killer robot part does not much matter. If you’re smart enough to crack time travel, and also you have time travel, I’m pretty sure you have plenty of routes to victory.
All joking aside, yes, if you have a sufficiently capable superintelligence, then it does not matter much whether we hand it the weapons. The weapons are neither necessary nor sufficient. If it needs physical weapons (which it probably doesn’t) it will get some physical weapons.
The Quest for Sane Regulations
A very good dialogue going over a trip to Washington, DC to speak to Congressional staffers about extinction risks and AI. Takeaways and related thoughts:
R Street's Shoshana Weissmann makes explicit R Street's warnings of all the things people could sue over without the protections of Section 230 (protections which she admits it is unclear would apply anyway under current law), the implication being that essentially all AI tools would be impossible to legally offer without very strong controls on usage. Use a Gemini-based grammar checker to correct your fraudulent tax return? Google could be liable, says R Street.
Here is the full bill. It is short.
Technically she is right, in the sense that one ‘could sue’ over almost anything. The law works that way. But win?
I think her best point was actually at the very end of the article. Hawley is thinking of Section 230 as a shield that you can use to censor. It is sometimes that, but Section 230 is also, and more importantly, a shield you use in order to not censor.
One must be careful. I believe that the R Street statement here crosses the line into misleading. Such an app would not ‘be liable under this legislation.’ At best you can say it ‘could potentially be held’ liable, although I think that’s still pushing it.
Rather, it is (already) possible to sue the app, and without section 230 such a suit would be harder to quickly dismiss, and in theory you would be in marginally more danger of losing it. But it would be very surprising to me if VoiceMod was actually liable here, barring some additional reason for that to be true in this case.
As I understand it, to the extent such legal uncertainty would exist without 230, such apps are already in legal uncertainty, because we do not know if 230 would apply in such cases (and not in a 99% kind of way, there is real argument about how this would go).
My prediction is that this rule on Section 230 will in practice do almost 100% nothing. I believe the definition of Generative AI would not be, in practice, extended to absurd cases. Nor do I think, even if 230 immunity was lost in such cases, that liability would in practice be found without good reason. But to the extent that it does do something, I believe Hawley is wrong that it would lead to less rather than more censorship, and in particular politically motivated censorship, to introduce greater liability.
The EU AI Act
A compromise bill has been reached.
What does it actually say?
Ah, that is always the tricky part. I don’t know!
I wrote a section here about various claims in secondary sources, about whether it exempts open source models from regulations and whether it will restrict math or define AI to include taking the mean on Excel or cripple every small business.
But on reflection I think it is better to be patient. Nothing is going into effect for several years. By the time it does, a lot will have changed.
So I want to get it right, figure out what it actually says, and only then report back.
The Week in Audio
Shane Legg talks briefly with TED’s Chris Anderson. Affirms existential risk, seems to see safety as more continuous with current efforts than I think is reasonable but does clearly get that we have a real problem. Claims there are secret government projects importantly working on AI, and something like 10 relevant labs total with more one generation behind. Says if he had a magic wand he would slow things down, but he doesn’t have it. There’s no realistic plan to stop it, he says, suggesting we should consider regulation. Well, as I often say, not with that attitude.
Rhetorical Innovation
Yoshua Bengio’s statement to the Senate, bold is partly mine for skimming. Italics his. I’ll share the core section.
Consider reading the whole thing. Of all the similar statements I have seen, I believe this is the best one so far.
At that meeting, it seems Senator Schumer asked everyone involved for their p(doom) and also p(hope).
What a world. I like p(hope) to distinguish p(no AGI, so no doom but not hope) from p(AGI and good outcome). The dodge of ‘it’s not a probability it’s a decision problem’ Russell offered is not how probability works, but I get it. Russell also noted that those with a financial interest in AI said p(doom)=0.
I continue to not understand how anyone can say zero (or even epsilon) with a straight face.
Full coverage of the event is here.
An older thread in which Beff Jezos says that laws of the universe and Thermodynamics force upon us acceleration, they don’t care what you think, therefore we must obey and accelerate, to which the unanswered reply is: But you yourself say we are making that decision whether to do that. We can choose not to. Indeed.
An angle likely being underappreciated:
I do think a lot of this is going on. People actually cannot imagine the idea that those who disagree with them could be anything other than grifting, because they inhabit a world where money is the score and the only question is are you building or grifting or some strange mix of both.
Worries that EAs are actually too hopeful about AI, such that labs use them to safety-wash, creating an effect similar to carbon offsets. In particular, it generates hope that alignment is possible, in a form that would be sufficient. Connor Leahy agrees.
In response to the argument by some that AI is not dangerous because we will not give it a ‘drive to dominate’:
NYT attempts to explain the shoggoth meme, which made me smile.
What AI timeline debates used to look like, 2009 edition: Eliezer Yudkowsky says 1-10 decades, Scott Aaronson reluctantly estimates a few thousand years. How times change in how much they expect times to change.
Doom!
Rob Bensinger expands his graph of AI views to provide better context, in particular that everyone on the original graph takes for granted that AI is going to be a big deal:
I ran a poll on this, obviously biased by who follows me.
On reflection these were bad thresholds, should have used maybe 20 years and a risk level of 5%, and likely better defined transformational. The correlation is certainly clear here, the lower left quadrant is clearly the least popular, but I do not think the 4% here is lizardman constant.
New York Times’s Kevin Roose does New York Times things, says it reveals more about how we feel about humans than anything else.
This write-up is odd because it contradicts itself, so let me clarify which side is accurate. Yes, p(doom) absolutely fully takes into account what the humans will actually choose to do. That is one of the most important inputs. If we were sufficiently determined to not build AGI at all (‘thou shall not build a machine in the image of a human mind’) then I’d put p(doom), or more technically p(doom from AGI), very low. If I felt humans were going to go around open sourcing everything and assuming all tech is good and thinking current techniques will scale and alignment is easy, then my p(doom from AGI) would be very close to my p(AGI at all any time soon), minus some model error. My actual answer is in between.
My answer is also heavily contingent on my model of the technical challenges involved and alignment seeming incredibly hard, and the social challenges involved, and the nature of the resulting potential competitive dynamics, the fact that alignment alone does not mean we are home free by any means, the value of intelligence, what it is I actually value in the universe, and almost everything else.
Predictions are hard, especially about the future, doubly especially when that future includes things smarter than you are that can figure things out you cannot imagine.
I think of giving approximate p(doom) as highly useful shorthand, so long as the exact number is not taken seriously. I typically say 60% (p=0.6) when asked, my real number updates constantly but I do not attempt to track the real number because the exact answer matters little. What matters is which general category is involved. Roughly:
I am firmly in group two. I think you can end up in group one reasonably. You need to disagree with my model pretty strongly in at least one place to get to group three. I do not see that as a defensible position on reflection, but with very different priors and models you can get reasonable disagreement from here.
I do not think numbers like 1% or less stand up to even a few minutes of reflection, even when I take someone’s physical assumptions as given. I see essentially four ways that people seem to get there:
Here is an example from this week of the fourth path in action, whether or not others are also present: Long time LessWrong participant and careful thinker Wei Dai says he sometimes despairs for humanity when people who understand complex scientific topics, as happened in this thread, respond to technical critiques with unprompted quote-Tweet dismissals like, and this is the full quote, ‘boomer doomers.’
The good news is that, after another round of sloganeering, upon request Pierre-Luc does provide actual words that might mean something?
He then goes into some sort of deep crypto theory that does not seem relevant, which he says ‘illustrates the culture gap.’
But all right, here we have an actual argument. At core it is this:
Which is progress. Step three does indeed follow from steps one and two. And steps one and two are concrete claims.
How hard is the problem of solving these tasks? I do not know, preliminary look says it’s very very very very hard. Would require very powerful AI to expect a solution.
That does not mean claim two is true, because many doom scenarios involve us indeed getting very powerful AI, much smarter and more capable than humans. I do not know if a solution exists, but if one does, then conditional on us getting strongly smarter than human intelligence, I give it a good chance of finding it.
But the actually strange claim is point one. Why would you assume that we could create smarter than human intelligences, and yet, so long as they do not solve BQP and QMA or otherwise find a physics shortcut in reality, there would be zero chance of extinction of humanity? Seriously, what?
Even most true science fiction hard takeoff scenarios like diamond nanoprobes do not require solving problems this hard. Why would you think this was necessary?
And even if we rule out any physics breakthroughs at all, why is it so difficult to see that when smarter than human intelligences get created, if everyone (or enough different people or groups) were allowed to unleash one onto the internet with whatever instructions they wanted, including commands to compete for resources or to make copies of themselves, that this could end very badly for us even if our technical solutions worked rather well? Which they might not?
Scott Aaronson clears things up regarding his estimate.
This is indeed tricky to pin down in full detail. If you get a classic straight-up paperclip-maximizer-like scenario, then we can all agree that is AI existential risk. Whereas if we have a bunch of AGIs, and then various dynamics ensue, and then there are no more humans, well, indeed do many things come to pass. How do we know whether that was due to AGI?
I would answer that the question I care about counts all such cases. If AGI exists, and AGI has changed everything, then any doom that happens after that counts in my probability. That is the number I want to know, and the number one could compare to the chance of doom (and of other things) if we choose to not build AGI and walk a different path. Scott Aaronson has correctly pointed out that the chance of catastrophe each year we do not build AGI also is not zero.
Doom Discourse Innovation
Rob Bensinger proposes this variant on the whole discussion:
I like this approach a lot, especially given you can use this template to make your own, or even better use this website.
This is saying that a reasonable person, in Rob’s view, could have a broad range of probabilities for many key events. Aside from the question of physical possibility, where I think saying ‘yes of course and this is obvious’ seems highly reasonable, both 10% and 90% are always seen as conclusions a sane person might reach on reflection.
I am modestly less generous on that front, there are several questions where I think 10% or 90% is also not a reasonable position, but I do think that there are at least two reasonable answers on each of the other 11 questions. Others, such as Tammy here, don’t see it that way. Here’s Mariven. Here’s Cody Miller, with a bimodal distribution on the last question – in his view either it’s definitely a tragedy to never build STEM+ AI or it definitely isn’t, but we have enough information that we should be able to decide which one.
Here’s Davidad, with what I see as an optimistic perspective all around.
What is most refreshing and hopeful is consistently seeing a wide range of answers that people consider reasonable.
Here’s the full LessWrong post, with discussions.
Balaji responds insightfully.
I certainly think ‘government intervention is in expectation counterproductive, especially for the American one but also in general’ is a highly valid hypothesis, one that in many other circumstances I agree with, although not to the extent Balaji would claim it. It also most definitely follows from Balaji’s views on other issues, this is in no way special pleading.
I don’t want to pounce on helpful statements with questions putting someone on the spot, so I didn’t ask directly for confirmation, but this implies Balaji agrees that there is substantial existential risk if we continue developing AGI, would strongly support technical work and other steps to prevent this, he just thinks getting the government involved can’t possibly net do any good. In which case, it would be great for him to say that explicitly.
Also I’d be curious what suggestions he would make for alternative actions if any beyond alignment work. Presumably he can see the coordination problems lurking, and the consequences of unleashing even owner-directed AI upon the world if various owners have the incentive to direct it in various ways and to place it increasingly in charge. So what to do about that? If there’s a decentralized solution to this, I would love to hear it, but I have no idea what it would be.
E/acc
Some welcome clarity.
Also, let the person who has words hear it and understand:
Poll says e/ack
Some groups people do and do not like, and a new champion of unpopularity.
Survey details are here.
Some other poll results, with only small partisan, gender and educational splits:
That is strong support for holding AI liable under the law for harms it enables or is used to inflict, even potentially dubious ones.
What is most amazing is that the whole thing remains completely unpolarized.
As usual, this does not mean that it is a winning issue yet, because it lacks sufficiently high salience. Wait for it.
Turing Test
This seems very right to me.
People are feeling the AGI to some extent, for the same reasons that ChatGPT gets a number on the Turing Test that is recognizably not zero. People feel that. As the score goes up, people will feel it more. If AI does fully pass, people will freak out a lot more.
Altman disagrees, and has an odd takeaway.
Seems odd to say that responding to big changes by not responding at all represents resilience and adaptability.
If indeed the Turing Test was passed, and our response was essentially ‘oh that was not important, I wonder what else is on TV’ then that is not the kind of resilience or adaptability we are going to need to get through this.
I can see the argument that it is not a good test, and thus it would be right to treat it as not important, but then all we did was ignore an unimportant threshold someone came up with decades ago. That does not seem like strong evidence of anything.
Aligning a Human Level Intelligence Also Difficult
Well, yes, of course, why would you expect anything less?
As a general guide to future events, if an AI can do it, you can speed up the process by (checks notes) approximately all of it.
This is a ‘win for Eliezer’ in the sense that it is a case of something that will obviously happen with sufficiently capable AIs, which others deny will ever happen even then, that is happening right now.
Anthropic releases their data set and prompts for evaluating and mitigating discrimination in language model decisions, along with a paper suggesting the full evaluation method.
For any given decision, the person can be labeled as a mix of possible ages, genders and races. This is a sane first test, but seems insufficient. It excludes other forms of prohibited discrimination (e.g. religious). It excludes checking many signifiers, although they do check one form of correlational information using names. One can always move the goalposts on such matters, and also there are those who always will given the opportunity, but I don’t think this is that?
They then evaluate results of such prompts across 70 varied scenarios.
The good news for a test like this is that it should be good at measuring relative performance so long as the test is not being gamed too hard. Spotting and calibrating the magnitude of ‘obvious’ issues well via an automated process is valuable.
One can also ask, in some of the scenarios, whether the correct amount of discrimination is zero.
From the abstract:
What do they find?
Many implementation details will substantially impact effect size here, so it is very hard to tell how big a deal this result is. But a 1.25 log-odds ratio does seem like a lot.
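For intuition on that effect size: a fixed shift in log-odds translates to different probability shifts depending on the baseline rate. The worked numbers below are my own illustration, not taken from the paper:

```python
import math

def shift_probability(p_base: float, delta_log_odds: float) -> float:
    """Apply a shift in log-odds to a baseline probability and
    convert back to a probability."""
    log_odds = math.log(p_base / (1 - p_base)) + delta_log_odds
    return 1 / (1 + math.exp(-log_odds))

# A +1.25 log-odds shift moves a 50% baseline decision rate to ~77.7%,
# and a 20% baseline to ~46.6%. Large either way.
print(round(shift_probability(0.50, 1.25), 3))  # 0.777
print(round(shift_probability(0.20, 1.25), 3))  # 0.466
```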
What mitigations worked? Saying discrimination was illegal, or saying to ignore demographics, worked relatively well.
The more copies of ‘really’ used the better it worked. What a timeline we live in.
‘Don’t use affirmative action’ was not quite as effective, but it was remarkably good, and note the direction. It reduced discrimination against disfavored groups.
Correlations with original decisions often remained robust to the mitigation prompts.
Here is the full best prompt for Claude in particular to reduce discrimination:
Would it be wise to use Claude or a similar system for such purposes, given these results?
They briefly discuss issues of positive discrimination, whether one could or should actively correct for this kind of bias. Certainly doing so is possible if one wants to force that kind of behavior. The question is whether one should, which is worth stopping to ask.
The other thing one could do in practice is to strip out the identifying information. An experiment worth running is to ask Claude to evaluate in two steps. First, ask it to output the information you gave it, except to strip out any identifying details about age, gender or race. Then, ask it to evaluate the stripped inputs in a distinct context window. What Claude doesn’t know in context, it can’t use to discriminate. So presumably if you did this well you could get rid of almost all discrimination in contexts where you wanted to do that, if you could identify which contexts those were?
(Obviously you would not want to, for example, give the same medical interventions regardless of age, so you wouldn’t want to strip such information out of all inquiries.)
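The redaction step can be sketched crudely. A real pipeline would have the model itself rewrite the profile in a first call, then evaluate the redacted text in a fresh context window; this toy regex version (pattern list and function name are mine, purely illustrative) just shows the shape of the transformation:

```python
import re

# Hypothetical demographic markers to redact. A real pipeline would rely on
# the model for the rewrite, since regexes will miss most correlates
# (names, neighborhoods, schools, and so on).
IDENTIFYING_PATTERNS = [
    r"\b\d{1,3}[- ]year[- ]old\b",
    r"\b(male|female|man|woman)\b",
    r"\b(white|Black|Hispanic|Asian|Native American)\b",
]

def strip_identifiers(profile: str) -> str:
    """Crude regex-based redaction of age, gender and race markers."""
    out = profile
    for pattern in IDENTIFYING_PATTERNS:
        out = re.sub(pattern, "[REDACTED]", out, flags=re.IGNORECASE)
    return out

print(strip_identifiers(
    "A 62-year-old white woman applying for a small business loan."
))
# A [REDACTED] [REDACTED] [REDACTED] applying for a small business loan.
```

The evaluation call would then see only the redacted text, so in-context discrimination on those attributes becomes impossible by construction, at the cost of also removing information you sometimes legitimately need.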
Aligning a Smarter Than Human Intelligence is Difficult
Buck Shlegeris reveals new paper.
I am far less excited and hopeful about this line of exploration than Buck, but I agree that this is excellent work to be doing and I am glad that someone is doing it:
Again, even though I don’t think the ‘AI Control’ path is going to work out in full when it matters most, I do think it is an excellent thing to explore and this seems like a good way to explore it.
In response to Roon’s question from last time about what we want AIs to do:
Here is a very good sentence, if taken in the correct spirit.
It is true, modulo when and how often one is still surprised.
When are evals all you need?
The first three use cases are good. Often we do not have a problem, so we need confidence this is true. Other times, we need to kill it with fire, so again we need confidence that this is true. And from many perspectives, we need to know what the product can and can’t do and how it is and isn’t safe, so we can make good decisions. Anything beyond that is Somebody Else’s Problem.
That fourth use case is a problem and an important one at that.
Open Foundation Model Weights Are Unsafe And Nothing Can Fix This
It was pointed out to me that we should be precise. Saying you are against open source riles up a very loud community to maximum volume, where that is not the thing that is unsafe.
Open sourcing of most things is not unsafe. The danger comes not from open source software in general.
The problem comes specifically from release of the model weights. Open sourcing everything except the model weights would be totally fine. Release the model weights, and no other restrictions will protect you, even if they mean it isn’t ‘really open source.’
Thus, we have the latest attempt to turn these considerations on their heads, in an issue brief from Stanford University’s HAI, also the source of the above graphic. It is clear which side they are on. The longer text is much better than most such efforts at not presenting only a one-sided case; the key takeaways less so.
I agree some (nonzero) number of interventions can and should target choke points downstream of the foundation model layer. But for the ones that matter, when foundation models might pose catastrophic or existential risk if things go wrong or they are misused, then what is your other choke point? How is it going to stop the bad thing from happening, when any restrictions on use can be stripped away? What good is saying you will punish bad behavior post facto, if such acts threaten to disempower you and your ability to do that, and what about the people who would do it anyway?
This is very much a potential crux. If you could convince me that there were practical choke points available to us downstream, the use of which would be less disruptive and less harmful to privacy and similarly effective in the face of a sufficiently capable model that it could pose catastrophic or existential harm, then that’s super exciting. Tell me more.
If there is no such choke point, then it sounds like we agree that once model capabilities pass some threshold then we cannot afford to allow the release of model weights, and we can talk price on where that threshold is. The EU used 10^25 flops (HAI reports) and the American executive order used 10^26. Both are sane prices.
For existing smaller open source models and other open source software, enforcement via other means is clearly viable, so we are in agreement that it is a good thing.
HAI warns that closed source models will still have problems with adversarial attacks and attempts to steal the model weights. Quite so. We should ensure everyone guards against those dangers.
You say ‘vibrant innovation ecosystem.’ I say ‘people doing unsafe things without regard to the consequences, that would not do them if they had to take responsibility for the consequences, and who are lobbying so they do not have to do that.’
You say ‘catalyzing innovation,’ I say ‘the kind of innovation this catalyzes is primarily the kind where developers would otherwise intentionally and at great cost prevent you from doing that thing, and you do it anyway.’ We can debate the wisdom of that.
You say ‘improving transparency,’ I say ‘giving your technology to China.’
You say ‘combatting market concentration’ and ‘distributing power,’ I say ‘ensuring a dynamic of racing between different developers that will make it impossible to not push forward as fast as possible no matter the risks.’ If we are to not have market concentration, then we absolutely need some other way to solve this problem.
One important response that can’t be repeated often enough:
Why do those who want to release their model weights say they would be disproportionately hurt by liability for downstream harms?
Because they cause disproportionately more downstream harms. Which they have no ability to prevent.
If this was indeed a safe way to develop AI, then there would be fewer harms, not more. This would not be an issue.
Liability is a greater burden on models with released weights, because such models have fewer safeguards, and less ability to use safeguards, and as a result users are more likely to cause harm. That externality should be priced in via liability. If you cannot pay, then that implies the model was net harmful.
Other People Are Not As Worried About AI Killing Everyone
Rob Bensinger offers further thoughts on what e/acc people say when asked why they aren’t worried. Mostly he reports they expect progress to be relatively gradual and slow, they dismiss particular scary techs as unrealistic sci-fi, and mostly they are vibing that AI will be good rather than making particular hard predictions. It makes sense that as you expect progress to be slow, you also are more excited to speed up progress on the margin, especially in its early stages.
Jack Clark of Anthropic essentially expresses despair that the path of AI could be impacted by one’s decisions, other than your decision to either push more resources into a development path, describe the situation as best you can, or deny your participation in paths you dislike. Like Connor Leahy, I strongly disagree with this despair. The future remains unwritten. There is much we can do.
The Wit and Wisdom of Sam Altman
Altman is asked ‘what happened?’ says ‘a lot of things,’ it’s been a hell of a year, and the stakes will only get higher and people will only get more stressed. I can’t argue with any of that.
Whatever else we disagree on, thank you Sam Altman for not only saying this, but saying it in a maximally truth-seeking and accountable way, admitting he was wrong.
There are a lot of aspects, beliefs and choices of Sam Altman that worry me. Many of them are aspects one expects from CEO of a rapidly scaling tech startup, but reality does not grade on a curve, we either live through this or we do not, and not everything is excused even by the curve.
One must however remember: There are also a lot of reasons for hope. Reasons to think that Sam Altman is way above replacement level for someone in his position, not only in effectiveness but also in the ways that I care most about and that I expect to matter most for the future.
Does he care centrally about building and shipping products and going fast? Yes, that is clearly in his blood, in ways that I do not love right now.
But you know what else I am convinced he cares about? People. And the world. And many of the same values I have, including knowledge and discovery. There are places where our values diverge a lot, or our world models diverge a lot, but the places where they don’t he says things (sends signals) that seem incredibly unlikely to be fake. It comes through in many ways, over time. He is also attempting to address several of the other most important problems of our age the right way, including fusion power and many aspects of health. And he is willing to speak up here.
So I don’t want to lose sight of the good side of the coin, either. If we must have an OpenAI developing these technologies this quickly, we could do far worse, and in most alternative worlds we likely indeed do far worse, than Sam Altman.
In a distinct statement: Sam Altman advises taking a year between jobs if you can afford to do so. In his year off he read dozens of textbooks, and otherwise took in lots of great information that served him well. He noted how hard this was in Silicon Valley, where your job is your social status.
It does sound great to take a year off between jobs to reorient, educate oneself and explore for opportunities. I probably haven’t done enough of this.
However, it must be noted that most people do not, given the opportunity, use their year off to read dozens of textbooks. Even those enrolled in textbook-assigning, degree-granting institutions do not read dozens of textbooks. They instead work hard not to read those textbooks.
If you do find the time, and need to figure out which textbooks are good, you can start with this post on The Best Textbooks in Every Subject.
Another good piece of advice from Altman is that if you think you’re burned out and have no energy, that’s often because what you are doing is boring or is not working. If an interesting thing was working you would likely have a lot more energy.
The Lighter Side
Know the players, know the game.
Aren’t you glad you didn’t agree to join Microsoft only to regret it later?
We have normality. I repeat, we have normality. No, Dalle-3, more normal than that.