In previous studies, we consistently find an equalizing effect from use of LLMs. High performers improve, but low performers improve a lot more.
I was just talking to a math teacher last night about something similar. He was talking about how COVID really hurt math learning and the lowest performers aren't recovering from it (doing even worse than pre-COVID low-performers). I had been talking to him about how I use ChatGPT to learn things (and find it particularly helpful for math), so I asked if he thought this kind of thing might help narrow the gap again.
His answer was that he thinks it will increase the gap, since the kids who would actually sit down and ask an AI questions and drill down into anything they're confused about are already doing fine (better than pre-COVID since they don't have to wait for the slow kids as much), and the kids who are having trouble wouldn't use it. Also, the benefit is that AI can immediately answer questions about the part that you, personally, are confused about or interested in, so neither of us think it would be that helpful to try to force kids to use it if they don't want to.
(Schools also have annoying but reasonable concerns that if you tell kids to use the magic machine that can either help them learn faster or just do their homework for them, many of the kids will not use the machine in the way you're hoping for)
The book Crystal Society is now available in audiobook form, with AI voices playing various parts, here on YouTube and here on Spotify.
I found the AI voices to be really surprisingly good. I love the book, and highly recommend the series. Even if you've read it before though, I think it's worth at least listening to a few minutes just to see if you are also impressed by the voice acting.
in 2024 we will presumably see 4-level AI out of the bottle.
Relevant market: GPT4 or better model available for download by EOY 2024? | Manifold
Help you write an award-winning science fiction novel? Journalism professor Shen Yang says yes. I do not know details, but my guess is that Yang was central to the actual book and the AI should not get so much credit.
It is 6000 Chinese characters long, cut down from a "draft of 43,000 characters generated in just three hours with 66 prompts."
It looks like it won a "level-two" prize in the "Youth Science Education and Science Fiction Competition". IDK if that means the competition was for Chinese YA sci-fi, but it kind of feels like it. 3/6 judges approved the work. One judge recognized it was written by AI and didn't vote for it. Another judge was informed according to the article, but not the Wikipedia page. The organizer said it wasn't bad but didn't develop well and wouldn't have met the standards for publication. He plans to allow AI-generated content in 2024.
From Wikipedia:
The novel was among 18 submissions that won the level-two prize at the Fifth Jiangsu Youth Science Education and Science Fiction Competition (第五届江苏省青年科普科幻作品大赛). The contest was restricted to participants between the age of 14 and 45 but did not forbid entries generated by AI. One of its organizers reached out to Shen after finding out that the professor had been experimenting with writing science fiction using AI. The judges were not told about the novel's origin at the time. Three of them, out of the six, approved the work. One judge, who had worked with AI models before, recognized that the novel was written by AI and criticized the work for lacking emotional appeal. The organizer who had contacted Shen said the novel's introduction was not bad but the story did not develop well. It would not meet standards for publication. However, he still plans to allow AI-generated submissions in 2024.[3][1]
From the article:
Among the judges, only one was notified that Shen had used AI in his work, according to the report. But another judge, who had been exploring AI content creation, recognised that Shen’s work was AI-generated. The judge said he did not vote for the submission because it was not up to standard and “lacked emotion”.
The best argument against scaling working, from what I have seen, is the data bottleneck
A $10B-$100B training run could maybe employ about 1e28 FLOPs with about 1M GPUs, it's not feasible to get much more on short notice. With training efficiency improvements, this might translate into 1e29 FLOPs of effective compute.
The scaling laws for use of repeated data
estimate that repeating data 16 times is still useful. Chinchilla scaling laws estimate that a dense transformer with X parameters should use 20X tokens to make the best use of the requisite 6*X*20X FLOPs of compute. Notice that X is squared in the FLOPs estimate, so repeating data 16 times means ability to make use of 256 times more FLOPs. Crunching the numbers, I get 2e29 FLOPs of compute for 50T tokens of training data (with even more effective compute). There's a filtered and deduplicated CommonCrawl dataset RedPajama-Data-v2 with 30T tokens.
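For concreteness, here is a quick back-of-the-envelope check of that arithmetic, purely a sketch under the Chinchilla-style assumptions quoted above (tokens = 20 × params, FLOPs ≈ 6 × params × tokens, 16 useful epochs over the data); no numbers beyond those already stated.

```python
# Back-of-the-envelope check of the figures above, assuming Chinchilla-style
# compute-optimal scaling and that repeating data 16 times is still useful.

unique_tokens = 50e12                        # 50T tokens of training data
epochs = 16                                  # data repeated 16 times
effective_tokens = unique_tokens * epochs    # 8e14 tokens seen during training

params = effective_tokens / 20               # Chinchilla-optimal parameter count
flops = 6 * params * effective_tokens        # total training FLOPs

print(f"params ~ {params:.1e}")              # ~ 4.0e13 parameters
print(f"FLOPs  ~ {flops:.1e}")               # ~ 1.9e29 FLOPs, i.e. roughly 2e29
```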
So we are good on data for the next few years. It's not high quality data, but it's currently unknown if it will nonetheless suffice. GPT-4 doesn't look like it can be scaffolded into something competent enough to be transformative. But going through another 3-4 OOMs of compute after GPT-4 is a new experiment that can reasonably be expected to yield either result.
Gallabytes: ... This is a pretty big factor in why I expect some kind of diffusion to eventually overtake AR on language modeling too. We don’t actually care about the exact words anywhere near as much as we care about the ideas they code for, and if we can work at that level diffusion will win.
It will probably keep being worse on perplexity while being noticeably smarter, less glitchy, and easier to control.
Sherjil Ozair: Counterpoint: one-word change can completely change the meaning of a sentence, unlike one pixel in an image.
I think this is an interesting point made by Gallabytes, and that Sherjil misses the heart of it here. Adding noise to a sentence must be done in the semantic-space, not the token-space. You shift the meaning and implications of the sentence subtly by switching words for synonyms or near synonyms, preserving most of the meaning of the sentence but changing its 'flavor'. I expect this subtle noising of existing text data would greatly increase the value we are able to extract from existing datasets, by making meaning more salient to the algorithm than irrelevant statistical specifics of tokens. The hard part is ensuring that the word substitution in fact moves only a small distance in semantic-space.
yeah I basically think you need to construct the semantic space for this to work, and haven't seen much work on that front from language modeling researchers.
drives me kinda nuts because I don't think it would actually be that hard to do, and the benefits might be pretty substantial.
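As a minimal sketch of the kind of "semantic-space" noising described in this exchange: swap a single word for a near-synonym, and keep the swap only if the full-sentence embedding barely moves. The model name, the synonym source (WordNet), and the distance threshold are all illustrative assumptions on my part, not anything proposed in the thread.

```python
# Sketch: word-level noising constrained to stay close in sentence-embedding space.
# Assumes the sentence-transformers and nltk packages (with WordNet data) are installed.
import random
import numpy as np
from nltk.corpus import wordnet
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative choice of encoder

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def semantic_noise(sentence: str, max_shift: float = 0.05) -> str:
    words = sentence.split()
    base = model.encode(sentence)
    i = random.randrange(len(words))
    # Candidate near-synonyms for one randomly chosen word.
    candidates = {lemma.name().replace("_", " ")
                  for syn in wordnet.synsets(words[i])
                  for lemma in syn.lemmas()} - {words[i]}
    for cand in candidates:
        noisy = " ".join(words[:i] + [cand] + words[i + 1:])
        # Accept the swap only if the sentence stays close in embedding space.
        if 1 - cosine(base, model.encode(noisy)) < max_shift:
            return noisy
    return sentence  # no acceptable swap found; leave the sentence unchanged
```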
In practice, one can think of this as ChatGPT committing copyright infringement if and only if everyone else is committing copyright infringement on that exact same passage, making it so often duplicated that it learned this is something people reproduce.
Definitely. Currently, I am of the opinion that there's nothing LLMs do with their training data that is fundamentally much different than what we normally describe with the word "reading," when it happens in a human mind instead of an LLM. IDK if you could convince a court of that, but if you could it would seem to be a pretty strong defense against copyright claims.
If you were asked to write new copyright laws which apply only to AI, what laws would you write? Specifically, would you allow AI developers to freely train on copyrighted data, or would you give owners of copyrighted data the right to sell access to their data?
Here are two non-comprehensive arguments in favor of restricting training on copyrighted outputs. Briefly, this policy would (a) restrict the supply of training data and therefore lengthen AI timelines, and (b) redistribute some of the profits of AI automation to workers whose labor will be displaced by AI automation. I'd also suggest that the policy should be evaluated on its consequences, rather than its adherence to some essentialist notion of fair use, or whether this policy designed to apply to AIs would be a good policy if applied to humans.
(a) restrict the supply of training data and therefore lengthen AI timelines, and (b) redistribute some of the profits of AI automation to workers whose labor will be displaced by AI automation
It seems fine to create a law with goal (a) in mind, but then we shouldn't call it copyright law, since it is not designed to protect intellectual property. Maybe this is common practice and people write laws pretending to target one thing while actually targeting something else all the time, in which case I would be okay with it. Otherwise, doing so would be dishonest and cause our legal system to be less legible.
I think it’s pretty common and widely accepted that people support laws for their second-order, indirect consequences rather than their most obvious first-order consequences. Some examples:
These aren’t necessarily perfect analogies, but I think they suggest that there’s no general norm against supporting policies for their indirect consequences. Instead, it’s often healthy when people with different motivations come together and form a political coalition to support a shared policy goal.
I think these examples may not illustrate what you intend. They seem to me like examples of governments justifying policies based on second-order effects, while actually doing things for their first-order effects.
Taxing addictive substances like tobacco and alcohol makes sense from a government's perspective precisely because they have low elasticity of demand (ie, the taxes won't reduce consumption much). A special tax on something that people will readily stop consuming when the price rises won't raise much money. Also, taxing items with low elasticity of demand is more "economically efficient", in the technical sense that what is consumed doesn't change much, with the tax being close to a pure transfer of wealth. (See also gasoline taxes.)
Government spending is often corrupt, sometimes in the legal sense, and more often in the political sense of rewarding supporters for no good policy reason. This corruption is more easily justified when mumbo-jumbo economic beliefs say it's for the common good.
The first-order effect of mandatory education is that young people are confined to school buildings during the day, not that they learn anything inherently valuable. This seems like it's the primary intended effect. The idea that government schooling is better for economic growth than whatever non-mandatory activities kids/parents would otherwise choose seems dubious, though of course it's a good talking point when justifying the policy.
So I guess it depends on what you mean by "people support". These second-order justifications presumably appeal to some people, or they wouldn't be worthwhile propaganda. But I'm not convinced that they are the reasons more powerful people support these policies.
or only caring about one’s own opinion.
Or the people whose opinions you care about predictably agreeing whether a work is GOOD. I can see why someone might only care about the opinions of a small group who they know, respect and trust.
The New York Times has thrown down the gauntlet, suing OpenAI and Microsoft for copyright infringement. Others are complaining about recreated images in the otherwise deeply awesome MidJourney v6.0. As is usually the case, the critics misunderstand the technology involved, complain about infringements that inflict no substantial damages, engineer many of the complaints being made and make cringeworthy accusations.
That does not, however, mean that The New York Times case is baseless. There are still very real copyright issues at the heart of Generative AI. This suit is a serious effort by top lawyers. It has strong legal merit. They are likely to win if the case is not settled.
Table of Contents
Language Models Offer Mundane Utility
A game called Thus Spoke Zaranova where you have to pretend to be an AI, Tweet thread, design notes. Premise is of course rather silly, but is the game interesting or fun? I do not know.
In previous studies, we consistently find an equalizing effect from use of LLMs. High performers improve, but low performers improve a lot more.
Now we have a study that finds the opposite effect. Entrepreneurs in Kenya were given AI ‘mentor’ access via WhatsApp. High performers benefited, low performers were harmed.
The paper is sweet. It is alas short on concrete examples, so one cannot search for patterns and check on various hunches.
One hunch is that higher performing entrepreneurs know what the important questions and important details are, and also they face genuinely easier questions at the margin. Operating a business that is not going well is way harder than operating one that is working, the flip side being you could have massive low-hanging fruit. But trouble begets its own forms of trouble. And I suspect that most people in such situations do not turn in writing to others for help and get it, in ways that would make it into one’s training set. For them context becomes more important.
Also, there is a background level of skill required to understand what information is important to include, and to identify which parts of the LLM’s answer are likely to be accurate and useful to you. When the AI is giving you advice, you need to be able to tell when it is telling you to shoot yourself in the foot or focus on the wrong thing, or is flat out making things up.
So I suspect the task graph is not telling the central story. As they say, more research is needed.
Another clear contrast is that here the AI is being used as a mentor, to learn.
Whereas in other tasks, the AI is being used as more of an assistant and source of output.
An alternative source that helps do your work is an equalizing force. The consultants use GPT-4 to write a draft of the report. If you suck at writing drafts, that’s very helpful. If you are good at it, it is not as helpful.
A source of knowledge is different. Being able to be mentored, to learn, is a skill.
Help you write an award-winning science fiction novel? Journalism professor Shen Yang says yes. I do not know details, but my guess is that Yang was central to the actual book and the AI should not get so much credit.
Draft a law?
Justin Amash has suggested a rule that before you pass a bill you have to read the bill out loud. That might help.
Mostly in such cases, the ‘real’ bill is one sentence or paragraph of intent. The rest is implementation and required technical language. In theory it should be fine to use a program to translate from one to the other, and also back again. But this is not something you want to sometimes get wrong. So, for now at least? Check your work.
Emmet Shear keeps looking.
Those were the two he found promising. Adam Smith suggests prompts here for proper formatting.
If I was going to do more than 15 minutes of rambling, I would want not only repetitions removed and proper punctuation, but actual sorting of everything and logical interpretation. Otherwise does this really work?
What are the actually good tools? This is one of many places where it seems like a good tool would be valuable, but it has to be very good to be worth anything at all.
GPT-4 Real This Time
What do people want from GPT in 2024? Altman asked and got six thousand answers, he listed the top responses, which I’ve formatted as a numbered list.
I am surprised ‘price cuts’ did not make the most wanted list.
Number ten here is interesting. Quite the sign of a Big Tech company in the making. If you sign in with OpenAI, what can then be integrated into the website? Can you give it permission to use your API key to enhance your experience? Can they mediate trusted data in both directions? Could get very interesting.
My number one mundane request isn’t on the list either. It is for better probabilities, estimation and guessing. GPT-4 is notoriously reluctant to engage in such activity right now even when explicitly asked to do so, which is super frustrating. Perhaps a GPT or a good prompt would do it, though.
I very much agree that the browsing could use an update. I would also like better GPTs, better reasoning (for mundane utility tasks, and up to a point) and especially control over degree of wokeness/behavior. Video personalization and voice mode aren’t my areas, but sure, why not, and maybe I’d use them if they improved. And of course everyone loves higher rate limits, although I’ve never actually hit the current one.
Then there are the other three requests.
You think you want AGI in 2024? Let me assure you. You do not want AGI in 2024.
Perhaps you will want AGI later, once we are ready. We are not ready.
Altman did not ask for patience on GPT-5. I expect to be fine with GPT-5 once it is properly tested, and I expect life to get better in many ways once that happens. But it definitely makes me nervous.
Then there’s Open Source. Of course Twitter is going to scream for it. They want OpenAI to give away its lead, believe open source does no wrong and don’t understand the dangers. Luckily I am very confident Altman knows better.
Fun with Image Generation
MidJourney 6.0 is clearly much better at following directions.
Eliezer Yudkowsky notes its progress at following specific prompts. He still plans to wait a few months for the next upgrade before really going to town, because it’s not quite where he’d like it to be, so even if it would work now, why rush it?
There are also lots of examples of very cool pictures with exquisitely rich detail. It’s pretty great. It is especially great at particular famous people.
It also rewards a different prompting style. Before you wanted a lot of keywords separated by commas. Now you want to use English.
The output still is not quite right on other details, but two out of four Is are perfect.
Some predictions worth noting.
Some complaints along similar lines are that MJ 6.0 is perhaps a little too good at following directions, recreating fictional worlds and replicating particular people and objects…
Note that these are not exact copies. They are very clear echoes, only small variants. They are not replicas. There are a number of examples, all of which are iconic. What is happening?
As discussed above, I believe this is not overfitting. It is fitting. It is a highly reasonable thing for a powerful model to do in response to exactly this request.
It is only overfitting if you see these particular things bleed into answers out of distribution, where no one asked for or wanted them. I have not seen any reports of ‘it won’t stop doing the iconic thing when I ask for something else.’
This is not something that v6.0 will do for every picture. It will only (as I understand it) do this for those that ended up copied over and over across the internet, such as movie promotional pictures or classic shots. The iconic. Then you have to intentionally ask for the thing, rather than for something new. The prompts are simple exactly because the images are so iconic.
In which case, yes, most cases like this that it is working with do look very similar to the original, so the results will look like the original too. It seems likely MidJourney will need to actively do something to intercept or alter such prompts.
If you ask for something else, you’ll get something else.
One must notice that this does not actually matter. Why would you use an image generator to generate a near copy of an image that you can easily copy off of the internet, a capability you will have in 100% of such cases? Why would it matter if you did? Don’t you have anything better to do?
That is not to dismiss or minimize the general issue of copyright infringement by image models. Under what conditions should an image model be allowed to train off of images you do not own? Who should have veto power? What if anything do you owe if you do that? How do we compensate artists? What restrictions should we place if any on creation or use of AI generations?
Those and others are questions first our courts, and ultimately our society, must answer. The ability to elicit near copies of iconic movie stills should not even make the issue list.
Copyright Confrontation
The New York Times, once again imitating Sarah Silverman, finally officially sues Microsoft and OpenAI for copyright infringement. Somehow they took their paywall down for this article. They are the first major company to do this. The Times expects the case to go to the Supreme Court.
That seems like the right procedure. The two Worthy Opponents can battle it out. Whatever you think should be the copyright law, we need to settle what the copyright law actually says, right now. Then we can decide whether to change it to something else.
So what is the NYT’s case?
The right plaintiff. The right argument. Much better to say ‘your outputs copy us’ than ‘your inputs come from us.’
How did the Times generate that output?
I quickly looked. The complaint does not say exactly how they did it, or how cherry-picked this response was. In general, how do you get a verbatim (or very close) copy of a Times article? You explicitly ask for it.
If you can get normal NYT passages this closely copied without any reference to The New York Times, without any request to quote an article, then I would be pretty surprised.
In a handful of famous cases, there seems to be an exception. Exactly as in the MidJourney examples, why are we seeing NYT article text almost exactly (but not quite) copied anyway in some cases? Because it is iconic.
In practice, one can think of this as ChatGPT committing copyright infringement if and only if everyone else is committing copyright infringement on that exact same passage, making it so often duplicated that it learned this is something people reproduce.
This presumes that The New York Times is in a settling mood and will accept a reasonable price, in a way that sets a precedent that OpenAI can afford to pay out for everyone else. If that was true, then why were things allowed to get this far? So I presume that the two sides are pretty far apart. Or that NYT is after blood, not cash.
I think this is not all that spicy a hypothesis. Seems rather likely. I am sure, given they paid Politico, that OpenAI would settle with the New York Times if there was a reasonable offer on the table. Why take the firm risk? Why not be in business with the Times and set a standard? Because NYT is looking for the moon.
I would be careful if I was the Times. Their reputation and that of journalism and legacy media in general is not what it once was. ChatGPT provides a lot more value to more people than it is taking away from a newspaper. I am also amused by the Streisand Effect here, where Toner’s paper is now being quoted exactly because it was used as part of a boardroom fight.
What harm is being done to the New York Times? Yes, there were times when ChatGPT would pull entire NYT articles if you knew the secret code. But those codes become invalid if a lot of people use them, so the damage will always be limited.
The flip side is that the public is very anti-AI, and most people aren’t using ChatGPT.
Seems pretty flimsy to me. Yes, hallucinations happen, but that’s not copyright, and I find it hard to believe the NYT reputation is in any danger here. They are welcome to try I suppose but I would have left this out.
Especially because, well, here’s how you get Bing to say that…
In general, I find it unwise to combine good arguments with bad arguments.
As usual, open source people think they should not have to pay for things or obey the rules. They believe that they are special. That somehow the rules do not (or should not) apply to them. Obviously, they are mistaken.
I get the claim that it is not ‘stealing’ because it is not a rival good and the marginal cost is zero. In a better world, things like The New York Times would be freely available to all, and would be supported via some other mechanism, with copyright only existing to prevent other types of infringement. We do not live in that world.
What is ultimately the right policy? That depends on what you care about. I see good reasons to advocate for mandatory licensing at reasonable prices the way we do it in music. I see good reasons to say it is up to the copyright holder to name their price. I even see good reasons for setting the price to zero, although I think that is clearly untenable for our society if applied at scale. We need a way to support creators.
Some disagree. Other creators, this creator (of open source software) disregards.
I mean… yes?
Google said ‘I am going to take your intellectual property and give it away for free on the internet without your permission.’
That is… not okay? Quite obviously, completely, not okay?
No, it is not ‘free money’ if the price is universal free access to your product? That is a lot more like ‘we sell the rights to all our products, forever’?
The world would be a better place if Google were to pay the publishers and writers so much money that they were happy to make the deal, and then all the books were available online for free. That does not mean that Google’s offer was high enough.
You need a way to support creators. You need to respect property.
Ideally we find a way that works for all, both for books and for data.
Early action says if this does not settle then NYT will likely win. I think that’s the wrong question, though? What matters is the price.
Deepfaketown and Botpocalypse Soon
Defense, on the offensive?
An objection letter could scarcely be more self-damning. Why does it matter what search method identified the claimed plagiarism? Either the passage is the same, or it is not. If ChatGPT is hallucinating, check both sources and prove it, and that will be the end of that. If both check out, what are you complaining about? That you got caught?
That’s the thing. If you hallucinate 50% of the time, but you find the answer 50% of the time, and I can verify which is which and take the time to do that, then that is a highly useful search method.
The real existential risk from AI, power might decide, is that people might be able to discover all the crime power has been doing and all the lies it has been telling. In which case, well, better put a stop to that little operation.
Also, does this give anyone else an idea?
If, as many of her defenders claim, Gay’s offenses are a case of ‘everyone does it all the time’ then we have the technology to know, so let’s test that theory on various top scholars.
There are three possibilities.
If this is not actually something that ‘happens to the best of them’ then that should be conclusive evidence. Presumably it would be insane to then allow her to remain President of Harvard.
If this is actually something that ‘happens to the best of them,’ if indeed everyone is doing it, then one must ask, is this the result of inevitable coincidences and an isolated demand for crossing every T and dotting every I, or is it that much or most of academia is constantly committing plagiarism?
If we decide this is not the true plagiarism and is essentially fine, then we should update the rules to reflect this, including for students, make it very clear where the line is, and then decide how to deal with those who went over the new line.
If we decide that this is indeed the true plagiarism, and it is not fine, then we will need some form of Truth and Reconciliation. Academia will require deep reform.
Whether or not we have the technology now, we will have it soon. The truth is out there, and the truth will be getting out.
More coverage of the fact that the President of Harvard seems to have done all the plagiarism, and that this fact is now known yet she remains President of Harvard, will likely be available in the next Childhood and Education roundup, likely along with a reprise of this section.
Going Nuclear
Last week I noted with some glee that Microsoft was using a custom LLM to help it write regulatory documents for its nuclear power plants. I thought it was great.
Andrew Critch and Connor Leahy do not share my perspective.
Yes. No. The process in question is bullshit paperwork.
My position remains closer to Alyssa’s here:
There is a big difference between ‘AI is used to run the nuclear power plant’ and ‘AI is used to file tens of thousands of pages of unnecessary and useless paperwork with the NRC.’ I believe this is the second one, not the first one.
If this is indeed a huge disaster? Then that will be a valuable lesson, hopefully learned while we still have time to right the larger ship.
In Other AI News
Nancy Pelosi buys call options on $5 million of Nvidia, expiration 12/20/24, her largest purchase in years. You know what to do.
Apple plans to enter the AI game, and wants to run the AI directly on iPhones rather than on the cloud, offering a paper called ‘LLM in a flash: Efficient LLM Inference with Limited Memory.’
As usual, my first thought is ‘why the hell would you publish that and let Google and Microsoft also have it, rather than use it.’
Getting a reasonable small distilled model, that will do ‘good enough’ practical inference relative to competitors, seems relatively easy. The hard part is making it do things that customers most value. That is much more of Apple’s department, so they definitely have a shot. One handicap is that we can be confident Apple will absolutely, positively not be having any fun. They hate fun more than Stanford.
Scott Sumner analyzes geography of the distribution of AI talent. The talent tends to migrate towards America despite our best efforts. It is always odd to pick what ‘talent’ means in such contexts. What is the counterfactual that determines your talent level?
Quiet Speculations
Richard Ngo (OpenAI) points out that of course some AIs in the future will act as agents (like humans), some will act as tools (like software) and there will also be AI superorganisms, where many AIs run in parallel.
Richard Ngo spars with David Deutsch and others on Twitter over how to think about LLMs and their cognitive abilities.
This was another commentary on Ngo’s original statement:
Arnold Kling speculates on future mundane utility, finds robotics, mentors, animations. I think this is keeping things too grounded.
Dwarkesh Patel ponders how likely it is scaling LLMs will lead to transformational AI, framed as a debate. He puts it at 70% that we get AGI by 2040 via straightforward scaling plus various algorithmic and hardware advances, about 30% that this can’t get there, which leaves 0% for it being able to get there but not by 2040. He notes things he doesn’t know about would likely shorten his timelines, which implies his timelines should be a little shorter via conservation of expected evidence.
The best argument for scaling working is that scaling so far (at least until, perhaps, very recently) has matched predictions of the ‘scaling will work’ hypothesis scarily well, whereas the skeptics mostly did not expect GPT-4.
The best argument against scaling working, from what I have seen, is the data bottleneck, both in terms of ‘you will run out of data and synthetic data might not work’ and ‘you will run out of data because your data is increasingly duplicative, and your synthetic data will be as well.’ Or perhaps it’s the ‘something genuinely new is difficult’ and yes Dwarkesh notes the FunSearch mathematical discovery thing from last week but I am not convinced it counts here.
He notes the question of Gemini coming in only at GPT-4 level. I also think it’s worth noting the host of GPT-3.5 level models stalling out there. And no, basically no one predicted it in advance, but there is indeed a certain kind of logic to what GPT-4-level models can and cannot do.
I think Dwarkesh’s 70% probability estimate is highly reasonable, if we count various forms of scaffolding and algorithmic innovations. It is in the range where I do not have the urge to bet on either side. Note that even if the 30% comes in, that does not mean that we can’t build AGI another way.
It is a crux for some people’s timelines. Not everyone’s, though.
I would bet against that 2-6 year timeline heavily if we knew scaling was not available.
I would consider betting against it without that assurance, but that would depend on the odds.
Yes, new knowledge is created by inference from observations. That does not mean that the ‘create new knowledge’ complaint is not pointing at a real thing.
The UN Reports
NOTE: Most of you should skip or at most skim this section.
That comes at point #70 in their full report.
Alas, no, this does not reflect them actually understanding existential risk at all.
Mostly they are instead the UN doing and pushing UN things:
That looks like seven steps to still not doing anything with teeth. Classic UN. They think that if they say what they would like the norms to be, people would follow them. And that policy needs to be ‘anchored in the UN charter, International Human Rights Law, and the Sustainable Development Goals.’
They emphasize that we should prioritize ‘universal buy-in.’ There is one and only one way your AI policy gets universal buy-in, and I do not think the UN would like it.
The UN’s entire existence (and all of human history) falsifies these and their other hypotheses.
That does not mean they cannot wishcast for worthwhile or harmful things, and maybe that would matter on the margin, so I checked out their full report.
It is what you would expect. Right off the bat they are clearly more worried about distributional effects than effects. Page three jumps to ‘the critical intersection of climate change and AI opportunity,’ which of course ignores AI’s important potential future impacts in both directions on the problem of climate change.
They are ahead of many, but clearly do not know what is about to hit them:
Constantly people talk about how it is a difficult time. Yet would not any time earlier than now have been clearly worse, and has this not been true for every year since 1945? And of course, if AI does arrive for real in a positive way (or a negative way, I suppose, for different reasons), the Global South will rapidly have far fewer worries about electricity or broadband access. Already they have fewer such concerns every year. Even if AI does not arrive for real, mobile phones work everywhere, and I continue to expect AI to reduce effective consumption inequality, in addition to that inequality having been rapidly falling already for a long time.
Here is how they view the risks, to be fair this is risks ‘today’ rather than future risks, but still it is a reminder of how the UN thinks and what it believes is important.
The entire section makes it continuously clear they do not get it, that they see AI as a tool or technology like any other and are rolling out the same stuff as always.
So close. And yet, wow, so far. Zero mention of existential risks in the risks section.
In other so close and yet so far statement news:
Even in their wheelhouse of dreams, they fall short. Cannot rule out, really?
Tragedy as comedy. If only the risks were indeed clear to them, alas:
The Week in Audio
Liron Shapira on Theo Jaffee.
Nathan Labenz on 80,000 Hours. This was recorded a few weeks prior and discusses the OpenAI situation a lot, so a lot of it already looks a little dated.
Paul Bloom on EconTalk ostensibly asks ‘Can AI be Moral?’ and they end up spending most of their time instead asking whether and how humans can be moral. They do not discuss how one would make an AI moral, or whether we have the ability to do that, instead asking: If you did have the ability to get an AGI to adopt whatever morality you wanted, what would you want to do?
The book Crystal Society is now available in audiobook form, with AI voices playing various parts, here on YouTube and here on Spotify.
Rhetorical Innovation
I believe that the people who say ‘this is our religion’ are doing a religion or cult, and the ones that don’t say that probably aren’t?
Your periodic reminder and attempted explanation that the Eliezer Yudkowsky position is not that he or any of his allies need to be in charge, but rather that it needs to be one or more humans in charge rather than an AI being in charge. He believes, and I agree with him, that many humans including the majority of those we argue against all the time have preferences such that they would give us a universe with value and that provides existing humans with value.
There are definitely humans who, if entrusted with such power, would get us all killed or otherwise create a universe that I thought lacked value. Andrew Critch has estimated that 10% of AI researchers actively support AI replacing us. Some advocate for it on the internet. Others actively want to wipe out humanity for various reasons, or have various other alien and crazy views. Keep such people away from the levers of power.
But I believe that most people would, either collectively or individually, if given the power, choose – not only for themselves but for humanity as a whole – good things over bad things, existence over non-existence, humanity over AI, life over death, freedom over slavery, happiness over suffering.
We would likely disagree a lot on what is the good, but I would expect their view of the good to still be good. There are impossibly difficult problems to navigate here. From our current situation, what matters most is ensuring that it is people who get to make those choices, rather than it being left to AI or various runaway dynamics that are out of our control. I am highly flexible on exactly which humans are choosing.
An attempted Eliezer Yudkowsky metaphor that didn’t really work due to its details.
What is the true division?
The AI situation is different in the sense that most of the ‘everyone else’ has not yet paid any attention and don’t understand the problem, whereas many of those usually on the technology side have indeed noticed that this situation is different.
AI With Open Model Weights Is Unsafe and Nothing Can Fix This
A central problem with discussions that involve the words ‘open source’ is that advocates of open source are usually completely unwilling to engage with the concept that their approach could ever have a downside.
Alas, it does have downsides, such as the inability to ever build meaningful safety into anything with open model weights.
The good news is that open source work continues to be very much behind the major labs, does not seem to be innovating beyond inference compute efficiency, and useful AGI development looks to require massive amounts of compute in ways that make it possible (at least in theory) to do it safely.
[many examples of such copium throughout the threads.]
Roon emphasizes, I believe correctly, that open source efforts should be concentrating on the tasks where they have comparative advantage, which is post-training innovations. All this focus on training slightly more capable base models is mostly wasted.
Instead of trying to force new open models to happen, such builders should be figuring out how to extract mundane utility from what does exist, then apply their techniques to new models as they emerge over time. The ability to give the user that which the big labs don’t want to allow, or to give the user something specialized that the big labs do not have incentive to provide, is the big opportunity.
Where are all the highly specialized LLMs? Where are the improved fine-tuning techniques that let us create one for ourselves in quirky fashion? Where are the game and VR experiences that don’t suck? Build something unique that people want to use, that meets what customers need. You know this in other contexts. It is The Way.
Ethan Mollick (from earlier this month) talked about the widespread availability of 3.5-level models as ‘the AI genie is out of the bottle.’
I would say that 3.5-level AI is out of the bottle, and that in 2024 we will presumably see 4-level AI out of the bottle. The important genies will still remain unreleased, and the usual presence of superior proprietary models should make us worry a lot less.
I do my best not to quote Yann LeCun, but do note he was interviewed in Wired saying much that is not so. He also seems to have painted himself into a corner:
There are a few positions one can take. Here is a potential taxonomy. What did I miss?
It sounds like LeCun is trying to live in 1a(ii)? The terrorists with open model weights are not an issue because they need access to a lot of GPUs that no one can detect.
But that means far more monitoring of compute and GPUs and pretty much everything than exists today, in a far more stringent way than trying to prevent model training, with the threshold shrinking over time (why do you need 2k GPUs anyway even now?). What is the plan, sir?
Daniel’s point about ‘Democracy’ is also well put. Either you are willing to put the power of powerful AI into the hands of whoever wants it without controlling what they do with it, allowing them to remove all controls, or you are not. I too thought the whole point of open source and open model weights was indeed that anyone could do it? That you needed a lot less resources and talent to do things?
The atom blaster points both ways. You either let people have one, or you don’t.
Aligning a Human Level Intelligence is Still Difficult
Have you tried telling AI companies ‘or else?’
I haven’t exactly noticed a torrent of utility coming out of China. He does link us to this paper.
Or, how about we actually say what is wrong with your answer, might be useful?
That is obviously a good idea. It has much higher bandwidth, and avoids a bunch of problems with binary comparisons. The question is how to do it. As usual with Chinese AI, I am skeptical that their benchmark results represent anything.
The implementation to try now seems obvious, if my understanding of the available training affordances is correct, and it does not seem to be what the paper does. As usual, this is a case of ‘I have no idea to what extent this has ever been tried or whether it would work, but it seems wise not to specify regardless.’
Please Speak Directly Into the Microphone
Daniel Faggella speaks directly into the microphone, advocates destruction of all value in the universe as measured by what I (and hopefully you) value.
The Wit and Wisdom of Sam Altman
He writes What I Wish Someone Had Told Me. It is short and good, so here it is in full.
Such statements are Easy Mode talking about Hard Mode, but this is still an impressively good list. Very well executed, and it hits the sweet spot of speaking to his unique experiences versus what generalizes.
I could of course write a long post digging into the details and quibbles. One could likely do one for most of the individual points.
The biggest danger I would say comes from #8. There is a true and important version that often applies in business, where people often follow the procedure rather than doing what would work, and treat this as an excuse for failure. It is hugely important to avoid that. The danger is that the opposite mistake is also easy to make, where you become results oriented. Instead, you need the middle path, where you ask whether you played it right and made good decisions, what changes and improvements to that need to be made, and so on. It is tricky.
Also #12 is true as written but also often used as rhetoric by power and hostile forces, including folks like Altman, to get their way in a situation, so beware.
I’d also note #3 is, even more than the others, shall we say, not reliably true. The important thing is that it can be true, and it is far more true than most people think.
He notes:
This has not been my experience as a writer. I have suspected I have something good. I have known I am going to have something I think is good. That does not predict what others will think, or how well the piece will do. I think that this represents either overconfidence, or only caring about one’s own opinion. But all writers are different.
He offers his look back at 2023. He is master of the understatement.
There was a bunch of talk about the fact that Sam Altman is rich, and that as a rich person he buys things that cost a lot of money.
These are hints that suggest Sam Altman is not only rich, but that he may have what those in the gambling business call TMM, or Too Much Money.
The real estate seems excessive, but there is high value in having the right venues available when and where you want them. It lets you engineer events, meetings, communities. It gives you convenience and optionality. It matters. If I had billions, I can see spending a few percent on real estate.
The other stuff is more alien to me. I would be plowing that extra money into my fusion companies instead of buying 15 million dollar (admittedly cool) cars and 480 thousand dollar watches, but then again my taste tops out at much cheaper prices. Also I don’t drive and I don’t wear a watch. I do think the car represents good taste and buying an actual great experience. The watch I cannot judge but I find it hard to imagine what makes it substantially better than a $50k watch other than showing people how much you spent.
And yes. It is his money. He gets to spend it on what he thinks are Nice Things. Notice he also does some very effective large investments that I would take over almost all altruism, like his investments in fusion power and medical innovations.
You can of course criticize him for potentially getting everyone killed, but that is a different issue.
Does Altman enjoying nice things actively help a little with AI safety?
My Twitter followers say no.
I on the other hand say yes.
It is important for each of us to find value in the world. To have Something to Protect, and to have hope in the future. To care deeply about preserving what is great. To not feel the need to gamble it all on a game of pitch and toss, when you cannot then start over from the beginning, because no one will be around to not be told about your loss.
Of course, a fast car and an expensive watch are not first best choices for this role. Much better are people that you love. Coworkers and partners and friends help, there were an awful lot of heart emojis (who knows how sincere in various directions), and a spouse he loves is better still. It can be a complement, you want people you care about and for those people to have a future.
Ideal, of course, would be children.
There are some counterarguments around risk preferences or potential value misalignment.
The most common argument for negative impact, however, was essentially ‘this is a person who likes money and spends it, so they must want more money, which is bad’ and that this must be his motive at OpenAI. I think that is wrong; this does not provide substantial evidence that Altman needs more money. He can already afford such things without trouble. There are personal expenses that would strain him, I suppose, but he is showing he has better taste than that.
Finally, he asks about interest rates.
As Altman would no doubt affirm, ideas are cheap. What matters is implementation. Will we be allowed to implement those ideas in ways that deploy or at least generate a lot of capital? Or will AI enable this to happen?
If so, real interest rates should rise, although of course note Cowen’s Third Law, that all propositions about real interest rates are wrong.
In (economic) theory, we can say that expecting transformational AI, or transformational anything, should raise interest rates due to consumption smoothing, and also raise them because most such scenarios increase returns on investment.
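As a toy illustration of the consumption-smoothing channel, the standard Euler-equation logic says that if people expect much faster consumption growth, the real rate consistent with their savings choices goes up. The parameter values below are my own illustrative assumptions, not anything from the post; this is a sketch of the textbook mechanism, not a forecast.

```python
# Toy Euler-equation illustration: 1 + r = (1/beta) * (growth)**sigma
beta = 0.98    # annual time-preference discount factor (assumed)
sigma = 1.0    # inverse intertemporal elasticity of substitution (log utility)

def implied_real_rate(consumption_growth: float) -> float:
    # Real rate implied by expected consumption growth under CRRA utility.
    return (1 + consumption_growth) ** sigma / beta - 1

print(implied_real_rate(0.02))   # ~2% expected growth  -> r around 4%
print(implied_real_rate(0.30))   # ~30% expected growth -> r around 33%
```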
I do not however think it is this simple. There are scenarios where capital becomes extremely valuable or necessary as our ability to profit from labor declines and opportunities open up, or people fear (or hope) that it will be so. The new ideas could require remarkably little total capital to implement, or the total amount of deployment available for capital could be small, or small relative to profits generated. Or, of course, things could change dramatically in ways that render these questions invalid before anyone knows what is happening.
I also expect most people to instead execute their existing habits and patterns with little adjustment until things are on top of them. Remember Covid, and people’s inability to adjust prices based on an incoming exponential, or the lack of price adjustments during the Cuban Missile Crisis.
What should you pay in interest to build a data center? Obviously we don’t have enough information. The answer depends on many things, including your confidence in the scenario. Most behaviors are very constrained by the need (or at least strong preference) of many to not fall over dead if one’s assumptions prove wrong.
One cost of borrowing or taking financial risk will always be opportunity cost. If I borrow or gamble now, I cannot borrow or gamble with those resources later while the transaction is outstanding. Always be a good trader, and remember the value of optionality.
A classic question that applies here is, do you expect to have to pay the money back, or care about that given the new situation? What is ‘powerful’ AI? Will data centers even be that valuable, or will the AI come up with superior methods and designs and render them irrelevant? And so on.
We got the Wall Street Journal writing about Sam Altman’s ‘knack for dodging bullets with a little help from bigshot friends.’ Was mostly old info, no important updates.
The Lighter Side
A warning in song.