There is no shortage of people willing to talk about every week as a huge week in AI with tons of amazing new releases and announcements. At first this was usually right, then it mostly stopped being right. This week we had Code Interpreter and Claude 2.0 and x.AI (whatever that actually is) and a bunch of other stuff, while I was still processing OpenAI’s huge announcement of their Superalignment Taskforce. As usual, when there’s lots of great stuff, that makes it that much harder to actually find time to play around with it. Things are good and also can be overwhelming.
I will be in Seattle this weekend, arriving Sunday and leaving Tuesday. If you’d like to say hello while I am there, let me know. I know better than to request things get quiet.
Also on a non-AI note, congratulations to Nate Silver on a great run at the World Series of Poker main event. The whole tournament has been a blast to watch, especially when Rigby is at the table; he’s crazy good.
And on another non-AI note, I’d like to highlight the Roots of Progress Blog-Building Intensive. We need more voices for progress generally, including to ensure AI goes well. Deadline is August 11. I am a big believer in blogging in particular and writing in general, as a way to think and understand the world, and also to share your knowledge with and connect with others, and increase your serendipity factor greatly. You need to be a good philosophical fit, but if you are, this is a great opportunity. The talent you’ll be working with will be top notch.
Table of Contents
Code Interpreter
Code Interpreter now available to all GPT-4 users. Many reports are that it is scary good.
This seems like a dangerous failure of imagination. Giving everyone a script kiddie is a big deal. It seems easy to imagine bigger deals. A lot of other professions are a lot larger as a share of GDP.
Still, yes, impressive.
Jordan Schneider used it to analyze his blog, and was very impressed.
On the other hand:
So in other words, giving most people access to data analysis is bad, most of what people do all day with data is generate misinformation, and also any ‘competent’ data scientists ‘don’t need GPT.’ Seems like a full ‘intelligence and information are bad, actually’ take.
That’s not to say there won’t be something wrong on the internet. Are people who are given easier access to statistical tools suddenly going to say and spread a whole bunch of dumb stuff? I mean, yes, of course, absolutely, why are you even asking that.
Are there going to be a lot more of things like this for a while than we are used to? Again, yes, that seems quite likely. That is the only way everyone involved can hope to learn how to usefully do statistics, and differentiate good from bad. GPT-4 is, as you would expect, very good at explaining if asked that this type of correlation does not imply causation.
Also, if you look at the later two posts in the thread, it’s clear Josh Wolfe knows, as he says “It’s sheer spurious correlation. The union decline is reversing from record low and record number of Americans now support unionizing. Happens to coincidentally coincide with end of generational decline in rates.” The main error here was not statistical at all and has nothing to do with Code Interpreter, it was saving the clarification for a later Tweet in the thread, which is proven to never ever work.
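As a toy illustration of how cheap this kind of spurious correlation is to manufacture (my own sketch, not the thread’s actual data): any two series that both trend over the same period will correlate almost perfectly, causation or no.

```python
import numpy as np

rng = np.random.default_rng(0)
years = np.arange(1980, 2023)
union_share = 25 - 0.30 * (years - 1980) + rng.normal(0, 0.5, years.size)    # declining trend
interest_rate = 14 - 0.28 * (years - 1980) + rng.normal(0, 0.8, years.size)  # also declining

print(np.corrcoef(union_share, interest_rate)[0, 1])  # close to 1: shared trend, not causation
```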
There will also be more subtle errors, as Russ Poldrack observed.
Have you met Microsoft Excel? I kid except that I don’t. People keep saying ‘a human has to verify that the LLM’s output is correct, sometimes it makes mistakes’ as if this is some fatal flaw or big news. We know this, also it is true for other outputs too. Also this:
Arvind has another, better objection as well.
I definitely predict that for most purposes there will be a UX wrapper that works better than the current interface, if you give people time to develop one.
Not letting you edit the code or otherwise customize limits usefulness. It means if you want to go deeper, you often have to start over, and the power user will lose out. That’s a problem if the important users are the power users who would otherwise do tons of work to get things exactly right. Is that the base use case?
Right now, the real problem with most people’s data science is neither lack of bespoke customization nor misinformation. The problem is that they aren’t doing data science, or they are doing precious little of it, because it’s annoying to do. A cheap ‘good enough’ option here seems wonderful, even for people like me who can do ‘real’ data science if needed. How often will I want to do that? Not never, but, well, hardly ever.
He then links to Teresa Kubacka’s thread below.
Ethan Mollick notes code interpreter can go through a spreadsheet and perform sentiment analysis on each line. So can an extension that gives you a GPT() function to use in Google Sheets, which is not obviously harder to get working right.
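The underlying pattern is simple either way. Here is a minimal sketch of the spreadsheet version, assuming a hypothetical responses.csv with a comment column and a call_llm stand-in you would wire up to whatever provider you use:

```python
import csv

def call_llm(prompt: str) -> str:
    """Stand-in for whatever LLM API you use; returns the model's text reply."""
    raise NotImplementedError  # hypothetical: plug in your provider here

def sentiment(text: str) -> str:
    reply = call_llm("Label the sentiment of this text as positive, negative, or neutral:\n" + text)
    return reply.strip().lower()

with open("responses.csv", newline="") as f:
    for row in csv.DictReader(f):
        print(row["comment"], "->", sentiment(row["comment"]))
```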
Code interpreter works by assuming your data will look the way it expects. It can recover you automatically from errors that match the errors in the training data. If you intentionally introduce an ‘unnatural’ error that it has no reason to expect, it will likely fail, even if it is human-eye obvious what you did.
Does that make it worse than useless, even for those who do not have much data analysis ability? I strongly say no. This is the place where professionals, who highly value robustness and customizability and not making mistakes, jump to such conclusions too quickly. This is similar to when you cannot get the expert to give you any estimate whatsoever, because they don’t know if it’s 30% or 50%, or a week versus a month, whereas you want to know whether it’s 1%, 50% or 99%, or a day versus a decade. Many people are in the complete dark. The key is to use such answers knowing the error bars involved.
Is this a setback for ‘AI safety’? In some sense I guess so. In others, not at all. Would we be better off if we could see and modify the code? That certainly has a lot of benefits that seem to exceed the risks and costs. I still understand why the decision got made.
Language Models Offer Mundane Utility
Boost gender diversity in hiring for male-dominated sectors?
Interesting to see the problem being solved called out as the perception of bias more than any actual bias (note that both the perception and the reality can be any magnitude in any direction), although there is also the claim that the human evaluation process is unfair to female applicants. An AI application (or for now an AI) is a tool, it will act the way it is programmed to act. There are any number of reasons that might favor women. We have zero way of knowing which set of evaluations is how biased, because we have no objective measure of quality to compare to; either the male or female applications could on average be better than the other.
What we do know is that when you use an AI, you often have to force the results to obey rules, or adhere to statistical distributions, that you would otherwise not have to obey or adhere to. What would otherwise be the sum of human decisions that incorporate plenty of things they aren’t supposed to becomes instead a blameworthy outcome of an algorithm. The AI is held to a different standard.
Grant Slatton checks in on a neuroscience researcher, who notes that ChatGPT and associated tools have been huge productivity boosts to their coding and data analysis already. As I have noted, the worse a coder you are, the bigger your productivity gains from ChatGPT. In my case, things like syntax and locating libraries and knowing API details are huge time sinks I am not good at navigating, and are places where if the AI messes up, you’ll know.
AI translates 5000 year-old cuneiform tablets.
Language Models Don’t Offer Mundane Utility
Caution is certainly appropriate when substituting a computer’s voice for your own. What this highlights is the distinction between those scenarios where the actual details of the content are important and must be genuine, versus those where what matters is that the content exists at all and properly pattern matches, perhaps while containing some actual information. Or where the proper performance is genuine human, versus where it is corporate speak, versus where it is performing class or being superficially polite or formal, and so on.
AI can help you code and write, but there are limits, as this anonymous poster reports that his small no-name company has ‘completely lost its mind’ via its CEO expecting ChatGPT to be a magic box that can do anything. No doubt there will be many such cases in the coming years, and capitalism will sort the winners from the losers. For now this is worse for the company than ignoring AI entirely. A few years from now, perhaps not.
AI-generated post headlined “A Chronological List of Star Wars Movies & TV Shows” features highly incorrect and incomplete chronology, as well as generally sucking. Needless to say, comments were closed off. Editor of Gizmodo, where it appeared, got 10 minutes warning this was going online. Experiment is going great.
Math skills need work if you’re not using plug-ins, although they are improving.
Via Robin Hanson, a gated article claiming that there will be little mundane utility on offer let alone something more.
Ah yes, it pattern matches to the dot com bubble, that proved that the internet won’t be a major game changer on par with the internet.
A mystery.
I know, it’s so crazy that we don’t have this, or the ability to translate podcasts into alternative voices, or any neat stuff like that. I do not know how much I would pay for this. I do know it definitely is not zero. It is less if the person has to opt in so you only have a fixed list, but there are so many choices I would pay for on their own. Starting, of course, with Morgan Freeman.
Reports of Pi having a normal one, asking follow-ups then not doing the job. What is Pi supposed to be good for?
Deepfaketown and Botpocalypse Soon
Patrick McKenzie reports not only giving but also getting the ‘do not trust that the voice is me on the phone and do not send money because of it’ talk that everyone now needs to have with their parents (and yes this includes you). When in doubt, let the police handle it, doing nothing is usually the best move anyway.
This is also the best move from a society-wide perspective. The way kidnappers and other extortionists and fraudsters typically work is that they tell you ‘no cops’ and to otherwise keep everyone else in the dark. If we use the possibility of deepfakes as justification to always involve the authorities in such situations, then even real kidnappings and traditional extortions and frauds get a lot less attractive.
The Art of the Super Prompt
Katie Herzog asks for some race-based joke headlines about opposition to affirmative action, and GPT-4 stands with Harvard in saying it’s not racist if you target Asians. Latinos also get joke headlines when she insists, white and black get a no. I’m not sure exactly what the right term is to describe the output, but ‘pure stereotyping’ and ‘in quite poor taste’ and ‘oh my should they be happy it didn’t produce the black one’ are certainly appropriate.
New jailbreak for code interpreter, yo.
They Took Our Jobs
Journalist explains that the job of a journalist reporting on AI is to ‘hold the companies developing AI accountable’ and avoid falling for their hyperbole. Then explains that the people you should trust are existing domain experts. They can be trusted. They will tell you that what the company claims to do is impossible or too difficult. That AIs can never succeed outside of their training distributions.
It is also always a good time to mention that the people labeling that training data were not American and might not have gotten the prevailing American wage, which means they were exploited, or that the company might not have gotten consent to use the data. That’s the ‘documented downside.’ So is looking for biases like this:
Ask how this can be both true and the strongest claim that could be made here.
Always focus on the present, never think about what AI might do in the future:
This is what journalism increasingly means with respect to tech. Your job is either to do a fluff piece or a hit piece. Proper Journalism means hit piece, where you point out the standard reasons why tech is bad. Or you can buy the claims of the company and report them uncritically, if you’re some sort of hack. Then they wonder what is going so wrong in their profession and where all their jobs went.
I saw this clip and appreciate the music and visuals but don’t understand why anything involved is AI based? Lasers are cool, automated weapon systems are scary, this looks more like firing lots of lasers all around on a simple algorithm?
It is to Hollywood’s credit that they are not pretending that the AI future isn’t coming.
As Scott Lincicome informs us, “Chipotle tests robot to prepare avocados for guacamole.” They named it the… Autocado. I am with Scott in hoping for its success for that reason alone. Also these are very much the jobs we want to be automating.
The claims here are pretty weird taken together (draft of paper).
This reasonably posits that AI can be a substitute for general knowledge within teams, which increases available team slots and ability to muddle through in areas you don’t know well, and thus allows for greater specialization. That makes sense. If you have a ‘good enough’ generalist available at all times, then you can get higher returns with more specialization.
The model then claims to find higher returns to fully general specialization. The argument is (in English) that as each generalist can coordinate more different people and thus specializations at once, due to shifting their low-level work onto AIs, they will want to shift their knowledge to be more fully general.
Thus, you get a bifurcation. Either you want to go super deep into a narrow area, where you can still be superior to the AI. Or you want to be fully general, to coordinate the specialists, or (unstated) so you can synthesize the AI capabilities from various fields yourself. The number of generalists (and supervisors) declines, and the ones that remain are higher skilled, so the competition there would be intense.
Not present is the question of whether people will respond by using smaller teams, and whether those smaller teams will gain a lot in productivity by virtue of being smaller. I suspect this is the real story here. If you can have a five person team instead of an eight person team, or two instead of four, you could use that room to add more people, or you could instead move faster and cheaper and with lower coordination costs. The moves from 3→2 and even more so 2→1, and going from one supervisor to not needing one, in particular are super valuable.
Note that the common limiting presumptions economists make about AI are also present, as illustrated here by saying ‘not ever’ rather than ‘would happen comparatively later’:
I am imploring economists generally to actually think about what future more capable AIs will be able to do, rather than assuming capabilities will cap out. Also to stop making the presumption that AI will be incapable of or inferior in such ‘soft’ skills. Remember that when GPT-4 was tried out as a doctor its highest marks were for bedside manner.
Claude 2.0
New model, who dis?
From the model card:
Data cutoff looks like early 2023. You can’t browse the web, but that’s a big step up from 2021 in terms of practical use.
ARC evaluations were done and did not detect replication capabilities. Focus of the various alignment tests seemed to be on mundane harms, with the ARC replication test being the exception. This is still useful more generally, for example it meant checking on jailbreak techniques, but we will need to improve here.
They used constitutional AI as a core part of their alignment strategy here. I expect that the details of their implementation of that are a lot of why Claude exhibits the harmlessness-related overboard behaviors we see, but that’s a much bigger topic.
The model was rated as 46 Elo points higher on helpfulness than Claude 1.3, and 35 Elo points higher on honesty, with only 4 Elo points lost on harmlessness. I worry about overreliance on Elo but at this high level it should be a good enough scale.
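For intuition on what those gaps mean (using the standard Elo win-probability conversion, my own arithmetic rather than anything in the model card): a 46-point edge implies the new model is preferred in roughly 57% of head-to-head comparisons.

```python
# Standard Elo conversion: probability the higher-rated side is preferred head-to-head.
def elo_win_prob(delta: float) -> float:
    return 1 / (1 + 10 ** (-delta / 400))

print(round(elo_win_prob(46), 3))  # ~0.566, the helpfulness gap
print(round(elo_win_prob(35), 3))  # ~0.550, honesty
print(round(elo_win_prob(4), 3))   # ~0.506, the small harmlessness regression
```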
Bias scores (bias as in stereotypes rather than logical fallacies, low is less bias, which is good) are modestly improved.
Disambiguation scores in these contexts are lower, which they speculate is due to refusals to answer questions. It would be interesting to make this measurement while (somehow) forcing the model to answer such fact questions, to disambiguate refusal to answer (which is still, in context, a bug) from potentially decreased accuracy.
How did the model do on TruthfulQA?
Did we do non-automated evaluation of the other 324 prompts? If there were three false positives, can we assume zero false negatives?
I am not a fan of automated evaluations of harmfulness, or of anything else, unless it is at least subjected to systematic spot checks, and also evidence is provided for why we can rely on such evaluations. That is exactly the kind of hole that gets you killed.
Translation quality seems unchanged from v1.3.
Context window is launching at 100k tokens, the model was trained to go to 200k. Performance over longer contexts was as expected by power laws.
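For reference, ‘as expected by power laws’ roughly means the per-token loss as a function of context length follows the generic scaling-law shape (my paraphrase, not the model card’s specific fit):

$$L(c) \approx A\,c^{-\alpha} + L_{\infty}$$

so extending from 100k to 200k tokens buys a predictable, diminishing improvement rather than a qualitative jump.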
GRE results: 95th percentile verbal, 42nd percentile quantitative reasoning, 91st percentile analytical writing.
Bar results: 76.5%, a passing grade.
US Medical Licensing Examination (USMLE): 68.9% step 1, 63.3% step 2, 67.2% step 3, with tables transcribed for it and images deleted. Passing is roughly 60%.
I have not yet had time to try Claude 2.0. From what I see here, it is a modest improvement over Claude 1.3, with substantial improvements in coding ability that leave it still behind GPT-4 there. If I had to spitball, this seems like somewhere between 0.1 and 0.2 GPTs of improvement over Claude 1.3.
How easy is it to jailbreak Claude? Michael Trazzi asked after he failed. Peli Grietzer and Coskaiy respond reporting success. This thread from AI Panic has the details.
[thread has a bunch of examples of Claude being highly jailbroken.]
This is in no way a surprise. As the system card notes, no LLM yet has been immune to jailbreak, and one was already known to sometimes work.
Yes, the jailbreak that ended up being found first was rather simple, and thus a stupid-looking one to have missed. But of course that was always the most likely outcome. The simple things get checked first, and we see the one that worked. The mistake that is found will usually make you look stupid, no matter how many other non-mistakes would have made you look smart.
Given how LLMs work, it is not reasonable to hold Anthropic to this impossible standard. The ARC investigation passed because they concluded that the inevitable jailbreak would be insufficiently dangerous this time, not because there was any hope of avoiding one. Except that, in order to make an actually dangerous future system safe, it would need to match exactly that impossible standard.
That does not mean that preventing more marginal jailbreaks is not worth doing. It means that we don’t have a path to a solution by doing that. If your plan is to continuously hammer down the next jailbreak that comes to your attention, there will always be a next jailbreak waiting to come to your attention.
Introducing
x.AI, from Elon Musk and advised promisingly by Dan Hendrycks, which says its goal is ‘to understand the nature of the universe.’ Team member Greg Yang says it is ‘Math for AI and AI for math.’ Beyond that, what is it? No one knows.
Hopefully we will know more soon and I will cover x.AI for real next week.
Threads, of course, is the new Twitter alternative from Meta, which uses your Instagram handle, and will use all data across their various platforms to sell you things. It currently has only an algorithmic feed heavy with brands and celebrities, is heavy on censorship, and is planning to be unfriendly to journalists.
As you can imagine, I think this is a horrible, no good, very bad product.
I am also asking you, even if you disagree, please do not use Threads. Threads is owned by Meta. When you use Threads, you provide Meta with proprietary data to feed to its LLMs, which it will then proceed to open source without having done almost any alignment work or safety checks at all. Facebook AI is a menace that could kill us all.
Don’t let them. The least we can each do is not feed that beast.
MineTester, a fully open source stack of tools to do ML within Minecraft, ostensibly for purposes of alignment and interpretability. The line between capabilities and alignment here seems if anything even thinner than usual.
Reka raises $58 million for generative AI models ‘for the benefit of humanity, organizations and enterprises’ including ‘self-improving AI.’ Great.
Mistral raises $113 million for generative AI models, their differentiator will be doing the worst possible thing by releasing some of them open source.
It definitely seems that if you say the words ‘generative AI models’ and have a credible team investors will throw money at you. I presume that mostly this will not go great for those investors, as such companies will remain a step behind. Which I very much hope they do remain.
In Other AI News
Boston Globe profile of Dan Hendrycks, whose organization CAIS was behind the one-sentence open letter warning of extinction risk from AI, and his efforts to mitigate that risk. Unusually good overall, although without much insight for my readers. There is some unfortunate framing, but that’s journalism. He’s now advising whatever x.AI is.
Insufficiently worried the training run will kill you? We can help, here is Reinforcement Learning from Unit Test Feedback. I mean, yes, obviously, if you want to teach the AI to code you want to reward working code, which ensures you build code designed to pass exactly the kinds of unit tests you use in training, which hopefully can be made close enough to what you actually want.
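A minimal sketch of the reward signal such a setup implies (my own illustration of the idea, not the paper’s implementation): run the model’s code against the unit tests and reward the fraction that pass.

```python
import subprocess
import tempfile

def unit_test_reward(generated_code: str, tests: list[str]) -> float:
    """Reward = fraction of unit tests the generated code passes."""
    passed = 0
    for test in tests:
        with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
            f.write(generated_code + "\n" + test)
            path = f.name
        try:
            result = subprocess.run(["python", path], capture_output=True, timeout=5)
            passed += result.returncode == 0
        except subprocess.TimeoutExpired:
            pass  # hanging code earns nothing
    return passed / len(tests)

code = "def add(a, b):\n    return a + b\n"
tests = ["assert add(1, 2) == 3", "assert add(-1, 1) == 0"]
print(unit_test_reward(code, tests))  # 1.0: both asserts pass
```

The obvious failure mode is exactly the one noted above: the model learns to pass the tests you wrote, not to write the code you wanted.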
Partly gated claims about GPT-4 architecture, via MR. Seems highly overconfident across the board, including about the past, not only the future. Likely is hiding the real value behind the gate.
‘First classified Senate briefing on artificial intelligence will take place in a sensitive compartmented information facility (SCIF).’
From DeepMind a few weeks ago, I think I missed it: GKD: Generalized Knowledge Distillation for Auto-regressive Sequence Models (arXiv). I have no idea why DeepMind is continuing to publish such results. There’s the speeding-up-capabilities concern; more than that, there is competitive advantage. If you can do better distillation, why would you give that away? How is the answer still ‘to publish’?
Quiet Speculations
Bill Gates writes ‘The Risks of AI Are Real But Manageable.’ This is about the mundane but still transformative risks of sub-human AI, as he says explicitly:
I share his attitude towards such mundane risks, that they are real but we will be able to deal with them as they come. He goes through the usual suspects: Deepfakes, election interference, they took our jobs, hallucinations, biases and the risk that students may not learn how to write. In each case, he gives a well-considered and detailed case for optimism that I endorse.
What is important to note is that Gates, who signed the letter warning of extinction risks, is talking only about AIs below a critical threshold of capability.
How will AI affect real interest rates, asks Tyler Cowen. He expresses skepticism of the story that a productivity boost will increase returns to capital, asking whether Gutenberg was a billionaire, but does think relative prices will have higher variance and that this will make managing the economy more difficult for the Fed.
I explored a version of this claim back in January in On AI and Interest Rates. Tyler was highlighting a claim that AI would raise interest rates, as a challenge to the claim that AI would even do anything economically important, and as supposed proof that no one really believes in extinction risk since their portfolios don’t reflect it. Whereas those of us who predict impact with or without doom are doing very well with our tech stocks, thank you.
I know Tyler believes that AI will be transformative in many ways and is super excited by it, even if he declines to notice many bigger implications down the line (yes potential doom, also many others.) Yet suddenly he is coming out against exactly the thesis that this will raise interest rates, or by implication that those developing AI will make money. Curious.
My prediction is this comes down to how much capital can be deployed. AI spending is not going to be a big enough share of overall spending to transform interest rates before it transforms everything else first. The question is whether AI will enable large scale deployment of capital to invest in things that are not directly AI, such that it truly impacts the demand for money and thus real returns. That then comes down to whether or not we will allow it.
Even more skepticism as Goldman Sachs predicts AI adoption will add 0.5% per year to S&P 500 growth over the next 20 years. Nothing to see here, move along people.
New paper on regulating transformative technologies via Tyler Cowen, here is the abstract.
That abstract is indeed very clear. I do not need to read the paper to agree that if you accept its assumptions, its conclusion seems very right. The new technology has large negative externalities in the form of disaster risk, so you need to tax that risk in some appropriate fashion, which given optimal tax sizing is a de facto ban if an action poses a substantial existential risk. I would also consider the positive externalities that are possible when considering adoptions that do not carry existential risk; the assumption that any externality effects must be negative is common, and absurd, across such calculations in many areas.
The problem is: look at the assumptions. Potential social damages are not going to be proportional to productivity gains. More importantly, ‘disaster’ here is of a very particular sort.
This takes the disaster as an unknown but determined variable, rather than a probability that depends on decisions made, which is explicit here:
I do not see why one should model disaster as having a distribution over time T, rather than having that be a function at least of use of the technology. In the case of existential risk from AI, a better model is that the level of foundational technological development determines disaster risk. Deploying only to some sectors gets you maximum risk for less benefit. Also slowing development decreases risk at a given tech level or at a given time T.
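To make the contrast concrete, here is one way to write down the two modeling choices (my own formalization of the disagreement, not the paper’s notation):

$$\text{Paper: } T_{\text{disaster}} \sim F(t), \text{ a draw over calendar time, independent of deployment decisions.}$$

$$\text{Alternative: } \Pr(\text{disaster in } [t, t+dt]) = h(K_t)\,dt, \text{ where capability level } K_t \text{ rises with frontier development effort.}$$

Under the first, slowing or narrowing deployment only changes how much benefit you collect before an exogenous clock runs out; under the second, slowing frontier development directly lowers the hazard you face at any given date.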
Whereas here you see their model is something very different:
This is an extremely poor model of the existential risks involved, which if realized will not be limited by sector or much care about which sectors have adopted the new technology, except perhaps to see if a threshold has been crossed.
That makes sense if disaster worlds punish early adopters, or do damage in proportion to them, as in the sectors-cannot-switch-back model. It does not make sense as a model of existential risks from AI. I do see how ‘hard to switch back’ could easily be a real (mundane) problem for a sector, but I would be skeptical that this alone would do much to justify slowing down deployment. In most cases where it could not be undone, the only way out would be through. Also I would expect such mundane disasters to be proportional and localized.
It is still good to see a paper like this, because this is the internet. The best way to get a right answer is to post a wrong one. Well done. Already you can see them tricking me into laying out my alternative modeling assumptions.
Tyler then suggests reasons that could strengthen the case for accelerationism in such a model, such as national rivalry, adoption facilitating learning (why would one think it would primarily do this rather than increase risk? Important question) and the chance of regulation taking a form that backfires. I did appreciate the note that open source is an especially dangerous way to do AI, and that we should be wary of it; on that we strongly agree. And of course we agree that reducing disaster risk is the ultimate goal.
AIs making their own bitcoin payments is already here, in case you were worried you might not lose your Bitcoins you can now delegate that task. Could AI use of crypto cause Bitcoin $750k? I mean, sure, why not, any number is possible, there is only supply and demand and you can do whatever exponential math you like, you have to admit that people taking things like the graphic below seriously is pretty funny:
Do those numbers have anything to do with anything? No, of course they do not, since when did that stop anyone discussing crypto? What I find most interesting is the common assumption that the crypto in question looks to be Bitcoin. If things need to ‘happen at the speed of AI’ then isn’t that a rather lousy choice of cryptocurrency? There are a lot of other protocols that cost less compute and are faster and also don’t have such fundamental bottlenecks. The advantage of Bitcoin is that it is the original crypto, why should this be important here? Not saying it’s not, more noticing my deep uncertainty.
Patrick Staples has a good question about the paper introducing the One Billion Token window via dilated attention. Why publish the paper? Even if you are not worried about accelerating AI capabilities, your discovery is commercially valuable. If you give it away, that’s gone. Daniel Eth responds that the race dynamics are often about short-term social status – or, I would note, even simply ‘I need to publish’ – rather than always being about money. Can we please buy such people off somehow, anyway? I get how it is hard to stop a race that is the inevitable result of capitalistic incentives. If it’s for social status or to publish papers, that’s just embarrassing.
The Quest for Sane Regulations
DeepMind paper announced here looks into International Institutions for Advanced AI, both to help underserved communities and to mitigate risks.
My overall take is that the paper is well-reasoned and good as far as it goes, its weakness being that it sidesteps the hardest questions rather than confronting them.
From their announcement, a more readable version of the abstract:
I read that as four distinct tasks: Gather and disperse information to orient and form consensus, mitigate harms and risks (both mundane and existential), promote mundane utility, and solve alignment.
It is easy to support commissions that gather information and build consensus, the classic way to do something without, well, doing anything. More efforts to study the problems seriously seems like a slam dunk. The objections would be that this would inevitably lead the way to a regulatory response, which can reasonably be thought of as defaulting to disaster in most times and places, and that drawing such attention to the problem might be accelerationist as idiot disaster monkeys at state departments and militaries and even economists reach for the poisoned banana.
It is even easier in principle to get behind a collaborative effort to distribute mundane utility widely around the world. It is important not to be the type of person who would say ‘that is a distraction’ as a reason to oppose it. The concern is simply whether governments can help here or if the market plus open source resources will do the job on their own. My guess is the latter. If you want to help the less fortunate get access to AI, that seems mostly like a question of helping them with phones, computers and internet access. Helping with such infrastructure seems good. Or better yet in most cases, of course, give those people money, and let them decide what they would benefit from most.
I see the emphasis on this in the paper as a framing device to gather more support, and am fine with it as such. They also discuss the potential need to develop new frontier models in order to better serve the needs of such people. I don’t see why this is necessary or a good idea, or why other intervention is needed to get it. I also worry about an international organization saying that AI systems currently have ‘the wrong values.’ The good news is that such efforts need not drive capabilities in any dangerous way. So long as there remain private offerings from the market, it should be mostly harmless. They also mention supporting the development of a local commercial ecosystem to ensure benefits get widely enjoyed. I am all for local commercial ecosystems but that seems well beyond scope, and also a place where the proper government action is usually ‘get out of the way.’
It is easier still to get behind a joint AI Safety Project to give researchers the resources they need. Implementation still matters, as government rules can easily drown everyone involved in red tape and steer them towards non-useful work, or even worse towards what becomes capabilities work. It is not so easy to get net positive AI Safety by spending money. I remain quite eager to try, while noting that I care far more about implementation details and putting things in the right hands with the right procedures and objectives, and far less about getting a large budget. They suggest modeling this after other large-scale collaborations like ITER and CERN.
The AI Governance Organization is the big question. It has the highest stakes, the most necessary role to play, the most difficulties and the most downside costs and risks.
A table from the paper, not feeling great about seeing ‘develop frontier AI’ here as this should not be necessary to ensure the benefits of AI are widely shared:
This clarifies where their heads might be at, they see hope in controlling inputs to AIs:
Controlling access to what systems we do have is presumably part of any plan to limit such access. We would also need to limit who gets to build one, and as they note ensure anything built goes through proper checks.
What I did not see explicitly discussed is the possibility that there might be limits beyond which it is not safe to allow anyone to train a new system, no matter what technical checks you might do along the way, but we do have this very good passage:
And of course:
Helping facilitate this style of research is one area where I think government performs relatively well.
An underappreciated point in the paper is that harmonized international standards are very good for those developing new technologies. A shared regulatory framework is good for mundane utility and perhaps for capabilities development as well by nature of being consistent and known, while also providing the ability to ensure safety or other goals if chosen wisely, and monitoring compliance.
Bayesian Investor Blog updates favorably on requiring Foom Liability for AGI. Notes correctly that full such liability is a de facto total ban, which I would call ‘policy working as designed, if you can’t profitably insure it then it wasn’t actually profitable.’ Instead he suggests more limited liability, on order of $10 billion, which does not seem like it properly corrects the incentives, but is far better than nothing.
Also, UK has an AI minister. He has a Politico profile. He is focused on monitoring and international cooperation, including with China.
The Week in Audio
Eliezer Yudkowsky’s 6-minute TED Talk. Excellent distillation and introduction. Nothing new here for veterans.
I went on Cognitive Revolution to talk to Nathan Labenz. Was a fun conversation, with the standard caveat that a lot of it is stuff I’ve said here already.
80,000 Hours has Markus Anderljung on how to regulate cutting edge AI models. I haven’t had the opportunity to listen yet.
A remarkably good clip from Yann LeCun in his debate – current AI systems (LLMs) are inherently unsafe; if they were scaled to human level they would be unsafe. He is not worried because he thinks LLMs will not scale and we will abandon them. I see most of that as reasonable, except that he loses me when he thinks we will soon abandon them – although if they don’t scale, not abandoning them won’t be too bad.
He then says he has a potential solution for safe AI (that it sounds like is for a different type of AI system?) but he doesn’t know it will work because he hasn’t tested it. That’s great, both that he has the idea and he knows he doesn’t know if it will work yet. What is it? Why would we keep that a secret, while open sourcing Llama?
Verge interviews Demis Hassabis, CEO of DeepMind. Interesting throughout, mostly in terms of a synthesis of flow that is hard to quote from. The media training shows, for better and for worse. Demis also appeared on The Ezra Klein Show, which I have not yet listened to but is top of my queue.
The most quotable section of the Verge interview is his note about signing the letter warning about extinction risk.
This is a good start, although not fully the answer one would hope for here in terms of optionality, essentially dismissing the idea that there are many useful things that could be done any time soon. Evaluations and benchmarks are good but seem far from sufficient on their own. It is good that Demis thinks AGI is not so close, although he also speculates it could easily come within 10 years, which seems to contradict his statement here.
Also noteworthy is that the question about regulatory frameworks had China as part of the cooperative group rather than the enemy, and Demis did not bat an eye.
RFK Jr. goes on Lex Fridman, makes clear he believes AI is going to kill us and says stopping this via international coordination is more important than our conflicts with countries like Iran, China and Russia. Then folds in bioweapons, says only crazy people continue research with existential risk, and if we don’t end it we are on the road to perdition. Brings it up with no prompting, if he’s faking his emotion here he’s impressively good.
I don’t believe he mentioned this on Joe Rogan, where he was instead focused on, shall we say, other concerns and arguments, in rather bad faith, where he is at best highly wrong, before the much bigger audience, in a way that did real harm and which would taint anything else he endorses. Plus he was a key figure in shutting down New York’s Indian Point nuclear plant. All in all a real shame.
And yet, maybe one could say none of that matters?
This might be the true revealed preference. Is this issue so important that it matters more than the very long list of other things that have gone horribly wrong here? What would one be willing to sacrifice on this altar?
I notice my answer is firmly no. He got into this mess largely because his epistemics are deeply in error (to say it charitably) across the board and he does not care to fix this. It is great that his process led him to this particular correct answer, but to navigate this mess, and many other messes too, we need good epistemics that will adapt as the world transforms. He is not that. He also seems like a ticket to getting such concerns grouped in with a bunch of other concerns that are correctly dismissed and that are often argued for in bad faith or via poor methods.
The whole thing scores points for the Tyler Cowen position that we need to seek scientific legitimacy, no matter how unwelcoming or painfully slow or unfair or simply inappropriate to the nature of the issue the playing fields might be, to avoid ending up grouped with other such causes.
I wrote most of a piece about RFK Jr and the debate over when one should debate and otherwise seek truth, especially on issues with an ‘established right answer’ of sorts. I had decided to shelve it as too much ‘something is wrong on the internet,’ especially since this man is very much never getting anywhere near the nomination of the Democratic party or the presidency. I am curious if people think I should finish and post that?
Jaan Tallinn on Logan Bartlett show. It seems worth listening to at least one extended Jaan Tallinn podcast at some point.
Marc Andreessen went on EconTalk, more of the same from Marc, I did not listen.
What Would Make You Update on the Major Labs?
Last week Julian Hazell asked: What would be something OAI could feasibly do that would be a positive update for you? Something with moderate-to-significant magnitude
Connor Leahy took a few days to think about it. Here is his reply, with eight overlapping things that OpenAI or another lab could do to set an overall agenda.
I would summarize this as:
Even a subset of these would certainly be a sufficiently costly and well-considered signal that I would be highly convinced of their intent. I continue to lean towards it being unwise to try and limit the use of autonomous agents on systems at GPT-4 level, for reasons I explained in On AutoGPT, that doing so only extends an ‘agent overhang’ that is impossible to stop from being closed later. I believe this is because Connor and Conjecture overall view such systems as already dangerous, in ways that I and most others do not.
I would be willing to update highly positively for considerably less radical changes than Connor’s list.
Here are some lesser asks:
I did indeed update on OpenAI based on Sam’s related statements. I do think there is a big difference between Sam Altman’s measured statements and what Eliezer Yudkowsky would like to see. Logically, Sam’s statements suggest the same conclusions. In practice, they do not lead to the same updates by others.
For Anthropic in particular I would need Dario Amodei to get more vocal and explicit about extinction risks in public, and to articulate a call for serious restrictions on frontier models or other measures that would have hopes of working, rather than focusing on being quiet and incremental in a way that is impossible to observe.
For OpenAI in particular I would love to see a pivot on alignment strategy or more good engagement on related questions, and more detailed and quantifiable statements on what exactly would get them to stop pushing capabilities in the future. Perhaps even more than that, I would like to see a loud commitment to building a culture of safety and ideally also an endorsement of the need for security mindset. It is difficult to change corporate culture once this type of mistake is made, but it must be done and the first step is admitting you have a problem.
For DeepMind in particular I would certainly like to see them stop publishing capabilities work, and see them stop pulling their punches in their statements and position papers. I’d also like to be reassured that Demis Hassabis is meaningfully in charge in case hard choices become necessary. And I’d love to be more convincing to head of alignment Rohin Shah, who is up for a remarkably high degree of engagement; I owe him a response, but getting it right is super hard (as you would expect) and I keep getting distracted rather than finishing it. I’d love to be confident they had a good understanding of the problem space.
Rhetorical Innovation
Guardian commissions article ‘five ways AI might destroy the world.’ I have accepted that people need the idea that the AI will be capable of this hammered into them repeatedly; it still seems so frustrating that this is somehow necessary. Between three and four respondents – Tegmark, Yudkowsky, Cotra and partly Bengio – understood the assignment and made reasonable cases. I see merit in all of them, and also in showing the many different ways one might approach the question. If I have to pick one answer, I find Tegmark’s the best rhetorical strategy here.
Then there is Brittany Smith, who either does not understand what ‘kill everyone’ or ‘existential risk’ means, or is simply choosing to ignore our reality and concerns and substitute her own, which is to focus on such dangers as housing discrimination and biased detection of welfare fraud. Given the title, one wonders what exactly the editors of this piece were doing when they read her response and thought ‘yep, that seems like a way AI might destroy the world.’
Ajeya Cotra thread gives four reasons LLM progress may pose extreme risks soon: Progress is unpredictable, progress is fast and could become faster, LLMs are black boxes we don’t understand and there is no clear power or impact ceiling. Jeremy Howard responds that all of these premises appear to have been true of machine learning since the 1960s, so what has changed? Ajeya says historically there were large capability gaps to point to that could reassure us, Jeremy says that hasn’t changed either, there were always claims it was coming Real Soon Now and also there are still barriers. While I think Ajeya’s conclusion is correct I don’t think that is a satisfying answer here.
The Free Press reprints Andreessen’s essay, then somehow chooses an anti-AI response from Paul Kingsnorth that might be worse. I list it as an example of how not to talk about AI.
I do not understand why Garrett Jones continues to think the fate of the Global South is a reason to be optimistic about AI alignment outcomes.
Yes. If you think like an economist here, something I am usually inclined to do, the question is what are the requirements for the dynamics evoked by Garrett Jones to hold in the AGI case. This requires some combination of ‘exogenous niceness’ towards humans from AGIs, which, I mean, come on, no way, you’re kidding, also that would additionally require social enforcement mechanisms, or alternatively that physical resistance by humans could make extinction more expensive than it is worth, in which case I don’t know where to start (as in, there are almost too many conceptual errors involved in getting there to count and I don’t know which one is useful to now point out).
I also continue to not much care whether the extinction is ‘everyone is actively killed’ versus ‘everyone slowly starves’ versus ‘we get to live but have no grandchildren.’ My discount rate for value in the universe is not that high, what I care about is the future.
Call and response is a fun Twitter format, but does it actually help convince?
The ‘intelligence is not a real thing’ (or not valuable, or not important) argument for not worrying about intelligence continues to make zero sense to me, especially when they also think capabilities and optimization power are also not things. It is sufficiently puzzling that I cannot identify the actual disagreement in a way that would allow me to address it.
Not about AI, yet also about AI:
When people imagine the AI as their opponent or simply as an agent out in the world doing whatever, they often do not even ascribe it ordinary charismatic leader levels of charisma, because in their minds that would mean it was the hero, or they think that it is a machine therefore it is obviously not charismatic (or persuasive), or they’re thinking about all the AIs in various media. No. An AGI on anything like the current paradigm would be highly charismatic, if that seemed like a thing it would be rewarded for being.
Is it good to point out examples of the extremely poor level of discourse among the unworried, such as this example labeling opponents as ‘Promotors of AI Panic’ as business or marketing, including Google and Microsoft, with no justifications given of any kind?
I don’t know. Certainly there are plenty of bad discourse moves to go around. There also is quite the imbalance.
No One Would Be So Stupid As To
I want to be clear that I’m not saying Kevin Fischer is doing anything dangerous or bad or stupid by continuing to highlight his work here, merely that he is inadvertently warning us about the future.
Seen recently:
Yes, we should ensure entry takes longer, and that if you get hacked or lose your phone you also cannot enter your house. Don’t worry, the AI is only on the internet.
Aligning a Smarter Than Human Intelligence is Difficult
Eric Drexler looks back after four years on his ‘Reframing Superintelligence.’ He says that the old post is a good fit for today’s LLM-based technologies. I came away unconvinced. In particular I believe the updates regarding agency here are not sufficient. Most of the other disagreements present reflect the same disagreements from four years ago remaining unresolved.
Are you trying to do things safely? Or are you trying to prove you spent a lot of time and resources on things you labeled safety procedures?
Alignment taxes are a cost. You want to minimize the tax while getting the safety, not maximize the taxes paid in the hopes that safety magically follows.
People Are Worried About AI Killing Everyone
Anthropic employees are worried, reports New York Times.
Which section does this one even belong under?
I wish I could say most other suggestions were not this nonsensical. Except this isn’t true. Even if you think you have found the exception, you likely agree in general. Notice then that people vary a lot in which exception they think they have found.
Richard Ngo clarifies he is mostly optimistic, while worrying about a handful of particular scenarios, in a way that makes it clear where we disagree.
I think that for mundane utility this is largely spot on. For extinction risks, and for the dangers of AGI, it isn’t, because the robustness breaks as do the mechanisms that allow people to pay for safety as they get richer. The robustness relies on us being the most powerful optimizers present, so if something goes wrong we can address it.
The safety efforts rely on people choosing to spend on safety, which for individual actors or even largely individual labs or nations likely does not work with respect to AGI because almost the entire risk is an externality rather than internalized mundane harms, and one that has to be realized before the damage is done. Someone needs to have the incentive to pay and also the awareness of the need to pay and also the ability to pay in a way that turns that money into actual safety. The awareness, coordination and execution required are possible, but I am not optimistic.
This is in sharp contrast to self-driving cars, and our other actions against mundane harms, which often, as in the case of self-driving cars, are largely imaginary. If only we could somehow flip how such things are viewed. Any ideas?
Other People Are Not As Worried About AI Killing Everyone
Since people will ask about it: Tetlock’s report on forecasting existential risk probabilities finally got released. I went into detail on this earlier. The reason the ‘superforecasters’ here (reports are that standards for this were not so high) failed to get reasonable answers is that they spent very little time thinking about existential risks, especially from AI, and also had little incentive, since you can never be punished for an incorrect prediction being too low.
Thinking ahead is optional, I suppose.
Fast.Ai has a frankly embarrassingly poor post entitled ‘AI Safety and the Age of Dislightenment’ that says that any regulations imposed on AI would backfire, unless we first build systems far smarter than us, spread them far and wide and see what happens. The post starts by silently ignoring the actual existential risks or other concerns involved with zero justifications, with all downsides of AI use restricted to ‘human misuse’ that can be solved with ‘defense’ via more AI. It treats enforcement as impossible yet the impact of restrictions as so centralizing of power that it threatens to destroy the enlightenment, and measures ‘harm’ as how not open is access to AI systems. To those whose mood-affiliated links compel me to look at such drek, you are killing me, I am begging you to do better.
Connor Leahy responded more positively, thanking Jeremy Howard for the engagement and focusing on the problem of offense-defense asymmetry. I do appreciate that a lot of real effort has been invested here. Still.
Washington Post’s Nitasha Tiku joins the list of journalists one should definitely not give the time of day if they ask, offers deeply disingenuous hit piece about the whole concept that someone might be concerned that AI would kill everyone. Frames the effort as fundamentally a war against other concerns, including other extinction risks and even the other EA causes, and as motivated by ‘science fiction.’ Which is then a movement where billionaires go around ‘recruiting’ or buying kids on college campuses. One nice touch is attacking EA’s reputation via attacking as bad EA’s attempts to avoid reputational risk, while also portraying non-AI-EA as both a weapon with which to perform a reputational attack by association, and also as a kind of victim of stolen attention from its more liberal-friendly causes.
Other touches were, shall we say, less nice, with one being a clear case of knowing exactly where the edge is of how misleading (and also irrelevant) you can be while performing a pure ad hominem without breaking the rules. In some places, one can almost admire the craftsmanship on display.
The post does correctly at one point state that all three major labs were founded in order to mitigate AI extinction risk, but treats this (implicitly but very clearly) as obviously bad actually in both directions.
As always, the correct response is not to draw lots of attention to such work. It is instead to notice that this is the way of the profession, and remember that when you see similar writing about others, and also to notice who is passing such things along uncritically or praising them.
Jason Crawford thread version of his post, Will AI Inevitably Seek Power? He does a good job of explaining in layman’s terms why the AI will inevitably seek power, then claims that it won’t. His justification boils down to this requiring ‘extreme’ optimization, and that’s weird and not to be expected, or easily avoided. There is a sudden turn from explaining why to demands for direct evidence and declaration of the burden of proof. In the case of GPT-4, he rejects evidence of power seeking behavior because those behaviors failed. If that is the standard of evidence, then only a successfully powerful AI will do. Uh oh.
Another argument raised is that such systems will understand concepts like return on investment (ROI). I agree, which is why they will mostly only do power seeking when it has positive ROI, which requires sufficient affordances and so is mostly not yet true. Once such activities turn that corner, things change.
Overall, I’d say: As not only someone who has thought a lot about the problem but also a former professional game player often guilty of ‘extreme optimizations,’ I assure Crawford that he is mistaken. ‘Extreme’ optimization is the default if you reward and enable it, such that it does better than ‘chill’ optimization, and we already see this in humans. Delivering the coffee does not come with a security detail or world takeover because it would not currently maximize results to do so, not because it is too absurd and therefore couldn’t happen. In the cases where the coffee is sufficiently important, we can compare predictions, and you know what mine is.
Here was an interesting exchange after:
What Rob is saying is, all you are doing by maximizing ROI is changing the optimization target from Value(outcome) = P(button press) → something like Value(outcome) = P(button press) + F(remaining funds). Which, if you are trying to avoid extreme optimizations, is the opposite of helping.
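A toy version of the exchange (my own illustration, with made-up numbers): add a remaining-funds term to the objective and the resource-grabbing plan still wins whenever the gain in success probability is worth more than the funds it burns.

```python
# Two candidate plans for an agent whose nominal job is getting the button pressed.
plans = {
    "just press the button":         {"p_press": 0.90, "funds_left": 100.0},
    "acquire resources, then press": {"p_press": 0.99, "funds_left": 40.0},
}

def value_naive(plan):
    return plan["p_press"]                                       # Value = P(button press)

def value_roi_aware(plan, funds_weight=0.001):
    return plan["p_press"] + funds_weight * plan["funds_left"]   # + F(remaining funds)

for scorer in (value_naive, value_roi_aware):
    best = max(plans, key=lambda name: scorer(plans[name]))
    print(scorer.__name__, "->", best)
# Both objectives pick the resource-acquiring plan; the funds term only moved the target.
```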
I do think this makes the case that we have not made the strongest possible or most scientifically formal case yet for power seeking, and this is worth doing, as it can often be a sticking point.
Roon calls out people worried about GPT-5 based on previous in-hindsight-silly-looking worries about GPT-2 and GPT-3.
I believe this is a fundamental misunderstanding of what happened so far.
That is an odd framing that I would actually fully agree with. Any given attempt to create AGI probably fails to do so, and there is a good chance all of them fail for a relatively long time, perhaps forever. But the implication here is that the most likely outcome if you do succeed is doom. Perhaps because that’s a second distinct miracle, where you also probably fail. Which, again, I agree with.
Let’s say that Swift On Security is not worried. Even when we get what could plausibly have been good critics, oh man do we need better critics.
The Lighter Side
This is our AI Czar and also our vice president (41 second clip), here’s how she starts:
To be fair to Harris, the intended audience does limit her affordances here, and nothing she says is technically false. What would be the best thing to say here?