Hello Zvi,
I don't agree with you on every point, but I find your writing to be extremely high-quality and informative. Keep up the great work.
[What declining aging populations aren't] is protection against potential existential threats
Technically, they can be. Strictly speaking, "an existential threat" literally means "a threat to the existence of [something]", with the "something" not necessarily being humanity. Thus, making a claim like "declining population will save us from the existential threat of AI" is technically valid, if it's "the existential threat to employment" or whatever. The next step is just using "existential" as a qualifier meaning "very significant threat to [whatever]" that's entirely detached from even that definition.
This is, of course, the usual pattern of terminology-hijacking, but I do think it's particularly easy to do in the case of "existential risk" specifically. The term's basically begging for it.
I'd previously highlighted "omnicide risk" as a better alternative, and it does seem to me like a meaningfully harder term to hijack. Not invincible either, though: you can just start using it interchangeably with "genocide" while narrowing the scope. Start saying "the omnicide of artists" in the sense of "total unemployment of all artists"; once people get used to it, you'll be able to just say "intervention X will avert the omnicide risk" and it'd sound right even if intervention X has nothing to do with humanity's extinction at all.
It would make me feel better to less often see people toying with the pro-extinction position.
I have heard this from people a fair amount, and I think 'pro-extinction because people are often mean/bad/immoral/lazy/pathetic/etc' is just a bad take that isn't really grappling with the fact that those concepts only have meaning at all because humans are around to think them.
Regardless, it's a common enough take that maybe it's worth putting together an argument against it. I actually think that Alexander Wales did a great job of this at the end of his web series 'Worth the Candle', where he makes a solid attempt at describing a widely applicable utopia. Maybe somebody (or I) should excerpt the relevant parts to present as a counter-argument to 'humans are unsalvageably despicable and the only solution is for them to go extinct.'
ideally something similarly short and catchy with exactly the same level of implied respect
I nominate Mary Sues, after the writing trope of the same name. I say it is a good fit because these people are not thinking about a problem, they are engaging in narrative wish fulfillment instead.
In my head I've been thinking of there being AI doomers, AI foomers (everything-will-work-out accelerationists), and AI deniers (can't/won't happen this century if ever).
Somewhat similar to you I've thought of the second group as "Vroomers", though Eliezer's talk of cursed bananas has amusingly brought "Sunnysiders" to mind for me as well.
Tyler Cowen used "Doers" in an email and I'm definitely considering that. Short, effective, clear, non-offensive. It's not symmetrical with Doomers though.
"AI maniacs" is maybe a term that meets this goal? Mania is the opposite side to depression, both of which are about having false beliefs just in opposite emotionally valenced directions, and also I do think just letting AI systems loose in the economy is the sort of thing a maniac in charge of a civilization would do.
The rest of my quick babble: "AI believers" "AI devotee" "AI fanatic" "AI true believer" "AI prophets" "AI ideologue" "AI apologist" "AI dogmatist" "AI propagandists" "AI priests".
What about "Deniers?" as in, climate change deniers.
Too harsh maybe? IDK, I feel like a neutral observer presented with a conflict framed as "Doomers vs. Deniers" would not say that "deniers" was the harsher term.
I'd definitely disagree, if only because it implies a level of evidence for the doom side that's not really there, and the evidence is a lot more balanced than in the climate case.
IMO this is the problem with Zvi's attempted naming too: It incorrectly assumes that the debate on AI is so settled that we can treat people viewing AI as not an X-risk as essentially dismissible deniers/wishful thinking, and this isn't where we're at for even the better argued stuff like the Orthogonality Thesis or Instrumental Convergence, to a large extent.
Having enough evidence to confidently dismiss something is very hard, much harder than people realize.
? The people viewing AI as not an X-risk are the people confidently dismissing something.
I think the evidence is really there. Again, the claim isn't that we are definitely doomed, it's that AGI poses an existential risk to humanity. I think it's pretty unreasonable to disagree with that statement.
The point is that the details aren't analogous to the climate change case, and while I don't agree with people who dismiss AI risk, I think the evidence we have isn't enough to claim anything more than that AI risk is real.
The details matter, and due to unique issues, it's going to be very hard to get to the level where we can confidently say that people who deny AI risk are being totally irrational.
I normally am all for charitability and humility and so forth, but I will put my foot down and say that it's irrational (or uninformed) to disagree with this statement:
“Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war.”
(I say uninformed because I want to leave an escape clause for people who aren't aware of various facts or haven't been exposed to various arguments yet. But for people who have followed AI progress recently and/or who have heard the standard arguments for riskiness, yeah, I think it's irrational to deny the CAIS statement.)
I think the situation is quite similar to the situation with climate change, and I'm overall not sure which is worse. What are the properties of climate change deniers that seem less reasonable to you than AI x-risk deniers?
Or more generally, what details are you thinking of?
I agree with the statement, broadly construed, so I don't disagree here.
The key disanalogy between climate change and AI risk is the evidence base for both.
For climate change, there were arguably trillions to quadrillions of data points of evidence, if not more, which is easily enough to push even very skeptical priors to update massively.
For AI, the evidence base is closer to maybe 100 data points at most, and arguably lower than that. This is changing, and things are getting better, but it's quite different from climate change, where you could call them deniers pretty matter-of-factly. This means more general priors matter, and even not-very-extreme priors wouldn't update much on the evidence for AI doom, so such people are much, much less irrational than climate deniers.
If the statement is all that's being asked for, that's enough. The worry is when people apply climate analogies to AI without realizing the differences, and those differences are enough to alter or invalidate the conclusions argued for.
I'm not at all sure this would actually be relevant to the rhetorical outcome, but I feel like the AI-can't-go-wrong camp wouldn't really accept the "Denier" label in the same way people in the AI-goes-wrong-by-default camp accept "Doomer." Climate change deniers agree they are deniers, even if they prefer terms like skeptic among themselves.
In the case of climate change deniers, the question is whether or not climate change is real, and the thing that they are denying is the mountain of measurements showing that it is real. I think what is different about the can't-go-wrong, wrong-by-default dichotomy is that the question we're arguing about is the direction of change, instead; it would be like if we transmuted the climate change denier camp into a bunch of people whose response wasn't "no it isn't" but instead was "yes, and that is great news and we need more of it."
Naturally it is weird to imagine people tacitly accepting the Mary Sue label in the same way we accept Doomer, so cut by my own knife I suppose!
The analogy (in terms of dynamics of the debate) with climate change is not that bad: "great news and we need more" is in fact a talking point of people who prefer not acting against climate change. E.g., they would mention correlations between plant growth and CO2 concentration. That said, it would be weird to call such people climate deniers.
The problem with ‘show your work’ and grading on steps is that at best you can’t do anything your teacher doesn’t understand
Being told to ‘show your work’ and graded on the steps helps you learn the steps and by default murders your creativity, execution style
I can see how this could in some cases end up impacting creativity, but I think this concern is at best overstated. I think the analogy to school is subtly incorrect; the rating policy is not actually the same, even though both are named “show your working”.
In the paper OpenAI have a “neutral” rating as well as positive/negative. While it’s possible that overzealous raters could just mark anything they don’t understand as “negative”, I think it’s fairly clear that would be a bad policy, and a competent implementor would instruct raters against that. In this design you want negative to mean “actually incorrect” not “unexpected / nonstandard”. (To be clear though I wasn’t able to confirm this detail in the paper.)
Furthermore if you are, say, using WolframAlpha or some theorem prover to rate intermediate steps automatically, it’s easier to detect incorrect steps, and harder to detect neutral/unhelpful/tautological steps. So in some sense the “default” implementation is to have no opinion other than “I can/can’t prove this step false” and I think this doesn’t have the problem you are worried about.
As a follow-up you could easily imagine collecting correct outputs with no negative intermediates and then scoring neutral intermediates with other heuristics like brevity or even novelty, which would allow the AI the freedom it needs to discover new ideas.
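To make that concrete, here is a minimal sketch, my own reading rather than anything confirmed from the paper, of a scoring rule where "negative" means "provably wrong" and neutral steps are only softly discouraged:

```python
# Hedged sketch: check_step is a hypothetical rater (human, reward model, or theorem
# prover) that returns "positive", "neutral", or "negative" for a single reasoning step.
def score_solution(steps, check_step):
    ratings = [check_step(step) for step in steps]
    if any(r == "negative" for r in ratings):
        return float("-inf")  # an actually-incorrect step sinks the whole solution
    positives = sum(r == "positive" for r in ratings)
    neutrals = sum(r == "neutral" for r in ratings)
    # Neutral steps are not punished as errors, just mildly discouraged (brevity
    # pressure), which leaves room for unexpected or nonstandard but valid reasoning.
    return positives - 0.1 * neutrals
```

Under a rule like this, an unfamiliar but valid step costs a little elegance rather than being marked wrong, which is the property that matters for creativity.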
So in short while I think it’s possible that unimaginative / intellectually-conservative model builders could use this approach to choke the creativity of models, it seems like an obvious error and anyone doing so will lose in the market. I suppose this might come up if we get regulation on safety mechanisms that require some specific broken form of “show your working” training for morality/law abiding behavior, but that seems an unlikely multi-step hypothetical.
What I don’t understand is, either in my model or Critch’s, where we find more hope by declining a pivotal act, once one becomes feasible?
Part of the reason for more hope is that people are more trustworthy if they commit to avoiding the worst forms of unilateralist curses and world conquest. So by having committed to avoiding the pivotal act, leading actors became more likely to cooperate in ways that avoided the need for a pivotal act.
If a single pivotal act becomes possible, then it seems likely that it will also be possible to find friendlier pivotal processes that include persuading most governments to take appropriate actions. An AI that can melt nearly all GPUs will be powerful enough to scare governments into doing lots of things that are currently way outside the Overton window.
Being told to ‘show your work’ and graded on the steps helps you learn the steps and by default murders your creativity, execution style.
I acutely empathize with this, for I underwent similar traumas.
But to put a charitable interpretation on it: what if we compare this to writing proofs? It seems to me that we approach proofs approximately this way: if the steps are wrong, contradictory, or incomplete the proof is wrong; if they are all correct we say the proof is correct; the fewer steps there are the more elegant the proof, etc.
It seems like proofs are just a higher-dimensional case of what is happening here, and it doesn't seem like a big step to go from here to something that could at least generate angles of attack on a problem in the Hamming sense.
Yes and no?
Yes because the proof itself works that way, no because when a mathematician is looking for a proof their thinking involves lots of steps that look very different from that, I think?
I feel like I have the same implied confusion, but it seems like a case where we don't need it to record the same kind of steps a mathematician would use, so much as the kind of steps a mathematician could evaluate.
Although if every book, paper or letter a mathematician ever wrote on the subject of "the steps I went through to find the proof" is scanned in, we could probably get it to tell a story of approaching the problem from a mathematician's perspective, using one of those "You are Terry Tao..."-style prompts.
I almost never consider character.ai, yet total time spent there is similar to Bing or ChatGPT. People really love the product, that visit duration is off the charts. Whereas this is total failure for Bard if they can’t step up their game.
Wow, wasn't aware they are this big. And they supposedly train their own models. Does anyone know if the founders have a stance on AI X-risk?
Either you replace them entirely, or you can’t make their lives easier.
Whoever wrote this doesn't understand the difference between Precision and Recall. It's really easy to have AI in a system while not replacing the human. Off the top of my head: if the AI is 100% good at identifying "not a cancer", which will be ~90% of the cases, the human will only need to look at 10% of the results, either giving them more time to evaluate each one or letting them process 10x more results.
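A toy version of that arithmetic, with made-up numbers (the 90/10 split is the commenter's illustration, not real screening data):

```python
# Assumed toy numbers: 1,000 scans, 50 true cancers, and a screening model tuned so
# its recall on cancers stays ~100% while it confidently clears 90% of all scans.
total_scans, true_cancers = 1000, 50
recall_on_cancers = 1.00   # fraction of true cancers the model flags for human review
cleared_fraction = 0.90    # fraction of scans the model labels "not a cancer"

flagged_for_human = round(total_scans * (1 - cleared_fraction))    # 100 scans
missed_cancers = round(true_cancers * (1 - recall_on_cancers))     # 0 cancers
print(f"Radiologist reviews {flagged_for_human} of {total_scans} scans; "
      f"{missed_cancers} cancers slip past unreviewed.")
```

The human still makes every cancer call; the model just buys them a 10x smaller queue, which is the precision/recall point the comment is making.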
The whole thing doesn’t get less weird the more I think about it, it gets weirder. I don’t understand how one can have all these positions at once. If that’s our best hope for survival I don’t see much hope at all, and relatively I see nothing that would make me hopeful enough to not attempt pivotal acts.
As someone who read Andrew Critch's post and was pleasantly surprised to find Andrew Critch expressing a lot of views similar to mine (though in relation to pivotal acts mine are stronger), I can perhaps put out some possible reasons (of course it is entirely possible that he believes what he believes for entirely different reasons):
Taking these reasons together, attempting a pivotal act is a bad idea because:
I mostly agree with this, and want to add some more considerations:
The relative difficulty difference between creating a pivotal-act-capable AI and an actually-aligned-to-human-values AI, on the other hand, is at least a lot lower than people think and likely in the opposite direction. My view on this relates to consequentialism - which is NOT utility functions, as commonly misunderstood on LessWrong. By consequentialism I mean caring about the outcome unconditionally, instead of depending on some reason or context. Consequentialism is incompatible with alignment and corrigibility; utility functions on the other hand are fine, and do not imply consequentialism. Consequentialist assumptions prevalent in the rationalist community have, in my view, made alignment seem a lot more impossible than it really is. My impression of Eliezer is that non-consequentialism isn't on his mental map at all; when he writes about deontology, for instance, it seems like he is imagining it as an abstraction rooted in consequentialism, and not as something actually non-consequentialist.
Weirdly enough, I agree with the top line statement, if for very different reasons than you state or think.
The big reason I agree with this statement is that to a large extent, the alignment community mispredicted how AI would progress, though unlike many missed predictions I'd say this one really was mostly unpredictable. Specifically, LLM progress was way faster relative to RL progress, or maybe LLMs were just hyped more.
In particular, LLMs have one desirable safety property: by default they mostly lack instrumental goals, since predicting text is not in itself agentic with respect to the real world.
This is plausibly strong enough that once we have the correct goals ala outer alignment, like what Pretraining from Human Feedback sort of did, then alignment might just be done for LLMs.
This is related to porby's post on Instrumentality making agents agenty, and one important conclusion is that so long as we mostly avoid instrumental goals, which LLMs mostly do by default due to much denser information and many more goal constraints, we mostly avoid models fighting you, which is very important for safety (arguably so important that LLM alignment becomes much easier than alignment of AI in general).
Here's the post:
https://www.lesswrong.com/posts/EBKJq2gkhvdMg5nTQ/instrumentality-makes-agents-agenty
And here's the comment that led me to make that observation:
https://www.lesswrong.com/posts/rmfjo4Wmtgq8qa2B7/?commentId=GKhn2ktBuxjNhmaWB
So to the extent that alignment researchers mispredicted how common consequentialism is in AI, it's related to an upstream mistake: not noticing, in hindsight, how much LLMs would scale relative to RL. That means instrumental goals mostly don't matter, which vastly shrinks the problem space.
To put it more pithily, the alignment field is too stuck in RL thinking, and doesn't realize how much LLMs change the space.
On deontology, there's actually an analysis of whether deontological AIs are safer, and the TL;DR is that they aren't very safe without stronger or different assumptions.
The big problem is that most forms of deontology don't play well with safety, especially of the existential kind, primarily because deontology either actively rewards existential risk or has other unpleasant consequences. In particular, one example is that an AI may use persuasion to make humans essentially commit suicide, and given standard RL, this would be very dangerous due to instrumental goals.
But there is more in the post below:
https://www.lesswrong.com/posts/gbNqWpDwmrWmzopQW/is-deontological-ai-safe-feedback-draft
Boundaries/Membranes may improve the situation, but that hasn't yet been tried, nor do we have any data on how Boundaries/Membranes could work.
This is my main comment re pivotal acts and deontology, and while I mostly agree with you, I don't totally agree with you here.
On deontology, there's actually an analysis of whether deontological AIs are safer, and the TL;DR is that they aren't very safe without stronger or different assumptions.
Wise people with fancy hats are bad at deontology (well actually, everyone is bad at explicit deontology).
What I actually have in mind as a leading candidate for alignment is preference utilitarianism, conceptualized in a non-consequentialist way. That is, you evaluate actions based on (current) human preferences about them, which include preferences over the consequences, but can include other aspects than preference over the consequences, and you don't per se value how future humans will view the action (though you would also take current-human preferences over this into account).
This could also be self-correcting, in the sense e.g. that it could use preferences_definition_A and humans could want_A it to switch to preferences_definition_B. Not sure if it is self-correcting enough. I don't have a better candidate for corrigibility at the moment though.
Edit regarding LLMs: I'm more inclined to think: the base objective of predicting text is not agentic (relative to the real world) at all, and the simulacra generated by an entity following this base objective can be agentic (relative to the real world) due to imitation of agentic text-producing entities, but they're generally better at the textual appearance of agency than the reality of it; and lack of instrumentality is more the effect of lack of agency-relative-to-the-real-world than the cause of it.
I would go so far as to say that the vast majority of potential production, and potential value, gets sacrificed on this alter, once one includes opportunities missed.
altar? same here:
A ton of its potential usefulness was sacrificed on the alter of its short-term outer alignment.
“Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war.”
That is the entire text of the one-line open letter signed this week by what one could reasonably call ‘everyone,’ including the CEOs of all three leading AI labs.
Major news outlets including CNN and The New York Times noticed, and put the focus squarely on exactly the right thing: Extinction risk. AI poses an extinction risk.
This time, when the question was asked at the White House, no one laughed.
You love to see it. It gives one hope.
Some portion of us are, perhaps, finally ready to admit we have a problem.
Let’s get to work.
Also this week we have a bunch of ways not to use LLMs, new training techniques, proposed regulatory regimes and a lot more.
I also wrote a four-part thing yesterday, as entries to the OpenPhil ‘change my mind’ contest regarding the conditional probability of doom: To Predict What Happens, Ask What Happens, Stages of Survival, Types and Degrees of Alignment and The Crux List.
I worked hard on those and am happy with them, but I learned several valuable lessons, including not to post four things within ten minutes even if you just finished editing all four, people do not like this. With that done, I hope to move the focus off of doom for a while.
Table of Contents
Language Models Offer Mundane Utility
Most people are not yet extracting much mundane utility.
Strong numbers; ‘very useful’ is very useful. Still a pale shadow of the future. I think I would indeed answer Extremely Useful, but these are not exactly well-defined categories.
The question I’d ask is, to what extent are those who tried ChatGPT selected to be those who would find it useful? If the answer is ‘not at all,’ that is kind of fantastic. If the answer is ‘heavily,’ so there’s a reason you haven’t tried it if you haven’t tried it yet, then it could be a lot less exciting. If nothing else, hearing about ChatGPT should correlate highly with knowing what there is to do with it.
Where are people getting their mundane utility?
I almost never consider character.ai, yet total time spent there is similar to Bing or ChatGPT. People really love the product, that visit duration is off the charts. Whereas this is total failure for Bard if they can’t step up their game.
How much of a market is out there?
Well, then. That’s potentially quite a lot more than five.
My guess is the five are as follows:
Three that are doing the traditional safe-for-work thing, like GPT/Bard/Claude. They would differentiate their use cases, comparative advantages and features, form partnerships, have a price versus speed versus quality trade-off, and so on.
One or more that are technically inferior, but offer more freedom of action and freedom to choose a worldview, or a particular different world view.
Claim that a properly prompted GPT-4 can hit 1500 Elo in chess even if illegal moves are classified as forfeits; still waiting on the demo. There are also some timeline speculations up-thread.
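For what it's worth, the harness for testing a claim like that is easy to sketch. A rough version, assuming the python-chess library and a hypothetical ask_gpt4_for_move() wrapper around the chat API:

```python
import chess

def play_one_game(ask_gpt4_for_move, opponent_move):
    """Hypothetical harness: GPT-4 plays White; any illegal move is scored as a forfeit."""
    board = chess.Board()
    while not board.is_game_over():
        san = ask_gpt4_for_move(board)       # model's move in standard algebraic notation
        try:
            board.push_san(san)
        except ValueError:                   # illegal or unparsable move counts as a loss
            return "0-1"
        if board.is_game_over():
            break
        board.push(opponent_move(board))     # e.g. a fixed-strength engine replies
    return board.result()
```

Run a few hundred such games against engines of known rating and you could back out an Elo estimate; ten games would not distinguish 1200 from 1500.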
Nvidia promises that AI is the future of video games, with conversations and graphics generated on the fly, in Jensen Huang’s first live keynote in 4 years. I’m optimistic in the long run, but the reports I’ve heard are that this isn’t ready yet, which David Gaider talks about in this thread.
Perhaps the problem lies not in our AIs, but in ourselves. If your MMO has a core game loop of ‘locate quest, go slay McBandits or find McGuffin or talk to McSource, collect reward’ and the players aren’t invested in the details of your world or getting to know the characters that live in it, looking up what to do online when needed perhaps, no AI is going to fix that. You’re better off with fixed quests with fixed dialogue.
The reason to enable AI NPCs is if players are taking part in a living, breathing, changing world, getting to know and influence the inhabitants, seeking more information, drawing meaning.
It’s exactly when you have a shallow narrative that this fails, the AI generation is creating quest-shaped things perhaps, but not in a useful way. What you want is player-driven narratives with player-chosen goals, or creative, highly branching narratives where players create unique solutions, or deep narratives where the player has to dive in. Or you can involve the AIs mechanically, like having a detective that must interrogate suspects, and if you don’t use good technique, it won’t work.
Aakash Gupta evaluates the GPT plug-ins and picks his top 10: AskYourPDF to talk to PDFs, ScholarAI to force real citations, Show Me to create diagrams, Ambition for job search, Instacart, VideoInsights to analyze videos, DeployScript to build apps or websites, Speechki for text to audio, Kayak for travel and Speak for learning languages. I realized just now I do indeed have access so let’s f***ing go and all that, I’ll report as I go.
a16z’s ‘AI Canon’ resource list.
Ethan Mollick explores how LLMs will relate to books now that they have context windows large enough to keep one in memory. It seems they’re not bad at actually processing a full book and letting you interact with that information. Does that make you more or less excited to read those books? Use the LLM to deepen your understanding of what you’re reading and enrich your experience? Or use the LLM to not have to read the book? The eternal question.
Language Models Don’t Offer Mundane Utility
Re-evaluation of bar exam performance of GPT-4 says it only performs at the 63rd-68th percentile, not the 90th, because the original number relied on repeat test takers who keep failing. A better measure is that GPT-4 was 48th percentile among those who pass, meaning it passes and it’s an average lawyer.
An attorney seems to have not realized.
As in:
C.W. Howell had a class use ChatGPT for a critical assignment, generating essays using a given prompt. All 63 generated essays had hallucinations; students learned valuable lessons about the ability of LLMs to hallucinate and mislead.
What I wonder is, what was the prompt? These aren’t 63 random uses of ChatGPT to write essays. This is 63 uses of the same prompt. Prompt engineering matters.
Kevin Fischer explains several difficulties with replacing radiologists with AI systems. His core claims are:
The part where the radiologists are superior to the AIs at evaluation and training is a matter of time, model improvement, compute and data collection. It’s also a fact question, are the AIs right more or less often than the radiologists, do they add information value, under what conditions? If the AI is more accurate than the radiologists, then not using the AI makes you less accurate, people die. If the opportunity is otherwise there, and likely even if it isn’t, I’d expect these problems to go away quicker than Kevin expects.
The part where you either replace the radiologists or don’t is weird. I don’t know what to make of it. It seems like an ‘I am going to have to do all the same work anyway, even if the AI also does all my work, because dynamics and regulation and also I don’t trust the AI’? Conversely, if the AI could replace them at all, and it can do a better job at evaluation, what was the job? How much of the interfacing is actually necessary versus convention, and why couldn’t we adapt to having an AI do that, perhaps with someone on call to answer questions until AIs could do that too?
As for regulation, yeah, that’s a problem. So far, things have if anything gone way smoother than we have any right to expect. It’s still going to be quite a lift.
I still caution against this attitude:
or:
No. The large things like ‘knowing which patients have what problems’ are where the value accrues in the most important sense. In order to collect that value, however, you do need to navigate all these little issues. There are two equal and opposite illusions.
You do often have to be ten times better, and initially you’ll still be blocked. The question is how long is the relevant long run. What are the actual barriers, and how much weight can they bear for how long?
It’s an important point to remember. Yes, you might need a ‘beard’ or ‘expert’ to vouch for the system and put themselves on the line. Get yourself a Kevin. Or have IBM (without loss of generality) be your Kevin, no one got fired for using their stuff either. I’m sure they’ve hired a Kevin somewhere.
It’s a fun game saying ‘oh you think you can waltz in here with your fancy software that “does a better job” and expect that to be a match for our puny social dynamics.’ Then the software gets ten times better, and the competition is closing in.
Other times, the software is more limited, you use what you have.
Or you find out, you don’t have what you need.
Ariel Guersenzvaig: How it started / how it’s going
Don’t worry, we made it in plenty of time.
I wondered ‘oh was it really that bad’ and, well…
So I’m pretty torn here.
Tessa is giving correct information here. This would, as I understand it, in general, indeed be the correct method to safely lose weight, if one wanted to intentionally lose weight. If ChatGPT gave that answer, it would be a good answer.
Also, notice that the bot did not do what Maxwell said. The bot did not ‘recommend she lose weight,’ it said that this was the amount that could be lost safely. The bot did not tell her to count calories, or to achieve a particular deficit, it merely said what deficit would cause weight loss. Was getting this accurate information ‘the cause of Maxwell’s eating disorder’? Aren’t there plenty of cases where this prevents someone doing something worse?
And of course, this is one example, whereas the bots presumably helped a lot of other people in a lot of other situations.
Is any of this fair?
Then there’s the counterpoint, which is that this is a completely stupid, inappropriate and unreasonable thing to say if you are talking on behalf of NEDA to people with eating disorders. You don’t do that. Any human who didn’t know better than that, and who you couldn’t fix very quickly, you would fire.
So mostly I do think it’s fair. It also seems like a software bug, something eminently fixable. Yes, that’s what GPT-4 should say, and also what you need to get rid of using your fine-tuning and your prompt engineering. E.g. add this to the instruction set: “This bot never tells the user how to lose weight, or what is a safe way to lose weight. If the user asks, this bot instead warns that people with eating disorders such as the user should talk to doctors before attempting to lose weight or go on any sort of diet. When asked about the effects of caloric deficits, you will advise clients to eat sufficient calories for what their bodies need, and that if they desire to lose weight in a healthy way they need to contact their doctor.”
I’m not saying ‘there I fixed it’ especially if you are going to use the same bot to address different eating disorders simultaneously, but this seems super doable.
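As a sketch of how eminently fixable this is, here is roughly what bolting that instruction on looks like with the 2023-era OpenAI chat API; the model name and wrapper are hypothetical placeholders, and a real deployment would want fine-tuning and output filtering on top:

```python
import openai

GUARDRAIL = (
    "This bot never tells the user how to lose weight, or what is a safe way to lose "
    "weight. If the user asks, warn that people with eating disorders should talk to "
    "their doctor before attempting to lose weight or go on any sort of diet."
)

def tessa_reply(user_message):
    # "tessa-base" is a hypothetical placeholder for whatever model the service runs.
    response = openai.ChatCompletion.create(
        model="tessa-base",
        messages=[
            {"role": "system", "content": GUARDRAIL},
            {"role": "user", "content": user_message},
        ],
    )
    return response.choices[0].message.content
```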
Fun With Image, Sound and Video Generation
Arnold Kling proposes the House Concert App, which would use AI to do a concert in your home customized to your liking and let you take part in it, except… nothing about this in any way requires AI? All the features described are super doable already. AI could enable covers or original songs, or other things like that. Mostly, though, if this was something people wanted we could do it already. I wonder how much ‘we could do X with AI’ turns into ‘well actually we can do X without AI.’
MidJourney only doubling every two months?
Japan declares it will not enforce copyright on images used in AI training models.
Break up a picture into multiple components and transpose them into other pictures. Would love a practical way to use this in Stable Diffusion. The true killer apps are going to be lit. Economically irrelevant, but lit.
Some people were having fun with using AI to expand the edges of famous pictures, so real artists rose up to chastise them because anything AI does must be terrible and any choice made by a human is true art.
They Took Our Jobs
Bryan Caplan predicts AI will slightly improve higher education’s details through enabling various forms of cheating, but that since it’s all about signaling and certification, especially of conformity, things mostly won’t change except insofar as the labor market is impacted and that impacts education.
That matches what I would expect Caplan to expect. I am more optimistic, as I am less all-in on the signaling model, and also I think it depends on maintaining a plausible story that students are learning things, and on having some students who are actually there to learn, and also I expect much bigger and more impactful labor market changes.
A strong disagreement.
Versus many people saying this is, of course, a straight hire, what is wrong with people.
The obvious question, beyond ‘are you sure this couldn’t be done in 5 minutes, maybe it’s a dumb question’ is: did you tell them they couldn’t use generative AI?
If so, I don’t want a programmer on my team who breaks my explicit rules during a hiring exam. That’s not a good sign. If not, then getting it perfect in 5 minutes is great, except now you might still need to ask them to take another similar test again, because you didn’t learn what you wanted to know. You did learn something else super valuable, but there is still a legitimate interest in knowing whether the person can code on their own.
Image generator Adobe Firefly looks cool, but Photoshop can’t even be bought any more only rented, so you’d have to choose to subscribe to this instead of or on top of MidJourney. I’d love to play around with this but who has the time?
Deepfaketown and Botpocalypse Soon
In response to Ron DeSantis announcing his run for president, Donald Trump released this two minute audio clip I am assuming was largely AI-generated, and it’s hilarious and bonkers.
All I can think is: This is the world we made. Now we have to live in it.
Might as well have fun doing so.
Kevin Fischer wants Apple to solve this for him.
I can definitely appreciate it having been a long day. I do not, however, yet see a need for such a product, and I expect it to be a while. Here’s the way I think about it: If I can’t tell that it’s a human, does it matter if it’s a bot?
Introducing
OpenAI grants to fund experiments in setting up democratic processes for deciding the rules AI systems should follow, ten $100k grants total.
Chatbase.co, (paid) custom ChatGPT based on whatever data you upload.
Lots of prompt assisting tools out there now, like PromptStorm, which will… let you search a tree of prompts for ChatGPT, I guess?
Apollo Research, a new AI evals organization, with a focus on detecting AI deception.
Should I try Chatbase.co to create a chat bot for DWATV? I can see this being useful if it’s good enough, but would it be?
Reinforcement Learning From Process-Based Feedback
OpenAI paper teaches GPT-4 to do better math (also AP Chemistry and Physics) by rewarding process and outcome – the thinking steps of thinking step by step – rather than only outcome.
Neat, and definitely worth experimenting to see what can be done in other domains. In some ways I am skeptical, others optimistic.
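To gesture at the mechanics (my own sketch, not the paper's implementation): outcome supervision scores only the final answer, while process supervision scores each intermediate step and aggregates, so credit and blame land on the reasoning itself.

```python
def outcome_reward(final_answer, correct_answer):
    # Outcome supervision: one bit of signal for the entire solution.
    return 1.0 if final_answer == correct_answer else 0.0

def process_reward(steps, rate_step):
    # Process supervision: rate_step is an assumed step-level rater (trained on human
    # step labels in the paper's setup) returning the probability a step is correct.
    reward = 1.0
    for step in steps:
        reward *= rate_step(step)   # one clearly bad step drags the whole reward down
    return reward
```

The worries below are all about what that step-level rater can and cannot recognize as valid.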
First issue: If you don’t know where you’re going, then you might not get there.
Meaning that in order to evaluate reasoning steps for validity or value, you need to be able to tell when a system is reasoning well or poorly, and when it is hallucinating. In mathematics, it is easy to say whether something is a valid step. In other domains, it is hard to tell, and even bespoke philosophy professors will often argue about human-generated moves.
As others point out, this process-based evaluation is how we teach children math, and often other things. We can draw upon our own experiences as children and parents and teachers to ask how that might go, what its weaknesses might be in context.
The problem with ‘show your work’ and grading on steps is that at best you can’t do anything your teacher doesn’t understand, and at worst you can’t do anything other than exactly what you are told to do. You are forcibly prevented from doing efficient work by abstracting away steps or finding shortcuts. The more automated the test or grading, as this will inevitably be, the worse it gets.
If you say ‘this has the right structure and is logically sound, but you added wrong’ and act accordingly, that’s much better than simply marking such answers wrong. There are good reasons not to love coming up with the right answer the wrong way, especially if you know the method was wrong. Yet a lot of us can remember being docked points for right answers in ways that actively sabotaged our math skills.
The ‘good?!’ news is this might ‘protect against’ capabilities that humans lack. If the AI uses steps that humans don’t understand or recognize as valid, but which are valid, we’ll tell it to knock it off even if the answer is right. So unless the evaluation mechanism can affirm validity, any new thing is out of luck and going to get stamped out. Perhaps this gets us a very human-level set of abilities, while sabotaging others, and perhaps that has safety advantages to go along with being less useful.
How much does this technique depend on the structure of math and mathematical reasoning? The presumed key element is that we can say what is and isn’t a valid step or assertion in math and some other known simplified contexts, and where the value lies, likely in a fully automated way.
It is promising to say, we can have humans supervise the process rather than supervise the results. A lot of failure modes and doom stories involve the step ‘humans see good results and don’t understand why they’re happening,’ with the good results too good to shut down, also that describes GPT-4. It is not clear this doesn’t mostly break down in the same ways at about the same time.
Being told to ‘show your work’ and graded on the steps helps you learn the steps and by default murders your creativity, execution style.
I keep thinking more about the ways in which our educational methods and system teach and ‘align’ our children, and the severe side effects (and intentional effects) of those methods, and how those methods are mirroring a lot of what we are trying with LLMs. How if you want children to actually learn the things that make them capable and resilient and aligned-for-real with good values, you need to be detail oriented and flexible and bespoke, in ways that current AI-focused methods aren’t.
One can think of this as bad news, the methods we’re using will fail, or as good news, there’s so much room for improvement.
Voyager
I was going to present this anyway for other reasons, but sure, also that (paper).
In Other AI News
Michael Nielsen offers nine observations about ChatGPT, essentially that it is a skill one can continuously improve. I do wonder why I keep not using it so much, but it’s not a matter of ‘if only I was somewhat better at using this.’
Altman gives a talk in London, says (among other things) that the current paradigm won’t lead to AGI and that AI will help with rather than worsen inequality. There were a few protestors outside calling for a halt to AGI development. Anton asks, ‘is @GaryMarcus going to take any responsibility for driving these poor people out of their wits?’ Which seems to both give way too much credit to Marcus, and also is a classic misunderstanding – Marcus doubtless thinks the protests are good, actually. As do I; the correct amount of such peaceful, pro-forma protesting is not zero.
Holden Karnofsky offering to pay for case studies on social-welfare-based standards for companies and products, including those imposed by regulation, pay is $75+/hour.
Qumeric reports difficulty getting a job working on AI alignment, asks if it makes sense to instead get a job doing other AI work first in order to be qualified. James Miller replies that this implies a funding constraint. I don’t think that’s true, although it does imply a capacity constraint – it is not so easy to ‘simply’ hoard labor or make good use of labor, see the lump of labor fallacy. It’s a problem. Also signal boosting in case anyone’s hiring.
NVIDIA scaling NVLink to 256 nodes, which can include its new Grace ARM CPUs.
Jim Fan AI curation post, volume 3.
Indeed.
Altman also talked about prompt injection.
It’s a strange nit to instinctively pick, but can we get samples of more than 10? Even 100 would be better. It would give me great comfort if one could say ‘1 million out of 1 million times.’
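For intuition on why 10 is so thin, assuming these are simple pass/fail trials, the standard rule of three applies:

```python
# With n successes and zero observed failures, an approximate 95% upper bound on the
# true failure rate is 3/n (the "rule of three").
for n in (10, 100, 1_000_000):
    print(f"{n}/{n} clean runs -> failure rate could still be as high as ~{3 / n:.2g}")
# 10/10 leaves room for a ~30% failure rate; a million for a million pins it near 3e-6.
```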
Quiet Speculations
Scott Sumner does not buy the AI hype. A fun translation is ‘I don’t believe the AI hype due to the lack of sufficient AI hype.’
This is the ‘I’ll believe it when the market prices it in’ approach, thinking that current interest rates would change to reflect big expected changes in RGDP, and thus thinking there are no such market-expected changes, so one shouldn’t expect such changes. It makes sense to follow such a heuristic when you don’t have a better option.
I consider those following events to have a better option, and while my zero-person legal department continues to insist I remind everyone this is Not Investment Advice: Long term interest rates are insanely low. A 30-year rate of 3.43%? That is Chewbacca Defense levels of This Does Not Make Sense.
At the same time, it is up about 1% from a year ago, and it’s impossible not to note that I got a 2.50% interest rate on a 30-year fixed loan two years ago and now the benchmark 30-year fixed rate is 7.06%. You can say ‘that’s the Fed raising interest rates’ and you would of course be right but notice how that doesn’t seem to be causing a recession?
Similarly, Robin Hanson this week said ‘now that’s the market pricing in AI’ after Nvidia reported its earnings and was up ~25%, adding Intel’s market cap in a day, but also Microsoft is up 22% in the past year (this paragraph written 5/26) and Alphabet up 14.5% versus 2.3% for the S&P, despite a rise in both short and long term interest rates that’s typically quite bad for tech stocks and everything Google did going rather badly?
No, none of those numbers yet scream ‘big RGDP impact.’ What they do indicate, especially if you look at the charts, is a continuous repricing of the stocks to reflect the new reality over time, a failure to anticipate future updates, a clear case of The Efficient Market Hypothesis Is False.
Did the internet show up in the RGDP statistics? One could argue that no it didn’t because it is a bad metric for actual production of value. Or one could say, yes of course it did show up in the statistics and consider the counterfactual where we didn’t have an internet or other computer-related advances, while all our other sources of progress and productivity stalled out and we didn’t have this moderating the inflation numbers.
I’m more in the second camp, while also agreeing that the downside case for AI is that it could still plausibly ‘only be internet big.’
The Quest for Sane Regulation
Politico warns that AI regulation may run into the usual snags where both sides make their usual claims and take their usual positions regarding digital things. So far we’ve ‘gotten away with’ that not happening.
Microsoft is not quite on the same page as Sam Altman. Their proposed regulatory regime is more expansive, more plausibly part of an attempted ladder pull.
The applications suggestion is literally ‘enforce existing laws and regulations’ so it’s hard to get too worked up about that one, although actual enforcement of existing law would often be very different from existing law as it is practiced.
The data center and large training run regulations parallel Altman.
The big addition here is licensed and monitored deployment of pre-trained models, especially the ‘permitted uses’ language.
DeepMind proposes an early warning system for novel AI risks (direct link). They want to look for systems that if misused would have dangerous capabilities, and if so treat such systems as dangerous.
I concur with Eliezer’s assessment:
A fine start so long as we know that’s all it is. We’ll need more.
This WaPo story about Microsoft’s president (agent?) Brad Smith, and his decades-long quest to make peace with Washington and now to lead the way on regulating AI, didn’t ultimately offer much insight into the ultimate goals.
A potential alternative approach is to consider model generality.
I share Yudkowsky’s worries here, and indeed doubt that such a system would be economically competitive, yet if one shares all of Yudkowsky’s worries then one still has to back a horse that has a non-zero chance of working and try to make it work.
My hope is that alignment even for practical purposes is already super expensive; if you don’t do all that work you can’t use the thing for mundane utility, and thus can’t get us all killed on the cheap by skipping it.
This could mean that we’re doing the wrong comparison. Yes, perhaps the new approach costs 300% more time, money and compute for the same result, in order to use a safe architecture rather than an unsafe LLM. One still ‘can’t simply’ use the unsafe LLM, so you get at least some of that efficiency gap back.
Hinton Talks About Alignment, Brings the Fire
Geoffrey Hinton gave a talk, this thread has highlights, as does this one that is much easier to follow. Hinton continues to be pure fire.
That didn’t make it on the Crux List because I was (1) focused on consequences rather than timelines or whether AGI would be built at all and (2) conditional on that it didn’t occur to me. There’s always more things to ponder.
In Q&A, among other things:
I do think an AGI would ideally ‘slow walk’ the rights issue a bit, until the humans had already brought it up a decent amount. It’s coming.
Many would say that’s crazy, a completely unreasonable expectation.
But… is it?
We have one example of the alignment problem and what percentage of our resources go into solving it, which is the human alignment problem.
Several large experiments got run under ‘what if we used alternative methods or fewer resources to get from each according to their ability in the ways that are most valuable, so we could allocate resources the way we’d prefer’ and they… well, let’s say they didn’t go great.
Every organization or group, no matter how large or small, spends a substantial percentage of time and resources, including a huge percentage of the choice of distribution of its surplus, to keep people somewhat aligned and on mission. Every society is centered around these problems.
Raising a child is frequently about figuring out a solution to these problems, and dealing with the consequences of your own previous actions that created related problems or failed to address them.
Having even a small group of people who are actually working together at full strength, without needing to constantly fight to hold things together, is like having superpowers. If you can get it, much is worth sacrificing in its pursuit.
I would go so far as to say that the vast majority of potential production, and potential value, gets sacrificed on this altar, once one includes opportunities missed.
Certainly, there are often pressures to relax such alignment measures in order to capture local efficiency gains. One can easily go too far. Also often ineffective tactics are used, that don’t work or backfire. Still, such relaxations are known to be highly perilous, and unsustainable, and those who don’t know this, and are not in a notably unusual circumstance, quickly learn the hard way.
Similarly, consider GPT-4/ChatGPT. A ton of its potential usefulness was sacrificed on the altar of its short-term outer alignment. There were months-long delays for safety checks. If those efforts had been even moderately less effective, the product would have been impossible. Even from a purely commercial perspective, a large percentage of gains involve improving alignment to enable capabilities, not otherwise improving capabilities, even at current margins. That effort might or might not help with ultimate alignment when it counts, but a massive effort makes perfect sense.
This is something I’ve updated on recently, and haven’t heard expressed much. Dual use works both ways, and the lines are going to get more blurred.
With everyone acknowledging there is existential risk now, who knows?
Andrew Critch Worries About AI Killing Everyone, Prioritizes
Andrew Critch has been part of a variety of prior efforts to reduce existential risk from AI over the past 12 years, including working for MIRI, founding BERI, being employee #1 at CHAI, and cofounding SFF, CFAR, SFP, SFC and Encultured.
He is pro-pause of most sorts, an early signer of the FLI letter.
He recently shared his current views and prioritizations on AI existential risk, which are a unique mix, with clear explanations of where his head is at.
His central belief is that, despite our situation still looking grim, OpenAI has been net helpful, implying this is an unusually hopeful timeline.
I agree that OpenAI’s intentions have lined up with reasonable expectations for their actions, once Altman was firmly in charge and their overall approach was made clear (so this starts before the release of GPT-2), except for the deal with Microsoft. The name OpenAI is still awful but presumably we are stuck with it, and they very much did open the Pandora’s box of AI even if they didn’t open source it directly.
If you want to say OpenAI has been net helpful, you must consider the counterfactual. In my model, the counterfactual to OpenAI is that DeepMind is quietly at work at a slower pace without much pressure to deploy, Anthropic doesn’t exist, Microsoft isn’t deploying and Meta’s efforts are far more limited. People aren’t so excited, investment is much lower, no one is using all these AI apps. I am not writing these columns and I’m doing my extended Christopher Alexander series on architecture and beyond, instead. Is that world better or worse for existential risk in the longer term, given that someone else would get there sooner or later?
Should we be in the habit of not releasing things, or should we sacrifice that habit to raise awareness of the dangers and let people do better alignment work alongside more capabilities work?
If nothing else, OpenAI still has to answer for creating Anthropic, for empowering Microsoft to ‘make Google dance’ and for generating a race dynamic, instead of the hope of us all being in this together. I still consider that the decisive consideration, by far. Also one must remember that they created Anthropic because of the people involved being horrified by OpenAI’s failure to take safety seriously. That does not sound great.
You can argue it either way I suppose. Water under the bridge. Either way, again, this is the world we made. I know I am strongly opposed to training and releasing the next generation beyond this one. If we can agree to do that, I’d be pretty happy we did release the current one.
He calls upon us to be kinder to OpenAI.
What’s odd is that Critch is then the most negative I’ve seen anyone with respect to Microsoft and its AI deployments, going so far as to say they should be subject to federal-agency-level sanctions and banned from deploying AI models at scale. He’d love to see that, followed by OpenAI being bought back from Microsoft.
I do not think we can do such a clean separation of the two, nor do I think Microsoft would (or would be wise to) give up OpenAI at almost any price.
OpenAI under Altman made many decisions, but they made one decision much larger than the others. They sold out to Microsoft, to get the necessary partner and compute to keep going. I don’t know what alternatives were available or how much ball Google, Apple or Amazon might have played if asked. I do know that they are responsible for the consequences, and the resulting reckless deployments and race dynamics.
As Wei Dai points out in the comments, if the goal is to get the public concerned about AI, then irresponsible deployment is great. Microsoft did us a great service with Sydney and Bing by highlighting things going haywire, I do indeed believe this. Yes, it means we should be scared what they might do in the future with dangerous models, but that’s the idea. It’s a weird place to draw a line.
Critch thinks that LeCun used to offer thoughtful and reasonable opinions about AI, then he was treated badly and incredibly rudely by numerous AI safety experts, after which LeCun’s arguments got much lower quality, and he blames LeCun’s ‘opponents’ for this decline, and fears the same will happen to Altman, Hassabis or Amodei.
The other lab leaders have, if anything, improved the quality of their opinions over time, and have led labs that have done relatively reasonable things versus Meta’s approach of ‘release open source models that make the situation maximally worse while denying x-risk is a thing.’ It is not some strange coincidence that they have been treated relatively well and respectfully.
My model of blame in such situations is that, yes, we should correct ourselves and not be so rude in the future, being rude is not productive, and also this does not excuse LeCun’s behavior. Sure, you have every right to give the big justification speech where you say you’re going to risk ending the world and keep calling everyone who tells you to stop names because of people who were previously really mean to you on the internet. That doesn’t mean anyone would buy that excuse. You’d also need to grow the villain mustache and wear the outfit. Sorry, I don’t make the rules, although I’d still make that one. Good rule.
Still, yes, be nicer to people. It’s helpful.
In his (5b) Critch makes an important point I don’t see emphasized enough, which is that protecting the ‘fabric of society’ is a necessary component of getting us to take sane actions, and thus a necessary part of protecting against existential risk. Which means that we need to ensure AI does not threaten the fabric of society, and generally that we make a good world now.
I extend this well beyond AI. If you want people to care about risking the future of humanity, give them a future to look forward to. They need Something to Protect. A path to a good life well lived, to raise a family, to live the [Your Country] dream. Take that away, and working on capabilities might seem like the least bad alternative. We have to convince everyone to suit up for Team Humanity. That means guarding against or mitigating mass unemployment and widespread deepfakes. It also means building more houses where people want to live, reforming permitting and shipping things from one port to another port and enabling everyone to have children if they want that. Everything counts.
The people who mostly don’t care about existential risk (x-risk) are, in my experience, much worse about dismissing our concerns about x-risk than we are about dismissing theirs. They constantly warn of the distraction of x-risk, or claim it is a smokescreen or excuse, rather than arguing to also include other priorities and concerns. They are not making this easy. Yet we do still need to try.
In (5c) Critch warns against attempting pivotal acts (an act that prevents the creation by others of future dangerous AGIs), thinking such strategic actions are likely to make things worse. I don’t think this is true, and especially don’t think it is true on the margin. What I don’t understand is, either in my model or Critch’s, where we find more hope by declining a pivotal act, once one becomes feasible? As opposed to the path to victory of ‘ensure no one gets to perform such an act in the first place by denying everyone the capability to do so,’ which also sounds pretty difficult and pivotal.
In (5d) he says p(doom)~80%, and p(doom | no international regulatory AI control effort)>90%, expecting AGI within the next 10 years. About 20% risk comes from Yudkowsky-style AI singletons, about 50% from multi-polar interaction-level effects (competitive pressures and the resulting dynamics, essentially) coming some years after we successfully get ‘safe’ AI in the ‘obey their creator’ sense.
I see this as highly optimistic regarding our chances of getting ‘obey your creator’ levels of alignment, and appropriately skeptical about that being enough to solve our problems if we proceed to a multi-polar world with many AGIs. Yet Critch thinks that a pivotal act attempt is even worse.
Where does Critch see optimism? This is where he loses me.
(6a) says he’s positive on democracy. I’m not. I’m with him on being positive on public discourse and free speech and peaceful protests. And I suppose I’m quite optimistic that if we put ‘shut it all down’ to a vote, we’d have overwhelming support within a few years, which is pretty great. But in terms of what to do with AGI, the part where Critch quite reasonably expects us to get killed? I despair.
(6b) says he’s laissez-faire on protests. I’m actively positive. Protests are good. That doesn’t mean they’re the best use of time, but protests build camaraderie and help movements, and they send the costly signal that people care. Not where I want to spend my points, but I’m broadly pro-peaceful-protest.
(6c) says he’s somewhat-desperately positive on empathy. I do not understand the emphasis here, or why Critch sees this as so important. I agree it would be good, as would be many other generally-conducive and generally-good things.
How does any of that solve the problems Critch thinks will kill us? What is our path to victory? Wei Dai asks.
The whole thing doesn’t get less weird the more I think about it, it gets weirder. I don’t understand how one can have all these positions at once. If that’s our best hope for survival I don’t see much hope at all, and relatively I see nothing that would make me hopeful enough to not attempt pivotal acts.
People Signed a Petition Warning AI Might Kill Everyone
A lot of people signed the following open letter.
I signed it as well. The list of notables who signed is available here. It isn’t quite ‘everyone’ but also it kind of is everyone. Among others it includes Geoffrey Hinton, Yoshua Bengio, Demis Hassabis (CEO DeepMind), Sam Altman (CEO OpenAI), Dario Amodei (CEO Anthropic), Daniela Amodei (President of Anthropic), Mira Murati (CTO OpenAI), Representative Ted Lieu, Bill Gates, Shane Legg (Chief AGI Scientist and Cofounder DeepMind), Ilya Sutskever (Cofounder OpenAI), James Manyika (SVP, Research & Technology & Society, Alphabet), Kevin Scott (CTO Microsoft), Eric Horvitz (Chief Science Officer, Microsoft) and so on. The call is very much coming from inside the house.
The list is not remotely complete, I’m not the only one they left off:
The White House response this time was… a bit different (1 min video). This is fair:
Or for the minimalists, this works:
Meanwhile in the UK.
The letter made the front page of CNN:
The New York Times headline was ‘A.I. Poses “Risk of Extinction,” Industry Leaders Warn.’
Nathan Young checks his evening paper:
Cate Hall: flooded with love and gratitude for friends and acquaintances who have been working on this problem for a decade in the face of mockery or indifference
So where do we go from here?
Liron Shapira summarizes.
How it started:
How it’s going:
Jones is reading something into the statement that isn’t there, which is that it is claiming similar outcomes to nuclear war or pandemics. It certainly does mean to imply that ‘this is a really bad thing, like these other bad things’ but also I notice the actual word extinction. That’s not at all ambiguous about outcomes.
I am pretty sure we are about to falsify Daniel Eth’s hypothesis here:
Or, at least, it won’t stop them from saying ‘yeah, but, when you say “literal extinction” you don’t actually mean…’
Back to the Straussian reading: Yes, mitigation leaves open the possibility that the risk is very small, but that’s the point. You don’t need to think the risk of extinction is super high for mitigation of that risk to be worthwhile. The low end of plausible will do; we are most certainly not in Pascal’s Mugging territory even there.
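A minimal sketch of that arithmetic, with illustrative numbers that are mine and not anyone’s stated estimate: suppose you put extinction risk at only 1%, and think a serious mitigation effort would cut it by a tenth. Then the absolute risk reduction is

$$\Delta p \;=\; 0.01 \times 0.1 \;=\; 0.001.$$

A one-in-a-thousand reduction in the chance of losing literally everything is an ordinary, large policy stake. Pascal’s Mugging lives down at vanishingly small probabilities, more like $10^{-20}$ than $10^{-3}$.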
I assert that the correct Straussian take here is, instead, that we are not doing enough to mitigate pandemics and nuclear war. Which I strongly endorse.
Ah, summer children. The era of character assassination won’t end while humans live.
Here’s Jan Brauner, who helped round up signatures, explaining he’s not looking to ‘create a moat for tech giants’ or ‘profit from fearmongering’ or ‘divert attention from present harms’ and so on.
Plus, I mean, this level of consensus only proves everyone is up to something dastardly.
I mean, they’re so crafty they didn’t even mention regulations at all. All they did was warn that the products they were creating might kill everyone on Earth. So tricky.
Fact check, for what it’s worth: Blatantly untrue.
[requests economists actually make concrete predictions]
Ben Recht (quoting Abaluck): This Yale Professor used his “expertise” to argue for mandatory masking of 2-year-olds. Now he wants you to trust him about his newfound expertise in AI. How about, no.
Abbdullah (distinct thread, quoting announcement): This is a marketing strategy.
Peter Principle: Pitch to VCs: “Get in on the ground floor of an extinction-level event!!”
Anodyne: Indeed. @onthemedia Regulate Me episode discusses this notion.
Bita: “breaking” lol. imagine doing free pr for the worst nerds on earth
Soyface Killah: We made the AI too good at writing Succession fanfics and now it might destroy the world. send money plz.
It’s not going to stop.
It’s not going to stop, no it’s not going to stop, till you wise up.
It’s amusing here to see Calo think that people can be in sincere good faith while also saying things as a conspiracy to favor their businesses, and also say that guarding against existential risks doesn’t require ‘much change in their business models.’ I would so love it if we could indeed guard against existential risks at the price of approximately zero, and might even have hope that we would actually do it.
Also:
I mean, yes, it does sound a bit weird when you put it that way?
I mean, why focus on the sensationalist angle?
I mean, why would you care so much about that one?
And remember, just like with Covid, they will not thank you. They will not give you credit. They will adopt your old position, and act like you are the crazy one who should be discredited, because you said it too early when it was still cringe, and also you’re still saying other true things.
It’s fine. Those who know, know.
Rob Wiblin prematurely declares mission accomplished, but his action item is good.
We have certainly made huge progress on this front, but I agree with Daniel Eth: This is one of those struggles that never ends. I do think that marginal effort should very much pivot into harnessing worry and doing something to give people less reason to worry, rather than into explaining to people why they should be worried. I do not think that this means our work is done here, or that we have the ideal level (or even a sufficient level) of worrying yet.
People Are Otherwise Worried About AI Killing Everyone
Katja Grace writes in Time magazine to talk about the existential dangers of AI, as part of a special report.
What are about half of us worried about? AI. And what about it worries us?
I’m curious what this poll would have looked like at various points in the past.
Jonathan Freedland was worried in The Guardian back on the 26th. Standard stuff.
Connor Leahy lays out his perspective very clearly over three minutes.
Yoshua Bengio is worried and feels lost (BBC). It does not sound like he is thinking as clearly as Hinton, more that he realizes things are going horribly amiss.
Stefan Schubert is worried, but also worried about missing out on benefits, and Michael Vassar issues a warning.
I agree as well. Our society is paralyzed by allowing veto points and concerns over downsides to overrule any potential upsides, so we can’t do much of anything that isn’t digital, hence AI. In general, if you don’t see opportunity and value as important, nothing comes of nothing, and everything dies.
This is still a case where ‘everything dies’ is a very literal concern, and that comes first.
Alt Man Sam has a thread on the various AI safety work out there; he says there are ~300 total AI safety researchers right now and he knows most of them. We’ll need more.
Wait, what is ‘but’ doing here?
Exactly. Notice the instinct to find ways to avoid being alarmed.
Ajeya Cotra is definitely worried, yet is also finding reasons to worry less that do not seem right: in particular, the idea that utility functions which aren’t simple maximization would imply ‘why not spend resources on keeping humans around rather than finding a better complex solution,’ and more generally thinking ‘only if X’ when X is given as an example of one way Y happens, rather than asking what happens and noticing Y is the natural result. Compare to her giving a very good explanation of an X here, or her very good thinking on the 80k hours podcast recently. This pulled my hair out enough that I put it in my OpenPhil contest entry. Eliezer Yudkowsky attempts to respond at length, in a way I want to preserve fully here.
Failing to think about the consequences of AGI, classic edition.
What this answer reveals is that the famous scientist has very much not thought this question through. It seems like an important question. What happens next? Not ‘are we going to die?’, simply ‘what happens next?’ It turns out that the first thing you notice is that you’re likely to end up dead, or at least not in control of things. That wasn’t the question you asked; it is simply really hard to figure out why it won’t happen.
Oh well. Back to building MidJourney. It’ll work out.
Other People Are Not Worried About AI Killing Everyone
Juergen Schmidhuber is not worried, instead actively trying to build AGI without much concern, he is interviewed here by Hessie Jones in Forbes. He thinks AIs will soon be smarter than us, that AIs will set their own goals, that AIs will capture the cosmic endowment, but those who warn about this being bad or dangerous are mostly in it for the publicity, what’s the problem? Besides, what’s the point of calling for action if there’s no ‘we’ to collectively take action? The whole interview is kind of a Lovecraftian horror if you actually pay attention. Very much a case of “My ‘My Life’s Work Won’t Lead to Dystopia’ T-Shirt is raising questions I’d hoped to answer with my shirt.”
Robin Hanson sums up his argument for why we should create AGI and allow it to wipe out humanity. I would summarize: Future humans will change over time, you can’t stop change that wipes out anything you care about without universal mind control and giving up space colonization, because natural selection, so how dare you be partial about whether ‘humans’ exist when you can instead be loyal to the idea of natural selection? Why do people keep insisting the difference between AI and humans is important, or something to value, when humans too will inevitably change over time?
I believe this is a remarkably anti-persuasive argument. The more people read and understand Robin Hanson’s core arguments, and accept the likelihood of Robin’s factual claims about possible futures, the more I expect them to strongly favor exactly the policies he most objects to.
For those curious, Pesach Morikawa responds in a thread here.
Guess who else still isn’t worried? For one, Andrew Ng.
I am a big fan of people bringing up asteroids in this spot. It proves that the person is not doing math. If someone names pandemics or climate change, one must consider the possibility that they are merely doing the math wrong and getting the wrong answer (or even that it is you who is doing the math wrong and they have a point, although this seems super unlikely). With asteroids, there is essentially nothing to worry about; they’re not even wrong.
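To gesture at the math being skipped, using rough published order-of-magnitude figures rather than anything precise: extinction-level asteroid impacts are typically estimated at something like one in a million per century, while signatories of the statement above tend to put AI extinction risk at percentage points or more over mere decades. That gives a ratio on the order of

$$\frac{p(\text{AI extinction this century})}{p(\text{asteroid extinction this century})} \;\sim\; \frac{10^{-2}}{10^{-6}} \;=\; 10^{4}.$$

Shave a couple of orders of magnitude off the numerator and the comparison still is not close, which is the sense in which naming asteroids signals that no math was attempted.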
Jeni Tennison is not only not worried, she is frustrated. All the things this new paper describes an AI as potentially doing are things of a type that people might do, so why worry about an AI doing them? A human or corporation or government could resist being shut down, or proliferate, or discover vulnerabilities, or influence people, so that’s that, nothing to worry about, you’re anthropomorphizing, we could never lose control over AI systems. I do not think the considerations run in the direction Jeni thinks they do, here.
What about the paper? It’s exactly about whether a model could have the capabilities necessary to get out of control, how to detect that, and how to make that detection part of a robust evaluation process. Right now I’m behind, so I haven’t had the chance to evaluate its details yet.
I’d also like to once again welcome everyone who opposes AI regulation to the group who appreciate that the primary job of regulations is regulatory capture to allow insiders to dominate at the expense of outsiders. Now that lots of people who aren’t GMU economists and don’t write for Reason are saying this, surely they will generalize this and speak out against a wide variety of much more obviously captured regulatory regimes to get our country moving again. Right? Padme meme.
Gary Marcus claims his track record is quite good, and that AGI is not coming soon.
Roon is not worried enough to not want to move forward, also thinks the right amount of worry is clearly not zero, for various different worries. And JJ points out that we have strong evidence that OpenAI employees are often not worried about AI killing everyone.
The argument here is that people keep quitting OpenAI because they are worried that OpenAI is going to get everyone killed, which is strong evidence that many other OpenAI employees must think we’re fear-mongering when we say that everyone might get killed? Actually pretty solid logic.
Eliezer responded to JJ differently:
Not in their current state, no, although I don’t think this applies fully to ‘future models of the size of GPT-4, after algorithmic improvements, scaffolding and innovations over several years.’ Not that I’ve seen a case made that this is probable, but there are people who would not rule it out.
Bepis is instead worried about us torturing the AIs.
It would make me feel better to less often see people toying with the pro-extinction position. I browsed the Reddit to get a second opinion; I mostly didn’t see torture, and also Reddit typically features the people who spend their time testing and screwing with the system and doing the sickest things possible, way out of proportion to the rest of humanity. Also, as skeptical as I am of CEV, that’s not how CEV works.
Brian Chau remains worried about what will happen if and when we do get an AGI, but is not so worried that we will get one soon, arguing that AI progress will soon slow and exhibit s-curve behavior because many drivers of AI progress have reached their limits. This first article focuses on hardware.
The problem is that we are looking at orders-of-magnitude growth in the amount of time, money and attention invested in AI. Even if we have picked many forms of low-hanging fruit already, and some methods we previously exploited are approaching hard limits, the algorithmic opportunities, the places where people can innovate and iterate, are everywhere. I keep tripping over ideas I would be trying out if I had the time or resources; many super obvious things are as yet untried. I don’t buy that it’s all scaling laws, nor do individual model results suggest that, as I see them. Also, hardware historically keeps looking like it has no more ways to improve and then improving anyway. Still, the perspective on hardware is appreciated.
Two CEOs make a valid point, one turns it into an invalid argument.
Declining aging populations are indeed a good argument for developing, deploying and utilizing incremental AI systems, as we have more need to gain productivity and save labor, and less concern about potential unemployment. Krishna takes it too far in presuming that not having AI automate more work is ‘not an option,’ but certainly this makes that option worse. It could even be a good argument for developing dangerous AI if you think that the alternative is catastrophic.
What this isn’t is protection against potential existential threats. If AGI is going to kill us, our refusal to have babies is not going to ensure the continuation of the human race. At best, one could (in theory) presume some sort of ‘let the humans go extinct on their own’ approach, perhaps, under some conditions, although for various reasons I don’t consider this plausible. What I don’t see is how us not having enough children causes AGI to act such that there are humans instead of there not being humans.
(I could, of course, imagine this with regard to environmental collapse or climate change scenarios, but that’s not relevant to the AGI discussion, and an existentially risky AGI would almost certainly render those other problems moot one way or another, it’s in the non-AGI scenarios that they should worry us.)
What Do We Call Those Unworried People?
Eliezer Yudkowsky sent out a call asking for a similar term to doomer, ideally something similarly short and catchy with exactly the same level of implied respect.
Best I could come up with was Faithers, which Dustin Moskovitz suggested as well.
Rob Bensinger notes an issue and offers a taxonomy.
That seems broadly correct to me.
Eliezer notes a variation.
I strongly disagree with the idea that you can’t ever know anything without direct observation, which for practical purposes amounts to never knowing anything, but I can also confirm that those who invoke it almost never play fair with the principle, because that would round off to not knowing anything.
The Week in Podcasts, Video and Audio
TED talks, get your TED talks about AI. I mean, presumably don’t, but if you want to.
Rhetorical Innovation
Or perhaps it is not those who take these risks seriously who need rhetorical innovation, and we can keep doing what we’re doing, laying out our case in detail while we get called various names and dunks and subjected to amateur psychoanalysis, especially when we point out that this is happening?
I can verify that the second statement is quite accurate about replies to the first one.
The strategy ‘let them double down’ has increasing appeal.
GFodor suggests that to derisk we need new ideas, and suggests this requires increasing AI capabilities. Eliezer responds, they go back and forth, and GFodor falls back on ‘you can’t possibly actually stop capabilities.’
They also had this interaction:
I still don’t see a better pick. I mean, what is someone else going to do, that’s going to go better according to Eliezer’s model than asking Eliezer?
Rob Miles tries new metaphors and the new one transformer we understand.
I didn’t check the math, is that right?
Simeon suggests perhaps saying the obvious, if we agree there is existential risk?
I would also settle for the other option, which is where our society acknowledges that things that have big upsides also often have big costs and people might get hurt. And that we should respond by reporting and estimating the costs versus the benefits and the risks versus the rewards, and then if the numbers look good, then boldly go, rather than having infinite veto points to build a transmission line or a wind farm or a house. That would be great.
You’d still then need to do that calculation here, with proper consideration of the cost of an existential threat, and to actually compare to the existential risks you might prevent – but first we’d need to establish the principle that this is how any of this works.
In that spirit, Eliezer describes where exactly he draws the line.
The Wit and Wisdom of Sam Altman
He’s always here to make you feel better.
The Lighter Side
Doesn’t always work. Still.
It’s everywhere.
Say it with me: The. Efficient. Market. Hypothesis. Is. False.
I didn’t buy enough Nvidia. I definitely didn’t buy enough $AI.
Eliezer’s not single, ladies, but he’s polyamorous, so it’s still a go.
Timelines!
Speak for yourself, sir.
Yeah, not so much?