LESSWRONG
LW

All of Adam Scholl's Comments + Replies

A case for courage, when speaking of AI danger

For two, a person who has done evil, versus a person who is evil, are quite different things. I think that it's sadly not always the case that a person's character is aligned with a particular behavior of theirs.

I do think many of the historical people most widely considered to be evil now were similarly not awful in full generality, or even across most contexts. For example, Eichmann, the ops lead for the Holocaust, was apparently a good husband and father, and generally took care not to violate local norms in his life or work. Yet personally I feel quite... (read more)

Ben Pace5d*141

I'm not quite sure what I make of this, I'll take this opportunity to think aloud about it.

I often take a perspective where most people are born a kludgey mess, and then if they work hard they can become something principled and consistent and well-defined. But without that, they don't have much in the way of persistent beliefs or morals such that they can be called 'good' or 'evil'.

I think of an evil person as someone more like Voldemort in HPMOR, who has reflected on his principles and will be persistently a murdering sociopath, than someone ... (read more)

“Flaky breakthroughs” pervade coaching — but no one tracks them

Adam Scholl1mo*133

I share the sense that "flaky breakthroughs" are common, but also... I mean, it clearly is possible for people to learn and improve, right? Including by learning things about themselves which lastingly affect their behavior.

Personally, I've had many such updates which have had lasting effects—e.g., noticing when reading the Sequences that I'd been accidentally conflating "trying as hard as I can" with "appearing to others to be trying as hard as one might reasonably be expected to" in some cases, and trying thereafter to correct for that.

I do think it's wo... (read more)

AI Doomerism in 1879

Adam Scholl2mo2120

I think the word "technical" is a red herring here. If someone tells me a flood is coming, I don't much care how much they know about hydrodynamics, even if in principle this knowledge might allow me to model the threat with more confidence. Rather, I care about things like e.g. how sure they are about the direction from which the flood is coming, about the topography of our surroundings, etc. Personally, I expect I'd be much more inclined to make large/confident updates on the basis of information at levels of abstraction like these, than at levels about ... (read more)

AI Doomerism in 1879

Adam Scholl2mo2719

I think the simple argument "building minds vastly smarter than our own seems dangerous" is in fact pretty compelling, and seems relatively easy to realize beforehand, as e.g. Turing and many others did. Personally, there are not any technical facts about current ML systems which update me more overall either way about our likelihood of survival than this simple argument does.

And I see little reason why they should—technical details of current AI systems strike me as around as relevant to predicting whether future, vastly more intelligent systems will care... (read more)

7Matthew Barnett2mo

There appears to be a motte-and-bailey worth unpacking. The weaker, easily defensible claim is that advanced AI could be risky or dangerous. This modest assertion requires little evidence, similar to claims that extraterrestrial aliens, advanced genetic engineering of humans, or large-scale human cloning might be dangerous. I do not dispute this modest claim. The stronger claim about AI doom is that doom is likely rather than merely possible. This substantial claim demands much stronger evidence than the weaker claim. The tension I previously raised addresses this stronger claim of probable AI doom ("AI doomerism"), not the weaker claim that advanced AI might be risky. Many advocates of the strong claim of AI doom explicitly assert that their belief is backed by technical arguments, such as the counting argument for scheming behavior in SGD, among other arguments. However, if the premise of AI doom does not, in fact, rely on such technical arguments, then it is a mistake to argue about these ideas as if they are the key cruxes generating disagreement about AI doom.

Putting up Bumpers

Adam Scholl2mo75

I interpreted Habryka's comment as making two points, one of which strikes me as true and important (that it seems hard/unlikely for this approach to allow for pivoting adequately, should that be needed), and the other of which was a misunderstanding (that they don't literally say they hope to pivot if needed).

4habryka2mo

(This aligns with what I intended. I feel like my comment is making a fine point, even despite having missed the specific section.)

Putting up Bumpers

Adam Scholl2mo*95

We believe that, even without further breakthroughs, this work can almost entirely mitigate the risk that we unwittingly put misaligned circa-human-expert-level agents in a position where they can cause severe harm.

Sam, I'm confused where this degree of confidence is coming from? I found this post helpful for understanding Anthropic's strategy, but there wasn't much argument given about why one should expect the strategy to work, much less to "almost entirely" mitigate the risk!

To me, this seems wildly overconfident given the quality of the available evide... (read more)

Anthropic, and taking "technical philosophy" more seriously

Adam Scholl4mo*2-2

Yeah, I buy that he cares about misuse. But I wouldn't quite use the word "believe," personally, about his acting as though alignment is easy—I think if he had actual models or arguments suggesting that, he probably would have mentioned them by now.

2Raemon4mo

I don't particularly disagree with the first half, but your second sentence isn't really a crux for me for the first part.

Anthropic, and taking "technical philosophy" more seriously

Adam Scholl4mo20

No, I agree it's worth arguing the object level. I just disagree that Dario seems to be "reasonably earnestly trying to do good things," and I think this object-level consideration seems relevant (e.g., insofar as you take Anthropic's safety strategy to rely on the good judgement of their staff).

2Raemon4mo

I think (moderately likely, though not super confident) it makes more sense to model Dario as: "a person who actually is quite worried about misuse, and is making significant strategic decisions around that (and doesn't believe alignment is that hard)" than as "a generic CEO who's just generally following incentives and spinning narrative post-hoc rationalizations."

Anthropic, and taking "technical philosophy" more seriously

Adam Scholl4mo*160

Dario/Anthropic-leadership are at least reasonably earnestly trying to do good things within their worldview

I think as stated this is probably true of the large majority of people, including e.g. the large majority of the most historically harmful people. "Worldviews" sometimes reflect underlying beliefs that lead people to choose actions, but they can of course also be formed post-hoc, to justify whatever choices they wished to make.

In some cases, one can gain evidence about which sort of "worldview" a person has, e.g. by checking it for coherency. But th... (read more)

8Raemon4mo

I think... agree denotationally and (lean towards) disagreeing connotationally? (like, seems like this is implying "and because he doesn't seem like he obviously has coherent views on alignment-in-particular, it's not worth arguing the object level?") (to be clear, I don't super expect this post to affect Dario's decisionmaking models, esp. directly. I do have at least some hope for Anthropic employees to engage with these sorts of models/arguments, and my sense from talking them is that a lot of the LW-flavored arguments have often missed their cruxes)

Interpretability/Tool-ness/Alignment/Corrigibility are not Composable

Adam Scholl4mo*2211

I spent some time learning about neural coding once, and while interesting it sure didn't help me e.g. better predict my girlfriend; I think in general neuroscience is fairly unhelpful for understanding psychology. For similar reasons, I'm default-skeptical of claims that work on the level of abstraction of ML is likely to help with figuring out whether powerful systems trained via ML are trying to screw us, or with preventing that.

Eli's shortform feed

Adam Scholl5mo*810

I haven't perceived the degree of focus as intense, and if I had I might be tempted to level similar criticism. But I think current people/companies do clearly matter some, so warrant some focus. For example:

I think it's plausible that governments will be inclined to regulate AI companies more like "tech startups" than "private citizens building WMDs," the more those companies strike them as "responsible," earnestly trying their best, etc. In which case, it seems plausibly helpful to propagate information about how hard they are in fact trying, and how goo

... (read more)

Mikhail Samin's Shortform

Adam Scholl5mo2219

When do you think would be a good time to lock in regulation? I personally doubt RSP-style regulation would even help, but the notion that now is too soon/risks locking in early sketches, strikes me as in some tension with e.g. Anthropic trying to automate AI research ASAP, Dario expecting ASL-4 systems between 2025—the current year!—and 2028, etc.

4Zac Hatfield-Dodds5mo

Here I am on record supporting SB-1047, along with many of my colleagues. I will continue to support specific proposed regulations if I think they would help, and oppose them if I think they would be harmful; asking "when" independent of "what" doesn't make much sense to me and doesn't seem to follow from anything I've said. My claim is not "this is a bad time", but rather "given the current state of the art, I tend to support framework/liability/etc regulations, and tend to oppose more-specific/exact-evals/etc regulations". Obviously if the state of the art advanced enough that I thought the latter would be better for overall safety, I'd support them, and I'm glad that people are working on that.

Mikhail Samin's Shortform

Adam Scholl5mo*6443

Give me your model, with numbers, that shows supporting Anthropic to be a bad bet, or admit you are confused and that you don't actually have good advice to give anyone.

It seems to me that other possibilities exist, besides "has model with numbers" or "confused." For example, that there are relevant ethical considerations here which are hard to crisply, quantitatively operationalize!

One such consideration which feels especially salient to me is the heuristic that before doing things, one should ideally try to imagine how people would react, upon learning w... (read more)

9Nathan Helm-Burger5mo

It has been pretty clearly announced to the world by various tech leaders that they are explicitly spending billions of dollars to produce "new minds vastly smarter than any person, which pose double-digit risk of killing everyone on Earth". This pronouncement has not yet incited riots. I feel like discussing whether Anthropic should be on the riot-target-list is a conversation that should happen after the OpenAI/Microsoft, DeepMind/Google, and Chinese datacenters have been burnt to the ground. Once those datacenters have been reduced to rubble, and the chip fabs also, then you can ask things like, "Now, with the pressure to race gone, will Anthropic proceed in a sufficiently safe way? Should we allow them to continue to exist?" I think that, at this point, one might very well decide that the company should continue to exist with some minimal amount of compute, while the majority of the compute is destroyed. I'm not sure it makes sense to have this conversation while OpenAI and DeepMind remain operational.

Martin Randall5mo149

Does your model predict literal worldwide riots against the creators of nuclear weapons? They posed a single-digit risk of killing everyone on Earth (total, not yearly).

It would be interesting to live in a world where people reacted with scale sensitivity to extinction risks, but that's not this world.

4Knight Lee5mo

That's a very good heuristic. I bet even Anthropic agrees with it. Anthropic did not release their newer models until OpenAI released ChatGPT and the race had already started. That's not a small sacrifice. Maybe if they released it sooner, they would be bigger than OpenAI right now due to the first mover advantage. I believe they want the best for humanity, but they are in a no-win situation, and it's a very tough choice what they should do. If they stop trying to compete, the other AI labs will build AGI just as fast, and they will lose all their funds. If they compete, they can make things better. AI safety spending is only $0.1 billion while AI capabilities spending is $200 billion. A company which adds a comparable amount of effort on both AI alignment and AI capabilities should speed up the former more than the latter. Even if they don't support all the regulations you believe in, they're the big AI company supporting relatively much more regulation than all the others. I don't know, I may be wrong. Sadly it is so very hard to figure out what's good or bad for humanity in this uncertain time.

Ten people on the inside

Adam Scholl5mo2617

The only safety techniques that count are the ones that actually get deployed in time.

True, but note this doesn't necessarily imply trying to maximize your impact in the mean timelines world! Alignment plans vary hugely in potential usefulness, so I think it can pretty easily be the case that your highest EV bet would only pay off in a minority of possible futures.

4Nathan Helm-Burger5mo

Be that as it may, I nevertheless feel discomfited by the fact that I have been arguing for 2026-2028 arrival of AGI for several years now, and people have been dismissing my concerns and focusing on plans for dealing with AGI in the 2030s or later. The near-term-AGI space getting systematically neglected because it feels hard to come up with plans for is a bad pattern. [Edit: I think that the relatively recent work done on pragmatic near-term control by Ryan and Buck at Redwood is a relieving departure from this pattern.]

What are the good rationality films?

Answer by Adam SchollNov 20, 2024242

Prelude to Power is my favorite depiction of scientific discovery. Unlike any other such film I've seen, it adequately demonstrates the inquiry from the perspective of the inquirer, rather than from conceptual or biographical retrospect.

Untrusted smart models and trusted dumb models

Adam Scholl8moΩ120

I'm curious if "trusted" in this sense basically just means "aligned"—or like, the superset of that which also includes "unaligned yet too dumb to cause harm" and "unaligned yet prevented from causing harm"—or whether you mean something more specific? E.g., are you imagining that some powerful unconstrained systems are trusted yet unaligned, or vice versa?

4Buck8mo

I mostly mean "we are sure that it isn't egregiously unaligned and thus treating us adversarially". So models can be aligned but untrusted (if they're capable enough that we believe they could be schemers, but they aren't actually schemers). There shouldn't be models that are trusted but unaligned. Everywhere I wrote "unaligned" here, I meant the fairly specific thing of "trying to defeat our safety measures so as to grab power", which is not the only way the word "aligned" is used.

2Olli Järviniemi8mo

You might be interested in this post of mine, which is more precise about what "trustworthy" means. In short, my definition is "the AI isn't adversarially trying to cause a bad outcome". This includes aligned models, and also unaligned models that are dumb enough to realize they should (try to) sabotage. This does not include models that are unaligned, trying to sabotage and which we are able to stop from causing bad outcomes (but we might still have use-cases for such models).

evhub's Shortform

Adam Scholl8mo*1911

I would guess it does somewhat exacerbate risk. I think it's unlikely (~15%) that alignment is easy enough that prosaic techniques even could suffice, but in those worlds I expect things go well mostly because the behavior of powerful models is non-trivially influenced/constrained by their training. In which case I do expect there's more room for things to go wrong, the more that training is for lethality/adversariality.

Given the state of atheoretical confusion about alignment, I feel wary of confidently dismissing these sorts of basic, obvious-at-first-gl... (read more)

adam_scholl's Shortform

Adam Scholl8mo946

It seems the pro-Trump Polymarket whale may have had a real edge after all. Wall Street Journal reports (paywalled link, screenshot) that he’s a former professional trader, who commissioned his own polls from a major polling firm using an alternate methodology—the neighbor method, i.e. asking respondents who they expect their neighbors will vote for—he thought would be less biased by preference falsification.

I didn't bet against him, though I strongly considered it; feeling glad this morning that I didn't.

RobertM8mo187

On one hand, I feel a bit skeptical that some dude outperformed approximately every other pollster and analyst by having a correct inside-view belief about how existing pollster were messing up, especially given that he won't share the surveys. On the other hand, this sort of result is straightforwardly predicted by Inadequate Equilibria, where an entire industry had the affordance to be arbitrarily deficient in what most people would think was their primary value-add, because they had no incentive to accuracy (skin in the game), and as soon as someo... (read more)

Zac Hatfield-Dodds8mo344

I don't remember anyone proposing "maybe this trader has an edge", even though incentivising such people to trade is the mechanism by which prediction markets work. Certainly I didn't, and in retrospect it feels like a failure not to have had 'the multi-million dollar trader might be smart money' as a hypothesis at all.

8Garrett Baker8mo

I can proudly say that though I disparaged the guy in private, I not once put my money where my mouth was, which means outside observers can infer that all along I secretly agreed with his analysis of the situation.

Thomas Kwa8mo*193

Knowing now that he had an edge, I feel like his execution strategy was suspect. The Polymarket prices went from 66c during the order back to 57c on the 5 days before the election. He could have extracted a bit more money from the market if he had forecasted the volume correctly and traded against it proportionally.

5Alexander Gietelink Oldenziel8mo

Yes. https://www.lesswrong.com/posts/tDkYdyJSqe3DddtK4/alexander-gietelink-oldenziel-s-shortform?commentId=JqDaYkRyw2WSAZLDg

JargonBot Beta Test

Adam Scholl8mo40

Thanks; it makes sense that use cases like these would benefit, I just rarely have similar ones when thinking or writing.

JargonBot Beta Test

Adam Scholl8mo20

I also use them rarely, fwiw. Maybe I'm missing some more productive use, but I've experimented a decent amount and have yet to find a way to make regular use even neutral (much less helpful) for my thinking or writing.

8DanielFilan8mo

I enjoyed reading Nicholas Carlini and Jeff Kaufman write about how they use them, if you're looking for inspiration.

Extended Interview with Zhukeepa on Religion

Adam Scholl9mo*120

I don't know much about religion, but my impression is the Pope disagrees with your interpretation of Catholic doctrine, which seems like strong counterevidence. For example, see this quote:

“All religions are paths to God. I will use an analogy, they are like different languages that express the divine. But God is for everyone, and therefore, we are all God’s children.... There is only one God, and religions are like languages, paths to reach God. Some Sikh, some Muslim, some Hindu, some Christian."

And this one:

The pluralism and the diversity of religions,

Adam Scholl9mo64

I claim the phrasing in your first comment ("significant AI presence") and your second ("AI driven R&D") are pretty different—from my perspective, the former doesn't bear much on this argument, while the latter does. But I think little of the progress so far has resulted from AI-driven R&D?

6Logan Zoellner9mo

There is a ton of current AI research that would be impossible without existing AI (mostly generating synthetic data to train models). It seems likely that almost all aspects of AI research (chip design, model design, data curation) will follow this trend. Are there any specific areas in which you would predict "when AGI is achieved, the best results on topic X will have little-to-no influence from AI"?

2Raemon9mo

Well the point of saying "significant AI presence" was "it will have mattered". I think that includes AI driven R&D. (It also includes things like "are the first AIs plugged into systems they get a lot of opportunity to manipulate from an early stage" and "the first AI is in a more multipolar-ish scenario and doesn't get decisive strategic advantage.") I agree we haven't seen much AI driven R&D yet (although I think there's been at least slight coding speedups from pre-o1 copilot, like 5% or 10%, and I think o1 is on track to be fairly significant, and I expect to start seeing more meaningful AI-driven R&D within a year or so). [edit: Logan's argument about synthetic data was compelling to me at least at first glance, although I don't know a ton about it and can imagine learning more and changing my mind again]

COT Scaling implies slower takeoff speeds

Adam Scholl9mo51

Huh, this doesn't seem clear to me. It's tricky to debate what people used to be imagining, especially on topics where those people were talking past each other this much, but my impression was that the fast/discontinuous argument was that rapid, human-mostly-or-entirely-out-of-the-loop recursive self-improvement seemed plausible—not that earlier, non-self-improving systems wouldn't be useful.

4Raemon9mo

I agree that nobody was making a specific claim that there wouldn't be any kind of AI driven R&D pre-fast-takeoff. But, I think if Eliezer et al hadn't been at least implicitly imagining less of this, there would have been at least a bit less talking-past-each-other in the debates with Paul.

COT Scaling implies slower takeoff speeds

Adam Scholl9mo75

Why do you think this? Recursive self-improvement isn't possible yet, so from my perspective it doesn't seem like we've encountered much evidence either way about how fast it might scale.

6Raemon9mo

FWIW I do think we are clearly in a different strategic world than the one I think most people were imagining in 2010. I agree we still have not hit the point where we're seeing how sharp the RSI curve will be, but, we are clearly seeing that there will be some kind of significant AI presence in the world by the time RSI hits, and it'd be surprising if that didn't have some kind of strategic implication.

Why I funded PIBBSS

Adam Scholl10mo1613

Given both my personal experience with LLMs and my reading of the role that empirical engagement has historically played in non-paradigmatic research, I tend to advocate for a methodology which incorporates immediate feedback loops with present day deep learning systems over the classical "philosophy -> math -> engineering" deconfusion/agent foundations paradigm.

I'm curious what your read of the history is, here? My impression is that most important paradigm-forming work so far has involved empirical feedback somehow, but often in ways exceedingly di... (read more)

8Lucas Teixeira8mo

Thanks for the comment @Adam Scholl and apologies for not addressing it sooner, it was on my list but then time flew. I think we're in qualitative agreement that non-paradigmatic research tends to have empirical feedback loops, and that the forms and methods of empirical engagement undergo qualitative changes in the formation of paradigms. I suspect we may have quantitative disagreements with how illegible these methods were to previous practitioners, but I don't expect that to be super cruxy. The position which I would argue against is that the issue of empirical access to ASI necessitates long bouts of philosophical thinking prior to empirical engagement and theorization. The position which I would argue for is that there is significant (and depending on the crowd undervalued) benefit to be gained for conceptual innovation by having research communities which value quick and empirical feedback loops. I'm not an expert on either of these historical periods, but I would be surprised to hear that Carnot or Shannon did not meaningfully benefit from engaging with the practical industrial advancements of their day. Giving my full models is out of scope for a comment and would take a sequence which I'll probably never write, but the 3 history and philosophy of science references which have had the greatest impact on my thinking around empiricism which I tend to point people towards would probably be Inventing Temperature, Exploratory Experiments, and Representing and Intervening. In short I would say yes, because I don't believe the criteria listed above excludes the researchers which you called attention to. But independently of whether you buy into that claim, I would stress that different programs have different mechanisms of admission. The affiliateship as it's currently being run is designed for lower variance and is incidentally more tightly correlated with the research tastes of myself and the horizon scanning team given that these are the folks providing th

The Checklist: What Succeeding at AI Safety Will Involve

Adam Scholl10mo60

For what it's worth, as someone in basically the position you describe—I struggle to imagine automated alignment working, mostly because of Godzilla-ish concerns—demos like these do not strike me as cruxy. I'm not sure what the cruxes are, exactly, but I'm guessing they're more about things like e.g. relative enthusiasm about prosaic alignment, relative likelihood of sharp left turn-type problems, etc., than about whether early automated demos are likely to work on early systems.

Maybe you want to call these concerns unserious too, but regardless I do think... (read more)

AI forecasting bots incoming

Adam Scholl10mo1712

I sympathize with the annoyance, but I think the response from the broader safety crowd (e.g., your Manifold market, substantive critiques and general ill-reception on LessWrong) has actually been pretty healthy overall; I think it's rare that peer review or other forms of community assessment work as well or quickly.

3titotal10mo

Under peer review, this never would have been seen by the public. It would have incentivized CAIS to actually think about the potential flaws in their work before blasting it to the public.

Orpheus1610mo2724

Hendryks had ample opportunity after initial skepticism to remove it, but chose not to.

IMO, this seems to demand a very immediate/sudden/urgent reaction. If Hendrycks ends up being wrong, I think he should issue some sort of retraction (and I think it would be reasonable to be annoyed if he doesn't.)

But I don't think the standard should be "you need to react to criticism within ~24 hours" for this kind of thing. If you write a research paper and people raise important concerns about it, I think you have a duty to investigate them and respond to them, but I... (read more)

My Number 1 Epistemology Book Recommendation: Inventing Temperature

Adam Scholl10mo30

It's not a full conceptual history, but fwiw Boole does give a decent account of his own process and frustrations in the preface and first chapter of his book.

Pay Risk Evaluators in Cash, Not Equity

Adam Scholl10mo*50

I just meant there are many teams racing to build more agentic models. I agree current ones aren't very agentic, though whether that's because they're meaningfully more like "tools" or just still too stupid to do agency well or something else entirely, feels like an open question to me; I think our language here (like our understanding) remains confused and ill-defined.

I do think current systems are very unlike oracles though, in that they have far more opportunity to exert influence than the prototypical imagined oracle design—e.g., most have I/O with ~any browser (or human) anywhere, people are actively experimenting with hooking them up to robotic effectors, etc.

My Number 1 Epistemology Book Recommendation: Inventing Temperature

Adam Scholl10mo270

I liked Thermodynamic Weirdness for similar reasons. It does the best job of books I've found at describing case studies of conceptual progress—i.e., what the initial prevailing conceptualizations were, and how/why scientists realized they could be improved.

It's rare that books describe such processes well, I suspect partly because it's so wildly harder to generate scientific ideas than to understand them, that they tend to strike people as almost blindingly obvious in retrospect. For example, I think it's often pretty difficult for people familiar with ev... (read more)

adamShimi10mo104

It's rare that books describe such processes well, I suspect partly because it's so wildly harder to generate scientific ideas than to understand them, that they tend to strike people as almost blindingly obvious in retrospect.

Completely agreed!

I think this is also what makes great history of science so hard: you need to unlearn most of the modern insights and intuitions that didn't exist at the time, and see as close as possible to what the historical actors saw.

This makes me think of a great quote from World of Flows, a history of hydrodynamics:

There is,

... (read more)

ryan_greenblatt's Shortform

Adam Scholl10mo*Ω5131

This seems like a great activity, thank you for doing/sharing it. I disagree with the claim near the end that this seems better than Stop, and in general felt somewhat alarmed throughout at (what seemed to me like) some conflation/conceptual slippage between arguments that various strategies were tractable, and that they were meaningfully helpful. Even so, I feel happy that the world contains people sharing things like this; props.

ryan_greenblatt10mo*Ω12174

I disagree with the claim near the end that this seems better than Stop

At the start of the doc, I say:

It’s plausible that the optimal approach for the AI lab is to delay training the model and wait for additional safety progress. However, we’ll assume the situation is roughly: there is a large amount of institutional will to implement this plan, but we can only tolerate so much delay. In practice, it’s unclear if there will be sufficient institutional will to faithfully implement this proposal.

Towards the end of the doc I say:

This plan requires qu

... (read more)

Pay Risk Evaluators in Cash, Not Equity

Adam Scholl10mo40

I think the latter group is is much smaller. I'm not sure who exactly has most influence over risk evaluation, but the most obvious examples are company leadership and safety staff/red-teamers. From what I hear, even those currently receive equity (which seems corroborated by job listings, e.g. Anthropic, DeepMind, OpenAI).

Zach Stein-Perlman's Shortform

Adam Scholl10mo120

What seemed psychologizing/unfair to you, Raemon? I think it was probably unnecessarily rude/a mistake to try to summarize Anthropic’s whole RSP in a sentence, given that the inferential distance here is obviously large. But I do think the sentence was fair.

As I understand it, Anthropic’s plan for detecting threats is mostly based on red-teaming (i.e., asking the models to do things to gain evidence about whether they can). But nobody understands the models well enough to check for the actual concerning properties themselves, so red teamers instead check f... (read more)

3dirk10mo

I don't really think any of that affects the difficulty of public communication; your implication that it must be the cause reads to me more like an insult than a well-considered psychological model

2Raemon10mo

(not going to respond in this context out of respect for Zach's wishes. May chat later, and am mulling over my own top-level post on the subject)

Zach Stein-Perlman's Shortform

Adam Scholl10mo*17-1

My guess is that most don’t do this much in public or on the internet, because it’s absolutely exhausting, and if you say something misremembered or misinterpreted you’re treated as a liar, it’ll be taken out of context either way, and you probably can’t make corrections. I keep doing it anyway because I occasionally find useful perspectives or insights this way, and think it’s important to share mine. That said, there’s a loud minority which makes the AI-safety-adjacent community by far the most hostile and least charitable environment I spend

... (read more)

4Raemon10mo

Meta aside: normally this wouldn't seem worth digging into but as a moderator/site-culture-guardian, I feel compelled to justify my negative react on the disagree votes. I'm actually not entirely sure what downvote-reacting is for. Habryka has said the intent is to override inappropriate uses of reacts. We haven't actually really had a sit-down-and-argue-this-out on the moderator team. I'm pretty sure we haven't told or tried to enforce that "override inappropriate use of reacts" as the intended use I think Adam's line: Is psychologizing and summarizing Anthropic unfairly. So I wouldn't agree vote with it. I do think it has some kind of grain of truth to it (me believing this is also kind of "doubting the experience of Anthropic employees" which is also group-epistemologically dicey IMO, but, feels kinda important enough to do in this case). The claim isn't true... but I also don't belief report that it's not true. I initially downvoted the Disagree when it was just Noosphere, since I didn't think Noosphere was really in a position to have an opinion and if he was the only reactor it felt more like noise. A few others who are more positioned to know relevant stuff have since added their own disagree reacts. I... feel sort of justified leaving the anti-react up, with an overall indicator of "a bunch of people disagree with this, but the weight of that disagreement is slightly reduced." (I think I'd remove the anti-react if the the disagree count went much lower than it is now). I don't know whether I particularly endorse any of this, but wanted people to have a bit more model of what one site-admin was thinking here. [/end of rambly meta commentary]

2Joseph Miller10mo

This obvious straw-man makes your argument easy to dismiss. However I think the point is basically correct. Anthropic's strategy to reduce x-risk also includes lobbying against pre-harm enforcement of liability for AI companies in SB 1047.

Fields that I reference when thinking about AI takeover prevention

Adam Scholl11mo91

Open Philanthropy commissioned five case studies of this sort, which ended up being written by Moritz von Knebel; as far as I know they haven't been published, but plausibly someone could convince him to.

Mo Putera11mo152

They have in fact been published (it's in your link), at least the ones authors agreed to make publicly available: these are all the case studies, and Moritz von Knebel's write-ups are

adam_scholl's Shortform

Adam Scholl11mo60

Those are great examples, thanks; I can totally believe there exist many such problems.

Still, I do really appreciate ~never having to worry that food from grocery stores or restaurants will acutely poison me; and similarly, not having to worry that much that pharmaceuticals are adulterated/contaminated. So overall I think I currently feel net grateful about the FDA’s purity standards, and net hateful just about their efficacy standards?

adam_scholl's Shortform

Adam Scholl11mo411

What countries are you imagining? I know some countries have more street food, but from what I anecdotally hear most also have far more food poisoning/contamination issues. I'm not sure what the optimal tradeoff here looks like, and I could easily believe it's closer to the norms in e.g. Southeast Asia than the U.S. But it at least feels much less obvious to me than that drug regulations are overzealous.

(Also note that much regulation of things like food trucks is done by cities/states, not the FDA).

2faul_sname11mo

Mexico and Chile are the most salient examples to me. But also I've only ever gotten food poisoning once in my life despite frequent risky food behavior. Strong agree that the magnitude of the overzealousness is much higher for drugs than for food.

adam_scholl's Shortform

Adam Scholl11mo*267

Arguments criticizing the FDA often seem to weirdly ignore the "F." For all I know food safety regulations are radically overzealous too, but if so I've never noticed (or heard a case for) this causing notable harm.

Overall, my experience as a food consumer seems decent—food is cheap, and essentially never harms me in ways I expect regulators could feasibly prevent (e.g., by giving me food poisoning, heavy metal poisoning, etc). I think there may be harmful contaminants in food we haven't discovered yet, but if so I mostly don't blame the FDA for that lack of knowledge, and insofar as I do it seems an argument they're being under-zealous.

4Noosphere8911mo

Basically, because food is a domain where there are highly negative tail effects, but not highly positive tail effects, conditioning on eating food at all, and thus it's an area where you can afford to be restrictive, and this is notably not the case for medicine, where high negative and positive tail effect effects exist, so you need to be more lenient on your standards.

2ChristianKl11mo

You have to refrigerate eggs in the us while you don't have to in the EU because of FDA regulations about washing the eggs. There's a strain of pro-sugar and anti-fat policies in regards to food for which the FDA shares some of the blame. Through that they might have contributed to the obesity crisis. It's possible that the regulations on farmers markets reduce the amount of farmers markets (which might be central to lower obesity levels in France and better health). When asked to compare French regulations for farmers markets with the US regulations, Claude told me: Overall, while both countries prioritize food safety, the French system tends to be more flexible for small producers and traditional methods, whereas the US system is more uniformly applied regardless of scale. The French approach often allows for more diverse and traditional products at markets, while the US system provides more consistent safety standards across diverse regions. You likely could argue that the FDA shares part the blame for the obesity epidemic by setting bad incentives for low-fat and high fructose corn syrup foods while at the same time making it harder for farmers markets to actually sell healthy food. Apart from that you do have people who oppose forced labeling of GMO products which is also an FDA rule. Do you see microplastics as "harmful contaminants that haven't been discovered yet"? It would be possible to have regulations that limit the amount of microplastic in plastic drinking bottles but those currently don't exist.

4Viliam11mo

I think one exception may be salmonellosis. In USA, you get 1.2 million illnesses, 23000 hospitalizations, and 450 deaths every year. To compare, in EU, selling contaminated chicken products is illegal, and when hundreds of people get sick, it becomes a scandal.

9sapphire11mo

The 'Food' and the 'Drug' parts behave very differently. By default food products are allowed. There may be purity requirements or restaurant regulations but you don't need to run studies or get approvals to serve an edible product or a new combination. By default drugs are banned. I think the FDA is under zealous about heavy metals and other contaminants. But the FDA does a decent job of regulating food. However the 'drug' side is a nightmare. But the two situations are de facto handled in very, very different ways. So its not obvious why an argument would cover both of them.

gwern11mo5914

Criticizing FDA food regulations is a niche; it is hard to criticize 'the unseen', especially when it's mostly about pleasure and the FDA is crying: 'we're saying lives! Won't someone thinking of the children? How can you disagree, just to stuff your face? Shouldn't you be on a diet anyway?'

But if you go looking, you'll find tons of it: pasteurized cheese and milk being a major flashpoint, as apparently the original unpasteurized versions are a lot tastier. (I'm reminded of things like beef tallow for fries or Chipotle - how do you know how good McDonald's... (read more)

1MinusGix11mo

I'd be interested in an article looking at whether the FDA is better at regulating food safety. I do expect food is an easier area, because erring on the side of caution doesn't really lose you much — most food products have close substitutes. If there's some low but not extremely low risk of a chemical in a food being bad for you, then the FDA can more easily deny approval without significant consequences: Medicine has more outsized effects if you are slow to approve usage. Yet, perhaps this has led to reduced variety in food choices? I notice less generic or lesser-known food and beverage brands relative to a decade ago, though I haven't verified whether my that background belief is accurate. I'd be curious also for an investigation in such an article about the extent of the barriers to designing a new food product; especially food products that aren't doing anything new, purely a mixture of ingredients already considered safe (or at least, considered allowed). Would there be more variety? Or notably cheaper food?

9faul_sname11mo

Have you ever visited a country without zealous food safety regulations? I think it's one of those things where it's hard to realize what the alternative looks like (plentiful, cheap, and delicious street food available wherever people gather, so that you no longer have to plan around making sure you either bring food or go somewhere with restaurants, and it is viable for individuals to exist without needing a kitchen of their own).

Buck's Shortform

Adam Scholl11mo*Ω4103

I agree it seems good to minimize total risk, even when the best available actions are awful; I think my reservation is mainly that in most such cases, it seems really important to say you're in that position, so others don't mistakenly conclude you have things handled. And I model AGI companies as being quite disincentivized from admitting this already—and humans generally as being unreasonably disinclined to update that weird things are happening—so I feel wary of frames/language that emphasize local relative tradeoffs, thereby making it even easier to conceal the absolute level of danger.

9Buck11mo

Yep that's very fair. I agree that it's very likely that AI companies will continue to be misleading about the absolute risk posed by their actions.

Buck's Shortform

Adam Scholl1y*Ω7158

*The rushed reasonable developer regime.* The much riskier regimes I expect, where even relatively reasonable AI developers are in a huge rush and so are much less able to implement interventions carefully or to err on the side of caution.

I object to the use of the word "reasonable" here, for similar reasons I object to Anthropic's use of the word "responsible." Like, obviously it could be the case that e.g. it's simply intractable to substantially reduce the risk of disaster, and so the best available move is marginal triage; this isn't my guess, but I do... (read more)

7Buck11mo

I think it makes sense to use the word "reasonable" to describe someone who is taking actions that minimize total risk, even if those actions aren't what they'd take in a different situation, and even if various actors had made mistakes to get them into this situation. (Also note that I'm not talking about making wildly superintelligent AI, I'm just talking about making AGI; my guess is that even when you're pretty rushed you should try to avoid making galaxy-brained superintelligence.)

Buck's Shortform

Adam Scholl1y*124

It sounds like you think it's reasonably likely we'll end up in a world with rogue AI close enough in power to humanity/states to be competitive in war, yet not powerful enough to quickly/decisively win? If so I'm curious why; this seems like a pretty unlikely/unstable equilibrium to me, given how much easier it is to improve AI systems than humans.

4ryan_greenblatt11mo

I think having this equilibrium for a while (e.g. a few years) is plausible because humans will also be able to use AI systems. (Humans might also not want to build much more powerful AIs due to safety concerns and simultaneously be able to substantially slow down self-improvement with compute limitations (and track self-improvement using other means).) Note that by "war" I don't neccessarily mean that battles are ongoing. It is possible this mostly manifests as racing on scaling and taking aggressive actions to hobble the AI's ability to use more compute (including via the use of the army and weapons etc.).

Buck's Shortform

Adam Scholl1y40

I do basically assume this, but it isn't cruxy so I'll edit.

Buck's Shortform

Adam Scholl1y*103

*The existential war regime*. You’re in an existential war with an enemy and you’re indifferent to AI takeover vs the enemy defeating you. This might happen if you’re in a war with a nation you don’t like much, or if you’re at war with AIs.

Does this seem likely to you, or just an interesting edge case or similar? It's hard for me to imagine realistic-seeming scenarios where e.g. the United States ends up in a war where losing would be comparably bad to AI takeover. This is mostly because ~no functional states (certainly no great powers) strike me as so evi... (read more)

7ryan_greenblatt1y

Your comment seems to assume that AI takeover will lead to extinction. I don't think this is a good thing to assume as it seems unlikely to me. (To be clear, I think AI takeover is very bad and might result in huge numbers of human deaths.)

ryan_greenblatt1y11-6

A war against rogue AIs feels like the central case of an existential war regime to me. I think a reasonable fraction of worlds where misalignment causes huge problems could have such a war.

Twitter thread on open-source AI

Adam Scholl1y1711

We should generally have a strong prior favoring technology in general

Should we? I think it's much more obvious that the increase in human welfare so far has mostly been caused by technology, than that most technologies have net helped humans (much less organisms generally).

I'm quite grateful for agriculture now, but unsure I would have been during the Bronze Age; grateful for nuclear weapons, but unsure how many nearby worlds I'd feel similarly; net bummed about machine guns, etc.

tlevin's Shortform

Adam Scholl1y80

I agree music has this effect, but I think the Fence is mostly because it also hugely influences the mood of the gathering, i.e. of the type and correlatedness of people's emotional states.

(Music also has some costs, although I think most of these aren't actually due to the music itself and can be avoided with proper acoustical treatment. E.g. people sometimes perceive music as too loud because the emitted volume is literally too high, but ime people often say this when the noise is actually overwhelming for other reasons, like echo (insofar as walls/floor... (read more)

Towards more cooperative AI safety strategies

Adam Scholl1y*4546

I appreciate you adding the note, though I do think the situation is far more unusual than described. I agree it's widely priced in that companies in general seek power, but I think probably less so that the author of this post personally works for a company which is attempting to acquire drastically more power than any other company ever, and that much of the behavior the post describes as power-seeking amounts to "people trying to stop the author and his colleagues from attempting that."

Towards more cooperative AI safety strategies

Adam Scholl1y*5966

Yeah, this omission felt pretty glaring to me. OpenAI is explicitly aiming to build "the most powerful technology humanity has yet invented." Obviously that doesn't mean Richard is wrong that the AI safety community is too power-seeking, but I would sure have appreciated him acknowledging/grappling with the fact that the company he works for is seeking to obtain more power than any group of people in history by a gigantic margin.

Orpheus161y3227

An elephant in the room (IMO) is that moving forward, OpenAI probably benefits from a world in which the AI safety community does not have much influence.

There's a fine line between "play nice with others and be more cooperative" and "don't actually advocate for policies that you think would help the world, and only do things that the Big Companies and Their Allies are comfortable with."

Again, I don't think Richard sat in his room and thought "how do I spread a meme that is good for my company." I think he's genuinely saying what he believes and givi... (read more)

Buck's Shortform

Adam Scholl1y20

I agree we might end up in a world like that, where it proves impossible to make a decent safety case. I just think of the ~whole goal of alignment research as figuring out how to avoid that world, i.e. of figuring out how to mitigate/estimate the risk as much/precisely as needed to make TAI worth building.

Currently, AI risk estimates are mostly just verbal statements like "I don't know man, probably some double digit chance of extinction." This is exceedingly unlike the sort of predictably tolerable risk humanity normally expects from its engineering proj... (read more)

5ryan_greenblatt1y

I don't think that thing I said is consistent with "impossible to make a safety case good enough to make TAI worth building"? I think you can probably make a safety case which gets to around 1-5% risk while having AIs that are basically black boxs and which are very powerful, but not arbitrarily powerful. (Such a safety case might require decently expensive measures.) (1-5% risk can be consistent with this being worth it - e.g. if there is an impending hot war. That said, it is still a horrifying level of risk that demands vastly more investment.) See e.g. control for what one part of this safety case could look like. I think that control can go quite far. Other parts could look like: * A huge array of model organisms experiments which convince us that P(serious misalignment) is low (or low given XYZ adjustments to training). * coup probes or other simple runtime detection/monitoring techiques. * ELK/honesty techniques which seem to work in a wide variety of cases. * Ensuring AIs closely imitate humans and generalize similarly to humans in the cases we checked. * Ensuring that AIs are very unlikely to be schemers via making sure most of their reasoning happens in CoT and their forward passes are quite weak. (I often think about this from the perspective of control, but it can also be considered separately.) It's possible you don't think any of this stuff gets off the ground. Fair enough if so.

Buck's Shortform

Adam Scholl1y*20

Maybe I'm just confused what you mean by those words, but where is the disanalogy with safety engineering coming from? That normally safety engineering focuses on mitigating risks with complex causes, whereas AI risk is caused by some sort of scaffolding/bureaucracy which is simpler?

3ryan_greenblatt1y

Buck's claim is that safety engineering is mostly focused on problems where there are a huge number of parts that we can understand and test (e.g. airplanes) and the main question is about ensuring failure rates are sufficiently low under realistic operating conditions. In the case of AIs, we might have systems that look more like a single (likely black-box) AI with access to various tools and the safety failures we're worried about are concentrated in the AI system itself. This seems much more analogous to insider threat reduction work than the places where safety engineering is typically applied.

Buck's Shortform

Adam Scholl1y20

I'm still confused what sort of simplicity you're imagining? From my perspective, the type of complexity which determines the size of the fail surface for alignment mostly stems from things like e.g. "degree of goal stability," "relative detectability of ill intent," and other such things that seem far more complicated than airplane parts.

5ryan_greenblatt1y

I think the system built out of AI components will likely be pretty simple - as in the scaffolding and bureaucracy surronding the AI will be simple. The AI components themselves will likely be black-box.