Adam Scholl — LessWrong

LESSWRONG
LW

my gut says pretty strongly that you and Adam are erring way too far in the not-publishing direction, and like, I would pay money for you to publish more.

I am interested in debating the principle here (e.g. whether it sometimes makes sense to write books, whether/why most scientific progress so far has involved writing books, etc), but I feel less interested in debating your gut take on the tradeoffs Aysja and I are making personally, since I expect you know nearly nothing about what those are? Most obviously, the dominant term has been illness rather than choices, but I expect you also have near-zero context on the choices, which we have spent really a lot of time and effort considering. I would... I guess be up for describing those in person, if you want.

Aim for single piece flow

Adam Scholl 3mo 80

Yep, definitely! The reason why these are big tomes is IMO largely downstream of the distribution methods at the time.

What distribution differences do you mean? Kepler and Bacon lived before academic journals, but I think all the others could easily have published papers; indeed Newton, Darwin and Maxwell published many, and while Carnot didn't many around him did, so he would have known it was an option.

It seems more likely to me that they chose to write up these ideas as books rather than papers simply because the ideas were more "book-sized" than "paper-sized," i.e. because they were trying to discover and describe a complicated cluster of related ideas that was inferentially far from existing understanding, and this tends to be hard to do briefly.

I think that is for most forms of intellectual progress, a better way of developing both ideas and pedagogical content knowledge

It sounds like you're imagining that the process of writing such books tends to involve a bunch of waterfall-style batching, analogous to e.g. finishing the framing in each room of a house before moving on to the flooring, or something like that? If so, I'm confused why; at least my own experience with large writing projects has involved little of this, I think, though I'm sure writing processes vary widely.

Aim for single piece flow

Adam Scholl 3mo 172

I was pretty much with you until this paragraph:

In many ways Inkhaven is an application of single piece flow to the act of writing. I do not believe intellectual progress must consist of long tomes that take months or years to write. Intellectual labor should aggregate minute-by-minute with revolutionary insights aggregating from hundreds of small changes. Publishing daily moves intellectual progress much closer to single piece flow.

Of course intellectual progress doesn’t always require tomes, but I think in many fields of science, important conceptual progress has historically occurred so dominantly via tomes that they can almost be considered its unit. Take for example well-regarded tomes like Astronomia Nova, Instauratio Magna, Principia, Reflections on the Motive Power of Fire, On the Origin of Species, or A Treatise on Electricity and Magnetism—would you guess the discovery or propagation of these ideas would have been more efficient if undertaken somehow more in single piece flow-style? My sense is that tomes are just a pretty natural byproduct of ambitious, large inferential distance-crossing investigations like these.

AI safety undervalues founders

Adam Scholl 3mo 207

I do think I'd feel very alarmed by the 27% figure in your position—much more alarmed than e.g. I am about what happened with AIRCS, which seems to me to have failed more in the direction of low than actively bad impact—but to be clear I didn't really mean to express a claim here about the overall sign of MATS; I know little about the program.

Rather, my point is just that multiplier effects are scary for much the same reason they are exciting—they are in effect low-information, high-leverage bets. Sometimes single conversations can change the course of highly effective people's whole careers, which is wild; I think it's easy to underestimate how valuable this can be. But I think it's similarly easy to underestimate their risk, given that the source of this leverage—that you're investing relatively little time getting to know them, etc, relative to the time they'll spend doing... something as a result—also means you have unusually limited visibility into what the effects will be.

Given this, I think it's worth taking unusual care, when pursuing multiplier effect strategies, to model the overall relative symmetry of available risks/rewards in the domain. For example, it seems worth asking whether A) there might be lemons market problems, such that those who are easiest to influence (especially quickly) might tend, all else equal, to be more strategically confused/confusable, or B) there might in fact currently be more easy ways to make AI risk worse than better.

AI safety undervalues founders

Adam Scholl 3mo 132

That may be, but personally I am unpersuaded that the observed paradoxical impacts should update us toward thinking the world would have been better off if we hadn't made the problem known. I roughly can't imagine worlds where we survive but the problem wasn't made known, and I think it should be pretty expected, with a problem this confusing, that initially people will have little idea how to help, and so many initial attempts won't. In my imagination, at least, basically all surviving worlds look like that at first, but then eventually people who were persuaded to worry about the problem do figure out how to solve it.

(Maybe this isn't what you mean exactly, and there are ways we could have made the problem known that seemed less like "freaking out"? But to me this seems hard to achieve, when the problem in question is the plausibly relatively imminent death of everyone).

AI safety undervalues founders

Adam Scholl 3mo 3819

Great founders and field-builders have multiplier effects on recruiting, training, and deploying talent to work on AI safety [...] If we want to 10-100x the AI safety field in the next 8 years, we need multiplicative capacity, not just marginal hires

I spent much of 2018-2020 trying to help MIRI with recruiting at AIRCS workshops. At the time, I think AIRCS workshops and 80k were probably the most similar things the field had to MATS, and I decided to help with them largely because I was excited about the possibility of multiplier effects like these.

The single most obvious effect I had on a participant—i.e., where at the beginning of our conversations they seemed quite uninterested in working on AI safety, but by the end reported deciding to—was that a few months later they quit their (non-ML) job to work on capabilities at OpenAI, which they have been doing ever since.

Multiplier effects are real, and can be great; I think AIRCS probably had helpful multiplier effects too, and I'd guess the workshops were net positive overall. But much as pharmaceuticals often have paradoxical effects (i.e., they impact the intended system in roughly the intended way, except with the sign of the key effect flipped), it seems disturbingly common for such efforts to have "paradoxical impact."

I suspect the risk of paradoxical impact, even from your own work, is often substantial, especially in poorly understood domains. My favorite example of this is the career of Fritz Haber, who, by discovering how to efficiently mass-produce fertilizer, explosives, and chemical weapons, seems plausibly to have both counterfactually killed and saved millions of lives.

But it's even harder to predict the sign when the impact in question is on other people—e.g., on their choice of career—since you have limited visibility into their reasoning or goals, and nearly zero control over what actions they choose to take as a result. So I do think it's worth being fairly paranoid about this in high-stakes, poorly-understood domains, and perhaps especially so in AI safety, where numerous such skulls have already appeared.

Leaving Open Philanthropy, going to Anthropic

Adam Scholl 3mo 42

Yeah, certainly there are other possible forms of bias besides financial conflicts of interest; as you say, I think it's worth trying to avoid those too.

Leaving Open Philanthropy, going to Anthropic

Adam Scholl 3mo 4444

Sure, but humanity currently has so little ability to measure or mitigate AI risk that I doubt it will be obvious in any given case that the survival of the human race is at stake, or that any given action would help. And I think even honorable humans tend to be vulnerable to rationalization amidst such ambiguity, which (as I model it) is why society generally prefers that people in positions of substantial power not have extreme conflicts of interest.

Leaving Open Philanthropy, going to Anthropic

Adam Scholl 3mo 7661

I’m going to try to make sure that my lifestyle and financial commitments continue to make me very financially comfortable both with leaving Anthropic, and with Anthropic’s equity (and also: the AI industry more broadly – I already hold various public AI-correlated stocks) losing value, but I recognize some ongoing risk of distorting incentives, here.

Why do you feel comfortable taking equity? It seems to me that one of the most basic precautions one ought ideally to take when accepting a job like this (e.g. evaluating Claude's character/constitution/spec) is to ensure you won't personally stand to lose huge sums of money should your evaluation suggest further training or deployment is unsafe.

(You mention already holding AI-correlated stocks—I do also think it would be ideal if folks with influence over risk assessment at AGI companies divested from these generally, though I realize this is difficult given how entangled they are with the market as a whole. But I'd expect AGI company staff typically have much more influence over their own company's value than that of others, so the COI seems much more extreme).

leogao's Shortform

Adam Scholl 4mo 40

They typically explain where the room is located right after giving you the number, which is almost like making a memory palace entry for you. Perhaps the memory is more robust when it includes a location along with the number?
