METR's Evaluation of GPT-5
METR (where I work, though I'm cross-posting in a personal capacity) evaluated GPT-5 before it was externally deployed. We performed a much more comprehensive safety analysis than we ever have before; it feels like pre-deployment evals are getting more mature. This is the first time METR has produced something we've felt comfortable calling an "evaluation" rather than a "preliminary evaluation": it is much more thorough than our previous work and explores three different threat models. It's one of the closest things out there to a real-world autonomy safety case, and it also gives a rough sense of how long it will be before current evaluations no longer provide safety assurances. I've ported the blog post over to LW in case people want to read it.

Details about METR's evaluation of OpenAI GPT-5

Note on independence: This evaluation was conducted under a standard NDA. Due to the sensitive information shared with METR as part of this evaluation, OpenAI's comms and legal team required review and approval of this post (the latter is not true of any other METR reports shared on our website).[1]

Executive Summary

As described in OpenAI's GPT-5 System Card, METR assessed whether OpenAI GPT-5 could pose catastrophic risks, considering three key threat models:

1. AI R&D automation: AI systems speeding up AI researchers by >10x (compared to researchers with no AI assistance), or otherwise causing a rapid intelligence explosion, which could cause or amplify a variety of risks if stolen or handled without care

2. Rogue replication: AI systems posing direct risks of rapid takeover, which likely requires them to be able to maintain infrastructure, acquire resources, and evade shutdown (see the rogue replication threat model)

3. Strategic sabotage: AI systems broadly and strategically misleading researchers in evaluations, or sabotaging further AI development to significantly increase the risk of
Yep I had Eliezer and Nate Soares in mind when I wrote the footnote "Some people don't think nonhuman animals are sentient beings, but I feel relatively confident they're applying a standard Peter Singer would approve of as morally consistent."
Note that Eliezer has written a relatively thoughtful justification of his views on theory of mind and why he thinks various farmed animals aren't moral patients. He also says: