Ben Pace

I'm an admin of LessWrong. Here are a few things about me.

  • I generally feel more hopeful about a situation when I understand it better.
  • I have signed no contracts nor made any agreements whose existence I cannot mention.
  • I believe it is good to take responsibility for accurately and honestly informing people of what you believe in all conversations; and also good to cultivate an active recklessness about the social consequences of doing so.
  • It is wrong to directly cause the end of the world, even if you are fatalistic about what is going to happen.

(Longer bio.)

Sequences

AI Alignment Writing Day 2019
Transcript of Eric Weinstein / Peter Thiel Conversation
AI Alignment Writing Day 2018
Share Models, Not Beliefs

Comments

I don't think that propaganda must necessarily involve lying. By "propaganda," I mean aggressively spreading information or communication because it is politically convenient / useful for you, regardless of its truth (though propaganda is sometimes untrue, of course).

When a government puts up posters saying "Your country needs YOU" this is intended to evoke a sense of duty and a sense of glory to be had; sometimes this sense of duty is appropriate, but sometimes your country wants you to participate in terrible wars for bad reasons. The government is saying it loudly because for them it's convenient for you to think that way, and that’s not particularly correlated with the war being righteous or with the people who decided to make such posters even having thought much about that question. They’re saying it to win a war, not to inform their populace, and that’s why it’s propaganda.

Returning to the Amodei blogpost: I’ll happily concede that you don’t always need to give reasons for your beliefs when expressing them—context matters. But in every context—tweets, podcasts, ads, or official blogposts—there’s a difference between sharing something to inform and sharing it to push a party line.

I claim that many people have asked why Anthropic believes it’s ethical for them to speed up AI progress (by contributing to the competitive race), and Anthropic have rarely-if-ever given a justification for it. Senior staff keep indicating that not building AGI is not on the table, yet they rarely-if-ever show up to engage with criticism or to give justifications for this in public discourse. This is a key reason why it reads to me as propaganda: it's an incredibly convenient belief for them, and they state it as though any other position is untenable, without argument and without acknowledging or engaging with the position that it is ethically wrong to speed up the development of a technology they believe has a 10-20% chance of causing human extinction (or a similarly bad outcome).

I wish that they would just come out, lay out the considerations for and against building a frontier lab that is competing to reach the finish line first, acknowledge other perspectives and counterarguments, and explain why they made the decision they have made. This would do wonders for the ability to trust them.

(Relatedly, I don't believe the Machines of Loving Grace essay is defending the position that speeding up AI is good; the piece in fact explicitly says it will not assess the risks of AI. Here are my comments at the time on that essay also being propaganda.)

I'm saying that he is presenting it as something he believes from his place of expertise and private knowledge, without argument or evidence, because it is something that is exceedingly morally and financially beneficial to him (he gets to make massive money and not be a moral monster), rather than because he has any evidence for it.

It is a similar sentence to if a President of a country who just initiated a war said “If there’s one thing I’ve learned in my life it’s that war is inevitable, and there’s just a question of who wins and how to make sure it’s over quickly”, in a way that implies they should be absolved of responsibility for initiating the war.

Edit: Just as Casey B was writing his reply below, I edited out an example of Mark Zuckerberg saying something like "If there's one thing I've learned in my career, it's that social media is good, and the only choice is which sort of good social media we have". Leaving this note so that people aren't confused by his reply.

Can you expand on this? How can you tell the difference, and does it make much of a difference in the end (e.g., if most people get corrupted by power regardless of initial intentions)?

But I don't believe most people get corrupted by power regardless of initial intentions? I don't think Francis Bacon was corrupted by power, I don't think James Watt was corrupted by power, I don't think Stanislav Petrov was corrupted by power, and all of these people had far greater influence over the world than most people who are "corrupted by power".

I'm hearing you'd be interested in me saying more words about the difference in what it looks like to be motivated by responsibility versus power-seeking. I'll say some words; you can see if they help.

  • I think someone motivated by responsibility often will end up looking more aligned with their earlier self over time even as they grow and change, will often not accept opportunities for a lot of power/prestige/money because they're uninteresting to them, will often make sacrifices of power/prestige for ethical reasons, will pursue a problem they care about long after most would give up or think it likely to be solved.
  • I think someone primarily seeking power will be much more willing to do things that pollute the commons or break credit-allocation mechanisms to get credit, and generally game a lot of systems that other people are earnestly rising through. They will more readily pivot on what issue they say they care about or are working on because they're not attached to the problem, but to the reward for solving the problem, and many rewards can be gotten from lots of different problems. They'll be more guided by what's fashionable right now, and more attuned to it. They'll maneuver themselves in order to be able to politically work with whoever has power that they want, regardless of the ethics/competence/corruption of those people.

> As a background model, I think if someone wants to take responsibility for some part of the world going well, by-default this does not look like "situating themselves in the center of legible power".

And yet, Eliezer, the writer of "heroic responsibility" is also the original proponent of "build a Friendly AI to take over the world and make it safe".

Building an AGI doesn't seem to me like a very legible mechanism of power, or at least it didn't in the era Eliezer pursued it (where it wasn't also credibly "a path to making billions of dollars and getting incredible prestige"). The word 'legible' was doing a lot of work in the sentence I wrote.

Another framing I sometimes look through (H/T Habryka) is constrained vs unconstrained power. Having a billion dollars is unconstrained power, because you can use it to do a lot of different things – buy loads of different companies or resources. Being an engineer overseeing missile-defense systems in the USSR is very constrained; you have an extremely well-specified set of things you can control. This changes the adversarial forces on you, because in the former case a lot of people stand to gain a lot of different possible things they want if they can get leverage over you, and they have to be concerned about a lot of different ways you could be playing them. So the pressures for insanity are higher. Paths that give you the ability to influence very specific things that route through very constrained powers are less insanity-inducing, I think, and I think most routes that look like "build a novel invention in a way that isn't getting you lots of money/status along the way" are less insanity-inducing, and I rarely find the person to have become as insane as some of the tech-company CEOs have. I also think people motivated by taking responsibility for fixing a particular problem in the world are more likely to take constrained power, because... they aren't particularly motivated by all the other power they might be able to get.

I don't suspect I addressed your cruxes here so far about whether this idea of heroic responsibility is/isn't predictably misused. I'm willing to try again if you wish, or if you can try pointing again to what you'd guess I'm missing.

Not sure I get your overall position. But I don’t believe all humans are delusional about the most important questions in their lives. See here for an analysis of pressures on people that can cause them to be insane on a topic. I think you can create inverse pressures in yourself, and you can also have no pressures and simply use curiosity and truth-seeking heuristics. It’s not magic to not be delusional. It just requires doing the same sorts of cognition you use to fix a kitchen sink.

Not only would most people be hopelessly lost on these questions (“Should I give up millions-of-dollars-and-personal-glory and then still probably die just because it is morally right to do so?”), they have also picked up something that they cannot put down. These companies have 1,000s of people making millions of dollars, and they will re-form in another shape if the current structure is broken apart. If we want to put down what has been picked up more stably, we must use other forces that do not wholly arise from within the companies.

My sense is that most of the people with lots of power are not taking heroic responsibility for the world. I think that Amodei and Altman intend to achieve global power and influence but this is not the same as taking global responsibility. I think, especially for Altman, the desire for power comes first relative to responsibility. My (weak) impression is that Hassabis has less will-to-power than the others, and that Musk has historically been much closer to having responsibility be primary.

I don’t really understand this post as doing something other than asking “on the margin are we happy or sad about present large-scale action” and then saying that the background culture should correspondingly praise or punish large-scale action. Which is maybe reasonable, but may also be too high-level a gloss. As per the usual idea of rationality, I think whether you are capable of taking large-scale action in a healthy way is true in some worlds and not in others, and you should try to figure out which world you’re in.

The financial incentives around AI development are blatantly insanity-inducing on the topic, and anyone should’ve been able to guess that going in; I don’t think this was a difficult question. Though I guess someone already exceedingly wealthy (i.e. already having $1B or $10B) could have unusually strong reason not to be concerned about that particular incentive (and I think it is the case that Musk has seemed differently insane than the others taking action in this area, and lacking in some of the insanities).

However I think most moves around wielding this level of industry should be construed as building an egregore more powerful than you. The founders/CEOs of the AI big-tech companies are not able to simply turn their companies off, nor their industry. If they grow to believe their companies are bad for the world, either they’ll need to spend many years dismantling / redirecting them, or else they’ll simply quit/move on and some other person will take their place. So it's still default-irresponsible even if you believe you can maintain personal sanity.

Overall I think taking responsibility for things is awesome, and I wish people were doing more of it and trying harder. And I wish people took ultimate responsibility for as big a thing as they can muster. This is not the same as “trying to pull the biggest lever you can” or "reaching for power on a global level"; those are quite different heuristics. Grabbing power can obviously just cost you sanity, and often those pulling the biggest lever they can are doing so foolishly.

As a background model, I think if someone wants to take responsibility for some part of the world going well, by-default this does not look like "situating themselves in the center of legible power". Lonely scientist/inventor James Watt spent his early years fighting poverty before successfully inventing better steam engines, and had far more influence by helping cause the industrial revolution than most anyone in government did during his era. I think confusing "moving toward legible power" for "having influence over the world" is one of the easiest kinds of insanity.

I think that Anthropic is doing some neat alignment and control work, but it is also the company most effectively incentivizing people who care about existential risk to sell out, endorse propaganda, silence themselves, and get on board with the financial incentives of massive monetization and capabilities progress. In this way I see it as doing more damage than OpenAI (though OpenAI used to have this mantle pre-Anthropic, while the Amodei siblings were there, with Christiano as a researcher and Karnofsky on the board).

I don't really know the relative numbers; in my mind the uncertainty spans orders of magnitude. The numbers are all negative.

I couldn’t get two sentences in without hitting propaganda, so I set it aside. But I’m sure it’s of great political relevance.

Key ideas include long timelines, slow takeoff, eventual explosive growth, optimism about alignment, concerns about overregulation, concerns about hawkishness towards China, advocating the likelihood of AI sentience and desirability of AI rights, debating the desirability of different futures, and so on.

Small semantic note: these are not new ideas originating with Epoch; they are a new package of positions on ideas predominantly originating from the MIRI/LW cluster that you mentioned earlier.

Of note: the AI Alignment Forum content is a mirror of LW content, not distinct. It is a strict subset.
