Richard_Ngo

Formerly alignment and governance researcher at DeepMind and OpenAI. Now independent.

Sequences

Twitter threads
Understanding systematization
Stories
Meta-rationality
Replacing fear
Shaping safer goals
AGI safety from first principles

Comments

In general people should feel free to DM me with pitches for this sort of thing.

I think this epistemic uncertainty is distinct from the type of "objective probabilities" I talk about in my post, and I don't really know how to use language without referring to degrees of my epistemic uncertainty. 

The part I was gesturing at wasn't the "probably" but the "low measure" part.

Is your position that the problem is deeper than this, and there is no objective prior over worlds, it's just a thing like ethics that we choose for ourselves, and then later can bargain and trade with other beings who have a different prior of realness?

Yes, that's a good summary of my position—except that I think that, like with ethics, there will be a bunch of highly-suggestive logical/mathematical facts which make it much more intuitive to choose some priors over others. So the choice of prior will be somewhat arbitrary but not totally arbitrary.

I don't think this is a fully satisfactory position yet, since it hasn't really dissolved the confusion about why subjective anticipation feels so real, but it feels directionally correct.

Hmmm, uncertain if we disagree. You keep saying that these concepts are cursed and yet phrasing your claims in terms of them anyway (e.g. "probably very low measure"), which suggests that there's some aspect of my response you don't fully believe.

In particular, in order for your definition of "what beings are sufficiently similar to you" to not be cursed, you have to be making claims not just about the beings themselves (since many Boltzmann brains are identical to your brain) but rather about the universes that they're in. But this is kinda what I mean by coalitional dynamics: a bunch of different copies of you become more central parts of the "coalition" of your identity based on e.g. the types of impact that they're able to have on the world around them. I think describing this as a metric of similarity is going to be pretty confusing/misleading.

you can estimate who are the beings whose decisions correlate with this one, and what is the impact of each of their decisions, and calculate the sum of all that

You still need a prior over worlds to calculate impacts, which is the cursed part.

I don't think this line of argument is a good one. If there's a 5% chance of x-risk and, say, a 50% chance that AGI makes the world just generally be very chaotic and high-stakes over the next few decades, then it seems very plausible that you should mostly be optimizing for making the 50% go well rather than the 5%.

Worse than the current situation, because the counterfactual is that some later project happens which kicks off in a less race-y manner.

In other words, whatever the chance of its motivation shifting over time, it seems dominated by the chance that starting the equivalent project later would just have better motivations from the outset.

Great post. One slightly nitpicky point, though: even in the section where you argue that probabilities are cursed, you are still talking in the language of probabilities (e.g. "my modal guess is that I'm in a solipsist simulation that is a fork of a bigger simulation").

I think there's probably a deeper ontological shift you can do to a mindset where there's no actual ground truth about "where you are". I think in order to do that you probably need to also go beyond "expected utilities are real", because expected utilities need to be calculated by assigning credences to worlds and then multiplying them by expected impact in each world.

Instead the most "real" thing here I'd guess is something like "I am an agent in a superposition of being in many places in the multiverse. Each of my actions is a superposition of uncountable trillions of actions that will lead to nothing plus a few that will have lasting causal influence. The degree to which I care about one strand of causal influence over another is determined by the coalitional dynamics of my many subagents".

FWIW I think this is roughly the perspective on the multiverse Yudkowsky lays out in Planecrash (especially in the bits near the end where Keltham and Carissa discuss anthropics). Except that the degrees of caring being determined by coalitional dynamics is more related to geometric rationality.

I also tweeted about something similar recently (inspired by your post).

Cool, ty for (characteristically) thoughtful engagement.

I am still intuitively skeptical about a bunch of your numbers but now it's the sort of feeling which I would also have if you were just reasoning more clearly than me about this stuff (that is, people who reason more clearly tend to be able to notice ways that interventions could be surprisingly high-leverage in confusing domains).

Ty for the link, but these both seem like clearly bad semantics (e.g. under either of these the second-best hypothesis under consideration might score arbitrarily badly).

Just changed the name to The Minority Coalition.

1. Yepp, seems reasonable. Though FYI I think of this less as some special meta argument, and more as the common-sense correction that almost everyone implicitly does when giving credences, and rationalists do less than most. (It's a step towards applying outside view, though not fully "outside view".)

2. Yepp, agreed, though I think the common-sense connotations of "if this became" or "this would have a big effect" are causal, especially in the context where we're talking to the actors who are involved in making that change. (E.g. the non-causal interpretation of your claim feels somewhat analogous to if I said to you "I'll be more optimistic about your health if you take these pills", and so you take the pills, and then I say "well the pills do nothing but now I'm more optimistic, because you're the sort of person who's willing to listen to recommendations". True, but it also undermines people's willingness/incentive to listen to my claims about what would make the world better.)

3. Here are ten that affect AI risk as much one way or the other:

  1. The US government "waking up" a couple of years earlier or later (one operationalization: AISIs existing or not right now).
  2. The literal biggest names in the field of AI becoming focused on AI risk.
  3. The fact that Anthropic managed to become a leading lab (and, relatedly, the fact that Meta and other highly safety-skeptical players are still behind).
  4. Trump winning the election.
  5. Elon doing all his Elon stuff (like founding x.AI, getting involved with Trump, etc).
  6. The importance of transparency about frontier capabilities (I think of this one as more of a logical update that I know you've made).
  7. o1-style reasoning as the next big breakthrough.
  8. Takeoff speeds (whatever updates you've made in the last three years).
  9. China's trajectory of AI capabilities (whatever updates you've made about that in the last 3 years).
  10. China's probability of invading Taiwan (whatever updates you've made about that in the last 3 years).

And then I think in 3 years we'll be able to publish a similar list of stuff that mostly we just hadn't predicted or thought about before now.

I expect you'll dispute a few of these; happy to concede the ones that are specifically about your updates if you disagree (unless you agree that you will probably update a bunch on them in the next 3 years).

But IMO the easiest way for safety cases to become the industry-standard thing is for AISI (or internal safety factions) to specifically demand it, and then the labs produce it, but kinda begrudgingly, and they don't really take them seriously internally (or are literally not the sort of organizations that are capable of taking them seriously internally—e.g. due to too much bureaucracy). And that seems very much like the sort of change that's comparable to or smaller than the things above.

I think I would be more sympathetic to your view if the claim were "if AI labs really reoriented themselves to take these AI safety cases as seriously as they take, say, being in the lead or making profit". That would probably halve my P(doom); it's just a very very strong criterion.
