Richard_Ngo

Formerly alignment and governance researcher at DeepMind and OpenAI. Now independent.

Sequences

Twitter threads
Understanding systematization
Stories
Meta-rationality
Replacing fear
Shaping safer goals
AGI safety from first principles

Comments

Because I might fund them, or forward their pitch to someone else who will.

In general people should feel free to DM me with pitches for this sort of thing.

I think this epistemic uncertainty is distinct from the type of "objective probabilities" I talk about in my post, and I don't really know how to use language without referring to degrees of my epistemic uncertainty. 

The part I was gesturing at wasn't the "probably" but the "low measure" part.

Is your position that the problem is deeper than this, and there is no objective prior over worlds, it's just a thing like ethics that we choose for ourselves, and then later can bargain and trade with other beings who have a different prior of realness?

Yes, that's a good summary of my position—except that I think that, like with ethics, there will be a bunch of highly-suggestive logical/mathematical facts which make it much more intuitive to choose some priors over others. So the choice of prior will be somewhat arbitrary but not totally arbitrary.

I don't think this is a fully satisfactory position yet; it hasn't really dissolved the confusion about why subjective anticipation feels so real. But it feels directionally correct.

Hmmm, uncertain if we disagree. You keep saying that these concepts are cursed and yet phrasing your claims in terms of them anyway (e.g. "probably very low measure"), which suggests that there's some aspect of my response you don't fully believe.

In particular, in order for your definition of "what beings are sufficiently similar to you" to not be cursed, you have to be making claims not just about the beings themselves (since many Boltzmann brains are identical to your brain) but also about the universes that they're in. But this is kinda what I mean by coalitional dynamics: a bunch of different copies of you become more central parts of the "coalition" of your identity based on e.g. the types of impact that they're able to have on the world around them. I think describing this as a metric of similarity is going to be pretty confusing/misleading.

you can estimate who the beings are whose decisions correlate with this one, and what the impact of each of their decisions is, and calculate the sum of all that

You still need a prior over worlds to calculate impacts, which is the cursed part.
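
To spell out where the prior enters, here's one way to write down the calculation being gestured at (a sketch in my own notation, not the original commenter's):

$$\text{Value}(a) \;=\; \sum_{w} p(w) \sum_{i \in C(w)} \Delta_i(a, w)$$

where $p(w)$ is the prior over worlds, $C(w)$ is the set of beings in world $w$ whose decisions correlate with this one, and $\Delta_i(a, w)$ is the impact of agent $i$'s corresponding decision. The sum over correlated decision-makers is fine; it's the $p(w)$ term that's cursed.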

I don't think this line of argument is a good one. If there's a 5% chance of x-risk and, say, a 50% chance that AGI makes the world generally very chaotic and high-stakes over the next few decades, then it seems very plausible that you should mostly be optimizing for making the 50% go well rather than the 5%.
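
A toy back-of-the-envelope version of this argument (the 5% and 50% are from the comment above; the per-scenario leverage numbers are made up purely for illustration):

```python
# Toy version of the allocation argument. The probabilities come from the
# comment; the "leverage" numbers (how much a unit of marginal effort helps,
# conditional on each scenario) are hypothetical and only for illustration.
p_xrisk = 0.05           # chance of existential catastrophe
p_chaotic = 0.50         # chance of a chaotic, high-stakes (but survivable) transition

leverage_xrisk = 1.0     # hypothetical marginal leverage in the x-risk scenario
leverage_chaotic = 0.3   # hypothetical marginal leverage in the chaotic scenario

ev_xrisk_effort = p_xrisk * leverage_xrisk        # 0.05
ev_chaotic_effort = p_chaotic * leverage_chaotic  # 0.15

print(f"marginal EV of x-risk-focused effort: {ev_xrisk_effort:.2f}")
print(f"marginal EV of chaos-focused effort:  {ev_chaotic_effort:.2f}")
# Even with much lower per-scenario leverage, the higher-probability scenario
# can dominate the expected value of marginal effort.
```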

Worse than the current situation, because the counterfactual is that some later project happens which kicks off in a less race-y manner.

In other words, whatever the chance of its motivations shifting over time, that chance seems dominated by the chance that starting the equivalent project later would just have better motivations from the outset.

Great post. One slightly nitpicky point, though: even in the section where you argue that probabilities are cursed, you are still talking in the language of probabilities (e.g. "my modal guess is that I'm in a solipsist simulation that is a fork of a bigger simulation").

I think there's probably a deeper ontological shift you can make, to a mindset where there's no actual ground truth about "where you are". I think in order to do that you probably need to also go beyond "expected utilities are real", because expected utilities need to be calculated by assigning credences to worlds and then multiplying them by expected impact in each world.
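
Written out (my notation, not anything from the post), the dependence looks like:

$$EU(a) \;=\; \sum_{w} \mathrm{Cr}(w) \cdot \mathbb{E}[\text{impact of } a \mid w]$$

so if there's no ground truth about "where you are", there's nothing to pin down the credence term $\mathrm{Cr}(w)$ that the whole calculation rests on.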

Instead the most "real" thing here I'd guess is something like "I am an agent in a superposition of being in many places in the multiverse. Each of my actions is a superposition of uncountable trillions of actions that will lead to nothing plus a few that will have lasting causal influence. The degree to which I care about one strand of causal influence over another is determined by the coalitional dynamics of my many subagents".

FWIW I think this is roughly the perspective on the multiverse Yudkowsky lays out in Planecrash (especially in the bits near the end where Keltham and Carissa discuss anthropics), except that the idea of the degrees of caring being determined by coalitional dynamics is more closely related to geometric rationality.
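
For concreteness, here's a minimal sketch of the kind of aggregation I have in mind, assuming the weighted-geometric-mean / Nash-bargaining flavor of geometric rationality; the subagent utilities and coalitional weights are made up for illustration:

```python
import math

# Minimal sketch: aggregating subagents' evaluations of an action with a
# weighted geometric mean (the Nash-bargaining-flavored aggregation used in
# the geometric rationality frame), compared to a weighted arithmetic mean.
# The utilities and coalitional weights below are made up for illustration.
subagent_utilities = [0.9, 0.8, 0.05]  # how much each strand/subagent values the action
coalition_weights = [0.5, 0.3, 0.2]    # bargaining weights, summing to 1

arithmetic = sum(w * u for w, u in zip(coalition_weights, subagent_utilities))
geometric = math.exp(sum(w * math.log(u) for w, u in zip(coalition_weights, subagent_utilities)))

print(f"arithmetic aggregate: {arithmetic:.2f}")  # ~0.70: the unhappy subagent barely matters
print(f"geometric aggregate:  {geometric:.2f}")   # ~0.49: the unhappy subagent drags the score down
```

The geometric aggregation gives even low-weight subagents something close to veto power, which is one way of cashing out "the degree to which I care about one strand of causal influence over another is determined by coalitional dynamics".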

I also tweeted about something similar recently (inspired by your post).

Cool, ty for the (characteristically) thoughtful engagement.

I am still intuitively skeptical about a bunch of your numbers, but now it's the sort of feeling I would also have if you were just reasoning more clearly than me about this stuff (that is, people who reason more clearly tend to be able to notice ways that interventions could be surprisingly high-leverage in confusing domains).

Ty for the link, but both of these seem like clearly bad semantics (e.g. under either of them the second-best hypothesis under consideration might score arbitrarily badly).

Just changed the name to The Minority Coalition.
