J. Dmitri Gallow — LessWrong

LESSWRONG
LW

Replying toInstrumental Convergence? [Draft]

Thanks for the read and for the response.

>None of your models even include actions that are analogous to the convergent actions on that list.

I'm not entirely sure what you mean by "model", but from your use in the penultimate paragraph, I believe you're talking about a particular decision scenario Sia could find herself in. If so, then my goal wasn't to prove anything about a particular model, but rather to prove things about every model.

>The non-sequential theoretical model is irrelevant to instrumental convergence, because instrumental convergence is about putting yourself in a better position to pursue your goals later on.

Sure. I started with the easy cases to get the main ideas out.... (read more)

Replying toInstrumental Convergence? [Draft]

J. Dmitri Gallow3y

Instrumental Convergence? [Draft]

There's nothing unusual about my assumptions regarding instrumental rationality. It's just standard expected utility theory.

The place I see to object is with my way of spreading probabilities over Sia's desires. But if you object to that, I want to hear more about which probably distribution I should be using to understand the claim that Sia's desires are likely to rationalise power-seeking, resource acquisition, and so on. I reached for the most natural way of distributing probabilities I could come up with---I was trying to be charitable to the thesis, & interpreting it in light of the orthogonality thesis. But if that's not the right way to distribute probability over potential desires, if it's not the right way of understanding the thesis, then I'd like to hear something about what the right way of understanding it is.

Replying toInstrumental Convergence? [Draft]

J. Dmitri Gallow3y

Instrumental Convergence? [Draft]

A quick prefatory note on how I'm thinking about 'goals' (I don't think it's relevant, but I'm not sure): as I'm modelling things, Sia's desires/goals are given by a function from ways the world could be (colloquially, 'worlds') to real numbers, $D$ , with the interpretation that $D (W)$ is how well satisfied Sia's desires are if $W$ turns out to be the way the world actually is. By 'the world', I mean to include all of history, from the beginning to the end of time, and I mean to encompass every region of space. I assume that this function can be well-defined even for worlds in which Sia never existed or dies quickly. Humans can want to never... (read more)

Replying toInstrumental Convergence? [Draft]

J. Dmitri Gallow3y

Instrumental Convergence? [Draft]

Wouldn't this imply a bias towards eliminating other agents? (Since that would make the world more predictable, and thereby leave less up to chance?)

A few things to note. Firstly, when I say that there's a 'bias' towards a certain kind of choice, I just mean that the probability that a superintelligent agent with randomly sampled desires (Sia) would make that choice is greater than 1/N, where N is the number of choices available. So, just to emphasize the scale of the effect: even if you were right about that inference, you should still assign very low probability to Sia taking steps to eliminate other agents.

Secondly, when I say that a choice... (read 445 more words →)

Replying toInstrumental Convergence? [Draft]

J. Dmitri Gallow3y*

Instrumental Convergence? [Draft]

There are infinitely many desires like that, in fact (that's what proposition 2 shows).

More generally, take any self-preservation contingency plan, A, and any other contingency plan, B. If we start out uncertain about what Sia wants, then we should think her desires are just as likely to make A more rational than B as they are to make B more rational than A. (That's what proposition 3 shows.)

That's rough and subject to a bunch of caveats, of course. I try to go through all of those caveats carefully in the draft.

Instrumental Convergence? [Draft]

J. Dmitri Gallow

This is a draft written by J. Dmitri Gallow, Senior Research Fellow at the Dianoia Institute of Philosophy at ACU, as part of the Center for AI Safety Philosophy Fellowship. This draft is meant to solicit feedback. Here is a PDF version of the draft.

Abstract

The thesis of instrumental convergence holds that a wide range of ends have common means: for instance, self preservation, desire preservation, self improvement, and resource acquisition. Bostrom (2014) contends that instrumental convergence gives us reason to think that ''the default outcome of the creation of machine superintelligence is existential catastrophe''. I use the tools of decision theory to investigate whether this thesis is true. I find that, even... (read 9852 more words →)