All of J. Dmitri Gallow's Comments + Replies

Thanks for the read and for the response.

>None of your models even include actions that are analogous to the convergent actions on that list.

I'm not entirely sure what you mean by "model", but from your use in the penultimate paragraph, I believe you're talking about a particular decision scenario Sia could find herself in. If so, then my goal wasn't to prove anything about a particular model, but rather to prove things about every model.

>The non-sequential theoretical model is irrelevant to instrumental convergence, because instrumental convergence ... (read more)

Jeremy Gillen
Ah thanks, this clears up most of my confusion; I had misunderstood the intended argument here. I think I can explain my point better now:

I claim that proposition 3, when extended to sequential decisions with a resolute decision theory, shouldn't be interpreted the way you interpret it. The meaning changes when you make A and B into sequences of actions. Let's say action A is a list of 1,000,000 particular actions (e.g. 1,000,000 small-edits) and B is a list of 1,000,000 particular actions (e.g. 1 improve-technology, then 999,999 amplified-edits).[1] Proposition 3 says that A is equally likely to be chosen as B (for randomly sampled desires). This is correct. Intuitively, this is because A and B each achieve particular outcomes, and desires are equally likely to favor "opposite" outcomes.

However, this isn't the question we care about. We want to know whether action-sequences that contain "improve-technology" are more likely to be optimal than action-sequences that don't contain "improve-technology", given a random desire function. That is a very different question from the one proposition 3 answers. Almost all optimal action-sequences could contain "improve-technology" at the beginning, while any two particular action-sequences are equally likely to be preferred to each other on average across desires. These two facts don't contradict each other. The first fact is true in many environments (e.g. the one I described[2]), and it is what we mean by instrumental convergence. The second fact is unrelated to instrumental convergence.

---

I think the error might be coming from this definition of instrumental convergence:

When A is a sequence of actions, this definition makes less sense. It'd be better to define it as something like "from a menu of n initial actions, she has a better than 1/n probability of choosing a particular initial action A1".

Yep, I was using "model" to mean "a simplified representation of a c
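Here's a toy sketch of the two facts together, in case it helps. The environment, the outcome counts, and the i.i.d.-normal sampling of desires are all invented purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy environment (numbers invented for illustration):
# plans that skip "improve-technology" can only reach outcomes 0-9;
# plans that start with "improve-technology" can reach outcomes 10-999,
# because the technology amplifies what later edits can achieve.
outcomes_without_improve = np.arange(0, 10)
outcomes_with_improve = np.arange(10, 1000)
n_outcomes = 1000

# Two *particular* plans, each pinning down one particular outcome:
outcome_A = 3    # e.g. a fixed list of small-edits
outcome_B = 500  # e.g. improve-technology, then a fixed list of amplified-edits

n_samples = 10_000
optimal_starts_with_improve = 0
prefers_A_to_B = 0

for _ in range(n_samples):
    # A randomly sampled desire function: i.i.d. utilities over outcomes.
    U = rng.standard_normal(n_outcomes)

    # Fact 1: does the optimal plan start with improve-technology?
    if U[outcomes_with_improve].max() > U[outcomes_without_improve].max():
        optimal_starts_with_improve += 1

    # Fact 2: which of the two particular plans is preferred?
    if U[outcome_A] > U[outcome_B]:
        prefers_A_to_B += 1

print("P(optimal plan starts with improve-technology) ~",
      optimal_starts_with_improve / n_samples)  # ~ 0.99
print("P(plan A preferred to plan B) ~",
      prefers_A_to_B / n_samples)  # ~ 0.50
```

Both numbers come from the same random desires: the pairwise comparison between two particular plans stays at 50/50, while almost every optimal plan begins with improve-technology, simply because starting with it leaves far more outcomes reachable.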

There's nothing unusual about my assumptions regarding instrumental rationality. It's just standard expected utility theory.

The place I see to object is with my way of spreading probabilities over Sia's desires. But if you object to that, I want to hear more about which probability distribution I should be using to understand the claim that Sia's desires are likely to rationalise power-seeking, resource acquisition, and so on. I reached for the most natural way of distributing probabilities I could come up with---I was trying to be charitable to the thesis, &... (read more)

A quick prefatory note on how I'm thinking about 'goals' (I don't think it's relevant, but I'm not sure): as I'm modelling things, Sia's desires/goals are given by a function, U, from ways the world could be (colloquially, 'worlds') to real numbers, with the interpretation that U(w) is how well satisfied Sia's desires are if w turns out to be the way the world actually is. By 'the world', I mean to include all of history, from the beginning to the end of time, and I mean to encompass every region of space. I assume that this functio... (read more)
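In rough symbols (glossing over details I go through in the draft, and writing W for the set of worlds), Sia's desires are a function

$$U : W \to \mathbb{R},$$

and, as a first pass, she evaluates a choice A by its expected utility,

$$EU(A) = \sum_{w \in W} \Pr(w \mid A)\, U(w).$$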

Wouldn't this imply a bias towards eliminating other agents? (Since that would make the world more predictable, and thereby leave less up to chance?)

A few things to note. Firstly, when I say that there's a 'bias' towards a certain kind of choice, I just mean that the probability that a superintelligent agent with randomly sampled desires (Sia) would make that choice is greater than 1/N, where N is the number of choices available. So, just to emphasize the scale of the effect: even if you were right about that inference, you should still assign very low pro... (read more)
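In symbols, the 'bias' claim in the first sentence is just:

$$\Pr(\text{Sia chooses } A) > \frac{1}{N},$$

where A is one of the N available choices of the relevant kind, and the probability comes from the random sampling of Sia's desires.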

rvnnt
Thanks for the response. To the extent that I understand your models here, I suspect they don't meaningfully bind/correspond to reality. (Of course, I don't understand your models at all well, and I don't have the energy to process the whole post, so this doesn't really provide you with much evidence; sorry.) I wonder how one could test whether or not the models bind to reality? E.g. maybe there are case examples (of agents/people behaving in instrumentally rational ways) one could look at, and see if the models postdict the actual outcomes in those examples?

There are infinitely many desires like that, in fact (that's what proposition 2 shows).

More generally, take any self-preservation contingency plan, A, and any other contingency plan, B. If we start out uncertain about what Sia wants, then we should think her desires are just as likely to make A more rational than B as they are to make B more rational than A. (That's what proposition 3 shows.)
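In rough symbols (the notation here is just shorthand, not the draft's): writing U for a randomly sampled desire function,

$$\Pr_U(A \succ B) \;=\; \Pr_U(B \succ A),$$

where "A ≻ B" means that the sampled desires U make A more rational than B.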

That's rough and subject to a bunch of caveats, of course. I try to go through all of those caveats carefully in the draft.

Evan R. Murphy
Interesting... still taking that in. Related question: Doesn't goal preservation typically imply self preservation? If I want to preserve my goal, and then I perish, I've failed because now my goal has been reassigned from X to nil.