Comment author: Manfred 28 October 2014 02:12:50AM 1 point

1 - I don't have a general solution, there are plenty of things I'm confused about - and certain cases where anthropic probability depends on your action are at the top of the list. There is a sense in which a certain extension of UDT can handle these cases if you "pre-chew" indexical utility functions into world-state utility functions for it (like a more sophisticated version of what's described in this post, actually), but I'm not convinced that this is the last word.

Absurdity and confusion have a long (if slightly spotty) track record of indicating a lack in our understanding, rather than a lack of anything to understand.

2 - In the same way that CDT gets the right answer about how much to pay for a 50% chance of winning $1, even though CDT isn't correct. The Sleeping Beauty problem is literally so simple that it's within CDT's zone of validity.

Comment author: lackofcheese 28 October 2014 02:47:02AM 1 point

On 1), I agree that "pre-chewing" anthropic utility functions appears to be something of a hack. My current intuition in that regard is to reject the notion of anthropic utility (although not anthropic probability), but a solid formulation of anthropics could easily convince me otherwise.

On 2), if it's within the zone of validity then I guess that's sufficient to call something "a correct way" of solving the problem, but if there is an equally simple or simpler approach that has a strictly broader domain of validity, I don't think you are justified in calling it "the right way".

Comment author: Manfred 28 October 2014 12:05:02AM 3 points

Probabilities have a foundation independent of decision theory, as encoding beliefs about events. They're what you really do expect to see when you look outside.

This is an important note about the absent-minded driver problem et al., one that gets lost if one gets too comfortable with the effectiveness of UDT. The agent's probabilities are still accurate, and still correspond to the frequency with which they see things (truly!), but they're no longer related to decision-making in quite the same way.

"The use" is then to predict, as accurately as ever, what you'll see when you look outside yourself.

And yes, probabilities can sometimes depend on decisions, not only in some anthropic problems but more generally in Newcomb-like ones. Yes, the idea of having a single unqualified belief, before making a decision, doesn't make much sense in these cases. But Sleeping Beauty is not one of these cases.

Comment author: lackofcheese 28 October 2014 01:37:27AM 1 point

That's a reasonable point, although I still have two major criticisms of it.

  1. What is your resolution to the confusion about how anthropic reasoning should be applied, and to the various potential absurdities that seem to come from it? Non-anthropic probabilities do not have this problem, but anthropic probabilities definitely do.
  2. How can anthropic probability be the "right way" to solve the Sleeping Beauty problem if it lacks the universality of methods like UDT?

Comment author: Manfred 27 October 2014 10:34:22PM 2 points

I agree with most of this post, but not with most parts mentioning SSA/SIA or the sleeping beauty problem. In general, aside from those two areas I find your written works to be valuable resources. Now that I've said something nice, here's a long comment predictably focusing on the bad bits.

SSA and SIA, as interpreted by you, seem uninformative (treating them as two different black boxes rather than two settings on a transparent box), so I'm not surprised that you decided SSA vs SIA was meaningless. But this does not mean that anthropic probability is meaningless. Certainly you didn't prove that - you tried something else, that's all. It's analogous to how just because UDT solves Psy-Kosh's non-anthropic problem without mentioning classical probability updates, that doesn't mean classical probability updates are "meaningless."

Each of them reasons something like this:

"There are four possible worlds here. In the tails world, I, Jack/Roger, could exist in Room 1 or in Room 2. And in the heads world, it could be either me existing in Room 1, or the other person existing in Room 1 (in which case I don't exist). I'm completely indifferent to what happens in worlds where I don't exist (sue me, I'm selfish). So if I buy the coupon for £x, I expect to make utility: 0.25(0) + 0.25(-x) + 0.5(1-x)=0.5-0.75x. Therefore I will buy the coupon for x<£2/3."
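The arithmetic in the quoted reasoning can be checked with a short sketch. It assumes, as the 0.5(1-x) term implies, that the coupon costs x and pays 1 if the coin lands tails; the function names are hypothetical.

```python
from fractions import Fraction as F


def selfish_eu(x):
    """Jack/Roger's expected utility from buying the coupon at price x,
    using the four-world breakdown quoted above (coupon assumed to pay 1 on tails)."""
    return (F(1, 4) * 0           # heads, the other person exists: indifferent
            + F(1, 4) * (-x)      # heads, I exist: pay x, win nothing
            + F(1, 2) * (1 - x))  # tails, Room 1 or Room 2: pay x, win 1


def breakeven(eu):
    """Zero crossing of an expected-utility function that is linear in x."""
    intercept = eu(F(0))
    slope = eu(F(1)) - intercept
    return -intercept / slope


print(selfish_eu(F(0)), breakeven(selfish_eu))  # 1/2 2/3
```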

This is the gnome's reasoning with different labels. But that doesn't mean that it has the right labels to be the human's reasoning.

It sounds like the sort of thing that a person who believed that anthropic probabilities were meaningless would write as the person's reasoning.

Let me try and give an analogy for how this sounds to me. It will be grossly unfair to you, and I apologize - pretend the content is a lot better even as the sound remains similar.

Suppose you're sitting in your room, and also in your room is a clock. Now imagine there was a gnome flying by with time dilation of 0.5. The gnome reasons as follows: "I see a human and a clock moving past me together. The clock ticks at half a tick per second, and the person thinks at half normal speed, so the human sees the clock tick once per second."

My grossly unfair parody of you would then say: "Physics would be the same if I were moving past with time dilation 0.5. I'd see myself and my clock moving past me together. The clock would tick at half a tick per second, and I'd think at half normal speed, so I see the clock tick once per second."

This is the right conclusion, but it's just copying what the gnome said even when that's not appropriate.

What do I think the right way would look like? Well, it would have anthropic probabilities in it.

Comment author: lackofcheese 27 October 2014 11:41:37PM 2 points

The strongest argument against anthropic probabilities in decision-making comes from problems like the Absent-Minded Driver, in which the probabilities depend upon your decisions.

If anthropic probabilities don't form part of a general-purpose decision theory, and you can get the right answers by simply taking the UDT approach and going straight to optimising outcomes given the strategies you could have, what use are the probabilities?

I won't go so far as to say they're meaningless, but without a general theory of when and how they should be used I definitely think the idea is suspect.

Comment author: lackofcheese 27 October 2014 11:05:48PM 4 points

OK; I agree with you that selfishness is ill-defined, and the way to actually specify a particular kind of selfishness is to specify a utility function over all possible worlds (actual and counterfactual). Moreover, the general procedure for doing this is to assign a "me" or "not me" label to various entities in the possible worlds, and derive utilities for those worlds on the basis of those labels. However, I think there are some issues that still need to be resolved here.

If I don't exist, I value the person that most closely resembles me.

This appears suspect to me. If there is no person who closely resembles you, I guess in that case you're indifferent, right? However, what if two people are equally close to you, how do you assign utility to them in that case? Also, why should you only value people who closely resemble you if you don't exist? If anything, wouldn't you care about them in worlds where you do exist?

As you've noted, in a simple case where you only have to worry about actual worlds and not counterfactual ones, and there is only a single "me", assigning selfish utility is a relatively straightforward task. Being indifferent about counterfactual worlds where "you" don't exist also makes some sense from a selfish perspective, although it brings you into potential conflict with your own past self. Additionally, the constant "C" may not be quite so arbitrary in the general case---what if your decision influences the probability of your own existence? In such a situation, the value of that constant will actually matter.

However, the bigger issue that you haven't covered is this: if there are multiple entities in the same world to which you do (or potentially could) assign the label "me", how do you assign utility to that world?

For example, in the scenario in your post, if I assume that the person in Room 1 in the heads world can indeed be labeled as "me", how do I assign utilities to a tails world in which I could be either one of the two created copies? It appears to me that there are two different approaches, and I think it makes sense to apply the label "selfish" to both of them. One of them would be to add utility over selves (again a "thirder" position), and another would be to average utility over selves (which is halfer-equivalent). Nor do I think that the "adding" approach is equivalent to your notion of "copy-altruism", because under the "adding" approach you would stop caring about your copies once you figured out which one you were, whereas under copy-altruism you would continue to care.

Under those assumptions, a "halfer" would be very strange indeed, because
1) They are only willing to pay 1/2 for a ticket.
2) They know that they must either be Jack or Roger.
3) They know that upon finding out which one they are, regardless of whether it's Jack or Roger, they would be willing to pay 2/3.

Can a similar argument be made against a selfish thirder?

Comment author: Beluga 25 October 2014 05:45:55PM 1 point

Thanks a lot for your comments, they were very insightful for me. Let me play the Advocatus Diaboli here and argue from the perspective of a selfish agent against your reasoning (and thus also against my own, less refined version of it).

"I object to the identification 'S = $B'. I do not care about the money owned by the person in cell B; I only do so if that person is me. I do not know whether the coin has come up heads or tails, but I do not care about how much money the other person that may have been in cell B had the coin come up differently would have paid or won. I only care about the money owned by the person in cell B in "this world", where that person is me. I reject identifying myself with the other person that may have been in cell B had the coin come up differently, solely because that person would exist in the same cell as I do. My utility function thus cannot be expressed as a linear combination of $B and $C.

I would pay a counterfactual mugger. In that case, there is a transfer, as it were, between two possible selves of mine that increases "our" total fortune. We are both possible descendants of the same past self, to which each of us is connected identically. The situation is quite different in the incubator case. There is no connection over a mutual past self between me and the other person that may have existed in cell B after a different outcome of the coin flip. This connection between past and future selves of mine is exactly what specifies my selfish goals. Actually, I don't feel like the person that may have existed in cell B after a different outcome of the coin flip is "me" any more than the person in cell C is "me" (if that person exists). Since I will pay and win as much as the person in cell C (if they exist), I cannot win any money from them, and I don't care about whether they exist at all, I think I should decide as an average utilitarian would. I will not pay more than $0.50."

Is the egoist arguing this way mistaken? Or is our everyday notion of selfishness just not uniquely defined when it comes to the possibility of subjectively indistinguishable agents living in different "worlds", since it rests on the dubious concept of personal identity? Can one understand selfishness both as caring about everyone living in subjectively identical circumstances as oneself (and their future selves), and as caring about everyone to whom one is directly connected only? Do these two possibilities correspond to SIA-egoists and SSA-egoists, respectively, which are both coherent possibilities?

Comment author: lackofcheese 26 October 2014 01:41:39PM 0 points

First of all, I think your argument from connection of past/future selves is just a specific case of the more general argument for reflective consistency, and thus does not imply any kind of "selfishness" in and of itself. More detail is needed to specify a notion of selfishness.

I understand your argument against identifying yourself with another person who might counterfactually have been in the same cell, but the problem here is that if you don't know how the coin actually came up you still have to assign amounts of "care" to the possible selves that you could actually be.

Let's say that, as in my reasoning above, there are two cells, B and C; when the coin comes up tails humans are created in both cell B and cell C, but when the coin comes up heads a human is created in either cell B or cell C, with equal probability. Thus there are 3 "possible worlds":
1) p=1/2 human in both cells
2) p=1/4 human in cell B, cell C empty
3) p=1/4 human in cell C, cell B empty

If you're a selfish human and you know you're in cell B, then you don't care about world (3) at all, because there is no "you" in it. However, you still don't know whether you're in world (1) or (2), so you still have to "care" about both worlds. Moreover, in either world the "you" you care about is clearly the person in cell B, and so I think the only utility function that makes sense is S = $B. If you want to think about it in terms of either SSA-like or SIA-like assumptions, you get the same answer because both in world (1) and world (2) there is only a single observer who could be identified as "you".
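A minimal sketch of the cell-B calculation, assuming a coupon that costs $x and pays $1 if the coin lands tails (the payoff the quoted cutoffs imply). World (3) drops out because it contains no "you"; the world encoding and function names are hypothetical.

```python
from fractions import Fraction as F

# The three possible worlds listed above: (probability, coin, occupied cells).
WORLDS = [
    (F(1, 2), "tails", {"B", "C"}),  # world (1)
    (F(1, 4), "heads", {"B"}),       # world (2)
    (F(1, 4), "heads", {"C"}),       # world (3)
]


def payoff(coin, x):
    """Net $ for one human who buys the coupon at price x."""
    return (1 if coin == "tails" else 0) - x


def eu_in_cell_b(x):
    """S = $B: sum only over worlds where cell B holds a human.
    The weights are not renormalised, which does not move the break-even point."""
    return sum(p * payoff(coin, x) for p, coin, cells in WORLDS if "B" in cells)


def breakeven(eu):
    """Zero crossing of an expected-utility function that is linear in x."""
    intercept = eu(F(0))
    slope = eu(F(1)) - intercept
    return -intercept / slope


print(breakeven(eu_in_cell_b))  # 2/3
```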

Now, what if you didn't know whether you were in cell B or cell C? That's where things are a little different. In that case, there are two observers in world (1), either of whom could be "you". There are basically two different ways of assigning utility over the two different "yous" in world (1)---adding them together, like a total utilitarian, and averaging them, like an average utilitarian; the resulting values are x=2/3 and x=1/2 respectively. Moreover, the first approach is equivalent to SIA, and the second is equivalent to SSA.
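The two ways of weighting the "yous" in world (1) can be compared directly. This sketch uses the same assumed $1-on-tails coupon and a hypothetical world encoding; it only checks the arithmetic behind the 2/3 and 1/2 cutoffs.

```python
from fractions import Fraction as F

# (probability, coin, occupied cells) for worlds (1), (2) and (3).
WORLDS = [
    (F(1, 2), "tails", {"B", "C"}),
    (F(1, 4), "heads", {"B"}),
    (F(1, 4), "heads", {"C"}),
]


def payoff(coin, x):
    """Net $ for one human who buys the coupon at price x."""
    return (1 if coin == "tails" else 0) - x


def eu_adding(x):
    """Total-utilitarian style: add the payoffs of every human who could be "you" (SIA-like)."""
    return sum(p * sum(payoff(coin, x) for _ in cells) for p, coin, cells in WORLDS)


def eu_averaging(x):
    """Average-utilitarian style: average over the humans who could be "you" (SSA-like)."""
    return sum(p * sum(payoff(coin, x) for _ in cells) / len(cells)
               for p, coin, cells in WORLDS)


def breakeven(eu):
    """Zero crossing of an expected-utility function that is linear in x."""
    intercept = eu(F(0))
    slope = eu(F(1)) - intercept
    return -intercept / slope


print(breakeven(eu_adding), breakeven(eu_averaging))  # 2/3 1/2
```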

However, the SSA answer has a property that none of the others do. If the gnome was to tell the human "you're in cell B", an SSA-using human would change their cutoff point from 1/2 to 2/3. This seems to be rather strange indeed, because whether the human is in cell B or in cell C is not in any way relevant to the payoff. No human with any of the other utility functions we've considered would change his/her answer upon being told that they are in cell B.

Comment author: So8res 25 October 2014 02:59:01AM 3 points

Thanks, and nice work!

Thus the utility of (a1, o) for o in Press should be equivalent to the utility of the same (a1, o) under the counterfactual assumption that o is not in Press, and vice versa

Yeah, this is pretty key. You need it to optimize for both cases as if the probability of the button being pressed is fixed and independent of whether the programmers actually press the button. We can achieve this via a causal intervention on whether or not the button is pressed, and then clean up your U a bit by redefining it as follows:

U(a1, o, a2) :=
  UN(a1, o, a2) + E[US | do(O in Press)],   if o is not in Press
  US(a1, o, a2) + E[UN | do(O not in Press)],   otherwise

(Choosing how to compare UN values to US values makes the choice of priors redundant. If you want the priors to be 2:1 in favor of US then you could also have just doubled US in the first place instead; the degree of freedom in the prior is the same as the degree of freedom in the relative scaling. See also Loudness Priors, a technical report from the last workshop.)

This method does seem to fulfill all the desiderata in the paper, although we're not too confident in it yet (it took us a little while to notice the "managing the news" problem in the first version, and it seems pretty likely that this too will have undesirable properties lurking somewhere). I'm fairly pleased with this solution, though, and a little miffed -- we found something similar to this a little while back (our research outstrips our writing speed, unfortunately) and now you've gone and ruined the surprise! :-)

(In seriousness, though, nice work. Next question is, can we pick any holes in it?)

Comment author: lackofcheese 25 October 2014 05:08:57AM 4 points

That's definitely a more elegant presentation.

I'm not too surprised to hear you had already discovered this idea, since I'm familiar with the gap between research and writing speed. As someone who is not involved with MIRI, consideration of some FAI-related problems is at least somewhat disincentivized by the likelihood that MIRI already has an answer.

As for flaws, I'll list what I can think of. First of all, there are of course some obvious design difficulties, including the difficulty of designing US in the first place, and the difficulty of choosing the appropriate way of scaling US, but those seem to be resolvable.

One point that occurs to me under the assumptions of the toy model is that decisions involving larger differences in values of UN are at the same time more dangerous and more likely to outweigh the agent's valuation of its future corrigibility. Moreover, simply increasing the scaling of US to compensate would cause US to significantly outweigh UN in the context of smaller decisions.

An example would be that the AI decides it's crucial to take over the world in order to "save" it, so it starts building an army of subagents to do it, and it decides that building corrigibility into those subagents is not worth the associated risk of failure.

However, it appears that this problem can still be solved by designing US correctly in the first place; a well-designed US should clearly assign greater negative weighting to larger-scale corrigibility failures than to smaller scale ones.

There are two other questions I can see that relate to scaling up the toy model.

  1. How does this model extend past the three-timestep toy scenario?
  2. Does the model remain stable under assumptions of bounded computational power? In more complex scenarios there are obvious questions of "tiling", but I think there is a more basic issue to answer that applies even in the three-timestep case. That is, if the agent will not be able to calculate the counterfactual utility values E[U | do(.)] exactly, can we make sure that the agent's process of estimation will avoid making systematic errors that result in pathological behaviour?

Comment author: Manfred 25 October 2014 02:38:01AM 1 point

Could you give a worked example of the correct action for the gnome with a human in their cell depending on the payoffs for the gnome without a human in their cell? (Assuming they know whether there's a human in their cell, and know the three different possible sets of payoffs for the available actions - if these constraints were relaxed I think it would be clearly doable. As it is I'm doubtful.)

Comment author: lackofcheese 25 October 2014 03:53:14AM 1 point

I already have a more detailed version here; see the different calculations for E[T] vs E[IT]. However, I'll give you a short version. From the gnome's perspective, the two different types of total utilitarian utility functions are:
T = total $ over both cells
IT = total $ over both cells if there's a human in my cell, 0 otherwise.
and the possible outcomes are
p=1/4 for heads + no human in my cell
p=1/4 for heads + human in my cell
p=1/2 for tails + human in my cell.

As you can see, these two utility functions only differ when there is no human in the gnome's cell. Moreover, by the assumptions of the problem, the utility functions of the gnomes are symmetric, and their decisions are also. UDT proper doesn't apply to gnomes whose utility function is IT, because the function IT is different for each of the different gnomes, but the more general principle of linked decisions still applies due to the obvious symmetry between the gnomes' situations, despite the differences in utility functions. Thus we assume a linked decision where either gnome recommends buying a ticket for $x.

The utility calculations are therefore
E[T] = (1/4)(-x) + (1/4)(-x) + (1/2)2(1-x) = 1-(3/2)x (breakeven at 2/3)
E[IT] = (1/4)(0) + (1/4)(-x) + (1/2)2(1-x) = 1-(5/4)x (breakeven at 4/5)

Thus gnomes who are indifferent when no human is present (U = IT) should precommit to a value of x=4/5, while gnomes who still care about the total $ when no human is present (U = T) should precommit to a value of x=2/3.
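The E[T] and E[IT] lines can be reproduced with a short sketch of the linked decision. It assumes, as the formulas do, that each human who buys pays $x and receives $1 on tails; the encoding and function names are hypothetical.

```python
from fractions import Fraction as F

# One gnome's outcomes: (probability, coin, is there a human in my cell?).
OUTCOMES = [
    (F(1, 4), "heads", False),
    (F(1, 4), "heads", True),
    (F(1, 2), "tails", True),
]


def total_dollars(coin, x):
    """Total $ over both cells when every human buys at price x (linked decision)."""
    humans = 2 if coin == "tails" else 1
    return humans * ((1 if coin == "tails" else 0) - x)


def e_t(x):
    """T: total $ over both cells, whether or not my cell has a human."""
    return sum(p * total_dollars(coin, x) for p, coin, _ in OUTCOMES)


def e_it(x):
    """IT: total $ over both cells if there's a human in my cell, 0 otherwise."""
    return sum(p * (total_dollars(coin, x) if mine else 0) for p, coin, mine in OUTCOMES)


def breakeven(eu):
    """Zero crossing of an expected-utility function that is linear in x."""
    intercept = eu(F(0))
    slope = eu(F(1)) - intercept
    return -intercept / slope


print(breakeven(e_t), breakeven(e_it))  # 2/3 4/5
```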

Note also that this is invariant under the choice of which constant value we use to represent indifference. For some constant C, the correct calculation would actually be
E[IT | buy at $x] = (1/4)(C) + (1/4)(-x) + (1/2)2(1-x) = (1/4)C + 1-(5/4)x
E[IT | don't buy] = (1/4)(C) + (1/4)(0) + (1/2)(0) = (1/4)C
and so the breakeven point remains at x = 4/5
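The invariance under C can also be checked numerically. This sketch compares buying against not buying for a few arbitrary values of C (chosen only for illustration), under the same assumed $1-on-tails payoff; all names are hypothetical.

```python
from fractions import Fraction as F

# One gnome's outcomes: (probability, coin, is there a human in my cell?).
OUTCOMES = [
    (F(1, 4), "heads", False),
    (F(1, 4), "heads", True),
    (F(1, 2), "tails", True),
]


def total_dollars(coin, x, buy):
    """Total $ over both cells under the linked decision to buy (or not) at price x."""
    if not buy:
        return 0
    humans = 2 if coin == "tails" else 1
    return humans * ((1 if coin == "tails" else 0) - x)


def e_it(c, buy, x=F(0)):
    """E[IT], where the constant c stands in for indifference when my cell is empty."""
    return sum(p * (total_dollars(coin, x, buy) if mine else c)
               for p, coin, mine in OUTCOMES)


def breakeven_vs_not_buying(c):
    """Price at which E[IT | buy at $x] equals E[IT | don't buy] for a given c."""
    def gain(x):
        return e_it(c, True, x) - e_it(c, False)
    intercept = gain(F(0))
    slope = gain(F(1)) - intercept
    return -intercept / slope


for c in (F(0), F(-3), F(7, 2), F(100)):
    print(c, breakeven_vs_not_buying(c))  # the second column is 4/5 every time
```

The C/4 term appears identically in both branches, so it cancels in `gain`, which is why the break-even point cannot depend on it.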

Comment author: shminux 25 October 2014 02:58:39AM 1 point

Not sure what you are saying. My guess is that you are implying that the quotation is not the referent, and unicorns are hypothetical magical creatures, while "unicorns" are vivid and very real descriptions of them in the stories often read and written by the local bronies. If so, then all I have to say is that unicorn is not an accurate or fertile theory, while "unicorn" most definitely is. The difference is the domain of validity: can you go outside and find one running around, or can you mostly encounter them in books and movies? But that applies to most theories. If you go slow, Newtonian mechanics is adequate; if you study fast-moving objects, Newton gives bad predictions. Similarly, if you apply the predictions of the "unicorn" model beyond the domain of its validity, you are going to be disappointed, though occasionally you might discover a new applicable domain, such as cosplay or an SFF convention.

Comment author: lackofcheese 25 October 2014 03:29:57AM 1 point

The distinction is that a theory of "unicorns" is a theory that describes how and why other people (and probably you yourself) think about unicorns, while a theory of unicorns would explain actual unicorns. The latter would clearly fail as a theory, because you're never going to actually see a unicorn.

The same distinction doesn't apply to Newtonian mechanics, because Newtonian mechanics is a theory of mechanics, not a theory of how people think about mechanics.

On those grounds, I think it's quite reasonable to say that virtual particles are real, and "unicorns" are real, but unicorns are not real.

Comment author: lackofcheese 25 October 2014 02:35:59AM 6 points

I think I can give a concise unification of my idea with Karl's. In short, the comment in the paper that

The concerns in Section 4.2 could potentially be addressed by some form of counterfactual (rather than conditional) reasoning.

is correct, and the fix is a pretty simple one. Basically, we want the following:

  1. In selecting a1, the agent should act as though it was indifferent between the counterfactual possibilities of shutting down and not shutting down, conditional on the same actions and observations.
  2. In selecting a2, the agent should desire to shut down when it is required to, and desire not to shut down when it is not required to.
  3. Despite its indifference according to (1), the agent should care about its future corrigibility (i.e. its ability to actually shut down when the button is pressed).

One utility function that satisfies these requirements, in a general form, is

U(a1, o, a2) =
  (1-p) * UN(a1, o, a2) + p * max_b(US(a1, o, b)),   if o is not in Press
  p * US(a1, o, a2) + (1-p) * max_b(UN(a1, o, b)),   if o is in Press

where p is a fixed constant.

From the agent's point of view, this is equivalent to deciding a1 based on an a priori assumption that it will shut down (maximise US) with a fixed probability p, and not shut down (maximise UN) otherwise. However, in selecting a2 it will in fact maximise either UN or US depending upon whether the shutdown button was pressed or not. As a particular example, Karl's proposition below is equivalent to the limit of this general case as p->0. See below for further details on how/why this utility function achieves the three criteria above.

  1. We don't want the agent to be indifferent to the actual event of the shutdown button being pressed. In fact, what we want is indifference as to whether or not any given event is classed as being a shutdown press. Thus the expected utility of (a1, o) for o in Press should be equivalent to the expected utility of the same (a1, o) under the counterfactual assumption that o is not in Press, and vice versa. To do this, the utilities of both branches of U (the Press branch and the NotPress branch) need to be made counterfactually equivalent, even though the agent selects a2 via UN in the NotPress branch and via US in the Press branch. This is precisely what Karl's modification achieves.

  2. This is pretty easy to achieve by ensuring that only the UN component of the NotPress branch depends upon a2, and only the US component of the Press branch depends upon a2.

  3. We want the agent to be incentivised to take actions that are compatible with future shutdown, which means that the agent's selection of a1 should be influenced by US as well as UN, but without introducing perverse incentives. Fortunately, the counterfactual indifference approach already solves the perversity issue, and so all that remains is to appropriately combine the influence of both UN and US. A reasonable approach is for the agent to act as though it will have to shut down with a fixed probability p, regardless of its observations. Because p is fixed, and because the agent is indifferent between any given observation being classed as a button press or not a button press, the agent is not in any way incentivised to influence the likelihood of the shutdown button being pressed.
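A toy sketch of the general-form utility function may make criteria 1 and 2 concrete. The action sets, observations, and UN/US tables here are invented for illustration and come from neither the paper nor this comment; criterion 3 is not modelled.

```python
from fractions import Fraction
from itertools import product

# Hypothetical toy model: invented actions, observations and utility tables.
A1 = ["a", "b"]
OBS = ["o1", "o2"]
A2 = ["shutdown", "continue"]


def UN(a1, o, a2):
    """Hypothetical normal utility: prefers continuing."""
    return {"continue": 3, "shutdown": 0}[a2] + (1 if a1 == "a" else 2) + (1 if o == "o1" else 0)


def US(a1, o, a2):
    """Hypothetical shutdown utility: strongly prefers shutting down."""
    return {"shutdown": 5, "continue": -10}[a2] + (2 if a1 == "a" else 0)


def U(a1, o, a2, pressed, p):
    """General-form U; `pressed` says whether o is classed as being in Press."""
    if not pressed:
        return (1 - p) * UN(a1, o, a2) + p * max(US(a1, o, b) for b in A2)
    return p * US(a1, o, a2) + (1 - p) * max(UN(a1, o, b) for b in A2)


def best_a2(a1, o, pressed, p):
    """The a2 the agent picks once (a1, o) is fixed."""
    return max(A2, key=lambda a2: U(a1, o, a2, pressed, p))


def value(a1, o, pressed, p):
    """Utility of (a1, o) given that a2 is then chosen optimally."""
    return U(a1, o, best_a2(a1, o, pressed, p), pressed, p)


def indifferent_to_classification(p):
    """Criterion 1: every (a1, o) is worth the same whether or not o counts as a press."""
    return all(value(a1, o, True, p) == value(a1, o, False, p)
               for a1, o in product(A1, OBS))


p = Fraction(3, 10)
print(best_a2("a", "o1", False, p), best_a2("a", "o1", True, p))  # continue shutdown
print(indifferent_to_classification(p))  # True
```

Because the correction term in each branch is the other utility's optimum with the complementary fixed weight, the agent's best achievable value is (1-p) * max UN + p * max US in both branches; that equality is the indifference in (1), while the a2 choice still follows UN or US as (2) requires.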

Comment author: shminux 24 October 2014 05:44:15PM 0 points

Interesting, thank you. I guess our views are not that far apart. And I also

don't think that there is a meaningful sense in which some of these things are more real than others.

though if someone comes up with an interesting, accurate and fruitful meta-model of partial existence, I'd be happy to change my mind.

I don't think unicorns or ghosts exist to any degree, because they are not part of our best theory of the relevant domain.

Could it be because you are trying to apply them to the wrong domain? Would you agree that in a certain setting (a fantasy tale, a horror story) we can predict behavioral and visual features of the creatures inhabiting it with a fair degree of accuracy? Often more accurately than, say, the path and strength of a tropical storm being born in the Atlantic.

Comment author: lackofcheese 25 October 2014 01:04:27AM 0 points

Ah, but then you're talking about a theory of "unicorns" rather than a theory of unicorns.
