I don't think that's entirely correct; SSA, for example, is a halfer position and it does exclude worlds where you don't exist, as do many other anthropic approaches.
Personally I'm generally skeptical of averaging over agents in any utility function.
You definitely don't have a 50% chance of dying in the sense of "experiencing dying". In the sense of "ceasing to exist" I guess you could argue for it, but I think that it's much more reasonable to say that both past selves continue to exist as a single future self.
Regardless, this stuff may be confusing, but it's entirely conceivable that with the correct theory of personal identity we would have a single correct answer to each of these questions.
OK, the "you cause 1/10 of the policy to happen" argument is intuitively reasonable, but under that kind of argument divided responsibility has nothing to do with how many agents are subjectively indistinguishable; instead it depends on which agents actually participate in the linked decision.
On those grounds, "divided responsibility" would give the right answer in Psy-Kosh's non-anthropic problem. However, this also means your argument that SIA+divided = SSA+total clearly fails, because of the example I just gave before, and beca...
As I mentioned earlier, it's not an argument against halfers in general; it's against halfers with a specific kind of utility function, which sounds like this: "In any possible world I value only my own current and future subjective happiness, averaged over all of the subjectively indistinguishable people who could equally be "me" right now."
In the above scenario, there is a 1/2 chance that both Jack and Roger will be created, a 1/4 chance of only Jack, and a 1/4 chance of only Roger.
Before finding out who you are, averaging would lead ...
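One way to make the setup concrete is to enumerate the three worlds and condition on learning your identity. This is just a sketch of the Bayesian bookkeeping; the uniform within-world assignment is exactly the "averaging" assumption at issue:

```python
# Worlds: prior probability and the people created in each.
worlds = {
    "both":       (0.50, ["Jack", "Roger"]),
    "jack_only":  (0.25, ["Jack"]),
    "roger_only": (0.25, ["Roger"]),
}

def posterior(me):
    """P(world | I am `me`), assuming I am equally likely to be any
    created person within a world (the averaging assumption)."""
    joint = {w: p * (people.count(me) / len(people))
             for w, (p, people) in worlds.items()}
    total = sum(joint.values())
    return {w: j / total for w, j in joint.items()}

post = posterior("Jack")
# Learning "I am Jack" rules out roger_only and leaves the two
# remaining worlds equally likely.
```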
Linked decisions are also what make the halfer paradox go away.
I don't think linked decisions make the halfer paradox I brought up go away. Any counterintuitive decisions you make under UDT are simply ones that lead to a gain in counterfactual possible worlds at the cost of a loss in actual possible worlds. However, in the instance above you're losing both in the real scenario in which you're Jack, and in the counterfactual one in which you turned out to be Roger.
Granted, the "halfer" paradox I raised is an argument against having...
But SIA also has some issues with the order of information, though it's connected with decisions.
Can you illustrate how the order of information matters there? As far as I can tell it doesn't, and hence it's just an issue with failing to consider counterfactual utility, which SIA ignores by default. It's definitely a relevant criticism of using anthropic probabilities in your decisions, because failing to consider counterfactual utility results in dynamic inconsistency, but I don't think it's as strong as the associated criticism of SSA.
...Anyway, if your ref
That's not true. The SSA agents are only told about the conditions of the experiment after they're created and have already opened their eyes.
Consequently, isn't it equally valid for me to begin the SSA probability calculation with those two agents already excluded from my reference class?
Doesn't this mean that SSA probabilities are not uniquely defined given the same information, because they depend upon the order in which that information is incorporated?
I think that argument is highly suspect, primarily because I see no reason why a notion of "responsibility" should have any bearing on your decision theory. Decision theory is about achieving your goals, not avoiding blame for failing.
However, even if we assume that we do include some notion of responsibility, I think that your argument is still incorrect. Consider this version of the incubator Sleeping Beauty problem, where two coins are flipped.
HH => Sleeping Beauties created in Room 1, 2, and 3
HT => Sleeping Beauty created in Room 1
TH => ...
There's no "should" - this is a value set.
The "should" comes in giving an argument for why a human rather than just a hypothetically constructed agent might actually reason in that way. The "closest continuer" approach makes at least some intuitive sense, though, so I guess that's a fair justification.
The halfer is only being strange because they seem to be using naive CDT. You could construct a similar paradox for a thirder if you assume the ticket pays out only for the other copy, not themselves.
I think there's more t...
On 1), I agree that "pre-chewing" anthropic utility functions appears to be something of a hack. My current intuition in that regard is to reject the notion of anthropic utility (although not anthropic probability), but a solid formulation of anthropics could easily convince me otherwise.
On 2), if it's within the zone of validity then I guess that's sufficient to call something "a correct way" of solving the problem, but if there is an equally simple or simpler approach that has a strictly broader domain of validity I don't think you can be justified in calling it "the right way".
That's a reasonable point, although I still have two major criticisms of it.
The strongest argument against anthropic probabilities in decision-making comes from problems like the Absent-Minded Driver, in which the probabilities depend upon your decisions.
If anthropic probabilities don't form part of a general-purpose decision theory, and you can get the right answers by simply taking the UDT approach and going straight to optimising outcomes given the strategies you could have, what use are the probabilities?
I won't go so far as to say they're meaningless, but without a general theory of when and how they should be used I definitely think the idea is suspect.
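To make that dependence concrete, here is a sketch using the standard Piccione–Rubinstein payoffs for the Absent-Minded Driver (exit at the first intersection: 0, exit at the second: 4, continue past both: 1), where the driver continues with probability p. The anthropic probability of currently being at the first intersection works out to 1/(1+p), so it is itself a function of the decision p:

```python
def planning_value(p):
    """Ex-ante expected payoff of the policy 'continue with probability p'
    (payoffs: first exit 0, second exit 4, continue past both 1)."""
    return 4 * p * (1 - p) + 1 * p * p

def p_first_intersection(p):
    """Anthropic probability of being at the first intersection, given the
    intersections are indistinguishable: relative occupancies 1 and p."""
    return 1 / (1 + p)

# Grid-search the planning optimum: best_p is about 2/3, with expected
# payoff about 4/3, and the probability 1/(1+p) shifts as p does.
best_p = max((i / 1000 for i in range(1001)), key=planning_value)
```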
OK; I agree with you that selfishness is ill-defined, and the way to actually specify a particular kind of selfishness is to specify a utility function over all possible worlds (actual and counterfactual). Moreover, the general procedure for doing this is to assign "me" or "not me" labels to various entities in the possible worlds, and derive utilities for those worlds on the basis of those labels. However, I think there are some issues that still need to be resolved here.
...If I don't exist, I value the person that most closely resembles
First of all, I think your argument from connection of past/future selves is just a specific case of the more general argument for reflective consistency, and thus does not imply any kind of "selfishness" in and of itself. More detail is needed to specify a notion of selfishness.
I understand your argument against identifying yourself with another person who might counterfactually have been in the same cell, but the problem here is that if you don't know how the coin actually came up you still have to assign amounts of "care" to the poss...
That's definitely a more elegant presentation.
I'm not too surprised to hear you had already discovered this idea, since I'm familiar with the gap between research and writing speed. As someone who is not involved with MIRI, consideration of some FAI-related problems is at least somewhat disincentivized by the likelihood that MIRI already has an answer.
As for flaws, I'll list what I can think of. First of all, there are of course some obvious design difficulties, including the difficulty of designing US in the first place, and the difficulty of choosing th...
I already have a more detailed version here; see the different calculations for E[T] vs E[IT]. However, I'll give you a short version. From the gnome's perspective, the two different types of total utilitarian utility functions are:
T = total $ over both cells
IT = total $ over both cells if there's a human in my cell, 0 otherwise.
and the possible outcomes are
p=1/4 for heads + no human in my cell
p=1/4 for heads + human in my cell
p=1/2 for tails + human in my cell.
As you can see, these two utility functions only differ when there is no human in the gnome's ...
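As a sketch of that calculation, assuming the usual version of the setup in which each existing human is offered a ticket at price x that pays $1 on tails (those payoff details are assumptions here, not part of the comment above):

```python
# Outcomes: (probability, coin, human_in_my_cell), as listed above.
outcomes = [(0.25, "heads", False),
            (0.25, "heads", True),
            (0.50, "tails", True)]

def winnings(coin, x):
    """Total $ over both cells: heads creates one human, tails two;
    each human pays x for a ticket that pays $1 on tails."""
    n_humans = 1 if coin == "heads" else 2
    payout = 1 if coin == "tails" else 0
    return n_humans * (payout - x)

def E_T(x):   # expected total $ over both cells
    return sum(p * winnings(coin, x) for p, coin, _ in outcomes)

def E_IT(x):  # same, but counted as 0 when no human is in my cell
    return sum(p * winnings(coin, x) for p, coin, mine in outcomes if mine)

# E_T(x) = 1 - 1.5x, breakeven at x = 2/3;
# E_IT(x) = 1 - 1.25x, breakeven at x = 4/5.
```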
The distinction is that a theory of "unicorns" is a theory that describes how and why other people (and probably you yourself) think about unicorns, while a theory of unicorns would explain actual unicorns. The latter would clearly fail as a theory, because you're never going to actually see a unicorn.
The same distinction doesn't apply to Newtonian mechanics, because Newtonian mechanics is a theory of mechanics, not a theory of how people think about mechanics.
On those grounds, I think it's quite reasonable to say that virtual particles are real, and "unicorns" are real, but unicorns are not real.
I think I can give a concise unification of my idea with Karl's. In short, the comment in the paper that
The concerns in Section 4.2 could potentially be addressed by some form of counterfactual (rather than conditional) reasoning.
is correct, and the fix is a pretty simple one. Basically, we want the following:
Ah, but then you're talking about a theory of "unicorns" rather than a theory of unicorns.
I think there are some rather significant assumptions underlying the idea that they are "non-relevant". At the very least, if the agents were distinguishable, I think you should indeed be willing to pay to make n higher. On the other hand, if they're indistinguishable then it's a more difficult question, but the anthropic averaging I suggested in my previous comments leads to absurd results.
What's your proposal here?