I don't think that's entirely correct; SSA, for example, is a halfer position and it does exclude worlds where you don't exist, as do many other anthropic approaches.
Personally I'm generally skeptical of averaging over agents in any utility function.
You definitely don't have a 50% chance of dying in the sense of "experiencing dying". In the sense of "ceasing to exist" I guess you could argue for it, but I think that it's much more reasonable to say that both past selves continue to exist as a single future self.
Regardless, this stuff may be confusing, but it's entirely conceivable that with the correct theory of personal identity we would have a single correct answer to each of these questions.
OK, the "you cause 1/10 of the policy to happen" argument is intuitively reasonable, but under that kind of argument divided responsibility has nothing to do with how many agents are subjectively indistinguishable and instead has to do with the agents who actually participate in the linked decision.
On those grounds, "divided responsibility" would give the right answer in Psy-Kosh's non-anthropic problem. However, this also means your argument that SIA+divided = SSA+total clearly fails, because of the example I just gave before, and beca...
As I mentioned earlier, it's not an argument against halfers in general; it's against halfers with a specific kind of utility function, which sounds like this: "In any possible world I value only my own current and future subjective happiness, averaged over all of the subjectively indistinguishable people who could equally be "me" right now."
In the above scenario, there is a 1/2 chance that both Jack and Roger will be created, a 1/4 chance of only Jack, and a 1/4 chance of only Roger.
Before finding out who you are, averaging would lead ...
Linked decisions are also what make the halfer paradox go away.
I don't think linked decisions make the halfer paradox I brought up go away. Any counterintuitive decisions you make under UDT are simply ones that lead to you making a gain in counterfactual possible worlds at the cost of a loss in actual possible worlds. However, in the instance above you're losing both in the real scenario in which you're Jack, and in the counterfactual one in which you turned out to be Roger.
Granted, the "halfer" paradox I raised is an argument against having...
But SIA also has some issues with order of information, though it's connected with decisions
Can you illustrate how the order of information matters there? As far as I can tell it doesn't, and hence it's just an issue with failing to consider counterfactual utility, which SIA ignores by default. It's definitely a relevant criticism of using anthropic probabilities in your decisions, because failing to consider counterfactual utility results in dynamic inconsistency, but I don't think it's as strong as the associated criticism of SSA.
...Anyway, if your ref
That's not true. The SSA agents are only told about the conditions of the experiment after they're created and have already opened their eyes.
Consequently, isn't it equally valid for me to begin the SSA probability calculation with those two agents already excluded from my reference class?
Doesn't this mean that SSA probabilities are not uniquely defined given the same information, because they depend upon the order in which that information is incorporated?
I think that argument is highly suspect, primarily because I see no reason why a notion of "responsibility" should have any bearing on your decision theory. Decision theory is about achieving your goals, not avoiding blame for failing.
However, even if we assume that we do include some notion of responsibility, I think that your argument is still incorrect. Consider this version of the incubator Sleeping Beauty problem, where two coins are flipped.
HH => Sleeping Beauties created in Room 1, 2, and 3
HT => Sleeping Beauty created in Room 1
TH =&...
There's no "should" - this is a value set.
The "should" comes in giving an argument for why a human rather than just a hypothetically constructed agent might actually reason in that way. The "closest continuer" approach makes at least some intuitive sense, though, so I guess that's a fair justification.
The halfer is only being strange because they seem to be using naive CDT. You could construct a similar paradox for a thirder if you assume the ticket pays out only for the other copy, not themselves.
I think there's more t...
On 1), I agree that "pre-chewing" anthropic utility functions appears to be something of a hack. My current intuition in that regard is to reject the notion of anthropic utility (although not anthropic probability), but a solid formulation of anthropics could easily convince me otherwise.
On 2), if it's within the zone of validity then I guess that's sufficient to call something "a correct way" of solving the problem, but if there is an equally simple or simpler approach with a strictly broader domain of validity, I don't think you can be justified in calling it "the right way".
That's a reasonable point, although I still have two major criticisms of it.
The strongest argument against anthropic probabilities in decision-making comes from problems like the Absent-Minded Driver, in which the probabilities depend upon your decisions.
If anthropic probabilities don't form part of a general-purpose decision theory, and you can get the right answers by simply taking the UDT approach and going straight to optimising outcomes given the strategies you could have, what use are the probabilities?
I won't go so far as to say they're meaningless, but without a general theory of when and how they should be used I definitely think the idea is suspect.
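To make the Absent-Minded Driver point concrete, here's a minimal sketch in Python using the standard Piccione-Rubinstein payoffs (exit at the first intersection = 0, exit at the second = 4, continue past both = 1). The UDT-style move is to optimise the continuation probability p directly over the whole policy; the usual self-locating probability of being at the first intersection, 1/(1+p), then depends on the very policy being evaluated, which is exactly the dependence mentioned above.

```python
# Sketch of the standard Absent-Minded Driver payoffs: exit at the first
# intersection = 0, exit at the second = 4, continue past both = 1.
# We optimise the continuation probability p directly, UDT-style.
import numpy as np

p = np.linspace(0.0, 1.0, 10001)
expected = 4 * p * (1 - p) + 1 * p**2          # the 0*(1-p) term vanishes
best = p[np.argmax(expected)]
print(best, expected.max())                    # ~0.667 and ~1.333

# The anthropic probability of being at the first intersection is usually
# taken to be 1/(1+p), so it depends on the policy p being chosen.
```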
OK; I agree with you that selfishness is ill-defined, and the way to actually specify a particular kind of selfishness is to specify a utility function over all possible worlds (actual and counterfactual). Moreover, the general procedure for doing this is to assign "me" or "not me" label to various entities in the possible worlds, and derive utilities for those worlds on the basis of those labels. However, I think there are some issues that still need to be resolved here.
...If I don't exist, I value the person that most closely resembles
First of all, I think your argument from connection of past/future selves is just a specific case of the more general argument for reflective consistency, and thus does not imply any kind of "selfishness" in and of itself. More detail is needed to specify a notion of selfishness.
I understand your argument against identifying yourself with another person who might counterfactually have been in the same cell, but the problem here is that if you don't know how the coin actually came up you still have to assign amounts of "care" to the poss...
That's definitely a more elegant presentation.
I'm not too surprised to hear you had already discovered this idea, since I'm familiar with the gap between research and writing speed. As someone who is not involved with MIRI, consideration of some FAI-related problems is at least somewhat disincentivized by the likelihood that MIRI already has an answer.
As for flaws, I'll list what I can think of. First of all, there are of course some obvious design difficulties, including the difficulty of designing US in the first place, and the difficulty of choosing th...
I already have a more detailed version here; see the different calculations for E[T] vs E[IT]. However, I'll give you a short version. From the gnome's perspective, the two different types of total utilitarian utility functions are:
T = total $ over both cells
IT = total $ over both cells if there's a human in my cell, 0 otherwise.
and the possible outcomes are
p=1/4 for heads + no human in my cell
p=1/4 for heads + human in my cell
p=1/2 for tails + human in my cell.
As you can see, these two utility functions only differ when there is no human in the gnome's ...
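To make that difference explicit, here's a minimal sketch under the assumptions used in this thread: each created human is offered a ticket costing $x that pays $1 if the coin landed tails, and the two gnomes' advice is linked (both give the same advice). The only branch where T and IT come apart is heads plus no human in my cell, where the other cell's human still follows the linked advice under T but contributes nothing under IT.

```python
# Expected utilities for a gnome advising "buy at price x", assuming a
# ticket that pays $1 on tails and linked advice across both gnomes.

def expected_T(x):
    # heads, no human in my cell: the other cell's human still buys (-x)
    # heads, human in my cell: my human buys (-x)
    # tails, humans in both cells: both buy, each nets (1 - x)
    return 0.25 * (-x) + 0.25 * (-x) + 0.5 * 2 * (1 - x)

def expected_IT(x):
    # same as above, except the heads/no-human branch counts for zero
    return 0.25 * 0 + 0.25 * (-x) + 0.5 * 2 * (1 - x)

# E[T]  = 1 - 1.5x  -> breakeven at x = 2/3
# E[IT] = 1 - 1.25x -> breakeven at x = 4/5
```

Those two breakevens, 2/3 and 4/5, are the figures that come up again further down the thread.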
The distinction is that a theory of "unicorns" is a theory that describes how and why other people (and probably you yourself) think about unicorns, while a theory of unicorns would explain actual unicorns. The latter would clearly fail as a theory, because you're never going to actually see a unicorn.
The same distinction doesn't apply to Newtonian mechanics, because Newtonian mechanics is a theory of mechanics, not a theory of how people think about mechanics.
On those grounds, I think it's quite reasonable to say that virtual particles are real, and "unicorns" are real, but unicorns are not real.
I think I can give a concise unification of my idea with Karl's. In short, the comment in the paper that
The concerns in Section 4.2 could potentially be addressed by some form of counterfactual (rather than conditional) reasoning.
is correct, and the fix is a pretty simple one. Basically, we want the following:
Ah, but then you're talking about a theory of "unicorns" rather than a theory of unicorns.
The deeper point is important, and I think you're mistaken about the necessary and sufficient conditions for an isomorphism here.
If a human appears in a gnome's cell, then that excludes the counterfactual world in which the human did not appear in the gnome's cell. However, on UDT, the gnome's decision does depend on the payoffs in that counterfactual world.
Thus, for the isomorphism argument to hold, the preferences of the human and gnome must align over counterfactual worlds as well as factual ones. It is not sufficient to have the same probabilities for ...
I think this means "indifference" isn't really the right term any more, because the agent is not actually indifferent between the two sets of observations, and doesn't really need to be.
So, how about
U(a1, o, a2) =
UN(a1, o, a2) + max_b(US(a1, o, b)), if o is not in Press
US(a1, o, a2) + max_b(UN(a1, o, b)), if o is in Press
or, in your notation, U(a1, o, a2) = g(a1, o) + UN(a1, o, a2) if o is not in Press, or US(a1, o, a2) + f(a1, o) if o is in Press.
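As a minimal sketch of how that definition reads in code (UN, US, Press and the action set are whatever the setup supplies; the names here are just stand-ins):

```python
def combined_utility(a1, o, a2, UN, US, press, actions):
    """U(a1, o, a2): judge a2 by UN outside Press and by US inside Press,
    adding only the best achievable value of the other utility function,
    so that the choice of a2 never trades one off against the other."""
    if o in press:
        return US(a1, o, a2) + max(UN(a1, o, b) for b in actions)
    return UN(a1, o, a2) + max(US(a1, o, b) for b in actions)
```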
OK, you're right on that point; I misunderstood the "managing the news" problem because I hadn't quite realised that it was about shifting observations between the Press/NotPress sets. As you've said, the only resolution is to select a1 based on
E[max_b(UN(a1, O, b)) | O; a1]
and not
E[max_b(UN(a1, O, b)) | O not in Press; a1]
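A toy illustration of the difference, with numbers invented purely for this example: suppose a1 = "honest" leaves both possible observations outside Press, a1 = "manage" arranges for the bad observation to press the button, and the best follow-up value max_b(UN) is 10 after good news and 0 after bad news, each equally likely under either a1.

```python
P = {"good": 0.5, "bad": 0.5}                  # distribution of O (toy numbers)
best_UN = {"good": 10, "bad": 0}               # max_b UN(a1, o, b), same for both a1
press = {"honest": set(), "manage": {"bad"}}   # which observations end up in Press

def conditional_score(a1):
    """E[max_b UN | O not in Press; a1] -- the problematic criterion."""
    kept = {o: p for o, p in P.items() if o not in press[a1]}
    total = sum(kept.values())
    return sum(p * best_UN[o] for o, p in kept.items()) / total

def unconditional_score(a1):
    """E[max_b UN | O; a1] -- expectation over all observations."""
    return sum(p * best_UN[o] for o, p in P.items())

# conditional_score("manage") = 10 > conditional_score("honest") = 5, so the
# conditional form rewards hiding bad news behind the button; the
# unconditional form scores both actions at 5.
```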
OK, following on from Karl's comment below, which seems to resolve Section 4.2, I think it shouldn't be that difficult to resolve some or all of the Section 4.1 concerns as well. In order to satisfy Desideratum 4, we need the agent to be incentivised to build in safeguards to ensure that its possible future actions will be rated well according to US in cases where the shutdown button is pressed. The most obvious way to do this is to penalise situations where there is no good shutdown action according to US and reward situations where there is such an actio...
I guess your comment means that you must have blinked an eye, so your comment can't be completely true. That said, as discussions of pre-emptively submissive gnomes go, I would generally expect the amount of eye-blinking on LW to be well below average ^_~
OK, time for further detail on the problem with pre-emptively submissive gnomes. Let's focus on the case of total utilitarianism, and begin by looking at the decision in unlinked form, i.e. we assume that the gnome's advice affects only one human if there is one in the room, and zero humans otherwise. Conditional on there being a human in cell B, the expected utility of the human in cell B buying a ticket for $x is, indeed, (1/3)(-x) + (2/3)(1-x) = 2/3 - x, so the breakeven is obviously at x = 2/3. However, if we also assume that the gnome in the other cel...
Yep, I think that's a good summary. UDT-like reasoning depends on the utility values of counterfactual worlds, not just real ones.
I don't think that works, because 1) isn't actually satisfied. The selfish human in cell B is indifferent over worlds where that same human doesn't exist, but the gnome is not indifferent.
Consequently, I think that as one of the humans in your "closest human" case you shouldn't follow the gnome's advice, because the gnome's recommendation is being influenced by a priori possible worlds that you don't care about at all. This is the same reason a human with utility function T shouldn't follow the gnome recommendation of 4/5 from a gnome with utili...
Having established the nature of the different utility functions, it's pretty simple to show how the gnomes relate to these. The first key point to make, though, is that there are actually two distinct types of submissive gnomes and it's important not to confuse the two. This is part of the reason for the confusion over Beluga's post.
Submissive gnome: I adopt the utility function of any human in my cell, but am completely indifferent otherwise.
Pre-emptively submissive gnome: I adopt the utility function of any human in my cell; if there is no human in my c...
I think I can resolve the confusion here, but as a quick summary, I'm quite sure Beluga's argument holds up. The first step is to give a clear statement of what the difference is between the indexical and non-indexical versions of the utility functions. This is important because the UDT approach translates to "What is the optimal setting for decision variable X, in order to maximise the expected utility over all a priori possible worlds that are influenced by decision variable X?" On the basis of UDT or UDT-like principles such as an assumption o...
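For concreteness, a minimal sketch of that UDT template (the settings, worlds, prior and utility function are placeholders to be filled in for whichever problem is at hand):

```python
def udt_choice(settings, worlds, prior, utility):
    """Pick the setting of decision variable X that maximises a priori
    expected utility over all possible worlds influenced by X.
    prior: dict world -> a priori probability (no anthropic update applied)
    utility: function (setting, world) -> utility of that world under X=setting
    """
    return max(settings,
               key=lambda x: sum(prior[w] * utility(x, w) for w in worlds))
```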
There's some confusion here that needs to be resolved, and you've correctly pinpointed that the issue is with the indexical versions of the utility functions, or, equivalently, the gnomes who don't see a human at all.
I think I have a comprehensive answer to these issues, so I'm going to type it up now.
A good point. By abuse I wouldn't necessarily mean anything blatant though, just that selfish people are happy to receive resources from selfless people.
Sure, and there isn't really anything wrong with that as long as the person receiving the resources really needs them.
Valuing people equally by default when their instrumental value isn't considered. I hope I didn't misunderstand you. That's about as extreme as it gets, but I suppose you could get even more extreme by valuing other people more highly than yourself.
The term "altruism" is often ...
That's one way to put it, yes.
One can reasonably argue the other way too. New children are easier to make than new adults.
True. However, regardless of the relative value of children and adults, it is clear that one ought to devote significantly more time and effort to children than to adults, because they are incapable of supporting themselves and are necessarily in need of help from the rest of society.
Since she has finite resources, is there a practical difference?
Earlier I specifically drew a distinction between devoting time and effort and valuation; you don't have to value ...
If you have the values already and you don't have any reason to believe the values themselves could be problematic, does it matter how you got them?
It may be that an altruistic high in the past has led you to value altruism in the present, but what matters in the present is whether you value the altruism itself over and above the high.
Accounting for possible failure modes and the potential effects of those failure modes is a crucial part of any correctly done "morality math".
Granted, people can't really be relied upon to actually do it right, and it may not be a good idea to "shut up and multiply" if you can expect to get it wrong... but then failing to shut up and multiply can also have significant consequences. The worst thing you can do with morality math is to only use it when it seems convenient to you, and ignore it otherwise.
However, none of this talk of failu...
Probably not just any random person, because one can reasonably argue that children should be valued more highly than adults.
However, I do think that the mother should hold other people's children as being of equal value to her own. That doesn't mean valuing her own children less, it means valuing everyone else's more.
Sure, it's not very realistic to expect this of people, but that doesn't mean they shouldn't try.
So, either there is such a thing as the "objective" value and hence, implicitly, you should seek to approach that value, or there is not.
I don't see any reason to believe in an objective worth of this kind, but I don't really think it matters that much. If there is no single underlying value, then the act of assigning your own personal values to people is still the same thing as "passing judgement on the worth of humans", because it's the only thing those words could refer to; you can't avoid the issue simply by calling it a subjective ...
My actions alone don't necessarily imply a valuation, or at least not one that makes any sense.
There are a few different levels at which one can talk about what it means to value something, and revealed preference is not the only one that makes sense.
I'm not entirely sure what a "personal perception of the value of a human being" is, as distinct from the value or worth of a human being. Surely the latter is what the former is about?
Granted, I guess you could simply be talking about their instrumental value to yourself (e.g. "they make me happy"), but I don't think that's really the main thrust of what "caring" is.
I can (and do) believe that consciousness and subjective experience are things that exist, and are things that are important, without believing that they are in some kind of separate metaphysical category.
There is no need for morality to be grounded in emotional effects alone. After all, there is also a part of you that thinks that there is, or might be, something "horrible" about this, and that part also has input into your decision-making process.
Similarly, I'd be wary of your point about utility maximisation. You're not really a simple utility-maximising agent, so it's not like there's any simple concept that corresponds to "your utility". Also, the concept of maximising "utility generally" doesn't really make sense; there i...
It's a rather small sample size, isn't it? I don't think you can draw much of a conclusion from it.
The game AIs for popular strategy games are often bad because the developers don't actually have the time and resources to make a really good one, and it's not a high priority anyway - most people playing games like Civilization want an AI that they'll have fun defeating, not an AI that actually plays optimally.
I think you're mostly correct on this. Sometimes difficult opponents are needed, but for almost all games that can be trivially achieved by making the AI cheat rather than improving the algorithms. That said, when playing a game vs an AI you do w...
I wouldn't say that poker is "much easier than the classic deterministic games", and poker AI still lags significantly behind humans in several regards. Basically, the strongest poker bots at the moment are designed around solving for Nash equilibrium strategies (of an abstracted version of the game) in advance, but this fails in a couple of ways:
Although computers beat humans at board games without needing any kind of general intelligence at all, I don't think that invalidates game-playing as a useful domain for AGI research.
The strength of AI in games is, to a significant extent, due to human input: designers are able to incorporate substantial domain knowledge into the relatively simple algorithms that game AIs are built on.
However, it is quite easy to make game AI into a far, far more challenging problem (and, I suspect, a rather more widely applicable one)---consider the design of algorithms...
I agree; I don't see a significant difference between thinking that I ought to value other human beings equally but failing to do so, and actually viewing them equally and not acting accordingly. If I accept either (1) or (2) it's still a moral failure, and it is one that I should act to correct. In either case, what matters is the actions that I ought to take as a result (i.e. effective altruism), and I think the implications are the same in both cases.
That being said, I guess the methods that I would use to correct the problem would be different in eithe...
Yes, if I really ought to value other human beings equally then it means I ought to devote a significant amount of time and/or money to altruistic causes, but is that really such an absurd conclusion?
Perhaps I don't do those things, but that doesn't mean I can't and it doesn't mean I shouldn't.
Here's some of the literature:
Heuristic search as evidential reasoning by Hansson and Mayer
A Bayesian Approach to Relevance in Game Playing by Baum and Smith
and also work following Stuart Russell's concept of "metareasoning"
On Optimal Game-Tree Search using Rational Meta-Reasoning by Russell and Wefald
Principles of metareasoning by Russell and Wefald
and the relatively recent
Selecting Computations: Theory and Applications by Hay, Russell, Tolpin and Shimony.
On the whole, though, it's relatively limited. At a bare minimum there is plenty of room ...
Surely probability or something very much like it is conceptually the right way to deal with uncertainty, whether it's logical uncertainty or any other kind? Granted, most of the time you don't want to deal with explicit probability distributions and Bayesian updates because the computation can be expensive, but when you work with approximations you're better off if you know what it is you're approximating.
In the area of search algorithms, I think these kinds of approaches are woefully underrepresented, and I don't think it's because they aren't particula...
I think there are some rather significant assumptions underlying the idea that they are "non-relevant". At the very least, if the agents were distinguishable, I think you should indeed be willing to pay to make n higher. On the other hand, if they're indistinguishable then it's a more difficult question, but the anthropic averaging I suggested in my previous comments leads to absurd results.
What's your proposal here?