
Your existence is informative

Post author: KatjaGrace 30 June 2012 02:46PM 2 points

Cross Posted from Overcoming Bias

Suppose you know that there are a certain number of planets, N. You are unsure about the truth of a statement Q. If Q is true, you put a high probability on life forming on any given planet. If Q is false, you put a low probability on this. You have a prior probability for Q. So far you have not taken into account your observation that the planet you are on has life. How do you update on this evidence, to get a posterior probability for Q? Since your model just has a number of planets in it, with none labeled as 'this planet', you can't update directly on 'there is life on this planet' by excluding worlds where 'this planet' doesn't have life. And you can't necessarily treat 'this' as an arbitrary planet, since you wouldn't have seen it if it didn't have life.

I have an ongoing disagreement with an associate who suggests that you should take 'this planet has life' into account by conditioning on 'there exists a planet with life'. That is,

P(Q|there is life on this planet) = P(Q|there exists a planet with life).

Here I shall explain my disagreement.
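To make the disagreement concrete before the argument, here is a minimal numerical sketch. All the numbers are assumptions chosen only for illustration: N = 2 planets, P(life on a given planet | Q) = 0.9, P(life on a given planet | not-Q) = 0.1, and prior P(Q) = 0.5.

```python
from itertools import product

P_Q = 0.5
p_life = {True: 0.9, False: 0.1}  # assumed chance of life on each planet, given Q / not-Q
N = 2

def prob_world(q, world):
    """P(this exact pattern of life across the N planets | Q = q), planets independent."""
    p = p_life[q]
    out = 1.0
    for has_life in world:
        out *= p if has_life else 1 - p
    return out

worlds = list(product([True, False], repeat=N))

# The associate's rule: condition on "there exists a planet with life".
num = sum(P_Q * prob_world(True, w) for w in worlds if any(w))
den = num + sum((1 - P_Q) * prob_world(False, w) for w in worlds if any(w))
print("P(Q | some planet has life) =", round(num / den, 3))    # ~0.839

# Conditioning instead on a particular, labelled planet having life:
num = P_Q * p_life[True]
den = num + (1 - P_Q) * p_life[False]
print("P(Q | this planet has life) =", round(num / den, 3))    # 0.9
```

The sketch only shows that the two update rules give different answers under these assumed numbers, so the choice between them matters; which rule is right is what the rest of the post is about.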

Nick Bostrom argues persuasively that much science would be impossible if we treated 'I observe X' as 'someone observes X'. This is basically because in a big world of scientists making measurements, at some point somebody will make most mistaken measurements. So if all you know when you measure the temperature of a solution to be 15 degrees is that you are not in a world where nobody ever measures its temperature to be 15 degrees, this doesn't tell you much about the temperature.
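A toy calculation, with made-up numbers, shows how quickly the existential reading washes out as the world grows:

```python
# Assumed numbers, purely for illustration: each of M scientists measures the
# temperature; with probability 0.01 a reading is a mistake, uniform over 100
# possible values.
p_error, n_values = 0.01, 100
p_read_15_if_true  = (1 - p_error) + p_error / n_values  # ~0.99: true temp is 15
p_read_15_if_false = p_error / n_values                  # 0.0001: true temp isn't 15

for M in (1, 10_000, 1_000_000):
    exists_if_true  = 1 - (1 - p_read_15_if_true) ** M    # P(someone reads 15 | temp is 15)
    exists_if_false = 1 - (1 - p_read_15_if_false) ** M   # P(someone reads 15 | temp isn't 15)
    print(M, "scientists:", round(exists_if_true / exists_if_false, 2))
# -> 9901.0, then ~1.58, then ~1.0: the likelihood ratio of 'someone observes
#    15 degrees' collapses toward 1 as the world gets big, while the ratio for
#    'I observe 15 degrees' stays around 9900 no matter how many scientists
#    there are.
```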

You can add other apparently irrelevant observations made at the same time - e.g. that the table is blue chipboard - to make your total set of observations less likely to arise even once in a given world (at its limit, this is the suggestion of FNC, full non-indexical conditioning). However it seems implausible that taking a measurement while also seeing a detailed but irrelevant picture should license different inferences than taking the same measurement with limited sensory input. Also, the same problem re-emerges if the universe is supposed to be larger - and the universe is thought to be very, very large. Not to mention that it seems implausible that the size of the universe should greatly affect probabilistic judgements about entities which are nearly independent of most of the universe.

So I think Bostrom's case is good. However I'm not completely comfortable arguing from the acceptability of something that we do (science) back to the truth of the principles that justify it. So I'd like to make another case against taking 'this planet has life' as equivalent evidence to 'there exists a planet with life'.

Evidence is what excludes possibilities. Seeing the sun shining is evidence against rain, because it excludes the possible worlds where the sky is grey, which include most of those where it is raining. Seeing a picture of the sun shining is not much evidence against rain, because the worlds it excludes - those where you don't see such a picture - are about as likely to be rainy as the worlds that remain.

Receiving the evidence 'there exists a planet with life' means excluding all worlds where all planets are lifeless, and not excluding any other worlds. At first glance, this must be different from 'this planet has life'. Take any possible world where some other planet has life, and this planet has no life. 'There exists a planet with life' doesn't exclude that world, while 'this planet has life' does. Therefore they are different evidence.

At this point however, note that the planets in the model have no distinguishing characteristics. How do we even decide which planet is 'this planet' in another possible world? There needs to be some kind of mapping between planets in each world, saying which planet in world A corresponds to which planet in world B, etc. As far as I can tell, any mapping will do, as long as a given planet in one possible world maps to at most one planet in another possible world. This mapping is basically a definition choice.

So suppose we use a mapping where in every possible world where at least one planet has life, 'this planet' corresponds to one of the planets that has life. See the below image.

[Image] Squares are possible worlds, each with two planets. Pink planets have life, blue do not. Define 'this planet' as the circled one in each case. Learning that there is life on this planet is equal to learning that there is life on some planet.

Now learning that there exists a planet with life is the same as learning that this planet has life. Both exclude the far righthand possible world, and none of the other possible worlds. What's more, since we can change the probability distribution we end up with, just by redefining which planets are 'the same planet' across worlds, indexical evidence such as 'this planet has life' must be horseshit.

Actually the last paragraph was false. If in every possible world which contains life you pick one of the planets with life to be 'this planet', you can no longer know whether you are on 'this planet'. From your observations alone, you could be on the other planet - the one that is not circled in each of the above worlds - which only has life when both planets do. Whichever planet you are on, you know that there exists a planet with life. But because there's some probability of you being on the planet which only rarely has life, you have more information than that. Redefining which planet was which didn't change that.
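To spell that out in the same 'evidence excludes possibilities' terms, here is a small enumeration. The world labels A to D and the 0/1 planet indices are my own notation for the diagrammed worlds, with the 'circled' planet listed first in each life-containing world.

```python
# Each world lists (circled planet has life, other planet has life).
worlds = {
    "A": (True, True),    # both planets have life
    "B": (True, False),   # only the circled planet has life
    "C": (True, False),
    "D": (False, False),  # no life anywhere
}

# A 'centered' possibility says which world is actual AND which planet you are on.
centered = [(w, i) for w in worlds for i in (0, 1)]

# "Some planet has life" excludes only the centered possibilities in world D.
kept_existential = [(w, i) for (w, i) in centered if any(worlds[w])]

# "The planet I am on has life" also excludes being the uncircled, lifeless
# planet in worlds B and C.
kept_indexical = [(w, i) for (w, i) in centered if worlds[w][i]]

print(sorted(set(kept_existential) - set(kept_indexical)))
# -> [('B', 1), ('C', 1)]: possibilities your observation rules out that the
#    existential claim does not, so redefining 'this planet' has not made the
#    two pieces of evidence equivalent.
```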

Perhaps a different definition of 'this planet' would get what my associate wants? The problem with the last one was that it no longer necessarily included the planet we are on. So what if we define 'this planet' to be the one you are on, plus a life-containing planet in each of the other possible worlds that contain at least one life-containing planet? A strange, half-indexical definition, but why not? One thing remains to be specified - which planet is 'this' planet when you don't exist? Let's say it is chosen randomly.

Now is learning that 'this planet' has life any different from learning that some planet has life? Yes. Now again there are cases where some planet has life, but it's not the one you are on. This is because the definition only picks out planets with life across other possible worlds, not this one. In this one, 'this planet' refers to the one you are on. If you don't exist, this planet may not have life. Even if there are other planets that do. So again, 'this planet has life' gives more information than 'there exists a planet with life'.

You either have to accept that someone else might exist when you do not, or you have to define 'yourself' as something that always exists, in which case you no longer know whether you are 'yourself'. Either way, changing definitions doesn't change the evidence. Observing that you are alive tells you more than learning that 'someone is alive'.

Comments (41)

Comment author: dspeyer 30 June 2012 04:04:27PM 4 points [-]

Isn't "I observe X" equivalent to "someone chosen for reasons unrelated to this observation observed X"? That solves the "at some point somebody will make most mistaken measurements" problem because the likelihood of randomly choosing the scientist making that mistake is small.

You can't use this logic for observations of the form "I'm alive" because if you weren't alive you wouldn't be observing. What you can use that as evidence of is a hard problem. But it isn't a general problem.

Comment author: KatjaGrace 30 June 2012 06:15:32PM *  1 point [-]

Perhaps, but then there is the question of how you should pretend they were chosen. This is controversial.

If you weren't alive you wouldn't be observing "I'm alive". If X wasn't true you wouldn't be observing X. Could you be more clear on how you think the logic differs?

Comment author: dspeyer 30 June 2012 08:09:12PM 0 points [-]

Slight double-meaning in the word observing:

When I said "if you weren't alive you wouldn't be observing" I meant you wouldn't be seeing whether you were alive or not.

When you said "If X wasn't true you wouldn't be observing X" you meant you wouldn't be seeing that X is true.

I'm finding my second paragraph surprisingly hard to reword.

Comment author: KatjaGrace 01 July 2012 11:21:00PM 0 points [-]

If your existence depends on X, there are two possibilities: you observe X, or you observe nothing.

If your existence doesn't depend on X but you have some other way of observing whether X is true, the possibilities are: you observe X, you observe not X.

Do you think that observing X provides different information about something else in these two cases?

Comment author: Yvain 01 July 2012 02:33:59AM *  3 points [-]

What is the advantage of talking about "this planet" versus standard anthropic SIA as you have used so many times on your blog and elsewhere?

I mean, I can see the disadvantages, those being that it's really hard to get a good definition of "this planet" that remains constant across universes, especially between universes with different numbers of planets, universes in which you don't exist, universes in which multiple yous exist, etc.

But with SIA, you can just rephrase it as "I was born on a certain planet, presumably selected randomly among planets in the multiverse that have life, and I call it 'this planet' because I was born there."

("This is a fertile planet, and we will thrive. We will rule over this planet, and we will call it...This Planet.")

Now on your diagram, "a planet has life" gives you a 33% chance of being in frames A, B, or C, and "This planet has life" under the previous equivalence with SIA means you choose a randomly selected pink planet and get 50% chance of being in frame A, 25% chance in frame B, and 25% chance in frame C, which justifies your statement that there should be a difference.
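A quick check of those percentages, assuming equal priors on the four diagrammed worlds (labels mine):

```python
# Number of life-bearing planets in each two-planet world (equal priors assumed).
life_counts = {"A": 2, "B": 1, "C": 1, "D": 0}
prior = {w: 0.25 for w in life_counts}

# "A planet has life": renormalize the prior over worlds containing any life.
total = sum(prior[w] for w, n in life_counts.items() if n > 0)
print({w: round(prior[w] / total, 3) for w, n in life_counts.items() if n > 0})
# -> {'A': 0.333, 'B': 0.333, 'C': 0.333}

# SIA-style "this planet has life": weight each world by its number of
# life-bearing planets before renormalizing.
weights = {w: prior[w] * n for w, n in life_counts.items()}
total = sum(weights.values())
print({w: weights[w] / total for w, n in life_counts.items() if n > 0})
# -> {'A': 0.5, 'B': 0.25, 'C': 0.25}
```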

This also solves the scientist's problem just as dspeyer mentions.

Comment author: KatjaGrace 01 July 2012 11:49:50PM 1 point [-]

I don't follow why your rephrasing is SIA-specific.

Here I'm not arguing for SIA in particular, just against the position that you should only update when your observations completely exclude a world (i.e. 'non-indexical' updating, as in Radford Neal's 'full non-indexical conditioning' for instance). If we just talk about the evidence of existence, before you know anything else about yourself (if that's possible), SSA also probably says you shouldn't update, though it does say you should update on other such evidence in the way I'm arguing, so it doesn't have the same problems as this non-indexical position.

I'm addressing this instead of the usual question because I want to settle the debate.

Comment deleted 02 July 2012 08:43:17PM *  [-]
Comment author: KatjaGrace 03 July 2012 02:58:41PM 1 point [-]

Sorry to confuse you. I did respond to the specific and coherent feedback, I just changed it on OB as well, so you can't tell.

What's the 'error' they 'share'?

Comment author: private_messaging 01 July 2012 10:15:11AM *  1 point [-]

The large world issues seem kind of confused.

Suppose an ideal agent is using Solomonoff induction to predict its inputs. The models which have the agent located very far away, at positions with enormously huge spatial distance, have to encode this distance into the model somehow, to be able to predict the input that you are getting. That makes them all very large, and combined they have an incredibly tiny contribution to algorithmic probability.

If you are to do confused Solomonoff induction whereby you seek an 'explanation' rather than a proper model - seek anything that contains the agent somewhere inside of it - then the whole notion just breaks down and you do not get anything useful out; you just get an iterator over all possible models. (Or, if you skip the low-level fundamental problem, you run into some form of big-universe issue where you hit 'why bother if there's a copy of me somewhere far away' and 'what is the meaning of measurement if there's some version of me measuring something wrong' - but ultimately, if you started from scratch, you wouldn't even get to that point, as you'd never be able to form any even remotely useful world model.)

Comment author: KatjaGrace 01 July 2012 11:55:31PM 2 points [-]

I don't know what you mean by 'large world issues'.

Why is the agent's distance from you relevant to predicting its inputs? Why does a large distance imply huge complexity?

Comment author: paulfchristiano 02 July 2012 12:59:30AM *  1 point [-]

A model for your observations consists (informally) of a model for the universe and then coordinates within the universe which pinpoint your observations, at least in the semantics of Solomonoff induction. So in an infinite universe, most observations must be very complicated, since the coordinates must already be quite complicated. Solomonoff induction naturally defines a roughly-uniform measure over observers in each possible universe, which very slightly discounts observers as they get farther away from distinguished landmarks. The slight discounting makes large universes unproblematic.
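A toy rendering of that discounting, with my own assumed encoding rather than anything from the comment above: suppose pinpointing the k-th observer from a landmark costs roughly 2*log2(k) extra bits on top of the bits for the universe itself.

```python
import math

def relative_weight(k):
    # Weight of the k-th observer relative to the first, under the assumption
    # that pinpointing it costs about 2*log2(k) extra bits (the bits for the
    # universe program itself cancel out of the ratio).
    return 2.0 ** -(2 * math.log2(k))

print([round(relative_weight(k), 12) for k in (1, 10, 1_000, 1_000_000)])
# -> [1.0, 0.01, 1e-06, 1e-12]
# The weights fall off only polynomially (~1/k^2 under this crude encoding):
# no observer ever gets weight zero, yet the weights over infinitely many
# observers sum to something finite, which is the sense in which slight
# discounting makes a very large universe unproblematic.
```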

I wrote about these things at some point, here, though that was when I was just getting into these things and it now looks silly even to current me. But that's still the only framework I know for reasoning about big universes, splitting brains, and the Born probabilities.

Comment author: Vladimir_Nesov 03 July 2012 08:17:36AM 2 points [-]

But that's still the only framework I know for reasoning about big universes, splitting brains, and the Born probabilities.

I get by with none...

Comment author: Tyrrell_McAllister 03 July 2012 05:36:30PM 0 points [-]

Are you sure?

Comment author: Vladimir_Nesov 03 July 2012 09:57:26PM *  0 points [-]

Consequentialist decision making on "small" mathematical structures seems relatively less perplexing (and far from entirely clear), but I'm very much confused about what happens when there are too "many" instances of decision's structure or in the presence of observations, and I can't point to any specific "framework" that explains what's going on (apart from the general hunch that understanding math better clarifies these things, and it does so far).

Comment author: Tyrrell_McAllister 03 July 2012 10:06:06PM 1 point [-]

If X has a significant probability of existing, but you don't know at all how to reason about X, how confident can you be that your inability to reason about X isn't doing tremendous harm? (In this case, X = big universes, splitting brains, etc.)

Comment author: Manfred 05 July 2012 03:20:55AM 1 point [-]

This is the problem of the mathematician with N children, one of them a girl.

And the question at hand is "if you're one of the children and you're a girl, does that result in a different problem?"
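For concreteness, the N = 2 version of that problem, under the usual assumptions (boys and girls equally likely, and - the contested step - that you may treat yourself as a randomly chosen child):

```python
from itertools import product
from fractions import Fraction

families = list(product("GB", repeat=2))   # GG, GB, BG, BB, equally likely a priori

# "At least one of the children is a girl":
with_girl = [f for f in families if "G" in f]
print(Fraction(sum(f == ("G", "G") for f in with_girl), len(with_girl)))   # 1/3

# "You are one of the children, and you are a girl": weight each family by the
# chance that a randomly chosen child in it is a girl.
weights = {f: Fraction(f.count("G"), 2) for f in families}
print(weights[("G", "G")] / sum(weights.values()))                         # 1/2
```

Mirroring the planet case: the indexical version ('I am a girl') behaves like evidence about a particular child, not like the bare existential claim.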

Comment author: AlexSchell 03 July 2012 02:11:49AM 1 point [-]

At least on a semi-superficial glance, you seem to be switching between using "I" / "this planet" as rigid designators in some places and as indexicals/demonstratives/non-rigid designators (i.e. "whatever this thing here is") in other places. This may be at least part of what made this post seem unconvincing -- e.g. there is nothing weird about being uncertain about "you == you" if by that you mean "whatever this thing here is == Katja Grace".

Comment author: endoself 01 July 2012 02:20:00AM 1 point [-]

Are you familiar with Stuart Armstrong's work on anthropics?

Comment author: KatjaGrace 01 July 2012 11:42:28PM 1 point [-]

Yes. Anything in particular there you think is relevant?

Comment author: endoself 04 July 2012 04:24:55AM *  1 point [-]

Well first, Stuart discusses these problems in terms of decision theory rather than probability. I think this is a better way of approaching this, as it avoids pointless debates over, e.g., the probability that Sleeping Beauty's coin landed heads when all participants agree as to how she should act, as well as more complicated dilemmas where representing knowledge using probabilities just confuses people.

That said, your ideas could easily be rephrased as decision theoretic rather than epistemic. The framework in Stuart's paper would suggest imagining what strategy a hypothetical agent with your goals would plan 'in advance' and implementing that. I guess it might not be obvious that this gives the correct solution, but the reasons that I think it does come from UDT, which I cannot explain in the space of this comment. There's a lot available about it on the LW wiki, though alternatively you might find it obvious that the framing in terms of a hypothetical agent is equivalent. (Stuart's proposed ADT may or may not be equivalent to UDT; it is unclear whether he intends for precommitments to be able to deal with something like a variant of Parfit's hitchhiker where the driver decides what to do before the hitchhiker comes into existence, but it seems that they wouldn't. The differences are minor enough anyways.)

You propose an alternative anthropic framework, which indicates that you either disagree that the hypothetical agent framing is equivalent or you disagree that Stuart's suggestion is the correct way for such an agent to act in such a scenario.

Comment author: Manfred 05 July 2012 03:00:41AM 0 points [-]

Obligatory warning of deep flaws.

Comment author: endoself 06 July 2012 02:21:32AM *  0 points [-]

You provide very little information. I'm not even sure what you disagree with exactly. If it would be inconvenient for you to explain your disagreement, that's fine, but I didn't update much on your comment. If you want to give me a bit more information about your state of mind, you can tell me how familiar you are with UDT and whether you think it is on the right path towards solving anthropics.

Comment author: Manfred 06 July 2012 05:33:58AM *  1 point [-]

Yeah, sorry about that. The basic idea is that by providing multiple answers, the proposal has immediately given up on getting the same answer as an expected utility maximizer. This is a perfectly fine thing to do if probabilities are impossible to assign, and so maximizing expected utility breaks down. But probability does not in fact break down when confronted with anthropic situations, so picking from N answers just gives you at least an (N-1)/N chance of being wrong.

Comment author: endoself 11 July 2012 12:28:29AM 1 point [-]

Expected utility does break down in the presence of indexical uncertainty though; if there are multiple agents with exactly your observations, it is important to take into account that your decision is the one they will all make. Psy-Kosh's non-anthropic problem deals with this sort of thing, though it also points out that such correlation between agents can exist even without indexical uncertainty, which is irrelevant here.

I'm not sure what the N answers that you are talking about are. The different solutions in Stuart's paper refer to agents with different utility functions. Changing the utility function usually does change the optimal course of action.

Comment author: Manfred 11 July 2012 03:24:27AM *  1 point [-]

Psy-Kosh's non-anthropic problem is just regular uncertainty. The experimenters flipped a coin, and you don't know if the coin is heads or tails. The collective decision making then runs you into trouble. I can't think of any cases with indexical uncertainty but no collective decision making that run into similar trouble - in the Sleeping Beauty problem at least the long-term frequency of events is exactly the same as the thing you plug into the utility function to maximize average reward, unlike in the non-anthropic problem. Do you have an example you could give?

EDIT: Oh, I realized one myself - the absent-minded driver problem. In that problem, if you assign utility to the driver at the first intersection - rather than just the driver who makes it to an exit - you end up double-counting and getting the wrong answer. In a way it's collective decision-making with yourself - you're trying to take into account how past-you affected present-you, and how present-you will affect future-you, but the simple-seeming way is wrong. In fact, we could rejigger the problem so it's a two-person, non anthropic problem! Then if we do the reverse transform on Psy-kosh's problem, maybe we could see something interesting... Update forthcoming, but the basic idea seems to be that the problem is when you're cooperating with someone else, even yourself, and are unsure who's filling what role. So you're pretty much right.

The objects in Stuart's paper are decision procedures, but do not involve utility directly (though it is a theorem that you can find a utility function that gives anything). Utility functions have to use a probability before you get a decision out, but these decision procedures don't. Moreover, he uses terms like "average utilitarian" to refer to the operations of the decision procedure (averages individual utilities together), rather than the properties of the hypothetical corresponding utility function.

What's happening is that he's taking an individual utility function and a decision procedure, and saying that together these specify what happens. And I'm saying that this is an over-specified problem.

Comment author: endoself 12 July 2012 03:32:08PM 0 points [-]

Then if we do the reverse transform on Psy-kosh's problem, maybe we could see something interesting...

That's what I was originally trying to suggest, but it seems I was unclear. The absent-minded driver is a simpler example anyways, and does deal with exactly the kind of breakdown of expected utility I was referring to.

What's happening is that he's taking an individual utility function and a decision procedure, and saying that together these specify what happens. And I'm saying that this is an over-specified problem.

Each decision procedure is derived from a utility function. From the paper:

Anthropic Decision Theory (ADT) An agent should first find all the decisions linked with their own. Then they should maximise expected utility, acting as if they simultaneously controlled the outcomes of all linked decisions, and using the objective (non-anthropic) probabilities of the various worlds.

This fully specifies a decision procedure given a utility function. There is no second constraint taking the utility function into account again, so it is not overspecified.

Comment author: Manfred 12 July 2012 04:09:37PM 1 point [-]

Simpler? Hm. Well, I'm still thinking about that one.

Anyhow, by over-specified I mean that ADT and conventional expected-utility maximization (which I implicitly assumed to come with the utility function) can give different answers. For example, in a non-cooperative problem like copying someone either once or 10^9 times, and then giving the copy a candybar if it can correctly guess how many of them there are. The utility function already gives an answer, and no desiderata are given that show why that's wrong - in fact, it's one of the multiple possible answers laid out.

Comment author: endoself 12 July 2012 09:53:00PM *  2 points [-]

Simpler in that you don't need to transform it before it is useful here.

Standard expected utility maximization requires a probability distribution, but the problem is that in anthropic scenarios it is not obvious what the correct distribution is and how to correctly update it. ADT uses the prior distribution before 'observing one's own existence', so it circumvents the need to perform anthropic updates.

I'm not sure which solution to your candybar problem you think is correct because I am not sure which probability distribution you think is correct, but all the solutions in the paper that disagree with yours actually are what you would want to precommit to given the associated utility function and are therefore correct.

Comment author: Manfred 13 July 2012 05:33:20AM *  1 point [-]

Standard expected utility maximization requires a probability distribution, but the problem is that in anthropic scenarios it is not obvious what the correct distribution is and how to correctly update it.

If it was solved in a way that made it obvious for, say, the Sleeping Beauty problem, would that then be the right way to do it?

all the solutions in the paper that disagree with yours actually are what you would want to precommit to given the associated utility function and are therefore correct.

I think you're just making up utility functions here - is a real utility function (that is, a function of the state of the world) ever calculated in the paper, other than the use of the individual utility function? And if we're talking about regular ol' utility functions, why are ADT's decisions necessarily invariant under changing time-like uncertainty (the normal Sleeping Beauty problem) to space-like uncertainty (the Sleeping Beauty problem with duplicates)?

Comment author: Viliam_Bur 30 June 2012 07:38:46PM *  1 point [-]

I'm not sure if I get the idea, so let me ask this:

Suppose there are only two inhabitable planets in the whole multiverse -- a planet A with 1000 people, and a planet B with 1000000 people. I live in a primitive society, so I don't have a clue how many people live on my planet. All I know is the information in this paragraph.

Based on the information that "I exist", should I suppose that I live on planet A with probability 0.001 and on planet B with probability 0.999? Or should it be 0.5 and 0.5, because either way, there can be only one me on each planet?

To me it seems that the 0.001 and 0.999 is the correct answer.

Another example: in the whole multiverse there are only four planets. One of them has two inhabitable continents with 1000 people each, two of them have one inhabitable continent with 1000 people and one empty continent, the last one has two empty continents. Seems to me there is 0.5 probability I am one of the 2000 inhabitants of the first planet, and 0.5 probability I am one of the 1000 + 1000 inhabitants of the second and the third planet.
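A quick person-weighted check of both examples, assuming you count yourself as a random sample from all the people who actually exist (the reasoning behind the 0.001 / 0.999 answer):

```python
from fractions import Fraction

# Example 1: planet A has 1,000 people, planet B has 1,000,000; both exist.
pops = {"A": 1000, "B": 1_000_000}
total = sum(pops.values())
print({p: float(Fraction(n, total)) for p, n in pops.items()})
# -> {'A': 0.000999..., 'B': 0.999000...}

# Example 2: four planets; count the people on each, then ask where you are.
pops = {"planet 1": 2000, "planet 2": 1000, "planet 3": 1000, "planet 4": 0}
total = sum(pops.values())
print(Fraction(pops["planet 1"], total))                      # 1/2
print(Fraction(pops["planet 2"] + pops["planet 3"], total))   # 1/2
```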

Comment author: KatjaGrace 01 July 2012 11:30:54PM 0 points [-]

You do get the idea. Assuming that before taking your existence into account you put .5 probability on each type of planet, then the two options you give are the standard SIA and SSA answers respectively. The former involves treating your existence as more evidence than just that someone exists, as I was suggesting in this post.

I think everyone agrees that in the multiverse case (or any case where everyone exists in the same world) you should reason as you do above. The question is whether to treat cases where the people are in different possible worlds analogously with those where the people are just on different planets or in different rooms for instance.

Comment author: dspeyer 30 June 2012 08:12:55PM 0 points [-]

In your examples, you're using your existence to answer questions about yourself, not about the planets. This is a special case, and not IMHO a very interesting one.

Comment author: KatjaGrace 01 July 2012 11:35:25PM 0 points [-]

Answering questions about whether there are more or fewer people like you is equivalent to answering which planets exist or what characteristics they have, if those things coincide to some degree. If they don't, you won't get much out of anthropic reasoning anyway.

Comment author: dspeyer 03 July 2012 01:12:33AM 0 points [-]

Re-read his examples. He already knows how many planets there are and how many people are on each of them. He's only trying to figure out which one he's on.

Comment author: MaoShan 16 July 2012 03:57:30AM 0 points [-]

I am assuming that since this actual planet has life, testing the truth of Q is possible on this planet. If you put more work into finding out whether Q is true, then you wouldn't need to argue about other planets that we lack actual data for, since Q would apply to any of the applicable planets.

Comment author: JulianMorrison 03 July 2012 09:20:21PM -1 points [-]

Nick Bostrom argues persuasively that much science would be impossible if we treated 'I observe X' as 'someone observes X'. This is basically because in a big world of scientists making measurements, at some point somebody will make most mistaken measurements.

The obvious flaw in this idea is that it's doing half a boolean update - it's ignoring the prior. And scientists spend effort setting themselves up in probabilistic states where their prior is that when they measure a temperature of 15 degrees, it's because the temperature is 15 degrees. Stuff like calibrating the instruments and repeating the measurements are, whether or not they are seen as such, plainly intended to create a chain of AND-ed probability where inaccuracy becomes vanishingly unlikely.

Comment author: shminux 01 July 2012 01:10:51AM -1 points [-]

I have an ongoing disagreement with an associate who suggests that you should take 'this planet has life' into account by conditioning on 'there exists a planet with life'.

This seems like SSA vs SIA, so maybe you should first agree with your associate on which assumption each one of you is using.

Comment author: pragmatist 01 July 2012 01:40:19AM 2 points [-]

This isn't right. Both SSA and SIA are ways to take indexical information into account. Katja's associate seems to be denying that indexical information makes a difference. So he or she would presumably reject the scientific relevance of both SSA and SIA.

Comment author: KatjaGrace 01 July 2012 11:39:43PM 1 point [-]

Yes. SSA is complicated though - it effectively doesn't take your existence (as a member of the reference class) as evidence, but it does take any further information you get about yourself into account.

Yes, my associate rejects the scientific relevance of any anthropic principles.