Comment author: CronoDAS 07 February 2014 08:27:11PM 2 points

So, an L-zombie is a person that could exist, but doesn't?

Comment author: Benja 07 February 2014 09:06:14PM 1 point

(Agree with Coscott's comment.)

L-zombies! (L-zombies?)

21 Benja 07 February 2014 06:30PM

Reply to: Benja2010's "Self-modification is the correct justification for updateless decision theory"; Wei Dai's "Late great filter is not bad news"

"P-zombie" is short for "philosophical zombie", but here I'm going to re-interpret it as standing for "physical philosophical zombie", and contrast it to what I call an "l-zombie", for "logical philosophical zombie".

A p-zombie is an ordinary human body with an ordinary human brain that does all the usual things that human brains do, such as the things that cause us to move our mouths and say "I think, therefore I am", but that isn't conscious. (The usual consensus on LW is that p-zombies can't exist, but some philosophers disagree.) The notion of p-zombie accepts that human behavior is produced by physical, computable processes, but imagines that these physical processes don't produce conscious experience without some additional epiphenomenal factor.

An l-zombie is a human being that could have existed, but doesn't: a Turing machine which, if anybody ever ran it, would compute that human's thought processes (and its interactions with a simulated environment); that would, if anybody ever ran it, compute the human saying "I think, therefore I am"; but that never gets run, and therefore isn't conscious. (If it's conscious anyway, it's not an l-zombie by this definition.) The notion of l-zombie accepts that human behavior is produced by computable processes, but supposes that these computational processes don't produce conscious experience without being physically instantiated.

Actually, there probably aren't any l-zombies: The way the evidence is pointing, it seems like we probably live in a spatially infinite universe where every physically possible human brain is instantiated somewhere, although some are instantiated less frequently than others; and if that's not true, there are the "bubble universes" arising from cosmological inflation, the branches of many-worlds quantum mechanics, and Tegmark's "level IV" multiverse of all mathematical structures, all suggesting again that all possible human brains are in fact instantiated. But (a) I don't think that even with all that evidence, we can be overwhelmingly certain that all brains are instantiated; and, more importantly actually, (b) I think that thinking about l-zombies can yield some useful insights into how to think about worlds where all humans exist, but some of them have more measure ("magical reality fluid") than others.

So I ask: Suppose that we do indeed live in a world with l-zombies, where only some of all mathematically possible humans exist physically, and only those that do have conscious experiences. How should someone living in such a world reason about their experiences, and how should they make decisions — keeping in mind that if they were an l-zombie, they would still say "I have conscious experiences, so clearly I can't be an l-zombie"?

If we can't update on our experiences to conclude that someone having these experiences must exist in the physical world, then we must of course conclude that we are almost certainly l-zombies: After all, if the physical universe isn't combinatorially large, the vast majority of mathematically possible conscious human experiences are not instantiated. You might argue that the universe you live in seems to run on relatively simple physical rules, so it should have high prior probability; but we haven't really figured out the exact rules of our universe, and although what we understand seems compatible with the hypothesis that there are simple underlying rules, that's not really proof that there are such underlying rules, if "the real universe has simple rules, but we are l-zombies living in some random simulation with a hodgepodge of rules (that isn't actually run)" has the same prior probability; and worse, if you don't have all we do know about these rules loaded into your brain right now, you can't really verify that they make sense, since there is some mathematically possible simulation whose initial state has you remember seeing evidence that such simple rules exist, even if they don't; and much worse still, even if there are such simple rules, what evidence do you have that if these rules were actually executed, they would produce you? Only the fact that you, like, exist, but we're asking what happens if we don't let you update on that.

I find myself quite unwilling to accept this conclusion that I shouldn't update, in the world we're talking about. I mean, I actually have conscious experiences. I, like, feel them and stuff! Yes, true, my slightly altered alter ego would reason the same way, and it would be wrong; but I'm right...

...and that actually seems to offer a way out of the conundrum: Suppose that I decide to update on my experience. Then so will my alter ego, the l-zombie. This leads to a lot of l-zombies concluding "I think, therefore I am", and being wrong, and a lot of actual people concluding "I think, therefore I am", and being right. All the thoughts that are actually consciously experienced are, in fact, correct. This doesn't seem like such a terrible outcome. Therefore, I'm willing to provisionally endorse the reasoning "I think, therefore I am", and to endorse updating on the fact that I have conscious experiences to draw inferences about physical reality — taking into account the simulation argument, of course, and conditioning on living in a small universe, which is all I'm discussing in this post.

NB. There's still something quite uncomfortable about the idea that all of my behavior, including the fact that I say "I think therefore I am", is explained by the mathematical process, but actually being conscious requires some extra magical reality fluid. So I still feel confused, and using the word l-zombie in analogy to p-zombie is a way of highlighting that. But this line of reasoning still feels like progress. FWIW.

But if that's how we justify believing that we physically exist, that has some implications for how we should decide what to do. The argument is that nothing very bad happens if the l-zombies wrongly conclude that they actually exist. Mostly, that also seems to be true if they act on that belief: mostly, what l-zombies do doesn't seem to influence what happens in the real world, so if only things that actually happen are morally important, it doesn't seem to matter what the l-zombies decide to do. But there are exceptions.

Consider the counterfactual mugging: Accurate and trustworthy Omega appears to you and explains that it has just thrown a very biased coin that had only a 1/1000 chance of landing heads. As it turns out, this coin has in fact landed heads, and now Omega is offering you a choice: It can either (A) create a Friendly AI or (B) destroy humanity. Which would you like? There is a catch, though: Before it threw the coin, Omega made a prediction about what you would do if the coin fell heads (and it was able to make a confident prediction about what you would choose). If the coin had fallen tails, it would have created an FAI if it had predicted that you'd choose (B), and it would have destroyed humanity if it had predicted that you would choose (A). (If it hadn't been able to make a confident prediction about what you would choose, it would just have destroyed humanity outright.)

There is a clear argument that, if you expect to find yourself in a situation like this in the future, you would want to self-modify into somebody who would choose (B), since this gives humanity a much larger chance of survival. Thus, a decision theory stable under self-modification would answer (B). But if you update on the fact that you consciously experience Omega telling you that the coin landed heads, (A) would seem to be the better choice!
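As a rough sketch of the arithmetic behind these two views (not part of the original post), the snippet below compares humanity's survival probability under each policy, first from the prior perspective and then after conditioning on "the coin landed heads". The function names are hypothetical, and Omega is assumed to predict the policy perfectly.

```python
from fractions import Fraction

P_HEADS = Fraction(1, 1000)  # bias of Omega's coin, as stated in the scenario


def humanity_survives(coin: str, policy: str) -> bool:
    """Outcome for a fixed policy ('A' or 'B').

    `policy` is what the agent picks when told the coin landed heads;
    Omega is assumed (for this sketch) to predict it perfectly.
    """
    if coin == "heads":
        return policy == "A"  # (A) creates an FAI, (B) destroys humanity
    return policy == "B"      # tails: FAI iff Omega predicted (B)


def prior_survival(policy: str) -> Fraction:
    """Survival probability before learning how the coin landed."""
    total = Fraction(0)
    for coin, prob in (("heads", P_HEADS), ("tails", 1 - P_HEADS)):
        if humanity_survives(coin, policy):
            total += prob
    return total


def updated_survival(policy: str) -> Fraction:
    """Survival probability after updating on 'the coin landed heads'."""
    return Fraction(int(humanity_survives("heads", policy)))


for policy in ("A", "B"):
    print(policy, prior_survival(policy), updated_survival(policy))
```

Under these assumptions the policy (B) does far better from the prior perspective (999/1000 versus 1/1000), while (A) looks better once you condition on heads (1 versus 0), which is exactly the tension described above.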

One way of looking at this is that if the coin falls tails, the l-zombie that is told the coin landed heads still exists mathematically, and this l-zombie now has the power to influence what happens in the real world. If the argument for updating was that nothing bad happens even though the l-zombies get it wrong, well, that argument breaks here. The mathematical process that is your mind doesn't have any evidence about whether the coin landed heads or tails, because as a mathematical object it exists in both possible worlds, and it has to make a decision in both worlds, and that decision affects humanity's future in both worlds.

Back in 2010, I wrote a post arguing that yes, you would want to self-modify into something that would choose (B), but that that was the only reason why you'd want to choose (B). Here's a variation on the above scenario that illustrates the point I was trying to make back then: Suppose that Omega tells you that it actually threw its coin a million years ago, and if it had fallen tails, it would have turned Alpha Centauri purple. Now throughout your history, the argument goes, you would never have had any motive to self-modify into something that chooses (B) in this particular scenario, because you've always known that Alpha Centauri isn't, in fact, purple.

But this argument assumes that you know you're not an l-zombie; if the coin had in fact fallen tails, you wouldn't exist as a conscious being, but you'd still exist as a mathematical decision-making process, and that process would be able to influence the real world, so you-the-decision-process can't reason that "I think, therefore I am, therefore the coin must have fallen heads, therefore I should choose (A)." Partly because of this, I now accept choosing (B) as the (most likely to be) correct choice even in that case. (The rest of my change in opinion has to do with all ways of making my earlier intuition formal getting into trouble in decision problems where you can influence whether you're brought into existence, but that's a topic for another post.)

However, should you feel cheerful while you're announcing your choice of (B), since with high (prior) probability, you've just saved humanity? That would lead to an actual conscious being feeling cheerful if the coin has landed heads and humanity is going to be destroyed, and an l-zombie computing, but not actually experiencing, cheerfulness if the coin has landed tails and humanity is going to be saved. Nothing good comes out of feeling cheerful, not even alignment of a conscious being's map with the physical territory. So I think the correct thing is to choose (B), and to be deeply sad about it.

You may be asking why I should care what the right probabilities to assign or the right feelings to have are, since these don't seem to play any role in making decisions; sometimes you make your decisions as if updating on your conscious experience, but sometimes you don't, and you always get the right answer if you don't update in the first place. Indeed, I expect that the "correct" design for an AI is to fundamentally use (more precisely: approximate) updateless decision theory (though I also expect that probabilities updated on the AI's sensory input will be useful for many intermediate computations), and "I compute, therefore I am"-style reasoning will play no fundamental role in the AI. And I think the same is true for humans' decisions — the correct way to act is given by updateless reasoning. But as a human, I find myself unsatisfied by not being able to have a picture of what the physical world probably looks like. I may not need one to figure out how I should act; I still want one, not for instrumental reasons, but because I want one. In a small universe where most mathematically possible humans are l-zombies, the argument in this post seems to give me a justification to say "I think, therefore I am, therefore probably I either live in a simulation or what I've learned about the laws of physics describes how the real world works (even though there are many l-zombies who are thinking similar thoughts but are wrong about them)."

And because of this, even though I disagree with my 2010 post, I also still disagree with Wei Dai's 2010 post arguing that a late Great Filter is good news, which my own 2010 post was trying to argue against. Wei argued that if Omega gave you a choice between (A) destroying the world now and (B) having Omega destroy the world a million years ago (so that you are never instantiated as a conscious being, though your choice as an l-zombie still influences the real world), then you would choose (A), to give humanity at least the time it's had so far. Wei concluded that this means that if you learned that the Great Filter is in our future, rather than our past, that must be good news, since if you could choose where to place the filter, you should place it in the future. I now agree with Wei that (A) is the right choice, but I don't think that you should be happy about it. And similarly, I don't think you should be happy about news that tells you that the Great Filter is later than you might have expected.

Comment author: TruePath 30 January 2014 04:44:46AM 0 points

I meant useful in the context of AI since any such sequence would obviously have to be non-computable and thus not something the AI (or person) could make pragmatic use of.

Also, it is far from clear that T_0 is the union of all theories (and this is the problem in the proof in the other writeup). It may well be that there is a sequence of theories like this all true in the standard model of arithmetic but that their construction requires that T_n add extra statements beyond the schema for the proof predicate in T_{n+1}

Also, the claim that T_n must be stronger than T_{n+1} (prove a superset of it...to be computable we can't take all these theories to be complete) is far from obvious if you don't require that T_n be true in the standard model. If T_n is true in the standard model then, as it proves that Pf(T_{n+1}, \phi) -> \phi, this is true, so if T_{n+1} |- \phi then (as this is witnessed in a finite proof) there is a proof that this holds from T_n and thus a proof of \phi. However, without this assumption I don't even see how to prove the containment claim.

Comment author: Benja 30 January 2014 12:13:12PM * 0 points

I meant useful in the context of AI since any such sequence would obviously have to be non-computable and thus not something the AI (or person) could make pragmatic use of.

I was replying to this:

Ultimately, you can always collapse any computable sequence of computable theories (necessary for the AI to even manipulate) into a single computable theory so there was never any hope this kind of sequence could be useful.

I.e., I was talking about computable sequences of computable theories, not about non-computable ones.

Also, it is far from clear that T_0 is the union of all theories (and this is the problem in the proof in the other writeup). It may well be that there is a sequence of theories like this all true in the standard model of arithmetic but that their construction requires that T_n add extra statements beyond the schema for the proof predicate in T_{n+1}

I can't make sense of this. Of course T_n can contain statements other than those in T_{n+1} and the Löb schema of T_{n+1}, but this is no problem for the proof that T_0 is the union of all the theories; the point is that because of the Löb schema, we have T_{n+1} \subseteq T_n for all n, and therefore (by transitivity of the subset relation) T_n \subseteq T_0 for all n.

Also, the claim that T_n must be stronger than T_{n+1} (prove a superset of it...to be computable we can't take all these theories to be complete) is far from obvious if you don't require that T_n be true in the standard model. If T_n is true in the standard model then, as it proves that Pf(T_{n+1}, \phi) -> \phi, this is true, so if T_{n+1} |- \phi then (as this is witnessed in a finite proof) there is a proof that this holds from T_n and thus a proof of \phi. However, without this assumption I don't even see how to prove the containment claim.

Note again that I was talking about computable sequences T_n. If T_{n+1} |- \phi and T_{n+1} is computable, then PA |- Pf(T_{n+1}, \phi) and therefore T_n |- Pf(T_{n+1}, \phi) if T_n extends PA. This doesn't require either T_n or T_{n+1} to be sound.

Comment author: TruePath 29 January 2014 03:56:05PM 2 points

Actually, the `proof' you gave that no true list of theories like this exists made the assumption (not listed in this paper) that the sequence of indexes for the computable theories is definable over arithmetic. In general there is no reason this must be true but of course for the purposes of an AI it must.

Ultimately, you can always collapse any computable sequence of computable theories (necessary for the AI to even manipulate) into a single computable theory so there was never any hope this kind of sequence could be useful.

Comment author: Benja 29 January 2014 04:58:42PM * 0 points

Actually, the `proof' you gave that no true list of theories like this exists made the assumption (not listed in this paper) that the sequence of indexes for the computable theories is definable over arithmetic. In general there is no reason this must be true but of course for the purposes of an AI it must.

("This paper" being Eliezer's writeup of the procrastination paradox.) That's true, thanks.

Ultimately, you can always collapse any computable sequence of computable theories (necessary for the AI to even manipulate) into a single computable theory so there was never any hope this kind of sequence could be useful.

First of all (always assuming the theories are at least as strong as PA), note that in any such sequence, T_0 is the union of all the theories in the sequence; if T_(n+1) |- phi, then PA |- Box_(T_(n+1)) "phi", so T_n |- Box_(T_(n+1)) "phi", so by the trust schema, T_n |- phi; going up the chain like this, T_0 |- phi. So T_0 is in fact the "collapse" of the sequence into a single theory.
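For readers who prefer it spelled out, the chain of implications in the preceding paragraph can be written as a short derivation. This is only a restatement under the same assumptions (each T_n is computable and extends PA, and T_n proves the trust schema for T_{n+1}); the corner-quote notation is a typesetting choice for this sketch and needs the `amsmath` and `amssymb` packages.

```latex
\begin{align*}
T_{n+1} \vdash \varphi
  &\implies \mathrm{PA} \vdash \Box_{T_{n+1}} \ulcorner \varphi \urcorner
  && \text{(a proof exists, and PA proves true $\Sigma_1$ sentences)} \\
  &\implies T_n \vdash \Box_{T_{n+1}} \ulcorner \varphi \urcorner
  && \text{($T_n$ extends PA)} \\
  &\implies T_n \vdash \varphi
  && \text{(trust schema: $T_n \vdash \Box_{T_{n+1}} \ulcorner \varphi \urcorner \to \varphi$)}
\end{align*}
% Hence, as sets of theorems, T_{n+1} \subseteq T_n for every n, and by induction
\[
  T_n \subseteq T_{n-1} \subseteq \dots \subseteq T_0
  \qquad\text{so}\qquad
  \bigcup_{n} T_n = T_0 .
\]
```

Note that no step uses soundness of T_n or T_{n+1}; only computability and the trust schema are needed.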

That said, I disagree that there is no hope that this kind of sequence could be useful. (I don't literally want to use an unsound theory, but see my writeup about an infinite sequence of sound theories each proving the next consistent, linked from the main post; the same remarks apply there.) Yes, T_0 is stronger than T_1, so why would you ever want to use T_1? Well, T_0 + Con(T_0) is stronger than T_0, so why would you ever want to use T_0? But by this argument, you can't use any sound theory including PA, so this doesn't seem like a remotely reasonable argument against using T_1. Moreover, the fact that an agent using T_0 can construct an agent using T_1, but it can't construct an agent using T_0, seems like a sufficient argument against the claim that the sequence as a whole must be useless because you could always use T_0 for everything.

Comment author: ThisSpaceAvailable 27 January 2014 04:58:56AM 0 points

Invariance of the players' utility functions by the same affine transformation, or by independent transformations?

Comment author: Benja 27 January 2014 11:44:19AM 0 points

Independent.

Comment author: Benja 24 January 2014 04:05:36PM * 4 points

I'm hard-pressed to think of any more I could want from [the coco-value] (aside from easy extensions to bigger classes of games).

Invariance to affine transformations of players' utility functions. This solution requires that both players value outcomes in a common currency, plus the physical ability to transfer utility in this currency outside the game (unless there are two outcomes o_1 and o_2 of the game such that A(o_1) + B(o_1) = A(o_2) + B(o_2) = max_o [A(o) + B(o)], and such that A(o_1) >= A's coco-value >= A(o_2), in which case the players can decide to play the convex combination of these two outcomes that gives each player their coco-value; but this only solves the utility transfer problem, it doesn't make the solution invariant under affine transformations).
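A toy illustration of the invariance problem, as a sketch only: it looks just at the cooperative part of the solution (picking the outcome that maximizes A(o) + B(o)), not at the full coco-value, and the two-outcome game and its payoff numbers are made up for the example.

```python
# Hypothetical game: outcome -> (A's utility, B's utility). Numbers are invented.
outcomes = {"o1": (3.0, 0.0), "o2": (0.0, 2.0)}


def joint_optimum(payoffs, scale_b=1.0, shift_b=0.0):
    """Return the outcome maximizing A(o) + B'(o),
    where B'(o) = scale_b * B(o) + shift_b is an affine rescaling of B's utility."""
    return max(
        payoffs,
        key=lambda o: payoffs[o][0] + scale_b * payoffs[o][1] + shift_b,
    )


print(joint_optimum(outcomes))               # original utilities
print(joint_optimum(outcomes, scale_b=2.0))  # B's utility doubled
```

With the original numbers the sum is maximized at o1 (3 versus 2); after doubling B's utility, which represents the same preferences for B, it is maximized at o2 (4 versus 3). A constant shift alone would not change the maximizer, so it is the rescaling that breaks invariance here.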

Comment author: Luke_A_Somers 24 January 2014 02:30:22PM 0 points

If you can trade cash conditionally on behavior, it's trivial to fix the prisoners' dilemma. Each of you offers to pay the other a fee to cooperate, then you cooperate and you 'both pay' which balances out. PD is only hard when you can't do that.

Comment author: Benja 24 January 2014 03:38:11PM * 0 points

...so? What you say is true but seems entirely irrelevant to the question of what the superrational outcome in an asymmetric game should be.

Comment author: shminux 16 January 2014 05:30:04PM 0 points

but you get the idea of what a probability over logical statements should mean

Not from your example, I do not. I suspect that if you remove this local Omega meme, you are saying that there are many different possible worlds in your inner simulator and in p*100% of them the conjecture ends up being proven... some day before that world ends. Unless you are a Platonist and assign mathematical "truths" independent immaterial existence.

Comment author: Benja 16 January 2014 11:23:51PM 0 points

Retracted my comment for being unhelpful (I don't recognize what I said in what you heard, so I'm clearly not managing to explain myself here).

Comment author: shminux 15 January 2014 11:06:27PM -1 points

if you're not sure whether the twin prime conjecture is true, then each time you discover a new twin prime larger than all that you have seen before, you should ever so slightly increase the probability you assign to the conjecture.

I do not understand what you mean by "probability" here. Suppose I use one criterion to estimate that the twin-prime conjecture is true with probability 0.99, but a different criterion gives me 0.9999. In what situation would my choice of the criterion matter?

Are we talking about some measure over many (equally?) possible worlds in some of which the TPC is true and in others false (or maybe unprovable)? What would I do differently if I am convinced that one criterion is "right" and the other is "wrong" vs the other way around? Would I spend more time trying to prove the conjecture if I thought it is more likely true, or something?

Comment author: Benja 16 January 2014 08:20:51AM * 0 points

Agree with Nisan's intuition, though I also agree with Wei Dai's position that we shouldn't feel sure that Bayesian probability is the right way to handle logical uncertainty. To more directly answer the question of what it means to assign a probability to the twin prime conjecture: If Omega reveals to you that you live in a simulation, and it offers you a choice between (a) Omega throws a bent coin which has probability p of landing heads, and shuts down the simulation if it lands tails, otherwise keeps running it forever; and (b) Omega changes the code of the simulation to search for twin primes and run for one more step whenever it finds one; then you should be indifferent between (a) and (b) iff you assign probability p to the twin prime conjecture. [ETA: Argh, ok, sorry, not quite, because in (b) you may get to run for a long time still before getting shut down -- but you get the idea of what a probability over logical statements should mean.]
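To make option (b) concrete, here is a minimal sketch of the search loop it describes: the simulation is granted one more step each time another twin prime pair turns up, so it runs forever exactly when there are infinitely many twin primes. The function names are hypothetical, trial division is used only for simplicity, and the demo is capped at a few steps so that it terminates.

```python
from itertools import count, islice


def is_prime(n: int) -> bool:
    """Trial-division primality test (fine for a small illustration)."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def twin_primes():
    """Yield twin prime pairs (p, p + 2) in increasing order.

    This generator never terminates on its own; if the twin prime
    conjecture were false, it would eventually search forever
    without yielding another pair.
    """
    for p in count(3, 2):
        if is_prime(p) and is_prime(p + 2):
            yield (p, p + 2)


# Each pair found buys the simulation one more step; cap the demo at 5 steps.
for step, pair in enumerate(islice(twin_primes(), 5), start=1):
    print(f"step {step}: found twin primes {pair}")
```

As the ETA above notes, the analogy with the coin in (a) is imperfect, since under (b) the simulation may still run for a long time before the search stalls.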

Results from MIRI's December workshop

45 Benja 15 January 2014 10:29PM

Last week (Dec. 14-20), MIRI ran its 6th research workshop on logic, probability, and reflection. Writing up mathematical results takes time, and in the past, it's taken quite a while for results from these workshops to become available even in draft form. Because of this, at the December workshop, we tried something new: taking time during and in the days immediately after the workshop to write up results in quick and somewhat dirty form, while they still feel fresh and exciting.

In total, there are seven short writeups. Here's a list, with short descriptions of each. Before you get started on these writeups, you may want to read John Baez's blog post about the workshop, which gives an introduction to the two main themes of the workshop.
