Self-modification is the correct justification for updateless decision theory

Benya

11th Apr 2010

Reply to: Late great filter is not bad news

Suppose that you build an AI, and Omega appears to it and says:

Here's a button. A million years ago I calculated the umpteenth digit of pi. If it is even, I calculated whether you would press this button (in such a way that your human creator was never simulated as a conscious being). If I predicted that you wouldn't press the button, I destroyed Earth right then and there.* If it is odd, I created a doomsday device that will destroy the solar system if you press this button.

[* ETA: Assume that if the digit is even and the AI is predicted to press the button, Omega does not destroy Earth, but does turn Alpha Centauri purple (say). The point is for this to be a scenario that you, the AI creator, know not to have come to pass.]

Suppose you're the kind of AI creator whose AI is time consistent in a certain sense from the beginning of time and presses the button. Then you have an AI that satisfies a certain kind of philosopher, wins big in a certain logically impossible world, and destroys humanity.

Suppose, on the other hand, that you're a very similar kind of AI creator, only you program your AI not to take into account impossible possible worlds that had already turned out to be impossible (when you created the AI | when you first became convinced that timeless decision theory is right). Then you've got an AI that most of the time acts the same way, but does worse in worlds we know to be logically impossible, and destroys humanity less often in worlds we do not know to be logically impossible.

Wei Dai's great filter post seems to suggest that under UDT, you should be the first kind of AI creator. I don't think that's true, actually; I think that in UDT, you should probably not start with a "prior" probability distribution that gives significant weight to logical propositions you know to be false: do you think the AI should press the button if it was the first digit of pi that Omega calculated?

But obviously, you don't want tomorrow's you to pick the prior that way just after Omega has appeared to it in a counterfactual mugging (because according to your best reasoning today, there's a 50% chance this loses you a million dollars).

The most convincing argument I know for timeless flavors of decision theory is that if you could modify your own source code, the course of action that maximizes your expected utility is to modify into a timeless decider. So yes, you should do that. Any AI you build should be timeless from the start; and it's reasonable to make yourself into the kind of person that will decide timelessly with your probability distribution today (if you can do that).

But I don't think you should decide that updateless decision theory is therefore so pure and reflectively consistent that you should go and optimize your payoff even in worlds whose logical impossibility was clear before you first decided to be a timeless decider (say). Perhaps it's less elegant to justify UDT through self-modification at some arbitrary point in time than through reflective consistency all the way from the big bang on; but in the worlds we can't rule out yet, it's more likely to win.

[-]Academian16y30

I wish I could upvote this more than once. An agent not yet capable of 100% precommitments can't be expected to make them, but you can imagine it slowly self-modifying to install stronger and stronger internal precommitment mechanisms, since even probabilistic precommitments are better than nothing (like Joe at least tried to do in "Newcomb's problem happened to me").

Then it would fare more optimally on average for scenarios that involve predicting its behavior after those installments. I think that's the most one can hope for if one does not start with precommitment mechanisms. You can't stop Omega (in Judea Pearl's timeless sense of causality, not classically) from simulating a bad decision you would have made one year ago, but you can stop him from simulating a bad decision you would make in the future.

I'll take it :)

[-]Wei Dai16y20

(Sorry about the late reply. I'm not sure how I missed this post.)

Suppose you're right and we do want to build an AI that would not press the button in this scenario. How do we go about it?

1. We can't program "the umpteenth digit of pi is odd" into the AI as an axiom, because we don't know this scenario will occur yet.
2. We also can't just tell the AI "I am conscious and I have observed Alpha Centauri as not purple", because presumably when Omega was predicting the AI's decision a million years ago, it was predicting the AI's output when given "I am conscious and I have observed Alpha Centauri as not purple" as part of its input.
3. What we can do is give the AI a utility function that does not terminally value beings who are living in a universe with a purple Alpha Centauri.

Do you agree with the above reasoning? If so, we can go on to talk about whether doing 3 is a good idea or not. Or do you have some other method in mind?

BTW, I find it helpful to write down such problems as world programs so I can see the whole structure at a glance. This is not essential to the discussion, but if you don't mind I'll reproduce it here for my own future reference.

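def P():
    # World program: S is the AI's decision function, called with its observations.
    if IsEven(Pi(10**100)):
        # Even digit: Omega acts a million years ago on its prediction of S.
        if OmegaPredict(S, "Here's a button... Alpha Centauri does not look purple.") == "press":
            MakeAlphaCentauriPurple()
        else:
            DestroyEarth()
    else:
        # Odd digit: the button is wired to a doomsday device.
        LetUniverseRun(years=10**6)
        if S("Here's a button... Alpha Centauri does not look purple.") == "press":
            DestroyEarth()

    LetUniverseRun(forever)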

Then, assuming our AI can't compute Pi(10^100), we have:

U("press") = 0.5 * U(universe runs forever with Alpha Centauri purple) + 0.5 * U(universe runs for 10^6 years then Earth is destroyed)
U("not press") = 0.5 * U(Earth is destroyed right away) + 0.5 * U(universe runs forever)

And clearly U("not press") > U("press") if U(universe runs forever with Alpha Centauri purple) = U(Earth is destroyed right away) = 0.
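The comparison above can be checked numerically. What follows is only a minimal sketch: the utility values and the names `U` and `expected_utility` are assumptions chosen for illustration, and the only constraint taken from the comment is that the two zero-valued outcomes are set to 0.

```python
# Illustrative utilities only; the nonzero numbers are assumptions.
U = {
    "runs_forever_purple": 0.0,        # option 3: purple worlds are not valued
    "destroyed_right_away": 0.0,
    "destroyed_after_1e6_years": 1.0,  # hypothetical small positive value
    "runs_forever": 100.0,             # hypothetical large positive value
}


def expected_utility(action: str) -> float:
    """Average over the two equally weighted branches of the world program P."""
    if action == "press":
        return 0.5 * U["runs_forever_purple"] + 0.5 * U["destroyed_after_1e6_years"]
    if action == "not press":
        return 0.5 * U["destroyed_right_away"] + 0.5 * U["runs_forever"]
    raise ValueError(f"unknown action: {action}")


best = max(["press", "not press"], key=expected_utility)
```

With these numbers "not press" comes out ahead, and the same holds for any utilities where running forever is worth more than running for 10^6 years.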

[-]Benya16y00

Thanks for your answer! First, since it's been a while since I posted this: I'm not sure my reasoning in this post is correct, but it does still seem right to me. I'd now gloss it as, in a Counterfactual Mugging there really is a difference as to the best course of action given your information yesterday and your information today. Yes, acting time-inconsistently is bad, so by all means, do decide to be a timeless decider; but this does not make paying up ideal given what you know today; choosing according to yesterday's knowledge is just the best of the bad alternatives. (Choosing according to what a counterfactual you would have known a million years ago, OTOH, does not seem the best of the bad alternatives.)

That said, to answer your question -- if we can assume for the purpose of the thought experiment that we know the source code of the universe, what would seem natural to me would be to program UDT's "mathematical intuition module" to assign low probability to the proposition that this source code would output a purple Alpha Centauri.

Which is -- well -- a little fuzzy, I admit, because we don't know how the mathematical intuition module is supposed to work, and it's not obvious what it should mean to tell it that a certain proposition (as opposed to a complete theory) should have low probability. But if we can let logical inference and "P is false" stand in for probability and "P is improbable," we'd tell the AI "the universe program does NOT output a purple Alpha Centauri," and by simple logic the AI would conclude IsOdd(Pi(10^100)).

[-][anonymous]14y00

The solution is for the AI to compute that digit of pi and press the button iff it's even. :-)

[This comment is no longer endorsed by its author]

[-]Jonii16y00

Then you have an AI that satisfies a certain kind of philosopher, wins big in a certain logically impossible world, and destroys humanity.

And if we assume that it's better to have Earth exist one million years longer, this is the correct thing to do, no question about it, right? If you're going to take a bet which decides the fate of the entire human civilization, you want to take the best bet, which in this case (we assume) was to risk living for only a million years instead of risking exploding right away.

Unless, of course, in the counterfactual you know you would've pressed the button even though you now don't. Rigging the lottery is a sneaky way out of the problem.

[-]Benya16y00

If you create your AI before you can infer from Omega's actions what the umpteenth digit of pi is, then I agree that you should create an AI that presses the button, even if the AI finds out (through Omega's actions) that the digit is in fact odd. This is because from your perspective when you create the AI, this kind of AI maximizes your expected utility (measured in humanity-years).

But if you create your AI after you can infer what the digit is (in the updated-after-your-comment version of my post, by observing that you exist and Alpha Centauri isn't purple), I argue that you should not create an AI that presses the button, because at that point, you know that's the losing decision. If you disagree, I don't yet understand why.
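The contrast between the two points in time can be sketched as an expected-value calculation in humanity-years. This is an illustration under stated assumptions: `HORIZON`, `PAYOFF`, `expected_years` and `best_action` are hypothetical names, and the horizon value is an arbitrary stand-in for "Earth is never destroyed".

```python
HORIZON = 1e9  # hypothetical humanity-years if Earth is never destroyed (assumption)

# Humanity-years counted from a million years ago, per (digit parity, AI action).
PAYOFF = {
    ("even", "press"): HORIZON,    # Omega predicts a press and spares Earth
    ("even", "not press"): 0.0,    # Omega destroyed Earth right away
    ("odd", "press"): 1e6,         # the doomsday device fires a million years later
    ("odd", "not press"): HORIZON,
}


def expected_years(action: str, p_even: float) -> float:
    """Expected humanity-years given the creator's probability that the digit is even."""
    return p_even * PAYOFF[("even", action)] + (1 - p_even) * PAYOFF[("odd", action)]


def best_action(p_even: float) -> str:
    return max(["press", "not press"], key=lambda a: expected_years(a, p_even))
```

Here `best_action(0.5)` models a creator who cannot yet infer the digit, and `best_action(0.0)` models a creator who already knows it is odd; the recommended action differs between the two.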

[-]Jonii16y00

If you create your AI before you can infer from Omega's actions what the umpteenth digit of pi is, then I agree that you should create an AI that presses the button, even if the AI finds out (through Omega's actions) that the digit is in fact odd.

If you can somehow figure it out, then yes, you shouldn't press the button. If you know that the simulated you would've known to press the button when you don't, you're no longer dealing with "take either a 50% chance of the world exploding right now vs. a 50% chance of the world exploding a million years from now", but with a much simpler "I offer to destroy the world, you can say yes or no". An updateless agent would naturally want to take the winning bet if gaining that information were somehow possible.

So, if you know which digit Omega used to decide his actions, and how, and you happen to know that digit, the bet you're taking is the simpler one, the one where you can simply answer 'yes' or 'no'. Observing that Earth has not been destroyed is not enough evidence though, because the simulated, non-conscious you would've observed roughly the same thing. That would change only if there were some sort of difference that you knew you could and would use in the simulation, like your knowledge of the umpteenth digit of pi, or the color of some object in the sky (we're assuming Omega tells you this much in both cases). This is about the fate of humanity; you should seriously be certain about what sort of bets you're taking.

[-]JGWeissman16y00

Why do you think that you should conclude that pushing the button is a losing decision upon observing evidence that the digit is odd, but the AI should not? Is a different epistemology and decision theory ideal for you than what is ideal for the AI?

[-]Benya16y00

Why do you think that you should conclude that pushing the button is a losing decision upon observing evidence that the digit is odd, but the AI should not? Is a different epistemology and decision theory ideal for you than what is ideal for the AI?

I think that the AI should be perfectly aware that it is a losing decision (in the sense that it should be able to conclude that it wipes out humanity with certainty), but I think that you should program it to make that decision anyway (by programming it to be an updateless decider, not by special-casing, obviously).

The reason that I think you should program it that way is that programming it that way maximizes the utility you expect when you program the AI, because you can only preserve humanity in one possible future if you make the AI knowingly destroy humanity in the other possible future.

I guess the short answer to your question is that I think it's sensible to discuss what a human should do, including how a human should build an AI, but not how "an AI should act" (in any other sense than how a human should build an AI to act); after all, a human might listen to advice, but a well-designed AI probably shouldn't.

If we're discussing the question how a human should build an AI (or modify themselves, if they can modify themselves), I think they should maximize their expected payoff and make the AI (themselves) updateless deciders. But that's because that's their best possible choice according to their knowledge at that point in time, not because it's the best possible choice according to timeless philosophical ideals. So I don't conclude that humans should make the choice that would have been their best possible bet a million years ago, but is terrible according to the info they in fact have now.

[-][anonymous]16y00

Let's take for granted that Earth existing for a million years before exploding is better than it exploding right away. Then, of course you should press the button. If you don't do that, that'd mean that every time this sort of thing happens, you fare, on average, worse than someone who doesn't update like you suggest. You should take the best bet, even acausally, especially if you're gambling with the fate of the entire world.

If there's some trick here why you should not press the button, I'm not seeing it.

[-]Stuart_Armstrong16y00

"you" could be a UDT agent. So does this example show a divergence between UDT and XDT applied to UDT?

[-]Benya16y30

Stuart, sorry I never replied to this; I wasn't sure what to say until thinking about it again when replying to Wei Dai just now.

I'd say that what's going on here is that UDT does not make the best choices given the information available at every point in time -- which is not a defect of UDT, it's a reflection of the fact that if at every point in time, you make what would be the best choice according to the info available at that point in time, you become time-inconsistent and end up with a bad payoff.

To bring this out, consider a Counterfactual Mugging scenario where you must pay $1 before the coin flip, plus $100 if the coin comes up tails, to win the $10,000 if the coin comes up heads. According to the info available before the flip, it's best to pay $1 now and $100 on tails. According to the info when the coin has come up tails, it's best to not pay up. So an algorithm making both of these choices in the respective situations would be a money-pump that pays $1 without ever getting anything in return.
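The money-pump claim can be made concrete with a small sketch. It assumes a fair coin and a perfect predictor that pays out on heads only to an agent who paid the $1 and would also pay the $100 on tails; the function name `expected_payoff` and its two boolean parameters are hypothetical.

```python
def expected_payoff(pay_upfront: bool, pay_on_tails: bool) -> float:
    """Expected dollars for a fixed policy in the modified Counterfactual Mugging.

    Assumption: the $100 is only ever paid by an agent who also paid the $1,
    and the $10,000 goes only to agents who follow both parts of the policy.
    """
    committed = pay_upfront and pay_on_tails
    upfront = -1.0 if pay_upfront else 0.0
    heads = 10_000.0 if committed else 0.0
    tails = -100.0 if committed else 0.0
    return upfront + 0.5 * heads + 0.5 * tails


consistent = expected_payoff(True, True)    # pays $1 and pays $100 on tails
money_pump = expected_payoff(True, False)   # best choice at each moment, taken separately
abstainer = expected_payoff(False, False)   # never pays anything
```

Under these assumptions the consistent policy has the highest expected payoff, while the policy that makes the locally best choice at each step loses its $1 and never collects.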

So my answer to your question would be, what this shows is a divergence between UDT applied to the info before the flip and UDT applied to the info after the flip -- and no, you can't have the best of both worlds...

[-]FAWS16y00

I'm wondering where this particular bit of insanity (from my perspective) is coming from. I assume that if Omega would have destroyed the solar system (changed from just the Earth, because trading off 1 million years of human history against the rest of the solar system didn't seem to be the point of the thought experiment) a million years ago if the AI would not press the button, and had made the button also destroy the solar system, you'd want the AI to press the button. Why should a 50% chance to be lucky change anything?

Would you also want the AI not to press the button if the "lucky" digit stayed constant, i.e. if Omega left the solar system alone in either case if the digit was even, destroyed the solar system a million years ago if the digit was not even and the AI would not press the button and made the button destroy the solar system if the digit was not even and the AI would press the button? If not, why do you expect the choice of the AI to affect the digit of pi? Ignorance does not have magic powers. You can't make any inference on the digit of pi from your existence because you don't have any more information than the hypothetical you whose actions determine your existence. Or do you rely on the fact that the hypothetical you isn't "really" (because it doesn't exist) conscious? In that case you probably also think you can safely two-box if the boxes are transparent, you see money in both boxes and Omega told you it doesn't use any conscious simulations. (you can't, btw, because consciousness doesn't have magic powers either).

[-]Benya16y30

I'm wondering where this particular bit of insanity (from my perspective) is coming from.

Well, let's see whether I can at least state my position clearly enough that you know what it is, even if you think it's insane :-)

Or do you rely on the fact that the hypothetical you isn't "really" (because it doesn't exist) conscious? In that case you probably also think you can safely two-box if the boxes are transparent, you see money in both boxes and Omega told you it doesn't use any conscious simulations. (you can't, btw, because consciousness doesn't have magic powers either).

My argument is that since I'm here, and since I wouldn't be if Omega destroyed the solar system a million years ago and nobody ever simulated me, I know that Omega didn't destroy the solar system. It seems that in what you've quoted above, that's what you're guessing my position is.

I'm not sure whether I have accepted timelessness enough to change myself so that I would one-box in Newcomb with transparent boxes. However, if I thought that I would two-box, and Omega tells me that it has, without using conscious simulations, predicted whether I would take both boxes (and save 1,001 lives) or only one box (and save 1,000 lives), and only in the latter case had filled both boxes, and now I'm seeing both boxes full in front of me, I should be very confused: one of my assumptions must be wrong. The problem posed seems inconsistent, like if you ask me what I would do if Omega offered me Newcomb's problem and as an aside told me that six times nine equals forty-two.

Perhaps this simplification of the original thought experiment will help make our respective positions clearer: Suppose that Omega appears and tells me that a million years ago, it (flipped a coin | calculated the umpteenth digit of pi), and (if it came up heads | was even), then it destroyed the solar system. It didn't do any simulations or other fancy stuff. In this case, I would conclude from the fact that I'm here that (the coin came up tails | the digit was odd).

I'm curious: would you say that I'm wrong to make that inference, because "consciousness isn't magic" (in my mind I don't think I'm treating it as such, of course), or does Omega making a prediction without actually simulating me in detail make a difference to you?

[-]FAWS16y20

Perhaps this simplification of the original thought experiment will help make our respective positions clearer: Suppose that Omega appears and tells me that a million years ago, it (flipped a coin | calculated the umpteenth digit of pi), and (if it came up heads | was even), then it destroyed the solar system. It didn't do any simulations or other fancy stuff. In this case, I would conclude from the fact that I'm here that (the coin came up tails | the digit was odd).

In this case this is unproblematic because there is no choice involved. But when the choice is entangled with the existence of the scenario/the one making the choice you can't simultaneously assume choice and existence, because your choice won't rewrite other things to make them consistent.

Simple example: Omega appears and tells you it predicted (no conscious simulations) you will give it $100. If you wouldn't, Omega would instead give you $100. Omega is never wrong. Should you give Omega $100? Of course not. Should you anticipate that Omega is wrong, or that some force will compel you, that lightning from the clear sky strikes you down before you can answer, that Omega disappears in a pink puff of logic, that you disappear in a pink puff of logic? It doesn't really matter, as long as you make sure you don't hand over $100. Personally I'd assume that I retroactively turn out not to exist because the whole scenario is only hypothetical (and of course my choice can't change me from a hypothetical person into a real person no matter what).

For you to get the $100 there needs to be a fact about what you would do in the hypothetical scenario of Omega predicting that you give it $100, and the only way for that fact to be what you want it to be is to actually act like you want the hypothetical to act. That means when confronted with apparent impossibility you must not draw any conclusions from the apparent contradiction that differentiate the situation from the hypothetical. Otherwise you will be stuck with the differentiated situation as the actual hypothetical. To get the benefit of hypothetically refusing to give $100 you must be ready to actually refuse to give $100 and disappear in a puff of logic. So far so uncontroversial, I assume.

Now, take the above and change it to Omega predicting you will give it $100 unless X is true. Nothing important changes, at all. You can't make X true or untrue by changing your choice. If X is "the sky is green" your choice will not change the color of the sky. If X is that the first digit of pi is even your choice will not change pi. If X is that you have a fatal heart disease you cannot cure yourself by giving Omega $100. Whether you already know about X doesn't matter, because ignorance doesn't have magical powers, even if you add consciousness to the mix.

[-]Benya16y10

Now, take the above and change it to Omega predicting you will give it $100 unless X is true. Nothing important changes, at all. You can't make X true or untrue by changing your choice.

Wait, are you thinking I'm thinking I can determine the umpteenth digit of pi in my scenario? I see your point; that would be insane.

My point is simply this: if your existence (or any other observation of yours) allows you to infer the umpteenth digit of pi is odd, then the AI you build should be allowed to use that fact, instead of trying to maximize utility even in the logically impossible world where that digit is even.

The goal of my thought experiment was to construct a situation like in Wei Dai's post, where if you lived two million years ago you'd want your AI to press the button, because it would give humanity a 50% chance of survival and a 50% chance of later death instead of a 50% chance of survival and a 50% chance of earlier death; I wanted to argue that despite the fact that you'd've built the AI that way two million years ago, you shouldn't today, because you don't want it to maximize probability in worlds you know to be impossible.

I guess the issue was muddled by the fact that my scenario didn't clearly rule out the possibility that the digit is even but you (the human AI creator) are alive because Omega predicted the AI would press the button. I can't offhand think of a modification of my original thought experiment that would take care of that problem and still be obviously analogous to Wei Dai's scenario, but from my perspective, at least, nothing would change in my argument if we add the following: if the digit is even, and Omega predicted that the AI would press the button and so didn't destroy the world, then Omega turned Alpha Centauri purple. Since Alpha Centauri isn't purple, you can conclude that the digit is odd. [Edit: changed the post to include that proviso.]

(But if you had built your AI two million years ago, you'd've programmed it in such a way that it would press the button even if it observes Alpha Centauri to be purple -- because then, you would really have to make the 50/50 decision that Wei Dai has in mind.)

[-]FAWS16y-20

Wait, are you thinking I'm thinking I can determine the umpteenth digit of pi in my scenario? I see your point; that would be insane.

My point is simply this: if your existence (or any other observation of yours) allows you to infer the umpteenth digit of pi is odd, then the AI you build should be allowed to use that fact, instead of trying to maximize utility even in the logically impossible world where that digit is even.

Actually you were: There are four possibilities:

1. The AI will press the button, the digit is even
2. The AI will not press the button, the digit is even, you don't exist
3. The AI will press the button, the digit is odd, the world will kaboom
4. The AI will not press the button, the digit is odd.

Updating on the fact that the second possibility is not true is precisely equivalent to concluding that if the AI does not press the button the digit must be odd, and ensuring that the AI does not means choosing the digit to be odd.
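The purely logical part of this step can be checked by enumerating the four possibilities. This is a minimal sketch with hypothetical names (`worlds`, `remaining`, `implication_holds`); it verifies only the material implication and is silent on whether the AI's choice has any influence on the digit, which is the point in dispute.

```python
from itertools import product

# All combinations of the AI's action and the parity of the digit.
worlds = list(product(["press", "not press"], ["even", "odd"]))

# Drop the second possibility: no press, even digit, creator does not exist.
remaining = [(action, digit) for action, digit in worlds
             if not (action == "not press" and digit == "even")]

# In every remaining world, "the AI does not press" implies "the digit is odd".
implication_holds = all(digit == "odd"
                        for action, digit in remaining
                        if action == "not press")
```

Three worlds remain after the update, and the world in which the AI presses and the digit is even is still among them.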

If you already know that the digit is odd independent from the choice of the AI the whole thing reduces to a high stakes counterfactual mugging (if the destruction by Omega if the digit is even depends on what the AI knowing the digit to be odd would do, otherwise there is no dilemma in the first place).

[-]Tyrrell_McAllister16y10

Updating on the fact that the second possibility is not true is precisely equivalent to concluding that if the AI does not press the button the digit must be odd, and ensuring that the AI does not means choosing the digit to be odd.

There is nothing insane about this, provided that it is properly understood. The resolution is essentially the same as the resolution of the paradox of free will in a classically-deterministic universe.

In a classically-deterministic universe, all of your choices are mathematical consequences of the universe's state 1 million years ago. And people often confuse themselves by thinking, "Suppose that my future actions are under my control. Well, I will choose to take a certain action if and only if certain mathematical propositions are true (namely, the propositions necessary to deduce my choice from the state of the universe 1 million years ago). Therefore, by choosing to take that action, I am getting to decide the truth-values of those propositions. But the truth-values of mathematical propositions are beyond my control, so my future actions must also be beyond my control."

I think that people here generally get that this kind of thinking is confused. Even if we lived in a classically-deterministic universe, we could still think of ourselves as choosing our actions without concluding that we get to determine mathematical truth on a whim.

Similarly, Benja's AI can think of itself as getting to choose whether to push the button without thereby implying that it has the power to modify mathematical truth.

[-]Benya16y20

Similarly, Benja's AI can think of itself as getting to choose whether to push the button without thereby thinking that it has the power to modify mathematical truth.

I think we're all on the same page about being able to choose some mathematical truths, actually. What FAWS and I think is that in the setup I described, the human/AI does not get to determine the digit of pi, because the computation of the digits of pi does not involve a computation of the human's choices in the thought experiment. [Unless of course by incredible mathematical coincidence, the calculation of digits of pi happens to be a universal computer, happens to simulate our universe, and by pure luck happens to depend on our choices just at the umpteenth digit. My math knowledge doesn't suffice to rule that possibility out, but it's not just astronomically but combinatorially unlikely, and not what any of us has in mind, I'm sure.]

[-]Benya16y00

I'll grant you that my formulation had a serious bug, but--

There are four possibilities:

The AI will press the button, the digit is even

The AI will not press the button, the digit is even, you don't exist

The AI will press the button, the digit is odd, the world will kaboom

The AI will not press the button, the digit is odd.

Updating on the fact that the second possibility is not true is precisely equivalent to concluding that if the AI does not press the button the digit must be odd

Yes, if by that sentence you mean the logical proposition (~AI presses button => digit is odd), also known as (digit odd \/ AI presses button).

and ensuring that the AI does not means choosing the digit to be odd.

I'll only grant that if I actually end up building an AI that does not press the button, and the digit is even, then Omega is a bad predictor, which would make the problem statement contradictory. Which is bad enough, but I don't think I can be accused of minting causality from logical implication signs...

In any case,

If you already know that the digit is odd independent from the choice of the AI the whole thing reduces to a high stakes counterfactual mugging

That's true. I think that's also what Wei Dai had in mind in the great filter post, http://lesswrong.com/lw/214/late_great_filter_is_not_bad_news/ (and not the ability to change Omega's coin to tails by not pressing the button!). My position is that you should not pay in counterfactual muggings whose counterfactuality was already known prior to your decision to become a timeless decision theorist, although you should program (yourself | your AI) to pay in counterfactual muggings you don't yet know to be counterfactual.

[-]Tyrrell_McAllister16y00

I agree. The way I think of it is this:

If you find yourself deciding contrary to UDT you may infer that either

1. your designer was stupider or crazier than you, or
2. your designer was smarter or less crazy than you.

In the first case, your designer failed to realize something that you would have realized, had you been in your designer's place. In the second case, your designer knew something that you would not have known in your designer's place.

For example, if you find yourself turning down Omega in Pascal's Mugging [ETA: I meant the Counterfactual Mugging], you may infer that either

1. your designer failed to design you in a way that you would have expected would maximize your utility, or
2. your designer somehow predicted, with sufficient confidence, how Omega's coin would land, and so designed you to do what is best in that case.

Of course, as a present-day human, you already knew that (1) is the case, not (2). You were designed by a stupid process of natural selection. You already knew that your designer didn't take into account the possibility of being confronted by something like Omega. It's true that other humans contributed some modest amount to your design, through education and the art of rationality. However, even these additions were made by people who were, for the most part, crazier or stupider than you are now.

So, finding that you are declining Omega, contrary to UDT, isn't even necessarily bad news. It probably just means that your designers lucked out more than they deserved, which would be good news.

But that doesn't change the fact that your designers were stupid, and that you ought to want to do better. You ought to modify yourself and any of your progeny to act more in accord with UDT in the future.

[-]za3k16y00

Just to be explicit: The first AI type, which is "time consistent", would probably press the button. The second AI type, which does worse in impossible worlds, and better in worlds we don't know are impossible, would not press the button.

[-]Benya16y00

Yes, that's what I meant.

[-]JGWeissman16y00

do you think the AI should press the button if it was the first digit of pi that Omega calculated?

If a UDT agent is presented with a counterfactual mugging based on uncertainty of a logical proposition, it should attempt to resolve the logical uncertainty and act accordingly. But, in the Least Convenient Possible World, Omega says, "I chose some logical proposition that I assigned .5 probability of being true, then computed if it is in fact true, and if it is, then I ..."

[-]Benya16y00

If a UDT agent is presented with a counterfactual mugging based on uncertainty of a logical proposition, it should attempt to resolve the logical uncertainty and act accordingly.

Ok, the intuition pump is problematic in that not only do you know what the first digit of pi is, it is also easy for the AI to calculate. Can you imagine a least convenient possible world in which there is a logical fact for Omega to use that you know the answer to, but that is not trivial for the AI to calculate? Would you agree that it makes sense to enter it into the AI's prior?

My point was that since you're consciously creating the AI, you know that Omega didn't destroy Earth, so you know that the umpteenth digit of pi is odd, and you should program that into the AI. (A'ight, perhaps the digit is in fact even and you're conscious only because you're a Boltzmann brain who's about to be destroyed, but let's assume that case away.)
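
As a rough sketch of the reasoning in this comment, one can enumerate the four combinations of digit parity and AI disposition from the post and keep only those consistent with what the creator observes. It takes the post's premises at face value (Omega's prediction matches the AI's actual disposition, and the creator's observations are reliable); the function and key names are hypothetical.

```python
from itertools import product


def outcome(digit: str, ai_presses: bool) -> dict:
    """World state described in the post for a digit parity and an AI disposition."""
    if digit == "even":
        # Omega acted a million years ago, based on its prediction of the AI:
        # predicted press -> Earth spared and Alpha Centauri turned purple;
        # predicted no press -> Earth destroyed.
        return {
            "earth_exists_at_creation": ai_presses,
            "alpha_centauri_purple": ai_presses,
            "solar_system_destroyed_later": False,
        }
    # Odd digit: nothing has happened yet; the button triggers the doomsday device.
    return {
        "earth_exists_at_creation": True,
        "alpha_centauri_purple": False,
        "solar_system_destroyed_later": ai_presses,
    }


# What the creator observes while building the AI.
consistent = [
    (digit, presses)
    for digit, presses in product(("even", "odd"), (True, False))
    if outcome(digit, presses)["earth_exists_at_creation"]
    and not outcome(digit, presses)["alpha_centauri_purple"]
]

digits_still_possible = {digit for digit, _ in consistent}
safe_dispositions = [
    presses
    for digit, presses in consistent
    if not outcome(digit, presses)["solar_system_destroyed_later"]
]
```

Only the odd-digit worlds survive the filter, and among those the only disposition that avoids destruction is not pressing.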

[-]Nick_Tarleton16y40

since you're consciously creating the AI, you know that Omega didn't destroy Earth

Omega's unconscious model of you also 'knows' this. The abstract computation that is your decision process doesn't have direct knowledge of whether or not it's instantiated by anything 'real' or 'conscious' (whatever those mean).

[-]Benya16y00

My intent when I said "never instantiated as a conscious being" was that Omega used some accurate statistical method of prediction that did not include a whole simulation of what you are experiencing right now. I agree that I can't resolve the confusion about what "conscious" means, but when considering Omega problems, I don't think it's going too far to postulate that Omega can use statistical models that predict very accurately what I'll do without that prediction leading to a detailed simulation of me.

Ok, I can't rigorously justify a fundamental difference between "a brain being simulated (and thus experiencing things)" and "a brain not actually simulated (and therefore not experiencing things)," so perhaps I can't logically conclude that Omega didn't destroy Earth even if its prediction algorithm doesn't simulate me. But it still seems important to me to work well if there is such a difference. (If there isn't, why should I care whether Omega "really" destroys Earth "a million years before my subjective now", if I go on experiencing my life the way I "only seem" to experience it now?)

[-]JGWeissman16y10

My intent when I said "never instantiated as a conscious being" was that Omega used some accurate statistical method of prediction that did not include a whole simulation of what you are experiencing right now.

The point is that the accurate statistical method is going to predict what the AI would do if it were created by a conscious human, so the decision theory cannot use the fact that the AI was created by a conscious human to discriminate between the two cases. It has equal strength beliefs in that fact in both cases, so the likelihood ratio is 1:1.
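
The 1:1 claim can be spelled out with the odds form of Bayes' rule: posterior odds equal prior odds times the likelihood ratio. The sketch below is only an illustration; the credence of 99/100 and the 1:1 prior are hypothetical numbers, and the function name is made up for the example.

```python
from fractions import Fraction


def posterior_odds(
    prior_odds: Fraction,
    p_evidence_given_a: Fraction,
    p_evidence_given_b: Fraction,
) -> Fraction:
    """Odds form of Bayes' rule: posterior odds = prior odds * likelihood ratio."""
    likelihood_ratio = p_evidence_given_a / p_evidence_given_b
    return prior_odds * likelihood_ratio


# Hypothetical numbers: the decision process is equally confident that it was
# "created by a conscious human" whether it is actually running or is only
# being predicted by Omega.
prior = Fraction(1, 1)
belief_in_either_case = Fraction(99, 100)
updated = posterior_odds(prior, belief_in_either_case, belief_in_either_case)
```

Because the evidence is equally likely in both cases, `updated` equals `prior`: the observation does not shift the odds at all.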

(Though it seems that if a method of prediction, without making any conscious people, accurately predicts what a person would do, because that person really would do the thing it predicted, then we are talking about p-zombies, which should not be possible. Perhaps this method can predict what sort of AI we would build, and what that AI would do, but not what we would say about subjective experience, though I would expect that subjective experience is part of the causal chain that causes us to build a particular AI, so that seems unlikely.)

[-]Benya16y00

The point is that the accurate statistical method is going to predict what the AI would do if it were created by a conscious human, so the decision theory cannot use the fact that the AI was created by a conscious human to discriminate between the two cases. It has equal strength beliefs in that fact in both cases, so the likelihood ratio is 1:1.

I think we're getting to the heart of the matter here, perhaps, although I'm getting worried about all the talk about consciousness. My argument is that when you build an AI, you should allow yourself to take into account any information you know to be true (knew when you decided to be a timeless decider), even if there are good reasons that you don't want your AI to decide timelessly and, at some points in the future, make decisions optimizing worlds it at this point 'knows' to be impossible. I think it's really only a special case that if you're conscious, and you know you wouldn't exist anywhere in space-time as a conscious being if a certain calculation came out a certain way, then the ship has sailed, the calculation is in your "logical past", and you should build your AI so that it can use the fact that the calculation does not come out that way.

Though it seems that if a method of prediction, without making any conscious people, accurately predicts what a person would do, because that person really would do the thing it predicted, then we are talking about p-zombies, which should not be possible.

The person who convinced me of this [unless I misunderstood them] argued that there's no reason to assume that there can't be calculations coarse enough that they don't actually simulate a brain, yet specific enough to make some very good predictions about what a brain would do; I think they also argued that humans can be quite good at making predictions (though not letter-perfect predictions) about what other humans will say about subjective experience, without actually running an accurate conscious simulation of the other human.

[-]rwallace16y00

calculations coarse enough that they don't actually simulate a brain, yet specific enough to make some very good predictions about what a brain would do

Maybe, but when you're making mathematical arguments, there is a qualitative difference between a deterministically accurate prediction and a merely "very good" one. In particular, for any such shortcut calculation, there is a way to build a mind such that the shortcut calculation will always give the wrong answer.
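
One way to read this claim is as a diagonal construction: a mind that runs the shortcut calculation on itself and then does the opposite. The sketch below assumes the shortcut always terminates and that the mind has access to it; both are assumptions made for illustration, and the two example predictors are hypothetical.

```python
from typing import Callable

Action = bool                       # True = press the button
Agent = Callable[[], Action]
Predictor = Callable[[Agent], Action]


def make_contrarian(predictor: Predictor) -> Agent:
    """Build an agent that consults the predictor about itself and does the opposite."""
    def agent() -> Action:
        return not predictor(agent)
    return agent


# Two hypothetical "shortcut" predictors that do not simulate the agent.
def always_predict_press(_agent: Agent) -> Action:
    return True


def predict_from_name(agent: Agent) -> Action:
    # A coarse heuristic based on a surface feature of the agent.
    return len(agent.__name__) % 2 == 0


# Each entry records whether the predictor got its own contrarian agent wrong.
mispredicted = []
for predictor in (always_predict_press, predict_from_name):
    contrarian = make_contrarian(predictor)
    mispredicted.append(contrarian() != predictor(contrarian))
```

Every deterministic shortcut of this kind is wrong about its own contrarian agent. A predictor that fully simulated the agent would not terminate here, since the agent in turn calls the predictor.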

If you're writing a thought experiment that starts with "suppose... Omega appears," you're doing that because you're making an argument that relies on deterministically accurate prediction. If you find yourself having to say "never simulated as a conscious being" in the same thought experiment, then the argument has failed. If there's an alternative argument that works with merely "very good" predictions, then by all means make it - after deleting the part about Omega.

[-]JGWeissman16y10

Ok, the intuition pump is problematic in that not only do you know what the first digit of pi is, it is also easy for the AI to calculate.

Perhaps I wasn't clear. I meant that Omega does not actually tell you what logical proposition it used. The phrase "some logical proposition" is literally what Omega says, it is not a placeholder for something more specific. All you have to go on is that of the things that Omega believes with probability .5, on average half of them are actually true.

Can you imagine a least convenient possible world in which there is a logical fact for Omega to use that you know the answer to, but that is not trivial for the AI to calculate? Would you agree that it makes sense to enter it into the AI's prior?

No. A properly designed AGI should be able to figure out any logical fact that I know.

My point was that ...

My point was that one particular argument you made does not actually support your point.

[-]Unknowns16y20

I've given such a logical fact before.

"After thinking about it for a sufficiently long time, the AI at some time or other will judge this statement to be false."

This might very well be a logical fact because its truth or falsehood can be determined from the AI's programming, something quite logically determinate. But it is quite difficult for the AI to discover the truth of the matter.

[-]Benya16y00

My point was that one particular argument you made does not actually support your point.

Ok, fair enough, I guess. I still think you're not assuming the least convenient possible world; perhaps some astrophysical observation of yours that isn't available to the AI allows you to have high confidence about some digits of Chaitin's constant. But that's much more subtle than what I had in mind when writing the post, so thanks for pointing that out.

Perhaps I wasn't clear. I meant that Omega does not actually tell you what logical proposition it used.

Ok, I misunderstood, sorry. I don't understand the point you were making there, then. My intent was to use a digit large enough that the AI cannot compute it in the time Omega is allowing it; I don't see any difference between your version and mine, then?

[-]JGWeissman16y00

perhaps some astrophysical observation of yours that isn't available to the AI

The best approach I know now for constructing an FAI is CEV. An AI that can pull that off should be able to also access any astrophysical data I possess. I am not sure what the point would be if it didn't. The expected utility of programming the FAI to be able to figure this stuff out is much higher than building it a giant lookup table of stuff I know, unless I had magical advance knowledge that some particular fact that I know will be incredibly useful to the FAI.

My intent was to use a digit large enough that the AI cannot compute it in the time Omega is allowing it; I don't see any difference between your version and mine, then?

Yes, there is no difference, given that you have a sufficiently large digit. The reason I brought up my version is so that you don't have to worry about computing the truth value of the logical proposition as a strategy, as you don't even know which logical proposition was used.
