Less Wrong is a community blog devoted to refining the art of human rationality. Please visit our About page for more information.

Comment author: drnickbone 22 April 2014 08:22:29AM 0 points [-]

One issue here is that worlds with an "almost-friendly" AI (one whose friendliness was botched in some respect) may end up looking like siren or marketing worlds.

In that case, worlds as bad as sirens will be rather too common in the search space (because AIs with botched friendliness are more likely than AIs with true friendliness) and a satisficing approach won't work.

Comment author: CellBioGuy 05 April 2014 10:48:24PM *  1 point [-]

I just reject utilitarianism on the grounds that you cannot actually compare or aggregate utility between two agents (their utilities being not actually comparable on the same axis, or alternately being in 'different units'), and on the grounds that human behavior does not satisfy the logical axioms required for us to be said to have a utility function.

Comment author: drnickbone 10 April 2014 09:23:56PM 0 points [-]

Well you can make such comparisons if you allow for empathic preferences (imagine placing yourself in someone else's position, and ask how good or bad that would be, relative to some other position). Also the fact that human behavior doesn't perfectly fit a utility function is not in itself a huge issue: just apply a best fit function (this is the "revealed preference" approach to utility).

Ken Binmore has a rather good paper on this topic, see here.
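To make the "best fit" idea concrete, here is a toy sketch. It assumes we have observed some pairwise choices that are not perfectly consistent, and it searches for the ranking that contradicts the fewest of them; an ordinal utility is then read off that ranking. The option names, the choice data and the function `best_fit_ranking` are all invented for illustration, and this is not claimed to be Binmore's method.

```python
from itertools import permutations

def best_fit_ranking(options, choices):
    """Return (ranking, violations): the ordering of options (best first)
    that contradicts the fewest observed pairwise choices."""
    best, best_violations = None, None
    for ranking in permutations(options):
        position = {opt: i for i, opt in enumerate(ranking)}
        # A choice (chosen, rejected) is violated when the ranking places
        # the rejected option above the chosen one.
        violations = sum(1 for chosen, rejected in choices
                         if position[chosen] > position[rejected])
        if best_violations is None or violations < best_violations:
            best, best_violations = ranking, violations
    return best, best_violations

# Hypothetical data: an intransitive cycle (tea > coffee > water > tea),
# plus two repeated observations that break the tie.
options = ["tea", "coffee", "water"]
choices = [("tea", "coffee"), ("coffee", "water"), ("water", "tea"),
           ("tea", "coffee"), ("coffee", "water")]

ranking, violations = best_fit_ranking(options, choices)
# Ordinal utility: higher number means more preferred.
utility = {opt: len(ranking) - i for i, opt in enumerate(ranking)}
print(ranking, violations, utility)
```

The behavior here does not satisfy the axioms (it contains a cycle), yet a fitted utility still exists; it just fails to explain one of the five observations. Brute force over permutations is only sensible for a handful of options.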

Comment author: TsviBT 01 April 2014 03:15:26PM 1 point [-]

No April Fool here.

Comment author: drnickbone 01 April 2014 04:54:20PM 2 points [-]

OK, I also got a "non-cheat" solution: unfortunately, it is non-constructive and uses the Nkvbz bs Pubvpr, so it still feels like a bit of a cheat. Is there a solution which doesn't rely on that (or is it possible to show there is no solution in such a case?)

Comment author: drnickbone 01 April 2014 04:23:49PM 3 points [-]

Oh dear, I suppose that rules out other "cheats" then: such as prisoner n guessing after n seconds. At any point in time, only finitely many have guessed, so only finitely many have guessed wrong. Hence the prisoners can never be executed. (Though they can never be released either.)

Comment author: TsviBT 01 April 2014 02:38:03AM 8 points [-]

Puzzle:

A countable infinity of prisoners are placed in a room so that they can all see each other, but are not allowed to communicate in any way and cannot see their own heads. The warden places on the head of each prisoner a red hat or a black hat. The prisoners will each guess the color of their own hat. They will all be released if at most finitely many of them guess incorrectly, and they will all be killed otherwise. The prisoners know all of this, and may collude beforehand. The prisoners are all distinguishable - think of them as being numbered 1,2,3,.... Again, once the warden has placed the hats, the prisoners receive no information other than the color of their fellow prisoners' hats. Prove that there is a strategy that guarantees a win for the prisoners.

(On my honor, this is possible.)

Comment author: drnickbone 01 April 2014 07:46:54AM 2 points [-]

I suspect an April Fool:

Cevfbare a+1 gnxrf gur ung sebz cevfbare a naq chgf vg ba uvf bja urnq. Gura nyy cevfbaref (ncneg sebz cevfbare 1) thrff gur pbybe pbeerpgyl!

Comment author: Squark 25 March 2014 07:10:39PM *  1 point [-]

In terms of expected utility, it is better for "you" (that is, all linked instances of you) to take the gamble, even if the vast majority of light-cones don't contain simulations.

It is not the case if the money can be utilized in a manner with long term impact.

No it isn't meaningless: chances simply become operationalised in terms of bets, or other decisions with variable payoff.

This doesn't give an unambiguous recipe to compute probabilities since it depends on how the results of the bets are accumulated to influence utility. An unambiguous recipe cannot exist since it would have to give precise answers to ambiguous questions such as: if there are two identical simulations of you running on two computers, should they be counted as two copies or one?

Incidentally, in terms of original modal realism (due to David Lewis), "you" are a concrete unique individual who inhabits exactly one world, but it is unknown which one. Other versions of "you" are your "counterparts". It is usually not possible to group all your counterparts together and treat them as a single (distributed) being, YOU, because the counterpart relation is not an equivalence relation (it doesn't partition possible people into neat equivalence classes). As one example, imagine a long chain of possible people whose experiences and memories are indistinguishable from immediate neighbours in the chain (and they are counterparts of their neighbours). But there is a cumulative "drift" along the chain, so that the ends are very different from each other (and not counterparts).

UDT doesn't seem to work this way. In UDT, "you" are not a physical entity but an abstract decision algorithm. This abstract decision algorithm is correlated to different extent with different physical entities in different worlds. This leads to the question of whether some algorithms are more "conscious" than others. I don't think UDT currently has an answer for this, but neither do other frameworks.

You weren't born believing in the many worlds interpretation (or in modal realism) and if you are a normal human being you most likely regarded it as quite outlandish at some point. Then some line of evidence or reasoning caused you to shift your opinion (e.g. because it seemed simpler, or overall a better explanation for physical evidence). If it shifted one way, then considering other evidence could shift it back again.

If we think of knowledge as a layered pie, with lower layers corresponding to knowledge which is more "fundamental", then somewhere near the bottom we have paradigms of reasoning such as Occam's razor / Solomonoff induction and UDT. Below them lie "human reasoning axioms" which are something we cannot formalize due to our limited introspection ability. In fact the paradigms of reasoning are our current best efforts at formalizing this intuition. However, when we build an AI we need to use something formal, we cannot just transfer our reasoning axioms to it (at least I don't know how to do it; meseems every way to do it would be "ingenuine" since it would be based on a formalism). So, for the AI, UDT (or whatever formalism we use) is the lowest layer. Maybe it's a philosophical limitation of any AGI, but I doubt it can be overcome and I doubt it's a good reason not to build an (F)AI.

Comment author: drnickbone 28 March 2014 10:26:56AM 0 points [-]

As one example, imagine a long chain of possible people whose experiences and memories are indistinguishable from immediate neighbours in the chain (and they are counterparts of their neighbours). But there is a cumulative "drift" along the chain, so that the ends are very different from each other (and not counterparts).

UDT doesn't seem to work this way. In UDT, "you" are not a physical entity but an abstract decision algorithm. This abstract decision algorithm is correlated to different extent with different physical entities in different worlds. This leads to the question of whether some algorithms are more "conscious" than others. I don't think UDT currently has an answer for this, but neither do other frameworks.

I think it works quite well with "you" as a concrete entity. Simply use the notion that "your" decisions are linked to those of your counterparts (and indeed, to other agents), such that if you decide in a certain way in given circumstances, your counterparts will decide that way as well. The linkage will be very tight for neighbours in the chain, but diminishing gradually with distance, and such that the ends of the chain are not linked at all. This - I think - addresses the problem of trying to identify what algorithm you are implementing, or partitioning possible people into those who are running "the same" algorithm.

Comment author: drnickbone 26 March 2014 09:11:37AM *  1 point [-]

It is not the case if the money can be utilized in a manner with long term impact.

OK, I was using $ here as a proxy for utils, but technically you're right: the bet should be expressed in utils (as for the general definition of a chance that I gave in my comment). Or if you don't know how to bet in utils, use another proxy which is a consumptive good and can't be invested (e.g. chocolate bars or vouchers for a cinema trip this week). A final loop-hole is the time discounting: the real versions of you mostly live earlier than the sim versions of you, so perhaps a chocolate bar for the real "you" is worth many chocolate bars for sim "you"s? However we covered that earlier in the thread as well: my understanding is that your effective discount rate is not high enough to outweigh the huge numbers of sims.

An unambiguous recipe cannot exist since it would have to give precise answers to ambiguous questions such as: if there are two identical simulations of you running on two computers, should they be counted as two copies or one?

Well this is your utility function, so you tell me! Imagine a hacker is able to get into the simulations and replace pleasant experiences by horrible torture. Does your utility function care twice as much if he hacks both simulations versus hacking just one of them? (My guess is that it does). And this style of reasoning may cover limit cases like a simulation running on a wafer which is then cut in two (think about whether the sims are independently hackable, and how much you care.)

Comment author: Squark 23 March 2014 05:25:05PM 1 point [-]

So: if a bet is offered that you are a sim (in some form of computronium) and it becomes possible to test that (and so decide the bet one way or another), you would bet heavily on being a sim?

It depends on the stakes of the bet.

But on the off-chance that you are not a sim, you're going to make decisions as if you were in the real world, because those decisions (when suitably generalized across all possible light-cones) have a huge utility impact. Is that right?

It's not an "off-chance". It is meaningless to speak of the "chance I am a sim": some copies of me are sims, some copies of me are not sims.

This all seems to be part of a general problem with asking UDT to model selfish (or self-interested) preferences. Perhaps it can't.

It surely can: just give more weight to humans of a very particular type ("you").

What if modal realism is wrong? What if there is, in fact, evidence that it is wrong, because the world as we see it is not what we should expect to see if it was right?

Subjective expectations are meaningless in UDT. So there is no "what we should expect to see".

Or does a UDT agent have to stay dogmatically committed to modal realism in the face of whatever it sees? That doesn't seem very rational does it?

Does it have to stay dogmatically committed to Occam's razor in the face of whatever it sees? If not, how would it arrive at a replacement without using Occam's razor? There must be some axioms at the basis of any reasoning system.

Comment author: drnickbone 25 March 2014 12:59:05PM 1 point [-]

So: if a bet is offered that you are a sim (in some form of computronium) and it becomes possible to test that (and so decide the bet one way or another), you would bet heavily on being a sim?

It depends on the stakes of the bet.

I thought we discussed an example earlier in the thread? The gambler pays $1000 if not in a simulation; the bookmaker pays $1 if the gambler is in a simulation. In terms of expected utility, it is better for "you" (that is, all linked instances of you) to take the gamble, even if the vast majority of light-cones don't contain simulations.

It is meaningless to speak of the "chance I am a sim": some copies of me are sims, some copies of me are not sims

No it isn't meaningless: chances simply become operationalised in terms of bets, or other decisions with variable payoff. The "chance you are a sim" becomes equal to the fraction of a util you are prepared to pay for a betting slip which pays out one util if you are a sim, and pays nothing otherwise. (Lots of linked copies of "you" take the gamble; some win, some lose.)
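A rough numerical sketch of the two points above, under assumptions that are mine and not settled in the thread: every linked copy gets equal weight, utility is linear in the stake, and the counts of copies are made up. The function names `fair_price` and `gamble_value` are hypothetical.

```python
def fair_price(sim_weight, real_weight):
    """Break-even price (in utils) of a slip that pays 1 util to each
    copy that is a sim and nothing otherwise, summed over linked copies."""
    return sim_weight / (sim_weight + real_weight)

def gamble_value(sim_weight, real_weight, win=1.0, loss=1000.0):
    """Total weighted payoff when every linked copy takes the gamble:
    sim copies each win `win`, non-sim copies each lose `loss`."""
    return sim_weight * win - real_weight * loss

# Hypothetical numbers: 1000 equally weighted light-cones, each with one
# non-sim copy, and only ONE light-cone containing any sims at all.
light_cones = 1000
real_copies = light_cones * 1
sim_copies = 10_000_000  # all located in a single light-cone

price = fair_price(sim_copies, real_copies)
value = gamble_value(sim_copies, real_copies)
print(price, value)
```

With these made-up numbers the gamble has positive total value even though 999 of the 1000 light-cones contain no simulations, and the "chance you are a sim" comes out as the price of the slip. The break-even point for the $1000 versus $1 gamble is a sim-to-real weight ratio of 1000 to 1.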

Incidentally, in terms of original modal realism (due to David Lewis), "you" are a concrete unique individual who inhabits exactly one world, but it is unknown which one. Other versions of "you" are your "counterparts". It is usually not possible to group all your counterparts together and treat them as a single (distributed) being, YOU, because the counterpart relation is not an equivalence relation (it doesn't partition possible people into neat equivalence classes). As one example, imagine a long chain of possible people whose experiences and memories are indistinguishable from immediate neighbours in the chain (and they are counterparts of their neighbours). But there is a cumulative "drift" along the chain, so that the ends are very different from each other (and not counterparts).
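The chain example can be checked mechanically. In this toy sketch, possible people are just positions 0..9 along the chain, and two people count as counterparts when their positions differ by at most one step; both the numeric encoding and the `tolerance` parameter are assumptions made for illustration.

```python
def is_counterpart(a, b, tolerance=1):
    """Toy counterpart relation: positions in the chain that differ by
    at most `tolerance` steps are counterparts."""
    return abs(a - b) <= tolerance

chain = range(10)

reflexive = all(is_counterpart(a, a) for a in chain)
symmetric = all(is_counterpart(a, b) == is_counterpart(b, a)
                for a in chain for b in chain)
# Transitivity requires: a~b and b~c implies a~c, for every triple.
transitive = all(is_counterpart(a, c)
                 for a in chain for b in chain for c in chain
                 if is_counterpart(a, b) and is_counterpart(b, c))

print(reflexive, symmetric, transitive)
```

The relation is reflexive and symmetric but not transitive (0 is a counterpart of 1, and 1 of 2, but 0 is not a counterpart of 2), so it cannot partition the chain into equivalence classes.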

Subjective expectations are meaningless in UDT. So there is no "what we should expect to see".

A subjective expectation is rather like a bet: it is a commitment of mental resource to modelling certain lines of future observations (and preparing decisions for such a case). If you spend most of your modelling resource on a scenario which doesn't materialise, this is like losing the bet. So it is reasonable to talk about subjective expectations in UDT; just model them as bets.

Does it have to stay dogmatically committed to Occam's razor in the face of whatever it sees? If not, how would it arrive at a replacement without using Occam's razor?

Occam's razor here is just a method for weighting hypotheses in the prior. It is only "dogmatic" if the prior assigns weights in such an unbalanced way that no amount of evidence will ever shift the weights. If your prior had truly massive weight (e.g. infinite weight) in favour of many worlds, then it would never shift, so that looks dogmatic. But to be honest, I rather doubt this. You weren't born believing in the many worlds interpretation (or in modal realism) and if you are a normal human being you most likely regarded it as quite outlandish at some point. Then some line of evidence or reasoning caused you to shift your opinion (e.g. because it seemed simpler, or overall a better explanation for physical evidence). If it shifted one way, then considering other evidence could shift it back again.
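A small sketch of the difference between a heavily weighted prior and a dogmatic one, using plain Bayesian updating. The likelihoods (each observation is nine times more probable if the hypothesis is false) and the helper names are hypothetical; "infinite weight" is modelled here as a prior probability of exactly 1.

```python
from fractions import Fraction

def update(prior, likelihood_if_true, likelihood_if_false):
    """One Bayesian update of P(hypothesis) on a single observation."""
    numerator = prior * likelihood_if_true
    return numerator / (numerator + (1 - prior) * likelihood_if_false)

def after_evidence(prior, n, l_true=Fraction(1, 10), l_false=Fraction(9, 10)):
    """Apply n independent observations that each favour 'false' 9:1."""
    p = prior
    for _ in range(n):
        p = update(p, l_true, l_false)
    return p

strong = after_evidence(Fraction(999, 1000), 10)  # heavy but finite weight
dogmatic = after_evidence(Fraction(1), 10)        # "infinite" weight

print(float(strong), float(dogmatic))
```

The prior of 999/1000 collapses to well under one in a thousand after ten such observations, while the prior of 1 does not move at all, whatever is observed.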

Comment author: Squark 21 March 2014 08:31:04AM 0 points [-]

So I suspect that this approach gives a weighting rather like 2^-K(s,t) for light-cones which are offset from the Big Bang.

In some sense it does, but we must be wary of technicalities. In initial singularity models I'm not sure it makes sense to speak of a "light cone with vertex in the singularity", and it certainly doesn't make sense to speak of a privileged point in space. In eternal inflation models there is no singularity, so it might make sense to speak of the "Big Bang" point in space-time; however, it is slightly "fuzzy".

I disagree. If models like MWI and/or eternal inflation are taken seriously, then they imply the existence of a huge number of civilisations (spread across multiple branches or multiple inflating regions), and a huge number of expanded civilisations (unless the chance of expansion is exactly zero). Observers should then predict that they will be in one of the expanded civilisations. (Or in UDT terms, they should take bets that they are in such a civilisation). Since our observations are not like that, this forces us into simulation conclusions (most people making our observations are in sims, so that's how we should bet).

I don't think it does. If we are not in a sim, our actions have potentially huge impact since they can affect the probability and the properties of a hypothetical expanded post-human civilization.

Incidentally, there are versions of inflation and many worlds which don't run into that problem. You can always take a "local" view of inflation (see for instance these papers), and a "modal" interpretation of many worlds (see here). Combined, these views imply that all that actually exists is within one branch of a wave function constructed over one observable universe.

In UDT it doesn't make sense to speak of what "actually exists". Everything exists, you just assign different weights to different parts of "everything" when computing utility. The "U" in UDT is for "updateless" which means that you don't update on being in a certain branch of the wavefunction to conclude other branches "don't exist", otherwise you lose in counterfactual mugging.

Comment author: drnickbone 21 March 2014 05:29:09PM *  1 point [-]

I don't think it does. If we are not in a sim, our actions have potentially huge impact since they can affect the probability and the properties of a hypothetical expanded post-human civilization.

So: if a bet is offered that you are a sim (in some form of computronium) and it becomes possible to test that (and so decide the bet one way or another), you would bet heavily on being a sim? But on the off-chance that you are not a sim, you're going to make decisions as if you were in the real world, because those decisions (when suitably generalized across all possible light-cones) have a huge utility impact. Is that right?

The problem I have is this only works if your utility function is very impartial (it is dominated by "pro bono universo" terms, rather than "what's in it for me" or "what's in it for us" terms). Imagine for instance that you work really hard to ensure a positive singularity, and succeed. You create a friendly AI, it starts spreading, and gathering huge amounts of computational resources... and then our simulation runs out of memory, crashes, and gets switched off. This doesn't sound like it is a good idea "for us" does it?

This all seems to be part of a general problem with asking UDT to model selfish (or self-interested) preferences. Perhaps it can't. In which case UDT might be a great decision theory for saints, but not for regular human beings. And so we might not want to program UDT into our AI in case that AI thinks it's a good idea to risk crashing our simulation (and killing us all in the process).

In UDT it doesn't make sense to speak of what "actually exists". Everything exists, you just assign different weights to different parts of "everything" when computing utility.

I've remarked elsewhere that UDT works best against a background of modal realism, and that's essentially what you've said here. But here's something for you to ponder. What if modal realism is wrong? What if there is, in fact, evidence that it is wrong, because the world as we see it is not what we should expect to see if it was right? Isn't it maybe a good idea to then - er - update on that evidence?

Or does a UDT agent have to stay dogmatically committed to modal realism in the face of whatever it sees? That doesn't seem very rational does it?

Comment author: Squark 19 March 2014 08:17:50PM 0 points [-]

I was assuming that the "vertex" of your light cone is situated at or shortly after the Big Bang (e.g. maybe during the first few minutes of nucleosynthesis).

No, it can be located absolutely anywhere. However you're right that the light cones with vertex close to the Big Bang will probably have large weight due to low K-complexity.

...given that a super-strong future filter looks very unlikely, most of the probability will be concentrated on models where there are only a few civilisations to start with.

This looks correct, but it is different from your initial argument. In particular there's no reason to believe MWI is wrong or anything like that.

...in short I believe your summed discounted utility is diverging (or in any case dominated by the Boltzmann Brains).

It is guaranteed to converge, and it seems to be pretty harsh on BBs as well. Here is how it works. Every "universe" is an infinite sequence of bits encoding a future light cone. The weight of the sequence is 2^{-K-complexity}. More precisely, I sum over all programs producing such sequences and give weight 2^{-length} to each. Since the sum of 2^{-length} over all programs is 1, I get a well-defined probability measure. Each sequence gets assigned a utility by a computable function that looks like an integral over space-time with a temporal discount. The temporal discount here can be fast e.g. exponential. So the utility function is bounded and its expectation value converges. However the effective temporal discount is slow, since for every universe its sub-light-cones are also within the sum. Nevertheless it's not so slow that BBs come out ahead. If you put the vertex of the light cone at any given point (e.g. time 4^^^^4) there will be few BBs within the fast cutoff time, and most far points are suppressed due to high K-complexity.
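Both convergence claims can be illustrated on a toy scale. The sketch below uses a made-up, complete prefix-free set of four "programs" (bit strings), for which the 2^{-length} weights sum to exactly 1; for prefix-free sets in general, Kraft's inequality only guarantees a sum of at most 1. It also shows that a per-step utility bounded by 1, discounted exponentially, stays below 1/(1 - discount). The program set, the discount of 1/2 and the function names are all assumptions for illustration.

```python
from fractions import Fraction

def is_prefix_free(programs):
    """True if no program is a proper prefix of another."""
    return not any(p != q and q.startswith(p)
                   for p in programs for q in programs)

def total_weight(programs):
    """Sum of 2^{-length} over all programs."""
    return sum(Fraction(1, 2 ** len(p)) for p in programs)

def discounted_utility(per_step_utility, discount):
    """Sum of u_t * discount^t over the given time steps."""
    return sum(u * discount ** t for t, u in enumerate(per_step_utility))

# Hypothetical toy "programs": a complete prefix-free code.
programs = ["0", "10", "110", "111"]
weight = total_weight(programs)

# Per-step utility at its maximum of 1 for 50 steps, discount 1/2.
bounded = discounted_utility([1] * 50, Fraction(1, 2))
print(weight, float(bounded))
```

Since each universe's utility is bounded and the weights form a probability measure, the expected utility is a weighted average of bounded numbers and so must converge.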

Comment author: drnickbone 20 March 2014 03:01:57PM *  0 points [-]

No, it can be located absolutely anywhere. However you're right that the light cones with vertex close to the Big Bang will probably have large weight due to low K-complexity.

Ah, I see what you're getting at. If the vertex is at the Big Bang, then the shortest programs basically simulate a history of the observable universe. Just start from a description of the laws of physics and some (low entropy) initial conditions, then read in random bits whenever there is an increase in entropy. (For technical reasons the programs will also need to simulate a slightly larger region just outside the light cone, to predict what will cross into it).

If the vertex lies elsewhere, the shortest programs will likely still simulate starting from the Big Bang, then "truncate" i.e. shift the vertex to a new point (s, t) and throw away anything outside the reduced light cone. So I suspect that this approach gives a weighting rather like 2^-K(s,t) for light-cones which are offset from the Big Bang. Probably most of the weight comes from programs which shift in t but not much in s.

The temporal discount here can be fast e.g. exponential.

That's what I thought you meant originally: this would ensure that the utility in any given light-cone is bounded, and hence that the expected utility converges.

...given that a super-strong future filter looks very unlikely, most of the probability will be concentrated on models where there are only a few civilisations to start with.

This looks correct, but it is different from your initial argument. In particular there's no reason to believe MWI is wrong or anything like that.

I disagree. If models like MWI and/or eternal inflation are taken seriously, then they imply the existence of a huge number of civilisations (spread across multiple branches or multiple inflating regions), and a huge number of expanded civilisations (unless the chance of expansion is exactly zero). Observers should then predict that they will be in one of the expanded civilisations. (Or in UDT terms, they should take bets that they are in such a civilisation). Since our observations are not like that, this forces us into simulation conclusions (most people making our observations are in sims, so that's how we should bet). The problem is still that there is a poor fit to observations: yes we could be in a sim, and it could look like this, but on the other hand it could look like more or less anything.

Incidentally, there are versions of inflation and many worlds which don't run into that problem. You can always take a "local" view of inflation (see for instance these papers), and a "modal" interpretation of many worlds (see here). Combined, these views imply that all that actually exists is within one branch of a wave function constructed over one observable universe. These "cut-down" interpretations make either the same physical predictions as the "expansive" interpretations, or better predictions, so I can't see any real reason to believe in the expansive versions.
