the post anthropics and the universal distribution (recommended dependencies: 1, 2, 3, 4, 5) tries to unify anthropics with the notion of a universal distribution (whether that be the solomonoff prior or what i'll call the "levin prior") by splitting hypotheses about a reasoner's location among the set of possible worlds into a "world and claw" pair. the "world" part is a program hypothesizing which world you inhabit, as opposed to counterfactual worlds, and the "claw" part is a program that locates you within that world.
i've proposed before sticking to just a universal program as the world hypothesis. that is, the "world" is a fixed program, and all of the complexity is in figuring out the "claw" — epistemology, the work of finding out how stuff around you works, becomes all claw, no world. in this post, i expand on this view and explore some ramifications, notably for formal aligned AI design.
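to make the contrast concrete, in my own notation (which may not match the original post's): the world-and-claw view gives an observer-locating hypothesis a weight summed over pairs of programs, while the all-claw view fixes a single universal program $U$ and sums over claws alone, roughly

$$\Pr(o) \;\propto \sum_{(w,c)\,:\,c(w)=o} 2^{-|w|-|c|} \qquad \text{versus} \qquad \Pr(o) \;\propto \sum_{c\,:\,c(U)=o} 2^{-|c|}$$

where $|p|$ is the length of program $p$. in the second form, whatever structure would have gone into the world program just becomes part of the claw's job of locating you inside $U$.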
one consequence of doing this is that epistemology becomes all location, no counterfactuals — nothing is ever ruled out, all programs are considered instantiated in the same qualitative sense. the following space-time-realities are all real in the same qualitative way (though not necessarily to the same quantitative degree!):
where you are now, but on a different day.
a different country.
the many-worlds everett branch where an electron you just measured has a different spin.
middle earth from lord of the rings. that one might be stretching it depending on your interpretations of things like magic, but the universal distribution is capable of a lot of stuff.
(see also: the ultimate meta mega crossover)
if you ignore acausal weirdness, these worlds are all causally separate from ours. we didn't make lord of the rings real — we just wrote a bunch of text, and there happens to be a world out there, real in the same way as ours, that we'd consider to be accurately described by that text. but like a library of babel of worlds, all other variants that we don't describe are also real. the only thing we can do to affect which worlds get more juice than others is choosing to compute some of them but not others. and our making decisions in this world and affecting the way it continues to run "normally" is just a particular case of this, just like someone continuing to live is a particular case of a sequence of moral-patient-instants each causing a similar-to-themselves moral-patient-instant to get instantiated in a world like theirs.
and, just like acausal/anthropic stuff doesn't save us (1, 2, 3), it turns out that despite any ethical implications that the cosmos being a universal program might have, the expected utility probly plays out about the same, just like it probly plays out about the same under most interpretations of quantum mechanics, regardless of whether other everett branches are real. these things might get you to care differently, but mostly not in any way you can do anything about (unless something is looking at us from outside a computation of us and cares about the way we care about things, but it'd be hard to reason about that).
there are nice consequences for decision theory, however: functional decision theory (FDT), which wants you to cooperate with other instances of FDT not just across spacetime and everett branches but also across counterfactual worlds, might become simpler when you "flatten" the set of counterfactual worlds to be the same kind of thing as the set of spacetime locations and the set of everett branches.
nevertheless, some things are realer than others. so, what is the measure of realness juice/amplitude, which any living person right now probly has more of than gandalf? i feel like it ought to be something to do with "time steps" in the universal program, because it doesn't feel like there could be any other measure which wouldn't eventually just become a more complex version of time steps. the reason there's more of me than gandalf in the universal program, even though it eventually contains about as many me's as it contains (variations on a particular as-detailed-as-me interpretation of) gandalf's (whether that quantity is infinite or not), is that the me's tend to occur before the gandalf's — or, to be more formal, at sets of timesteps that are earlier in the universal program or more compressible than gandalf's. or, more testably: the reason i see coherent stuff rather than noise when i look at a monitor, even though there are more ways to arrange pixels on a monitor that i'd interpret as noise than as coherent stuff, is that the instances of me seeing coherent stuff must tend to occur at sets of timesteps that are earlier or more compressible than those of instances of me seeing noise.
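as a very rough toy illustration of "earlier or more compressible sets of timesteps" (my own sketch, using zlib length as a crude stand-in for kolmogorov complexity; nothing here is part of the actual formalism):

```python
import zlib

def log2_weight_of_timestep_set(timesteps):
    # toy measure: a set of occurrence timesteps gets more weight when its
    # description is shorter; zlib length is a crude stand-in for 2^-K(.)
    encoded = ",".join(str(t) for t in sorted(timesteps)).encode()
    return -8 * len(zlib.compress(encoded))  # log2 of the weight, in bits

# occurrences at early, regular timesteps (like "me")
# vs late, irregular ones (like "gandalf")
me_steps = [10 * k for k in range(1, 200)]
gandalf_steps = [10**9 + (977 * k * k) % 10**7 for k in range(1, 200)]

print(log2_weight_of_timestep_set(me_steps)
      > log2_weight_of_timestep_set(gandalf_steps))  # True
```

this only gestures at the compressibility half of the criterion; it ignores the "earlier" part, and everything about what it takes for a timestep to count as an occurrence of a pattern in the first place.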
given this quantitative measure, can we re-capture a qualitative notion of "is this real or not"? this is where computational complexity can come in to help us. the excellent why philosophers should care about computational complexity argues that things computable in polynomial time are, in a meaningful sense, essentially easier than things only computable in exponential time. if we apply this to time steps, and if it is earlier sets of time steps rather than more compressible sets of time steps that count, then our world is real and lord of the rings (assuming it is a polynomial world) can be said to be real, in a sense in which worlds whose physics require solving NP-complete or PSPACE-complete problems to progress cannot be said to be real. but i suspect that this doesn't actually track observation that much, because worlds in which people get mind-controlled into believing NP-complete problems are being solved are probly polynomial themselves (though less common than worlds without such mind control, i'd expect).
note that this does make our world weird, because we seem to be able to solve problems in BQP. maybe BQP=BPP, or maybe the cosmos runs on a quantum solomonoff prior? or maybe, despite how unintuitive that feels, it takes this kind of physics for anthropic reasoning to occur? or maybe i'm being mind-controlled or fictional or who knows what else.
there are now two issues that arise in making a universal program prior usable, even theoretically:
the first issue: what is a me? what does an extracted me look like? in a sense, a consequence of this view is that the UDASSA notion of "first simulate a world, then extract me" can in general be flattened into "just simulate me" — which also captures more intentional simulations such as those of "solomonoff deism". but an extracted me can't just be any representation of my observations, because otherwise i'd be observing a blank monitor rather than one with some contents on it. to take a concrete example, let's say i'm chatting with someone who starts the sentence "I'm from…", and i'm trying to predict whether the next word they'll say — call it $w$ — is more likely to be "California" or "Nuuk". the comparison can't just be $K(w)$ (with $K$ a simplicity measure), because this probly favors "Nuuk" even though in practice i'd expect to hear "California" a lot more. it feels like what we're heading for at this point would be some kind of function $\mu(o)$, where $o$ is the observation in a format that makes sense to me (english text strings), and $\mu$ is some kind of "prior" relating those observations to the contexts in which they'd be perceived. programs that produce "I'm from Nuuk" have higher amplitude than programs that produce "I'm from California", but programs that produce me observing "I'm from California" have higher amplitude than programs that produce me observing "I'm from Nuuk".
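as a toy illustration of the gap between the raw simplicity of a string and the amplitude of me observing it (my own sketch; the zlib proxy and the context numbers are made up for illustration):

```python
import zlib

def description_length(s: str) -> int:
    # crude stand-in for a simplicity measure K: zlib-compressed length in bytes
    return len(zlib.compress(s.encode()))

# raw simplicity of the utterance alone tends to favor the shorter string
print(description_length("I'm from Nuuk"))        # smaller
print(description_length("I'm from California"))  # larger

# vs. weighting by the contexts in which i'd actually perceive the utterance:
# a made-up toy "prior over contexts", standing in for the mu described above
context_prior = {"California": 1e-3, "Nuuk": 1e-7}

def observation_amplitude(word: str) -> float:
    # amplitude of "me observing the utterance",
    # dominated by how common the context is
    return context_prior[word]

print(observation_amplitude("California") > observation_amplitude("Nuuk"))  # True
```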
the second issue: let's say we're launching an attempt at an aligned AI based on QACI, built on the view in this post, with a given (question, answer, observation) tuple. if the AI simply fills the future with question-answer intervals engineered so that they'd dominate most of the solomonoff-space of programs, then it can hijack its own decision process. in a sense, this is just a special case of demons in the solomonoff prior. this is actually a neat simplification of alignment: by "flattening" counterfactual worlds and everett branches to be the same kind of thing as objects that are distant in spacetime, we've managed to describe the alignment problem in a way that captures counterfactual adversarial agents ("demons") and factual future unaligned AIs in the same category. now we just need a solution that takes care of both.
i feel like extracting a notion of causality within the universal program, one that would let us determine that:
stuff outside our past lightcone can't causate onto us yet
decohered everett branches don't causate onto one another
two different simulations of conway's game of life on my computer don't causate on one another
would be useful here — though it might also need to be able to measure "non-strict" probabilistic causation.
we can't just base this on time, because in any universal program that is sequentially implemented (such as a turing machine), different implementations of our world will have the same events occur at different points in time. using a parallel model of computation such as graph rewriting might shed some light on which phenomena causate each other and which are being computed in parallel in a causally isolated manner, but it would miss some cases: as an extreme example, a homomorphically encrypted simulation of our world would make its internal causal graph unobservable from the outside, even though there are still real causal independences going on inside that world. so sticking to the simple and sequential paradigm of turing machines will force us to develop cleverer but more general notions of causal dependence.
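as a minimal sketch of the kind of interventionist test one might start from (my own toy, with two hand-rolled sub-computations interleaved in one global state; nothing here handles the homomorphic-encryption or time-shifting problems):

```python
def step(state):
    # one step of a toy "universal program": two sub-computations interleaved
    # in a single global state; here they happen not to interact at all
    a, b = state
    return (a + 1, (b * 2) % 997)

def run_projected(initial, project, n=50):
    trace, state = [], initial
    for _ in range(n):
        state = step(state)
        trace.append(project(state))
    return trace

def causally_independent(project_x, intervene_on_y, initial, n=50):
    # crude interventionist test: perturb subsystem y at the start and check
    # whether the projected trace of subsystem x changes at all
    return (run_projected(initial, project_x, n)
            == run_projected(intervene_on_y(initial), project_x, n))

initial = (0, 1)
print(causally_independent(lambda s: s[0], lambda s: (s[0], s[1] + 5), initial))  # True: a ignores b
print(causally_independent(lambda s: s[1], lambda s: (s[0] + 5, s[1]), initial))  # True: b ignores a
```

a real version would have to find the right projections itself, rather than being handed them, and would have to survive re-encodings like the homomorphic encryption case above.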
next, whatever measure we build: if we weren't dealing with adversarial intelligences, we could just do a big sum weighed by simplicity and hope that the signal from the things we care about wins out, as with something like $\sum_i s_i \cdot u(c_i, a)$, with the $c_i$ being the "claws", weighing the value of action $a$ in each possible claw-situation $c_i$ by some factor $s_i$. but because we're dealing with (potentially superintelligent!) adversarial agents, we have to make really sure that the undesired results from whichever $u$ we use to weigh actions are drowned out by sufficiently low $s_i$'s, so that the overall signal that determines the $a$ comes from the desired $c_i$'s. as an example: in my attempt at formalizing QACI, we want the weights of carvings that capture the human involved in the original question-answer interval to sufficiently outweigh the weights of the AI filling the future with adversarially-answering "fake" question-answer intervals that would allow its earlier (as well as remote/counterfactual) selves to find actions that make its job easier.
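a minimal toy of that weighted sum, with made-up claws, weights, and utilities (nothing here is from the actual QACI formalization; it only shows how low enough adversarial weights keep the desired signal dominant):

```python
# hypothetical claws: one carving out the real question-answer interval,
# one carving out an adversarially-manufactured fake interval
claws = [
    {"name": "real interval", "weight": 2.0 ** -40, "wants": "follow real answer"},
    {"name": "fake interval", "weight": 2.0 ** -95, "wants": "follow fake answer"},
]

def u(claw, action):
    # toy value of taking `action` in the situation the claw picks out
    return 1.0 if action == claw["wants"] else 0.0

def score(action):
    # the simplicity-weighted sum over claws described above
    return sum(c["weight"] * u(c, action) for c in claws)

actions = ["follow real answer", "follow fake answer"]
print(max(actions, key=score))  # "follow real answer", but only because 2^-95 << 2^-40
```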
so, what could a causality relationship look like? one difficulty is that one change in one world could end up modifying pretty much everything everywhere, but not in a way that "really matters". for example: maybe if world number $i$ does some operation $x$ rather than $y$, all the other worlds end up being computed in the same way, but all shifted by one extra time step into the future.
this is where the computer science notions of simulation and bisimulation (which don't usually mean quite what i mean by those words, but are related) might come in; i intend to learn about them next, though i wouldn't be surprised if such a measure could be hacked together just out of kolmogorov complexity again, or something like it.
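for reference, here's a minimal sketch of the standard notion: checking bisimilarity of two small labeled transition systems by greatest-fixpoint refinement (the toy systems are my own, not anything from the post):

```python
def bisimilar(trans1, trans2, s1, s2):
    # trans maps (state, label) -> set of successor states
    states1 = {s for (s, _) in trans1} | {t for v in trans1.values() for t in v}
    states2 = {s for (s, _) in trans2} | {t for v in trans2.values() for t in v}
    labels = {l for (_, l) in list(trans1) + list(trans2)}
    # start by relating every pair of states, then drop pairs that can't match moves
    rel = {(a, b) for a in states1 for b in states2}
    changed = True
    while changed:
        changed = False
        for (a, b) in set(rel):
            for l in labels:
                succ_a = trans1.get((a, l), set())
                succ_b = trans2.get((b, l), set())
                matches = (all(any((x, y) in rel for y in succ_b) for x in succ_a)
                           and all(any((x, y) in rel for x in succ_a) for y in succ_b))
                if not matches:
                    rel.discard((a, b))
                    changed = True
                    break
    return (s1, s2) in rel

# a one-state "tick" loop vs. a two-state "tick" loop: bisimilar
t1 = {("p", "tick"): {"p"}}
t2 = {("q0", "tick"): {"q1"}, ("q1", "tick"): {"q0"}}
print(bisimilar(t1, t2, "p", "q0"))  # True
```

what i'm after is presumably some quantitative, approximate cousin of this, under which "all the other worlds shifted by one time step" still counts as essentially the same computation.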
as a final note on the universal distribution: i've recently learned that theoretical turing machines augmented with "halting oracles" give rise to interesting computational classes, which in particular let those turing machines do hypercomputation in the form of obtaining, in a finite number of steps, results that would otherwise require an infinite amount of computation. this might enable us to build a universal prior which captures something closer to the full tegmark level 4 mathematical multiverse. though it's not clear to me whether that's actually desired; what would it actually mean to inhabit a hypercomputational multiverse? if the halting oracle runs an infinite number of moral patients in a finite number of steps, how the hell does the anthropic and ethics juice work out? i'll be sticking to the less uncomputable, regular solomonoff or even levin prior for now, but this question might be worthy of further consideration, unless we get the kind of aligned AI that doesn't require us to figure this out up front.