Epistemic status: Using UDT as a case study for the tools developed in my meta-theory of rationality sequence so far, which means all previous posts are prerequisites. This post is the result of conversations with many people at the CMU agent foundations conference, including particularly Daniel A. Herrmann, Aydin Mohseni, Scott Garrabrant, and Abram Demski. I am a bit of an outsider to the development of UDT and logical induction, though I've worked on pretty closely related things.

I'd like to discuss the limits of consistency as an optimality standard for rational agents. A lot of fascinating discourse and useful techniques have been built around it, but I think that it can be in tension with learning at the extremes. Updateless decision theory (UDT) is one of those extremes; but in order to think about it properly, we need to start with its Bayesian roots. Because, appropriately enough for a sequence on the meta-theory of rationality, I want to psychoanalyze the invention/inventors of UDT. Hopefully, we'll then be in a position to ask what we think we know and how we think we know it in regards to updatelessness (also sometimes called priorism), the driving idea behind UDT.

Subjective Bayesianism is about consistency[1] among beliefs. The Cox axioms force real-valued credences to act like probabilities under some natural conditions that ultimately boil down to consistency; one way to intuitively compress the assumptions is that beliefs about related things have to continuously "pull on each other," so I think of the Cox axioms as requiring credence to propagate properly through an ontology. Dutch book arguments further require that betting behavior be consistent with probabilistic structure, on pain of being "money-pumped" - accepting a series of bets that is sure to lose money (a kind of dominance principle). That handles the "statics." Bayesian updating is of course a theorem of probability theory, forced by Kolmogorov's axioms, so in that sense it is a consequence of the preceding arguments. But insofar as we want it to describe belief dynamics, updating enforces a kind of consistency (with respect to old beliefs and new information) across time. Similar arguments motivate maximizing some expected utility with respect to these credences (= subjective prior/posterior probabilities), but I actually won't be very concerned with utilities here.
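
To make the Dutch book point concrete, here is a minimal sketch (numbers invented for illustration): an agent whose credences in A and not-A sum to more than 1 will buy both unit bets at its stated prices, paying more than the bets can possibly return.

```python
# Minimal Dutch-book sketch (hypothetical numbers): incoherent credences that
# sum to 1.1 let a bookie sell both unit bets and pocket a guaranteed 0.10.

credence_A = 0.4      # agent's fair price for a bet paying $1 if A
credence_not_A = 0.7  # agent's fair price for a bet paying $1 if not-A

for A_holds in (True, False):
    payout = 1  # exactly one of the two bets pays out in every possible world
    net = payout - (credence_A + credence_not_A)
    print(f"A={A_holds}: agent's net = {net:+.2f}")  # -0.10 either way
```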

This all seems good - if we expend enough cognitive resources on understanding a problem or situation, we should hope that our beliefs eventually stabilize into something consistent. Otherwise, it does feel like we are open to arbitrage and something is going obviously wrong. Unfortunately, Bayesian probability theory doesn't exactly tell us how to remedy the situation; in that way it fails Demski's criterion that a theory of rationality is meant to provide advice about how to be more rational. Occasionally though, we might have a decent source of "objective" priors, derived from our knowledge of the situation, maximum entropy, or just the catch-all universal distribution. In cases like this[2], I think there is a decent argument that the Bayesian prescription is normative. It is an optimality standard, and a pretty powerful one, because it not only constrains an agent's actions but even their beliefs. Arguably, in this capacity it acts a lot like a convergent algorithm. I think it is, and it will be discovered and "consciously" applied in many cases by many AGI designs, because it should often be tractable to do so. However, note that though the idea of a Bayesian core engine of cognition has many proponents, it does not follow from any of this argumentation. Still, I think Bayesian probability is quite central to understanding cognition, on pain of inconsistency.
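
For concreteness, the "catch-all universal distribution" mentioned above is usually Solomonoff's universal prior; in one standard presentation, a fixed (monotone) universal machine $U$ assigns each finite string $x$ the weight

$$M(x) \;=\; \sum_{p \,:\, U(p) = x*} 2^{-|p|},$$

summing over programs $p$ whose output begins with $x$. Maximum-entropy priors play the analogous "least informative" role when the constraints are known moments rather than computability.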

But if we push hard enough on this desire for consistency, it starts to break down as a reasonable optimality standard. Updateless decision theory, at least in its current form, provides a sort of counterexample to the supremacy of consistency by using it to justify absurdly unagentic behavior.

The problem ultimately comes from the priors. Unless they capture something about reality, priors are just mistaken beliefs. An agent which acts according to sufficiently mistaken beliefs may never learn it was wrong (failing at self-optimization) and will then remain stupid forever. Fortunately, I tend to think that agents with reasonable priors will eventually reach agreement in practice.

Updatelessness throws that argument out the window. In its strongest form, an updateless agent should obey all pre-commitments it would have made, at some previous time, if it had the chance (as Abram Demski emphasizes, the goal is not to pre-commit, but rather to make pre-commitment unnecessary). How far back in time should we "retroactively pre-commit," according to UDT? It's not really clear to me, which is apparently because it's not really agreed among updateless decision theorists (I talked to many of them for a week). I think the strongest and perhaps original view is: as early as possible - even before the agent was created, in case other agents may have reasoned about its code when deciding whether to create it. This would mean choosing pre-commitments from a time when you did not even exist, meaning you knew nothing whatsoever about the universe, except perhaps whatever can be determined by pure reason. This is starting to sound more like classical rationalism than modern rationality! It seems likely to massively amplify any problems with the agent's prior - and really, it's not clear what class of priors (short of near-perfect knowledge about our universe) this is really safe for.
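
A toy sketch of the worry (all numbers invented for this illustration): an agent that fixes its policy once, by maximizing expected utility under its prior, is only ever as good as that prior, and never gets a chance to notice the mistake.

```python
# Toy "priorist" policy choice (invented numbers): pick, once and for all, the
# policy with the highest expected utility under the prior over environments,
# then follow it forever, even if the prior badly misjudges the actual world.

prior = {"env_1": 0.99, "env_2": 0.01}      # the agent's prior over environments
utility = {                                  # utility of each policy in each environment
    "policy_A": {"env_1": 10, "env_2": -100},
    "policy_B": {"env_1": 5, "env_2": 5},
}

def prior_expected_utility(policy):
    return sum(prior[env] * utility[policy][env] for env in prior)

committed = max(utility, key=prior_expected_utility)
print(committed)                    # policy_A: 8.9 expected vs 5.0 for policy_B
print(utility[committed]["env_2"])  # -100 if the true environment is env_2, and
                                    # the commitment is never revisited
```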

At this point, someone sufficiently MIRI-brained might start to think about (something equivalent to) Tegmark's level 4 mathematical multiverse, where such agents might theoretically outperform others. Personally, I see no direct reason to believe in the mathematical multiverse as a real object, and I think this might be a case of the mind projection fallacy - computational multiverses are something that agents reason about in order to succeed in the real universe[3]. Even if a mathematical multiverse does exist (I can't rule it out) and we can somehow learn about its structure[4], I am not sure that any effective, tractable agents can reason about or form preferences over it - and if they do, they should be locally out-competed by agents that only care about our universe, which means those are probably the ones we should worry about. My cruxiest objection is the first, but I think all of them are fairly valid.

From this view, it's not clear that reasoning about being the best agent behind a veil of total ignorance about the universe is even a sensible idea. Humans seem to have arrived at agent theory only because we were motivated by considering all the agents in the actual world around us, and we invented the abstractions of agent theory because, empirically, they don't seem to be very leaky. Are those observations of a lower status than the true, multiversal theory of agency - and where exactly would such a thing come from or live?

We can instead do something like form retroactive commitments starting from, say, the time the agent came into existence, or shortly thereafter, once it knows at least the basic facts about our universe. This still makes sense, but now, why not just pre-commit then? The answer is that UDT is (secretly?) about computational boundedness! An agent presumably can't think through every possible pre-commitment instantly at birth. That's another reason to make them retroactively, once we've had time to realize they are valuable.

At this point, UDT (as introduced by Wei Dai) takes a further leap in the "priorist" direction: if we're going to make pre-commitments according to our previous self's beliefs about the world, why not also their logical beliefs? After all, we are considering computationally bounded Bayesians; it's natural to put credences on logical statements as well as empirical facts. Insofar as the two are entangled, I can see the elegance[5] of the idea, but it massively amplifies my objection to updatelessness: now an agent may follow a stupid strategy forever, simply because it at one point was wrong about math.
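
Schematically (my own gloss, not any particular formalization of UDT): once logical possibilities $L$ receive prior credences alongside empirical ones, the policy is fixed by something like

$$\pi^{*} \;=\; \arg\max_{\pi} \sum_{L} P_{\text{prior}}(L)\, \mathbb{E}\!\left[\,U(\pi) \mid L\,\right],$$

so a badly mistaken $P_{\text{prior}}(L)$ about a piece of mathematics distorts $\pi^{*}$ just as permanently as a mistaken empirical prior would.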

I think it's possible to not notice the danger of serious error here if you're thinking in terms of policy theory, where everything seems a little more abstract, but "dropping down" to agent theory makes it look a lot less sensible. I just would not build a robot that way. And I would not really act that way.

There may be a solution within UDT - perhaps some kind of prior that is carefully constructed to make nearly all pre-commitments look bad until you're a smart agent. If so, that sounds fascinating, and I'd love to discover or learn about it! Lots of smart people have ideas for other elaborations (or perhaps complete refactors and hopefully simplifications) that might solve the problem; for instance I believe Scott Garrabrant views it as closely analogous to alignment (in the ordinary AI safety sense) between an agent's past and current selves. 

But there might also be a merely conventional solution outside of UDT: evidential decision theory (EDT). Specifically, EDT on the policy selection problem, as academic decision theorists seem to put it. This is a policy theory that takes into account everything it currently knows to form pre-commitments, and policy selection seems to be the relevant problem faced by (some) AGI with a Bayesian core engine. This would normally be called Son of EDT in LessWrong lingo; it is also roughly equivalent to sequential policy evidential decision theory (SPEDT). For brevity, perhaps WDT, because E "turns into" W? ;)
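
To make the contrast explicit (again, a schematic gloss rather than anyone's official formalization): an updateless agent fixes its policy against the prior $P_0$, while policy-level EDT re-solves the same problem at time $t$ against everything $E_t$ it has learned so far, and forms its pre-commitments from there:

$$\pi^{\mathrm{UDT}} \;=\; \arg\max_{\pi}\, \mathbb{E}_{P_0}\!\left[\,U \mid \Pi = \pi\,\right], \qquad \pi^{\mathrm{EDT}}_{t} \;=\; \arg\max_{\pi}\, \mathbb{E}\!\left[\,U \mid \Pi = \pi,\; E_t\,\right].$$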

How would this work? What, if anything, would it converge to?

Well, it should obviously succeed at Newcomb-like problems insofar as it anticipated facing them, which is arguably the reasonable thing to ask. In practice, I don't see any way in which it should act much less reasonably than UDT, except perhaps "around boundary conditions" at its creation.

Unfortunately, Son of EDT seems likely to inherit many of the problems of UDT if it is allowed unrestricted ability to self-modify. That is because it might start self-modifying at the moment of its creation, at which point it still knows essentially nothing (unless, again, an appropriately conservative prior can be constructed). The dynamics might be a little better, particularly regarding logical uncertainty (even if we continue to treat logical credences in a Bayesian way). This is because the agent can at least take advantage of the logical facts it currently knows as it performs each self-modification, and perhaps it needs to do a lot of math before arriving at the conclusion that it ought to self-modify (depending on the detailed implementation). This converts real time into logical time in a way that I suspect is actually useful in practice.

The whole scheme does feel highly heuristic and ramshackle, but perhaps it's not as bad as it seems. First of all, it's clearly unsafe to hand a newborn agent a screwdriver to modify itself with unless you can safely unmodify and restart it, and this doesn't really seem to be EDT's fault (it's just an unforgiving environment for any decision theory). By the time the agent "grows up," perhaps it only makes sensible modifications. Certainly Bayesian decision theory has proven itself quite robust to criticism, once it's applied very carefully, with all considerations taken into account.[6]

In fact, I think it's quite likely that we are going through this exact sort of decision process in this very discussion, using everything we know about agency in our universe to reason about the policy that would make the best agent (we control the former, but consider the consequences for the latter). If we are reasoning locally at the action level, then this forms a descending chain of abstraction, where action theory looks at policy theory looking at agent theory. So, if we are operating in a Bayesian way, it seems questionable whether we can arrive at any theory of agency better than Son of EDT!

The problem with Son of EDT is that it's not in itself a clean decision theory. EDT does not tile, so it perhaps picks a sequence of increasingly arcane self-modifications and ends up with some sort of incomprehensible policy. But I suspect it isn't actually incomprehensible; it just may not be a grand unified theory of rationality (GUTR). We can still attempt to analyze its behavior on the problems we care about, in particular alignment. Indeed, there may be no useful GUTR, in which case the best we can do is analyze particular important or recurring (sub)problems of cognition and agency. I wouldn't go this far, but I also wouldn't be surprised if the unifiable part of the theory looks a lot like EDT, and the rest like Son of EDT.

  1. ^

    Frequently "coherence," which feels stronger because to be incoherent sounds quite negative.

  2. ^

    Richard Ngo would probably say that this does not apply to any interesting situations.

  3. ^

    Here I notably depart from Infra-Bayesian Physicalism (as I understand it).  

  4. ^

    This is related to the robustness of definitions for the mathematical multiverse.

  5. ^

    Or perhaps just... consistency?

  6. ^

    Thanks to Aydin Mohseni for suggesting this outside view.


At this point, someone sufficiently MIRI-brained might start to think about (something equivalent to) Tegmark's level 4 mathematical multiverse, where such agents might theoretically outperform others. Personally, I see no direct reason to believe in the mathematical multiverse as a real object, and I think this might be a case of the mind projection fallacy - computational multiverses are something that agents reason about in order to succeed in the real universe[3]. Even if a mathematical multiverse does exist (I can't rule it out) and we can somehow learn about its structure[4], I am not sure that any effective, tractable agents can reason about or form preferences over it - and if they do, they should be locally out-competed by agents that only care about our universe, which means those are probably the ones we should worry about. My cruxiest objection is the first, but I think all of them are fairly valid.

I don't want to defend UDT overall (see here for my current position on it), but I think Tegmark Level 4 is a powerful motivation for UDT or something like it even if you're not very sure about it being real.

  1. Since we can't rule out the mathematical multiverse being a real object with high confidence, or otherwise being a thing that we can care about, we have to assign positive, non-negligible credence to this possibility.
  2. If it is real or something we can care about, then given our current profound normative uncertainty we also have to assign positive, non-negligible credence to the possibility that we should care about the entire multiverse, and not just our local environment or universe. (There are some arguments for this, such as arguments for broadening our circle of concern in general.)
  3. If we can't strongly conclude that we should neglect the possibility that we can and should care about something like Tegmark Level 4, then we have to work out how to care about it or how to take it into account when we make decisions that can affect "distant" parts of the multiverse, so that such conclusions could be further fed into whatever mechanism we use to handle moral/normative uncertainty (such as Bostrom and Ord's Moral Parliament idea).

As for "direct reason", I think AIT played a big role for me, in that the algorithmic complexity (or rather, some generalization of algorithmic complexity to possibly uncomputable universes/mathematical objects) of Tegmark 4 as a whole is much lower than that of any specific universe within it like our apparent universe. (This is similar to the fact that the program tape for a UTM can be shorter than that of any non-UTM, as it can just be the empty string, or that you can print a history of all computable universes with a dovetailing program, which is very short.) Therefore it seems simpler to assume that all of Tegmark 4 exists rather than only some specific universe.

I am doing a PhD in AIT, but I still don’t want to take it that literally. I don’t believe that existence is actually the stochastic process specified by a UTM with random input tape - that’s a convenient but fictional model that I reason about because it’s sometimes easier than thinking about a Bayesian mixture over lsc semimeasures, and the two are equivalent (up to a constant which ~can even be forced to 1). AIT intuitions do make the level 4 multiverse seem more natural, but I think this is just the mind projection fallacy again. Of course if you take the universal distribution seriously, it does make sense to reason that the level 4 multiverse has low K complexity - but that doesn’t justify assuming it for us since we’d still need our index into that multiverse. See Hutter’s “A true theory of everything (will be subjective).”

I suppose it is valid to expect that the level 4 multiverse is hard to rule out for K-complexity reasons. With our limited understanding of philosophy/metaphysics, we probably do need to assign some non-negligible weight to that possibility. But I suspect that superintelligences won’t need to - they’ll be able to rule it out from their more informed position (assuming my strong suspicion is right - which means I am sampling from and thereby collapsing my own mixture model). This means the level 4 multiverse should be irrelevant to understanding superintelligences. 

Do you think a superintelligence will be able to completely rule out the hypothesis that our universe literally is a dovetailing program that runs every possible TM, or literally is a bank of UTMs running every possible program (e.g., by reproducing every time step and adding 0 or 1 to each input tape)? (Or the many other hypothetical universes that similarly contain a whole Level-4-like multiverse?) It seems to me that hypotheses like these will always collectively have a non-negligible weight, and have to be considered when making decisions.

Another argument that seems convincing to me is that if only one universe exists, how to explain that it seems fine-tuned for being able to evolve intelligent life? Was it just some kind of metaphysical luck?

Also, can you try to explain your strong suspicion that only one universe exists (and is not the kind that contains a L4 multiverse)? In other words, do you just find the arguments for L4 unconvincing and defaulting to some unexplainable intuition, or have arguments to support your own position?

Do you think a superintelligence will be able to completely rule out the hypothesis that our universe literally is a dovetailing program that runs every possible TM, or literally is a bank of UTMs running every possible program (e.g., by reproducing every time step and adding 0 or 1 to each input tape)? (Or the many other hypothetical universes that similarly contain a whole Level-4-like multiverse?) It seems to me that hypotheses like these will always collectively have a non-negligible weight, and have to be considered when making decisions.

I want to frame this question Q as follows: Let H be our hypothesis that "the universe is a dovetailed simulation on a UTM" or something similar (but perhaps technically distinct) like "a member of an infinite mathematical multiverse distributing reality-fluid according to simplicity." Currently we agree that H should be taken seriously;  Q := "is taking H seriously a philosophical mistake which a superintelligence would see through?" To be clear, it's possible to take H seriously and still believe Q with high probability. There does not need to be a simple arithmetical relationship between the two though, because H and Q can be false together.

After reflecting on this a bit, I think my P(H) is around 33%, and I'm pretty confident Q is true (coherence only requires 0 <= P(Q) <= 67% but I think I put it on the upper end).
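
Spelling out the parenthetical bound (my reading of it): if $H$ were true, a superintelligence could not correctly "see through" taking it seriously, so $Q$ entails $\lnot H$ and coherence only requires

$$P(Q) \;\le\; P(\lnot H) \;=\; 1 - P(H) \;\approx\; 0.67.$$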

Another argument that seems convincing to me is that if only one universe exists, how to explain that it seems fine-tuned for being able to evolve intelligent life? Was it just some kind of metaphysical luck? 

Interestingly, though I've encountered the fine-tuning argument many times, I have never actually read an accessible but technical exposition of it. I understand that physicists believe that only a narrow range of parameter (i.e., physical constant) values can support life. But how confident can they be? It seems very hard to reason about universes with drastically different conditions - how can anyone know that they are incapable of harboring life? Our universe is mostly inhospitable to life because nothing can survive inside stars or in the vacuum of space - would an alien/alternate-universe physicist guess that our parameter settings make life impossible without considering the possibility of planets? Is it possible that in fact other parameter settings allow stars to harbor life???

Also, are the "free parameters" actually free, or are we simply not clever enough to derive the underlying theory that would necessitate fixed values for them? To me, the most plausible explanation for "fine-tuning" of physical theories is that the "intelligent designer" is physicists tinkering with their theories until they retro-dict our universe, which of course we already know to contain life. In this sense physical theories are in fact selected to support life (but not for anthropic reasons). Note that all known theories of physics are wrong, incomplete, or too complicated to prove wrong/incomplete so far - so free parameters do not have established status as fundamental truths.

Finally, even if there is some type of multiverse, it may be much smaller than the mathematical universe. The "simplicity/elegance" of the mathematical universe does not mean it should be preferred, because the gain is more or less exactly compensated by the required increase in indexical complexity. Unless, I suppose, you just prefer elegant theories without additional predictive value.

Also, can you try to explain your strong suspicion that only one universe exists (and is not the kind that contains a L4 multiverse)? In other words, do you just find the arguments for L4 unconvincing and defaulting to some unexplainable intuition, or have arguments to support your own position?

I'm not sure that only one universe exists, I just don't believe in the mathematical multiverse. Beyond the intuition that platonic belief in mathematical objects is probably the mind projection fallacy, I also think it's important that we do not have a computable theory of everything. We have a computable wrong theory of everything. Our universe may be computable (I weakly suspect it is), but in fact we have only established that some phenomena within our universe are computable. In fact, this is roughly what we might expect in a "nice"/"learnable" but not computable universe, since any theory we can formalize and rely on to make predictions is likely to be computable. It seems presumptuous to guess that our universe is one of infinitely many dovetailed computer simulations when we don't even know that our universe can be simulated on a computer!

After reflecting on this a bit, I think my P(H) is around 33%, and I'm pretty confident Q is true (coherence only requires 0 <= P(Q) <= 67% but I think I put it on the upper end).

Thanks for clarifying your view this way. I guess my question at this point is why your P(Q) is so high, given that it seems impossible to reduce P(H) further by updating on empirical observations (do you agree with this?), and we don't seem to have even an outline of a philosophical argument for "taking H seriously is a philosophical mistake". Such an argument seemingly has to include that having a significant prior for H is a mistake, but it's hard for me to see how to argue for that, given that the individual hypotheses in H like "the universe is a dovetailed simulation on a UTM" seem self-consistent and not too complex or contrived. How would even a superintelligence be able to rule them out?

Perhaps the idea is that a SI, after trying and failing to find a computable theory of everything, concludes that our universe can't be computable (otherwise it would have found the theory already), thus ruling out part of H, and maybe does the same for mathematical theories of everything, ruling out H altogether? (This seems far-fetched, i.e., how can even a superintelligence confidently conclude that our universe can't be described by a mathematical theory of everything, given the infinite space of such theories, but this is my best guess of what you think will happen.)

Beyond the intuition that platonic belief in mathematical objects is probably the mind projection fallacy

Can you give an example of a metaphysical theory that does not seem like a mind projection fallacy to you? (If all such theories look that way, then platonic belief in mathematical objects looking like the mind projection fallacy shouldn't count against it, right?)

It seems presumptuous to guess that our universe is one of infinitely many dovetailed computer simulations when we don't even know that our universe can be simulated on a computer!

I agree this seems presumptuous and hence prefer Tegmark over Schmidhuber, because the former is proposing a mathematical multiverse, unlike the latter's computable multiverse. (I talked about "dovetailed computer simulations" just because it seems more concrete and easy to imagine than "a member of an infinite mathematical multiverse distributing reality-fluid according to simplicity.")

Do you suspect that our universe is not even mathematical (i.e., not fully describable by a mathematical theory of everything or isomorphic to some well-defined mathematical structure)?

ETA: I'm not sure if it's showing through in my tone, but I'm genuinely curious whether you have a viable argument against "superintelligence will probably take something like L4 multiverse seriously". It's rare to see someone with the prerequisites for understanding the arguments (e.g. AIT and metamathematics) trying to push back on this, so I'm treasuring this opportunity. (Also, it occurs to me that we might be in a bubble and plenty of people outside LW with the prerequisites do not share our views about this. Do you have any observations related to this?)

Thanks for clarifying your view this way. I guess my question at this point is why your P(Q) is so high, given that it seems impossible to reduce P(H) further by updating on empirical observations (do you agree with this?), and we don't seem to have even an outline of a philosophical argument for "taking H seriously is a philosophical mistake". Such an argument seemingly has to include that having a significant prior for H is a mistake, but it's hard for me to see how to argue for that, given that the individual hypotheses in H like "the universe is a dovetailed simulation on a UTM" seem self-consistent and not too complex or contrived. How would even a superintelligence be able to rule them out?

I think that you're leaning too heavily on AIT intuitions to suppose that  "the universe is a dovetailed simulation on a UTM" is simple. This feels circular to me - how do you know it's simple? You're probably thinking it's described by a simple program, but that seems circular - of course if we're already judging things by how hard they are to implement on a UTM, dovetailing all programs for that UTM is simple. We'd probably need a whole dialogue to get to the root of this, but basically, I think you need some support from outside of AIT to justify your view here. Why do you think you can use AIT in this way? I'm not sure that the reasons that we have arrived at AIT justify this - we have some results showing that it's a best in class predictor (sort of), so I take the predictions of the universal distribution seriously. But it seems you want to take its ontology literally. I don't see any reason to do that - actually, I'm about to drop a post and hopefully soon a paper closely related to this point (EDIT: the post, which discusses the interpretation of AIXI's ontology).

Perhaps the idea is that a SI, after trying and failing to find a computable theory of everything, concludes that our universe can't be computable (otherwise it would have found the theory already), thus ruling out part of H, and maybe does the same for mathematical theories of everything, ruling out H altogether? (This seems far-fetched, i.e., how can even a superintelligence confidently conclude that our universe can't be described by a mathematical theory of everything, but this is my best guess of what you think will happen.) 

Experiments might cast doubt on these multiverses: I don't think a superintelligence would need to prove that the universe can't have a computable theory of everything - just ruling out the simple programs that we could be living in would seem sufficient to cast doubt on the UTM theory of everything. Of course, this is not trivial, because some small computable universes will be very hard to "run" for long enough that they make predictions disagreeing with our universe! I haven't thought as much about uncomputable mathematical universes, but does this universe look like a typical mathematical object? I'm not sure.

However, I suspect that a superintelligence rules these huge multiverses out mostly through "armchair" reasoning based on the same level of evidence we have available.

Can you give an example of a metaphysical theory that does not seem like a mind projection fallacy to you? (If all such theories look that way, then platonic belief in mathematical objects looking like the mind projection fallacy shouldn't count against it, right?)

This is an interesting point to consider; I am very conservative about making claims about "absolute reality" of things as opposed to the effectiveness of models (I suppose I'm following Kant). Generally I'm on board with materialism, naturalized induction, and the claims about causal structure made by Eliezer in "Highly advanced epistemology 101." An example of a wrong metaphysical theory that is NOT really the mind projection fallacy is theism in most forms. But animism probably is making the fallacy. 

Do you suspect that our universe is not even mathematical (i.e., not fully describable by a mathematical theory of everything or isomorphic to some well-defined mathematical structure)?

I don't know.

ETA: I'm not sure if it's showing through in my tone, but I'm genuinely curious whether you have a viable argument against "superintelligence will probably take something like L4 multiverse seriously". It's rare to see someone with the prerequisites for understanding the arguments (e.g. AIT and metamathematics) trying to push back on this, so I'm treasuring this opportunity. (Also, it occurs to me that we might be in a bubble and plenty of people outside LW with the prerequisites do not share our views about this. Do you have any observations related to this?)

I'm glad! I think Daniel Herrmann maybe agrees with me here - but he's not exactly a mainstream academic decision theorist. So I'm not sure if there's a large group of scholars which thinks about AIT and rejects the UTM theory of everything.

I think that you’re leaning too heavily on AIT intuitions to suppose that “the universe is a dovetailed simulation on a UTM” is simple. This feels circular to me—how do you know it’s simple?

The intuition I get from AIT is broader than this, namely that the "simplicity" of an infinite collection of things can be very high, i.e., simpler than most or all finite collections, and this seems likely true for any formal definition of "simplicity" that does not explicitly penalize size or resource requirements. (Our own observable universe already seems very "wasteful" and does not seem to be sampled from a distribution that penalizes size / resource requirements.) Can you perhaps propose or outline a definition of complexity that does not have this feature?

I don’t think a superintelligence would need to prove that the universe can’t have a computable theory of everything—just ruling out the simple programs that we could be living in would seem sufficient to cast doubt on the UTM theory of everything. Of course, this is not trivial, because some small computable universes will be very hard to “run” for long enough that they make predictions disagreeing with our universe!

Putting aside how easy it would be to show, you have a strong intuition that our universe is not or can't be a simple program? This seems very puzzling to me, as we don't seem to see any phenomenon in the universe that looks uncomputable or can't be the result of running a simple program. (I prefer Tegmark over Schmidhuber despite thinking our universe looks computable, in case the multiverse also contains uncomputable universes.)

I haven’t thought as much about uncomputable mathematical universes, but does this universe look like a typical mathematical object? I’m not sure.

If it's not a typical computable or mathematical object, what class of objects is it a typical member of?

An example of a wrong metaphysical theory that is NOT really the mind projection fallacy is theism in most forms.

Most (all?) instances of theism posit that the world is an artifact of an intelligent being. Can't this still be considered a form of mind projection fallacy?

I asked AI (Gemini 2.5 Pro) to come up with other possible answers (metaphysical theories that aren't mind projection fallacy), and it gave Causal Structuralism, Physicalism, and Kantian-Inspired Agnosticism. I don't understand the last one, but the first two seem to imply something similar to "we should take MUH seriously", because the hypothesis of "the universe contains the class of all possible causal structures / physical systems" probably has a short description in whatever language is appropriate for formulating hypotheses.

In conclusion, I see you (including in the new post) as trying to weaken arguments/intuitions for taking AIT's ontology literally or too seriously, but without positive arguments against the universe being an infinite collection of something like mathematical objects, or the broad principle that reality might arise from a simple generator encompassing vast possibilities, which seems robust across different metaphysical foundations, I don't see how we can reduce our credence for that hypothesis to a negligible level, such that we no longer need to consider it in decision theory. (I guess you have a strong intuition in this direction and expect superintelligence to find arguments for it, which seems fine, but naturally not very convincing for others.)

Putting aside how easy it would be to show, you have a strong intuition that our universe is not or can't be a simple program? This seems very puzzling to me, as we don't seem to see any phenomenon in the universe that looks uncomputable or can't be the result of running a simple program. (I prefer Tegmark over Schmidhuber despite thinking our universe looks computable, in case the multiverse also contains uncomputable universes.)

I don't see conclusive evidence either way, do you? What would a phenomenon that "looks uncomputable" look like concretely, other than mysterious or hard to understand? It seems many aspects of the universe are hard to understand. Maybe you would expect things at higher levels of the arithmetical hierarchy to live in uncomputable universes, and the fact that we can't build a halting oracle implies to you that our universe is computable? That seems plausible but questionable to me. Also, the standard model is pretty complicated - it's hard to assess what this means because the standard model is wrong (is there a simpler or more complicated true theory of everything?). 

The intuition I get from AIT is broader than this, namely that the "simplicity" of an infinite collection of things can be very high, i.e., simpler than most or all finite collections, and this seems likely true for any formal definition of "simplicity" that does not explicitly penalize size or resource requirements. (Our own observable universe already seems very "wasteful" and does not seem to be sampled from a distribution that penalizes size / resource requirements.) Can you perhaps propose or outline a definition of complexity that does not have this feature?

Yes, in some cases ensembles can be simpler than any element in the ensemble. If our universe is a typical member of some ensemble, we should take seriously the possibility that the whole ensemble exists. Now it is hard to say whether that is decision-relevant; it probably depends on the ensemble.

Combining these two observations, a superintelligence should take the UTM multiverse seriously if we live in a typical (~= simple) computable universe. I put that at about 33%, which leaves it consistent with my P(H).

My P(Q) is lower than 1 - P(H) because the answer may be hard for a superintelligence to determine. But I lean towards betting on the superintelligence to work it out (whether the universe should be expected to be a simple program seems like not only an empirical but a philosophical question), which is why I put P(Q) fairly close to 1 - P(H). Though I think this discussion is starting to shift my intuitions a bit in your direction.

What would a phenomenon that "looks uncomputable" look like concretely, other than mysterious or hard to understand?

There could be some kind of "oracle", not necessarily a halting oracle, but any kind of process or phenomenon that can't be broken down into elementary interactions that each look computable, or otherwise explainable as a computable process. Do you agree that our universe doesn't seem to contain anything like this?

If the universe contained a source of ML-random bits they might look like uniformly random coin flips to us, even if they actually had some uncomputable distribution. For instance, perhaps spin measurements are not iid Bernoulli, but since their distribution is not computable, we aren’t able to predict it any better than that model?


I’m not sure how you’re imagining this oracle would act? Nothing like what you’re describing seems to be embedded as a physical object in spacetime, but I think that’s the wrong thing to expect; failures of computability wouldn’t act like Newtonian objects.

It's rare to see someone with the prerequisites for understanding the arguments (e.g. AIT and metamathematics) trying to push back on this

My view is probably different from Cole's, but it has struck me that the universe seems to have a richer mathematical structure than one might expect given a generic AIT-ish view (e.g. continuous space/time, quantum mechanics, diffeomorphism invariance/gauge invariance), so we should perhaps update that the space of mathematical structures instantiating life/sentience might be narrower than it initially appears (that is, if "generic" mathematical structures support life/agency, we should expect ourselves to be in a generic universe, but instead we seem to be in a richly structured universe, so this is an update that maybe we can only be in a rich/structured universe [or that life/agency is just much more likely to arise in such a universe]). Taken to an extreme, perhaps it's possible to derive a priori that the universe has to look like the standard model. (Of course, you could run the standard model on a Turing machine, so the statement would have to be about how the universe relates/appears to agents inhabiting it, not its ultimate ontology, which is inaccessible since any Turing-complete structure can simulate any other.)

Fantastic post! I appreciate your direction of thought. 

Your take on updatelessness—as an adaptive self-modification to handle anticipated experiences or strategic interactions (e.g., generalizations of the prisoner’s dilemma with a twin, or transparent Newcomb problems)—is the most sensible I’ve yet encountered. (I've also appreciated Martin Soto's efforts to get clear on this topic.) As you say, there are many takes; you helped me see a coherent motivation behind one clearly. 

And your picture of EDT evolving into “son of EDT” certainly seems plausible. As you say, perhaps that’s the best we can do—and to go further, we just have to do the hard work of analyzing particular, important problems.

The dynamics might be a little better particularly regarding logical uncertainty (even if we continue to treat logical credences in a Bayesian way).

One deeply insightful but lesser known analysis of logical uncertainty within a Bayesian framework is by Seidenfeld, Schervish, and Kadane (2012): "What Kind of Uncertainty Is That? Using Personal Probability for Expressing One’s Thinking About Logical and Mathematical Propositions." I'd love to hear your thoughts on it sometime.

It looks like I have many points of agreement with Martin. 

Unfortunately, Bayesian probability theory doesn't exactly tell us how to remedy the situation; in that way it fails Demski's criterion that a theory of rationality is meant to provide advice about how to be more rational. 

There is some work on this. In "Measures of incoherence: How not to Gamble If You Must," Schervish, Seidenfeld, and Kadane (2002) provide a measure of incoherence and show that, for an incoherent agent, updating via Bayes will reduce their incoherence.

This isn’t a complete answer to how best to deal with incoherent beliefs, but it’s perhaps the start of one—and you can still tell your incoherent friends to use Bayes’ rule to become more coherent!

Very interesting! I have been enjoying reading up on Seidenfeld's work. 

The goal is to make pre-commitments unnecessary, including retro-active pre-commitments. I think it's misleading to frame the agent taking an action now as the same agent that hypothetically considers it from a position of radical ignorance (together with all other possible actions in all alternative states, forming a policy). The usual UDT perspective is to only have the abstract radically ignorant agent, so that the thing that carries out actions is not really an agent, it's just an automaton that carries out the policy chosen by the abstract radically ignorant agent, according to what it says to do in the current situation.

I think a better way is to distinguish them as different agents coordinating with each other (with knowledgeable future concrete agents selectively deferring to abstract ignorant past agents on some things), probably with different preferences even. The advantage of radical ignorance is a coherent way of looking at all ways it might get resolved, not needing to deal with counterfactuals relative to what you've already accepted as a part of you, not needing to excise or factor out this deeply integrated knowledge to consider its alternatives. But it's the agent in the present that decides whether to take the advice offered by that less committed perspective, that decides which more ignorant abstract agents to use as intermediaries for coordination with other knowledgeable concrete agents (such as alternative versions of yourself that need to act in different situations).

This sounds like how Scott formulated it, but as far as I know none of the actual (semi)formalizations look like this.

The first paragraph is my response to how you describe UDT in the post, I think the slightly different framing where only the abstract algorithm is the agent fits UDT better. It only makes the decision to choose the policy, but it doesn't make commitments for itself, because it only exists for that single decision, influencing all the concrete situations where that decision (policy) gets accessed/observed/computed (in part).

The second paragraph is the way I think about how to improve on UDT, but I don't have a formulation of it that I like. Specifically, I don't like for those past abstract agents to be an explicit part of a multi-step history, like in Logical Induction (or a variant that includes utilities), with explicit stages. It seems too much of a kludge and doesn't seem to have a prospect of describing coordination between different agents (with different preferences and priors).

Past stages should be able to take into account their influence on any computations at all that choose to listen to them, not just things that were explicitly included as later stages or receiving messages or causal observations, in a particular policy formulation game. Influence on outcomes mediated purely through choice of behavior that an abstract algorithm makes for itself also seems more in spirit of UDT. The issue with UDT is that it tries to do too much in that single policy-choosing thing that it wants to be an algorithm but that mostly can't be an actual algorithm, rather than working through smaller actual algorithms that form parts of a larger setting, interacting through choice of their behavior and by observing each other's behavior.