My reasoning is this:
Consider the domain of bit streams - to avoid having to deal with infinity, let's take some large but finite length, say a trillion bits. Then there are 2^trillion possible bit streams. Now restrict our attention to just those that begin with a particular ordered pattern, say the text of Hamlet, and choose one of those at random. (We can run this experiment on a real computer by taking a copy of said text and appending enough random noise to bring the file size up to a trillion bits.) What can we say about the result?
Well, almost all bit streams that begin with the text of Hamlet consist of just random noise thereafter, so almost certainly the one we choose will lapse into random noise as soon as the chosen text ends.
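A scaled-down version of this experiment can be sketched in Python. Everything here is an assumption for illustration: the prefix is a short repeated line standing in for the full text of Hamlet (and is much more repetitive than real English), the stream is a few kilobytes rather than a trillion bits, and the zlib compression ratio is only a rough proxy for "looks like noise".

```python
import os
import zlib

def pad_with_noise(prefix: bytes, total_len: int) -> bytes:
    """Return prefix followed by uniformly random bytes, total_len bytes long."""
    if len(prefix) > total_len:
        raise ValueError("prefix is longer than the requested stream")
    return prefix + os.urandom(total_len - len(prefix))

def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size; about 1.0 means incompressible."""
    return len(zlib.compress(data, 9)) / len(data)

# Hypothetical stand-in for the text of Hamlet; any ordered text would do.
prefix = b"To be, or not to be, that is the question. " * 200
stream = pad_with_noise(prefix, 4 * len(prefix))

ordered_part = stream[:len(prefix)]
noise_part = stream[len(prefix):]

print(compression_ratio(ordered_part))  # well below 1: the prefix has structure
print(compression_ratio(noise_part))    # about 1: the tail is just noise
```

The order stops exactly where the chosen prefix stops, which is the behaviour described above for a stream picked uniformly from those with that prefix.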
Suppose we go about things in a different way and instead of choosing a bit stream directly, we take our domain as that of programs a trillion bits long, and then take the first trillion bits of the program's output as our bit stream. Now restrict our attention to just those programs whose output begins with the text of Hamlet, and choose one of those at random. What can we say about the result this time?
It's possible that the program we chose consists of print "...", i.e. the output bit stream is just embedded in the program code. Then our conclusion from choosing a bit stream directly still applies.
But this is highly unlikely. The entropy of English text has been estimated at about one bit per character, whereas a literal print "..." spends about eight bits per character. In other words, there exist subroutines that output the text of Hamlet and are about eight times shorter than print "...". That in turn means that in our domain of programs a trillion bits long, exponentially more programs contain the compact subroutine than the literal print statement. Therefore our randomly selected program is almost certainly printing the text of Hamlet by using a compact subroutine, not a literal print statement.
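To put a rough number on "exponentially more", here is a small counting sketch. It makes simplifying assumptions that are mine, not part of the argument: the block of code sits at a fixed position in the program, the remaining bits are free, one specific literal is compared against one specific compact subroutine, and Hamlet is taken to be about 180,000 characters (an assumed round figure). It works in log2 because the counts themselves are far too large to print.

```python
def log2_count_with_fixed_block(total_bits: int, block_bits: int) -> int:
    """log2 of the number of total_bits-long bit strings that contain one
    specific block_bits-long block at a fixed position (the rest is free)."""
    if block_bits > total_bits:
        raise ValueError("block does not fit in the program")
    return total_bits - block_bits

TOTAL_BITS = 10**12               # program length used in the argument
HAMLET_CHARS = 180_000            # assumed round figure, for illustration only
LITERAL_BITS = 8 * HAMLET_CHARS   # print "...": about 8 bits per character
COMPACT_BITS = 1 * HAMLET_CHARS   # about 1 bit per character (entropy estimate)

log2_ratio = (log2_count_with_fixed_block(TOTAL_BITS, COMPACT_BITS)
              - log2_count_with_fixed_block(TOTAL_BITS, LITERAL_BITS))
print(log2_ratio)  # 1260000
```

Under these assumptions the compact version is favoured by a factor of 2^1,260,000, i.e. 2 raised to seven times the character count, and the ratio does not depend on the total program length.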
But the compact subroutine will probably not lapse into random noise as soon as Hamlet is done. It will continue to generate... probably not a meaningful sequel, Hamlet doesn't constrain the universe enough for that, but at least something that bears some syntactic resemblance to English text.
For the constraint "the output bit stream must begin with the text of Hamlet", substitute "at least one observer must exist". We are then left with the conclusion that if the universe were randomly chosen from all bit streams, we should be Boltzmann brains, observing just enough order for an instance of observation, with an immediate lapse back into noise. As we observe continuing order, we may conclude that the universe was randomly chosen from programs (or mathematical descriptions, if the universe is not restricted to be computable) so that it was selected to contain a compact generator of the patterns that produced us, therefore a generator that will continue producing order.
Note that this is an independent derivation of Solomonoff's result that future data should be expected to be that which would be generated by the shortest program that would produce past data. (I actually figured out the above argument as a way to refute Hume's claim that induction is not logical, before I heard of Solomonoff induction.)
It also works equally well if output data is allowed to be infinite, or even uncountably infinite in size (e.g. continuum physics), because uncountably infinite data can still be generated by a formal definition of finite size.
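The expectation that future data follows the shortest generator of past data can be illustrated with a toy model. This is not Solomonoff induction proper: as a stand-in for programs, the hypotheses here are just bit patterns repeated forever, each weighted by 2^-length, and the function name and the max_len cutoff are hypothetical choices made for the sketch.

```python
from fractions import Fraction
from itertools import product

def predict_next_one(observed: str, max_len: int) -> Fraction:
    """Probability that the next bit is '1' under a length-weighted mixture
    of repeating-pattern hypotheses (a toy stand-in for programs)."""
    weight_one = Fraction(0)
    weight_total = Fraction(0)
    n = len(observed)
    for length in range(1, max_len + 1):
        prior = Fraction(1, 2 ** length)  # shorter patterns get more weight
        for bits in product("01", repeat=length):
            pattern = "".join(bits)
            # This hypothesis outputs the pattern repeated forever;
            # keep it only if it reproduces everything observed so far.
            if all(pattern[i % length] == observed[i] for i in range(n)):
                weight_total += prior
                if pattern[n % length] == "1":
                    weight_one += prior
    if weight_total == 0:
        raise ValueError("no hypothesis matches the observed bits")
    return weight_one / weight_total

print(predict_next_one("010101", max_len=8))  # 1/23
```

After seeing 010101, the short pattern 01 dominates the mixture, so the next bit is predicted to be 0 with probability 22/23. The longer patterns that merely memorise the observed bits and then do something arbitrary are still counted, but they carry little weight.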
That in turn means that in our domain of programs a trillion bits long, exponentially more programs contain the compact subroutine than the literal print statement.
Are you sure this is right? There are exponentially many different print statements. Do you have an argument why they should have low combined weight?
Every now and then I see a claim that if there were a uniform weighting of mathematical structures in a Tegmark-like 'verse (whatever that would mean, even if we ignore the decision theoretic aspects, which really can't be ignored, but whatever), that would imply we should expect to find ourselves as Boltzmann mind-computations, or in other words thingies with just enough consciousness to be conscious of nonsensical chaos for a brief instant before dissolving back into nothingness. We don't seem to be experiencing nonsensical chaos, therefore the argument concludes that a uniform weighting is inadequate and an Occamian weighting over structures is necessary, leading to something like UDASSA or eventually giving up and sweeping the remaining confusion into a decision theoretic framework like UDT. (Bringing the dreaded "anthropics" into it is probably a red herring like always; we can just talk directly about patterns and groups of structures or correlated structures given some weighting, and presume human minds are structures or groups of structures much like other structures or groups of structures given that weighting.)
I've seen people who seem very certain of the Boltzmann-inducing properties of uniform weightings for various reasons that I am skeptical of, and others who seemed uncertain of this for reasons that sound at least superficially reasonable. Has anyone thought about this enough to give slightly more than just an intuitive appeal? I wouldn't be surprised if everyone has left such 'probabilistic' cosmological reasoning for the richer soils of decision theoretically inspired speculation, and if everyone else never ventured into the realms of such madness in the first place.
(Bringing in something, anything, from the foundations of set theory, e.g. the set theoretic multiverse, might be one way to start, but e.g. "most natural numbers look pretty random and we can use something like Goedel numbering for arbitrary mathematical structures" doesn't seem to say much to me by itself, considering that all of those numbers have rich local context that in their region is very predictable and non-random, if you get my metaphor. Or to stretch the metaphor even further, even if 62534772 doesn't "causally" follow 31256 they might still be correlated in the style of Dust Theory, and what meta-level tools are we going to use to talk about the randomness or "size" of those correlations, especially given that 294682462125 could refer to a mathematical structure of some underspecified "size" (e.g. a mathematically "simple" entire multiverse and not a "complex" human brain computation)? In general I don't see how such metaphors can't just be twisted into meaninglessness or assumptions that I don't follow, and I've never seen clear arguments that don't rely on either such metaphors or just flat out intuition.)