All of Matthias Dellago's Comments + Replies

Simplicity Priors are Tautological

Any non-uniform prior inherently encodes a bias toward simplicity. This isn't an additional assumption we need to make - it falls directly out of the mathematics.

For any hypothesis h, the information content is $I(h) = -\log(P(h))$, which means probability and complexity have an exponential relationship: $P(h) = e^{-I(h)}$

This demonstrates that simpler hypotheses (those with lower information content) are automatically assigned higher probabilities. The exponential relationship creates a strong bias toward simplicity without requiring any special mechanisms.

The "simplicity prior" is essentially tautological - more probable things are simple by definition.
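A toy numeric restatement (just the identity above in base 2, nothing new): under a prefix-free code, a hypothesis whose description takes l bits gets prior probability 2^-l, so information content and description length coincide exactly.

```python
import math

# A toy restatement of the identity above (base-2 convention): under a
# prefix-free code, a hypothesis described by l bits gets prior
# probability 2**-l, so I(h) = -log2 P(h) recovers the description length.
def prior(description_length_bits: int) -> float:
    return 2.0 ** -description_length_bits

for l in [1, 5, 10, 20]:
    p = prior(l)
    print(l, p, -math.log2(p))  # the last column recovers l exactly
```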

2Alex Gibson
You can have a hypothesis with really high Kolmogorov complexity, but if the hypothesis is true 50% of the time it will require 1 bit of information to specify with respect to a coding scheme that merely points to cached hypotheses. This is why when Kolmogorov complexity is defined it's with respect to a fixed universal description language, as otherwise you're right, it's vacuous to talk about the simplicity of a hypothesis.
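A minimal sketch of that point (Shannon code lengths only, nothing Kolmogorov-specific): relative to a code built on the cache's distribution, a hypothesis that is true half the time costs a single bit, regardless of how complex its internal description is.

```python
import math

# Sketch: optimal (Shannon) code length for an event of probability p is
# ceil(-log2 p) bits, independent of the event's internal complexity.
def shannon_code_length(p: float) -> int:
    """Bits needed for an event of probability p under an optimal code."""
    return math.ceil(-math.log2(p))

print(shannon_code_length(0.5))    # 1 bit, as in the comment
print(shannon_code_length(0.01))   # a rare hypothesis costs 7 bits
```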

I would be interested in seeing those talks, can you maybe share links to these recordings?

4Stephen Fowler
These recordings I watched were actually from 2022 and weren't the Santa Fe ones.

Very good work, thank you for sharing!

Intuitively speaking, the connection between physics and computability arises because the coarse-grained dynamics of our Universe are believed to have computational capabilities equivalent to a universal Turing machine [19–22].

I can see how this is a reasonable and useful assumption, but the universe seems to be finite in both space and time and therefore not a UTM. What convinced you otherwise?

2Aram Ebtekar
Thanks. Obviously this claim needs some interpretation, but a UTM still seems a better model of the Universe than, say, any lower automaton in the Chomsky hierarchy. For the purposes of defining entropy, it's important that we can use a small base machine, plus a memory tape that we may think of as expanding in an online fashion.

Thank you! I'll have a look!

Simplified: the Solomonoff prior is the distribution you get when you take a uniform distribution over all strings and feed them to a universal Turing machine.

Since the outputs are also strings: what happens if we iterate this? What is the stationary distribution? Is there even one? The fixed points will be quines, programs that copy their source code to the output. But how are they weighted? By their length? Presumably you can also have quine-cycles of programs that generate each other in turn, in a manner reminiscent of metagenesis. Do these quine cycles capture all p... (read more)
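A toy sketch of the iteration (the "machine" here is an arbitrary non-injective map on 3-bit strings, not a real universal machine): iterating the pushforward of the uniform distribution leaves mass only on the map's fixed points, the toy analogue of quines.

```python
import itertools
from collections import Counter

# Toy stand-in for the construction: a deterministic "machine" on 3-bit
# strings. Iterating the pushforward of the uniform distribution
# concentrates all mass on the fixed points (the toy "quines").
def machine(s: str) -> str:
    return ''.join(sorted(s))        # contracts everything onto sorted strings

strings = [''.join(bits) for bits in itertools.product('01', repeat=3)]
dist = {s: 1 / len(strings) for s in strings}

for _ in range(10):                  # iterate the pushforward distribution
    new = Counter()
    for s, p in dist.items():
        new[machine(s)] += p
    dist = dict(new)

quines = [s for s in strings if machine(s) == s]
print(quines)                        # ['000', '001', '011', '111']
print(dist)                          # all mass sits on the quines
```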

3TsviBT
Very relevant: https://web.archive.org/web/20090608111223/http://www.paul-almond.com/WhatIsALowLevelLanguage.htm
6Kaarel
A few quick observations (each with like 90% confidence; I won't provide detailed arguments atm, but feel free to LW-msg me for more details):

* Any finite number of iterates just gives you the Solomonoff distribution up to at most a const multiplicative difference (with the const depending on how many iterates you do). My other points will be about the limit as we iterate many times.
* The quines will have mass at least their prior, upweighted by some const because of programs which do not produce an infinite output string. They will generally have more mass than that, and some will gain mass by a larger multiplicative factor than others, but idk how to say something nice about this further.
* Yes, you can have quine-cycles. Relevant tho not exactly this: https://github.com/mame/quine-relay
* As you do more and more iterates, there's no convergence to a stationary distribution, at least in total variation distance. One reason is that you can write a quine which adds a string to itself (and then adds the same string again next time, and so on)[1], creating "a way for a finite chunk of probability to escape to infinity". So yes, some mass diverges.
* Quine-cycles imply (or at least very strongly suggest) that probabilities also do not converge pointwise.
* What about pointwise convergence when we also average over the number of iterates? It seems plausible you get convergence then, but not sure (and not sure if this would be an interesting claim). It would be true if we could somehow think of the problem as living on a directed graph with countably many vertices, but idk how to do that atm.
* There are many different stationary distributions — e.g. you could choose any distribution on the quines.

1. a construction from o3-mini-high: https://colab.research.google.com/drive/1kIGCiDzWT3guCskgmjX5oNoYxsImQre-?usp=sharing ↩︎
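The escaping-mass construction can be sketched concretely (a toy version of my own, not the linked o3-mini-high one): a quine-like program whose output is itself with one more character of padding, so its length diverges under iteration.

```python
import contextlib
import io

# Toy "escaping mass" quine: each generation prints a copy of itself with
# one more 'A' of padding, so lengths grow without bound under iteration.
seed = 'pad = %r\nsrc = %r\nprint(src %% (pad + "A", src))'
program = seed % ('', seed)

def run(src: str) -> str:
    """Execute a program string and capture what it prints."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(src, {})
    return buf.getvalue().rstrip('\n')

gen1 = run(program)
gen2 = run(gen1)
print(len(program), len(gen1), len(gen2))  # each generation is one char longer
```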

"Many parts of the real world we care about just turn out to be the efficiently predictable."

I had a discussion about exactly these 'pockets of computational reducibility' today: whether they are the same as the vaguer 'natural abstractions', and whether there is some observation selection effect going on here.

Very nice! Alexander and I were thinking about this after our talk as well. We thought of this in terms of the Kolmogorov structure function, and I struggled with what you call Claim 3, since the time requirements are only bounded by the busy beaver number. I think if you accept some small divergence it could work; I would be very interested to see.

4Lucius Bushnaq
For claim 3, I think we just want to assume that the process we are trying to predict doesn’t have time requirements that are too large for us to make a prediction we are happy with. I think this has to be an assumption about the data we make because it is just genuinely not true of many processes we can conceive of, and I don’t think deep learning would work to predict those processes. Many parts of the real world we care about just turn out to be the efficiently predictable.

Small addendum: The padding argument gives a lower bound on the multiplicity. From above, it is bounded by the Kraft-McMillan inequality.
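For concreteness, a toy check of the upper-bound side (hypothetical codeword set): any prefix-free set of codeword lengths satisfies the Kraft-McMillan inequality, sum over i of 2^-l_i <= 1, with equality for a complete code.

```python
# Toy check: a prefix-free codeword set satisfies the Kraft-McMillan
# inequality sum(2**-l) <= 1; equality means the code is complete.
codewords = ['0', '10', '110', '111']
is_prefix_free = all(
    not a.startswith(b) for a in codewords for b in codewords if a != b
)
kraft_sum = sum(2 ** -len(c) for c in codewords)
print(is_prefix_free, kraft_sum)  # True 1.0: a complete prefix-free code
```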

Interesting! I think the problem is dense/compressed information can be represented in ways in which it is not easily retrievable for a certain decoder. The standard model written in Chinese is a very compressed representation of human knowledge of the universe and completely inscrutable to me.
Or take some maximally compressed code and pass it through a permutation. The information content is obviously the same but it is illegible until you reverse the permutation.

In some ways it is uniquely easy to do this to codes with maximal entropy because per definit... (read more)

Good points! I think we underestimate the role that brute force plays in our brains though.

Damn! Dark forest vibes, very cool stuff!
Reference for the sub collision: https://en.wikipedia.org/wiki/HMS_Vanguard_and_Le_Triomphant_submarine_collision

And here's another one!
https://en.wikipedia.org/wiki/Submarine_incident_off_Kildin_Island

Might as well start equipping them with fenders at this point.


And 2050 basically means post-AGI at this point. ;)

Great write up Alex!
I wonder how well the transparent battlefield translates to the naval setting.
1. Detection and communication through water is significantly harder than air, requiring shorter distances.
2. Surveilling a volume scales worse than a surface.

Am I missing something or do you think drones will just scale anyway?

4Alexander Gietelink Oldenziel
Great to hear this post had \geq 1 readers hah.

* Both the US and China are already deploying a number of surface and underwater drones. Ukraine has had a lot of success with surface suicide drones, sinking several Russian ships iirc, damaging bridges, etc. Outside of Ukraine and Russia, maybe Israel, nobody is really on the ball when it comes to military competitiveness. To hit home this point, consider that the US military employs about 10,000 drones of all sizes while Ukraine, with an economy 1/5 of the Netherlands, now produces 1-4 million drones a year alone [ofc drones vary widely in size and capability, so this is a little misleading]. It should be strongly suspected that when faced with a real peer opponent, warring powers will quickly realize they need to massively up production of drones.
* There is an interesting acoustic phenomenon where a confluence of environmental factors (like sea depth, temperature, range, etc.) creates 'sonar deadzones' where submarines are basically invisible. The exact nature of these deadzones is a closely-held state secret, as is the exact design of submarines to make them as silent as possible. As stated, my understanding is that this is one of a few remaining areas where the US has a large technological advantage over her Chinese counterparts. You can't hit something you can't see, so this advantage is potentially very large. As mentioned, a single torpedo hit will sink a ship; a ballistic missile hit is a mission kill; both attack submarines and ballistic missile submarines are lethal.
* Although submarines can dive fairly deep, there are various constraints on how deep they typically dive, e.g. they probably want to stay in these sonar deadzones. -> There was an incident a while back where a (Russian? English? French?) submarine hit another submarine (Russian? English? French?) by accident. It underscores how silent submarines are and how there are probably preferred regions underwater where submarines are much more likely t...

I don't know if that is a meaningful question.
Consider this: a cube is something that is symmetric under the octahedral group - that's what *makes* it a cube. If it wasn't symmetric under these transformations, it wouldn't be a cube. So also with spacetime - it's something that transforms according to the Poincaré group (plus some other mathematical properties, metric etc.). That's what makes it spacetime.

2Noosphere89
So space symmetry is always assumed when we talk about spacetime, and if space symmetry didn't hold, spacetime as we know it would not work/exist?

I'll bet you! ;)

Sadly my claim is somewhat unfalsifiable because the emergence might always be hiding at some smaller scale, but I would be surprised if we find the theory that the standard model emerges from and it contains classical spacetime.

I did a little search, and if it's worth anything Witten and Wheeler agree: https://www.quantamagazine.org/edward-witten-ponders-the-nature-of-reality-20171128/ (just search for 'emergent' in the article)

3Noosphere89
Can you have emergent spacetime while space symmetry remains a bedrock fundamental principle, and not emergent of something else?

You're making an interesting connection to symmetry! But scale invariance as discussed here is actually emergent - it arises when theories reach fixed points under coarse-graining, rather than being a fundamental symmetry of space. This is why quantities like electric charge can change with scale, despite spacetime symmetries remaining intact.

And while spacetime symmetries still seem scale invariant, considering the above argument they might also break down at small scales. It seems exceedingly unlikely that they would not! The initial parameters of the theory would have to be chosen just so as to be a fixed point. It seems much more likely that these symmetries emerged through RG flow rather than being fundamental.

3Noosphere89
While this is an interesting idea, I do still think space symmetries are likely to remain fundamental features of physics, rather than being emergent out of some other process.

The act of coarse-graining/scaling up (RG transformation) changes the theory that describes the system, specifically the theory's parameters. If you consider the space of all theories and iterate the coarse-graining, this induces a flow where each theory is mapped to its coarse-grained version. This flow may possess attractors, that is, stable fixed points x*, meaning that when you apply the coarse-graining you get the same theory back.

And if f(x*)=x* then obviously f(f(x*))=x*, i.e. any repeated application will still yield the fixed point.

So you can scale up as much as you want - entering a fixed point really is a one way street, you can check out any time you like but you can never leave!

As a corollary: Maybe power laws for AI should not surprise us, they are simply the default outcome of scaling.
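A standard worked example of such a flow (the textbook decimation of the 1D Ising chain; the script itself is just a sketch): coarse-graining maps the coupling K to K' = 0.5 ln(cosh(2K)). Every K > 0 flows to the stable fixed point K* = 0, and f(K*) = K* means repeated coarse-graining leaves the fixed point unchanged.

```python
import math

# Exact RG decimation of the 1D Ising chain: K -> 0.5 * ln(cosh(2K)).
# Iterating the map flows every coupling K > 0 toward the stable fixed
# point K* = 0, which the map then leaves unchanged (f(x*) = x*).
def rg_step(K: float) -> float:
    return 0.5 * math.log(math.cosh(2 * K))

K = 1.0
for _ in range(20):
    K = rg_step(K)
print(K)             # essentially 0: the flow has entered the fixed point
print(rg_step(0.0))  # exactly 0.0: a fixed point stays a fixed point
```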

Scale invariance is itself an emergent phenomenon. 

Imagine scaling something (say a physical law) up - if it changes, it is obviously not scale invariant as it will continue changing with each scale up. If it does not change it has reached a fixed point and will not change in the next scale up either!
Scale invariances are just fixed points of coarse-graining.
Therefore, we should expect anything we think of as scale invariant to break down at small scales. For instance, electric charge is not scale invariant at small scales! 
In the opposite direct... (read more)
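A quick numerical check of this picture: power laws are exactly the functions whose shape survives rescaling (only the prefactor changes), which is why fixed points of coarse-graining generically look like power laws, tying in with the corollary about AI scaling.

```python
import math

# Rescaling x -> 2x turns c*x**-a into (c * 2**-a) * x**-a: same shape,
# new constant. An exponential, by contrast, changes form under rescaling.
def power_law(x: float, c: float = 1.0, a: float = 2.0) -> float:
    return c * x ** -a

xs = [1.0, 2.0, 4.0, 8.0]
ratios = [power_law(2 * x) / power_law(x) for x in xs]
print(ratios)      # the same ratio 2**-2 = 0.25 at every x: scale invariant

exp_ratios = [math.exp(-2 * x) / math.exp(-x) for x in xs]
print(exp_ratios)  # x-dependent ratios: the exponential is not scale invariant
```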

3Nathan Helm-Burger
This sounds like a fascinating insight, but I think I may be missing some physics context to fully understand. Why is it that the derived laws approximating a true underlying physical law are expected to stay scale invariant over increasing scale after being scale invariant for two steps? Is there a reason that there can't be a scale invariant region that goes back to being scale variant at large enough scales just like it does at small enough scales?
3Noosphere89
The main source of scale-invariance itself probably would have to do with symmetry, meaning that an object has a particular property that is preserved across scales. Space symmetry is an example, where the basic physical laws are preserved across all scales of spacetime; in particular, scaling a system down doesn't mean different laws of physics apply at different scales: there is only one physical law, which produces varied consequences at all scales.
3Matthias Dellago
As a corollary: Maybe power laws for AI should not surprise us, they are simply the default outcome of scaling.