shminux comments on An Intuitive Explanation of Solomonoff Induction - Less Wrong
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Comments (210)
What's the length of the hypothesis that F=ma?
How many bits does it take to say "mass times the second derivative of position"? An exact answer would depend on the coding system. But if mass, position, time, multiplication, and differentiation are already defined, then, not many bits. You're defining force as ProductOf(mass,d[d(position)/d(time)]/d(time)).
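To make "depends on the coding system" concrete, here is a minimal sketch of one possible coding. Everything in it is an assumption for illustration: the five-word vocabulary, the fixed-width token code, and the prefix-notation layout are hypothetical choices, not a standard encoding and not a Kolmogorov complexity.

```python
import math

# Hypothetical vocabulary: primitives assumed to be already defined by the coding system.
VOCAB = ["mass", "position", "time", "ProductOf", "Derivative"]

# Fixed-width code: every token costs the same number of bits.
BITS_PER_TOKEN = math.ceil(math.log2(len(VOCAB)))

# ProductOf(mass, d[d(position)/d(time)]/d(time)) written in prefix notation.
# Derivative is assumed to take two arguments: an expression and a variable.
FORCE = ["ProductOf", "mass",
         "Derivative", "Derivative", "position", "time", "time"]

def encode(tokens):
    """Concatenate the fixed-width binary index of each token."""
    return "".join(format(VOCAB.index(t), f"0{BITS_PER_TOKEN}b") for t in tokens)

bits = encode(FORCE)
print(bits, len(bits))
```

Under these assumptions the expression costs 7 tokens at 3 bits each. The cost of defining the five primitives in the first place is not counted, and that is where most of the bits would go.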
See the discussion here on the bit complexity of physical laws. One thing missing from that discussion is how to code the rules for interpreting empirical data in terms of a hypothesis. A model is useless if you don't know what empirical phenomena it is supposed to be describing, and that has to be specified somehow too.
I understand all that, I just want a worked example, not only hand-waving. After all, a formalization of Occam's razor is supposed to be useful in order to be considered rational.
Remember, the Kolmogorov complexity depends on your "universal Turing machine", so we should expect to only get estimates. Mitchell makes an estimate of ~50000 bits for the new minimal standard model. I'm not an expert on physics, but the mathematics required to explain what a Lagrangian is would seem to require much more than that. I think you would need Peano arithmetic and a lot of set theory just to construct the real numbers so that you could do calculus (of course people were doing calculus for over one hundred years before real numbers existed, but I have a hard time imagining a rigorous calculus without them...) I admit that 50000 bits is a lot of data, but I'm sceptical that it could rigorously code all that mathematics.
F=ma has the same problem, of course. Does the right hand side really make sense without calculus?
ETA: If you want a fleshed out example, I think a much better problem to start off with would be predicting the digits of pi, or the prime numbers.
My estimate was 27000 bits to "encode the standard model" in Mathematica. To define all the necessary special functions on a UTM might take 50 times that.
I just want something simple but useful. Gotta start small. Once we are clear on F=ma, we can start thinking about formalizing more complicated models, and maybe some day even quantum mechanics and MWI vs collapse.
We can already do MWI vs Collapse without being clear on F=ma. MWI is not even considered, because MWI does not output a string that begins with the observed data, i.e. MWI will never be found when doing Solomonoff induction. MWI's code may be a part of the correct code, such as the Copenhagen interpretation (which includes MWI's code). Or something else may be found (my bet is on something else, because of general relativity). It is this bloody simple.
The irony is, you can rule MWI out with Solomonoff induction without even choosing the machine or having a halting oracle. Note: you can't rule out existence of many worlds. But MWI simply does not provide the right output.
Why are his comments in this thread getting downvoted? They show a quite nuanced understanding of S. I. and raise interesting points.
If there is no requirement for the observed data to be at the start of the string that is output, then the simplest program that explains absolutely everything that is computable is this:
Print random digits. (This was actually a tongue-in-cheek Schmidhuber result from the early 2000s, IIRC. The easiest program whose output will assuredly contain our universe somewhere along the line.)
Luckily there is such a requirement, and I don't know how MWI could possibly fit into it. This unacknowledged tension has long bugged me, and I'm glad someone else is aware of it.
He identifies subtleties, but doesn't look very hard to see whether other people could have reasonably supposed that the subtleties resolve in a different way than he thinks they "obviously" do. Then he starts pre-emptively campaigning viciously for contempt for everyone who draws a different conclusion than the one from his analysis. Very trigger-happy.
This needlessly pollutes discussion... that is to say, "needless" in the moral perspective of everyone who doesn't already believe that most people who first appear wrong by that criterion that way in fact are wrong, and negligently and effectively incorrigibly so, such that there'd be nothing to lose by loosing broadside salvos before the discussion has even really started. (Incidentally, it also disincentivizes the people who could actually explain the alternative treatment of the subtleties from engaging with him, by demonstrating a disinclination to bother to suppose that their position might be reasonable.) This perception of needlessness, together with the usual assumption that he must already be on some level aware of other people's belief in that needlessness but is disregarding that belief, is where most of the negative affect toward him comes from.
Also, his occasional previous lack of concern for solid English grammar didn't help the picture of him as not really caring about the possibility that the people he was talking to might not deserve the contempt for them that third parties would inevitably come away with the impression that he was signaling.
(I wish LW had more people who were capable of explaining their objections understandably like this, instead of being stuck with a tangle of social intuitions which they aren't capable of unpacking in any more sophisticated way than by hitting the "retaliate" button.)
Were capable of and bothered to, I suppose. I rarely bother to explain the reasons for my value judgments unless I'm specifically asked, and sometimes not even then. Especially not when it comes to value judgments of random people on the Internet. Low-value Internet interactions are fungible.
private_messaging is a troll. Safely assume bad faith.
Wikipedia:
Let's see:
so, maybe 25-30% trollness.
I never get this impression from his posts. They seem honest (if sometimes misguided) not malicious to me.
I suspect that others downvote private_messaging because of his notoriety. I did downvote his comment because he strayed away from my explicit request (estimate the complexity of Newton's 2nd law) and toward the flogged-to-death topic of MWI vs the world. Such a discussion has proven to be unproductive time and again in this forum.
Likewise. (With the caveat that I endorse downvoting extreme cases based on notoriety so probably would have downvoted anyway.)
At this point I am not interested in human logic, I want a calculation of complexity. I want a string (an algorithm) corresponding to F=ma. Then we can build on that.
An interesting point - the algorithm would contain apparent collapses as special instructions even while it did not contain it as general rules.
I think leaving it out as a general rule damages the notion that it's producing the Copenhagen Interpretation, though.
Maybe start by showing how it works to predict a sequence like 010101..., then something more complicated like 011011... It starts to get interesting with a sequence like 01011011101111... - how long would it take to converge on the right model there? (which is that the subsequence of 1s is one bit longer each time).
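A toy version of this suggestion can be sketched as follows. It is not Solomonoff induction: the hypothesis class has only three hand-written generators instead of all programs for a UTM, and the description lengths (5, 7, and 12 bits) are made-up numbers chosen only so that simpler patterns get more prior weight. Each hypothesis is weighted by 2^-length, hypotheses whose output does not begin with the observed data are dropped, and the survivors vote on the next bit.

```python
from itertools import islice

def alternating():
    # 010101...
    while True:
        yield 0
        yield 1

def period_three():
    # 011011...
    while True:
        yield 0
        yield 1
        yield 1

def growing_runs():
    # 0 1 0 11 0 111 ...: each run of 1s is one bit longer than the last.
    n = 1
    while True:
        yield 0
        for _ in range(n):
            yield 1
        n += 1

# (name, generator factory, ASSUMED description length in bits)
HYPOTHESES = [
    ("alternating", alternating, 5),
    ("period_three", period_three, 7),
    ("growing_runs", growing_runs, 12),
]

def predict_next(observed):
    """Return P(next bit = 1) over hypotheses consistent with `observed`.

    Returns None if no hypothesis in the toy class reproduces the data.
    """
    weight = {0: 0.0, 1: 0.0}
    for _name, make, length in HYPOTHESES:
        out = list(islice(make(), len(observed) + 1))
        if out[:-1] == list(observed):
            weight[out[-1]] += 2.0 ** -length
    total = weight[0] + weight[1]
    return weight[1] / total if total else None

for prefix in ([0, 1], [0, 1, 0, 1], [0, 1, 0, 1, 1]):
    print(prefix, predict_next(prefix))
```

In this toy class, 0101 still overwhelmingly favours the alternating hypothesis, and the fifth bit of 01011... eliminates everything except the growing-runs model. How fast the real thing converges depends on the machine and on the true length of the generating program, which this sketch does not estimate.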
True Solomonoff induction is uselessly slow. The relationship of Solomonoff induction to actual induction is like the relationship of Principia Mathematica to a pocket calculator; you don't use Russell and Whitehead's methods and notation to do practical arithmetic. Solomonoff induction is a brute-force scan of the whole space of possible computational models for the best fit. Actual induction tends to start a priori with a very narrow class of possible hypotheses, and only branches out to more elaborate ones if that doesn't work.
This would be a derivation of F=ma, vs all other possible laws. I am not asking for that. My question is supposedly much simpler: write a binary string corresponding to just one model out of infinitely many, namely F=ma.
If we adopt the paradigm in the article - a totally passive predictor, which just receives a stream of data and makes a causal model of what's producing the data - then "F=ma" can only be part of the model. The model will also have to posit particular forces, and particular objects with particular masses.
Suppose the input consists of time series for the positions in three dimensions of hundreds of point objects interacting according to Newtonian gravity. I'm sure you can imagine what a program capable of generating such output looks like; you may even have written such a program. If the predictor does its job, then its model of the data source should also be such a program, but encoded in a form readable by a UTM (or whatever computational system we use).
Such a model would possibly have a subroutine which computed the gravitational force exerted by one object on another, and then another subroutine which computed the change in the position of an object during one timestep, as a function of all the forces acting on it. "F=ma" would be implicit in the second subroutine.
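The kind of program described above might look roughly like the sketch below, written in Python rather than in a UTM encoding. The function names, the tuple representation of vectors, and the semi-implicit Euler update are all assumptions made for illustration; a real predictor's model would not need to be organised this way.

```python
G = 6.674e-11  # gravitational constant in SI units

def gravitational_force(m1, p1, m2, p2):
    """Force exerted on body 1 by body 2, as a 3-tuple."""
    d = [b - a for a, b in zip(p1, p2)]   # vector from body 1 to body 2
    r2 = sum(c * c for c in d)
    scale = G * m1 * m2 / (r2 * r2 ** 0.5)
    return tuple(scale * c for c in d)

def step(mass, pos, vel, force, dt):
    """Advance one body by one timestep. F=ma appears only as a = F/m."""
    acc = tuple(f / mass for f in force)
    new_vel = tuple(v + a * dt for v, a in zip(vel, acc))
    new_pos = tuple(p + v * dt for p, v in zip(pos, new_vel))
    return new_pos, new_vel

def simulate(masses, positions, velocities, dt, n_steps):
    """Return position snapshots over time, i.e. the data stream to be predicted."""
    history = [list(positions)]
    for _ in range(n_steps):
        forces = []
        for i in range(len(masses)):
            net = (0.0, 0.0, 0.0)
            for j in range(len(masses)):
                if i != j:
                    f = gravitational_force(masses[i], positions[i],
                                            masses[j], positions[j])
                    net = tuple(a + b for a, b in zip(net, f))
            forces.append(net)
        updated = [step(m, p, v, f, dt)
                   for m, p, v, f in zip(masses, positions, velocities, forces)]
        positions = [p for p, _ in updated]
        velocities = [v for _, v in updated]
        history.append(list(positions))
    return history
```

Note that "F=ma" occupies a single line inside `step`; the masses, initial positions, and the force law all have to be supplied separately, which is the point made above about it being only part of the model.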
This is correct.
Solomonoff induction accounts for all of the data, which is a binary sequence of sensory-level happenings. In its hypothesis, there would have to be some sub-routine that extracted objects from the sensory data, one that extracted a mass from these objects, et cetera. The actual F=ma part would be a really far abstraction, though it would still be a binary sequence.
Solomonoff induction can't really deal with counterfactuals in the same way that a typical scientific theory can. That is, we can say, "What if Jupiter was twice as close?" and then calculate it. With Solomonoff induction, we'd have to understand the big binary sequence hypothesis and isolate the high-level parts of the program to use just those to calculate the consequence of counterfactuals.
Four.
Plus a constant.
I think people missed a joke here. I mean, seriously, EY is not so stupid as to think that it is literally 4 bits. And if it is 4 symbols, and symbols are of arbitrary size, then it's not 'plus a constant', it's 'multiply by log2(number of symbols in the alphabet), plus a constant' (suppose I make a Turing machine with an 8-symbol tape; then on this machine I can compress arbitrarily long programs of another machine into a third of the length, plus a constant).
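The factor of three can be illustrated with a small sketch, assuming the other machine's programs are bit strings and the larger alphabet's size is a power of two. The function name `pack` and the zero-padding scheme are hypothetical; a real translation would also need the "plus a constant" part to record how much padding was added.

```python
def pack(bits, alphabet_size=8):
    """Re-encode a bit string as symbols from a larger alphabet.

    Assumes alphabet_size is a power of two, so each symbol carries
    exactly log2(alphabet_size) bits.
    """
    k = alphabet_size.bit_length() - 1          # bits carried per symbol
    padded = bits + "0" * (-len(bits) % k)      # pad to a multiple of k
    return [int(padded[i:i + k], 2) for i in range(0, len(padded), k)]

program = "010110111"
symbols = pack(program)
print(symbols, len(program), len(symbols))
```

Here 9 bits become 3 symbols on the 8-symbol tape, so length measured in symbols shrinks by log2(8) = 3 without any change in information content.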
No, really. It can be 4 literal bits and a sufficiently arbitrary constant. It's still a joke and I rather liked it myself.
Yes. I was addressing what I thought might be sensible reason not to like the joke given that F=ma is 4 symbols (so is "four").
Then my hypothesis is simpler: GOD.
Eliezer: My hypothesis is even simpler: ME!
As long as the GOD is simple and not too knowable. If he knows everything (about the Universe), he is even more complex than the Universe.
The same logic applies to EY's comment :)
01000110001111010110110101100001
Source
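For anyone wondering what that string is: read as four 8-bit ASCII codes (an assumption about the intended encoding, but one that decodes cleanly), it spells out the formula itself. A minimal sketch:

```python
def decode_ascii(bits):
    """Interpret a bit string as consecutive 8-bit ASCII character codes."""
    return "".join(chr(int(bits[i:i + 8], 2)) for i in range(0, len(bits), 8))

bits = "01000110001111010110110101100001"
print(decode_ascii(bits))  # F=ma
```

So this is 32 bits for the characters "F=ma", which encodes the notation rather than a program that computes anything.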
I think this post is making the mistake of allowing the hypothesis to be non-total. Definition: a total hypothesis explains everything, it's a universe-predicting machine and equivalent to "the laws of physics". A non-total hypothesis is like an unspecified total hypothesis with a piece of hypothesis tacked on. Neither what it does, nor any meaningful understanding of its length, can be derived without specifying what it's to be attached to.