Just a general comment on style: I think this article would be much easier to parse if you included headings, sections, etc. Normally, when I approach writing here on LessWrong, I scroll slowly to the bottom, do some diagonal reading, and try to get a quick impression of the contents and whether the article is worth reading. My impression is that most people will ignore articles that are big, uninterrupted chunks of text like this one.
Strong upvoted for visibility and because this sort of post contributes to creating a healthy culture of free speech and rational discussion.
Thank you for the comprehensive answer and for correcting the points where I wasn't clear. Also, thank you for pointing out that the Kolmogorov complexity of a string is the length of the shortest program that outputs that string.
The complexity of the algorithms was totally arbitrary and for the sake of the example.
I still have some doubts, but everything is clearer now (see my answer to Charlie Steiner also).
I think re-reading your answer made something click, so thanks for that.
The observed data is not **random**, because randomness is not a property of the data itself.
The hypotheses that we want to evaluate are not random either, because we are analysing Turing machines that generate those data deterministically.
If the data is HTHTHT, we do not test a Python script that does:
import random
random.choices(["H", "T"], k=6)
What we test instead is something more like
["H"] + ["T"] + ["H"] + ["T"] + ["H"] + ["T"]
And
["H", "T"] * 3
In this case, the last script is simpler and, for that reason, will receive a higher prior.
If we apply this in a Bayesian setting, the likelihood of all these hypotheses is necessarily 1, so the posterior probability just becomes the prior (divided by a normalizing factor), which is determined by the length of the program: shorter programs get exponentially higher priors. This makes sense because it agrees with Occam's razor.
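To make this concrete, here is a toy Python sketch of that update (my own illustration, not anything from the original article): the "lengths" in bits are made-up stand-ins, since real Kolmogorov complexities are uncomputable.

```python
# Toy Bayesian update with a length-based prior 2^-length.
# Likelihood is 1 if the program reproduces the data, else 0.
# The bit lengths below are illustrative placeholders, NOT real
# Kolmogorov complexities.
data = "HTHTHT"

hypotheses = {
    # program text: (deterministic output, assumed length in bits)
    '["H"]+["T"]+["H"]+["T"]+["H"]+["T"]': ("HTHTHT", 34),
    '["HT"]*3': ("HTHTHT", 9),
}

unnorm = {
    prog: 2 ** -length * (1 if output == data else 0)
    for prog, (output, length) in hypotheses.items()
}
total = sum(unnorm.values())
posterior = {prog: w / total for prog, w in unnorm.items()}
# The shorter program ends up with almost all the posterior mass.
```

With both likelihoods equal to 1, the posterior ratio is just the prior ratio, 2^-9 vs 2^-34, so the shorter program dominates.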
The thing I still struggle to see is how to connect this framework with probabilistic hypotheses that I want to test, such as "the data was generated by a fair coin". One possibility I see (but I am not sure this is the correct thing) is testing all the possible strings generated by an algorithm like this:
import random

i = 0
while True:
    random.seed(i)
    print("".join(random.choices(["H", "T"], k=6)))
    i += 1
The likelihood of strings like HHTHTH is 0, so we remove them, and then we are left only with the algorithms that are consistent with the data.
I am not totally sure about the last part.
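A minimal sketch of what I mean, assuming we can only enumerate seeds up to some arbitrary bound rather than forever:

```python
import random

data = "HTHTHT"

# Enumerate the strings the seeded generator can produce (bounded,
# since we cannot actually loop forever) and keep only the seeds
# whose output is consistent with the observed data.
consistent_seeds = []
for i in range(1000):
    random.seed(i)
    s = "".join(random.choices(["H", "T"], k=6))
    if s == data:
        consistent_seeds.append(i)
# Each surviving seed picks out a deterministic program that
# outputs exactly the observed data.
```

Each (generator, seed) pair is itself a deterministic program, which is how I imagine the "random" hypothesis gets folded back into the deterministic framework.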
The part I understood is that you weight the programs based on their length in bits: the longer the program, the less weight it has. This makes total sense.
I am not sure that I understand the prefix thing, and I think that's relevant. For example, it is not clear to me whether, once I consider a program that outputs 0101, I will simply ignore other programs that output the same thing plus one bit (e.g. 01010).
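For what it's worth, here is my own toy illustration of the prefix-free idea (which, as I understand it, applies to the programs themselves, not to their outputs, so programs printing 0101 and 01010 can both stay in the mixture): no valid program is a prefix of another, and Kraft's inequality then guarantees the 2^-length weights sum to at most 1, so they can serve as a prior.

```python
# A prefix-free set of codewords: no codeword is a prefix of another.
codes = ["0", "10", "110", "111"]

def prefix_free(codes):
    """True if no codeword in the set is a prefix of another."""
    return not any(
        a != b and b.startswith(a) for a in codes for b in codes
    )

# Kraft's inequality: for a prefix-free set, sum of 2^-len(c) <= 1,
# which is what lets the 2^-length weights act as (sub)probabilities.
kraft_sum = sum(2 ** -len(c) for c in codes)
```

By contrast, a set like `["0", "01"]` is not prefix-free, because `"0"` is a prefix of `"01"`.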
What I also still find fuzzy (and now at least I can put my finger on it) is the part where Solomonoff induction is extended to deal with randomness.
Let me see if I can make my question more specific:
Let's imagine for a second that we live in a universe where only the following programs could be written:
The programs in A have 5 bits of Kolmogorov complexity each. The programs in B have 6 bits each. Program C has 4 bits.
We observe the sequence O = HTHHT
I measure the likelihood for each possible model and discard the models with likelihood L = 0:
A) There is a model here with likelihood 1
B) There are 2 models here, each of them with likelihood 1 too
C) This model has likelihood 2^-5
Then, things get murky:
will the priors be 2^-5 for the model in A, 2^-6 for each model in B, and 2^-4 for model C, according to their Kolmogorov complexities?
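Carrying those numbers through mechanically (my own arithmetic sketch, taking the priors and likelihoods above at face value):

```python
# Unnormalized posterior = prior * likelihood for the toy universe above.
models = {
    "A":  (2 ** -5, 1.0),      # one 5-bit deterministic program
    "B1": (2 ** -6, 1.0),      # two 6-bit deterministic programs
    "B2": (2 ** -6, 1.0),
    "C":  (2 ** -4, 2 ** -5),  # 4-bit fair-coin program, P(HTHHT) = 2^-5
}
unnorm = {m: prior * lik for m, (prior, lik) in models.items()}
total = sum(unnorm.values())
posterior = {m: w / total for m, w in unnorm.items()}
# A gets 2^-5 unnormalized, each B gets 2^-6, and C only 2^-9:
# the coin model is heavily penalized despite its shorter program,
# because its likelihood for the specific sequence is so small.
```

So, if this reading is right, the fair-coin hypothesis competes on prior × likelihood like everything else; its shorter length cannot compensate for spreading probability over all 2^5 sequences.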
Yes, this is something I can see easily, but I am not sure how Solomonoff induction accounts for that
I think this is pointing at what I don't understand: how do you account for hypotheses that explain randomly generated data? How do you compare a hypothesis that is a random number generator with some parameters against a hypothesis that has some deterministic component?
Is there a way to understand this without reading the original paper (which would probably take me quite a while)?
When you came to understand this, what was your personal process that took you from knowing about probabilities and likelihoods to understanding Solomonoff induction? Did you have to read the original sources, or did you find good explanations somewhere?
I also don't get whether this is a calculation that you can do in a single step or whether it is a continuous process. In other words, does Solomonoff induction only work if we assume that we keep observing new data?
Sorry for the stupid questions, as you can see, I am confused.
I have followed a similar strategy using Anki cards. However, I think that allocating a specific time slot to review your principles and then "act" on them is probably much more effective than being passively reminded of those principles. I will adopt this.
What happens if there is more than one powerful agent playing the charade game? Is there any good article about what happens in a universe where multiple AGIs are competing among themselves? I normally find only texts that assume that once we get AGI we all die, so there is no room for these scenarios.
12 Angry Men
Connection to rationality:
This is just the perfect movie about rationality. Damn, there is even a fantastic YouTube series discussing this movie in the context of instrumental rationality! And besides, I have never met anyone who did not enjoy this classic film.
This classic film is a masterclass in group decision-making, overcoming biases, and the process of critical thinking. The plot revolves around a jury deliberating the guilt of a young man accused of murder. Initially, 11 out of the 12 jurors vote "guilty," but one juror (played by Henry Fonda) questions the certainty of the evidence. It is an absolute must-watch.