Comment author: MedicJason 25 August 2013 01:59:41AM 0 points [-]

To my mind all such questions are related to arguments about solipsism, i.e. the notion that even other humans don't, or may not, have minds/consciousness/qualia. The basic argument is that I can only see behavior (not mind) in anyone other than myself. Most everyone rejects solipsism, but I don't know if there are actually many very good arguments against it, except that it is morally unappealing (if anyone knows of any, please point them out). I think the same questions hold regarding emulations, only even more so (at least with other humans we know they are physically similar, suggesting some possibility that they are mentally similar as well - not so with emulations*). Especially, I don't see how there can ever be empirical evidence that anything is conscious or experiences qualia (or that anything is not conscious!): behavior isn't strictly relevant, and other minds are non-perceptible. I think this is the most common objection to Turing tests as a standard, as well.

*Maybe this is the logic of the biological position you mention - essentially, the more something seems like the one thing I know is conscious (me), the more likelihood I assign to it also being conscious. Thus other humans > other complex animals > simple animals > other organisms > abiotics.

Comment author: Viliam_Bur 03 May 2013 04:00:58PM 0 points [-]

However, as I understand it, SI does take into account evidence - one removes all the possibilities incompatible with the evidence, then renormalizes the probabilities of the remaining possibilities. Right?

I am not sure about the terminology. I would call the described process "Solomonoff priors, plus updating", but I don't know the official name.
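(To make that "remove and renormalize" mechanic concrete, here is a minimal Python sketch. It is only a toy with made-up names: hypotheses are bare bit strings standing in for programs, and real Solomonoff Induction, which enumerates all programs for a universal Turing machine, is incomputable.)

```python
from itertools import product

def toy_update(observed, max_len=12):
    """A toy stand-in for 'Solomonoff prior plus updating'.

    Hypotheses are bit strings (standing in for programs that print
    them), each given prior weight 2**-length.  Updating removes every
    hypothesis incompatible with the observed prefix and renormalizes
    the surviving weights.
    """
    hypotheses = [''.join(bits)
                  for n in range(1, max_len + 1)
                  for bits in product('01', repeat=n)]
    prior = {h: 2.0 ** -len(h) for h in hypotheses}        # Occam weighting
    compatible = {h: w for h, w in prior.items()
                  if h.startswith(observed)}               # discard the rest
    total = sum(compatible.values())
    return {h: w / total for h, w in compatible.items()}   # renormalize

obs = '0100'
posterior = toy_update(obs)
# Probability that the next bit is 1, according to the surviving mixture
# (hypotheses that end exactly at the observed data predict nothing
# further and simply drop out of this sum).
p_next_is_1 = sum(w for h, w in posterior.items()
                  if len(h) > len(obs) and h[len(obs)] == '1')
print(round(p_next_is_1, 3))
```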

after taking account of all available evidence - is SI then well-calibrated?

I believe the answer is "yes, with enough evidence it is better calibrated than humans".

How much would "enough evidence" be? Well, you need some to compensate for the fact that humans are already born with physiology and instincts adapted by evolution to our laws of physics. But this is a finite amount of evidence. All the evidence that humans get should be processed better by the hypothetical "Solomonoff prior plus updating" process. So even if the process started from zero and got the same information as humans, at some moment it should become, and remain, better calibrated.

the theory seems to predict that possible (evidence-compatible) events or states in the universe will occur in exact or fairly exact proportion to their relative complexities as measured in bits [...] if I am predicting between 2 (evidence-compatible) possibilities, and one is twice as information-complex as the other, then it should actually occur 1/3 of the time

Let's suppose that there are two hypotheses H1 and H2, each of them predicting exactly the same events, except that H2 is one bit longer and therefore half as likely as H1. Okay, so there is no evidence to distinguish between them. Whatever happens, we either reject both hypotheses, or we keep their ratio at 1:2.

Is that a problem? In real life, no. We will use the system to predict future events. We will ask about a specific event E, and by definition both H1 and H2 would give the same answer. So why should we care whether the answer was derived from H1, from H2, or from a combination of both? The question will be: "Will it rain tomorrow?" and the answer will be: "No." That's all, from outside.
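(A tiny numeric sketch of that "from outside" point, with made-up numbers: the prediction is a weighted mixture over hypotheses, so if H1 and H2 assign the same probability to E, how the weight is split between them cannot affect the answer.)

```python
def predict(weights, p_event_given_h):
    # Posterior predictive probability of an event: a weighted mixture.
    return sum(w * p_event_given_h[h] for h, w in weights.items())

p_rain = {'H1': 0.2, 'H2': 0.2}   # both hypotheses assign the same P(rain)
print(predict({'H1': 2/3, 'H2': 1/3}, p_rain))   # 0.2 (up to float rounding)
print(predict({'H1': 0.5, 'H2': 0.5}, p_rain))   # 0.2 -- the split is invisible from outside
```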

Only if you look inside and ask "What was your model of the world that you used for this prediction?" would the machine tell you about H1, H2, and infinitely many other hypotheses. Then you could ask it to use Occam's razor to choose only the simplest one and display it to you. But internally, it could keep all of them (we already suppose it has infinite memory and infinite processing power). Note, if I understand it correctly, that it would actually be impossible for the machine to tell, in general, whether two hypotheses H1 and H2 are evidence-compatible.

Is there any evidence that outcomes in the universe actually occur with probabilities in proportion to their information-complexity?

They don't. To get the probabilities of something occurring in our universe, you need to get the information about our universe first. Solomonoff Induction tells you how to do that, in a random universe. After you get enough evidence to understand the universe, only then do you start getting good results.

In other words, the laws of our universe don't say "things are probable according to their information complexity". Instead they say other things. The problem is... at the beginning, you don't know the laws of our universe exactly. So how can you learn them?

Imagine yourself living centuries ago. If you knew Solomonoff Induction, it would give you a non-zero probability for quantum physics (and many other things, most of them wrong). A hypothetical machine with infinite power, able to do all the calculations, could in theory derive quantum physics just by receiving the evidence you see. Isn't that awesome?

I phrased things in terms of probabilities inside a single universe because that is the context in which I observe & make decisions and would like SI to be useful.

Me too. But we still don't know all the laws of our universe. So in that aspect "what universe do we live in" remains a bit unknown.

However I think you could just translate what I have said back into many-worlds language and keep the question intact.

Careful. There is a difference between the quantum "many worlds", which are all supposed to follow the same laws of physics, and hypothetical universes with other laws of physics, called the Tegmark multiverse.

Again, I agree that we should only care about our laws of physics, and about our branch of "many worlds". But we still have the problem of not knowing exactly what the laws are, and which branch it is. So we need a method to work with multiple possible laws, and multiple possible branches. With enough updating on our evidence, the probabilities of the other laws and other branches will get close to zero, and the remaining ones will be the most relevant for us.

Comment author: MedicJason 03 May 2013 06:39:17PM 0 points [-]

They don't. To get the probabilities of something occurring in our universe, you need to get the information about our universe first. Solomonoff Induction tells you how to do that, in a random universe. After you get enough evidence to understand the universe, only then do you start getting good results.

Yes, but we already have lots of information about our universe. So, making use of all that, if we could start using SI to, say, predict the weather, would its predictions be well-calibrated? (They should be - modern weather predictions are already well-calibrated, and SI is supposed to be better than how we do things now.) That would require that ALL predictions compatible with currently known info occur in EXACT PROPORTION to their bit-length complexity.

Is there any evidence that this is the case?
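(For reference, one generic way to make "well-calibrated" operational, as a small sketch with hypothetical data: bucket many forecasts by their stated probability and compare each bucket's stated probability to the observed frequency of the predicted event.)

```python
from collections import defaultdict

def calibration_table(forecasts):
    """forecasts: (stated_probability, event_occurred) pairs.

    For each stated probability, return the observed frequency of the
    event among predictions made at that probability.  Well-calibrated
    means observed frequency ~= stated probability in every bucket.
    """
    buckets = defaultdict(list)
    for p, occurred in forecasts:
        buckets[p].append(occurred)
    return {p: sum(outcomes) / len(outcomes)
            for p, outcomes in sorted(buckets.items())}

# Hypothetical data: ten 10% forecasts, one of which came true.
example = [(0.1, False)] * 9 + [(0.1, True)]
print(calibration_table(example))   # {0.1: 0.1} -- perfectly calibrated here
```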

Comment author: MedicJason 03 May 2013 06:22:31PM 0 points [-]

You quoted me

"the theory seems to predict that possible (evidence-compatible) events or states in the universe will occur in exact or fairly exact proportion to their relative complexities as measured in bits [...] if I am predicting between 2 (evidence-compatible) possibilities, and one is twice as information-complex as the other, then it should actually occur 1/3 of the time"

then replied

"Let's suppose that there are two hypotheses H1 and H2, each of them predicting exactly the same events, except that H2 is one bit longer and therefore half as likely as H1. Okay, so there is no evidence to distinguish between them. Whatever happens, we either reject both hypotheses, or we keep their ratio at 1:2."

I am afraid I may have stated this unclearly at first. I meant: given two hypotheses that are both compatible with all currently known evidence, but which predict different outcomes for a future event.

Comment author: DaFranker 03 May 2013 03:10:10PM *  0 points [-]

Is there any evidence that outcomes in the universe actually occur with probabilities in proportion to their information-complexity?

Yes, and the first piece of evidence is rather trivial. For any given law of physics, chemistry, etc., or basically any model of anything in the universe, I can conjure up an arbitrary number of more and more complicated hypotheses that match the current data, but all or nearly all of which will fail utterly against new data obtained later.

For a very trivial thought experiment / example, we could have an alternate hypothesis which includes all of the current data, with only instructions to the Turing machine to print this data. Then we could have another which includes all the current data twice, but tells the Turing machine to only print one copy. Necessarily, both of these will fail against new data, because they will only print the old data and halt.

We could conjure infinitely many variants similar to this which also contain arbitrary amounts of gibberish right after the old data - gibberish which will be unlikely to match the new data (matching only with probability 1/2^n, where n is the length of the new data / gibberish, assuming perfect randomness).
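(A small simulation of that 1/2^n claim, under the stated assumption of perfect randomness; the function name here is made up for illustration.)

```python
import random

def match_rate(n, trials=100_000, seed=0):
    """Fraction of random n-bit 'gibberish' tails that happen to equal
    n new, independently random bits; should approach 2**-n."""
    rng = random.Random(seed)
    hits = sum(
        [rng.randint(0, 1) for _ in range(n)] ==
        [rng.randint(0, 1) for _ in range(n)]
        for _ in range(trials)
    )
    return hits / trials

for n in (1, 2, 4, 8):
    print(n, match_rate(n), 2.0 ** -n)   # empirical rate vs. theoretical 2**-n
```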

Comment author: MedicJason 03 May 2013 06:15:47PM 0 points [-]

This seems reasonable - it basically makes use of the fact that most statements are wrong; therefore, adding a given statement whose truth-value is as yet unknown is likely to make the hypothesis wrong.

However, that's vague. It supports Occam's Razor pretty well, but does it also offer good evidence that those likelihoods will manifest in real-world probabilities IN EXACT PROPORTION to the bit-lengths of their inputs? That is a much more precise claim! (For convenience I am ignoring the problem of multiple algorithms where hypotheses have different bit-lengths.)

Comment author: DaFranker 03 May 2013 02:04:16PM *  0 points [-]

Viliam_Bur gives a great run-down of what's going on. For a more detailed introduction though, see this post explaining Solomonoff Induction, or perhaps you'd prefer to jump straight to this paragraph (Solomonoff's Lightsaber) that contains an explanation of why shorter (simpler) hypotheses are more likely under Solomonoff Induction.

To make the bridge between that and what Viliam is saying: basically, if we consider all mathematically possible universes, then half the universes will start with a 1, and the other half will start with a 0. Then a quarter will start with 11, and another quarter with 10, and so on. Which means that, to reuse the example in the above-linked post, 01001101 (which matches observed data perfectly so far) will appear in 1 out of 256 mathematically-possible universes, and 1000111110111111000111010010100001 (which also matches the data just as perfectly) will only appear in 1 out of 17179869184 mathematically-possible universes.

So if we expect to live in one out of all mathematically-possible universes, but have no idea what properties it has (or if you just got warped to a different universe with different laws of physics), which of the two hypotheses do you want? The one that is true more often, in more of the possible universes, because you're more likely to be in one of those than in one that matches the longer, rarer hypothesis.

That's the basic simplified logic behind it.
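(A minimal sketch of the counting behind those numbers: the fraction of bit sequences beginning with a given prefix is 1/2^len(prefix), which reproduces the 1/256 and 1/17179869184 figures above.)

```python
from fractions import Fraction

def prefix_fraction(prefix):
    # Fraction of all (long enough) bit sequences that begin with this prefix.
    return Fraction(1, 2 ** len(prefix))

print(prefix_fraction('01001101'))                            # 1/256
print(prefix_fraction('1000111110111111000111010010100001'))  # 1/17179869184
```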

Comment author: MedicJason 03 May 2013 02:58:35PM 0 points [-]

Yes, that was the post I read that generated my current line of questioning.

My reply to Viliam_Bur was phrased in terms of probabilities in a single universe, while your post here is in terms of mathematically possible universes. Let me try to rephrase my point to him in many-worlds language. This is not how I originally thought of the question, though, so I may end up a little muddled in translation.

Take your original example, where half of the Mathematically Possible Universes start with 1, and the other half with 0. It is certainly possible to imagine a hypothetical Actual Multiverse where, nevertheless, there are 5 billion universes with 1, and only 5 universes with 0. Who knows why - maybe there is some overarching multiversal law we are unaware of, or maybe it's just random. The point is that there is no a priori reason the Multiverse can't be that way. (It may not even be possible to say that the multiverse probably isn't that way without using Solomonoff Induction or Occam's Razor, the very concepts under question.)

If this were the case, and I were somehow universe-hopping, I would over time come to the conclusion that SI was poorly calibrated and stop using it. This, I think, is basically the many-worlds version of my suggestion to Viliam_Bur. As I said to him, I am not arguing for or against SI; I am just asking knowledgeable people if there is any evidence that the probabilities in this universe, or distributions across the multiverse, are actually in proportion to their information-complexities.

Comment author: Viliam_Bur 03 May 2013 07:13:32AM *  3 points [-]

Solomonoff Induction could be well-calibrated across mathematically possible universes. If a hypothesis has a probability of 10%, you should expect it to be true in 10% of the universes.

The important thing is that Solomonoff priors are just a starting point in our reasoning. Then we update on evidence, which is at least as important as having reasonable priors. If it does not seem well calibrated, that is because you can't get good calibration without using evidence.

Imagine that at this moment you are teleported to another universe with completely different laws of physics... do you expect any other method to work better than Solomonoff Induction? Yes, gradually you would get data about the new universe and improve your model. But that's exactly what you are supposed to do with Solomonoff priors. You wouldn't predictably get better results by starting from different priors.

It appears to just be a formalization of Occam's Razor, which itself is just a rule of thumb.

To me it seems that Occam's Razor is a rule of thumb, and Solomonoff Induction is a mathematical background explaining why the rule of thumb works. (OR: "Choose the most simple hypothesis that fits your data." Me: "Okay, but why?" SI: "Because it is more likely to be the correct one.")

But if it turned out not to be well-calibrated, it would not be a very good "recipe for truth." What am I missing?

You can't get a good "recipe for truth" without actually looking at the evidence. Solomonoff Induction is the best thing you can do without the evidence (or before you start taking the evidence into account).

Essentially, the Solomonoff Induction will help you avoid the following problems:

  • Getting inconsistent results. For example, if you instead supposed that "if I don't have any data confirming or rejecting a hypothesis, I will always assume its prior probability is 50%", then if I give you two new hypotheses X and Y without any data, you are supposed to think that p(X) = 0.5 and p(Y) = 0.5, but also e.g. p(X and Y) = 0.5 (because "X and Y" is also a hypothesis you don't have any data about).

  • Giving such an extremely low prior probability to a reasonable hypothesis that available evidence cannot convince you otherwise. For example, if you assume that the prior probability of X is zero, then with proper updating no evidence can convince you of X, because there is always an alternative explanation with a very small but non-zero probability (e.g. the lords of the Matrix are messing with your brain). Even if the value is technically non-zero, it could be very small, like 1/10^999999999, so all the evidence you could get within your human life could not make you change your mind.

  • On the other hand, some hypotheses do deserve very low prior probability, because reasoning like "any hypothesis, however unlikely, has prior probability at least 0.01" can be exploited by a) Pascal's mugging, b) constructing multiple mutually exclusive hypotheses which together have arbitrarily high probability (e.g. "AAA is the god of this world and I am his prophet", "AAB is the god of this world and I am his prophet"... "ZZZ is the god of this world and I am his prophet").
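(A back-of-the-envelope sketch of those failure modes, with hypothetical numbers: 26^3 = 17576 mutually exclusive "god" hypotheses at 0.01 each would claim a total probability of 175.76, which no consistent distribution allows.)

```python
from itertools import product
from string import ascii_uppercase

FLOOR = 0.01   # hypothetical rule: every hypothesis gets prior >= 0.01

# 26^3 mutually exclusive hypotheses: "AAA is the god of this world", ...
gods = [''.join(name) for name in product(ascii_uppercase, repeat=3)]
print(len(gods), FLOOR * len(gods))   # 17576 hypotheses, 'total probability' ~175.76

# The 50%-by-default rule is inconsistent too: p(X) = p(Y) = 0.5 forces
# p(X and Y) + p(X and not Y) = p(X) = 0.5, yet the rule assigns each of
# those untested conjunctions 0.5 as well, summing to 1.0 > 0.5.
```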

Comment author: MedicJason 03 May 2013 02:35:11PM 0 points [-]

Thank you for your reply. It does clear up some of the virtues of SI, especially when used to generate priors absent any evidence. However, as I understand it, SI does take into account evidence - one removes all the possibilities incompatible with the evidence, then renormalizes the probabilities of the remaining possibilities. Right?

If so, one could still ask - after taking account of all available evidence - is SI then well-calibrated? (At some point it should be well-calibrated, right? More calibrated than human beings. Otherwise, how is it useful? Or why should we use it for induction?)

Essentially, the theory seems to predict that possible (evidence-compatible) events or states in the universe will occur in exact or fairly exact proportion to their relative complexities as measured in bits. Possibly over-simplifying, this suggests that if I am predicting between 2 (evidence-compatible) possibilities, and one is twice as information-complex as the other, then it should actually occur 1/3 of the time. Is there any evidence that this is actually true?

(I can see immediately that one would have to control for the number of possible "paths" or universe-states or however you call it that could lead to each event, in order for the outcome to be directly proportional to the information-complexity. I am ignoring this because the inability to compute this appears to be the reason SI as a whole cannot be computed.)

You suggest above that SI explains why Occam's Razor works. I could offer another possibility - that Occam's Razor works because it is vague, but that when specified it will not turn out to match how the universe actually works very precisely. Or that Occam's Razor is useful because it suggests that when generating a Map one should use only as much information about the Territory as is necessary for a certain purpose, thereby allowing one to get maximum usefulness with minimum cognitive load on the user.

I am not arguing for one or the other. Instead I am just asking, here among people knowledgeable about SI - is there any evidence that outcomes in the universe actually occur with probabilities in proportion to their information-complexity? (A much more precise claim than Occam's suggestion that in general simpler explanations are preferable.)

Maybe it will not be possible to answer my question until SI can at least be estimated, in order to actually make the comparison?

(Above you refer to "all mathematically possible universes." I phrased things in terms of probabilities inside a single universe because that is the context in which I observe & make decisions and would like SI to be useful. However I think you could just translate what I have said back into many-worlds language and keep the question intact.)

Comment author: MedicJason 03 May 2013 12:02:39AM 2 points [-]

Hi, my name is Jason, and this is my first post. I have recently been reading about two subjects here, Calibration and Solomonoff Induction; reading about them together has given me the following question:

How well-calibrated would Solomonoff Induction be if it could actually be calculated?

That is to say, if one generated priors on a whole bunch of questions based on information complexity measured in bits - if you took all the hypotheses that were measured at 10% likely - would 10% of those actually turn out to be correct?

I don't immediately see why Solomonoff Induction should be expected to be well-calibrated. It appears to just be a formalization of Occam's Razor, which itself is just a rule of thumb. But if it turned out not to be well-calibrated, it would not be a very good "recipe for truth." What am I missing?