
royf comments on Conservation of Expected Evidence - Less Wrong

Post author: Eliezer_Yudkowsky 13 August 2007 03:55PM (68 points)



Comment author: royf 08 October 2012 08:33:11PM 1 point

You're not really wrong. The thing is that "Occam's razor" is a conceptual principle, not a single mathematically defined law. A certain (subjectively very appealing) formulation of it does follow from Bayesianism.

P(AB model) ∝ P(A and B are correct) and P(A model) ∝ P(A is correct). Then P(AB model) ≤ P(A model).

Your math is a bit off, but I understand what you mean. If we have two sets of models, with no prior information to discriminate between their members, then the prior gives less probability to each model in the larger set than in the smaller one.

More generally, if deciding that model 1 is true gives you more information than deciding that model 2 is true, then the maximum entropy attainable under model 1 is lower than that under model 2, which in turn means (under the maximum entropy principle) that model 1 was a priori less likely.
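The set-size argument can be sketched numerically. This is a hypothetical illustration (the class sizes and model names below are made up, not from the thread): with a uniform, maximum-entropy prior over each class, every member of the larger class gets less prior mass, and identifying the true member conveys more bits.

```python
import math

# Hypothetical model classes, with no prior information to discriminate
# between members, so the maximum-entropy prior is uniform within each.
small_class = ["A1", "A2"]                                        # A-only models
large_class = [f"A{i}B{j}" for i in range(2) for j in range(4)]   # A-and-B models

prior_small = 1 / len(small_class)   # 0.5 per model
prior_large = 1 / len(large_class)   # 0.125 per model

# Learning which member is true removes log2(N) bits of uncertainty:
info_small = math.log2(len(small_class))   # 1 bit
info_large = math.log2(len(large_class))   # 3 bits

# The model whose identification conveys MORE information was
# a priori LESS likely -- the point made in the comment above.
assert prior_large < prior_small
assert info_large > info_small
```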

Anyway, this is all beside the discussion that inspired my previous comment. My point was that even without Popper and Jaynes to enlighten us, science was making progress using other methods of rationality, among them a myriad of non-Bayesian interpretations of Occam's razor.

Comment author: Decius 08 October 2012 08:44:24PM 0 points

How does deciding one model is true give you more information? Did you mean "If a model allows you to make more predictions about future observations, then it is a priori less likely?"

Comment author: royf 08 October 2012 09:43:09PM 0 points

How does deciding one model is true give you more information?

Let's assume a strong version of Bayesianism, which entails the maximum entropy principle. So our belief is the one that has the maximum entropy among those consistent with our prior information. If we now add the information that some model is true, this generally invalidates our previous belief, making the new maximum-entropy belief one of lower entropy. The reduction in entropy is the amount of information we gain by learning the model. In a way, this is a cost we pay for "narrowing" our belief.
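The entropy bookkeeping described here can be made concrete. A minimal sketch, assuming a toy world of four equally likely outcomes and a hypothetical model that rules out two of them:

```python
import math

def entropy(p):
    """Shannon entropy in bits."""
    return -sum(q * math.log2(q) for q in p if q > 0)

# Prior information alone distinguishes nothing, so the
# maximum-entropy belief is uniform over the four outcomes.
belief_before = [0.25, 0.25, 0.25, 0.25]   # 2 bits of entropy

# Learning that the model is true rules out two outcomes; the new
# maximum-entropy belief is uniform over the survivors.
belief_after = [0.5, 0.5, 0.0, 0.0]        # 1 bit of entropy

# The entropy reduction is the information gained by learning the model.
gain = entropy(belief_before) - entropy(belief_after)   # 1.0 bit
```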

The upside is that it tells us something useful about the future. Of course, not all information about the world is relevant to future observations. The part that doesn't help control our anticipation is failing to pay rent, and should be evicted. The part that does inform us about the future may be useful enough to be worth the cost we pay in taking in new information.

I'll expand on all of this in my sequence on reinforcement learning.

Comment author: Decius 11 October 2012 04:58:37AM 0 points

At what point does the decision "This is true" diverge from the observation "There is very strong evidence for this", other than in cases where the model is accepted as true despite a lack of strong evidence?

I'm not discussing the case where a model goes from unknown to known. How does deciding to believe a model give you more information than knowing what the model is and the reasons for it? To better model an actual agent, one could replace all of the knowledge about why the model is true with a value for the strength of that supporting knowledge.

How does deciding that things always fall down give you more information than observing things fall down?

Comment author: CynicalOptimist 19 August 2016 03:18:21PM 0 points

I believe the idea was to ask "hypothetically, if I found out that this hypothesis was true, how much new information would that give me?"

You'll have two or more hypotheses, and one of them would (hypothetically) give you the least new information. That one should be considered the "simplest" hypothesis (assuming a certain definition of "simplest", and a certain definition of "information").
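One way to operationalize this criterion is to compare the hypothetical information gain from learning each hypothesis. A sketch under assumed numbers (the eight-outcome world and the hypotheses H1 and H2 are invented for illustration, not taken from the thread):

```python
import math

def entropy(p):
    """Shannon entropy in bits."""
    return -sum(q * math.log2(q) for q in p if q > 0)

# Hypothetical setup: eight equally likely outcomes before deciding.
prior = [1 / 8] * 8

# H1 pins down a single outcome; H2 only narrows things to four.
posterior_h1 = [1.0] + [0.0] * 7
posterior_h2 = [1 / 4] * 4 + [0.0] * 4

info_h1 = entropy(prior) - entropy(posterior_h1)   # 3 bits gained
info_h2 = entropy(prior) - entropy(posterior_h2)   # 1 bit gained

# H2 would give the least new information, so by this criterion
# H2 counts as the "simpler" hypothesis.
simplest = "H2" if info_h2 < info_h1 else "H1"
```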

Comment author: aspera 08 October 2012 09:23:22PM 0 points

Crystal clear. Sorry to distract from the point.