All of Kurros's Comments + Replies

Kurros00

Keynes, in his "Treatise on Probability", talks a lot about analogies in the sense you use here, particularly in "Part 3: Induction and Analogy". You might find it interesting.

Kurros00

Hmm, thanks. Seems similar to my description above, though as far as I can tell it doesn't deal with my criticisms. It is rather evasive when it comes to the question of what status models have in Bayesian calculations.

Kurros00

I am curious; what is the general LessWrong philosophy about what truth "is"? Personally, I so far lean towards accepting an operational subjective Bayesian definition, i.e. the truth of a statement is defined only in so far as we agree on some (in principle) operational procedure for determining its truth; that is, we have to agree on what observations make it true or false.

For example "it will rain in Melbourne tomorrow" is true if we see it raining in Melbourne tomorrow (trivial, but also means that the truth of the statement doesn't depe... (read more)

0ChristianKl
To the extent that there is a general philosophy, it's http://lesswrong.com/lw/eqn/the_useful_idea_of_truth/ but individual people might differ slightly.
Kurros00

Lol that is a nice story in that link, but it isn't a Dutch book. The bet in it isn't set up to measure subjective probability either, so I don't really see what the lesson in it is for logical probability.

Say that instead of the digits of pi, we were betting on the contents of some boxes. For concreteness let there be three boxes, one of which contains a prize. Say also that you have looked inside the boxes and know exactly where the prize is. For me, I have some subjective probability P( X_i | I_mine ) that the prize is inside box i. For you, all your s... (read more)
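To make the two information states concrete (which box actually holds the prize is an assumption here, not something from the comment):

```python
# Three boxes, one prize. My information gives a uniform distribution;
# yours is certain because you looked (box 2 is a made-up choice here).
p_mine = {1: 1/3, 2: 1/3, 3: 1/3}    # P(X_i | I_mine)
p_yours = {1: 0.0, 2: 1.0, 3: 0.0}   # P(X_i | I_yours)

# A bet priced as fair by my distribution is a guaranteed profit for you:
# by my lights, fair odds against box 2 are 2-to-1.
fair_odds_against = (1 - p_mine[2]) / p_mine[2]
print(fair_odds_against)  # 2.0
```

The point being that the money transfer comes from differing information, not from any inconsistency in my probabilities.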

Kurros30

That sounds to me more like an argument for needing lower p-values, not higher ones. If there are many confounding factors, you need a higher threshold of evidence for claiming that you are seeing a real effect.

Physicists need low p-values for a different reason, namely that they do very large numbers of statistical tests. If you choose p=0.05 as your threshold then it means that you are going to be claiming a false detection at least one time in twenty (roughly speaking), so if physicists did this they would be claiming false detections every other day and their credibility would plummet like a rock.
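A quick numerical illustration of the multiple-testing point, assuming independent tests of true null hypotheses at threshold p = 0.05:

```python
# Probability of at least one false detection across n independent
# null tests, each with a 5% false-positive rate.
for n in (1, 20, 100):
    p_at_least_one_false = 1 - (1 - 0.05) ** n
    print(n, round(p_at_least_one_false, 3))
# 1 -> 0.05, 20 -> 0.642, 100 -> 0.994
```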

Kurros00

Is there any more straightforward way to see the problem? I argued with you about this for a while and I think you convinced me, but it is still a little foggy. If there is a consistency problem, surely this means that we must be vulnerable to Dutch books, doesn't it? I.e. they would not seem to be Dutch books to us, with our limited resources, but a superior intelligence would know that they were and would use them to con us out of utility. Do you know of some argument like this?

0Manfred
Yes, this is right. Also, http://www.spaceandgames.com/?p=27 :) If I know all the digits of pi and you think they're evenly distributed past a certain point, I can take your money. In order to resist this, you need to have a hypothesis for "Manfred will pick the right number" - which, fortunately, is very doable, because the complexity of this hypothesis is only about the complexity of a program that computes the digits of pi. But nonetheless, until you figure this out, that's the Dutch book.
Kurros00

Very well, then I will wait for the next entry. But I thought the fact that we were explicitly discussing things the robot could not compute made it clear that resources were limited. There is clearly no such thing as logical uncertainty for the magic logic god of the idealised case.

Kurros00

No we aren't, we're discussing a robot with finite resources. I obviously agree that an omnipotent god of logic can skip these problems.

0VAuroch
The limitations imposed by bounded resources are the next entry in the sequence. For this one, we're still discussing the unbounded case.
Kurros-10

It was your example, not mine. But you made the contradictory postulate that P("wet outside"|"rain")=1 follows from the robot's prior knowledge and the probability axioms, and simultaneously that the robot was unable to compute this. To correct this I alter the robot's probabilities such that P("wet outside"|"rain")=0.5 until such time as it has obtained a proof that "rain" correlates 100% with "wet outside". Of course the axioms don't determine this; it is part of the robot's prior, which is not det... (read more)

0VAuroch
No, you butchered it into a different example - introduced the Lewis Carroll paradox, even. He showed you; you weren't paying attention. It can compute the proof: the laws of inference are axioms, so P(A|B) is necessarily known a priori. There is no such time; either it's true initially, or it will never be established with certainty. If it's true initially, that's because it is an axiom. Which was the whole point.
Kurros-20

You haven't been very specific about what you think I'm doing incorrectly, so it is kind of hard to figure out what you are objecting to. I corrected your example to what I think it should be so that it satisfies the product rule; where's the problem? How do you propose that the robot can possibly set P("wet outside"|"rain")=1 when it can't do the calculation?

0Manfred
In your example, it can't. Because the axioms you picked do not determine the answer. Because you are incorrectly translating classical logic into probabilistic logic. And then, as one would expect, your translation of classical logic doesn't reproduce classical logic.
Kurros-10

Ok sure, so you can go through my reasoning leaving out the implication symbol, but retaining the dependence on the proof "p", and it all works out the same. The point is only that the robot doesn't know that A->B, therefore it doesn't set P(B|A)=1 either.

You had "Suppose our robot knows that P(wet outside | raining) = 1. And it observes that it's raining, so P(rain)=1. But it's having trouble figuring out whether it's wet outside within its time limit, so it just gives up and says P(wet outside)=0.5. Has it violated the product rule? Yes... (read more)

0Manfred
I'm just going to give up and hope you figure it out on your own.
Kurros-10

Hmm this does not feel the same as what I am suggesting.

Let me map my scenario onto yours:

A = "raining"

B = "wet outside"

A->B = "It will be wet outside if it is raining"

The robot does not know P("wet outside" | "raining") = 1. It only knows P("wet outside" | "raining", "raining->wet outside") = 1. It observes that it is raining, so we'll condition everything on "raining", taking it as true.

We need some priors. Let P("wet outside") = 0.5. We also need a ... (read more)

0Manfred
Do you know what truth tables are? The statement "A->B" can be represented on a truth table. A and B can be possible. Not-A and B can be possible. Not-A and not-B can be possible. But A and not-B is impossible. "A->B" and the four statements about the truth table are interchangeable, even though when I talk about the truth table I never need to use the "->" symbol. They contain the same content because A->B says that A and not-B is impossible, and saying that A and not-B is impossible says that A->B. For example, "it raining but not being wet outside is impossible."

In the language of probability, saying that P(B|A)=1 means that A and not-B is impossible, while leaving the other possibilities able to vary freely. The product rule says P(A and not-B) = P(A) * P(not-B | A). What's P(not-B | A) if P(B | A)=1? It's zero, because it's the negation of our assumption.

Translating classical logic into probabilistic logic doesn't just mean putting P() around the same symbols. It means making things behave the same way.
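A minimal sketch of that truth-table equivalence, just enumerating the four rows in Python:

```python
# "A -> B" holds in exactly the rows where "A and not-B" is ruled out.
for A in (True, False):
    for B in (True, False):
        implies = (not A) or B          # classical A -> B
        excludes = not (A and not B)    # "A and not-B is impossible"
        assert implies == excludes
        print(A, B, implies)
```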
Kurros00

"But it turns out that there is one true probability distribution over mathematical statements, given the axioms. The right distribution is obtained by straightforward application of the product rule - never mind that it takes 4^^^3 steps - and if you deviate from the right distribution that means you violate the product rule at some point."

This does not seem right to me. I feel like you are sneakily trying to condition all of the robot's probabilities on mathematical proofs that it does not have a-priori. E.g. consider A, A->B, therefore B. To learn th... (read more)

1Manfred
Well, Cox's theorem has as a requirement that when your axioms are completely certain, you assign probability 1 to all classical consequences of those axioms. Assigning probability 0.5 to any of those consequences thus violates Cox's theorem. But this is kind of unsatisfying, so: where do we violate the product rule?

Suppose our robot knows that P(wet outside | raining) = 1. And it observes that it's raining, so P(rain)=1. But it's having trouble figuring out whether it's wet outside within its time limit, so it just gives up and says P(wet outside)=0.5. Has it violated the product rule? Yes. P(wet outside) >= P(wet outside and raining) = P(wet outside | rain) * P(rain) = 1.

If we accept that the axioms have probability 1, we can deduce the consequences with certainty using the product rule. If at any point we stop deducing the consequences with certainty, this means we have stopped using the product rule.
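Plugging the comment's own numbers into the product rule makes the violation explicit; a tiny sketch:

```python
# The robot's assignments from the example above.
p_wet_given_rain = 1.0
p_rain = 1.0
p_wet = 0.5   # the robot gives up and assigns 0.5

# Product rule: P(wet and rain) = P(wet | rain) * P(rain) = 1.0
p_wet_and_rain = p_wet_given_rain * p_rain

# P(wet) can never be smaller than P(wet and rain), so 0.5 is inconsistent.
print(p_wet >= p_wet_and_rain)   # False -> product rule violated
```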
Kurros00

Perhaps, though, you could argue it differently. I have been trying to understand so-called "operational" subjective statistical methods recently (as advocated by Frank Lad and his friends), and he is insisting on only calling a thing a [meaningful, I guess] "quantity" when there is some well-defined operational procedure for measuring what it is. "Measuring" for him does not rely on a model; he is referring to reading numbers off some device or other, I think. I don't quite understand him yet, since it seems to me that the numbers reported by devices all rely on some model or other to define them, but maybe one can argue their way out of this...

Kurros20

Thanks, this seems interesting. It is pretty radical; he is very insistent on the idea that for all 'quantities' about which we want to reason there must be some operational procedure we can follow in order to find out what it is. I don't know what this means for the ontological status of physical principles, models, etc., but I can at least see the naive appeal... it makes it hard to understand why a model could ever have the power to predict new things we have never seen before, though, like Higgs bosons...

Kurros00

"An example of a "true number" is mass. We can measure the mass of a person or a car, and we use these values in engineering all the time. An example of a "fake number" is utility. I've never seen a concrete utility value used anywhere, though I always hear about nice mathematical laws that it must obey."

It is interesting that you choose mass as your prototypical "true" number. You say we can "measure" the mass of a person or car. This is true in the sense that we have a complex physical model of reality, and in one... (read more)

0cousin_it
Yeah, that sounds right. You could say that a "true" number is a model parameter that fits the observed data well.
Kurros00

Sure, I don't want to suggest we only use the word 'probability' for epistemic probabilities (although the world might be a better place if we did...), only that if we use the word to mean different sorts of probabilities in the same sentence, or even the same body of text, without explicit clarification, then it is just asking for confusion.

Kurros40

Hmm, do you know of any good material to learn more about this? I am actually extremely sympathetic to any attempt to rid model parameters of physical meaning; I mean in an abstract sense I am happy to have degrees of belief about them, but in a prior-elucidation sense I find it extremely difficult to argue about what it is sensible to believe a-priori about parameters, particularly given parameterisation dependence problems.

I am a particle physicist, and a particular problem I have is that parameters in particle physics are not constant; they vary with re... (read more)

0Cyan
I can pass along a recommendation I have received: Operational Subjective Statistical Methods by Frank Lad. I haven't read the book myself, so I can't actually vouch for it, but it was described to me as "excellent". I don't know if it is actively prediction-centered, but it should at least be compatible with that philosophy.
Kurros20

Hmm, interesting. I will go and learn more deeply what de Finetti was getting at. It is a little confusing... in this simple case ok fine p can be defined in a straightforward way in terms of the predictive distribution, but in more complicated cases this quickly becomes extremely difficult or impossible. For one thing, a single model with a single set of parameters may describe outcomes of vastly different experiments. E.g. consider Newtonian gravity. Ok fine strictly the Newtonian gravity part of the model has to be coupled to various other models to des... (read more)

2Cyan
I'd guess that in Geisser-style predictive inference, the meaning or reality or what-have-you of G is to be found in the way it encodes the dependence (or maybe, compresses the description) of the joint multivariate predictive distribution. But like I say, that's not my school of thought -- I'm happy to admit the possibility of physical model parameters -- so I really am just guessing.
Kurros20

Are you referring to De Finetti's theorem? I can't say I understand your point. Does it relate to the edit I made shortly before your post? i.e. Given a stochastic model with some parameters, you then have degrees of belief about certain outcomes, some of which may seem almost the same thing as the parameters themselves? I still maintain that the two are quite different: parameters characterise probability distributions, and just in certain cases happen to coincide with conditional degrees of belief. In this 'beliefs about beliefs' context, though, it is the parameters we have degrees of belief about, we do not have degrees of belief about the conditional degrees of belief to which said parameters may happen to coincide.

4Cyan
Yup, I'm referring to de Finetti's theorem. Thing is, de Finetti himself would have denied that there is such a thing as a parameter -- he was all about only assigning probabilities to observable, bet-on-able things. That's why he developed his representation theorem. From his perspective, p arises as a distinct mathematical entity merely as a result of the representation provided by exchangeability. The meaning of p is to be found in the predictive distribution; to describe p as a bias parameter is to reify a concept which has no place in de Finetti's Bayesian approach. Now, I'm not a de-Finetti-style subjective Bayesian. For me, it's enough to note that the math is the same whether one conceives of p as stochastic model parameter or as the degree of plausibility of any single outcome. That's why I say it's not either/or.
Kurros70

"Jonah was looking at probability distributions over estimates of an unknown probability (such as the probability of a coin coming up heads)"

It sounds like you are just confusing epistemic probabilities with propensities, or frequencies. I.e, due to physics, the shape of the coin, and your style of flipping, a particular set of coin flips will have certain frequency properties that you can characterise by a bias parameter p, which you call "the probability of landing on heads". This is just a parameter of a stochastic model, not a degre... (read more)

0VipulNaik
I understand this, though I hadn't thought of it with such clear terminology. I think the point Jonah was making was that in many cases, people are talking about propensities/frequencies when they refer to probabilities. So it's not so much that Jonah or I are confusing epistemic probabilities with propensities/frequencies; it's that many people use the term "probability" to refer to the latter. With language used this way, the probability distribution for this model parameter can be called the "probability distribution of the probability estimate." If you reserve the term probability exclusively for epistemic probability (degree of belief), then this would constitute an abuse of language.
2Cyan
This is not exactly correct. It's true that in general there's a sharp distinction to be made between model parameters (which govern/summarize/encode properties of the entire stochastic process) and degrees of belief for various outcomes, but that distinction becomes very blurry in the current context. What's going on here is that the probability distribution for the observable outcomes is infinitely exchangeable. Infinite exchangeability gives rise to a certain representation for the predictive distribution under which the prior expected limiting frequency is mathematically equal to the marginal prior probability for any single outcome. So under exchangeability, it's not an either/or -- it's a both/and.
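A sketch of that "both/and" point, using a Beta prior over the limiting frequency p (the Beta(2, 2) choice is an assumption for illustration, not from the thread):

```python
import random

random.seed(0)
a, b = 2.0, 2.0   # Beta(2, 2) prior over p

# Prior expected limiting frequency: the mean of the Beta prior.
expected_freq = a / (a + b)

# Marginal prior probability of heads on any single flip: draw p, then flip once.
n = 100_000
heads = sum(random.random() < random.betavariate(a, b) for _ in range(n))

print(expected_freq, heads / n)   # both close to 0.5: the two quantities coincide
```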
Kurros00

"I view these sorts of distributions over distributions as that- there's some continuous parameter potentially in the world (the proportion of white and black balls in the urn), and that continuous parameter may determine my subjective probability about binary events (whether ball #1001 is white or black)."

To me this just sounds like standard conditional probability. E.g. let p(x|I) be your subjective probability distribution over the parameter x (fraction of white balls in urn), given prior information I. Then

p("ball 1001 is white"|I)... (read more)

Kurros00

Lol ok, so long as I get my answer eventually :p.

Kurros00

Was the "Putting in the Numbers" post the one you were referring to? You didn't post that on Saturday, but now it is Monday and there doesn't seem be a third post. Anyway I did not see this question answered anywhere in "Putting in the Numbers"...

0Manfred
Yeah, sorry, I've been delayed by the realization that everything I wrote for the forthcoming post needed a complete re-write. Planning fallacy!
Kurros10

Yeah I think integral( -p*log(p) ) is it. The simplest problem is that if I have some parameter x to which I want to assign a prior (perhaps not over the whole real set, so it can be proper as you say -- the boundaries can be part of the maxent condition set), then via the maxent method I will get a different prior depending on whether I happen to assign the distribution over x, or x^2, or log(x) etc. That is, the prior pdf obtained for one parameterisation is not related to the one obtained for a different parameterisation by the correct transformation rul... (read more)
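Concretely, the kind of thing being described (example values assumed): a prior flat in x picks up a Jacobian under y = x^2 and is no longer flat.

```python
# Densities transform as p_y(y) = p_x(x) * |dx/dy|, so "maxent in x"
# and "maxent in y = x**2" give genuinely different priors.
import math

def p_x(x):                      # "maxent" prior: uniform on (0, 1)
    return 1.0 if 0.0 < x < 1.0 else 0.0

def p_y(y):                      # induced density of y = x**2
    x = math.sqrt(y)
    return p_x(x) / (2.0 * x)    # Jacobian |dx/dy| = 1 / (2*sqrt(y))

print(p_y(0.04), p_y(0.81))      # 2.5 and ~0.556: no longer uniform
```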

Kurros20

Refering to this:

"Simply knowing the fact that the entropy is concave down tells us that to maximize entropy we should split it up as evenly as possible - each side has a 1/4 chance of showing."

Ok, that's fine for discrete events, but what about continuous ones? That is, how do I choose a prior for real-valued parameters that I want to know about? As far as I am aware, MAXENT doesn't help me at all here, particularly as soon as I have several parameters, and no preferred parameterisation of the problem. I know Jaynes goes on about how continuous ... (read more)
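For the discrete part at least, the quoted claim is easy to check numerically; a small sketch comparing a few candidate distributions over four outcomes:

```python
import math

def entropy(ps):
    return -sum(p * math.log(p) for p in ps if p > 0)

for ps in [(0.25, 0.25, 0.25, 0.25), (0.4, 0.3, 0.2, 0.1), (0.7, 0.1, 0.1, 0.1)]:
    print(ps, round(entropy(ps), 4))
# The uniform row gives log(4) ~ 1.3863, the largest of the three,
# matching the "split it up as evenly as possible" claim.
```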

0Manfred
Well, you can still define information entropy for probability density functions - though I suppose if we ignore Jaynes we can probably get paradoxes if we try. In fact, I'm pretty sure just integrating -p*Log(p) is right. There's also a problem if you want to have a maxent prior over the integers or over the real numbers; that takes us into the realm of improper priors. I don't know as much as I should about this topic, so you may have to illustrate using an example before I figure out what you mean.
Kurros50

It would have been kind of impossible to work on AI in 1850, before even modern set theory was developed. Unless by work on AI, you mean work on mathematical logic in general.

Kurros00

Ok, but do you really mean that sentence how it is written? To me it means the same thing as saying that assigning probability to anything is logically equivalent to assigning probability to 0=1 (which I am perfectly happy to do; if that is the point then fine, but that doesn't seem to be your implication).

Kurros00

"But to assign some probability to the wrong answer is logically equivalent to assigning probability to 0=1."

Only if you know it is the wrong answer. You say the robot doesn't know, so what's the problem? We assign probabilities to propositions which are wrong all the time, before we know if they are wrong or not.

0Manfred
I'll tell you on Saturday!
Kurros20

Statistics also remains important at the frontier of high energy physics. Trying to reason about which models are likely to replace the Standard Model is plagued by every issue in the philosophy of statistics that you can imagine. And the arguments about this affect where billions of dollars' worth of research funding end up (build bigger colliders? more dark matter detectors? satellites?)

1A1987dM
Sure; if we had enough data to conclusively answer a question it would no longer be at the frontier. :-) (I disagree with several of the claims in the linked post, but that's another story.)
Kurros10

I can't disagree with that :p. I will concede that the survey question needs some refinement.

Kurros10

Hmm, I couldn't agree with that latter definition. Physics is just the "map" after all, and we are always improving it. Mathematics (or some future "completed" mathematics) seems to me to be the space of things that are possible. I am not certain, but this might be along the lines of what Wittgenstein means when he says things like

"In logic nothing is accidental: if a thing can occur in an atomic fact the possibility of that atomic fact must already be prejudged in the thing.

If things can occur in atomic facts, this possibility must alre... (read more)

Kurros10

But don't you think there is an important distinction between events that defy logical description of any kind, and those that merely require an outlandish multi-layered reality to explain? I admit I can't think of anything that could occur in our world that cannot be explained by the simulation hypothesis, but assuming that some world DOES exist outside the layers of nested simulation I can (loosely speaking) imagine that some things really are logically impossible there. And that if the inhabitants of that world observe such impossible events, well, they... (read more)

Kurros10

I'm no theologian, but it seems to me that this view of the supernatural does not conform to the usual picture of God philosophers put forward, in terms of being the "prime mover" and so on. They are usually trying to solve the "first cause" problem, among other things, which doesn't really mesh with God as the super-scientist, since one is still left wondering about where the world external to the simulation comes from.

I agree that my definition of the supernatural is not very useful in practice, but I think it is necessary if one is t... (read more)

1scav
Well, I can't find any use for the word supernatural myself, even in connection with God. It doesn't seem to mean anything. I can imagine discussing God as a hypothetical natural phenomenon that a universe containing sentient life might have, for example, without the s word making any useful contribution. Maybe anything in mathematics that doesn't correspond to something in physics is supernatural? Octonions perhaps, or the Monster Group. (AFAIK, not being a physicist or mathematician)
0hyporational
I'd like to keep the word supernatural in my (inner?) vocabulary, but "unconstrained by physics" makes absolutely no sense to me, so I tried to choose a definition that doesn't make my brain hurt. If we inspect the roots of the word, you can see it roughly means "above nature", nature here being the observable universe whether it's a simulation or not. I find this definition suits the situation pretty well.
Kurros30

To me, the simulation hypothesis definitely does not imply a supernatural creator. 'Supernatural' implies 'unconstrained by natural laws', at least to me, and I see no reason to expect that the simulation creators are free from such constraints. Sure, it means that supernatural-seeming events can in principle occur inside the simulation, and the creators need not be constrained by the laws of the simulation since they are outside of it, but I fully expect that some laws or other would govern their behaviour.

2Lion
A bold, but reasonable expectation that I agree with. There MUST be SOME laws, even if we don't know what they are.
9ialdabaoth
To me, "Supernatural" needs to be evaluated from within the framework of the speaker's reality. Otherwise, the term loses all possible semantic meaning.
Kurros20

You don't think people here have a term for their survey-completing comrades in their cost function? Since I probably won't win either way this term dominated my own cost function, so I cooperated. An isolated defection can help only me, whereas an isolated cooperation helps everyone else and so gets a large numerical boost for that reason.

1aspera
It's true: if you're optimizing for altruism, cooperation is clearly better. I guess it's not really a "dilemma" as such, since the optimal solution doesn't depend at all on what anyone else does. If you're trying to maximize EV, defect. If you're trying to maximize other people's EV, cooperate.
Kurros20

Lol, I cooperated because $60 was not a large enough sum of money for me to really care about trying to win it, and in the calibration I assumed most people would feel similarly. Reading your reasoning here, however, it is possible I should have accounted more strongly for people who like to win just for the sake of winning, a group that may be larger here than in the general population :p.

Edit: actually that's not really what I mean. I mean people who want to make a rational choice to maximise the probability of winning for its own sake, even if they don't... (read more)

2Ander
Agreed, I think that the rational action in this scenario depends on one's goal, and there are different things you could choose as your goal here. I also think I should've set a higher value for my 90% confidence estimate of the number of people who would cooperate, because it's quite possible that a lot more people than I expected would choose goals other than 'winning'.
Kurros30

It defined "God" as supernatural didn't it? In what sense is someone running a simulation supernatural? Unless you think for some reason that the real external world is not constrained by natural laws?

1Yaakov T
For a discussion of the meaning of supernatural see here: http://onlinelibrary.wiley.com/doi/10.1525/eth.1977.5.1.02a00040/pdf
0scav
If everything in your universe is a simulation, then the external implementation of it is at least extra-natural from your point of view, not constrained by any of the simulated natural laws. So you might as well call it supernatural if you like. If you include all layers of simulation all the way out to base reality as part of the one huge natural system, then everything is natural, even if most of it is unknowable.
0hyporational
We had some discussion of this here.
3Lion
Maybe my definition of "supernatural" isn't the correct definition, but I often think of the word as describing certain things which we do not (currently) understand. And if we do eventually come to understand them, then we will need to augment our understanding of the natural laws... assuming this "supernatural" stuff actually exists. I suppose a programmer could defy the laws he made for his virtual world when he intervenes from outside the system... but earthly programmers obey the natural physical laws when they mess with the hardware, which also runs based on these same laws. I understand this is what you mean by "constrained by natural laws".
Kurros20

In this case, Feynman is worth listening to slowly. There is something about the way he explains this that the transcript does not do justice to.

Kurros30

When you prove something in mathematics, at the very least you implicitly assume you have made no mistakes anywhere, are not hallucinating, etc. Your "real" subjective degree of belief in some mathematical proposition, on the other hand, must take all these things into account.

For practical purposes the probability of hallucinations etc. may be very small and so you can usually ignore them. But the OP is right to demonstrate that in some cases this is a bad approximation to make.

Deductive logic is just the special limiting case of probability theory... (read more)
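A small sketch of deduction as the limiting case (the error rate e is an illustrative assumption):

```python
# Modus ponens as a probability limit: with P(A) = 1 and P(B|A) = 1
# the product rule forces P(B) = 1; a small chance e of a mistake or
# hallucination degrades the conclusion smoothly rather than breaking it.
for e in (0.0, 1e-6, 1e-2):
    p_a = 1 - e
    p_b_given_a = 1 - e
    p_b_lower_bound = p_b_given_a * p_a   # P(B) >= P(A and B)
    print(e, p_b_lower_bound)
```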

Kurros00

It is not very useful to discriminate between "seeing with your eyes" and "seeing with the aid of scientific instruments". Vast amounts of information processing occurs between light landing on your retina and an image forming in your brain, so if you are happy to call looking through glasses, or a microscope, or a telescope, "seeing with your eyes" then I see no reason to make a distinction when the information-carrying particle switches from photons to electrons. Especially since we mostly use digital microscopes etc. these days.

0mwengler
Sure and most of the stories I hear I actually read printed words off a page. Somehow, I'd like people describing things to me to not worry so much about what is more important as much as I'd like them to worry about whether what they are saying is accurate. Even if a distinction is claimed to be not important by the teller, they can still stick to accurate descriptions. And sometimes, you know, people disagree about what is and isn't important, and accuracy allows them to still communicate in a productive way.
0Creutzer
You may argue that it is not useful, but it is still natural.
Kurros00

Bayes' theorem only works with as much information as you put into it. Humans can only ever be approximate Bayesian agents. If you learn about some proposition you never thought of before, it is not a failing of Bayesian reasoning; it is just that you learn you have been doing it wrong up until that point and have to recompute everything.

Kurros00

I'd just like to point out that even #1 of the OP's "lessons" is far more problematic than they make it seem. Consider the statement:

"The fact that there are myths about Zeus is evidence that Zeus exists. Zeus's existing would make it more likely for myths about him to arise, so the arising of myths about him must make it more likely that he exists." (supposedly an argument of the form P(E | H) > P(E)).

So first, "Zeus's existing would make it more likely for myths about him to arise" - more likely than what? Than "a pr... (read more)

Kurros00

If you are introduced to 5 blue-haired Xians but no black-haired Xians, you might infer that all or most Xians have blue hair. That is a pretty obvious case of sampling bias.

If a-priori you had no reason to expect that the population was dominantly blue-haired then you should begin to suspect some alternative hypothesis, like your sampling is biased for some reason, rather than believe everyone is blue haired.

Kurros50

Of course acting on beliefs is a decision theory matter. You don't have terribly much to lose by buying a losing lottery ticket, but you have a very large amount to gain if it wins, so yes, a 1/132 chance of winning sounds well worth $20 or so.
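The expected-value arithmetic, with a hypothetical prize (the comment only gives the 1/132 chance and the ~$20 cost):

```python
p_win = 1 / 132
cost = 20
prize = 10_000   # assumed for illustration

ev = p_win * prize - cost
print(round(ev, 2))   # 55.76: positive EV, so worth buying at these numbers
```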