Comment author: VipulNaik 05 February 2014 02:53:20AM *  2 points [-]

I'm pretty sure nothing I say here will be new to you, so consider this more of an effort to explain to you where I (and I think also Jonah, though I won't categorically speak for him) am coming from.

Jonah was looking at probability distributions over estimates of an unknown probability (such as the probability of a coin coming up heads). Unless you have some objection to probability distributions per se, I don't see anything wrong with taking a probability distribution to describe one's current state of knowledge of a probability.

If your goal is to answer the question "Will this coin come up heads?" for a single coin toss, and you can't run any experiments to augment your knowledge about the model but only have access to your prior knowledge, then it's true that all your knowledge is captured in a single probability number; if you have a subjective probability distribution, that number is simply the expected value of the distribution.

If, however, you are trying to answer the similar question "Will this coin come up heads when I toss it on such-and-such date at such-and-such time?" and you can run experiments beforehand, then it makes sense to use those experiments to learn the model that determines how the coin tossing works. Your model may be something like "with fairly extreme probability, I believe there is a fixed probability p with which each toss turns up heads, independent of the time and place of the toss; and I have a Bayesian prior distribution over p." You would start with the prior and run coin-tossing experiments, updating that probability distribution over p as you go. The day before your grand toss, you take the expected value of whatever distribution you have obtained by then.

But at intermediate stages it makes sense to store the entire probability distribution rather than just the expected value (the point estimate of the probability). For instance, if you think that the coin is either fair (probability 1/3), or always heads (probability 1/3), or always tails (probability 1/3), then it's worth storing that full prior rather than simply saying that there's a 50% chance of it turning up heads, so that you can update appropriately on the evidence. I could also construct higher-order versions of this hypothetical, but they would be too tedious to describe.
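To make this concrete, here is a minimal sketch of that three-hypothesis update (hypothetical Python, just to illustrate the bookkeeping):

```python
# Discrete prior over the coin's bias p: fair, always heads, always tails.
prior = {0.5: 1/3, 1.0: 1/3, 0.0: 1/3}

def update(dist, heads):
    """Bayesian update of a discrete distribution over p after one toss."""
    posterior = {p: w * (p if heads else 1.0 - p) for p, w in dist.items()}
    total = sum(posterior.values())
    return {p: w / total for p, w in posterior.items()}

# The prior's point estimate is E[p] = 0.5, but one observed head rules
# out "always tails" entirely -- information the bare 0.5 cannot encode.
posterior = update(prior, heads=True)
print(posterior)  # {0.5: ~0.333, 1.0: ~0.667, 0.0: 0.0}
```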

Secondly, as Jonah said, if you're tossing the coin multiple times and asking for the probability of, say, all heads, then the subjective probability distribution over p matters: using just the point estimate (expected value) of p gives the wrong answer.
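A quick numerical illustration, continuing the three-hypothesis example above (again just a sketch):

```python
# Probability of 10 heads in a row under the same discrete prior.
prior = {0.5: 1/3, 1.0: 1/3, 0.0: 1/3}
n = 10

p_all_heads = sum(w * p**n for p, w in prior.items())   # E[p^n] ~ 0.334
e_p = sum(w * p for p, w in prior.items())              # E[p] = 0.5
wrong_answer = e_p**n                                   # (E[p])^n ~ 0.001
```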

Sorry if this isn't clear -- I can elaborate more later.

Comment author: Kurros 05 February 2014 03:25:52AM *  4 points [-]

"Jonah was looking at probability distributions over estimates of an unknown probability (such as the probability of a coin coming up heads)"

It sounds like you are just confusing epistemic probabilities with propensities, or frequencies. I.e., due to the physics, the shape of the coin, and your style of flipping, a particular set of coin flips will have certain frequency properties that you can characterise by a bias parameter p, which you call "the probability of landing on heads". This is just a parameter of a stochastic model, not a degree of belief.

However, you can have a degree of belief about what p is, no problem. So you are talking about your degree of belief that a set of coin flips has certain frequentist properties, i.e. your degree of belief in a particular model of the coin flips.

edit: I could add that GIVEN a stochastic model you then have degrees of belief about whether a given coin flip will result in heads. But this is a conditional probability: see my other comment in reply to Vaniver. This is not, however, "beliefs about beliefs". It is just standard Bayesian modelling.

Comment author: Vaniver 04 February 2014 06:29:35AM *  0 points [-]

For questions of a continuous nature, you think that subjective probability is best expressed as a distribution over the continuous support, right? I view these sorts of distributions over distributions as exactly that: there's some continuous parameter potentially in the world (the proportion of white and black balls in the urn), and that continuous parameter may determine my subjective probability about binary events (whether ball #1001 is white or black).

Now, whether or not this formalism stretches to other ideas might be controversial. I might consider "the strength of the argument for Conclusion X" as having continuous support, possibly from 0 to 1, and so use my probability distribution over it to express how much more I expect to learn about the issue, but I can see reasons to avoid doing that.

[edit] That is, rather than modifying the likelihood ratios of all of the pieces of evidence for or against the argument being strong, I can modify my distribution over it. I think this runs into trouble with, say, argument screening off authority: there's a case where you really do want to modify the likelihood ratios.

Comment author: Kurros 05 February 2014 01:17:28AM *  0 points [-]

"I view these sorts of distributions over distributions as that- there's some continuous parameter potentially in the world (the proportion of white and black balls in the urn), and that continuous parameter may determine my subjective probability about binary events (whether ball #1001 is white or black)."

To me this just sounds like standard conditional probability. E.g. let p(x|I) be your subjective probability distribution over the parameter x (fraction of white balls in urn), given prior information I. Then

p("ball 1001 is white"|I) = integral_x { p("ball 1001 is white"|x,I)*p(x|I) } dx

So your belief in "ball 1001 is white" gets modulated by your belief distribution over x, sure. But I wouldn't call this a "distribution over a distribution". Yes, there is a set of likelihoods p("ball 1001 is white"|x,I) which specify your subjective degree of belief in "ball 1001 is white" GIVEN various x, but in the end you want your degree of belief in "ball 1001 is white" considering ALL the values that x might have and their relative plausibilities, i.e. you want the marginal likelihood in order to make your predictions.

(my marginalisation here ignores hypotheses outside the domain implied by there being a fraction of balls in the urn...)
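For what it's worth, here is that marginalisation done numerically. This is only a sketch; the flat prior on x is an assumption made purely for illustration:

```python
import numpy as np

# p("ball 1001 is white"|I) = ∫ p("ball 1001 is white"|x,I) p(x|I) dx,
# computed on a grid. The flat prior p(x|I) on [0, 1] is assumed here
# only to make the example concrete.
x = np.linspace(0.0, 1.0, 10001)          # fraction of white balls
prior = np.ones_like(x)                   # p(x|I): flat on [0, 1]
prior /= np.trapz(prior, x)               # normalise to a proper density

likelihood = x                            # p("ball 1001 is white"|x, I) = x
p_white = np.trapz(likelihood * prior, x)
print(p_white)                            # ~ 0.5, the marginal likelihood
```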

Comment author: Manfred 03 February 2014 03:24:29AM 0 points [-]

Yeah, sorry, I've been delayed by the realization that everything I wrote for the forthcoming post needed a complete re-write. Planning fallacy!

Comment author: Kurros 03 February 2014 04:30:09AM 0 points [-]

Lol ok, so long as I get my answer eventually :p.

Comment author: Manfred 30 January 2014 07:02:57AM 0 points [-]

what's the problem?

I'll tell you on Saturday!

Comment author: Kurros 03 February 2014 03:19:46AM 0 points [-]

Was the "Putting in the Numbers" post the one you were referring to? You didn't post that on Saturday, but now it is Monday and there doesn't seem be a third post. Anyway I did not see this question answered anywhere in "Putting in the Numbers"...

Comment author: Manfred 31 January 2014 05:04:31AM 0 points [-]

Well, you can still define information entropy for probability density functions -- though I suppose if we ignore Jaynes we can probably get paradoxes if we try. In fact, I'm pretty sure integrating -p*log(p) is right. There's also a problem if you want to have a maxent prior over the integers or over the real numbers; that takes us into the realm of improper priors.
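For concreteness, a quick numerical sanity check of that integral on a standard normal (just a sketch, with the distribution chosen arbitrarily):

```python
import numpy as np

# Differential entropy H = -∫ p(x) log p(x) dx for a standard normal,
# checked against the closed form (1/2) log(2*pi*e) ~ 1.4189.
x = np.linspace(-10.0, 10.0, 200001)
p = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
H = -np.trapz(p * np.log(p), x)
print(H, 0.5 * np.log(2 * np.pi * np.e))  # both ~ 1.4189
```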

I don't know as much as I should about this topic, so you may have to illustrate using an example before I figure out what you mean.

Comment author: Kurros 31 January 2014 08:24:08AM *  1 point [-]

Yeah, I think integral( -p*log(p) ) is it. The simplest problem is that if I have some parameter x to which I want to assign a prior (perhaps not over the whole real line, so that the prior can be proper as you say -- the boundaries can be part of the maxent condition set), then the maxent method gives me a different prior depending on whether I happen to assign the distribution over x, or x^2, or log(x), etc. That is, the prior pdf obtained for one parameterisation is not related to the one obtained for another parameterisation by the correct transformation rule for probability density functions; the two priors contain logically different information. This is upsetting if you have no reason to prefer one parameterisation over another.

In the simplest case, where you have no constraints except the boundaries and perhaps expect to get a flat prior (I don't remember whether you do when there are boundaries... I think you do in 1D at least), it is most obvious that a prior flat in x contains very different information to one flat in x^2 or log(x).
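A minimal sketch of that reparameterisation problem (the interval [1, 10] is an arbitrary choice for illustration):

```python
import numpy as np

# A prior flat in x on [1, 10] is not flat in y = log(x). Under the
# change of variables, p_y(y) = p_x(x) * |dx/dy| = p_x(e^y) * e^y,
# so the "same" state of ignorance looks like a rising density in y.
samples_x = np.random.uniform(1.0, 10.0, size=100_000)
samples_y = np.log(samples_x)

hist, edges = np.histogram(samples_y, bins=20, density=True)
print(hist[:3], "...", hist[-3:])  # density grows roughly like e^y
```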

Comment author: Kurros 30 January 2014 11:54:53PM *  2 points [-]

Referring to this:

"Simply knowing the fact that the entropy is concave down tells us that to maximize entropy we should split it up as evenly as possible - each side has a 1/4 chance of showing."

Ok, that's fine for discrete events, but what about continuous ones? That is, how do I choose a prior for real-valued parameters that I want to know about? As far as I am aware, MAXENT doesn't help me at all here, particularly once I have several parameters and no preferred parameterisation of the problem. I know Jaynes goes on about how continuous distributions make no sense unless you know the sequence whose limit you took to get there -- in which case, problem solved -- but I have found this most unhelpful in solving real problems where I have no preference for any particular sequence, such as in models of fundamental physics.

In response to comment by owencb on On saving the world
Comment author: So8res 30 January 2014 09:04:30PM 5 points [-]

There are a number of factors here. Timescales are certainly important. I obviously can't re-organize people at will. Even in a best-case scenario, it would take decades or even centuries to transition social systems, to shift away from governments and nations, and so on. If I believed AI would take millennia, then I'd keep addressing coordination problems. However, AI is also on the decades-to-centuries timescale.

Furthermore, developing an FAI would (depending upon your definition of 'friendly') address coordination problems. Whether my ideas were flawed or not, developing FAI dominates social restructuring.

when historically do you guess the changeover point was?

I'm not quite sure what you mean. Are you asking for the historical date at which I believe the value of a person-hour spent on AI research overtook the value of a person-hour spent on restructuring people? I'd guess maybe 1850, in the hope that we'd be ready to build an FAI as soon as we were able to build a computer. This seems like a strange counterfactual to me, though.

In response to comment by So8res on On saving the world
Comment author: Kurros 30 January 2014 11:36:17PM 4 points [-]

It would have been kind of impossible to work on AI in 1850, before even modern set theory was developed. Unless by work on AI, you mean work on mathematical logic in general.

Comment author: Manfred 30 January 2014 07:02:57AM 0 points [-]

what's the problem?

I'll tell you on Saturday!

Comment author: Kurros 30 January 2014 08:10:50AM 0 points [-]

Ok, but do you really mean that sentence as it is written? To me it means the same thing as saying that assigning probability to anything is logically equivalent to assigning probability to 0=1 (which I am perfectly happy to do, so if that is the point then fine, but that doesn't seem to be your implication).

Comment author: Kurros 30 January 2014 04:19:13AM *  0 points [-]

"But to assign some probability to the wrong answer is logically equivalent to assigning probability to 0=1."

Only if you know it is the wrong answer. You say the robot doesn't know, so what's the problem? We assign probabilities to propositions which are wrong all the time, before we know if they are wrong or not.

Comment author: [deleted] 10 December 2013 09:53:42AM *  3 points [-]

Hard sciences (basically physics and its relatives) are far less vulnerable to statistical pitfalls, because practitioners in those fields can generate effectively unlimited quantities of data by simply repeating experiments as many times as necessary.

There are exceptions such as ultra-high-energy cosmic ray physics, where it'd take decades to take enough data for naive frequentist statistics to be reliable.

In response to comment by [deleted] on The Statistician's Fallacy
Comment author: Kurros 10 December 2013 10:37:20PM 1 point [-]

Statistics also remains important at the frontier of high-energy physics. Reasoning about which models are likely to replace the Standard Model is plagued by every issue in the philosophy of statistics you can imagine, and the arguments about this affect where billions of dollars worth of research funding end up (build bigger colliders? more dark matter detectors? satellites?).
