timtyler comments on Metaphilosophical Mysteries - Less Wrong
Re: "I don't understand how this notion of "update speed" translates into the Bayesian setting."
Say you think p(heads) is 0.5. If you see ten heads in a row, do you update p(heads) a lot, or a little? It depends on how confident you are of your estimate.
If you had previously seen a thousand flips of the same coin, you might be confident that p(heads) is 0.5 - and therefore update little. If you were told that it was a biased coin from a magician, then your estimate of p(heads) being 0.5 might just reflect not knowing which way it was biased. Then you might update your estimate of p(heads) rapidly - on seeing several heads in a row.
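One way to make the two cases concrete (a sketch of mine, not something from the original comments) is the standard Beta-Bernoulli model, where confidence in an estimate is encoded as pseudo-counts of previously seen flips:

```python
def posterior_mean(a, b, heads, tails):
    """Posterior mean of p(heads) under a Beta(a, b) prior after
    observing the given flips (standard conjugate update)."""
    return (a + heads) / (a + b + heads + tails)

# Confident prior: roughly equivalent to having already watched
# a thousand fair-looking flips of this coin.
confident = posterior_mean(500, 500, 10, 0)  # stays near 0.5 (~0.505)

# Weak prior: 0.5 only reflects ignorance about the bias direction.
uncertain = posterior_mean(1, 1, 10, 0)      # jumps to ~0.917

print(confident, uncertain)
```

Both agents apply the same update rule to the same ten heads; the different apparent "speeds" come entirely from the pseudo-counts in the prior.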
Like that.
What you have just laid out are not different "update speeds" but different priors. "It's a biased coin from a magician" is of the same class of prior assumptions as "It's probably a fair coin" or "It's a coin with some fixed probability of landing heads, but I have no idea what" or "It's a rigged coin that can only come up heads 10 times once activated".
After each toss, you do precisely one Bayesian update. Perhaps the notion of "update speed" might make sense in a more continuous setting, but in a discrete setting like this it clearly does not. The amount you update is determined by Bayes' Law; different apparent "update speeds" are due to differing priors. "Speed" probably isn't even a good term, as updates aren't necessarily even in the same direction! If you think the coin can only come up heads 10 times, each appearance of heads makes it less likely to come up again.
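The point can be checked numerically (my own illustration, using an arbitrary grid of bias hypotheses): apply the identical update rule to two different priors. There is no speed parameter anywhere in the rule, yet the posterior means move at very different rates:

```python
biases = [i / 100 for i in range(101)]  # candidate values of p(heads)

def update(prior, saw_heads):
    """One Bayesian update: P(b | data) is proportional to P(data | b) * P(b)."""
    posterior = [(b if saw_heads else 1 - b) * p
                 for b, p in zip(biases, prior)]
    total = sum(posterior)
    return [p / total for p in posterior]

def mean(dist):
    return sum(b * p for b, p in zip(biases, dist))

# Prior A: sharply peaked at 0.5 - confident the coin is fair.
peaked = [1.0 if abs(b - 0.5) < 0.02 else 1e-6 for b in biases]
peaked = [p / sum(peaked) for p in peaked]

# Prior B: uniform - no idea which way the magician biased it.
uniform = [1 / len(biases)] * len(biases)

for _ in range(10):  # ten heads in a row
    peaked = update(peaked, True)
    uniform = update(uniform, True)

print(mean(peaked))   # crawls: still close to 0.5
print(mean(uniform))  # races: now above 0.9
```

Same evidence, same update rule; the different trajectories are statistics of the two priors, not settings of some knob.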
"Update speed" seems fine to me - when comparing:
0.5, 0.500001, 0.500002, 0.500003, 0.500004...
...with...
0.5, 0.7, 0.9, 0.94, 0.96
...but use whatever term you like.
That's a statistic, not a parameter - and it's a statistic ultimately determined by the prior.
I do not know where the idea that "speeds" are "parameters" and not "statistics" comes from. An entity being a statistic doesn't imply that it is not a speed.
The same goes for discrete systems. They have the concept of speed too:
http://en.wikipedia.org/wiki/Glider_%28Conway%27s_Life%29
This is utterly irrelevant. The problem with what you say is not that there's no notion of speed, it's that there is precisely one way of doing updates, and it has no "speed" parameter.
In the game of life, the update speed is always once per generation. However, that doesn't mean it has no concept of speed. In fact the system exhibits gliders with many different speeds.
It's much the same with an intelligent agent's update speed in response to evidence - some will update faster than others - depending on what they already know.
You claimed that:
"Perhaps the notion of "update speed" might make sense in a more continuous setting, but in a discrete setting like this it is clear it does not."
However, the concept of "speed" works equally well in discrete and continuous systems - as the GOL illustrates. "Discreteness" is an irrelevance.
You really seem to be missing the point here. I'm sorry, but from your posts I can't help but get the idea that you don't really understand how this sort of prediction scheme works. Sure, "update speed" in the sense you described it elsewhere in the thread makes sense, but who cares? Update speed in that sense is a consequence of the prior (or current state, rather); it isn't some sort of parameter, and it's not clear it's something stable or meaningful at all. You've established the existence of something trivial and probably irrelevant. In the parametric sense you seemed to be originally using it, it doesn't exist. Can we agree on this?
Probably nobody cares - apart from you, it seems. Apparently, one can't get away with using the phrase "update speed" in connection with an intelligent agent without getting bounced.
When you said:
"I don't understand how this notion of "update speed" translates into the Bayesian setting."
...and I said...
"Say you think p(heads) is 0.5. If you see ten heads in a row, do you update p(heads) a lot, or a little? It depends on how confident you are of your estimate. If you had previously seen a thousand coin flips from the same coin, you might be confident of p(heads) being 0.5 - and therefore update little. If you were told that it was a biased coin from a magician, then your estimate of p(heads) being 0.5 might be due to not knowing which way it was biased. Then you might update your estimate of p(heads) rapidly - on seeing several heads in a row. Like that."
...IMO, the conversation could and should have stopped - right there.
This is not analogous. We are speaking of a complete system here.
I have already addressed this. What you have called "update speed" is determined by current distribution.
Re: "We are speaking of a complete system here"
I assure you that I could exhibit a GOL field that consisted entirely of gliders moving at c/2 - and then exhibit another GOL field that consisted entirely of gliders moving at c/4. These systems would have different characteristic speeds. Hopefully, you see the analogy now.
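For what it's worth, the glider claim is easy to check with a minimal Life implementation (my sketch; strictly speaking, the c/2 ships referred to here would be orthogonal spaceships such as the LWSS, since the diagonal glider travels at c/4):

```python
from collections import Counter

def step(cells):
    """One Game of Life generation on a set of live (x, y) cells."""
    counts = Counter((x + dx, y + dy)
                     for (x, y) in cells
                     for dx in (-1, 0, 1)
                     for dy in (-1, 0, 1)
                     if (dx, dy) != (0, 0))
    # A cell is alive next generation with 3 live neighbours,
    # or with 2 if it is already alive.
    return {c for c, n in counts.items()
            if n == 3 or (n == 2 and c in cells)}

# The standard glider: it reproduces itself one cell diagonally
# away every 4 generations, i.e. it moves at speed c/4.
glider = {(1, 0), (2, 1), (0, 2), (1, 2), (2, 2)}

cells = glider
for _ in range(4):
    cells = step(cells)

print(cells == {(x + 1, y + 1) for (x, y) in glider})  # True
```

A field tiled with such gliders has characteristic speed c/4; a field of c/2 spaceships has a different characteristic speed, even though the update rule - one generation at a time - is identical in both.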
OK, sure. But then, to continue the analogy, the resulting speed is a function of the initial configuration. :)
Hm; there may not be a disagreement here. You seemed to be using it in a way that implied it was not determined by (or even was independent of) the prior. Was I mistaken there?
The idea was that some agents update faster than others (or indeed not at all).
If you like you can think of the agents that update relatively slowly as being confident that they are uncertain about the things they are unsure about. That confidence in their own uncertainty could indeed be represented by other priors.
That's not "other priors", there's just one prior. All the probabilities in Bayes' Rule come from the updated-to-current version of the prior.
Other prior probabilities. There is one prior set of probabilities, which is composed of many prior probabilities and probability distributions.
If you want to think about it that way, please don't say "other priors". That's very confusing, because "prior" in this context refers to the whole prior, not to pieces of it (which I'm not sure how you're detangling from each other, anyway). If we're talking about something of the universal-prior sort, it has one prior, over its total sensory experience; I'm not clear how you're decomposing that or what alternative model you are suggesting.