That’s an excellent list of questions! It will help me greatly to systematize my thinking on the topic.
Before replying to the specific items you list, perhaps I should first state the general position I’m coming from, which motivates me to get into discussions of this sort. Namely, it is my firm belief that when we look at the present state of human knowledge, one of the principal sources of confusion, nonsense, and pseudosicence is physics envy, which leads people in all sorts of fields to construct nonsensical edifices of numerology and then pretend, consciously or not, that they’ve reached some sort of exact scientific insight. Therefore, I believe that whenever one encounters people talking about numbers of any sort that look even slightly suspicious, they should be considered guilty until proven otherwise -- and this entire business with subjective probability estimates for common-sense beliefs doesn’t come even close to clearing that bar for me.
Now to reply to your list.
(1) Confession of frequentism. Only sensible numerical probabilities are those related to frequencies, i.e. either frequencies of outcomes of repeated experiments, or probabilities derived from there. (Creative drawing of reference-class boundaries may be permitted.) Especially, prior probabilities are meaningless.
(2) Any sensible numbers must be produced using procedures that ultimately don't include any numerical parameters (maybe except small integers like 2,3,4). Any number which isn't a result of such a procedure is labeled arbitrary, and therefore meaningless. (Observation and measurement, of course, do count as permitted procedures. Admittedly arbitrary steps, like choosing units of measurement, are also permitted.)
My answer to (1) follows from my opinion about (2).
In my view, a number that gives any information about the real world must ultimately refer, either directly or via some calculation, to something that can be measured or counted (at least in principle, perhaps using a thought-experiment). This doesn’t mean that all sensible numbers have to be derived from concrete empirical measurements; they can also follow from common-sense insight and generalization. For example, reading about Newton’s theory leads to the common-sense insight that it’s a very close approximation of reality under certain assumptions. Now, if we look at the gravity formula F=m1*m2/r^2 (in units set so that G=1), the number 2 in the denominator is not a product of any concrete measurement, but a generalization from common sense. Yet what makes it sensible is that it ultimately refers to measurable reality via a well-defined formula: measure the force between two bodies of known masses at distance r, and you’ll get log(m1*m2/F)/log(r) = 2.
Now, what can we make out of probabilities from this viewpoint? I honestly can’t think of any sensible non-frequentist answer to this question. Subjectivist Bayesian phrases such as “the degree of belief” sound to me entirely ghostlike unless this “degree” is verifiable via some frequentist practical test, at least in principle. In this sense, I do confess frequentism. (Though I don’t wish to subscribe to all the related baggage from various controversies in statistics, much of which is frankly over my head.)
(3) Degrees of confidence shall be expressed without reflexive thinking about them. Trying to establish a fixed scale of confidence levels (like impossible - very unlikely - unlikely - possible - likely - very likely - almost certain - certain), or actively trying to compare degrees of confidence in different beliefs is cheating, since such scales can be then converted into numbers using a non-numerical procedure.
That depends on the concrete problem under consideration, and on the thinker who is considering it. The thinker’s brain produces an answer alongside a more or less fuzzy feeling of confidence, and the human language has the capacity to express these feelings with about the same level of fuziness as that signal. It can be sensible to compare intuitive confidence levels, if such comparison can be put to a practical (i.e. frequentist) test. Eight ordered intuitive levels of certainty might perhaps be too much, but with, say, four levels, I could produce four lists of predictions labeled “almost impossible,” “unlikely,” “likely,” and “almost certain,” such that common-sense would tell us that, with near-certainty, those in each subsequent list would turn out to be true in ever greater proportion.
If I wish to express these probabilities as numbers, however, this is not a legitimate step unless the resulting numbers can be justified in the sense discussed above under (1) and (2). This requires justification both in the sense of defining what aspect of reality they refer to (where frequentism seems like the only answer), and guaranteeing that they will be accurate under empirical tests. If they can be so justified, then we say that the intuitive estimate is “well-calibrated.” However, calibration is usually not possible in practice, and there are only two major exceptions.
The first possible path towards accurate calibration is when the same person performs essentially the same judgment many times, and from the past performance we extract the frequency with which their brain tends to produce the right answer. If this level of accuracy remains roughly constant in time, then it makes sense to attach it as the probability to that person’s future judgments on the topic. This approach treats the relevant operations in the brain as a black box whose behavior, being roughly constant, can be subjected to such extrapolation.
The second possible path is reached when someone has a sufficient level of insight about some problem to cross the fuzzy limit between common-sense thinking and an actual scientific model. Increasingly subtle and accurate thinking about a problem can result in the construction of a mathematical model that approximates reality well enough that when applied in a shut-up-and-calculate way, it yields probability estimates that will be subsequently vindicated empirically.
(Still, deciding whether the model is applicable in some particular situation remains a common-sense problem, and the probabilities yielded by the model do not capture this uncertainty. If a well-established physical theory, applied by competent people, says that p=0.9999 for some event, common sense tells me that I should treat this event as near-certain -- and, if repeated many times, that it will come out the unlikely way very close to one in 10,000 times. On the other hand, if p=0.9999 is produced by some suspicious model that looks like it might be a product of data-dredging rather than real insight about reality, common sense tells me that the event is not at all certain. But there is no way to capture this intuitive uncertainty with a sensible number. The probabilities coming from calibration of repeated judgment are subject to analogous unquantifiable uncertainty.)
There is also a third logical possibility, namely that some people in some situations have precise enough intuitions of certaintly that they can quantify them in an accurate way, just like some people can guess what time it is with remarkable precision without looking at the clock. But I see little evidence of this occurring in reality, and even if it does, these are very rare special cases.
(4) The question of whether somebody is well calibrated is confused for some reason. Calibrating people has no sense. Although we may take the "almost certain" statements of a person and look at how often they are true, the resulting frequency has no sense for some reason.
I disagree with this, as explained above. Calibration can be done successfully in the special cases I mentioned. However, in cases where it cannot be done, which includes the great majority of the actual beliefs and conclusions made by human brains, devising numerical probabilities makes no sense.
(5) Unlike #3, beliefs can be ordered or classified on some scale (possibly imprecisely), but assigning numerical values brings confusing connotations and should be avoided. Alternatively said, the meaning subjective probabilities is preserved after monotonous rescaling.
This should be clear from the answer to (3).
[Continued in a separate comment below due to excessive length.]
Thanks for the lengthy answer. Still, why it is impossible to calibrate people in general, looking at how often they get the anwer right, and then using them as a device for measuring probabilities? If a person is right on approximately 80% of the issues he says he's "sure", then why not translating his next "sure" into an 80% probability? Doesn't seem arbitrary to me. There may be inconsistency between measurements using different people, but strictly speaking, the thermometers and clocks also sometimes disagree.
Please read the post before voting on the comments, as this is a game where voting works differently.
Warning: the comments section of this post will look odd. The most reasonable comments will have lots of negative karma. Do not be alarmed, it's all part of the plan. In order to participate in this game you should disable any viewing threshold for negatively voted comments.
Here's an irrationalist game meant to quickly collect a pool of controversial ideas for people to debate and assess. It kinda relies on people being honest and not being nitpickers, but it might be fun.
Write a comment reply to this post describing a belief you think has a reasonable chance of being true relative to the the beliefs of other Less Wrong folk. Jot down a proposition and a rough probability estimate or qualitative description, like 'fairly confident'.
Example (not my true belief): "The U.S. government was directly responsible for financing the September 11th terrorist attacks. Very confident. (~95%)."
If you post a belief, you have to vote on the beliefs of all other comments. Voting works like this: if you basically agree with the comment, vote the comment down. If you basically disagree with the comment, vote the comment up. What 'basically' means here is intuitive; instead of using a precise mathy scoring system, just make a guess. In my view, if their stated probability is 99.9% and your degree of belief is 90%, that merits an upvote: it's a pretty big difference of opinion. If they're at 99.9% and you're at 99.5%, it could go either way. If you're genuinely unsure whether or not you basically agree with them, you can pass on voting (but try not to). Vote up if you think they are either overconfident or underconfident in their belief: any disagreement is valid disagreement.
That's the spirit of the game, but some more qualifications and rules follow.
If the proposition in a comment isn't incredibly precise, use your best interpretation. If you really have to pick nits for whatever reason, say so in a comment reply.
The more upvotes you get, the more irrational Less Wrong perceives your belief to be. Which means that if you have a large amount of Less Wrong karma and can still get lots of upvotes on your crazy beliefs then you will get lots of smart people to take your weird ideas a little more seriously.
Some poor soul is going to come along and post "I believe in God". Don't pick nits and say "Well in a a Tegmark multiverse there is definitely a universe exactly like ours where some sort of god rules over us..." and downvote it. That's cheating. You better upvote the guy. For just this post, get over your desire to upvote rationality. For this game, we reward perceived irrationality.
Try to be precise in your propositions. Saying "I believe in God. 99% sure." isn't informative because we don't quite know which God you're talking about. A deist god? The Christian God? Jewish?
Y'all know this already, but just a reminder: preferences ain't beliefs. Downvote preferences disguised as beliefs. Beliefs that include the word "should" are are almost always imprecise: avoid them.
Additional rules: