anonymous3 — LessWrong

Suppose the scientists S_10 and S_20 are fitting curves f(i) to noisy observations y[i] at points i = 0...20. Suppose there are two families of models, a polynomial g(i;a) and a trigonometric h(i;ω,φ):

g(i) <- sum(a[k]*i^k, k=0..infinity)
h(i) <- cos(ω*i+φ)

The angular frequency ω is predetermined. The phase φ is random:

φ ~ Flat(), equivalently φ ~ Uniform(0, 2*π)

The coefficients a[k] are independently normally distributed with moments matched to the marginal moments of the coefficients in the Taylor expansion of h(i):

a[k] ~ Normal(mean=0, stddev=(ω^k)/(sqrt(2)*factorial(k)))
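For anyone checking the moment matching, here is a short sketch of where that standard deviation comes from. The symbol c_k for the k-th Taylor coefficient of h is my own notation, not part of the problem statement.

```latex
% c_k is a hypothetical name for the k-th Taylor coefficient of h(i) = cos(omega*i + phi) about i = 0
\begin{align*}
\cos(\omega i + \varphi) &= \sum_{k=0}^{\infty} c_k\, i^k,
  \qquad c_k = \frac{\omega^k}{k!}\,\cos\!\Big(\varphi + \frac{k\pi}{2}\Big), \\
\mathbb{E}[c_k] &= 0
  \quad\text{since } \varphi \sim \mathrm{Uniform}(0, 2\pi), \\
\operatorname{Var}[c_k] &= \frac{\omega^{2k}}{(k!)^2}\,
  \mathbb{E}\Big[\cos^2\!\Big(\varphi + \frac{k\pi}{2}\Big)\Big]
  = \frac{\omega^{2k}}{2\,(k!)^2}.
\end{align*}
```

Taking the square root gives the stddev above. Note that the c_k are all functions of the single phase φ and so are strongly dependent, whereas the a[k] are independent; only the marginal moments are matched.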

There is some probability q that the true curve f(i) is generated by the trigonometric model h(i), and otherwise f(i) is generated by the polynomial model g(i):

isTrigonometric ~ Bernoulli(q)
f(i) <- if(isTrigonometric, then_val=h(i), else_val=g(i))

Noise is iid Gaussian:

n[i] ~ Normal(mean=0, stddev=σ)
y[i] <- f(i) + n[i]
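To make the generative model concrete, here is a minimal Python sketch that draws one true curve and one set of noisy observations. The function names `sample_curve` and `sample_observations`, the truncation of the infinite polynomial series at `n_terms=60`, and the example parameter values are my assumptions for illustration; the problem statement itself does not fix them.

```python
import math
import random


def sample_curve(omega, q, rng, n_points=21, n_terms=60):
    """Draw the true curve f(i) for i = 0..n_points-1 from the mixture prior.

    Returns (is_trigonometric, f). The polynomial series is truncated at
    n_terms coefficients, which is an approximation to the infinite sum.
    """
    is_trigonometric = rng.random() < q  # isTrigonometric ~ Bernoulli(q)
    if is_trigonometric:
        phi = rng.uniform(0.0, 2.0 * math.pi)  # phase ~ Uniform(0, 2*pi)
        f = [math.cos(omega * i + phi) for i in range(n_points)]
    else:
        # a[k] ~ Normal(0, omega^k / (sqrt(2) * k!)), independent across k
        a = [
            rng.gauss(0.0, omega**k / (math.sqrt(2.0) * math.factorial(k)))
            for k in range(n_terms)
        ]
        f = [sum(a[k] * i**k for k in range(n_terms)) for i in range(n_points)]
    return is_trigonometric, f


def sample_observations(f, sigma, rng):
    """Add iid Gaussian noise with standard deviation sigma to each f(i)."""
    return [fi + rng.gauss(0.0, sigma) for fi in f]


if __name__ == "__main__":
    rng = random.Random(0)
    # Example values only; the question leaves omega, sigma and q open.
    is_trig, f = sample_curve(omega=0.5, q=0.5, rng=rng)
    y = sample_observations(f, sigma=0.1, rng=rng)
    print(is_trig, y[:3])
```

One thing the sketch makes visible is that polynomial draws grow very quickly with i unless ω is small, because the term variances scale like (ω*i)^(2k)/(k!)^2.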

(The notation has been abused to use i as an index in n[i] and y[i] because each point i is sampled at most once.)

Scientists S_10 and S_20 were randomly chosen from a pool of scientists S_j having different beliefs about q. A known fraction s of the scientists in the pool understand that the trigonometric model is possible. Their belief q_{S_j} about the value of q for this problem is that q equals v. The remaining scientists do not understand that the trigonometric model is possible, and resort to polynomial approximations to predict everything. Their belief q_{S_j} about the value of q for this problem is that q equals 0:

understandsTrigonometricModel(S_j) ~ Bernoulli(s)
q_{S_j} <- if(understandsTrigonometricModel(S_j), then_val=v, else_val=0)

(As a variation, the scientists can have beta-distributed beliefs q_{S_j} ~ Beta(α, β).)

Both scientists report their posterior means for f(i) conditional on their knowledge. S_10 knows y[i] for i=0...9 and S_20 knows y[i] for i=0...19. Both scientists are Bayesians and know the probabilistic structure of the problem and the values of ω and σ. Both scientists also predict posterior means for f(20), and therefore for the observable y[20].

You are given the values of ω, σ, q, s, and v and the fact that, for each scientist, the mean of the squared differences between the posterior means for f(i) and the observations y[i] is less than σ^2 ("the theory is consistent with the experiments"). You are not given the values y[i]. (You are also not given any information about any of the scientists' predictive densities at y, conditional or not, which is maddening if you're a Bayesian.) You are asked to choose a mixing coefficient t to combine the two scientists' predictions for y[20] into a mixed prediction y_t[20]:

y_t[20] <- t*y_{S_10}[20] + (1-t)*y_{S_20}[20]

Your goal in choosing t is to minimize the expectation of the squared error (y_t[20]-y[20])^2. For some example values of ω, σ, q, s, and v, what are the optimal values of t?

(In the variation with beta-distributed q_{S_j}, the optimal t depends on α and β and not on s and v.)
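For reference, the minimizer has a simple closed form once the relevant second moments are available; the hard part of the question is computing those moments. In the sketch below, a, b and y are my shorthand for y_{S_10}[20], y_{S_20}[20] and y[20], and every expectation is taken conditional on the information you are given. It assumes the two predictions are not almost surely equal, and it does not restrict t to [0, 1].

```latex
% a, b, y are shorthand (not in the original): a = y_{S_10}[20], b = y_{S_20}[20], y = y[20]
\begin{align*}
y_t - y &= (b - y) - t\,(b - a), \\
\mathbb{E}\big[(y_t - y)^2\big]
  &= \mathbb{E}\big[(b - y)^2\big]
   - 2t\,\mathbb{E}\big[(b - y)(b - a)\big]
   + t^2\,\mathbb{E}\big[(b - a)^2\big], \\
t^{*} &= \frac{\mathbb{E}\big[(b - y)(b - a)\big]}{\mathbb{E}\big[(b - a)^2\big]}.
\end{align*}
```

So t* is near 0 when S_20's error is uncorrelated with the disagreement between the two scientists, and it moves away from 0 only to the extent that S_20 errs in the direction of that disagreement.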

Note that if σ is small, ω is not small, q is not small, s is not small, and v is not small, then the given information implies with very high probability that isTrigonometric==True, that the first scientist understands that the trigonometric model is possible, and that the first scientist's posterior belief that the trigonometric model is correct is very high. (If the polynomial model had been correct, the first scientist's narrow prediction of y[10]...y[19] would have been improbable.) What happens when s is high, so that the second scientist is likely to agree? Would S_20 then be a better predictor than S_10?

In this formulation the scientists are making predictive distributions, which are not what most people mean by "theories". How do you draw the line between a predictive distribution and a theory? When people in this thread use the words "single best theory", what does that even mean? Even the Standard Model and General Relativity use constants which are only known from measurements up to an approximate multivariate Gaussian posterior distribution. Anyone who uses these physical theories to predict the "ideal" outcomes of experiments which measure physical constants must predict a distribution of outcomes, not a point. Does this mean they are using a "distribution over physical theories" and not a "single best physical theory"? Why do we even care about that distinction?
