steven0461 comments on Value of Information: Four Examples - Less Wrong

76 Post author: Vaniver 22 November 2011 11:02PM

Comments (60)

Comment author: steven0461 22 November 2011 12:54:39AM *  9 points [-]

Here's a useful rule: if you're faced with a choice between two alternatives, and you have some probability distribution for the difference in utility between them, it's a mistake to pay more than one standard deviation's worth of utility for knowing which is better. (If the distribution is symmetric, it's a mistake to pay more than half a standard deviation.)

(Because the expected benefit from switching is less than (half) the expected absolute deviation, which is less than (half) the standard deviation. Right?)
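
A minimal sketch of how one might sanity-check this chain of inequalities on a small discrete distribution. The function name `voi_and_spread` and the example outcomes are made up for illustration; the value of information is taken to be E[max(x, 0)] - max(E[x], 0), where x is the utility difference.

```python
import math

def voi_and_spread(outcomes, probs):
    """Return (value of information, mean absolute deviation, standard deviation)
    for a discrete distribution over x = U(b) - U(a)."""
    mean = sum(p * x for x, p in zip(outcomes, probs))
    # With information you take b only when x > 0; without it you take
    # whichever option has the higher expected utility.
    with_info = sum(p * max(x, 0.0) for x, p in zip(outcomes, probs))
    without_info = max(mean, 0.0)
    mad = sum(p * abs(x - mean) for x, p in zip(outcomes, probs))
    sd = math.sqrt(sum(p * (x - mean) ** 2 for x, p in zip(outcomes, probs)))
    return with_info - without_info, mad, sd

# Hypothetical symmetric example: x is -2, 1, or 4 with probabilities 1/4, 1/2, 1/4.
voi, mad, sd = voi_and_spread([-2.0, 1.0, 4.0], [0.25, 0.5, 0.25])
print(voi, mad, sd)
```

For this example the value of information is 0.5, half the mean absolute deviation is 0.75, and half the standard deviation is about 1.06, so the ordering holds here. One example is of course not a proof.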

I think it would be cool to have a list of other results like this that you could use to get bounds and estimates for the value of information in concrete situations.

Comment author: twanvl 23 November 2011 02:43:34PM 0 points [-]

This is not quite correct.

Suppose you know that the difference in utility has a uniform distribution between 10 and 20. Then you already know which of the alternatives is better, so you shouldn't pay the standard deviation's worth (which is 10/sqrt(12) ~= 2.88675).

The mean of the difference matters much more than the standard deviation. Math will follow.

Comment author: twanvl 23 November 2011 03:16:43PM 6 points [-]

Math, as promised.

Suppose that the difference in utility is uniformly distributed,

U(b) - U(a) ~ Uniform(u,v)

Assume for simplicity that U(a)=0 and that E[U(b)] > 0, so that b is the better choice if there is no more information.

E[U(optimal|noinfo)] = E[U(b)] = (u+v)/2
E[U(optimal|info)] = 1/(v-u) * integral_u^v max(x, 0) dx
= if 0 <= u <= v then (u+v)/2
  if u <= 0 <= v then v/(v-u) * v/2 = v^2/(2(v-u))

So, if u<0, you should pay at most v^2/(2(v-u)) - (u+v)/2 = u^2/(2v - 2u) for information on whether U(b)>U(a).
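
A rough sketch of the uniform case, computing the value of information directly as E[max(x, 0)] - max(E[x], 0) rather than from a closed form. The function name `uniform_voi` is hypothetical, and the branch for v <= 0 is added only for completeness (the text above assumes E[U(b)] > 0).

```python
def uniform_voi(u: float, v: float) -> float:
    """Value of learning x = U(b) - U(a) when x ~ Uniform(u, v), with U(a) = 0."""
    if not u < v:
        raise ValueError("need u < v")
    mean = (u + v) / 2
    # Expected utility when you learn x first and then pick the better option.
    if u >= 0:
        with_info = mean                   # b is always at least as good
    elif v <= 0:
        with_info = 0.0                    # a is always at least as good
    else:
        with_info = v * v / (2 * (v - u))  # P(x > 0) * E[x | x > 0]
    # Expected utility when you must choose on the prior alone.
    without_info = max(mean, 0.0)
    return with_info - without_info

print(uniform_voi(10, 20))  # the Uniform(10, 20) example from earlier in the thread
print(uniform_voi(-1, 3))
```

For Uniform(10, 20) this gives 0, since b is certainly better; for Uniform(-1, 3) it gives 0.125.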

Now suppose instead that the difference is normally distributed with mean m and standard deviation s:

U(b) - U(a) = U(b) ~ Normal(m,s)

Then

E(U|no info) = E[U(b)] = m
E(U|info) = -- thank you, mathematica
Assuming[s > 0, Integrate[x PDF[NormalDistribution[m, s], x], {x, 0, Infinity}]]
= 1/2 (m + Exp[-m^2/(2 s^2)] Sqrt[2/pi] s + m Erf[m/(Sqrt[2] s)])
= s*normpdf(m/s) + m*normcdf(m/s)
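
For anyone without Mathematica, here is a small sketch of the same closed form, s*normpdf(m/s) + m*normcdf(m/s), using only Python's standard library. The helper names are my own, and `normal_voi` subtracts max(m, 0), which equals m under the assumption m > 0 made above.

```python
import math

def normal_pdf(z: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)

def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution, via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def expected_with_info(m: float, s: float) -> float:
    """E[max(x, 0)] for x ~ Normal(m, s), with s > 0."""
    z = m / s
    return s * normal_pdf(z) + m * normal_cdf(z)

def normal_voi(m: float, s: float) -> float:
    """Expected gain from learning x before choosing between a and b."""
    return expected_with_info(m, s) - max(m, 0.0)

print(normal_voi(0.0, 1.0))  # about 0.3989, i.e. 1/sqrt(2 pi)
print(normal_voi(1.0, 1.0))  # about 0.0833
```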

A reasonable approximation seems to be

E[U|info] ~= 0.4 s Exp[-2 (m/s)] + m

So, you should be willing to pay 0.4 s Exp[-2 (m/s)]. That means that you should pay exponentially less for each standard deviation that the mean is greater than 0. When the mean difference is 0, so when both are a priori equally likely, the information is worth s/sqrt(2pi) ~= 0.4 s. When the mean difference is one standard deviation in favor of b, the information is only worth 0.0833155 s.
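
To get a feel for how rough the rule 0.4 s Exp[-2 (m/s)] is, one could tabulate it against the exact normal-case value. This is only a sketch; the function names `exact_voi` and `approx_voi` are hypothetical, and m >= 0 is assumed as above.

```python
import math

def exact_voi(m: float, s: float) -> float:
    """Exact value of information for x ~ Normal(m, s), assuming m >= 0."""
    z = m / s
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
    cdf = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return s * pdf + m * cdf - m

def approx_voi(m: float, s: float) -> float:
    """The rule-of-thumb approximation 0.4 s exp(-2 m/s)."""
    return 0.4 * s * math.exp(-2 * m / s)

# Compare the two at a few values of m/s, with s fixed at 1.
for ratio in (0.0, 0.5, 1.0, 2.0):
    print(ratio, round(exact_voi(ratio, 1.0), 4), round(approx_voi(ratio, 1.0), 4))
```

At m = s the approximation gives about 0.054 s, while the exact formula gives the 0.083 s quoted above, so the exact value is probably the safer one to memorise.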

To summarize: the more sure you are of which choice is best, the less the information that tells you that for certain is worth.

Comment author: steven0461 23 November 2011 08:06:16PM *  1 point [-]

To summarize: the more sure you are of which choice is best, the less the information that tells you that for certain is worth.

Yes, but that was clear without math.

So, you should be willing to pay 0.4 s Exp[-2 (m/s)]. That means that you should pay exponentially less for each standard deviation that the mean is greater than 0. When the mean difference is 0, so when both are a priori equally likely, the information is worth s/sqrt(2pi) ~= 0.4 s. When the mean difference is one standard deviation in favor of b, the information is only worth 0.0833155 s.

Thanks, I could see the 0.4 and 0.08 becoming useful rules of thumb. How much does it matter that you assumed symmetry and no fat tails?

Comment author: steven0461 23 November 2011 08:02:08PM 1 point [-]

I said "it's a mistake to pay more than one standard deviation's worth", not "one should pay exactly a standard deviation's worth".