Comment author: gjm 03 October 2014 09:49:34AM 3 points [-]

But that's a situation in which we have a vast number of things that might somewhat-plausibly turn out to be chocolate and severely limited resources. It's not obvious that we can do better.

"But we do OK if we use one sigmoid utility function and not if we use another!"

No, we do different things depending on our utility function. That isn't a problem; it's what utility functions are for. And what's "OK" depends on what the probabilities are, what your resources are, and how much you value different amounts of chocolate. Which, again, is not a problem but exactly how things should be.

Comment author: janos 04 October 2014 06:20:21PM 1 point [-]

Certainly, given a utility function and a model, the best thing to do is whatever it is. The point was to show that some utility functions (e.g. the exponential-decay sigmoid) have counterintuitive properties that don't match what we'd actually want.

Every response to this post that takes the utility function for granted and remarks that the optimum is the optimum is missing the point: we don't know what kind of utility function is reasonable, and we're showing evidence that some of them give optima that aren't what we'd actually want if we were turning the world into chocolate/hedonium.

If it seems strange to you to consider representing what you want by a bounded utility function, a post about that will be forthcoming.

Comment author: BenjaminFox 30 June 2014 09:18:21AM 3 points [-]

I appreciate the links. I haven't read Gaifman's paper before, so I'll go ahead and read that.

Anyhow, I won't cotton to any method of assigning a logical probability that takes longer than just brute-forcing the right answer. For this particular problem I think a bottom-up approach is what you want to use.

I see the sentiment there, and that too is a valid approach. That said, after trying to use the bottom-up approach many times and failing, and after seeing others fail using bottom-up approaches, I think that if we can at least build a nonconstructive top-down theory, that would be a starting point. After all, Solomonoff Induction is completely top down, yet it's a very powerful theoretical tool.

Comment author: janos 30 June 2014 01:36:02PM 2 points [-]

One nonconstructive (and wildly uncomputable) approach to the problem is this one: http://www.hutter1.net/publ/problogics.pdf

Comment author: prase 03 January 2013 07:27:22PM 0 points [-]

Isn't there a standard reply involving utility functions? Assuming the usual diminishing marginal utility, if p(win) * U(jackpot) > U($1), then 2 p(win) * U(jackpot) > U($2), and so on. You should spend all your money! (Unless you are rich enough to buy so many tickets that you wander out of the domain where p(win) depends approximately linearly on the number of tickets you own.)

Comment author: janos 04 January 2013 02:33:55AM *  0 points [-]

I think you're making the wrong comparisons. If you buy $1 worth, you get p(win) * U(jackpot) + (1-p(win)) * U(-$1), which is more-or-less p(win)*U(jackpot)+U(-$1); this is a good idea if p(win) * U(jackpot) > -U(-$1). But under usual assumptions -U(-$2)>-2U(-$1). This adds up to normality; you shouldn't actually spend all your money. :)
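To make the "adds up to normality" point concrete, here is a minimal sketch with entirely made-up numbers (the utility function, win probability, and jackpot utility below are all hypothetical, chosen only so the arithmetic is easy to check): the first ticket can be worth buying even though spending everything is not.

```python
import math

# Purely illustrative, hypothetical numbers.
p_win = 1e-6        # chance that one $1 ticket wins
U_jackpot = 2e6     # utility of winning, in the same units as U_loss

def U_loss(dollars_spent):
    # Diminishing marginal utility of money means the disutility of
    # losing money grows faster than linearly: -U(-$2) > -2 * U(-$1).
    return -(math.exp(dollars_spent / 3.0) - 1)

def expected_utility(n_tickets):
    # n_tickets * p_win is small, so we ignore the chance of multiple wins.
    return n_tickets * p_win * U_jackpot + U_loss(n_tickets)

# The first ticket is worth buying: p(win) * U(jackpot) > -U(-$1) ...
print(expected_utility(1) > expected_utility(0))   # True
# ... yet the optimum is finite; you shouldn't spend all your money.
eus = [expected_utility(n) for n in range(21)]
print(max(range(21), key=lambda n: eus[n]))
```

The marginal disutility of each additional dollar grows, so at some point it overtakes the constant marginal gain per ticket, and buying stops being worth it.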

Comment author: fubarobfusco 25 October 2012 01:36:55PM 6 points [-]

The first thing is that to create a life is to create a death. A life ends. And while the end of a life may not be its most important moment, it reminds us that a life is a whole.

This sounds like a vacuously "deep" assertion. What would the negation mean — "A life is not a whole"? A life is part of something larger? A life is more than one thing?

Comment author: janos 25 October 2012 02:48:37PM 1 point [-]

One good negation is "the value/intrinsic utility of a life is the sum of the values/intrinsic utilities of all the moments/experiences in it, evaluated without reference to their place/context in the life story, except inasmuch as it is actually part of that moment/experience".

The "actually" gets traction if people's lives follow narratives that they aren't aware of as they're happening, but such that certain narratives are more valuable than others; this seems true.

Comment author: janos 28 November 2011 05:16:09PM *  3 points [-]

If your prior distribution for "yes" conditional on the number of papers is still uniform, i.e. if the number of papers has nothing to do with whether they're "yes" or not, then the rule still applies.

Comment author: lessdazed 09 September 2011 10:09:53AM *  4 points [-]

To test whether DID-patients were really affected by interidentity amnesia or whether they were simulating their amnesia, the authors assessed the performance of four groups of subjects on a multiple-choice recognition test. The dependent measure was the number of correct responses. The first group were the DID-patients, the second group were Controls, the third group were controls instructed to simulate interidentity amnesia (Simulators), and the fourth group were controls who had never seen the study list and were therefore True amnesiacs.

...

For instance, consider again the case of the Huntjens et al. study on DID discussed in Section 9.2.6 and throughout this book. For the data from the study, hypothesis H1a states that the mean recognition scores µ for DID-patients and True amnesiacs are the same and that their scores are higher than those of the Simulators: µcon > {µamn = µpat} > µsim, whereas hypothesis H1b states that the mean recognition scores µ for DID-patients and Simulators are the same and that their scores are lower than those of the True amnesiacs: µcon > µamn > {µpat = µsim}. Within the frequentist paradigm, a comparison of these models is problematical. Within the Bayesian paradigm, however, the comparison is natural and elegant

What does this mean? How does nesting work? How does frequentism fail? How does Bayesianism succeed? I do not understand the example at all.

Comment author: janos 09 September 2011 06:51:07PM *  1 point [-]

You can comfortably do Bayesian model comparison here; have priors for µcon, µamn, and µsim, and let µpat be either µamn (under hypothesis Hamn) or µsim (under hypothesis Hsim), and let Hamn and Hsim be mutually exclusive. Then integrating out µcon, µamn, and µsim, you get a marginal odds-ratio for Hamn vs Hsim, which tells you how to update.

The standard frequentist method being discussed is nested hypothesis testing, where you want to test null hypothesis H0 with alternative hypothesis H1, and H0 is supposed to be nested inside H1. For instance you could easily test null hypothesis µcon >= µamn >= µpat = µsim against µcon >= µamn >= µpat >= µsim. However, for testing non-nested hypotheses, the methodology is weaker, or at least less standard.
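Here is a minimal Monte Carlo sketch of that comparison, with made-up data, an assumed known noise level, and uniform priors (none of these numbers come from the actual Huntjens et al. study): tie µpat to µamn under one hypothesis and to µsim under the other, integrate out the free means, and take the ratio of marginal likelihoods.

```python
import random
import math

random.seed(0)

# Hypothetical recognition scores per group (made-up numbers).
data = {"con": [22, 24, 23], "amn": [15, 16, 14], "sim": [9, 11, 10],
        "pat": [15, 14, 16]}
sigma = 2.0  # assumed known observation noise

def log_lik(scores, mu):
    return sum(-0.5 * ((x - mu) / sigma) ** 2
               - math.log(sigma * math.sqrt(2 * math.pi)) for x in scores)

def marginal_lik(tie_pat_to, n_samples=20000):
    # Integrate the likelihood over independent uniform(0, 30) priors on
    # the three free means via simple Monte Carlo.
    total = 0.0
    for _ in range(n_samples):
        mus = {g: random.uniform(0, 30) for g in ("con", "amn", "sim")}
        ll = sum(log_lik(data[g], mus[g]) for g in ("con", "amn", "sim"))
        ll += log_lik(data["pat"], mus[tie_pat_to])  # µpat = µamn or µsim
        total += math.exp(ll)
    return total / n_samples

# Odds ratio for Hamn (patients are like true amnesiacs) vs Hsim.
bayes_factor = marginal_lik("amn") / marginal_lik("sim")
print(bayes_factor)
```

Note that the two hypotheses are not nested in each other, which is exactly the case the frequentist machinery handles awkwardly; here the comparison falls straight out of the marginal likelihoods.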

Comment author: ArisKatsaris 14 August 2011 11:48:50AM *  2 points [-]

Most of the statements you make are false in their connotations, but there's one statement you make (and attribute to "Bayesian Bob") that seems false no matter what way you look at it, and it's this one: "A statement, any statement, starts out with a 50% probability of being true" Even the rephrasing "in a vacuum we should believe it with 50% certainty" still seems simply wrong. Where in the world did you see that in Bayesian theory?

For saying that, I label you a Level-0 Rationalist. Unless someone's talking about binary digits of Pi, they should generally remove the concept of "50% probability" from their minds altogether.

A statement, any statement, starts out with a probability that's based on its complexity, NOT with a 50/50 probability. "Alice is a banker" is a simpler statement than "Alice is a feminist banker who plays the piano". That's why the former must be assigned greater probability than the latter.

Comment author: janos 15 August 2011 04:58:44PM 3 points [-]

"Alice is a banker" is a simpler statement than "Alice is a feminist banker who plays the piano". That's why the former must be assigned greater probability than the latter.

Complexity weights apply to worlds/models, not to propositions. Otherwise you might as well say:

"Alice is a banker" is a simpler statement than "Alice is a feminist, a banker, or a pianist". That's why the former must be assigned greater probability than the latter.
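The distinction can be made concrete with a toy sketch (the facts and the uniform world-weights below are entirely hypothetical; any weighting of worlds gives the same conclusion): put the prior over complete worlds, define a proposition's probability as the total weight of worlds where it holds, and note that a conjunction can only lose worlds while a disjunction can only gain them, no matter how syntactically complex either sentence looks.

```python
from itertools import product

# Enumerate all complete "worlds" over three binary facts about Alice.
facts = ("banker", "feminist", "pianist")
worlds = list(product([False, True], repeat=3))
weight = {w: 1 / len(worlds) for w in worlds}  # toy stand-in for a world prior

def prob(proposition):
    # A proposition's probability is the total weight of worlds satisfying it.
    return sum(weight[w] for w in worlds if proposition(dict(zip(facts, w))))

p_banker = prob(lambda a: a["banker"])
p_conj = prob(lambda a: a["banker"] and a["feminist"] and a["pianist"])
p_disj = prob(lambda a: a["banker"] or a["feminist"] or a["pianist"])

print(p_conj <= p_banker <= p_disj)  # True
```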

Comment author: janos 07 April 2011 10:59:53PM *  1 point [-]

tl;dr : miscalibration means mentally interpreting loglikelihood of data as being more or less than its actual loglikelihood; to infer it you need to assume/infer the Bayesian calculation that's being made/approximated. Easiest with distributions over finite sets (i.e. T/F or multiple-choice questions). Also, likelihood should be called evidence.

I wonder why I didn't respond to this when it was fresh. Anyway, I was running into this same difficulty last summer when attempting to write software to give friendly outputs (like "calibration") to a bunch of people playing the Aumann game with trivia questions.

My understanding was that evidence needs to be measured on the logscale (as the difference between prior and posterior), and miscalibration is when your mental conversion from gut feeling of evidence to the actual evidence has a multiplicative error in it. (We can pronounce this as: "the true evidence is some multiplicative factor (called the calibration parameter) times the felt evidence".) This still seems like a reasonable model, though of course different kinds of evidence are likely to have different error magnitudes, and different questions are likely to get different kinds of evidence, so if you have lots of data you can probably do better by building a model that will estimate your calibration for particular questions.
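As a sketch of that constant-calibration model on a multiple-choice question (all numbers hypothetical): the felt evidence for each option is the log ratio of the reported posterior to the prior, so scaling it by a calibration factor c and renormalizing gives the corrected posterior.

```python
# True log-evidence = c * felt log-evidence, so the corrected posterior is
# prior * (reported / prior) ** c, renormalized over the options.
def recalibrate(prior, reported, c):
    unnorm = [p * (q / p) ** c for p, q in zip(prior, reported)]
    total = sum(unnorm)
    return [u / total for u in unnorm]

prior = [0.25, 0.25, 0.25, 0.25]      # uniform over four options
reported = [0.70, 0.10, 0.10, 0.10]   # what the person says they believe

# c < 1 models overconfidence: the true evidence is weaker than it felt.
corrected = recalibrate(prior, reported, c=0.5)
print(corrected)  # first option still favoured, but less extremely
```

With c = 1 the reported distribution is returned unchanged, as it should be for a perfectly calibrated person.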

But sticking to the constant-calibration model, it's still not possible to estimate your calibration from your given confidence intervals alone, because for that we need some idea of your internal prior (your "prior" prior, before you've taken the felt evidence into account), which is hard to get any decent sense of. You can work off of iffy assumptions, such as assuming that your prior for percentage answers in a trivia game is fitted to the set of all the percentage answers from that game and has some simple form (e.g. Beta). The Aumann game gave an advantage in this respect: rather than comparing your probability distribution before and after thinking about the question, it makes it possible to compare the distribution before and after hearing other people's arguments and evidence. If you always speak in terms of standard probability distributions, it's not too hard to infer your calibration there.

Further "funny" issues can arise when you get down to work; for instance if your prior was a Student-t with df n1 and your posterior was a Student-t with df n2<n1 then the model says that your calibration cannot be more than n1/(n1-n2) or your posterior will fail to be a distribution. Likewise if your prior was Normal with variance s1^2 and your posterior is Normal with variance s2^2>s1^2 then your calibration cannot be more than 1/(1-s1^2/s2^2) without having your posterior explode. It's tempting to say the lesson is that things break if you're becoming asymptotically less certain, which makes some intuitive sense: if your distributions are actually mixtures of finitely many different hypotheses that you're Bayesianly updating the weights of, then you will never become asymptotically less certain; in particular the Student-t scenario I described can't happen. However this is not a satisfactory conclusion because the Normal scenario (where you increase your variance by upweighting a hypothesis that gives higher variance) can easily happen.
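The Normal bound can be checked directly (with hypothetical s1, s2): under the model the corrected posterior density is proportional to prior^(1-c) * posterior^c, and for Normals with a common mean its precision is (1-c)/s1^2 + c/s2^2, which goes nonpositive exactly when c reaches 1/(1 - s1^2/s2^2).

```python
# Quick numeric check of the Normal blow-up bound (hypothetical numbers).
s1, s2 = 1.0, 2.0                 # posterior *less* certain than the prior
c_max = 1 / (1 - s1**2 / s2**2)   # = 4/3 for these values

def combined_precision(c):
    # Precision of prior^(1-c) * posterior^c; must stay positive for the
    # result to be a proper distribution.
    return (1 - c) / s1**2 + c / s2**2

print(combined_precision(c_max - 0.01) > 0)  # True: still a distribution
print(combined_precision(c_max + 0.01) > 0)  # False: posterior "explodes"
```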

A different resolution to the above is that the model of evidence=calibration*felt evidence is wrong, and needs an error term or two; that can give a workable result, or at least not catch fire and die.

Another thought: if your mental process is like the one two paragraphs up, where you're working with a mixture of several fixed (e.g. normal) hypotheses, and the calibration concept is applied to how you update the weights of the hypotheses, then the change in the mixture distribution (i.e. the marginal) will not follow anything like the calibration model.

So the concept is pretty tricky unless you carefully choose problems where you can reasonably model the mental inference, and in particular try to avoid "mixture-of-hypotheses"-type scenarios (unless you know in advance precisely what the hypotheses imply, which is unusual unless you construct the questions that way... but then I can't think of why you'd ask about the mixture instead of about the probabilities of the hypotheses themselves).

You might be okay when looking at typical multiple-choice questions; certainly you won't run into the issues with broken posteriors and invalid calibrations. Another advantage is that "the" prior (i.e. uniform) is uncontroversial, though whether the prior to use for computing calibration should be "the" prior is not obvious; but if you don't have before-and-after results from people then I guess it's the best you can do.

I just noticed that what's usually called the "likelihood" I was calling "evidence" here. This has probably been suggested by someone before, but: I've never liked the term "likelihood", and this is the best replacement for it that I know of.

In response to Inverse Speed
Comment author: [deleted] 27 March 2011 07:15:33AM *  6 points [-]

how much of x% concentration do you add to your y% concentration to get z% concentration?

Dimensional analysis is an important tool.

Here's an example. Suppose the problem is: "How much 85% alcohol do you have to add to 1 kilogram of 40% alcohol to get 55% alcohol?" Ethanol has a cute property where "Mixing equal volumes of ethanol and water results in only 1.92 volumes of mixture", which would be a distraction here, so let's specify that we're working with mass fractions, aka alcohol by weight here.

We're starting with 1 kg of 40% alcohol, which contains .4 kg alcohol in 1 kg of total fluid. We're adding N kg of 85% alcohol, which contains (.85 * N) kg alcohol in N kg of total fluid. So we'll end up with (.4 + .85 * N) kg of alcohol in (1 + N) kg of total fluid. We want 55% alcohol, so this gives us an equation:

(.4 + .85 * N) kg alcohol / (1 + N) kg total fluid = 55% alcohol

We specified that "55% alcohol" means dividing kilograms of alcohol by kilograms of total fluid, so the division is correct here. (Dimensional analysis means more than just making sure that you don't try to add kilograms to liters. Dividing kg total fluid by kg alcohol would give us a dimensionless number, but not one that could be referred to as "percent alcohol". Same for dividing kg alcohol by kg water.)

Now we can nuke the units and grind out the math:

(.4 + .85 * N) / (1 + N) = .55
(.4 + .85 * N) = .55 * (1 + N)
.4 + .85 * N = .55 + .55 * N
.4 + .85 * N - .55 * N = .55
.85 * N - .55 * N = .55 - .4
.3 * N = .15
N = .15 / .3
N = .5

Verify:

1 kg of 40% alcohol contains 1 * .4 = .4 kg alcohol
.5 kg of 85% alcohol contains .5 * .85 = .425 kg alcohol
.4 kg alcohol + .425 kg alcohol = .825 kg alcohol
1 kg total fluid + .5 kg total fluid = 1.5 kg total fluid
.825 kg alcohol / 1.5 kg total fluid = .55 YAY!
In response to comment by [deleted] on Inverse Speed
Comment author: janos 27 March 2011 01:22:11PM *  10 points [-]

The way I'd try to do this problem mentally would be:

Relative to the desired concentration of 55%, each unit of 40% is missing .15 units of alcohol, and each unit of 85% has .3 extra units of alcohol. .15:.3=1:2, so to balance these out we need (amount of 40%):(amount of 85%)=2:1, i.e. we need twice as much 40% as 85%. Since we're using 1kg of 40%, this means 0.5kg of 85%.
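That shortcut (sometimes called alligation) can be written out as a general rule; the function name below is just for illustration: each kg of the weak solution is missing (target - weak) units of alcohol, each kg of the strong solution carries (strong - target) extra units, and the deficits and surpluses must cancel.

```python
def amount_to_add(weak, strong, target, weak_kg):
    deficit_per_kg = target - weak     # e.g. 0.55 - 0.40 = 0.15
    surplus_per_kg = strong - target   # e.g. 0.85 - 0.55 = 0.30
    return weak_kg * deficit_per_kg / surplus_per_kg

n = amount_to_add(weak=0.40, strong=0.85, target=0.55, weak_kg=1.0)
print(n)  # ~0.5 kg of the 85% solution

# Sanity check against the dimensional-analysis solution:
mixture = (1.0 * 0.40 + n * 0.85) / (1.0 + n)
print(mixture)  # ~0.55
```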

Comment author: Perplexed 15 March 2011 05:22:08AM 2 points [-]

it's more correct to say that the women/blacks were 60% more likely to not be referred.

Hmmm. I would have said that white men were 60% as likely to not be referred. (This is the first time I've seen the golden ratio show up in a discussion of probability!)

Comment author: janos 15 March 2011 07:41:45PM *  0 points [-]

I prefer your phrasing.
