Less Wrong is a community blog devoted to refining the art of human rationality. Please visit our About page for more information.

Comment author: Eugine_Nier 05 March 2014 02:51:03AM 21 points [-]

I can't help but think part of the difference is that they're your books, so you can do whatever you want to them, whereas he's your employee and is being paid to do this the "right way".

Comment author: PhilGoetz 05 March 2014 11:45:06PM 4 points [-]

Good point.

Comment author: CCC 05 March 2014 04:14:30AM 31 points [-]

He was learning how to cut the books. You were learning how to teach someone to cut the books, a task in which you had no prior experience. Yes, it took two people and it took longer than working out how to cut the books yourself; but given what you now know, assuming your new hire suddenly moves away and has to be unexpectedly replaced, you would be able to teach someone else how to cut the books more quickly than before.

Teaching someone a skill is a different skill from performing it, and it requires a more thorough conscious knowledge of the skill than simply using it does.

Comment author: PhilGoetz 05 March 2014 08:43:05PM *  10 points [-]

You're right, yet I think it's still remarkable that it took longer to watch myself do it and figure out how I was doing it than it took me to figure it out in the first place. For many types of skills, that wouldn't be the case. I think the ease of the original discovery, rather than the difficulty of observing myself, made the difference.

Don't teach people how to reach the top of a hill

29 PhilGoetz 04 March 2014 09:38PM

When is it faster to rediscover something on your own than to learn it from someone who already knows it?

Sometimes it's faster to re-derive a proof or algorithm than to look it up. Keith Lynch re-invented the fast Fourier transform because he was too lazy to walk all the way to the library to get a book on it, although that's an extreme example. But if you have a complicated proof already laid out before you, and you are not Marc Drexler, it's generally faster to read it than to derive a new one. Yet I found a knowledge-intensive task where it would have been much faster to tell someone nothing at all than to tell them how to do it.

Comment author: Vladimir_Nesov 08 April 2009 09:37:33PM *  1 point [-]

And my final conclusion is, then:
Either become an average utilitarian; or stop describing rationality as expectation maximization.

That's unwarranted. The axioms are being applied to describe very different processes, so you should look at their applications separately. In any case, reaching a "final conclusion" without an explicit write-up (or without discovering a preexisting write-up) to check the sanity of the conclusion is in most cases a very shaky step, predictably irrational.

Comment author: PhilGoetz 27 January 2014 10:26:49PM *  1 point [-]

Okay: Suppose you have two friends, Betty and Veronica, and one balloon. They both like balloons, but Veronica likes them a little bit more. Therefore, you give the balloon to Veronica.

You get one balloon every day. Do you give it to Veronica every day?

Ignore whether Betty feels slighted by never getting a balloon. If we considered utility and disutility due to the perception of equity and inequity, then average utilitarianism would also produce somewhat equitable results. The claim that inequity is a problem in average utilitarianism does not depend on the subjects perceiving the inequity.

Just to be clear about it, Betty and Veronica live in a nursing home, and never remember who got the balloon previously.

You might be tempted to adopt a policy like this: p(v) = .8, p(b) = .2, meaning you give the balloon to Veronica eight times out of ten. But the axiom of independence implies that the policy p(v) = 1, p(b) = 0 is better.

This is a straightforward application of the theorem, without any mucking about with possible worlds. Are you comfortable with giving Veronica the balloon every day? Or does valuing equity mean that expectation maximization is wrong? I think those are the only choices.
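For concreteness, here is a minimal sketch of the comparison between the mixed and deterministic policies; the utility numbers are hypothetical stand-ins for "Veronica likes balloons a little bit more":

```python
# Hypothetical utilities: Veronica values a balloon slightly more than Betty.
U_VERONICA = 1.0
U_BETTY = 0.9

def expected_utility(p_v):
    """Expected utility of giving the balloon to Veronica with
    probability p_v and to Betty with probability 1 - p_v."""
    return p_v * U_VERONICA + (1 - p_v) * U_BETTY

# Expected utility is linear in p_v, so it is maximized at an endpoint:
# the deterministic policy p(v) = 1 dominates every mixed policy.
assert expected_utility(1.0) > expected_utility(0.8)
```

Because the objective is linear in p(v), any preference for the 80/20 mixture over the pure policy cannot be expressed as expectation maximization, which is the dilemma posed above.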

Comment author: PhilGoetz 25 January 2014 06:06:03AM *  0 points [-]

I figured out what the problem is. Axiom 4 (Independence) implies average utilitarianism is correct.

Suppose you have two apple pies, and two friends, Betty and Veronica. Let B denote the number of pies you give to Betty, and V the number you give to Veronica. Let v(n) denote the outcome that Veronica gets n apple pies, and similarly define b(n). Let u_v(S) denote Veronica's utility in situation S, and u_b(S) denote Betty's utility.

Betty likes apple pies, but Veronica loves them, so much so that u_v(v(2), b(0)) > u_b(b(1), v(1)) + u_v(b(1), v(1)). We want to know whether average utilitarianism is correct in order to decide whether to give Veronica both pies.

Independence, the fourth axiom of the von Neumann-Morgenstern theorem, implies that if outcome L is preferable to outcome M, then a mixture of L with N is preferable to the same mixture of M with N.

Let L represent giving one pie to Veronica and M represent giving one pie to Betty. Now let's be sneaky and let N also represent giving one pie to Veronica. The fourth axiom says that L + N (giving two pies to Veronica) is preferable to M + N (giving one to Veronica and one to Betty). We have to assume that to use the theorem.

But that’s the question we wanted to ask--whether our utility function U should prefer the solution that gives two pies to Veronica, or one to Betty and one to Veronica! Assuming the fourth axiom builds average utilitarianism into the von Neumann-Morgenstern theorem.
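The arithmetic behind the pie example can be checked directly; the numbers below are hypothetical, chosen only to satisfy the stated inequality u_v(v(2), b(0)) > u_b(b(1), v(1)) + u_v(b(1), v(1)):

```python
# Hypothetical utilities satisfying the stated inequality.
u_v_both = 5.0    # u_v(v(2), b(0)): Veronica's utility when she gets both pies
u_v_split = 2.0   # u_v(b(1), v(1)): Veronica's utility under the even split
u_b_split = 1.5   # u_b(b(1), v(1)): Betty's utility under the even split

total_both_to_veronica = u_v_both          # Betty gets nothing
total_split = u_v_split + u_b_split

# Summing utilities (equivalently, averaging over the fixed two-person
# population) prefers giving Veronica both pies -- the very conclusion
# whose derivation from the Independence axiom is at issue.
assert total_both_to_veronica > total_split
```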

Comment author: PhilGoetz 27 January 2014 10:10:23PM 0 points [-]

Argh; never mind. This is what Wei_Dai already said below.

Comment author: Pablo_Stafforini 06 March 2012 08:56:28PM *  2 points [-]

[Average utilitarianism] implies that for any population consisting of very good lives there is a better population consisting of just one person leading a life at a slightly higher level of well-being (Parfit 1984 chapter 19). More dramatically, the principle also implies that for a population consisting of just one person leading a life at a very negative level of well-being, e.g., a life of constant torture, there is another population which is better even though it contains millions of lives at just a slightly less negative level of well-being (Parfit 1984). That total well-being should not matter when we are considering lives worth ending is hard to accept. Moreover, average utilitarianism has implications very similar to the Repugnant Conclusion (see Sikora 1975; Anglin 1977).

Average utilitarianism has even more implausible implications. Consider a world A in which people experience nothing but agonizing pain. Consider next a different world B which contains all the people in A, plus arbitrarily many people all experiencing pain only slightly less intense. Since the average pain in B is less than the average pain in A, average utilitarianism implies that B is better than A. This is clearly absurd, since B differs from A only in containing a surplus of arbitrarily many people experiencing nothing but intense pain. How could one possibly improve a world by merely adding lots of pain to it?

Comment author: PhilGoetz 27 January 2014 10:08:19PM 0 points [-]

You realize you just repeated the scenario described in the quote?

Comment author: ThisSpaceAvailable 23 January 2014 04:40:18AM 1 point [-]

First of all, I find the term "prescriptive" to be rather equivocatory, used primarily to express disapproval rather than to communicate precise meaning; it quite often just means "more strict than I think is appropriate". To the extent that "prescriptive" has a clear meaning, I disagree with your application of the word to definitions. There are prescriptive and descriptive approaches to writing a dictionary, but the definitions themselves are descriptive. For instance, there was a flap about a dictionary that included in its definition of the word "gay" that one meaning is "stupid". A girl objected to that, and started advocating that the dictionary remove that meaning of the word. So, one person might say "The word is sometimes used to mean 'stupid', and dictionaries should describe how words are used, so we should include that meaning". That's a descriptive approach to definitions. The girl, on the other hand, was saying "This meaning is offensive, and dictionaries shouldn't offend people, so this meaning should be removed". That's a prescriptive approach. But both "gay means homosexual" and "gay means homosexual or stupid" are descriptive statements.

You need a prescriptive, subjective definition of a thing that will transport you over water.

If you want something that will transport you over water, that's not “prescriptive”, “subjective”, or even a “definition”. It's a specification. You aren't saying “things that can't transport me over water shouldn't be called boats”, you're saying “The genie shouldn't give me something that can't transport me over water”. You don't need a new definition of “boat” to communicate that, you just need to phrase your wish as being more specific than just a “boat”. If you really want to have a term that refers to a thing that will transport you over water, you can make up a new word, and give it that definition. If you define a word as meaning “a thing that can transport PhilGoetz over water”, then that will be an objective definition.

Comment author: PhilGoetz 27 January 2014 06:22:21AM 0 points [-]

I'm talking about what people do, to warn people to watch out for it when they do that. Sometimes you'll be in a discussion, and some people will have defined a term descriptively, and some will have defined it what I'm calling prescriptively, and you need to notice that.

Comment author: Cyan 22 January 2014 06:20:22AM 1 point [-]

If you'd explained that I misunderstand Gibbs sampling, that would have been a failure to update. You didn't.

I wrote a comment that was so discordant with your understanding of Gibbs sampling and EM that it should have been a red flag that one or the other of us was misunderstanding something. Instead you put forth a claim stating your understanding, and it fell to me to take note of the discrepancy and ask for clarification. This failure to update is the exact event which prompted me to attach "Dunning-Kruger" to my understanding of you.

I don't see how the distinction makes sense for Gibbs sampling or EM... That's why these algorithms exist--they spare you from having to choose a prior, if the data is strong enough that the choice makes no difference.

The way in which the ideas you have about EM and Gibbs sampling are wrong isn't easily fixable in a comment thread. We could do a Google Hangout at some point; if you're interested, PM me.

Comment author: PhilGoetz 22 January 2014 03:54:06PM 1 point [-]

I believe my ideas about Gibbs sampling are correct, as demonstrated by my correct choice and implementation of it to solve a difficult problem. My terminology may be non-standard.

Here is what I believe happened in that referenced exchange: You wrote a comment that was difficult to comprehend, and I didn't see how it related to my question. I explained why I asked the question, hoping for clarification. That's a failure to communicate, not a failure to update.

Comment author: Cyan 21 January 2014 07:41:06PM *  1 point [-]

Oh man, you're not doing yourself any favors in trying to shift my understanding of you. Not that I doubt that your algorithm worked well! Let me explain.

You've used a multilevel modelling scheme in which the estimands are the eight proportions. In general, in any multilevel model, the parameters at a given level determine the prior probabilities for the variables at the level immediately below. In your specific context, i.e., estimating these proportions, a fully Bayesian multilevel model would also have a prior distribution on those proportions (a so-called "hyperprior", terrible name).

If you didn't use one, your algorithm can be viewed as a fully Bayesian analysis that implicitly used a constant prior density for the proportions, and this will indeed work well given enough information in the data. Alternatively, one could view the algorithm as a (randomized) type II maximum likelihood estimator, also known as "empirical Bayes".

In a fully Bayesian analysis, there will always be a top-level prior that is chosen only on the basis of prior information, not data. Any approach that uses the data to set the prior at the top level is an empirical Bayes approach. (These are definitions, by the way.) When you speak of "estimating the prior probabilities", you're taking an empirical Bayes point of view, but you're not well-informed enough to be aware that "Bayesian" and "empirical Bayes" are not the same thing.
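For readers unfamiliar with the distinction, here is a toy beta-binomial sketch of empirical Bayes in the spirit of the proportions example; the moment-matching recipe and all numbers are illustrative assumptions, not the model actually used in this exchange:

```python
import random

random.seed(0)

# Simulate a few binomial proportions (toy stand-ins for the eight).
true_props = [0.1, 0.3, 0.5, 0.7]
n = 200
counts = [sum(random.random() < p for _ in range(n)) for p in true_props]

# Empirical Bayes step: choose a shared Beta(a, b) "prior" by matching
# the moments of the observed proportions -- i.e., the data sets the prior.
obs = [c / n for c in counts]
m = sum(obs) / len(obs)
v = sum((x - m) ** 2 for x in obs) / len(obs)
s = m * (1 - m) / v - 1          # effective "sample size" of the Beta prior
a, b = m * s, (1 - m) * s

# Each posterior mean shrinks the raw proportion toward the shared mean m.
post_means = [(c + a) / (n + a + b) for c in counts]
```

A fully Bayesian treatment would instead place a hyperprior on (a, b) chosen before seeing the data; using the observed proportions to set them, as above, is what makes this "empirical Bayes" rather than "Bayesian".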

The kinds of prior distributions with which I was concerned in my posts are those top-level prior distributions that don't come from data. Now, my pair of posts were terrible -- they basically dropped all of the readers into the inferential gap. But smart mathy guy cousin_it was intrigued enough to do his own reading and wrote some follow-up posts, and these serve as an existence proof that it was possible for someone with enough background to understand what I was talking about.

On the other hand, you didn't know what I was talking about, but you thought you did, and you offered questions and comments that apparently you still believe are relevant to the topic I addressed in my posts. To me, it really does look like -- in this context, at least -- you are laboring under a "cognitive bias in which unskilled individuals suffer from illusory superiority, mistakenly rating their ability much higher than is accurate".

So now I'll review my understanding of you:

  • Smart? Yes.
  • Not as smart as you think you are? Yes.
  • High intelligence is a core part of your self-image? Well, you did find my claim "not as smart as you think you are" irritating enough to respond to; you touted your math degree, teaching experience, and success in data analysis. So: yes.
  • Posting on LW is often unrewarding for you because of above three traits? Hmm... well, that has the same answer as this question: have you found our current exchange unrewarding? (Absent further info, I'm assuming the answer is "yes".)
Comment author: PhilGoetz 22 January 2014 01:29:37AM *  1 point [-]

To claim evidence that I'm overconfident, you have to show me asserting something that is wrong, and then failing to update when you provide evidence that it's wrong.

In the thread which you referenced, I asked you questions, and the only thing I asserted was that EM and Gibbs sampling find priors which will result in computed posteriors being well-calibrated to the data. You did not provide, and still have not provided, evidence that that statement was wrong. Therefore I did not exhibit a failure to update.

I might be using different terminology than you--by "priors" I meant the values that I'm going to use as priors in my running program on new data for transferred function annotations, and by "posteriors" I meant the posterior probability it will compute for a given annotation, given those "priors". I didn't claim to know what the standard terminology is. The only thing I claimed was that Gibbs sampling & EM did something that, using my terminology, could be described as setting priors so they gave calibrated results.

If you had corrected my terminology, and I'd ignored you, that would have been a failure to update. If you'd explained that I misunderstand Gibbs sampling, that would have been a failure to update. You didn't.

Relevant to your post? I don't know. I didn't assert that that particular fact was relevant to your post. I don't know if I even read your post. I responded to your comment, "seek a prior that guarantees posterior calibration," very likely in an attempt to understand your post.

you didn't know what I was talking about, but you thought you did

Again, what are you talking about? I asked you questions. The only thing I claimed to know was about the subject that I brought up, which was EM and Gibbs sampling.

As far as I can see, I didn't say anything confidently, I didn't say anything that was incorrect AFAIK, I didn't claim you had made a mistake, and I didn't fail to update on any evidence that something I'd said was wrong. So all these words of yours are not evidence for my over-confidence.

Even now, after writing paragraphs on the subject, you haven't tried to take anything I claimed and explain why it is wrong!

Try this approach: Look over the comments that you provided as evidence of my overconfidence. Say what I would have written differently if I were not overconfident.

In a fully Bayesian analysis, there will always be a top-level prior that is chosen only on the basis of prior information, not data. Any approach that uses the data to set the prior at the top level is an empirical Bayes approach.

I don't see how the distinction makes sense for Gibbs sampling or EM. They are iterative procedures that take your initial (top-level) prior and then converge on a posterior-to-the-data value (which I called the prior, since it is plugged into my operating program as a prior). It doesn't matter how you choose your initial prior; the algorithm will converge on the same final result, unless there is some difficulty converging. That's why these algorithms exist--they spare you from having to choose a prior, if the data is strong enough that the choice makes no difference.
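The claim about initialization can be illustrated with a minimal EM example: fitting only the mixing weight of a two-component Gaussian mixture with known, well-separated components. The model and numbers are illustrative assumptions, not the annotation system discussed above:

```python
import math
import random

random.seed(1)

def density(x, mu):
    # Gaussian density with unit variance (the normalizer cancels in the ratio).
    return math.exp(-0.5 * (x - mu) ** 2)

# Data: 70% drawn from a component at 0, 30% from a component at 4.
data = [random.gauss(0, 1) if random.random() < 0.7 else random.gauss(4, 1)
        for _ in range(2000)]

def em_mixing_weight(pi0, iters=200):
    """Run EM on the mixing weight alone, starting from pi0."""
    pi = pi0
    for _ in range(iters):
        # E-step: responsibility of component 0 for each data point.
        resp = [pi * density(x, 0) /
                (pi * density(x, 0) + (1 - pi) * density(x, 4))
                for x in data]
        # M-step: the new mixing weight is the average responsibility.
        pi = sum(resp) / len(resp)
    return pi
```

With 2000 points and components four standard deviations apart, starting from 0.1 or 0.9 lands on essentially the same estimate near the true weight 0.7, which is the sense in which the initial "prior" stops mattering when the data are strong. With weak or multimodal data, by contrast, the starting point can matter a great deal.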
