Cyan comments on "Using vs. evaluating (or, Why I don't come around here no more)" - Less Wrong

Post author: PhilGoetz 20 January 2014 02:36AM 23 points


Comment author: Cyan 20 January 2014 11:51:53AM * 1 point

I consider you a smart guy -- but when I wrote a couple of front page posts about Bayesian statistics, you made a few comments that revealed a notable Dunning-Kruger effect with respect to the topic. Since that time, my observations of your interactions with other subject matter experts about topics in their domains of expertise have only reinforced this impression.

My current understanding of you is that you are a smart guy, but posting on LW is often unrewarding for you because (i) your high intelligence is a core part of your self-image, and (ii) you're not as smart as you think you are. I hope this info can be of use to you; apologies for any narcissistic injury caused by this comment.

Comment author: Anatoly_Vorobey 20 January 2014 07:14:08PM 12 points

Out of curiosity, did you consider sending this comment via PM, and if so, what made you decide to post it publicly?

Comment author: Cyan 20 January 2014 09:58:09PM 9 points

I didn't think of using a PM. I don't have any good reason to do this publicly... hmm.

If you were implicitly questioning my motives, you were right to do so.

Comment author: Kaj_Sotala 20 January 2014 05:35:49PM 8 points

If you're going to make a comment like this, you really have to provide specific examples.

Comment author: Cyan 20 January 2014 06:45:55PM * 6 points

Fair enough. Here's the comment thread. There was also a follow-up PM exchange between PhilGoetz and me which gave me very weak but non-zero evidence supporting my impression.

The "other subject matter experts" examples are too fuzzy in my memory to try to find; the principle example (and possibly only example) is the time EY explicitly discommended PhilGoetz's attempt to reiterate EY's ideas (in the comment thread of a post PhilGoetz wrote to critique EY's ideas).

Comment author: IlyaShpitser 20 January 2014 07:19:04PM 3 points

Yeah, I have conversations like this about causality with people here all the time :(. (I don't remember any particular ones with Phil specifically). It is definitely a problem wider than just one individual.

Comment author: PhilGoetz 21 January 2014 06:17:43PM * 0 points

You called me over-confident, and as evidence, cited a conversation in which I mostly asked you questions. It seems your claim is based on my having said,

You said, "seek a prior that guarantees posterior calibration." That's what both EM and Gibbs sampling do, which is why I asked.

and on your opinion that that statement is wrong.

My recollection is that both EM and Gibbs sampling produce prior probabilities which maximize the likelihood of the observed data. In other words, they produce priors that result in posteriors (the probability of the observed data given those priors) that are perfectly calibrated to the data you train them on.

So our situations are symmetric: I think you did not quite understand what you said, or else what I said, or else you misunderstand EM and Gibbs sampling. I'm open to correction.

Comment author: Cyan 21 January 2014 06:31:30PM * 1 point

Perhaps you could lay out the problem with my evidence in more concrete terms?

(ETA: At the time I wrote this reply, the comment I was responding to read

You called me over-confident, and as evidence, cited a conversation in which I asked you questions.

As I write this ETA, there's a lot more detail in the parent.)

Comment author: PhilGoetz 21 January 2014 06:40:25PM * 1 point

That is as concrete as I can make it, unless you want me to write out an algorithm for Gibbs sampling and explain why it produces priors that maximize the posterior. Or give an example where I used it to do so. I can do that: I had a set of about 8 different databases I was using to assign functions to known proteins. I wanted to estimate the reliability of each database, as a probability that its annotation was correct. This set of 8 probabilities was the set of priors I sought. I had a set of about a hundred thousand annotated proteins, and given a set of priors, I could produce the probability of the given set of 100,000 annotations. I used that dataset plus Gibbs sampling to produce those 8 priors. And it worked extraordinarily well.
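
For concreteness, here is roughly the shape of that computation, as a simplified sketch: binary annotations, a latent true label per protein, and one reliability per database (in the spirit of the Dawid-Skene model). The names and numbers are hypothetical; the actual model and code were more involved.

```python
# Simplified, hypothetical sketch: estimate per-database reliabilities
# from agreement patterns alone, via EM (the deterministic cousin of
# the Gibbs-sampling approach described above).

import numpy as np

def em_reliability(x, n_iter=100, tol=1e-8):
    """x: (n_proteins, n_databases) array of 0/1 annotations.
    Returns estimated prevalence pi of label 1 and per-database
    reliabilities p (probability a database's annotation is correct)."""
    n, k = x.shape
    pi = 0.5                     # prevalence of the latent label 1
    p = np.full(k, 0.7)          # start above 0.5 to break label symmetry
    for _ in range(n_iter):
        # E-step: posterior probability that each protein's true label is 1
        like1 = np.prod(np.where(x == 1, p, 1 - p), axis=1)  # P(x_i | z_i=1)
        like0 = np.prod(np.where(x == 0, p, 1 - p), axis=1)  # P(x_i | z_i=0)
        w = pi * like1 / (pi * like1 + (1 - pi) * like0)
        # M-step: a database is "correct" when it matches the latent label
        correct = w[:, None] * x + (1 - w[:, None]) * (1 - x)
        pi_new, p_new = w.mean(), correct.mean(axis=0)
        if abs(pi_new - pi) < tol and np.max(np.abs(p_new - p)) < tol:
            return pi_new, p_new
        pi, p = pi_new, p_new
    return pi, p
```

With eight databases and a hundred thousand proteins, each E-step product has only eight factors, so the computation is fast and numerically tame.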

Comment author: Cyan 21 January 2014 07:41:06PM * 3 points

Oh man, you're not doing yourself any favors in trying to shift my understanding of you. Not that I doubt that your algorithm worked well! Let me explain.

You've used a multilevel modelling scheme in which the estimands are the eight proportions. In general, in any multilevel model, the parameters at a given level determine the prior probabilities for the variables at the level immediately below. In your specific context, i.e., estimating these proportions, a fully Bayesian multilevel model would also have a prior distribution on those proportions (a so-called "hyperprior", terrible name).

If you didn't use one, your algorithm can be viewed as a fully Bayesian analysis that implicitly used a constant prior density for the proportions, and this will indeed work well given enough information in the data. Alternatively, one could view the algorithm as a (randomized) type II maximum likelihood estimator, also known as "empirical Bayes".

In a fully Bayesian analysis, there will always be a top-level prior that is chosen only on the basis of prior information, not data. Any approach that uses the data to set the prior at the top level is an empirical Bayes approach. (These are definitions, by the way.) When you speak of "estimating the prior probabilities", you're taking an empirical Bayes point of view, but you're not well-informed enough to be aware that "Bayesian" and "empirical Bayes" are not the same thing.
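
To make the distinction concrete in the simplest toy setting I can think of (a beta-binomial model with made-up counts -- nothing to do with the protein data above): empirical Bayes picks the Beta prior's parameters by maximizing the marginal likelihood of the data, whereas a fully Bayesian analysis would fix a hyperprior on those parameters from prior information alone and integrate over it.

```python
# Hypothetical beta-binomial illustration of type II maximum likelihood
# ("empirical Bayes"): choose the Beta(a, b) prior by maximizing the
# marginal likelihood P(data | a, b), with the per-group proportions
# integrated out. All counts are made up.

import numpy as np
from scipy.optimize import minimize
from scipy.special import betaln

successes = np.array([ 8, 12,  5, 20])
trials    = np.array([10, 15, 10, 25])

def neg_marginal_loglik(params):
    a, b = np.exp(params)  # log-parametrization keeps a, b positive
    # log P(s | n, a, b) = const + log B(a+s, b+n-s) - log B(a, b), per group
    return -np.sum(betaln(a + successes, b + trials - successes)
                   - betaln(a, b))

res = minimize(neg_marginal_loglik, x0=[0.0, 0.0])
a_hat, b_hat = np.exp(res.x)
# Empirical-Bayes posterior for group j: Beta(a_hat + s_j, b_hat + n_j - s_j).
```

The point is only where a_hat and b_hat come from: from the data, which is what makes this empirical Bayes rather than a fully Bayesian analysis.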

The kinds of prior distributions with which I was concerned in my posts are those top-level prior distributions that don't come from data. Now, my pair of posts were terrible -- they basically dropped all of the readers into the inferential gap. But smart mathy guy cousin_it was intrigued enough to do his own reading and wrote some follow-up posts, and these serve as an existence proof that it was possible for someone with enough background to understand what I was talking about.

On the other hand, you didn't know what I was talking about, but you thought you did, and you offered questions and comments that apparently you still believe are relevant to the topic I addressed in my posts. To me, it really does look like -- in this context, at least -- you are laboring under a "cognitive bias in which unskilled individuals suffer from illusory superiority, mistakenly rating their ability much higher than is accurate".

So now I'll review my understanding of you:

  • Smart? Yes.
  • Not as smart as you think you are? Yes.
  • High intelligence is a core part of your self-image? Well, you did find my claim "not as smart as you think you are" irritating enough to respond to; you touted your math degree, teaching experience, and success in data analysis. So: yes.
  • Posting on LW is often unrewarding for you because of the above three traits? Hmm... well, that has the same answer as this question: have you found our current exchange unrewarding? (Absent further info, I'm assuming the answer is "yes".)

Comment author: PhilGoetz 22 January 2014 01:29:37AM * 1 point

To claim evidence that I'm overconfident, you have to show me asserting something that is wrong, and then failing to update when you provide evidence that it's wrong.

In the thread you referenced, I asked you questions, and the only thing I asserted was that EM and Gibbs sampling find priors which will result in computed posteriors being well-calibrated to the data. You did not provide, and still have not provided, evidence that that statement was wrong. Therefore I did not exhibit a failure to update.

I might be using different terminology than you--by "priors" I meant the values that I'm going to use as priors in the program I run on new data to transfer function annotations, and by "posteriors" I meant the posterior probability it will compute for a given annotation, given those "priors". I didn't claim to know what the standard terminology is. The only thing I claimed was that Gibbs sampling & EM did something that, using my terminology, could be described as setting priors so they gave calibrated results.

If you had corrected my terminology, and I'd ignored you, that would have been a failure to update. If you'd explained that I misunderstand Gibbs sampling, that would have been a failure to update. You didn't.

Relevant to your post? I don't know. I didn't assert that that particular fact was relevant to your post. I don't know if I even read your post. I responded to your comment, "seek a prior that guarantees posterior calibration," very likely in an attempt to understand your post.

you didn't know what I was talking about, but you thought you did

Again, what are you talking about? I asked you questions. The only thing I claimed to know was about the subject that I brought up, which was EM and Gibbs sampling.

As far as I can see, I didn't say anything confidently, I didn't say anything that was incorrect AFAIK, I didn't claim you had made a mistake, and I didn't fail to update on any evidence that something I'd said was wrong. So all these words of yours are not evidence for my over-confidence.

Even now, after writing paragraphs on the subject, you haven't tried to take anything I claimed and explain why it is wrong!

Try this approach: Look over the comments that you provided as evidence of my overconfidence. Say what I would have written differently if I were not overconfident.

In a fully Bayesian analysis, there will always be a top-level prior that is chosen only on the basis of prior information, not data. Any approach that uses the data to set the prior at the top level is an empirical Bayes approach.

I don't see how that distinction makes sense for Gibbs sampling or EM. They are iterative procedures that take your initial (top-level) prior, and then converge on a posterior-to-the-data value (which I called the prior, as it is plugged into my operating program as a prior). It doesn't matter how you choose your initial prior; the algorithm will converge onto the same final result, unless there is some difficulty converging. That's why these algorithms exist--they spare you from having to choose a prior, if the data is strong enough that the choice makes no difference.
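
For what it's worth, that intuition can be checked numerically in a deliberately easy case: EM for the mixing weight of two known Gaussian components, where the log-likelihood is concave in the single free parameter, so every starting value reaches the same estimate. (With more free parameters EM only guarantees a local optimum, so in general the starting point can matter.) A self-contained sketch with made-up data:

```python
# Check of the "initialization washes out" intuition in an easy case:
# EM for the mixing weight of two *known* N(0,1) and N(4,1) components.

import numpy as np

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0, 1, 700), rng.normal(4, 1, 300)])

def phi(x, mu):                       # N(mu, 1) density
    return np.exp(-0.5 * (x - mu) ** 2) / np.sqrt(2 * np.pi)

def em_weight(pi0, n_iter=200):
    pi = pi0
    for _ in range(n_iter):
        # E-step: responsibility of the N(0,1) component for each point
        w = pi * phi(x, 0) / (pi * phi(x, 0) + (1 - pi) * phi(x, 4))
        pi = w.mean()                 # M-step: new mixing weight
    return pi

print([round(em_weight(p0), 4) for p0 in (0.1, 0.5, 0.9)])
# -> the same value from every start (close to the true weight 0.7)
```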

Comment author: Cyan 22 January 2014 06:20:22AM 1 point

If you'd explained that I misunderstand Gibbs sampling, that would have been a failure to update. You didn't.

I wrote a comment that was so discordant with your understanding of Gibbs sampling and EM that it should have been a red flag that one or the other of us was misunderstanding something. Instead you put forth a claim stating your understanding, and it fell to me to take note of the discrepancy and ask for clarification. This failure to update is the exact event which prompted me to attach "Dunning-Kruger" to my understanding of you.

I don't see how distinction makes sense for Gibbs sampling or EM... That's why these algorithms exist--they spare you from having to choose a prior, if the data is strong enough that the choice makes no difference.

The way in which the ideas you have about EM and Gibbs sampling are wrong isn't easily fixable in a comment thread. We could do a Google Hangout at some point; if you're interested, PM me.

Comment author: PhilGoetz 22 January 2014 03:54:06PM 1 point

I believe my ideas about Gibbs sampling are correct, as demonstrated by my correct choice and implementation of it to solve a difficult problem. My terminology may be non-standard.

Here is what I believe happened in that referenced exchange: You wrote a comment that was difficult to comprehend, and I didn't see how it related to my question. I explained why I asked the question, hoping for clarification. That's a failure to communicate, not a failure to update.

Comment author: PhilGoetz 21 January 2014 06:13:24PM * 2 points

Assuming that I disagreed with you re. Bayesian statistics, our positions are symmetric--you believe I am overconfident, and I believe you are overconfident. I have a degree in math, have taught basic Bayesian statistics at a university, and have used Bayesian statistics successfully to get correct results in computer programs many times, so I have some reason for my confidence. Have you made use of this information in re-evaluating your own confidence?

Comment author: Cyan 21 January 2014 06:23:05PM * 4 points

You'd told me by PM that you'd carried out analyses using Bayesian methods, but when I asked you to give me a look at some, you (justifiably!) deemed it not worth your time to do so. So that part is incorporated into my picture of you. I didn't know about the math degree or teaching, but that info is in line with my current understanding of you, so it doesn't shift it.