Nick_Tarleton comments on Minicamps on Rationality and Awesomeness: May 11-13, June 22-24, and July 21-28 - Less Wrong

Post author: AnnaSalamon 29 March 2012 08:48PM




Comment author: [deleted] 29 March 2012 10:55:18PM 11 points [-]

I tried to take that into account when reading.

I know, I did too, but that is really the sort of calculation that should be done by a large-scale study documenting a control distribution for 0-10 ratings, against which such ratings can be calibrated.

treating the indexes as utilities

Please explain.

In my engineering school, we had some project-planning classes where we would attempt to calculate the best design based on the strength of our preferences across a variety of criteria (aesthetics, weight, strength, cost, etc.). Looking back, I recognize what we were doing as constructing a utility function to compute the utilities of the different designs.

Unfortunately, none of us (including the people who had designed the procedure) knew anything about utility functions or decision theory, so they would do things like rank the different criteria, and the strength of each design in each criterion, and then use those ranks directly as utility weights and partial utilities.

(So, for example, strength might be most important (10), then cost (9), then weight (8), and so on; and then maybe design A would be best (10) in weight, worst (1) in strength, etc.)
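The procedure described above can be sketched as a weighted sum, with criterion ranks used directly as weights and design ranks used directly as partial utilities. This is a minimal illustration; the criteria and numbers are hypothetical, not from the actual class.

```python
# Criterion ranks used directly as utility weights (10 = most important).
weights = {"strength": 10, "cost": 9, "weight": 8}

# Each design's 1-10 rank on each criterion (10 = best). Illustrative values.
scores = {
    "A": {"strength": 1, "cost": 7, "weight": 10},
    "B": {"strength": 8, "cost": 5, "weight": 4},
}

def pseudo_utility(design):
    """Rank-weighted sum: the 'utility' the class procedure computed."""
    return sum(weights[c] * scores[design][c] for c in weights)

best = max(scores, key=pseudo_utility)
```

Note that the answer here hinges entirely on the arbitrary spacing of the ranks: nothing guarantees that "most important (10)" versus "second (9)" reflects the actual ratio of how much we care about strength versus cost.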

I didn't know any decision theory or anything, but I have a strong sense for noticing errors in mathematical models, and this thing set off alarm bells like crazy. We should have been giving a lot of thought to calibrating our weights and utilities, to make sure the arbitrariness of the rankings couldn't sneak through and change the answer, but no one gave a shit. I raised a fuss and tried to rederive the whole thing from first principles. I don't think I got anywhere, though; it was only one assignment, so I might have given up because of low value (it's a hard problem). Don't remember.

Moral:

With this sort of thing, or anything really, you either use bulletproof mathematical models derived from first principles (or empirically) with calibrated real quantities, or you wing it intuitively using your built-in hardware. You do not use "math" on uncalibrated pseudo-quantities; that just tricks you into overriding your intuition for something with no correct basis.

This is why you never use explicit probabilities that aren't either empirically determined or calculated theoretically.

Comment author: Nick_Tarleton 30 March 2012 06:46:11PM *  5 points [-]

With this sort of thing, or anything really, you either use bulletproof mathematical models derived from first principles (or empirically) with calibrated real quantities, or you wing it intuitively using your built-in hardware. You do not use "math" on uncalibrated pseudo-quantities; that just tricks you into overriding your intuition for something with no correct basis.

Despite anti-arbitrariness intuitions, there is empirical evidence that this is wrong.

The Robust Beauty of Improper Linear Models

Proper linear models are those in which predictor variables are given weights in such a way that the resulting linear composite optimally predicts some criterion of interest; examples of proper linear models are standard regression analysis, discriminant function analysis, and ridge regression analysis. Research summarized in Paul Meehl's book on clinical versus statistical prediction—and a plethora of research stimulated in part by that book—all indicates that when a numerical criterion variable (e.g., graduate grade point average) is to be predicted from numerical predictor variables, proper linear models outperform clinical intuition. Improper linear models are those in which the weights of the predictor variables are obtained by some nonoptimal method; for example, they may be obtained on the basis of intuition, derived from simulating a clinical judge's predictions, or set to be equal. This article presents evidence that even such improper linear models are superior to clinical intuition when predicting a numerical criterion from numerical predictors. In fact, unit (i.e., equal) weighting is quite robust for making such predictions. The article discusses, in some detail, the application of unit weights to decide what bullet the Denver Police Department should use. Finally, the article considers commonly raised technical, psychological, and ethical resistances to using linear models to make important social decisions and presents arguments that could weaken these resistances.

(this is about something somewhat less arbitrary than using ranks as scores, but it seems like evidence in favor of that approach as well)
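Dawes-style unit weighting is easy to demonstrate: standardize each predictor, then sum with equal weights instead of fitted regression weights. The data below are made up for illustration (not from the paper), and the predictor names are assumptions.

```python
# Unit-weighted ("improper") linear composite, per Dawes.
# Toy data: two predictors for five applicants; values are illustrative.
import statistics

def zscores(xs):
    """Standardize so each predictor contributes on a common scale."""
    mu, sd = statistics.mean(xs), statistics.pstdev(xs)
    return [(x - mu) / sd for x in xs]

test = [700, 650, 600, 720, 580]   # e.g. admissions test score
gpa  = [3.9, 3.2, 3.5, 3.6, 2.8]   # e.g. undergraduate GPA

zt, zg = zscores(test), zscores(gpa)

# Equal weights: no regression fit, just add the standardized predictors.
unit_weighted = [t + g for t, g in zip(zt, zg)]

# Rank applicants by the improper composite, best first.
ranking = sorted(range(len(unit_weighted)), key=lambda i: -unit_weighted[i])
```

The surprising empirical result is that composites like this routinely predict nearly as well as regression-fitted weights out of sample, and better than unaided clinical judgment; the choice of which predictors to include matters far more than the exact weights.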

Comment author: Will_Newsome 30 March 2012 11:55:58PM *  -1 points [-]

Dawes is not a reliable researcher; I have very little confidence in his studies. Check it.

(ETA: I also have other reasons to mistrust Dawes, but shouldn't go into those here. In general you just shouldn't trust heuristics and biases results any more than you should trust parapsychology results. (Actually, parapsychology results tend to be significantly better supported.) Almost all psychology is diseased science; the hypotheses are often interesting, the statistical evidence given for them is often anti-informative.)