Nick_Tarleton comments on Minicamps on Rationality and Awesomeness: May 11-13, June 22-24, and July 21-28 - Less Wrong
You are viewing a comment permalink. View the original post to see all comments and the full post content.
I know, I did too, but that is really the sort of calculation that should be done by a large-scale study documenting a control distribution for 0-10 ratings, against which such ratings can be calibrated.
In my engineering school, we had some project-planning classes where we would attempt to calculate which design was best based on the strength of our preferences for performance on a variety of criteria (aesthetics, weight, strength, cost, etc.). Looking back, I recognize what we were doing as coming up with a utility function to compute the utilities of the different designs.
Unfortunately, none of us (including the people who had designed the procedure) knew anything about utility functions or decision theory, so the procedure did things like rank the different criteria, and the strength of each design on each criterion, and then use those ranks directly as utility weights and partial utilities.
(So, for example, strength might be most important (10), then cost (9), then weight (8), and so on; and then maybe design A would be best (10) on weight, worst (1) on strength, etc.)
I didn't know any decision theory or anything, but I have a strong sense for noticing errors in mathematical models, and this thing set off alarm bells like crazy. We should have been giving a lot of thought to calibrating our weights and utilities, to make sure the arbitrariness of the rankings couldn't sneak through and change the answer, but no one gave a shit. I raised a fuss and tried to rederive the whole thing from first principles. I don't think I got anywhere, though; it was only one assignment, so I might have given up because of low value (it's a hard problem). I don't remember.
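To show how that arbitrariness sneaks through, here's a minimal sketch with invented numbers (not the actual assignment; the criteria, weights, and scores are all made up):

```python
# A sketch of the procedure above, with invented numbers: criteria are ranked
# by importance, designs are ranked within each criterion, and the ranks are
# then used directly as weights and partial utilities.
criteria_weights = {"strength": 10, "cost": 9, "weight": 8}  # importance ranks used as weights

# Per-criterion scores, also assigned by rank (10 = best, 1 = worst).
design_scores = {
    "A": {"strength": 1,  "cost": 9, "weight": 10},
    "B": {"strength": 10, "cost": 1, "weight": 1},
}

def total_utility(design, weights):
    return sum(weights[c] * design_scores[design][c] for c in weights)

print({d: total_utility(d, criteria_weights) for d in design_scores})
# -> {'A': 171, 'B': 117}: A wins.

# Express the *same* importance ordering as (3, 2, 1) instead of (10, 9, 8) --
# equally defensible given only a ranking -- and the answer flips to B.
alt_weights = {"strength": 3, "cost": 2, "weight": 1}
print({d: total_utility(d, alt_weights) for d in design_scores})
# -> {'A': 31, 'B': 33}: B wins.
```

Both weight vectors respect the same importance ordering, so nothing in the procedure tells you which answer is right; the rankings alone don't contain enough information to settle it.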
Moral:
With this sort of thing, or anything really, you either use bulletproof mathematical models, derived from first principles (or empirically), with calibrated real quantities, or you wing it intuitively using your built-in hardware. You do not use "math" on uncalibrated pseudo-quantities; that just tricks you into overriding your intuition in favor of something with no correct basis.
This is why you never use explicit probabilities that aren't either empirically determined or calculated theoretically.
Despite anti-arbitrariness intuitions, there is empirical evidence that this is wrong.
The Robust Beauty of Improper Linear Models (Dawes, 1979)
(This is about something somewhat less arbitrary than using ranks as scores, but it seems like evidence in favor of that approach as well.)
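For concreteness, here's a toy simulation in the spirit of Dawes's comparison (my own made-up setup, not data or code from the paper), contrasting a regression fitted to a small noisy sample with an "improper" model that just adds up standardized predictors with equal weights:

```python
import numpy as np

# Toy simulation in the spirit of "improper linear models" (setup invented
# for illustration): a regression fitted to a small noisy sample vs. a model
# that sums the standardized predictors with equal (unit) weights.
rng = np.random.default_rng(0)

n_predictors = 5
true_weights = rng.uniform(0.2, 1.0, n_predictors)  # all predictors point the "right" way

def make_data(n):
    x = rng.normal(size=(n, n_predictors))
    y = x @ true_weights + rng.normal(scale=2.0, size=n)  # noisy criterion
    return x, y

x_train, y_train = make_data(20)      # small training sample
x_test, y_test = make_data(10_000)    # large test set for a stable comparison

# "Proper" model: ordinary least squares fit to the training sample.
beta, *_ = np.linalg.lstsq(x_train, y_train, rcond=None)
ols_pred = x_test @ beta

# "Improper" model: standardize each predictor and add them up with unit weights.
z_test = (x_test - x_test.mean(axis=0)) / x_test.std(axis=0)
unit_pred = z_test.sum(axis=1)

def corr(a, b):
    return np.corrcoef(a, b)[0, 1]

print("OLS  out-of-sample r:", round(corr(ols_pred, y_test), 3))
print("unit out-of-sample r:", round(corr(unit_pred, y_test), 3))
```

In runs like this the unit-weight model tends to be competitive with the fitted regression out of sample, which is the paper's point: when the predictors all point the right way and the sample is small and noisy, the exact weights matter less than you'd expect.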
Dawes is not a reliable researcher; I have very little confidence in his studies. Check it.
(ETA: I also have other reasons to mistrust Dawes, but I shouldn't go into those here. In general, you just shouldn't trust heuristics-and-biases results any more than you should trust parapsychology results. (Actually, parapsychology results tend to be significantly better supported.) Almost all psychology is diseased science; the hypotheses are often interesting, but the statistical evidence given for them is often anti-informative.)