Less Wrong is a community blog devoted to refining the art of human rationality. Please visit our About page for more information.
A common mistake people make with utility functions is taking individual utility numbers as meaningful, and performing operations such as adding them or doubling them. But utility functions are only defined up to positive affine transformation.
Talking about "utils" seems like it would encourage this sort of mistake; it makes it sound like some sort of quantity of stuff, that can be meaningfully added, scaled, etc. Now the use of a unit -- "utils" -- instead of bare real numbers does remind us that the scale we've picked is arbitrary, but it doesn't remind us that the zero we've picked is also arbitrary, and encourages such illegal operations as addition and scaling. It suggests linear, not affine.
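To make the "only defined up to positive affine transformation" point concrete, here is a minimal sketch. The outcomes, the utility numbers, and the helper names (`expected_utility`, `prefers`) are all made up for illustration. It rescales a utility function by v = 2u + 5 and checks which statements survive: preferences between gambles and ratios of utility differences do, but ratios of raw utility numbers do not.

```python
from fractions import Fraction

def expected_utility(lottery, utility):
    """Expected utility of a lottery given as {outcome: probability}."""
    return sum(p * utility[outcome] for outcome, p in lottery.items())

def prefers(a, b, utility):
    """True if lottery a has strictly higher expected utility than lottery b."""
    return expected_utility(a, utility) > expected_utility(b, utility)

# Hypothetical utility assignment (the numbers are arbitrary).
u = {"apple": Fraction(0), "banana": Fraction(1), "cherry": Fraction(3)}

# The same preferences, re-expressed through a positive affine map v = 2u + 5.
v = {outcome: 2 * value + 5 for outcome, value in u.items()}

sure_banana = {"banana": Fraction(1)}
gamble = {"apple": Fraction(1, 2), "cherry": Fraction(1, 2)}

# Preferences between lotteries agree under u and v.
same_preference = prefers(gamble, sure_banana, u) == prefers(gamble, sure_banana, v)

# Ratios of raw utilities do not survive: 3/1 under u, 11/7 under v.
ratio_u = u["cherry"] / u["banana"]
ratio_v = v["cherry"] / v["banana"]

# Ratios of utility differences do survive: both are 2.
diff_ratio_u = (u["cherry"] - u["banana"]) / (u["banana"] - u["apple"])
diff_ratio_v = (v["cherry"] - v["banana"]) / (v["banana"] - v["apple"])

print(same_preference, ratio_u, ratio_v, diff_ratio_u, diff_ratio_v)
```

So "cherry is three times as good as banana" is not a meaningful statement, while "the step from banana to cherry is twice the step from apple to banana" is.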
But there is a common everyday quantity which we ordinarily measure with an affine scale, and that's temperature. Now, in fact, temperatures really do have an absolute zero (and if you make sufficient use of natural units, they have an absolute scale, as well), but generally we measure temperature with scales that were invented before that fact was recognized. And so while we may have kelvins, we also have "degrees Fahrenheit" or "degrees Celsius".
If you've used these scales long enough you recognize that it is meaningless to e.g. add things measured on these scales, or to multiply them by scalars. So I think it would be a helpful cognitive reminder to say something like "degrees utility" instead of "utils", to suggest an affine scale like we use for temperature, rather than a linear scale like we use for length or time or mass.
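A quick check of why adding such temperatures is meaningless, as a sketch (the helper name `c_to_f` and the sample readings are just for illustration): the "sum" of two Celsius readings depends on which scale you add them in, whereas an average, being an affine combination, does not.

```python
def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32

a_c, b_c = 10.0, 20.0

# Adding and converting do not commute.
sum_then_convert = c_to_f(a_c + b_c)          # 30 °C -> 86.0 °F
convert_then_sum = c_to_f(a_c) + c_to_f(b_c)  # 50.0 + 68.0 = 118.0

# Averaging and converting do commute.
mean_then_convert = c_to_f((a_c + b_c) / 2)             # 15 °C -> 59.0 °F
convert_then_mean = (c_to_f(a_c) + c_to_f(b_c)) / 2     # 59.0

print(sum_then_convert, convert_then_sum)
print(mean_then_convert, convert_then_mean)
```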
The analogy isn't perfect, because as I've mentioned above, temperature actually can be measured on a linear scale (and with sufficient use of natural units, an absolute scale); but the point is just to prompt the right style of thinking, and in everyday life we usually think of temperature as an (ordered) affine thing, like utility.
As such I recommend saying "degrees utility" instead of "utils". If there is some other familiar quantity we also tend to use an affine scale for, perhaps an analogy with that could be used instead or as well.
Edit 11/28: Edited note at bottom to note that the random variables should have finite variance, and that this is essentially just L². Also some formatting changes.
This is something that has been bugging me for a while.
The correlation coefficient between two random variables can be interpreted as the cosine of the angle between them. The higher the correlation, the more "in the same direction" they are. A correlation coefficient of one means they point in exactly the same direction, while -1 means they point in exactly opposite directions. More generally, a positive correlation coefficient means the two random variables make an acute angle, while a negative correlation means they make an obtuse angle. A correlation coefficient of zero means that they are quite literally orthogonal.
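For a finite sample this is easy to see directly. The sketch below (function names and data are made up for illustration, and it works with the sample correlation rather than the population one) subtracts off the means and then computes the ordinary cosine between the resulting vectors, which is exactly the Pearson correlation; taking the inverse cosine gives the angle.

```python
import math

def centered(xs):
    """Subtract the mean from every entry."""
    mean = sum(xs) / len(xs)
    return [x - mean for x in xs]

def dot(a, b):
    """Ordinary dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))

def correlation(xs, ys):
    """Sample Pearson correlation, computed as the cosine between centered vectors."""
    cx, cy = centered(xs), centered(ys)
    return dot(cx, cy) / math.sqrt(dot(cx, cx) * dot(cy, cy))

def correlation_angle_degrees(xs, ys):
    """Angle between the centered vectors, in degrees."""
    r = max(-1.0, min(1.0, correlation(xs, ys)))  # guard against rounding error
    return math.degrees(math.acos(r))

xs = [1, 2, 3, 4]
same_direction = [2, 4, 6, 8]      # correlation 1
opposite_direction = [4, 3, 2, 1]  # correlation -1
orthogonal = [1, -1, -1, 1]        # correlation 0

for ys in (same_direction, orthogonal, opposite_direction):
    print(correlation(xs, ys), correlation_angle_degrees(xs, ys))
```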
Everything I have said above is completely standard. So why aren't correlation coefficients commonly expressed as angles instead of as their cosines? It seems to me that this would make them more intuitive to process.
Certainly it would make various statements about them more intuitive. For instance "Even if A is positively correlated with B and B is positively correlated with C, A might be negatively correlated with C." This sounds counterintuitive, until you rephrase it as "Even if A makes an acute angle with B and B makes an acute angle with C, A might make an obtuse angle with C." Similarly, the geometric viewpoint makes it easier to make observations like "If A and B have correlation exceeding 1/√2 and so do B and C, then A and C are positively correlated" -- because this is just the statement that if A and B make an angle of less than 45° and so do B and C, then A and C make an angle of less than 90°.
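Both statements can be checked with plane vectors standing in for (mean-zero) random variables. This is only an illustrative sketch; the angles 0°, 60°, 120° and 0°, 44°, 88° are arbitrary choices, as are the helper names.

```python
import math

def unit(angle_degrees):
    """Unit vector in the plane at the given angle from the x-axis."""
    t = math.radians(angle_degrees)
    return (math.cos(t), math.sin(t))

def cosine(u, v):
    """Cosine of the angle between two plane vectors (the 'correlation')."""
    return (u[0] * v[0] + u[1] * v[1]) / (math.hypot(*u) * math.hypot(*v))

# Acute, acute, obtuse: positive correlation is not transitive.
a, b, c = unit(0), unit(60), unit(120)
print(cosine(a, b), cosine(b, c), cosine(a, c))  # roughly 0.5, 0.5, -0.5

# Both pairwise angles below 45 degrees: the outer pair stays below 90 degrees.
p, q, r = unit(0), unit(44), unit(88)
threshold = 1 / math.sqrt(2)
print(cosine(p, q) > threshold, cosine(q, r) > threshold, cosine(p, r) > 0)
```

The second case is just the triangle inequality for angles: the angle between A and C is at most the sum of the other two.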
Now when further processing is to be done with the correlation coefficients, one wants to leave them as correlation coefficients, rather than take their inverse cosines just to have to take their cosines again later. (I don't know that the angles you get this way are actually useful mathematically, and I suspect they mostly aren't.) My question rather is about when correlation coefficients are expressed to the reader, i.e. when they are considered as an end product. It seems to me that expressing them as angles would give people a better intuitive feel for them.
Or am I just entirely off-base here? Statistics, let alone the communication thereof, is not exactly my specialty, so I'd be interested to hear if there's a good reason people don't do this. (Is it assumed that anyone who knows about correlation has the geometric point of view completely down? But most people can't calculate an inverse cosine in their head...)
Formal mathematical version: If we consider real-valued random variables with finite variance on some fixed probability space Ω -- that is to say, L²(Ω) -- the covariance is a positive-semidefinite symmetric bilinear form, with kernel equal to the set of essentially constant random variables. If we mod out by these we can consider the result as an inner product space and define angles between vectors as usual, which gives us the inverse cosine of the correlation coefficient. Alternatively we could just take L²(Ω) and restrict to those elements with zero mean; this is isomorphic (since it is the image of the "subtract off the mean" map, whose kernel is precisely the essentially constant random variables).
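Written out in symbols, under the usual notation (which is my choice here, not anything fixed above: σ for standard deviation, ρ for the correlation coefficient, θ for the angle), the inner product, norm, and angle on the quotient space are:

```latex
\langle X, Y \rangle = \operatorname{Cov}(X, Y)
  = \mathbb{E}\bigl[(X - \mathbb{E}X)(Y - \mathbb{E}Y)\bigr],
\qquad
\lVert X \rVert = \sqrt{\operatorname{Var}(X)} = \sigma_X,

\theta(X, Y)
  = \arccos \frac{\langle X, Y \rangle}{\lVert X \rVert \, \lVert Y \rVert}
  = \arccos \frac{\operatorname{Cov}(X, Y)}{\sigma_X \, \sigma_Y}
  = \arccos \rho(X, Y),
\qquad \sigma_X, \sigma_Y > 0 .
```

The condition that both standard deviations be positive is exactly the requirement that neither variable is essentially constant, i.e. that neither is zero in the quotient.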
Or, yet another obstacle to WBE.
OK, apparently ephaptic coupling is old news, but this seems to be the first example where we can say just what it does and that it is doing something useful.
I assume it's the same as before, but I don't recall where that is and it doesn't seem to be listed on the about page anymore. One of my comments will not mark as read on my userpage even though it will everywhere else...
Recently came across this blog post on Language Log summarizing this recent paper by Laran et al. Super-short version: When people are aware that a slogan is trying to persuade them, reverse-priming effects in which they avoid doing as it suggests can be seen. However, if their attention is drawn away from the fact that it is trying to persuade them, the usual priming effects are seen.
Edit: I think the P2c I wrote originally may have been a bit too weak; fixed that. Never mind; on rechecking, that wasn't needed.
More edits (now consolidated): Edited nontriviality note. Edited totality note. Added in the definition of numerical probability in terms of qualitative probability (though not the proof that it works). Also slight clarifications on implications of P6' and P6''' on partitions into equivalent and almost-equivalent parts, respectively.
One very late edit, June 2: Even though we don't get countable additivity, we still want a σ-algebra rather than just an algebra (this is needed for some of the proofs in the "partition conditions" section that I don't go into here). Also noted nonemptiness of gambles.
The idea that rational agents act in a manner isomorphic to expected-utility maximizers is often used here, typically justified with the Von Neumann-Morgenstern theorem. (The last of Von Neumann and Morgenstern's axioms, the independence axiom, can be grounded in a Dutch book argument.) But the Von Neumann-Morgenstern theorem assumes that the agent already measures its beliefs with (finitely additive) probabilities. This in turn is often justified with Cox's theorem (valid so long as we assume a "large world", which is implied by e.g. the existence of a fair coin). But Cox's theorem assumes as an axiom that the plausibility of a statement is taken to be a real number, a very large assumption! I have also seen this justified here with Dutch book arguments, but these all seem to assume that we are already using some notion of expected utility maximization (which is not only somewhat circular, but also a considerably stronger assumption than that plausibilities are measured with real numbers).
There is a way of grounding both (finitely additive) probability and utility simultaneously, however, as detailed by Leonard Savage in his Foundations of Statistics (1954). In this article I will state the axioms and definitions he gives, give a summary of their logical structure, and suggest a slight modification (which is equivalent mathematically but slightly more philosophically satisfying). I would also like to ask the question: To what extent can these axioms be grounded in Dutch book arguments or other more basic principles? I warn the reader that I have not worked through all the proofs myself and I suggest simply finding a copy of the book if you want more detail.
Peter Fishburn later showed in Utility Theory for Decision Making (1970) that the axioms set forth here actually imply that utility is bounded.
(Note: The versions of the axioms and definitions in the end papers are formulated slightly differently from the ones in the text of the book, and in the 1954 version have an error. I'll be using the ones from the text, though in some cases I'll reformulate them slightly.)
EDIT: Argh, I really failed to read this closely. Rewriting...
Just saw this over at Not Exactly Rocket Science. Chessboxing (or similar games) could help train automatic emotion regulation. Obviously this should generalize. Has this - by which I mean finding things that can help train automatic emotion regulation - been done before? This doesn't seem to be anything new - and this is extrapolation, not experimental results - but it's a neat application.
EDIT: This is now on the Wiki as "Quick reference guide to the infinite". Do what you want with it.
It seems that whenever anything involving infinities or measuring infinite sets comes up, it generates a lot of confusion. So I thought I would write a quick guide to both, in order to:
- Address common confusions
- Act as a useful reference (perhaps this should be a wiki article? This would benefit from others being able to edit it; there's no "community wiki mode" on LW, huh?)
- Remind people that sometimes inventing a new sort of answer is necessary!
I am trying to keep this concise, in some cases substituting Wikipedia links for explanation, but I do want what I have written to be understandable enough and informative enough to answer the commonly occurring questions. Please let me know if you can detect a particular problem. I wrote this very quickly and expect it still needs quite a bit more work to be understandable to someone with very little math background.
I realize many people here are finitists of one stripe or another but this comes up often enough that this seems useful anyway. Apologies to any constructivists, but I am going to assume classical logic, because it's all I know, though I am pointing out explicitly any uses of choice. (For what this means and why anyone cares about this, see this comment.) Also as I intend this as a reference (is there *any* way we can make this editable?) some of this may be things that I do not actually know but merely have read.
Note that these are two separate topics, though they have a bit of overlap.
Primarily, though, my main intention is to put an end to the following, which I have seen here far too often:
Myth #0: All infinities are infinite cardinals, and cardinality is the main method used to measure size of sets.
The fact is that "infinite" is a general term meaning "larger (in some sense) than any natural number"; different systems of infinite numbers get used depending on what is appropriate in context. Furthermore, there are many other methods of measuring sizes of sets, which sacrifice universality for higher resolution; cardinality is a very coarse-grained measure.
[I am hoping this post is not too repetitive, does not spend too much time rehashing basics... also: What should this be tagged with?]
Systems are not always made to be understandable - especially if they were not designed in the first place, like the human brain. Thus, they can often contain variables that are hard to ground in an outside meaning (e.g. "status", "gender"...). In this case, it may often be more appropriate to simply characterize how the variable behaves, rather than worry about attempting to see what it "represents" and "define" it thus. Ultimately, the variable is grounded in the effects it has on the outside world via the rest of the system. Meanwhile it may not represent anything more than "a flag I needed to make this hack work".
I will refer to this as characterizing the object in question rather than defining it. Rather than say what something "is", we simply specify how it behaves. Strictly speaking, characterization is of course a form of definition - indeed, strictly speaking, nearly all definitions are of this form - but I expect you will forgive me if for now I allow a fuzzy notion of a "characterization vs. definition" scale.