
Comment author: orthonormal 30 May 2010 06:08:19PM 2 points [-]

The latter. Until I see a real-world case where CRM has been very effective compared to other methods, I'm not going to give much credit to claims that it will achieve greatness in this, that and the other field.

And in particular, I find it extremely unlikely that current major theories of linguistics could in practice be coded into compressors, in a way that satisfies their proponents.

Comment author: marks 06 June 2010 04:28:37AM 1 point [-]

This isn't precisely what Daniel_Burfoot was talking about, but it's a related idea, based on "sparse coding", which has recently obtained good results in classification:

http://www.di.ens.fr/~fbach/icml2010a.pdf

Here the "theories" are hierarchical dictionaries (so a discrete hierarchy index set plus a set of vectors) which perform a compression (by creating reconstructions of the data). Although they weren't developed with this in mind, support vector machines also do this as well, since one finds a small number of "support vectors" that essentially allow you to compress the information about decision boundaries in classification problems (support vector machines are one of the very few things from machine learning that have had significant and successful impacts elsewhere since neural networks).

The hierarchical dictionaries learned do contain a "theory" of the visual world in a sense, although an important idea is that they do so in a way that is sensitive to the application at hand. Daniel_Burfoot leaves out much about how people actually go about implementing this line of thought.
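For concreteness, here is a minimal sketch (in Python, using scikit-learn's dictionary-learning tools) of plain, flat sparse coding: a learned dictionary compresses each image into a handful of coefficients from which it can be approximately reconstructed. This is only the basic idea the linked paper builds on, not its hierarchical method.

```python
# Minimal sketch of flat (non-hierarchical) sparse coding as compression:
# learn a dictionary, encode each image as a few nonzero coefficients,
# and reconstruct it from those coefficients.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import MiniBatchDictionaryLearning

X = load_digits().data                       # 1797 images, 64 pixels each

learner = MiniBatchDictionaryLearning(n_components=32, alpha=1.0, random_state=0)
codes = learner.fit_transform(X)             # sparse codes, shape (1797, 32)
X_hat = codes @ learner.components_          # reconstruction from the dictionary

print("avg nonzero coefficients per image:", float(np.mean(codes != 0) * 32))
print("relative reconstruction error:",
      float(np.linalg.norm(X - X_hat) / np.linalg.norm(X)))
```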

Comment author: RichardKennaway 01 June 2010 10:13:04PM 1 point [-]

Any model useful for AI does compression, period.

Any model does compression, period. What is the particular relevance to AI? And if that was your downvote on my other comment, how does thinking of AI in terms of compression help to develop AI?

Comment author: marks 06 June 2010 04:13:29AM 1 point [-]

[A text with some decent discussion on the topic](http://www.inference.phy.cam.ac.uk/mackay/itila/book.html). At least one group with a shot at winning a major speech recognition benchmark competition uses information-theoretic ideas in the development of its recognizer. Another development has been the use of error-correcting codes in multi-class classification problems ([google "error correcting codes machine learning"](http://www.google.com/search?sourceid=chrome&ie=UTF-8&q=error+correcting+codes+machine+learning)); arguably this has been the clearest example of a paradigm shift, coming from thinking about compression, that has had a big impact in machine learning. I don't know how many people think about these problems in terms of information theory (since I don't have much access to their thoughts), but I do know at least two very competent researchers who, although they never bring it outright into their papers, have an information-theory and compression-oriented way of posing and thinking about problems.
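As a hedged illustration of the error-correcting-code idea (not any particular paper's method): scikit-learn ships an OutputCodeClassifier that assigns each class a binary codeword, trains one binary classifier per code bit, and decodes a prediction by finding the class with the nearest codeword.

```python
# Minimal sketch of error-correcting output codes (ECOC) for multi-class
# classification: each class gets a binary codeword, one binary classifier
# is trained per bit, and a test point is assigned the class whose codeword
# is closest to the predicted bit string.
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OutputCodeClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

ecoc = OutputCodeClassifier(
    estimator=LogisticRegression(max_iter=1000),
    code_size=2.0,      # codeword length is roughly code_size * n_classes bits
    random_state=0,
)
ecoc.fit(X_train, y_train)
print("ECOC accuracy:", ecoc.score(X_test, y_test))
```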

I often try to think about how humans process speech in terms of information theory (inspired by a couple of great thinkers in the area), and I think it is useful for understanding and probing questions of sensory perception.

There's also a whole literature on "sparse coding" (another compression-oriented idea, originally developed by biologists but since ported over by computer vision and a few speech researchers) whose promise in machine learning may not have been realized yet, but I have seen at least a couple of somewhat impressive applications of related techniques appearing.

Comment author: PhilGoetz 02 June 2010 07:29:07PM *  4 points [-]

I'm afraid it was my downvote. One example of using compression is Hinton's deep autoencoder networks, which work (although he doesn't say this, the math does) by fine-tuning each layer so as to minimize the entropy of the node activations when presented with the items to be learned. In other words: instead of trying to figure out what features to detect, develop features that compress the original information well. Magically, these features turn out to be very good for categorization.

AI was seldom thought of as compression until about 1986. Also, AI wasn't very good until 1986. Pre-1986, learning was ignored, and logic was king. All the pure-logic approaches suffer from combinatorial explosion, because they don't use entropy to enumerate possibilities in order of usefulness. The hard problems of compression were hidden by supplying AI programs with knowledge already compressed into symbols in the appropriate way; but they still didn't work, unless the number of possible actions/inferences was also restricted artificially.

There are people, like Rodney Brooks, who say logic isn't necessary at all. I wouldn't go that far. So, I overstated: There is work to be done in AI that isn't about compression, except in a very abstract way. Lots of work has been done without thinking of it as being compression. But I would say that the hard stuff that gives us problems (categorization, similarity; recognizing, recalling, and managing state-space trajectories) is closely tied to questions of compression.

Comment author: marks 06 June 2010 03:41:47AM 0 points [-]

I have a minor disagreement, which I think supports your general point. There is definitely a type of compression going on in the algorithm; it's just that the key insight in the compression is not to simply "minimize entropy" but rather to make the outputs of the encoder behave similarly to the observed data. Indeed, one of the major insights of information theory is that one wants the encoding scheme to capture the properties of the distribution over the messages (and hence over alphabets).

Namely, in Hinton's algorithm the outputs of the encoder are fed through a logistic function and then the cross-entropy is minimized (essentially the KL divergence). It seems that he's providing something like a reparameterization of a probability mass function for pixel intensities, which is a logistic distribution when conditioned on the "deeper" nodes. Minimizing that KL divergence means the distribution is made as statistically indistinguishable as possible from the distribution over the data intensities (since minimizing the KL divergence minimizes the expected log likelihood ratio, which means minimizing the power of the uniformly most powerful test).

Minimizing entropy blindly would mean the network's nodes gave constant output, which is very compressive but utterly useless.
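A minimal sketch of this point, assuming nothing more than a single sigmoid hidden layer trained by batch gradient descent in plain numpy (a toy stand-in, not Hinton's deep architecture): the training signal is the cross-entropy between the data and its logistic reconstructions, and driving that down is what yields compressed codes; a network that merely made its outputs low-entropy (constant) would do terribly on this objective.

```python
# Toy autoencoder: compress 64-dimensional "pixel intensities" into 16 hidden
# units by minimizing the cross-entropy between data and logistic reconstructions.
import numpy as np

rng = np.random.RandomState(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# toy data with low-dimensional structure, values in (0, 1)
Z = rng.randn(200, 8)
X = sigmoid(Z @ rng.randn(8, 64))
n, d, k = X.shape[0], X.shape[1], 16

W1 = 0.1 * rng.randn(d, k); b1 = np.zeros(k)
W2 = 0.1 * rng.randn(k, d); b2 = np.zeros(d)

for step in range(3000):
    H = sigmoid(X @ W1 + b1)                 # compressed code
    X_hat = sigmoid(H @ W2 + b2)             # logistic reconstruction
    # gradient of cross-entropy w.r.t. the output pre-activation is X_hat - X
    d_out = (X_hat - X) / n
    d_hid = (d_out @ W2.T) * H * (1.0 - H)
    W2 -= H.T @ d_out;  b2 -= d_out.sum(axis=0)
    W1 -= X.T @ d_hid;  b1 -= d_hid.sum(axis=0)

X_hat = sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2)
loss = -np.mean(X * np.log(X_hat) + (1 - X) * np.log(1 - X_hat))
print("per-pixel cross-entropy of the reconstruction:", round(float(loss), 4))
```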

Comment author: PhilGoetz 02 June 2010 06:03:28PM 0 points [-]

Some argue this on the basis of computational constraints: there's no way that you could ever reasonably compute a moral objective function (because the consequences of any activity are much too complicated)

This attacks a straw-man utilitarianism, in which you need to compute precise results and get the one correct answer. Functions can be approximated; this objection isn't even a problem.

Other critiques argue that the utilitarian notion of "utility" is ill-defined and incoherent (hence the moral objective function has no meaning).

A utility function is more well-defined than any other approach to ethics. How do a deontologist's rules fare any better? A utility function /provides/ meaning. A set of rules is just an incomplete utility function, where someone has picked out a set of values, but hasn't bothered to prioritize them.

Comment author: marks 02 June 2010 06:28:29PM *  1 point [-]

This attacks a straw-man utilitarianism, in which you need to compute precise results and get the one correct answer. Functions can be approximated; this objection isn't even a problem.

Not every function can be approximated efficiently, though. I see the scope of morality as addressing human activity, and human activity is itself a function space. In this case the "moral gradient" that the consequentialist is computing is based on a functional defined over a function space. There are plenty of function spaces and functionals which are very hard to approximate efficiently (the Bayes predictors for speech recognition and machine vision fall into this category), and naive approaches will often fail miserably.

I think the critique of utility functions is not that they don't provide meaning, but that they don't necessarily capture the meaning which we would like. The incoherence argument is that there is no utility function which can represent the thing we want to represent. I don't buy this argument mostly because I've never seen a clear presentation of what it is that we would preferably represent, but many people do (and a lot of these people study decision-making and behavior whereas I study speech signals). I think it is fair to point out that there is only a very limited biological theory of "utility" and generally we estimate "utility" phenomenologically by studying what decisions people make (we build a model of utility and try to refine it so that it fits the data). There is a potential that no utility model is actually going to be a good predictor (i.e. that there is some systematic bias). So, I put a lot of weight on the opinions of decision experts in this regard: some think utility is coherent and some don't.

The deontologist's rules seem to do pretty well as many of them are currently sitting in law books right now. They form the basis for much of the morality that parents teach their children. Most utilitarians follow most of them all the time, anyway.

My personal view is to do what I think most people do: accept many hard constraints on one's behavior and attempt to optimize over estimates of projections of a moral gradient along a few dimensions of decision-space. That is, I try to think about how my research may be able to benefit people, I try to help out my family and friends, and I try to support things that are good for animals and the environment. These are areas where I feel more certain that I have some sense of where some sort of moral objective function points.

Comment author: PhilGoetz 01 June 2010 06:45:47PM *  1 point [-]

If one is a deontologist then one needs to solve a series of constraint-satisfaction problems with hard constraints (i.e. they cannot be violated).

If all you require is to not violate any constraints, and you have no preference between worlds where equal numbers of constraints are violated, and you can regularly achieve worlds in which no constraints are violated, then perhaps constraint-satisfaction is qualitatively different.

In the real world, linear programming typically involves a combination of hard constraints and penalized constraints. If I say the hard-constraint solver isn't utilitarian, then what term would I use to describe the mixed-case problem?
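For concreteness, here is a small sketch of that mixed case (toy, made-up numbers) using scipy's linear-programming solver: the soft constraint gets a slack variable and a per-unit penalty, so the solver may trade it off against the objective, while the hard constraint can never be violated.

```python
# LP mixing a hard constraint with a penalized ("soft") constraint: the soft
# constraint may be violated when the gain outweighs the penalty; the hard
# constraint may not.
import numpy as np
from scipy.optimize import linprog

for penalty in (0.5, 3.0):
    # variables: x1, x2, s (all >= 0 by linprog's default bounds)
    c = np.array([-1.0, -2.0, penalty])       # maximize x1 + 2*x2, pay penalty*s
    A_ub = np.array([
        [1.0, 1.0, 0.0],                       # hard: x1 + x2 <= 10
        [0.0, 1.0, -1.0],                      # soft: x2 - s  <= 4
    ])
    b_ub = np.array([10.0, 4.0])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub)
    x1, x2, s = res.x
    print(f"penalty={penalty}: x1={x1:.1f}, x2={x2:.1f}, soft violation={s:.1f}")
```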

The critical thing to me is that both are formalizing the problem and trying to find the best solution they can. The objections commonly made to utilitarianism would apply equally to moral absolutism phrased as a hard constraint problem.

There's the additional, complicating problem that non-utilitarian approaches may simply not be intelligible. A moral absolutist needs a language in which to specify the morals; the language is so context-dependent that the morals can't be absolute. Non-utilitarian approaches break down when the agents are not restricted to a single species; they break down more when "agent" means something like "set".

Comment author: marks 02 June 2010 02:39:08AM 0 points [-]

I would like you to elaborate on the incoherence of deontology so I can test out how my optimization perspective on morality can handle the objections.

Comment author: marks 02 June 2010 02:37:46AM 2 points [-]

To be clear, I see the deontologist's optimization problem as a pure "feasibility" problem: one has hard constraints and zero gradient (or approximately zero gradient) on the moral objective function over all the decisions one can make.

Of the many, many critiques of utilitarianism, some argue that it's not sensible to actually talk about a "gradient" or marginal improvement in moral objective functions. Some argue this on the basis of computational constraints: there's no way that you could ever reasonably compute a moral objective function (because the consequences of any activity are much too complicated). Other critiques argue that the utilitarian notion of "utility" is ill-defined and incoherent (hence the moral objective function has no meaning). These sorts of arguments undermine the possibility of soft constraints and of moral objective functions with gradients.

The deontological optimization problem, on the other hand, is not susceptible to such critiques because the objective function is constant, and the satisfaction of constraints is a binary event.

I would also argue that the most hard-core utilitarian, in practice, acts pretty similarly to a deontologist. The reason is that we only consider a tiny subspace of all possible decisions, our estimate of the moral gradient will be highly inaccurate along most possible decision axes (I buy the computational-constraint critique), and it's not clear that we have enough information about human experience to compute those gradients. So, practically speaking, we only consider a small number of different ways to live our lives (hence we optimize over a limited range of axes), and the directions we optimize over are, for the most part, not random. Think about how most activists, and most individuals who perform any sort of advocacy, focus on a single issue.

Also consider the fact that most people don't murder or commit certain horrendous crimes. These single-issue, law-abiding types may not think of themselves as deontologists, but a deontologist would behave very similarly to them, since neither attempts to estimate moral gradients over decisions and both treat many moral rules as binary events.

The utilitarian and the deontologist are distinguished in practice in that the utilitarian computes a noisy estimate of the moral gradient along a few axes of their potential decision-space, while everywhere else they think in terms of hard constraints and no gradient on the moral objective. The pure utilitarian is at best a theoretical concept that could not be realized in practice.

Comment author: PhilGoetz 01 June 2010 05:20:18PM *  1 point [-]

If you're optimizing, you're a form of utilitarian. Even if all you're optimizing is "minimize the number of times Kant's principles X, Y, and Z are violated".

This makes the utilitarian/non-utilitarian distinction useless, which I think it is. Everybody is either a utilitarian of some sort, a nihilist, or a conservative, mystic, or gambler saying "Do it the way we've always done it / Leave it up to God / Roll the dice". It's important to recognize this, so that we can get on with talking about "utility functions" without someone protesting that utilitarianism is fundamentally flawed.

The distinction I was drawing could be phrased as between explicit utilitarianism (trying to compute the utility function) and implicit utilitarianism (constructing mechanisms that you expect will maximize a utility function that is implicit in the action of a system but not easily extracted from it and formalized).

Comment author: marks 01 June 2010 05:43:52PM 0 points [-]

I would argue that deriving principles using the categorical imperative is a very difficult optimization problem and that there is a very meaningful sense in which one is a deontologist and not a utilitarian. If one is a deontologist then one needs to solve a series of constraint-satisfaction problems with hard constraints (i.e. they cannot be violated). In the Kantian approach, given a situation, one has to derive, via moral thinking, the constraints under which one must act in that situation, and then one must act in accordance with those constraints.

This is very closely related to combinatorial optimization problems. I would argue that often there is a "moral dual" (in the sense of a dual program) where those constraints are no longer treated as absolute, different costs are assigned to each violation, and you can then find a most moral strategy. I think very often we have something akin to strong duality, where the utilitarian dual is equivalent to the deontological problem, but it's an important distinction to remember that the deontologist has hard constraints and zero gradient on their objective functions (by some interpretations).
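A toy sketch of that duality intuition (a one-dimensional example of my own construction, not anything from the moral-philosophy literature): replace the hard rule with a penalty on violating it, and the penalized optimum approaches the best rule-respecting point as the penalty weight grows.

```python
# Hard rule vs. penalized ("dual") version: as the penalty weight grows, the
# soft-constrained optimum approaches the utility-maximizing feasible point.
import numpy as np
from scipy.optimize import minimize

def utility(x):
    return -(x - 3.0) ** 2            # unconstrained optimum at x = 3

def violation(x):
    return max(0.0, x - 1.0)          # the "rule" is x <= 1

for weight in (0.0, 1.0, 10.0, 1000.0):
    objective = lambda x, w=weight: -utility(x[0]) + w * violation(x[0]) ** 2
    res = minimize(objective, x0=np.array([0.0]))
    print(f"penalty weight {weight:7.1f}: optimum x = {res.x[0]:.3f}")
# The hard-constraint ("deontological") problem only asks for feasibility
# (x <= 1); the utility-maximizing feasible point is x = 1, which the
# penalized solutions approach as the weight grows.
```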

The utilitarian performs a search over a continuous space for the greatest expected utility, while the deontologist (in an extreme case) has a discrete set of choices, from which the immoral ones are successively weeded out.

Both are optimization procedures, and they can be shown to produce very similar output behavior, but the approach and philosophy are very different. The predictions of the behavior of the deontologist and the utilitarian can become quite different under the sorts of situations that moral philosophers love to come up with.

Comment author: PhilGoetz 01 June 2010 04:45:03PM *  3 points [-]

This post, and the prior comment it refers to, have something to say; but they're attacking a straw-man version of utilitarianism.

Utilitarianism doesn't have to mean that you take everybody's statements about what they prefer about everything in the world and add them together linearly. Any combining function is possible. Utilitarianism just means that you have a function, and you want to optimize it.

Maximizing something globally -- say, happiness -- can be a dead end. It can hit a local maximum -- the maximum for those people who value happiness -- but do nothing for the people whose highest value is loyalty to their family, or truth-seeking, or practicing religion, or freedom, or martial valor.

Then you make a new utility function that takes those values into account.

You might not be able to find a very good optimum using utilitarianism. But, by definition, not being a utilitarian means not looking for an optimum, which means you won't find any optimum at all unless by chance.

If you still want to argue against utilitarianism, you need to come up with some other plausible way of optimizing. For instance, you could make a free-market or evolutionary argument, that it's better to provide a free market with free agents (or an evolutionary ecosystem) than to construct a utility function, because the agents can optimize collective utility better than a bureaucracy can, without ever needing to know what the overall utility function is.

Comment author: marks 01 June 2010 05:05:36PM 1 point [-]

I agree with the beginning of your comment. I would add that the authors may believe they are attacking utilitarianism, when in fact they are commenting on the proper methods for implementing utilitarianism.

I disagree that attacking utilitarianism involves arguing for a different optimization theory. If a utilitarian believed that the free market was more efficient at producing utility, then the utilitarian would support it; it doesn't matter by what means the free market, say, achieved that greater utility.

Rather, attacking utilitarianism involves arguing that we should optimize for something else: for instance, something like the categorical imperative. A famous example of this is Kant's argument that one should never lie (since lying could never be willed to be a universal law, according to him), and the utilitarian philosopher loves to retort that lying is essential if one is hiding a Jewish family from the Nazis. But Kant would be unmoved (if you believe his writings); all that would matter are these universal principles.

Comment author: Sly 31 May 2010 09:33:06PM 3 points [-]

What do you mean by "his condition is not voluntary"? Because he recently made the decision to walk everywhere, yet still remains obese, his condition is not voluntary?

I am not sure that follows.

Comment author: marks 01 June 2010 02:52:57PM 3 points [-]

Bear in mind that having more fat means that the brain gets starved of [glucose](http://www.loni.ucla.edu/~thompson/ObesityBrain2009.pdf), and blood sugar levels have [impacts on the brain generally](http://ajpregu.physiology.org/cgi/content/abstract/276/5/R1223). Some research has indicated that the amount of sugar available to the brain has a relationship with self-control. A moderately obese person may have fat cells that steal so much glucose from their brain that it is incapable of mustering the will to get them to stop eating poorly. Additionally, the marginal fat person is likely fat because of increased sugar consumption (sugar being the main sort of food whose intake has increased since the origins of the obesity epidemic in the 1970s), in particular the great increase in consumption of fructose, which can raise insulin levels (which signal the body to start storing energy as fat) while not activating leptin (which makes you feel full). Thus, people are consuming a substance that may be kicking their bodies into full gear to produce more fat, which leaves them with no energy or will to exercise.

The individuals most affected by the obesity epidemic are the poor, and some of the cheapest sources of calories on the market are foods like fructose and processed meats. While there is a component of volition regardless, if the body works as the evidence suggests, they may have a diet that is pushing them quite hard towards being obese, sedentary, and unable to do anything about it.

Think about it this way: if you constantly whack me over the head, you can probably get me to do all sorts of things that I wouldn't normally do, but it wouldn't be right to call my behavior in that situation "voluntary". Fat people may be in a similar situation.

Comment author: marks 01 June 2010 04:11:36AM 3 points [-]

I think that this post has something to say about political philosophy. The problem as I see it is that we want to understand how our local decision-making affects the global picture and what constraints we should put on our local decisions. This is extremely important because, arguably, people make a lot of local decisions that make us globally worse off, such as pollution ("externalities" in econo-speak). I don't buy the author's belief that we should ignore these global constraints: they are clearly important. Indeed, it's the fear of the potential global outcomes of careless local decision-making that arguably led to the creation of this website.

However, just like computers, we have a lot of trouble integrating global constraints into our decision-making (which is necessarily a local operation), and we probably have a great deal of bias in our estimates of the morally best set of choices for us to make. Just like the algorithm, we would like to find some way to lessen the computational burden on us in order to achieve these moral ends.

There is an approach in economics to understanding social norms, advocated by Herbert Gintis [PDF], that is able to analyze these sorts of scenarios. The essential idea is this: agents can engage in any of multiple correlated equilibria (a generalization of Nash equilibria) made possible by various social norms. These correlated equilibria are, in a sense, patched together by a social norm from the decisions of "rational" (self-interested, locally expected-utility-maximizing) agents. Human rights could definitely be understood in this light (I think; I haven't actually worked out the model).
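Here is a minimal sketch of a correlated equilibrium acting as such a norm, using Aumann's classic Chicken/traffic-light example rather than anything from Gintis's paper: a mediator (the norm) draws a joint recommendation, and we check that neither self-interested agent gains by deviating from it.

```python
# Correlated equilibrium in the game of Chicken: the "norm" never recommends
# that both players Dare, and neither player benefits from disobeying it.
import numpy as np

# payoff_row[i, j]: row player's payoff when row plays i, col plays j
# actions: 0 = Dare, 1 = Chicken
payoff_row = np.array([[0.0, 7.0],
                       [2.0, 6.0]])
payoff_col = payoff_row.T                 # symmetric game, same indexing

# the norm: never both Dare, uniform over the remaining outcomes
norm = np.array([[0.0, 1/3],
                 [1/3, 1/3]])

def obeys(norm, payoff, player):
    """Check the correlated-equilibrium incentive constraints for one player."""
    for rec in (0, 1):                                    # recommended action
        cond = norm[rec, :] if player == 0 else norm[:, rec]
        if cond.sum() == 0:
            continue
        cond = cond / cond.sum()                          # other's action given rec
        u = payoff[rec, :] if player == 0 else payoff[:, rec]
        for dev in (0, 1):                                # possible deviation
            u_dev = payoff[dev, :] if player == 0 else payoff[:, dev]
            if cond @ u_dev > cond @ u + 1e-9:
                return False
    return True

print("row player obeys the norm:", obeys(norm, payoff_row, 0))
print("col player obeys the norm:", obeys(norm, payoff_col, 1))
print("expected payoff per player:", float(np.sum(norm * payoff_row)))
```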

Similar reasoning may also be used to understand certain types of laws and government policies. It is via these institutions (norms, human organizations, etc.) that we may efficiently impose global constraints on people's local decision-making. The karma system on Less Wrong, for instance, probably changes the way people decide whether to comment.

There is probably a computer science / economics crossover paper here that would describe how institutions can lower the computational burden on individuals in their decision-making, so that when individuals make decisions in these simpler domains we can be sure that we will still be globally better off.

One word of caution: this is precisely the rationale behind "command economies", and those didn't work out so well during the 20th century. So choosing the "patching together" institution well is absolutely essential.
