Uncategories and empty categories

16 PhilGoetz 16 February 2015 01:18AM

Savory

What does "savory" mean when talking about food? Merriam-Webster says:

  • having a pleasant taste or smell
  • having a spicy or salty quality without being sweet
  • pleasing to the sense of taste especially by reason of effective seasoning
  • pungently flavorful without sweetness

Macmillan says:

  • a small piece of food that tastes of salt or spices and is not sweet

But when found in the wild, "savory" is usually contrasted with sweet, and is either freed from the "salt or spices" requirement, or used in a context that already implies "salty, spicy, or sweet." As this debate on Chowhound shows, plenty of cooks think "savory" means "not sweet." It is then not a category but an uncategory, defined by what it is not.


How to pick your categories

59 [deleted] 11 November 2010 03:13PM

Note: this is intended to be a friendly math post, so apologies to anyone for whom this is all old hat.  I'm deliberately staying elementary for the benefit of people who are new to the ideas.  There are no proofs: this is long enough as it is.

Related: Where to Draw the Boundary, The Cluster Structure of Thingspace, Disguised Queries.

Here's a rather deep problem in philosophy: how do we come up with categories?  What's the difference between a horror movie and a science fiction movie?  Or the difference between a bird and a mammal? Are there such things as "natural kinds," or are all such ideas arbitrary?  

We can frame this in a slightly more mathematical way as follows.  Objects in real life (animals, moving pictures, etc.) are enormously complicated and have many features and properties.  You can think of this as a very high-dimensional space, with one dimension for each property; each object takes a value along each dimension.  A grayscale picture, for example, has a color value for each pixel.  A text document has a count for every word (the word "flamingo" might have been used 7 times, for instance).  A multiple-choice questionnaire has an answer for each question.  Each object is a point in a high-dimensional featurespace.  To identify which objects are similar to each other, we want to identify how close their points are in featurespace.  For example, two pictures that differ at only one pixel should turn out to be similar.
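To make the featurespace picture concrete, here is a minimal sketch in Python.  The pixel values and the `euclidean_distance` helper are illustrative inventions, not anything from the post; the point is just that "similar objects" become "nearby points":

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two points in featurespace."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Three tiny 4-"pixel" grayscale pictures, each a point in a
# 4-dimensional featurespace (one dimension per pixel):
pic1 = [0.2, 0.5, 0.9, 0.1]
pic2 = [0.2, 0.5, 0.8, 0.1]  # differs from pic1 at only one pixel
pic3 = [0.9, 0.1, 0.0, 0.7]  # differs from pic1 everywhere

print(euclidean_distance(pic1, pic2))  # small: the pictures are similar
print(euclidean_distance(pic1, pic3))  # large: the pictures are dissimilar
```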

We could then start to form categories if the objects form empirical clusters in featurespace.  If some animals have wings and hollow bones and feathers, and some animals have none of those things but give milk and bear live young, it makes sense to distinguish birds from mammals.  If empirical clusters actually exist, then there's nothing arbitrary about the choice of categories -- the categories are appropriate to the data!

There are a number of mathematical techniques for assigning categories; all of them are basically attacking the same problem, and in principle should all agree with each other and identify the "right" categories.  But in practice they have different strengths and weaknesses: computational efficiency, robustness to noise, and ability to classify accurately.  This field is incredibly useful -- this is how computers do image and speech recognition, this is how natural language processing works, this is how they sequence your DNA.  It will also, I hope, yield insights into how people think and perceive.

Clustering techniques

These techniques attempt to directly find clusters in observations.  A common example is the K-means algorithm.  The goal here is, given a set of observations x1...xn, to partition them into k sets so as to minimize the within-cluster sum of squared differences:
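The assignment/update loop of K-means can be sketched in a few lines of Python.  This is a simplified illustration of the idea, not a production implementation: initializing the centers with the first k points and running a fixed number of iterations are simplifications (real implementations use smarter initialization and convergence tests):

```python
def k_means(points, k, iters=20):
    """Minimal K-means sketch: partition points into k clusters so as to
    (locally) minimize the within-cluster sum of squared differences."""
    centers = list(points[:k])  # naive initialization: first k points
    for _ in range(iters):
        # Assignment step: each point joins the cluster of its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centers[i])),
            )
            clusters[nearest].append(p)
        # Update step: each center moves to the mean of its cluster.
        for i, cluster in enumerate(clusters):
            if cluster:
                centers[i] = tuple(sum(dim) / len(cluster) for dim in zip(*cluster))
    return centers, clusters

# Two obvious empirical clusters in a 2-D featurespace:
data = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
        (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
centers, clusters = k_means(data, k=2)
print(clusters)  # the two empirical clusters are recovered
```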


Selective processes bring tag-alongs (but not always!)

30 AnnaSalamon 11 March 2009 08:17AM

by Anna Salamon and Steve Rayhawk (joint authorship)

Related to: Conjuring An Evolution To Serve You, Disguised Queries 

Let’s say you have a bucket full of “instances” (e.g., genes, hypotheses, students, foods), and you want to choose a good one.  You fish around in the bucket, draw out the first 10 instances you find, and pick the instance that scores highest on some selection criterion.

For example, perhaps your selection criterion is “number of polka dots”, and you reach into the bucket pictured below, and you draw out 10 instances.  What do you get?  Assuming some instances have more polka dots than others, you get instances with an above-average expected number of polka dots.  The point I want to dwell on, though -- which is obvious when you think about it, but which sheds significant light on everyday phenomena -- is that you don’t get instances that are just high in polka dots.  You get instances that are also high in every trait that correlates with having the most polka dots.

For example, in the bucket above, selecting for instances that have many polka dots implies inadvertently selecting for instances that are red.  Selective processes bring tag-alongs, and the specific tag-alongs that you get (redness, in this case) depend on both the trait you’re selecting for, and the bucket from which you’re selecting.
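The tag-along effect is easy to simulate.  The bucket below is a made-up example (the 50% red base rate and the dot distributions are invented for illustration) in which redness correlates with polka-dot count; selecting the dottiest of each handful of 10 then drags redness along:

```python
import random

rng = random.Random(42)

# Hypothetical bucket: red instances tend to have more polka dots.
def draw_instance():
    red = rng.random() < 0.5
    dots = rng.randint(5, 10) if red else rng.randint(0, 5)
    return {"red": red, "dots": dots}

bucket = [draw_instance() for _ in range(10000)]
base_rate = sum(x["red"] for x in bucket) / len(bucket)

# Selection: fish out 10 instances, keep the one with the most polka dots.
winners = []
for _ in range(1000):
    handful = rng.sample(bucket, 10)
    winners.append(max(handful, key=lambda x: x["dots"]))
selected_rate = sum(x["red"] for x in winners) / len(winners)

print(base_rate)      # ~0.5: redness in the bucket as a whole
print(selected_rate)  # well above the base rate: redness "tags along"
```

Note that the selection criterion never mentions color; redness is selected for only because of its correlation with polka dots in this particular bucket.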

Nearly all cases of useful selection (e.g., evolution, science) would be unable to produce the cool properties they produce (complex order in organisms, truth in theories) if they didn’t have particular, selection-friendly types of buckets, in addition to good selection criteria.  Zoom in carefully enough, and nearly all of the traits one gets by selection can be considered tag-alongs.  Conversely, if you are consciously selecting entities from buckets with a particular aim in view, you may want to consciously safeguard the “selection-friendliness” of the buckets you are using.
