by Anna Salamon and Steve Rayhawk (joint authorship)

Related to: Conjuring An Evolution To Serve You, Disguised Queries 

Let’s say you have a bucket full of “instances” (e.g., genes, hypotheses, students, foods), and you want to choose a good one.  You fish around in the bucket, draw out the first 10 instances you find, and pick the instance that scores highest on some selection criterion.

For example, perhaps your selection criterion is “number of polka dots”, and you reach into the bucket pictured below, and you draw out 10 instances.  What do you get?  Assuming some instances have more polka dots than others, you get instances with an above-average number of polka dots.  The point I want to dwell on, though -- which is obvious when you think about it, but which sheds significant light on everyday phenomena -- is that you don’t get instances that are just high in polka dots.  You get instances that are also high in every trait that correlates with having the most polka dots.

For example, in the bucket above, selecting for instances that have many polka dots implies inadvertently selecting for instances that are red.  Selective processes bring tag-alongs, and the specific tag-alongs that you get (redness, in this case) depend on both the trait you’re selecting for, and the bucket from which you’re selecting.
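The tag-along effect can be made concrete with a quick simulation sketch (all numbers here are invented for illustration): we build a bucket in which red instances tend to carry more polka dots, then repeatedly select the dottiest of each handful of 10 and see what color tags along.

```python
import random

random.seed(0)

# Hypothetical bucket: red instances are built to have more polka dots,
# so "dots" and "red" are correlated traits.
def make_instance():
    color = random.choice(["red", "blue"])
    dots = random.randint(5, 10) if color == "red" else random.randint(0, 5)
    return {"color": color, "dots": dots}

bucket = [make_instance() for _ in range(10000)]

# Draw 10 instances, keep the one with the most polka dots; repeat many times.
winners = []
for _ in range(1000):
    sample = random.sample(bucket, 10)
    winners.append(max(sample, key=lambda inst: inst["dots"]))

frac_red_bucket = sum(i["color"] == "red" for i in bucket) / len(bucket)
frac_red_winners = sum(w["color"] == "red" for w in winners) / len(winners)

# The bucket is about half red, but the winners are almost all red:
# selecting for dots inadvertently selected for redness.
print(frac_red_bucket, frac_red_winners)
```

Nothing in the selection rule mentions color; the redness comes entirely from the correlation structure of the bucket.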

Nearly all cases of useful selection (e.g., evolution, science) would be unable to produce the cool properties they produce (complex order in organisms, truth in theories) if they didn’t have particular, selection-friendly types of buckets, in addition to good selection criteria.  Zoom in carefully enough, and nearly all of the traits one gets by selection can be considered tag-alongs.  Conversely, if you are consciously selecting entities from buckets with a particular aim in view, you may want to consciously safeguard the “selection-friendliness” of the buckets you are using.

Some examples:

Algebra test:

Let’s say you’re trying to select for students who know algebra.  So you write out an algebra exam, E, find ten students at random from your pool, and see which one does best on exam E.  For many sorts of buckets of students, this procedure should work: selecting for the criterion “does well on algebra exam E” will give you more than just that criterion.  It’ll give you the tag-along property “does well on other algebra problems” or “understands algebra”.

But not for all buckets.  If, for example, you release the test questions ahead of time, you’re liable to end up with a bucket of students for which the tag-along property that “does well on algebra exam E” gives you is only “memorized the answers to exam E”, and not “understands algebra”.

Taste and nutrition:

Or, again, perhaps you’re designing a test for healthy foods.  In a hunter-gatherer environment, human tastes are perhaps not a bad indicator.  Grab 10 foodstuffs at random from your bucket, select for “best tasting”, and you’re liable to get “above average nutrition” as a tag-along.

But for the buckets of food-choices created by modern manufacturing (now that we know chemistry, and we can create compounds on purpose that trigger just those sensory mechanisms that signal “good taste”), selecting for taste no longer selects for nutrition (at least, not as much).

Defensive mimicry:

Selecting against prey items that have warning coloration (particular patterns of bright color, as on poison arrow frogs) will select against poisonous prey items, among some buckets of possible prey.  But as other species evolve to mimic the warning coloration pattern, the bucket of prey items changes, and “poisonous” becomes less tied to the indicator trait “such-and-such a coloration pattern”.

Choosing coins that come up heads:

If you have a bucket of fair coins, and you flip 10 of them several times and choose the one that comes up heads the most, this one will be no likelier than average to come up heads in later rounds.  If you have a bucket where half the coins are two-headed and the other half are two-tailed, then selecting for even a single head is enough to guarantee heads on every future coin-toss.  And if you have a bucket of mixed fair and unfair coins, then selecting for heads helps, but more weakly.
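Here is a rough simulation of the fair versus mixed buckets (the 0.9 head-probability and the sample sizes are arbitrary choices for illustration): in the all-fair bucket, selecting the luckiest coin buys you nothing for future flips, while in the mixed bucket the same procedure reliably finds the biased coins.

```python
import random

random.seed(1)

def select_best_of_ten(bucket, flips=5):
    """Draw 10 coins, flip each `flips` times, and return the true
    head-probability of the coin that showed the most heads."""
    coins = random.sample(bucket, 10)
    scores = [(sum(random.random() < p for _ in range(flips)), p) for p in coins]
    return max(scores, key=lambda sp: sp[0])[1]

fair_bucket = [0.5] * 1000                 # every coin is fair
mixed_bucket = [0.5] * 500 + [0.9] * 500   # half fair, half heavily biased

fair_mean = sum(select_best_of_ten(fair_bucket) for _ in range(2000)) / 2000
mixed_mean = sum(select_best_of_ten(mixed_bucket) for _ in range(2000)) / 2000

print(fair_mean)   # exactly 0.5: past heads predict nothing in this bucket
print(mixed_mean)  # well above 0.5: selection latched onto the biased coins
```

Same selection rule both times; whether “past heads” tags along with “future heads” depends entirely on the bucket.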

Nassim Taleb argues that the financial success of particular traders is similar to the success of particular coins from the fair coins bucket -- selecting for “traders who were successful in the past” doesn’t give you traders who are particularly likely to be successful in the future.  Most traders who obtain above-market returns in a particular timespan are, on Taleb’s analysis, traders with ordinary judgment who put their money on a short string of lucky strategies (“buy US real estate”).  The bucket of traders doesn’t have the type of structure that allows “future success” to be predicted from “past success”.

In the “Perfect Prediction Scam”, crooks send out all possible prediction-sequences for a small number of e.g. sports outcomes, with each prediction-sequence going to a different set of people.  Some string of predictions inevitably does well, allowing the crooks to make money off the proven success of their “psychic powers” or “sophisticated computer models” -- but of course, with a bucket like this, the recipients of the correct prediction-sequence are no likelier than average to receive accurate predictions in the next iteration.
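The scam’s arithmetic can be checked directly.  In this sketch (4 “track record” games plus one future game, with the mailing scheme idealized), exactly two of the 32 mailed sequences show a perfect record after four games, and exactly one of those two happens to call the fifth game -- a 50% hit rate, no better than chance.

```python
import itertools
import random

random.seed(2)

n_games = 4
# Actual results of 5 games: 4 "track record" games plus one future game.
outcomes = [random.choice("WL") for _ in range(n_games + 1)]

# The crooks mail every possible length-5 prediction-sequence,
# each to a different recipient.
recipients = list(itertools.product("WL", repeat=n_games + 1))

# Recipients whose first 4 predictions all came true:
perfect_so_far = [r for r in recipients
                  if list(r[:n_games]) == outcomes[:n_games]]

# Of those, how many also call the 5th game correctly?  Exactly half,
# by construction -- the "proven" track record carries no predictive power.
right_next = [r for r in perfect_so_far if r[n_games] == outcomes[n_games]]

print(len(recipients), len(perfect_so_far), len(right_next))  # 32 2 1
```

The result is deterministic: whatever the game outcomes, the bucket of mailed sequences is built so that a perfect past record says nothing about the next prediction.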

Selecting alleles, within biological evolution:

Alleles are selected based on their differential reproductive impact on one generation -- whether they help a given animal flee from this particular tiger.  But it turns out that alleles that help animals flee from this tiger often also help animals flee from other, future tigers.  One can imagine a (weird, physically implausible) biology in which alleles that help one flee this particular tiger are not more likely to help one flee other tigers.  Technically speaking, the ability to “flee from other, future tigers” is a tag-along, given to animals by the trait-correlations in evolution’s buckets.

With traits as closely linked as “fleeing this tiger” and “fleeing other, future tigers”, it may seem odd to give credit to the bucket of alleles for the fact that selecting for the one trait often also gives you the other trait.  One may be tempted to explain our modern tendency to flee tigers by talking solely about “selection”, with no mention of what sort of buckets evolution was pulling traits from.  But there are plenty of recurring biological traits that one does intuitively regard as due to co-incidences in evolution’s buckets.  Whiteness of bones does not in itself much impact fitness, but it so happens that when evolution selected for bones with advantageous structural properties, it ended up also selecting for bones with a particular, fairly uniform, color.

Moreover, the traits that tag along with particular selection criteria vary from species to species.  Depending on evolution’s starting-point, selecting for particular properties will sometimes simultaneously select for particular other traits, and sometimes not.  If you take a group of migratory moths, and select for “ability to find the egg-laying site”, you will improve their ability to find a single, particular location.  If you take a group of hominids and select for “ability to find their way home”, you may instead improve their general navigational ability.  (Similarly, selective pressures for bees to communicate with other bees produced a fixed, species-universal communication system, while selective pressures on hominids to communicate with other hominids produced a system for learning any language that fits a particular, species-universal language template.  Species such as hominids that have large brains, and that can therefore have their brains tweaked in different directions by particular chance alleles, create different types of variation-buckets for evolution to select within -- perhaps buckets whose tag-alongs more often include “general” abilities.)

So... it is naively tempting to view traits as “due to selection” or “due to chance correlations in evolution’s buckets”.  One might imagine “running from tigers” as a single, unified property that of course you should get “from selection”, when you select for running from past tigers.  One might imagine bones’ whiteness as a “mere chance” property that you get “from the correlations in the bucket”, based on the accident that candidate-bones with good structural properties also happen to have a certain color.  But I suspect that a person who solidly understood evolution would see all biological traits (white bones; navigational ability in moths and hominids; tiger fleeing) as due to a sort of tagging-along process that depends on both the traits being selected for, and the buckets of alleles selected among.  The effective categories available to evolution (like “tiger-fleeing”, or “finding your way to the nest site” vs. “finding your way anywhere”) are somehow in the buckets... so how did they get in there?  And what kind of categories land in naturally occurring buckets?  More on this in later posts.

Reason vs. rationalization:

If you search out the best arguments for each position and exert equal strength on each search, then selecting for the position for which you found the strongest arguments will often also select for positions that are true, as a tag-along.

If you instead find the best arguments you can for your initial opinions, and only allow yourself to notice weak evidence for your opponents’ positions -- if you avoid your beliefs' real weak points -- then selecting for the positions for which you found the strongest arguments will no longer much select for positions that are true.

Reasoning produces more “selection-friendly” buckets of arguments than does rationalization (for a person seeking truth).

 

(The above might not seem as though it has that much to do with rationality.  But it lays groundwork for a couple other posts I want to do, that help explain where categories come from and what kind of use we do and can make of categories, and of generalization from past to future data-sets, in science.)

5 comments:
Emile:

I wish this post put more emphasis on the difference between "Wanting A, selecting for A and getting B too" and "Wanting B, but selecting for A because it's more accessible".

In both cases you get something you don't want, but for different reasons. The first illustration with the bucket of instances is the first case (you don't want the tag-along stuff), but most of the others are the second (you want the tag-along, but have no easy way to select for it, for example because you're natural selection and thus very stupid).

I do a lot of interviewing candidates for jobs, and it's essential to be aware of both those concepts. In working on our hiring process, we discuss both concepts, in words very similar to yours.

I've heard occasional complaints about certain things we do in our interviews, of the form "what does X have to do with being a good Y?!". These complaints invariably come from people who didn't get offers, and give me a warm glow at having made the correct decision.

I call these "secondary optimization effects".

Optimization squeezes a targeted state or future into regions high in the optimizer's preference function. This also squeezes all correlates of the optimizer's preference function. When we squeeze A, it affects every B with P(B|A) != P(B).
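That conditional-probability point can be illustrated with made-up numbers: generate a trait B that correlates with a targeted trait A, then compare P(B) to P(B|A).  Squeezing the population toward A drags B's frequency toward P(B|A).

```python
import random

random.seed(3)

# Hypothetical correlated traits: B is much likelier when A holds.
samples = []
for _ in range(100000):
    a = random.random() < 0.5
    b = random.random() < (0.8 if a else 0.2)
    samples.append((a, b))

# Marginal frequency of B, versus its frequency among the A-selected.
p_b = sum(b for _, b in samples) / len(samples)
n_a = sum(a for a, _ in samples)
p_b_given_a = sum(b for a, b in samples if a) / n_a

print(round(p_b, 2), round(p_b_given_a, 2))  # roughly 0.5 and 0.8
```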

[anonymous]:

It seems like the answer to "how did the categories get in the bucket" is 'they are whatever is left after the constraints of the system have removed everything else.'

For example, imagine you have a room full of different kinds of dice. A lever is pulled and any die that is not six-sided is removed from the room. All that's left are six-sided dice. You can no longer select for dice that tend to land on seven, as the system has removed them as a possible option.

For evolution, the constraints would be the laws of physics. Whatever is left after the laws of physics have removed the impossibilities is what can be selected for.

This post reminds me of Inductive Bias. The space of possible hypotheses corresponds to the bucket. Even if it's a seemingly "big" space of hypotheses, the bucket's influence on the outcome can be large.