Kaj_Sotala comments on Three Approaches to "Friendliness" - Less Wrong

14 points · Post author: Wei_Dai · 17 July 2013 07:46AM



Comment author: Kaj_Sotala · 18 July 2013 03:38:36PM · 4 points

Normative AI - Solve all of the philosophical problems ahead of time, and code the solutions into the AI.
Black-Box Metaphilosophical AI - Program the AI to use the minds of one or more human philosophers as a black box to help it solve philosophical problems, without the AI builders understanding what "doing philosophy" actually is.
White-Box Metaphilosophical AI - Understand the nature of philosophy well enough to specify "doing philosophy" as an algorithm and code it into the AI.

So after giving this issue some thought: I'm not sure to what extent a white-box metaphilosophical AI will actually be possible.

For instance, consider the Repugnant Conclusion. Derek Parfit considered some dilemmas in population ethics, put together possible solutions to them, and then noted that the solutions led to an outcome which again seemed unacceptable - but also unavoidable. Once his results became known, a number of other thinkers started considering the problem and trying to find a way around those results.

Now, why was the Repugnant Conclusion considered unacceptable? For that matter, why were the dilemmas whose solutions led to the RC considered "dilemmas" in the first place? Not because any of them would have violated any logical rules of inference. Rather, we looked at them and thought "no, my morality says that that is wrong", and then (engaging in motivated cognition) began looking for a consistent way to avoid having to accept the result. In effect, our minds contained dynamics which rejected the RC as a valid result, but that rejection came from our subconscious values, not from any classical reasoning rule that you could implement in an algorithm. Or you could conceivably implement the rule in the algorithm if you had a thorough understanding of our values, but that's not of much help if the algorithm is supposed to figure out our values.

You can generalize this problem to all kinds of philosophy. In decision theory, we already have an intuitive sense of what "winning" means, and are trying to formalize it in a way that fits our values. In epistemology, we have some standards about the kind of "truth" that we value, and are trying to come up with a system that obeys those standards. Etc.

The root problem is that classification and inference require values. As Watanabe (1974) writes:

According to the theorem of the Ugly Duckling, any pair of nonidentical objects share an equal number of predicates as any other pair of nonidentical objects, insofar as the number of predicates is finite [10], [12]. That is to say, from a logical point of view there is no such thing as a natural kind. In the case of pattern recognition, the new arrival shares the same number of predicates with any other paradigm of any class. This shows that pattern recognition is a logically indeterminate problem. The class-defining properties are generalizations of certain of the properties shared by the paradigms of the class. Which of the properties should be used for generalization is not logically defined. If it were logically determinable, then pattern recognition would have a definite answer in violation of the theorem of the Ugly Duckling.

This conclusion is somewhat disturbing because our empirical knowledge is based on natural kinds of objects. The source of the trouble lies in the fact that we were just counting the number of predicates in the foregoing, treating them as if they were all equally important. The fact is that some predicates are more important than some others. Objects are similar if they share a large number of important predicates.

Important in what scale? We have to conclude that a predicate is important if it leads to a classification that is useful for some purpose. From a logical point of view, a whale can be put together in the same box with a fish or with an elephant. However, for the purpose of building an elegant zoological theory, it is better to put it together with the elephant, and for classifying industries it is better to put it together with the fish. The property characterizing mammals is important for the purpose of theory building in biology, while the property of living in water is more important for the purpose of classification of industries.

The conclusion is that classification is a value-dependent task and pattern recognition is mechanically possible only if we smuggle into the machine the scale of importance of predicates. Alternatively, we can introduce into the machine the scale of distance or similarity between objects. This seems to be an innocuous set of auxiliary data, but in reality we are thereby telling the machine our value judgment, which is of an entirely extra-logical nature. The human mind has an innate scale of importance of predicates closely related to the sensory organs. This scale of importance seems to have been developed during the process of evolution in such a way as to help maintain and expand life [12], [14].
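The counting argument behind the Ugly Duckling theorem can be sketched concretely. Treating a predicate extensionally - as the subset of objects it is true of - every pair of distinct objects is contained in exactly the same number of predicates, so no pair is "objectively" more similar than any other. A toy illustration (the object names are made up for the example):

```python
from itertools import chain, combinations

# A toy universe of objects.
objects = ["duckling", "swan", "sparrow", "whale"]

# Treat a predicate extensionally: any subset of the universe is a
# predicate, identified with the set of objects it is true of.
def powerset(s):
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))

predicates = [set(p) for p in powerset(objects)]  # 2^4 = 16 predicates

# For each pair of distinct objects, count how many predicates hold
# of both members of the pair.
shared_counts = {
    (a, b): sum(1 for p in predicates if a in p and b in p)
    for a, b in combinations(objects, 2)
}
print(shared_counts)
# Every pair shares exactly 2^(4-2) = 4 predicates, regardless of how
# intuitively similar or dissimilar the objects seem.
```

Any preference for grouping the duckling with the swan rather than with the whale has to come from weighting some predicates as more important than others - which is exactly the extra-logical value judgment Watanabe describes.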

"Progress" in philosophy essentially means "finding out more about the kinds of things that we value, drawing such conclusions that our values say are correct and useful". I am not sure how one could make an AI make progress in philosophy if we didn't already have a clear understanding of what our values were, so "white-box metaphilosophy" seems to just reduce back to a combination of "normative AI" and "black-box metaphilosophy".

Comment author: Kaj_Sotala · 19 July 2013 01:16:43PM · 2 points

Coincidentally, I ended up reading Evolutionary Psychology: Controversies, Questions, Prospects, and Limitations today, and noticed that it makes a number of points that could be interpreted in a similar light: in that humans do not really have a "domain-general rationality", and that instead we have specialized learning and reasoning mechanisms, each of which carries out a specific evolutionary purpose and is specialized for extracting information that's valuable in light of the evolutionary pressures that (used to) prevail. In other words, each of them carries out inferences that are designed to further some specific evolutionary value that helped contribute to our inclusive fitness.

The paper doesn't spell out the obvious implication, since that isn't its topic, but it seems pretty clear to me: since our various learning and reasoning systems are based on furthering specific values, our philosophy has also been generated as a combination of such various value-laden systems, and we can't expect an AI reasoner to develop a philosophy that we'd approve of unless its reasoning mechanisms also embody the same values.

That said, it does suggest a possible avenue of attack on the metaphilosophy issue... figure out exactly what various learning mechanisms we have and which evolutionary purposes they had, and then use that data to construct learning mechanisms that carry out similar inferences as humans do.

Quotes:

Hypotheses about motivational priorities are required to explain empirically discovered phenomena, yet they are not contained within domain-general rationality theories. A mechanism of domain-general rationality, in the case of jealousy, cannot explain why it should be “rational” for men to care about cues to paternity certainty or for women to care about emotional cues to resource diversion. Even assuming that men “rationally” figured out that other men having sex with their mates would lead to paternity uncertainty, why should men care about cuckoldry to begin with? In order to explain sex differences in motivational concerns, the “rationality” mechanism must be coupled with auxiliary hypotheses that specify the origins of the sex differences in motivational priorities. [...]

The problem of combinatorial explosion. Domain-general theories of rationality imply a deliberate calculation of ends and a sample space of means to achieve those ends. Performing the computations needed to sift through that sample space requires more time than is available for solving many adaptive problems, which must be solved in real time. Consider a man coming home from work early and discovering his wife in bed with another man. This circumstance typically leads to immediate jealousy, rage, violence, and sometimes murder (Buss, 2000; Daly & Wilson, 1988). Are men pausing to rationally deliberate over whether this act jeopardizes their paternity in future offspring and ultimate reproductive fitness, and then becoming enraged as a consequence of this rational deliberation? The predictability and rapidity of men’s jealousy in response to cues of threats to paternity points to a specialized psychological circuit rather than a response caused by deliberative domain-general rational thought. Dedicated psychological adaptations, because they are activated in response to cues to their corresponding adaptive problems, operate more efficiently and effectively for many adaptive problems. A domain-general mechanism “must evaluate all alternatives it can define. Permutations being what they are, alternatives increase exponentially as the problem complexity increases” (Cosmides & Tooby, 1994, p. 94). Consequently, combinatorial explosion paralyzes a truly domain-general mechanism (Frankenhuis & Ploeger, 2007). [...]

In sum, domain-general mechanisms such as “rationality” fail to provide plausible alternative explanations for psychological phenomena discovered by evolutionary psychologists. They are invoked post hoc, fail to generate novel empirical predictions, fail to specify underlying motivational priorities, suffer from paralyzing combinatorial explosion, and imply the detection of statistical regularities that cannot be, or are unlikely to be, learned or deduced ontogenetically. It is important to note that there is no single criterion for rationality that is independent of adaptive domain. [...]

The term learning is sometimes used as an explanation for an observed effect and is the simple claim that something in the organism changes as a consequence of environmental input. Invoking “learning” in this sense, without further specification, provides no additional explanatory value for the observed phenomenon but only regresses its cause back a level. Learning requires evolved psychological adaptations, housed in the brain, that enable learning to occur: “After all, 3-pound cauliflowers do not learn, but 3-pound brains do” (Tooby & Cosmides, 2005, p. 31). The key explanatory challenge is to identify the nature of the underlying learning adaptations that enable humans to change their behavior in functional ways as a consequence of particular forms of environmental input.

Although the field of psychology lacks a complete understanding of the nature of these learning adaptations, enough evidence exists to draw a few reasonable conclusions. Consider three concrete examples: (a) People learn to avoid having sex with their close genetic relatives (learned incest avoidance); (b) people learn to avoid eating foods that may contain toxins (learned food aversions); (c) people learn from their local peer group which actions lead to increases in status and prestige (learned prestige criteria). There are compelling theoretical arguments and empirical evidence that each of these forms of learning is best explained by evolved learning adaptations that have at least some specialized design features, rather than by a single all-purpose general learning adaptation (Johnston, 1996). Stated differently, evolved learning adaptations must have at least some content-specialized attributes, even if they share some components. [...]

These three forms of learning—incest avoidance, food aversion, and prestige criteria—require at least some content-specific specializations to function properly. Each operates on the basis of inputs from different sets of cues: coresidence during development, nausea paired with food ingestion, and group attention structure. Each has different functional output: avoidance of relatives as sexual partners, disgust at the sight and smell of specific foods, and emulation of those high in prestige. It is important to note that each form of learning solves a different adaptive problem.

There are four critical conclusions to draw from this admittedly brief and incomplete analysis. First, labeling something as “learned” does not, by itself, provide a satisfactory scientific explanation any more than labeling something as “evolved” does; it is simply the claim that environmental input is one component of the causal process by which change occurs in the organism in some way. Second, “learned” and “evolved” are not competing explanations; rather, learning requires evolved psychological mechanisms, without which learning could not occur. Third, evolved learning mechanisms are likely to be more numerous than traditional conceptions have held in psychology, which typically have been limited to a few highly general learning mechanisms such as classical and operant conditioning. Operant and classical conditioning are important, of course, but they contain many specialized adaptive design features rather than being domain general (Ohman & Mineka, 2003). And fourth, evolved learning mechanisms are at least somewhat specific in nature, containing particular design features that correspond to evolved solutions to qualitatively distinct adaptive problems.

Comment author: torekp · 24 July 2013 02:11:56AM · 0 points

I always suspected that natural kinds depended on an underdetermined choice of properties, but I had no idea there was or could be a theorem saying so. Thanks for pointing this out.

Does a similar point apply to Solomonoff Induction? How does the minimum length of the program necessary to generate a proposition vary when we vary the properties our descriptive language uses?
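(A toy sketch of that language-dependence, in Python - the repetition primitive `R(s,n)` is made up for illustration. The invariance theorem guarantees the minimum description lengths under any two universal languages differ by at most an additive constant, the length of a translator between them, but that constant can be large relative to any particular proposition:)

```python
# The same data, described in two hypothetical description languages.
data = "ab" * 50  # "ababab...": 100 characters

# Language A: literal strings only, so the shortest description is
# the string itself.
len_a = len(data)

# Language B: adds a repetition primitive, written here as "R(s,n)",
# meaning "the string s repeated n times".
desc_b = "R(ab,50)"
len_b = len(desc_b)

print(len_a, len_b)  # description lengths: 100 vs 8
```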