Kaj_Sotala comments on Three Approaches to "Friendliness" - Less Wrong
So after giving this issue some thought: I'm not sure to what extent a white-box metaphilosophical AI will actually be possible.
For instance, consider the Repugnant Conclusion. Derek Parfit considered some dilemmas in population ethics, put together possible solutions to them, and then noted that the solutions led to an outcome which again seemed unacceptable - but also unavoidable. Once his results had become known, a number of other thinkers started considering the problem and trying to find a way around those results.
Now, why was the Repugnant Conclusion considered unacceptable? For that matter, why were the dilemmas whose solutions led to the RC considered "dilemmas" in the first place? Not because any of them would have violated any logical rules of inference. Rather, we looked at them and thought "no, my morality says that that is wrong", and then (engaging in motivated cognition) began looking for a consistent way to avoid having to accept the result. In effect, our minds contained dynamics which rejected the RC as a valid result, but that rejection came from our subconscious values, not from any classical reasoning rule that you could implement in an algorithm. Or you could conceivably implement the rule in the algorithm if you had a thorough understanding of our values, but that's not of much help if the algorithm is supposed to figure out our values.
You can generalize this problem to all kinds of philosophy. In decision theory, we already have an intuitive value of what "winning" means, and are trying to find a way to formalize it in a way that fits our value. In epistemology, we have some standards about the kind of "truth" that we value, and are trying to come up with a system that obeys those standards. Etc.
The root problem is that classification and inference require values. As Watanabe (1974) writes:
"Progress" in philosophy essentially means "finding out more about the kinds of things that we value, drawing such conclusions that our values say are correct and useful". I am not sure how one could make an AI make progress in philosophy if we didn't already have a clear understanding of what our values were, so "white-box metaphilosophy" seems to just reduce back to a combination of "normative AI" and "black-box metaphilosophy".
Coincidentally, I ended up reading Evolutionary Psychology: Controversies, Questions, Prospects, and Limitations today, and noticed that it makes a number of points that could be interpreted in a similar light: namely, that humans do not really have a "domain-general rationality", and that instead we have specialized learning and reasoning mechanisms, each of which carries out a specific evolutionary purpose and is specialized for extracting information that's valuable in light of the evolutionary pressures that (used to) prevail. In other words, each of them carries out inferences that are designed to further some specific evolutionary value that helped contribute to our inclusive fitness.
The paper doesn't spell out the obvious implication, since that isn't its topic, but it seems pretty clear to me: since our various learning and reasoning systems are based on furthering specific values, our philosophy has also been generated as a combination of such various value-laden systems, and we can't expect an AI reasoner to develop a philosophy that we'd approve of unless its reasoning mechanisms also embody the same values.
That said, it does suggest a possible avenue of attack on the metaphilosophy issue... figure out exactly what various learning mechanisms we have and which evolutionary purposes they had, and then use that data to construct learning mechanisms that carry out inferences similar to the ones humans make.
Quotes:
I always suspected that natural kinds depended on an underdetermined choice of properties, but I had no idea there was or could be a theorem saying so. Thanks for pointing this out.
Does a similar point apply to Solomonoff Induction? How does the minimum length of the program necessary to generate a proposition vary when we vary the properties our descriptive language uses?
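For anyone who wants to see the counting argument concretely, here is a minimal Python sketch. It assumes the theorem in question is the one usually called Watanabe's Ugly Duckling theorem: if every possible predicate over a finite set of objects counts equally, any two distinct objects share the same number of predicates. The object names, the `swans` predicate and the weight of 10 are made up for illustration and come from neither Watanabe nor the comment above.

```python
from itertools import combinations

def all_predicates(objects):
    """Every subset of the objects, each read as the extension of one predicate."""
    objs = list(objects)
    return [frozenset(c)
            for r in range(len(objs) + 1)
            for c in combinations(objs, r)]

def shared_weight(a, b, predicates, weight=lambda p: 1):
    """Total weight of the predicates that hold of both a and b."""
    return sum(weight(p) for p in predicates if a in p and b in p)

# Hypothetical objects, chosen only for illustration.
objects = ["swan_1", "swan_2", "duckling"]
predicates = all_predicates(objects)

# With every predicate weighted equally, all pairs come out equally similar.
unweighted = {pair: shared_weight(*pair, predicates)
              for pair in combinations(objects, 2)}

# A "value": we decide that the predicate picking out the two swans matters more.
swans = frozenset({"swan_1", "swan_2"})

def value_weight(p):
    # The weight 10 is an arbitrary illustrative choice.
    return 10 if p == swans else 1

weighted = {pair: shared_weight(*pair, predicates, value_weight)
            for pair in combinations(objects, 2)}

print(unweighted)
print(weighted)
```

In the unweighted case every pair shares the same count, so nothing singles out the swans as a kind. The swans only come out as more similar to each other once some predicate is weighted more heavily, and that weighting has to come from outside the formalism.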