Another Argument Against Eliezer's Meta-Ethics

Wei Dai

I think I've found a better argument that Eliezer's meta-ethics is wrong. The advantage of this argument is that it doesn't depend on the specifics of Eliezer's notions of extrapolation or coherence.

Eliezer says that when he uses words like "moral", "right", and "should", he's referring to properties of a specific computation. That computation is essentially an idealized version of himself (e.g., with additional resources and safeguards). We can ask: does Idealized Eliezer (IE) make use of words like "moral", "right", and "should"? If so, what does IE mean by them? Does he mean the same things as Base Eliezer (BE)? None of the possible answers are satisfactory, which implies that Eliezer is probably wrong about what he means by those words.

1. IE does not make use of those words. But this is intuitively implausible.

2. IE makes use of those words and means the same things as BE. But this introduces a vicious circle. If IE tries to determine whether "Eliezer should save person X" is true, he will notice that it's true if he thinks it's true, leading to Löb-style problems.

3. IE's meanings for those words are different from BE's. But knowing that, BE ought to conclude that his meta-ethics is wrong and morality doesn't mean what he thinks it means.

It's (2), but there is no circularity problem. Idealized Eliezer does not use those words in any output, because Idealized Eliezer is a simplified model that only accepts input and outputs a goodness score; it's a function IE(x): statement => goodness-score. It never outputs words, so it can't use words like "moral" or "should" except inside its own thoughts. It might (though it need not) use those words in its own thoughts; if it does, those words will mean "what I am eventually going to output". In that case, thinking "X is moral" while computing X is equivalent to a return statement, and asking whether Y is moral while computing X is equivalent to a recursive call.
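
To make the return/recursion analogy concrete, here is a minimal Python sketch under loudly hypothetical assumptions: statements are plain strings, each has a made-up base score, and a statement's score is its base score plus the scores of the statements it depends on. None of these names, numbers, or the summing rule come from the post; they only illustrate how "X is moral" maps to a return value and "is Y moral?" maps to a recursive call.

```python
# Hypothetical toy data: base scores and dependencies are invented for illustration.
BASE_SCORE: dict[str, float] = {
    "save person X": 1.0,
    "keep promise to Y": 2.0,
}
DEPENDS_ON: dict[str, list[str]] = {
    "save person X": ["keep promise to Y"],
    "keep promise to Y": [],
}


def ie(statement: str) -> float:
    """Toy IE: maps a statement to a goodness score and never outputs words."""
    # Asking "is Y moral?" while computing this statement is a recursive call.
    sub_scores = [ie(other) for other in DEPENDS_ON[statement]]
    # Concluding "this statement is moral to degree s" is just returning s.
    # (Summing sub-scores is an arbitrary, hypothetical combining rule.)
    return BASE_SCORE[statement] + sum(sub_scores)


print(ie("save person X"))  # 3.0
```

Nothing in this sketch is circular, because every dependency chain bottoms out at a statement with no dependencies.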

The circularity you think you've noticed is simply the observation that IE(x) = IE(x). Read as a definition, however, IE(x) := IE(x) is not a computation that returns, so IE cannot be implemented that way. If IE is recursive, the recursion must be well-founded: every chain of recursive calls starting from a finite input must have finite length. This is formalized in type theory, and we can prove whether particular computations are or are not suitable.
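
For finite input, the well-foundedness condition can also be checked mechanically. The sketch below, again in Python with hypothetical names and data, treats a finite set of statements and their dependencies as a graph. On finite input, "every recursive chain has finite length" is equivalent to "the dependency graph has no cycle", so it runs a depth-first cycle check before evaluating anything. This plain graph check stands in for the type-theoretic termination proofs mentioned above; it is not a formalization of them.

```python
def is_well_founded(deps: dict[str, list[str]]) -> bool:
    """True if no dependency chain loops back on itself.

    For a finite graph this is the same as every chain of recursive calls
    having finite length.
    """
    UNVISITED, IN_PROGRESS, DONE = 0, 1, 2
    state: dict[str, int] = {}

    def visit(s: str) -> bool:
        state[s] = IN_PROGRESS
        for t in deps.get(s, []):
            if state.get(t, UNVISITED) == IN_PROGRESS:
                return False  # back edge: a cycle, e.g. IE(x) defined via IE(x)
            if state.get(t, UNVISITED) == UNVISITED and not visit(t):
                return False
        state[s] = DONE
        return True

    return all(state.get(s, UNVISITED) == DONE or visit(s) for s in deps)


def ie(statement: str, base: dict[str, float], deps: dict[str, list[str]]) -> float:
    """Hypothetical toy IE: evaluate only after confirming the recursion is well-founded."""
    if not is_well_founded(deps):
        raise ValueError("circular definition: some recursive chain never ends")

    def go(s: str) -> float:
        # Same arbitrary combining rule as before: base score plus sub-scores.
        return base[s] + sum(go(t) for t in deps.get(s, []))

    return go(statement)


base = {"save person X": 1.0, "keep promise to Y": 2.0}
print(ie("save person X", base, {"save person X": ["keep promise to Y"]}))  # 3.0

try:
    ie("save person X", base, {"save person X": ["save person X"]})
except ValueError as err:
    print(err)  # circular definition: some recursive chain never ends
```

Python itself would not reject a definition like `def ie(x): return ie(x)`; calling it simply fails with `RecursionError` once the interpreter's recursion limit is reached, which is why the check here is done explicitly rather than left to the language.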

This is perhaps the most promising solution (if we want to stick with Eliezer's approach). I'm not sure it really works though. How does your IE process meta-moral arguments, for example, arguments about whether average utilitarianism or total utilitarianism is right? (Presumably BE wants IE to be influenced by those arguments in roughly the same way that BE would.) What does "right" mean to it while it's thinking about those kinds of arguments?
