I think I've found a better argument that Eliezer's meta-ethics is wrong. The advantage of this argument is that it doesn't depend on the specifics of Eliezer's notions of extrapolation or coherence.

Eliezer says that when he uses words like "moral", "right", and "should", he's referring to properties of a specific computation. That computation is essentially an idealized version of himself (e.g., with additional resources and safeguards). We can ask: does Idealized Eliezer (IE) make use of words like "moral", "right", and "should"? If so, what does IE mean by them? Does he mean the same things as Base Eliezer (BE)? None of the possible answers are satisfactory, which implies that Eliezer is probably wrong about what he means by those words.

1. IE does not make use of those words. But this is intuitively implausible.

2. IE makes use of those words and means the same things as BE. But this introduces a vicious circle. If IE tries to determine whether "Eliezer should save person X" is true, he will notice that it's true if he thinks it's true, leading to Löb-style problems.

3. IE's meanings for those words are different from BE's. But knowing that, BE ought to conclude that his meta-ethics is wrong and morality doesn't mean what he thinks it means.

  1. IE does not make use of those words. But this is intuitively implausible.

  2. ...

  3. ...

My initial reaction is that 1, while implausible at first glance, gains plausibility from the rejection of 2 and 3. So your rebuttal of Eliezer's metaethics needs to take 1 more seriously to be complete.

Ok, let's take 1 more seriously. In order for Eliezer's meta-ethics to qualify as meta-ethics, he has to at least roughly specify what IE is. But how do you specify an idealized version of yourself that reasons about morality without using words like "moral", "right" and "should"? If Eliezer takes Base Eliezer and just deletes the parts of his mind that are related to these words, he's almost certainly not going to like the results. What else could he do?

But how do you specify an idealized version of yourself that reasons about morality without using words like "moral", "right" and "should"?

You don't use those words; you refer to your brain as a whole, which already contains those things, and you specify extrapolation operations, such as the passage of time, that it might go through. (Note that no one has nailed down exactly what the ideal extrapolation procedure would be, although there is some intuition about what is and isn't allowed. There is an implied claim here that different extrapolation procedures will tend to converge on similar results, although this is unlikely to be the case for every moral question, or for quantitative moral questions at high precision.)

I meant:

how do you specify an (idealized version of yourself that reasons about morality without using words like "moral", "right" and "should")?

But I think you interpreted me as:

how do you specify an (idealized version of yourself that reasons about morality) without using words like "moral", "right" and "should"?

Indeed I did misinterpret it that way. To answer the other interpretation of that question,

how do you specify an (idealized version of yourself that reasons about morality without using words like "moral", "right" and "should")?

The answer is, I don't think there's any problem with your idealized self using those words. Sure, it's self-referential, but self-referential in a way that makes stating that X is moral equivalent to returning, and asking whether Y is moral equivalent to recursing on Y. This is no different from an ordinary person thinking about a decision they're going to make; the statements "I decide X" and "I decide not-X" are both tautologically true, but this is not a contradiction because these are performatives, not declaratives.

So if an Idealized Aris (IA) knows the exact height of Everest, and the Base Aris (BA) doesn't, does that mean the only options are:

  1. IA does not make use of the words "height" and "Everest".

  2. IA makes use of the words and means the same thing as I do, so he can only know the height of Everest if I already know the height of Everest.

  3. IA's referents for the words "height" and "Everest" are different from BA's. Therefore "height" and "Everest" don't mean what I think they mean.

Somewhere your logic has holes in it.

It's not true that if IA and BA mean the same things by "height" and "Everest", IA can only know the height of Everest if BA already knows the height of Everest. My 2 does not use the same (erroneous) logic.

Do you have to have the same meaning if you have the same referent?

For example, "that beaker of water" to a five-year-old means something wet, fluid, and drinkable in a glass container, whereas to a chemistry student it means a fluid made of hydrogen and oxygen that can be used to dilute solutions. Yet they are both talking about the same thing.

You're right. My argument makes more sense if I substitute "meaning" for "referent" in 3 (which I did). Thanks.


Not exactly the same erroneous logic, but erroneous nevertheless. IE happening to decide to judge something right wouldn't make it right, any more than chopping bits off the kilogram prototype would change the mass of a kilogram, even if Eliezer actually defined right that way (isn't he refusing to define it that precisely?).

It's (2), but there is no circularity problem. Idealized Eliezer does not make use of those words in any output, because Idealized Eliezer is a simplified model that only accepts input and outputs a goodness score; it's a function IE(x): statement => goodness-score. It never outputs words, so it can't use words like "moral" or "should" except inside its own thoughts. It might (but need not) use those words in its own thoughts, but if it does, then those words will mean "what I am eventually going to output", in which case thinking "X is moral" while computing X is equivalent to a return statement, and asking whether Y is moral while computing X is equivalent to recursion.

The circularity you think you've noticed is simply the observation that IE(x) = IE(x). However, a computation defined that way never returns, so IE cannot be implemented that way; if it is recursive, it must be a well-founded recursion, that is, all recursive chains resulting from finite input must have finite length. This is formalized in type theory, and we can prove that particular computations are or are not suitable.
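To make the well-foundedness point concrete, here is a minimal sketch of the idea, assuming a toy Statement type, a placeholder base_score heuristic, and an arbitrary averaging rule (all hypothetical, not anything specified in the thread): asking whether a sub-statement is moral becomes a recursive call, and the recursion terminates only because it always descends into strictly smaller statements.

```python
# Toy sketch only: IE modeled as a function from statements to goodness scores.
# The Statement type, base_score heuristic, and averaging rule are hypothetical
# stand-ins; the point is just the shape of a well-founded recursion.

from dataclasses import dataclass, field
from typing import List


@dataclass
class Statement:
    content: str
    parts: List["Statement"] = field(default_factory=list)  # strictly smaller sub-statements


def base_score(stmt: Statement) -> float:
    """Placeholder for whatever non-self-referential heuristics score a leaf statement."""
    return float(len(stmt.content) % 3)


def IE(stmt: Statement) -> float:
    """Idealized evaluator: statement -> goodness score.

    Asking whether a sub-statement is moral is a recursive call to IE, and
    "X is moral" corresponds to returning a score. The recursion is
    well-founded because it only ever descends into stmt.parts, which are
    strictly smaller, so it never reduces to the bare equation IE(x) = IE(x).
    """
    if not stmt.parts:
        return base_score(stmt)
    return sum(IE(part) for part in stmt.parts) / len(stmt.parts)


# Usage: score a compound statement built out of smaller ones.
claim = Statement("Eliezer should save person X",
                  parts=[Statement("X is in danger"),
                         Statement("saving X has low cost")])
print(IE(claim))
```

The particular aggregation rule is arbitrary; the only point being illustrated is that every recursive call strictly decreases some measure, which is the termination guarantee a type-theoretic formalization would demand.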

This is perhaps the most promising solution (if we want to stick with Eliezer's approach). I'm not sure it really works though. How does your IE process meta-moral arguments, for example, arguments about whether average utilitarianism or total utilitarianism is right? (Presumably BE wants IE to be influenced by those arguments in roughly the same way that BE would.) What does "right" mean to it while it's thinking about those kinds of arguments?

It could refer to the evaluation of potential self-improvements. What the agent does is not necessarily right, and even the thing with the highest goodness-score (which the agent may fail to find) is not necessarily right, because the agent could self-improve instead and compute a righter action using its improved architecture, where there might, for example, no longer be any goodness score at all.

For what it's worth, I am increasingly convinced that Idealized Dave doesn't use words like "moral," "right," and "should," and instead talks about the likely consequences of actions and attributes of those consequences that feed into their sort-order, which in turn motivates actions.

Which is increasingly motivating Base Dave to avoid them.

No idea what the implications of that are.

For what it's worth, Base NihilCredo does exactly that.

I'm not sure what your argument means, because it seems to refer to squishy things called "words" and there are many ways to go wrong when talking about those. Or maybe I'm just being stupid again. Does your argument also work for paperclippers or prime-pebblesorters?

I'm not totally sure what the argument means myself. Here's the situation as I see it. We have an intuition that "right" might be something important but we're not sure what it is, so we want to ask "what does right mean?" Eliezer claimed to have dissolved this question, so we can stop thinking about it and move on. But I keep having the suspicion that the dissolution doesn't quite work. This argument is an attempt to explain why I have this suspicion.

Does your argument also work for paperclippers or prime-pebblesorters?

If they don't use words like "moral", "right", and "should" to think about what to do, but just run an algorithm that makes decisions without using such words, then no, my argument has nothing to do with them.

Pebblesorters use words whose meaning feels mysterious to them, but the computation that encodes their preferences doesn't need to use any such words; it just counts pebbles in heaps. This shoots down your argument.

Tangentially: the first time I read this comment I parsed the first word as "Philosophers." Which rendered the comment puzzling, but not necessarily wrong.

It is not entirely clear to me that Pebblesorters are good stand-ins for humans in this sort of analogy.

But, leaving that aside... applying Wei Dai's argument to Pebblesorters involves asking whether Idealized Pebblesorters use words like "right" and "should" and "good" and "correct" and "proper" with respect to prime-numbered piles, the way Base Pebblesorters do.

I'm not sure what the answer to that question is. It seems to me that they just confuse themselves by doing so, but I feel that way about humans too.

You're certainly right that the computation that encodes their preferences doesn't involve words, but I don't know what that has to do with anything. The computation that encodes our preferences doesn't involve words either... and so?

The further along this track I go, the less meaningful the question seems. I guess I'm Just Not Getting It.

The vicious circle isn't really there, any more than (in the terminology of Good and Real) a properly constructed street-crossing robot would use the knowledge of its safe disposition to conclude that it must be safe to cross. See also You Provably Can't Trust Yourself.

"Properly constructed" is the key phrase here. The vicious circle shouldn't be there, and wouldn't be if "right" is defined correctly (for example perhaps as a logical construct which itself doesn't refer to "right"), but if by "right" IE means IE's output, then it is there.

By "right", BE and IE both mean "the output of algorithm X". The fact that IE happens to be algorithm X doesn't cause a vicious circle.

There is another option you might not have considered: Eliezer (IE) cannot be described as either making use of the words or not. To get from wrong to right you need change. If you have reached the maximally right state you no longer need change, and therefore you become static. Eliezer (IE) might simply reach a completely static world state that is optimally right, after which it simply holds, or becomes unchanging. A program that doesn't run, or is static, can be described neither as making use of a labeling nor as not making use of it.

One might argue that Eliezer (IE) has to preserve an optimally right world state. But that argument implicitly assumes that a world state has to be static. And even a dynamic world state involving a periodic function could not reasonably be described as right compared to something better if it is already optimal. There is no light without darkness; if nothing is wrong, then right ceases to exist.

So maybe Eliezer (IE) should not be thought of as either making use of such labels or not, but simply as Eliezer (IE).

First of all, when I (Base Dorikka, BD) am using the phrases "X is moral," "X is right," and "I should X" in this comment, I am hypothesizing that a version of myself with a comparatively large amount of computing resources and information (Idealized Dorikka, ID) would come to the conclusion that action X would optimize ID's utility function in relation to the set of all actions that BD could take in the situation.

Note that I am not stating that what I say is "moral" would for certain optimize ID's utility function in relation to the set of all actions that BD could take in the situation -- my saying that "X is moral" represents me making a probability estimate that it would do so. It's key to my understanding here that I don't actually have any authority to declare something the moral thing; I am just estimating that the probability that X is moral is high.

This is my understanding of the use of such phrases -- if you disagree with my conclusions, check if we disagree on how Eliezer uses the words.

Now, BD does not, of course, have access to ID's computing resources, and so I can only imagine a version of ID a certain level above my own. However, ID would have enough computing resources to imagine an ID+ that was a certain level above him. If ID+ ever came to exist, he would be able to imagine an ID++, and so on.

I think that Eliezer's metaethics would select option #2, and point out that ID's reference for what he should do is ID+, not himself.

Edit: I do, however, have the objection that it isn't meaningful for BD to make an estimate of what ID's decision would be, because he doesn't have ID's computing power. BD could not estimate ID's utility function with any more information than BD has access to, so BD's estimate of ID's utility function is no better than BD's own utility function.

IE makes use of those words and means the same things as BE. But this introduces a vicious circle. If IE tries to determine whether "Eliezer should save person X" is true, he will notice that it's true if he thinks it's true, leading to Löb-style problems.

That doesn't seem like as much of a problem. We can posit that IE is right about ethics, in which case it would anyway be true that "Eliezer should save person X" is true if and only if he thinks it's true.

If IE tries to determine whether "Eliezer should save person X" is true, he will notice that it's true if he thinks it's true, leading to Löb-style problems.

Could you spell out those problems?

And in any case, why is IE trying to determine whether "Eliezer should save person X" is true? Shouldn't he limit himself to determining whether Eliezer should save person X?

Compare: If Joe tries to determine whether "256 × 2 = 512" is true, he will notice that it is true if a certain computation he could perform yields 512, and that if he performs that calculation, he will believe it (the quoted statement above) is true. This leads to Löb-style confusion.

The confusion is dissolved because anyone can do the computations operationalizing IE's ethical definitions, if IE can communicate his values to that computational agent.

Or is this one of those situations where the self-referentiality of UDT-like decision theories gets us into trouble?

ETA: Ok, I just noticed that IE is by definition the end result of a dynamic process of reflective updating starting from the Base Eliezer. So IE is a fixpoint defined in terms of itself. Maybe Löb-style issues really do exist.

While Eliezer's moral judgments are tentative, and he would update them if he had access to IE, he does have a process for making judgments that is distinct from thinking about the decisions of FAIs and ideal selves. IE will presumably base decisions entirely on these factors.

Agreed, but when IE is making decisions (using a process that is distinct from thinking about the decisions of FAIs and ideal selves), do the words "morality" or "should" ever enter into his mind and end up making a difference in his decisions? If so, what does he mean by them?

"Morality" describes (the result of) the process he uses. The thing as a whole is about morality, it is not an aspect in his decision process. Morality is the sum of the thousand shards of desire.

Eliezer says that when he uses words like "moral", "right", and "should", he's referring to properties of a specific computation. That computation is essentially an idealized version of himself (e.g., with additional resources and safeguards).

When we use words like "right", we don't refer to "right"; we refer to particular heuristics that allow us to make right decisions. These heuristics are indeed properties of us, and improved heuristics are properties of idealized versions of us. But "right" itself is not a property of humans, or the kind of thing we can build or imagine. It's a different kind of referent from the usual concepts, for some reason. We don't refer to "right"; we use it through particular heuristics.

Consider an analogy with mathematical truth. When we use "truth", do we refer to particular heuristics that allow us to know which mathematical statements are true, or do we refer to something which exists largely independently of human beings, even though we're not really sure what it is exactly? The latter makes more sense to me.

I do not mean to imply that by "right" we must also refer to something that exists independently of human beings, but I think this analogy shows that we need more than "we can't seem to figure out what 'right' refers to, but clearly these heuristics have something to do with it" in order to conclude that the heuristics are what we mean by "right".

"we can't seem to figure out what 'right' might refer to besides these heuristics"

"Right" as the meta-ethical question doesn't refer to heuristics, but when we talk of "right", what we often mean to do is to refer to those heuristics (but not always). So I didn't contrast the two senses you discuss in the first paragraph.

Do we really often mean to refer to those heuristics by "right"? Can you give me a couple of examples of when we clearly intend to do that?

When one speaks, in some sense one always argues. So when you say that "saving that child was right", you appeal to particular moral intuitions, i.e. heuristics, allowing other people to notice these intuitions and agree with you; you are not making a meta-ethical claim about this action being in accordance with the essence of "right".

When one speaks, in some sense one always argues.

Sometimes when one speaks one speaks for oneself!