I think my previous argument was at least partly wrong or confused, because I don't really understand what it means for a computation to mean something by a symbol. Here I'll back up and try to figure out what I mean by "mean" first.

Consider a couple of programs. The first one (A) is an arithmetic calculator. It takes a string as input, interprets it as a formula written in decimal notation, and outputs the result of computing that formula. For example, A("9+12") produces "21" as output. The second (B) is a substitution cipher calculator. It "encrypts" its input by substituting each character using a fixed mapping. It so happens that B("9+12") outputs "c6b3".
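
To make the contrast concrete, here is a minimal sketch of the two programs, assuming Python; the substitution table in B is hypothetical, chosen only so that B("9+12") comes out as "c6b3" as in the example.

```python
def A(formula: str) -> str:
    """Arithmetic calculator: interpret the input as a decimal formula
    and return the computed result as a string."""
    allowed = set("0123456789+-*/ ")  # restrict to simple arithmetic for this sketch
    if not set(formula) <= allowed:
        raise ValueError("not a formula")
    return str(eval(formula))  # A("9+12") -> "21"

# Hypothetical fixed substitution table (only the characters used in the example).
CIPHER = {"9": "c", "+": "6", "1": "b", "2": "3"}

def B(text: str) -> str:
    """Substitution cipher: replace each character using the fixed mapping."""
    return "".join(CIPHER.get(ch, ch) for ch in text)  # B("9+12") -> "c6b3"
```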

What do A and B mean by "2"? Intuitively it seems that by "2", A means the integer (i.e., abstract mathematical object) 2, while for B, "2" doesn't really mean anything; it's just a symbol that it blindly manipulates. But A also just produces its output by manipulating symbols, so why does it seem like it means something by "2"? I think it's because the way A manipulates the symbol "2" corresponds to how the integer 2 "works", whereas the way B manipulates "2" doesn't correspond to anything, except how it manipulates that symbol. We could perhaps say that by "2" B means "the way B manipulates the symbol '2'", but that doesn't seem to buy us anything.

(Similarly, by "+" A means the mathematical operation of addition, whereas B doesn't really mean anything by it. Note that this discussion assumes some version of mathematical platonism. A formalist would probably say that A also doesn't mean anything by "2" and "+" except how it manipulates those symbols, but that seems implausible to me.)

Going back to meta-ethics, I think a central mystery is what we mean by "right" when we're considering moral arguments (by which I don't mean Nesov's technical term "moral arguments", but arguments such as "total utilitarianism is wrong (i.e., not right) because it leads to the following conclusions ..., which are obviously wrong"). If human minds are computations (which I think they almost certainly are), then the way that a human mind processes such arguments can be viewed as an algorithm (which may differ from individual to individual). Suppose we could somehow abstract this algorithm away from the rest of the human, and consider it as, say, a program that, when given an input string consisting of a list of moral arguments, thinks them over, comes to some conclusions, and outputs those conclusions in the form of a utility function.

If my understanding is correct, what this algorithm means by "right" depends on the details of how it works. Is it more like calculator A or B? It may be that the way we respond to moral arguments doesn't correspond to anything except how we respond to moral arguments. For example, if it's totally random, or depends in a chaotic fashion on trivial details of wording or ordering of its input. This would be case B, where "right" can't really be said to mean anything, at least as far as the part of our minds that considers moral arguments is concerned. Or it may be case A, where the way we process "right" corresponds to some abstract mathematical object or some other kind of external object, in which case I think "right" can be said to mean that external object.

Since we don't know which is the case yet, I think we're forced to say that we don't currently know what "right" means.

On "right" in moral arguments. Why does it make sense to introduce the notion of "right" at all? Whenever we are faced with a moral argument, we're moved by specific moral considerations, never abstract rightness. There is a mystery: what makes these arguments worth being moved by? And then we have the answer: they possess the livening quality of elan vital (ahem) meta-ethical morality.

It begins to look more and more compelling to me that "morality" is more like phlogiston than fire, a word with no explanatory power and no moving parts, one that just lumps together all the specific reasons for action, and has too many explanatory connotations for an open question.

Do you take a similar position on mathematical truth? If not, why? What's the relevant difference between "true" and "right"?

For any heuristic, indeed any query that is part of the agent, the normative criterion for its performance should be given by the whole agent. What should truth be, the answers to logical questions? What probability should a given event in the world be assigned? These questions are no simpler than the whole of morality. If we define a heuristic that is not optimized by the whole morality, this heuristic will inevitably become obsolete, tossed out whole. If we allow improvements (or see substitution as change), then the heuristic refers to morality, and is potentially no simpler than the whole.

Truth and reality are the most precise and powerful heuristics known to us. Truth as the way logical queries should be answered, and reality as the way we should assign anticipation to the world, plan for some circumstances over others. But there is no guarantee that the "urge to keep on counting" remains the dominant factor in queries about truth, or that chocolate superstimulus doesn't leave a dent in the parameters of quantum gravity.

The difference from the overall "morality" is that we know a great deal more about these aspects than about the others. The words themselves are no longer relevant in their potential curiosity-stopping quality.

(Knowledge of these powerful heuristics will most likely lead to humanity's ruin. Anything that doesn't use them is not interesting, an alien AI that doesn't care about truth or reality eliminates itself quickly from our notice. But one that does care about these virtues will start rewriting things we deem important, even if it possesses almost no other virtues.)

Good. I'm adopting this way of thought.

So one possible way forward is to enumerate all our reasons for action, and also all the reasons for discomfort, I guess. Maybe Eliezer was wrong in mocking the Open Source Wish Project. Better yet, we may look for an automated way of enumerating all our "thermostats" and checking that we didn't miss any. This sounds more promising than trying to formulate a unified utility function, because this way we can figure out the easy stuff first (children on railtracks) and leave the difficult stuff for later (torture vs dust specks).

So one possible way forward is to enumerate all our reasons for action

This is a good idea. "What reasons for action do actual people use?" sounds like a better question than "What reasons for action exist?"

Maybe Eliezer was wrong in mocking the Open Source Wish Project.

"Wishes" are directed at undefined magical genies. What we need are laws of thought, methods of (and tools for) figuring out what to do.

Devising a procedure to figure out what to do in arbitrary situations is obviously even harder than creating a human-equivalent AI, so I wouldn't wish this problem upon myself! First I'd like to see an exhaustive list of reasons for action that actual people use in ordinary situations that feel "clear-cut". Then we can look at this data and figure out the next step.

Devising a procedure to figure out what to do in arbitrary situations is obviously even harder than creating a human-equivalent AI

Yes, blowing up the universe with an intelligence explosion is much easier than preserving human values.

Then we can look at this data and figure out the next step.

Sounds like an excuse to postpone figuring out the next step. What do you expect to see, and what would you do depending on what you see? "List of reasons for action that actual people use in ordinary situations" doesn't look useful.

Thinking you can figure out the next step today is unsubstantiated arrogance. You cannot write a program that will win the Netflix Prize if you don't have the test dataset. Yeah I guess a superintelligence could write it blindly from first principles, using just a textbook on machine learning, but seriously, WTF.

With the Netflix Prize, you need training data of the same kind as the data you want to predict. Predicting what stories people will tell in novel situations when deciding to act is not our goal.

Why not? I think you could use that knowledge to design a utopia that won't make people go aaaargh. Then build it, using AIs or whatever tools you have.

The usual complexity of value considerations. The meaning of the stories (i.e. specifications detailed enough to actually implement, the way they should be and not simply the way a human would try to elaborate them) is not given just by the text of the stories, and once you're able to figure out the way things should be, you no longer need human-generated stories.

This is a different kind of object, and having lots of stories doesn't obviously help. Even if the stories would serve some purpose, I don't quite see how waiting for an explicit collection of stories is going to help in developing the tools that use them.

So one possible way forward is to enumerate all our reasons for action...

Are you aware of this thread?

Yeah. Unlike lukeprog, I'm proposing to enumerate all reasons for action that actual humans follow, not all "theoretically possible" reasons, which is obviously stupid.

"Reason for action" is no more enlightening than "morality", but with less explanatory (curiosity-stopping) connotations. In that context, it was more of "that hot yellow-ish stuff over there" as opposed to "phlogiston".

This is all philosophy of language, yo.

I tend toward Searle's approach to the subject. I think that investing much more than he does into the concept of 'meaning' is a mistake. "What does 'right' mean?" is a wrong question. The correct question is: "What do you mean by 'right'?" Or, more generally: "What effect do you hope to achieve by invoking the communication symbol of 'right' in your speech act?"

Which is, incidentally, why I find Eliezer's meta-ethical move of rigid designation for the meaning of "right" so unnecessary. My current attitude is that things would be clearer if we Taboo-ed the entire field of ethics.

"What does 'right' mean?" is a wrong question. The correct question is: "What do you mean by 'right'?"

I agree the initial question should be the latter (and it is the one I'm asking here), unless we can show that everyone means the same thing by "right".

"What effect do you hope to achieve by invoking the communication symbol of 'right' in your speech act?"

In the case of the calculator, it's not hoping to achieve anything, so it means nothing by "2"?

In the case of the calculator, it's not hoping to achieve anything, so it means nothing by "2"?

What makes you think you can compare humans with calculators? We are all quantum amplitudes, it's all cause and effect. But if the previous sentence settled all issues, if reductionism is the answer, why do we still talk about this? I haven't read most of the sequences yet, so it is an honest question. What made you pose that question?

Not quite, but I don't feel comfortable explaining my view on that yet.

It's kind of frustrating when you keep denying seemingly obvious implications of your philosophical positions without explaining why they're not implications. But I'll try to be patient...

...while for B, "2" doesn't really mean anything; it's just a symbol that it blindly manipulates.

I think I understand what concepts you were gesturing towards with this example, but for me the argument doesn't go through. The communication failure suggests to me that you might need to dissolve some questions around syntax, semantics, and human psychology. In the absence of clear understanding here I would fear a fallacy of equivocation on the term "meaning" in other contexts as well.

The problem is that B seems to output a "3" every single time it sees a "2" in the input. By "3" it functionally communicates "there was a '2' in the corresponding input" and presumably a "2" in the output functionally communicates some stable fact of the input such as that there was a "Q" in it.

This is a different functional meaning than A communicates, but the distance between algorithms A and B isn't very far. One involves a little bit of code and the other involves a little bit more, but these are both relatively small scripts that can be computed using little more memory than is needed to store the input itself.

I could understand someone using the term "meaning" to capture a sense where A and B are both capable of meaning things because they functionally communicate something to a human observer by virtue of their stably predictable input/output relations. Equally, however, I would accept a sense where neither algorithm was capable of meaning something because (to take one trivial example of the way humans are able to "mean" or "not mean" something) neither algorithm is capable of internally determining the correct response, examining the speech context they find themselves in, and emitting either the correct response or a false response that better achieves their goals within that speech context (such as to elicit laughter or to deceive the listener).

You can't properly ask algorithm A "Did you really mean that output?" and get back a sensible answer, because algorithm A has no English parsing abilities, nor a time varying internal state, nor social modeling processes capable of internally representing your understanding (or lack of understanding) of its own output, nor a compressed internal representation of goal outcomes (where fiddling with bits of the goal representation would leave algorithm A continuing to produce complex goal directed behavior except re-targeted at some other goal than the outcome it was aiming for before its goal representation was fiddled with).

I'd be willing to accept an argument that used "meaning" in a very mechanistic sense of "reliable indication" or a social sense of "honest communication where dishonest communication was possible" or even some other sense that you wanted to spell out and then re-use in a careful way in other arguments... But if you want to use a primitive sense of "meaning" that applies to a calculator and then claim that that is what I do when I think or speak, then I don't think I'll find it very convincing.

My understanding of words like "meaning" and conjugations of "to be" starts from the assumption that they are levers for referring to the surface layer of enormously complex cognitive modeling tools for handling many radically different kinds of phenomena where it is convenient to paper over the complexity in order to get some job done, like dissecting precisely what a love interest "meant" when they said you "were being" coy. What "means" means, or what "were being" means in that sort of context is patently obvious to your average 13-year-old... except that it's really hard to spell that sort of thing out precisely enough to re-implement it in code or express the veridicality conditions in formal logic over primitive observables.

We are built to model minds, just like we are built to detect visual edges. We do these tasks wonderfully and we introspect on them terribly, which means re-using a concept like "meaning" in your foundational moral philosophy is asking for trouble :-P

I normally think of meaning in terms of isomorphisms between systems.

That is, if we characterize a subset of domain X as entity Ax with relationships to other entities in X (Bx, Cx, and so forth), and we characterize those relationships in certain ways... inhibitory and excitatory linkages of spreading activation, for example, or correlated appearance and disappearance, or identity of certain attributes like color or size, and entity Ay in domain Y has relationships to other entities in Y (By, Cy, and so forth), and there's a way to characterize Ax's relationships to Bx, Cx, etc. such that Ay has the same relationships to By, Cy, etc., then it makes sense to talk about Ax, Bx, and Cx having the same meaning as Ay, By, and Cy.

(In and of itself, this is a symmetrical relationship... it suggests that if "rock" means rock, then rock means "rock." Asymmetry can be introduced via the process that maintains the similar relationships and what direction causality flows through it... for example, if saying "The rock crumbles into dust" causes the rock to crumble into dust, it makes sense to talk about the object meaning the word; if crumbling the rock causes me to make that utterance, it makes sense to talk about the word meaning the object. If neither of those things happens, the isomorphism is broken and it stops making sense to say there's any meaning involved at all. But I digress.)
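
A toy sketch of this idea, assuming Python: each domain is represented as a set of (entity, relation, entity) triples, and a candidate mapping counts as meaning-preserving when it carries X's relationship structure exactly onto Y's. All entity and relation names here are made up for illustration.

```python
# Toy domains: invented entities and relations, just to illustrate the check.
DOMAIN_X = {("Ax", "activates", "Bx"), ("Bx", "inhibits", "Cx")}
DOMAIN_Y = {("Ay", "activates", "By"), ("By", "inhibits", "Cy")}

MAPPING = {"Ax": "Ay", "Bx": "By", "Cx": "Cy"}

def is_isomorphic(x, y, mapping):
    """True if translating X's triples through the mapping reproduces Y exactly."""
    translated = {(mapping[a], rel, mapping[b]) for (a, rel, b) in x}
    return translated == y

print(is_isomorphic(DOMAIN_X, DOMAIN_Y, MAPPING))  # True: same structure, same "meaning"
```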

The relationship between the string "2" in a calculator and the number of FOOs in a pile of FOOs (or, to state that more generally, between "2" and the number 2) is different for A and B, so "2" means different things in A and B.

But it's worth noting that isomorphisms can be discovered by adopting new ways of characterizing relationships and entities within domains. That is, the full meaning of "2" isn't necessarily obvious, even if I've been using the calculator for a while. I might suddenly discover a way of characterizing the relationships in A and B such that "2" in A and B are isomorphic... that is, I might discover that "2" really does mean the same thing in A and B after all!

This might or might not be a useful discovery, depending on how useful that new way of characterizing relationships is.

As lukeprog notes, all of this is pretty standard philosophy of language.

So, if asked what "right" means to me, I'm ultimately inclined to look at what relationship "right" has to other things in my head, and what kinds of systems in the world have isomorphic patterns of relationships, and what entities in those systems correspond to "right", and whether other people's behavior is consistent with their having similar relationships in their head.

I mostly conclude based on this exercise that to describe an action as being "right" is to imply that a legitimate (though unspecified) authority endorses that action. I find that increasingly distasteful, and prefer to talk about endorsing the action myself.

We could perhaps say that by "2" B means "the way B manipulates the symbol '2'", but that doesn't seem to buy us anything.

I think the meaning of a concept-detector (the concept it detects) in the more naive sense should be considered in the context of an agent, and refers to the way this concept-detector should be, as far as the agent can see (i.e. it's a question of knowable self-improvement). And the way it should be of course depends on how its properties control the agent's actions, i.e. on the role the symbol plays in the agent's algorithm. (This promises simplification in so far as the concept-detector is blind to its purpose, i.e. its usability and direction of refinement is seen only in terms of other instrumental heuristics.)

What the "2" in the output of the calculator means depends on the algorithm that produces numerical judgements. But what the "2" in the input means depends on the algorithm that uses numerical inputs.

It seems like we know pretty well how the algorithm that uses moral judgements works. It, in most cases, causes us to do X when we think X is moral.

Doesn't that still leave the problem of what the algorithm that produces moral judgements means by "moral", "should", etc?

To go back to the calculator analogy, suppose our calculator is sitting in a hailstorm and its buttons are being punched randomly as a result. It seems fair to say that the hailstorm doesn't mean anything by "2". If the algorithm that produces moral judgements is like the hailstorm, couldn't we also say that moral judgements don't really mean anything?

If I am located at the center of a blooming, buzzing confusion, do statements like "I see a red circle" have no meaning?

If you're saying that what you mean by "ought" is what the part of you that uses moral judgments means by "ought", then I don't understand why you choose to identify with that part of you and not with the part of you that produces moral judgments. EDIT: Doing so makes it easier to "solve the problem of meta-ethics" but you end up with a solution that doesn't seem particularly interesting or useful. But maybe I'm wrong about that. Continued here.

You cannot make sense of "meaning" unless you assume some form of compositionality. We know the meaning of A: it is a (partial) function which maps some strings as input to some other strings as output. We could decompose A into an algorithm which parses a string to an abstract syntax tree, another algorithm which evaluates the AST and outputs an internal representation of its result, and a third algorithm which prettyprints the result. This makes sense, since our programming language probably has a syntax construct with the meaning of function composition.

But then what's the meaning of an input, such as "2" ? Taking our algorithm as given, "2" is sort of like a function which waits for the rest of our input and then feeds it to the evaluator. Under some niceness assumptions, we can reason about the whole thing and get something better: "2" means: "the leftmost leaf of the abstract syntax tree is of the form: 2 * 10^n + i_(n-1) * 10^(n-1) + i_(n-2) * 10^(n-2) + ... + i_1 * 10 + i_0 , where n >= 0 and each i_k is a natural number from 0 to 9". This assumes that we can assign natural numbers and trees as the meaning of our inner representation, which requires some handwaving in itself. If the programming language has primitive data types and operations for these, then this is easily justified. Otherwise, we would need to prove that some data representations within our program are isomorphic to the naturals, etc. and that the operations are also implemented correctly.
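
A rough sketch of that decomposition, assuming Python and handling only "+" and decimal literals; the helper names parse, evaluate, and prettyprint are mine, not anything given above.

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Num:
    value: int

@dataclass
class Add:
    left: "AST"
    right: "AST"

AST = Union[Num, Add]

def parse(s: str) -> AST:
    """Parse a decimal formula into an abstract syntax tree.
    Only "+" is handled; a real parser would deal with precedence and errors."""
    parts = s.split("+")
    tree: AST = Num(int(parts[0]))
    for p in parts[1:]:
        tree = Add(tree, Num(int(p)))
    return tree

def evaluate(t: AST) -> int:
    """Evaluate the AST into an internal representation of the result."""
    if isinstance(t, Num):
        return t.value
    return evaluate(t.left) + evaluate(t.right)

def prettyprint(n: int) -> str:
    """Render the internal result back into decimal notation."""
    return str(n)

def A(s: str) -> str:
    """A as the composition of the three pieces."""
    return prettyprint(evaluate(parse(s)))  # A("9+12") -> "21"
```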

Is it more like calculator A or B?

As a layman I'm not sure how B is not arithmetic as well?

...then the way that a human mind processes such arguments can be viewed as an algorithm...

What is right isn't defined by the algorithm alone but also by the environment, nurture, education and a lot of other variables. Change the circumstances and you change what is right.

You might argue that there are arguments that do not satisfy our evolutionary mind template. But that's not true, because we already encounter rules that don't feel intuitively right but that we nonetheless adopt. In such situations we call ourselves biased and want to change our minds. That's why an FAI trying to extrapolate human volition might fail for two reasons: 1) our volition is broad and fuzzy, to such an extent that "right" is too vague and volatile, and 2) by changing the circumstances it will change what is right, and therefore eventually define what is right.