Abstracted Idealized Dynamics

Eliezer Yudkowsky


12th Aug 2008


Ethics & Morality

Personal Blog

Followup to: Morality as Fixed Computation

I keep trying to describe morality as a "computation", but people don't stand up and say "Aha!"

Pondering the surprising inferential distances that seem to be at work here, it occurs to me that when I say "computation", some of my listeners may not hear the Word of Power that I thought I was emitting; but, rather, may think of some complicated boring unimportant thing like Microsoft Word.

Maybe I should have said that morality is an abstracted idealized dynamic. This might not have meant anything to start with, but at least it wouldn't sound like I was describing Microsoft Word.

How, oh how, am I to describe the awesome import of this concept, "computation"?

Perhaps I can display the inner nature of computation, in its most general form, by showing how that inner nature manifests in something that seems very unlike Microsoft Word—namely, morality.

Consider certain features we might wish to ascribe to that-which-we-call "morality", or "should" or "right" or "good":

• It seems that we sometimes think about morality in our armchairs, without further peeking at the state of the outside world, and arrive at some previously unknown conclusion.

Someone sees a slave being whipped, and it doesn't occur to them right away that slavery is wrong. But they go home and think about it, and imagine themselves in the slave's place, and finally think, "No."

Can you think of anywhere else that something like this happens?

Suppose I tell you that I am making a rectangle of pebbles. You look at the rectangle and count 19 pebbles on one side and 103 pebbles on the other side. You don't know right away how many pebbles there are. But you go home to your living room, and draw the blinds, and sit in your armchair and think; and without further looking at the physical array, you come to the conclusion that the rectangle contains 1957 pebbles.

Now, I'm not going to say the word "computation". But it seems like that-which-is "morality" should have the property of latent development of answers: you may not know right away everything that you have sufficient in-principle information to know. All the ingredients are present, but it takes additional time to bake the pie.

You can specify a Turing machine of 6 states and 2 symbols that unfolds into a string of 4.6 × 10¹⁴³⁹ 1s after 2.5 × 10²⁸⁷⁹ steps. A machine I could describe aloud in ten seconds runs longer and produces a larger state than the whole observed universe to date.

When you distinguish between the program description and the program's executing state, between the process specification and the final outcome, between the question and the answer, you can see why even certainty about a program description does not imply human certainty about the executing program's outcome. See also Artificial Addition on the difference between a compact specification versus a flat list of outputs.
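To make the description/execution distinction concrete, here is a minimal Python sketch of a Turing machine simulator. It does not reproduce the 6-state machine mentioned above, whose table is not given here; instead it runs two small, well-known busy beaver machines: the 2-state champion, and a 3-state machine that leaves six 1s on the tape (the maximum for three states). The table format, the `run` helper, and the convention of counting the halting transition as a step are choices made for this sketch.

```python
from collections import defaultdict

# Transition tables: (state, symbol read) -> (symbol to write, head move, next state).
# "H" is the halting state. These tables are the program descriptions.
BB2 = {  # 2-state busy beaver champion
    ("A", 0): (1, +1, "B"), ("A", 1): (1, -1, "B"),
    ("B", 0): (1, -1, "A"), ("B", 1): (1, +1, "H"),
}
BB3 = {  # a 3-state machine that leaves six 1s, the 3-state maximum
    ("A", 0): (1, +1, "B"), ("A", 1): (1, +1, "H"),
    ("B", 0): (0, +1, "C"), ("B", 1): (1, +1, "B"),
    ("C", 0): (1, -1, "C"), ("C", 1): (1, -1, "A"),
}


def run(table, max_steps=10_000):
    """Execute a table from a blank tape in state "A".

    The tape and head position are the executing state; returns
    (steps taken, number of 1s left on the tape).
    """
    tape = defaultdict(int)  # unvisited cells read as 0
    head, state, steps = 0, "A", 0
    while state != "H":
        if steps >= max_steps:
            raise RuntimeError("step budget exhausted before halting")
        write, move, state = table[(state, tape[head])]
        tape[head] = write
        head += move
        steps += 1
    return steps, sum(tape.values())


for name, table in [("2-state", BB2), ("3-state", BB3)]:
    steps, ones = run(table)
    print(f"{name}: halted after {steps} steps with {ones} ones on the tape")
```

Every fact about the final tape is fixed by the six-entry table, yet running the loop is still how we find it out. Swap in a 6-state table and the same code would describe a computation far too long to ever run to completion.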

Morality, likewise, is something that unfolds, through arguments, through discovery, through thinking; from a bounded set of intuitions and beliefs that animate our initial states, to a potentially much larger set of specific moral judgments we may have to make over the course of our lifetimes.

• When two human beings both think about the same moral question, even in a case where they both start out uncertain of the answer, it is not unknown for them to come to the same conclusion. It seems to happen more often than chance alone would allow—though the biased focus of reporting and memory is on the shouting and the arguments. And this is so, even if both humans remain in their armchairs and do not peek out the living-room blinds while thinking.

Where else does this happen? It happens when trying to guess the number of pebbles in a rectangle of sides 19 and 103. Now this does not prove by Greek analogy that morality is multiplication. If A has property X and B has property X it does not follow that A is B. But it seems that morality ought to have the property of expected agreement about unknown latent answers, which, please note, generally implies that similar questions are being asked in different places.

This is part of what is conveyed by the Word of Power, "computation": the notion of similar questions being asked in different places and having similar answers. Or as we might say in the business, the same computation can have multiple instantiations.

If we know the structure of calculator 1 and calculator 2, we can decide that they are "asking the same question" and that we ought to see the "same result" flashing on the screen of calculator 1 and calculator 2 after pressing the Enter key. We decide this in advance of seeing the actual results, which is what makes the concept of "computation" predictively useful.

And in fact, we can make this deduction even without knowing the exact circuit diagrams of calculators 1 and 2, so long as we're told that the circuit diagrams are the same.

And then when we see the result "1957" flash on the screen of calculator 1, we know that the same "1957" can be expected to flash on calculator 2, and we even expect to count up 1957 pebbles in the array of 19 by 103.

A hundred calculators, performing the same multiplication in a hundred different ways, can be expected to arrive at the same answer—and this is not a vacuous expectation adduced after seeing similar answers. We can form the expectation in advance of seeing the actual answer.
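Here is a toy illustration of one computation with several instantiations, as a Python sketch: four functions that answer 19 × 103 by quite different procedures, one of them by literally laying out and counting a grid of pebbles. The function names are invented for this example. Reading their structure is enough to predict that they will agree, before any of them is run.

```python
def builtin(a, b):
    """The language's own multiplication."""
    return a * b


def repeated_addition(a, b):
    """Add a to a running total, b times."""
    total = 0
    for _ in range(b):
        total += a
    return total


def shift_and_add(a, b):
    """Binary long multiplication: add a shifted copy of a for each 1 bit of b."""
    total = 0
    while b:
        if b & 1:
            total += a
        a <<= 1
        b >>= 1
    return total


def count_pebbles(rows, cols):
    """Lay out a rows-by-cols array of pebbles, then count them one at a time."""
    array = [["pebble"] * cols for _ in range(rows)]
    count = 0
    for row in array:
        for _pebble in row:
            count += 1
    return count


calculators = [builtin, repeated_addition, shift_and_add, count_pebbles]
answers = {calc.__name__: calc(19, 103) for calc in calculators}
print(answers)
# {'builtin': 1957, 'repeated_addition': 1957, 'shift_and_add': 1957, 'count_pebbles': 1957}
```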

Now this does not show that morality is in fact a little electronic calculator. But it highlights the notion of something that factors out of different physical phenomena in different physical places, even phenomena as physically different as a calculator and an array of pebbles—a common answer to a common question. (Where is this factored-out thing? Is there an Ideal Multiplication Table written on a stone tablet somewhere outside the universe? But we are not concerned with that for now.)

Seeing that one calculator outputs "1957", we infer that the answer—the abstracted answer—is 1957; and from there we make our predictions of what to see on all the other calculator screens, and what to see in the array of pebbles.

So that-which-we-name-morality seems to have the further properties of agreement about developed latent answers, which we may as well think of in terms of abstract answers; and note that such agreement is unlikely in the absence of similar questions.

• We sometimes look back on our own past moral judgments, and say "Oops!" E.g., "Oops! Maybe in retrospect I shouldn't have killed all those guys when I was a teenager."

So by now it seems easy to extend the analogy, and say: "Well, maybe a cosmic ray hits one of the transistors in the calculator and it says '1959' instead of 1957—that's an error."

But this notion of "error", like the notion of "computation" itself, is more subtle than it appears.

Calculator Q says '1959' and calculator X says '1957'. Who says that calculator Q is wrong, and calculator X is right? Why not say that calculator X is wrong and calculator Q is right? Why not just say, "the results are different"?

"Well," you say, drawing on your store of common sense, "if it was just those two calculators, I wouldn't know for sure which was right. But here I've got nine other calculators that all say '1957', so it certainly seems probable that 1957 is the correct answer."

What's this business about "correct"? Why not just say "different"?

"Because if I have to predict the outcome of any other calculators that compute 19 x 103, or the number of pebbles in a 19 x 103 array, I'll predict 1957—or whatever observable outcome corresponds to the abstract number 1957."

So perhaps 19 x 103 = 1957 only most of the time. Why call the answer 1957 the correct one, rather than the mere fad among calculators, the majority vote?

If I've got a hundred calculators, all of them rather error-prone—say a 10% probability of error—then there is no one calculator I can point to and say, "This is the standard!" I might pick a calculator that would happen, on this occasion, to vote with ten other calculators rather than ninety other calculators. This is why I have to idealize the answer, to talk about this ethereal thing that is not associated with any particular physical process known to me—not even arithmetic done in my own head, which can also be "incorrect".

It is this ethereal process, this idealized question, to which we compare the results of any one particular calculator, and say that the result was "right" or "wrong".

But how can we obtain information about this perfect and un-physical answer, when all that we can ever observe are merely physical phenomena? Even doing "mental" arithmetic just tells you about the result in your own, merely physical brain.

"Well," you say, "the pragmatic answer is that we can obtain extremely strong evidence by looking at the results of a hundred calculators, even if they are only 90% likely to be correct on any one occasion."
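A quick simulation of the hundred-calculator scenario, as a Python sketch under stated assumptions: each calculator errs with probability 10%, and an error is modeled (hypothetically) as flipping one random bit of the 11-bit result. The seed is arbitrary.

```python
import random
from collections import Counter

TRUE_PRODUCT = 19 * 103  # the idealized answer, 1957


def noisy_calculator(rng, p_error=0.10):
    """One error-prone calculator reading for 19 x 103.

    Assumed fault model: with probability p_error, one random bit of the
    11-bit result is flipped (1957 < 2**11, so any flip changes the value).
    """
    if rng.random() < p_error:
        return TRUE_PRODUCT ^ (1 << rng.randrange(11))
    return TRUE_PRODUCT


rng = random.Random(2008)  # arbitrary seed, for repeatability
readings = [noisy_calculator(rng) for _ in range(100)]

# The vote only looks at the readings themselves.
tally = Counter(readings)
plurality, votes = tally.most_common(1)[0]

# Counting disagreements requires the idealized answer, which no real observer has.
wrong = sum(r != TRUE_PRODUCT for r in readings)
print(f"{wrong} of 100 calculators disagree; plurality answer {plurality} with {votes} votes")
```

Note that the vote (`tally`) never consults `TRUE_PRODUCT`. The simulation needs the idealized answer to generate readings and to count disagreements, which is exactly the vantage point the text says we never have.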

But wait: When do electrons or quarks or magnetic fields ever make an "error"? If no individual particle can be mistaken, how can any collection of particles be mistaken? The concept of an "error", though humans may take it for granted, is hardly something that would be mentioned in a fully reductionist view of the universe.

Really, what happens is that we have a certain model in mind of the calculator—the model that we looked over and said, "This implements 19 * 103"—and then other physical events caused the calculator to depart from this model, so that the final outcome, while physically lawful, did not correlate with that mysterious abstract thing, and the other physical calculators, in the way we had in mind. Given our mistaken beliefs about the physical process of the first calculator, we would look at its output '1959', and make mistaken predictions about the other calculators (which do still hew to the model we have in mind).

So "incorrect" cashes out, naturalistically, as "physically departed from the model that I had of it" or "physically departed from the idealized question that I had in mind". A calculator struck by a cosmic ray is not 'wrong' in any physical sense, not an unlawful event in the universe; but the outcome is not the answer to the question you had in mind, the question that you believed, empirically falsely, the calculator would correspond to.

The calculator's "incorrect" answer, one might say, is an answer to a different question than the one you had in mind—it is an empirical fact about the calculator that it implements a different computation.
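The same point in a few lines of Python, as a sketch: the cosmic-ray hit is modeled (hypothetically) as flipping bit 1 of the result register, which happens to turn 1957 into exactly 1959. The struck device is a perfectly lawful, deterministic function; it is only in "error" relative to the model we believed it implemented.

```python
def model(a, b):
    """The idealized question we had in mind: plain multiplication."""
    return a * b


def struck_calculator(a, b):
    """A lawful physical process that multiplies, after which a cosmic ray
    flips bit 1 of the result register (a hypothetical fault model)."""
    return (a * b) ^ 0b10


def is_error(device, question, *inputs):
    """'Error' means: the device's output departs from the question we believed it computed."""
    return device(*inputs) != question(*inputs)


print(struck_calculator(19, 103))                               # 1959
print(is_error(struck_calculator, model, 19, 103))              # True
# Measured against the function it actually implements, it is never "wrong":
print(is_error(struck_calculator, struck_calculator, 19, 103))  # False
```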

• The 'right' act or the 'should' option sometimes seem to depend on the state of the physical world. For example, should you cut the red wire or the green wire to disarm the bomb?

Suppose I show you a long straight line of pebbles, and ask you, "How many pebbles would I have, if I had a rectangular array of six lines like this one?" You start to count, but only get up to 8 when I suddenly blindfold you.

Now you are not completely ignorant of the answer to this question. You know, for example, that the result will be even, and that it will be greater than 48. But you can't answer the question until you know how many pebbles were in the original line.

But mark this about the question: It wasn't a question about anything you could directly see in the world, at that instant. There was not in fact a rectangular array of pebbles, six on a side. You could perhaps lay out an array of such pebbles and count the results—but then there are more complicated computations that we could run on the unknown length of a line of pebbles. For example, we could treat the line length as the start of a Goodstein sequence, and ask whether the sequence halts. To physically play out this sequence would require many more pebbles than exist in the universe. Does it make sense to ask if the Goodstein sequence which starts with the length of this line of pebbles, "would halt"? Does it make sense to talk about the answer, in a case like this?

I'd say yes, personally.

But meditate upon the etherealness of the answer—that we talk about idealized abstract processes that never really happen; that we talk about what would happen if the law of the Goodstein sequence came into effect upon this line of pebbles, even though the law of the Goodstein sequence will never physically come into effect.

It is the same sort of etherealness that accompanies the proposition that 19 * 103 = 1957, a proposition which factors out of any particular physical calculator and is not identified with the result of any particular physical calculator.

Only now that etherealness has been mixed with physical things; we talk about the effect of an ethereal operation on a physical thing. We talk about what would happen if we ran the Goodstein process on the number of pebbles in this line here, which we have not counted—we do not know exactly how many pebbles there are. There is no tiny little XML tag upon the pebbles that says "Goodstein halts", but we still think—or at least I still think—that it makes sense to say of the pebbles that they have the property of their Goodstein sequence terminating.
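For concreteness, here is a minimal Python sketch of the Goodstein process. At each step, write the current value in hereditary base-b notation (exponents also written in base b, recursively), replace every b by b + 1, and subtract 1. Goodstein's theorem says every such sequence eventually reaches 0, but starting from 4 that takes astronomically many steps, so for most starting values the code can only print a prefix. The names `bump_base` and `goodstein` are this sketch's own.

```python
def bump_base(n, b):
    """Write n in hereditary base-b notation, then replace every b by b + 1."""
    result, exponent = 0, 0
    while n > 0:
        n, digit = divmod(n, b)
        if digit:
            # Exponents are themselves written in hereditary base b, so bump them too.
            result += digit * (b + 1) ** bump_base(exponent, b)
        exponent += 1
    return result


def goodstein(start, max_terms):
    """Return up to max_terms terms of the Goodstein sequence starting at `start`."""
    terms, n, base = [start], start, 2
    while n > 0 and len(terms) < max_terms:
        n = bump_base(n, base) - 1  # bump the base, then subtract 1
        base += 1
        terms.append(n)
    return terms


print(goodstein(3, 100))  # [3, 3, 3, 2, 1, 0]   reaches 0 quickly
print(goodstein(4, 5))    # [4, 26, 41, 60, 83]  only a prefix; the full run is astronomically long
```

The function is a complete specification of the process for any line length you feed it; what it cannot deliver, for a line of even four pebbles, is the final term within the lifetime of the universe.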

So computations can be, as it were, idealized abstract dynamics—idealized abstract applications of idealized abstract laws, iterated over an imaginary causal-time that could go on for quite a number of steps (as Goodstein sequences often do).

So when we wonder, "Should I cut the red wire or the green wire?", we are not multiplying or simulating the Goodstein process, in particular. But we are wondering about something that is not physically immanent in the red wires or the green wires themselves; there is no little XML tag on the green wire, saying, "This is the wire that should be cut."

We may not know which wire defuses the bomb, but say, "Whichever wire does in fact defuse the bomb, that is the wire that should be cut."

Still, there are no little XML tags on the wires, and we may not even have any way to look inside the bomb—we may just have to guess, in real life.

So if we try to cash out this notion of a definite wire that should be cut, it's going to come out as...

...some rule that would tell us which wire to cut, if we knew the exact state of the physical world...

...which is to say, some kind of idealized abstract process into which we feed the state of the world as an input, and get back out, "cut the green wire" or "cut the red wire"...

...which is to say, the output of a computation that would take the world as an input.
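A minimal Python sketch of that last step, with heavy simplifying assumptions: the "state of the world" is reduced to a hypothetical `BombState` recording which wire actually disarms the bomb, `should_cut` is the idealized rule, and `best_guess` is what an agent who cannot see inside the bomb might do, picking the act with the most probability behind it (one decision rule among many).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BombState:
    """A drastically simplified, hypothetical 'state of the world'."""
    defusing_wire: str  # the wire that actually disarms the bomb


def should_cut(world: BombState) -> str:
    """The idealized rule: given the true state of the world, name the act."""
    return world.defusing_wire


def best_guess(beliefs: dict) -> str:
    """What an agent who cannot look inside the bomb might do.

    `beliefs` maps each candidate defusing wire to its probability. Apply the
    idealized rule in every world considered possible, then pick the act with
    the most probability behind it.
    """
    support = {}
    for wire, p in beliefs.items():
        act = should_cut(BombState(defusing_wire=wire))
        support[act] = support.get(act, 0.0) + p
    return max(support, key=support.get)


print(should_cut(BombState(defusing_wire="green")))  # green
print(best_guess({"red": 0.3, "green": 0.7}))        # green
```

The answer to "which wire should be cut?" is fixed by `should_cut` applied to the true state, even when `best_guess` is all we can actually compute.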

• And finally I note that from the twin phenomena of moral agreement and moral error, we can construct the notion of moral disagreement.

This adds nothing to our understanding of "computation" as a Word of Power, but it's helpful in putting the pieces together.

Let's say that Bob and Sally are talking about an abstracted idealized dynamic they call "Enamuh".

Bob says "The output of Enamuh is 'Cut the blue wire'," and Sally says "The output of Enamuh is 'Cut the brown wire'."

Now there are several non-exclusive possibilities:

Either Bob or Sally could have committed an error in applying the rules of Enamuh—they could have done the equivalent of mis-multiplying known inputs.

Either Bob or Sally could be mistaken about some empirical state of affairs upon which Enamuh depends—the wiring of the bomb.

Bob and Sally could be talking about different things when they talk about Enamuh, in which case both of them are committing an error when they refer to Enamuh_Bob and Enamuh_Sally by the same name. (However, if Enamuh_Bob and Enamuh_Sally differ in the sixth decimal place in a fashion that doesn't change the output about which wire gets cut, Bob and Sally can quite legitimately gloss the difference.)

Or if Enamuh itself is defined by some other abstracted idealized dynamic, a Meta-Enamuh whose output is Enamuh, then either Bob or Sally could be mistaken about Meta-Enamuh in any of the same ways they could be mistaken about Enamuh. (But in the case of morality, we have an abstracted idealized dynamic that includes a specification of how it, itself, changes. Morality is self-renormalizing—it is not a guess at the product of some different and outside source.)
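These possibilities can be sketched in Python, purely as an illustration. Everything here is hypothetical: `enamuh` is a toy definition, a speaker is reduced to the dynamic they mean, their beliefs about the bomb, and the output they announce, and one definition is chosen as the reference. The Meta-Enamuh case is not modeled.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

World = Dict[str, str]  # hypothetical: facts about the bomb


def enamuh(world: World) -> str:
    """A toy stand-in for the shared dynamic (purely illustrative)."""
    return f"Cut the {world['defusing_wire']} wire"


@dataclass
class Speaker:
    name: str
    dynamic: Callable[[World], str]  # the dynamic this person actually means
    beliefs: World                   # what they believe about the bomb
    claim: str                       # the output they announce


def diagnose(s: Speaker, reference: Callable[[World], str], true_world: World) -> List[str]:
    """List the (non-exclusive) ways a speaker's claim can come apart from the reference."""
    found = []
    if s.dynamic(s.beliefs) != s.claim:
        found.append("misapplied the rules")      # mis-multiplied known inputs
    if s.beliefs != true_world:
        found.append("mistaken about the world")  # wrong about the wiring
    # Compare definitions only on the input that matters; dynamics that differ
    # elsewhere but agree here are the "sixth decimal place" case.
    if s.dynamic(true_world) != reference(true_world):
        found.append("means a different dynamic")
    return found


truth = {"defusing_wire": "blue"}
bob = Speaker("Bob", enamuh, truth, "Cut the blue wire")
sally = Speaker("Sally", enamuh, {"defusing_wire": "brown"}, "Cut the brown wire")
print(diagnose(bob, enamuh, truth))    # []
print(diagnose(sally, enamuh, truth))  # ['mistaken about the world']
```

Because the third check compares definitions only on the actual input, two dynamics that differ elsewhere but agree about this bomb pass it, matching the case above where Bob and Sally can gloss the difference.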

To sum up:

Morality, like computation, involves latent development of answers;
Morality, like computation, permits expected agreement of unknown latent answers;
Morality, like computation, reasons about abstract results apart from any particular physical implementation;
Morality, like computation, unfolds from bounded initial state into something potentially much larger;
Morality, like computation, can be viewed as an idealized dynamic that would operate on the true state of the physical world—permitting us to speak about idealized answers of which we are physically uncertain;
Morality, like computation, lets us speak of such un-physical stuff as "error", by comparing a physical outcome to an abstract outcome, presumably in a case where there was previously reason to believe or desire that the physical process was isomorphic to the abstract process, yet this was not actually the case.

And so with all that said, I hope that the word "computation" has come to convey something other than Microsoft Word.

Part of The Metaethics Sequence

Next post: "'Arbitrary'"

Previous post: "Moral Error and Moral Disagreement"


25 comments

Caledonian2

No individual particle can be mistaken as to its own behavior. No collection of particles can be mistaken as to its own behavior.

Whether those behaviors properly represent the properties we want them to is another matter. That's why we can say the calculator struck by the cosmic ray is malfunctioning, although all of its parts work perfectly according to physics. 'Physics' is not the set of standards we're referring to, when we speak of the device not working properly.

Will_Newsome

Of course, 'how we want the calculator to work' is just a stand-in that represents not a subgoal of our utility function, but a referent to something outside us we perceive as more objective than what we can see of ourselves. A broken calculator is not wrong because the number it spits out isn't the number we were hoping it would spit out when we started calculating our finances, nor because being misled about such a number would endanger our finances further. (That doesn't mean we should turn the universe into a big calculator, of course; unless it's a calculator that knows how to find and calculate the most objective and elegant calculations, and not just arbitrary addition. Then maybe.)

Richard4

Eliezer - that's all well and good, but what in the world do you think determines which computation or 'abstract idealized dynamic' a mortal human is actually referring to? Won't this be radically underdetermined?

You suggest that "Bob and Sally could be talking about different things when they talk about Enamuh". What's the difference between a world where they're talking about different things vs. a world where they are talking about the same thing but one of them is 'miscalculating'? What facts (about their dispositions and such) would determine which of the two explanations holds, on your view?

TGGP4

Calculators disagreeing seems much less common than people disagreeing. To me that is to be expected because people design calculators to answer mathematical questions for them. Humans themselves are only "designed" by natural selection to make more copies of genes.

The earliest calculators go pretty far back, before what we would now call a "computer". A long time ago I scoffed at the notion of a morality calculator. Would it be possible to build something like it without having achieved General AI?


Hopefully_Anonymous

"Someone sees a slave being whipped, and it doesn't occur to them right away that slavery is wrong. But they go home and think about it, and imagine themselves in the slave's place, and finally think, "No.""

I think lines like this epitomize how messy your approach to understanding human morality as a natural phenomenon is. Richard (the pro), what resources do you recommend I look into to find people taking a more rigorous approach to understanding the phenomenon of human morality (as opposed to promoting a certain type uncritically)?

Psy-Kosh

Richard: Which computation? Well... the computation your brain is, under the hood, performing when you're trying to figure out things about "what should I do?"

The full details of the computation may not be explicitly available to you, but if you're saying "the thing that your brain is processing when you're considering right&wrong/should&shouldn't isn't what you mean by should&shouldn't", then how could you even be said to mean anything by those words?

Infotropism2

"To physically play out this sequence would require many more pebbles than exist in the universe. Does it make sense to ask if the Goodstein sequence which starts with the length of this line of pebbles, "would halt"? Does it make sense to talk about the answer, in a case like this?

I'd say yes, personally."

On the other hand, you're an infinite set atheist. How do you distinguish between those two cases? In neither can it be said the process can exist in the physical universe, which is all there is.

Does it make more sense just because "infinite" really seems too, too big, while the Goodstein sequence merely seems "big"? Neither can exist in the physical universe; that is the property they share. Is that property, of physical implementation and physical observation, not all that matters in the end?

The same goes for a spaceship that disappears through the cosmological horizon of an expanding universe: it can no longer have any causal effect, but it still exists.

Can you explain why and how you feel confident that those processes still make sense in one case, yet not in the other? In a technical way.

GBM

Eliezer, this explanation finally puts it all together for me in terms of the "computation". I get it now, I think.

On the other hand, I have a question. Maybe this indicates that I don't truly get it; maybe it indicates that there's something you're not considering. In any case, I would appreciate your explanation, since I feel so close to understanding what you've been saying.

When I multiply 19 and 103, whether in my head, or using a pocket calculator, I get a certain result that I can check: In theory, I can gather a whole bunch of pebbles, lay them out in 103 rows of 19, and then count them individually. I don't have to rely on my calculator - be it internal or electronic.

When I compute morality, though, the only thing I have to examine is my calculator and a bunch of other ones. I would easily recognize that most calculators I come across will give the same answer to a moral question, at least to a limited number of decimal points. But I have no way of knowing whether those calculators are accurate representations of the world - that is, perhaps all of those calculators were created in a way that didn't reflect reality, and added ten to any result calculated.

If 90% of my calculators say 19 times 103 is equal to 1967, how do I determine that they are incorrect, without having the actual pebbles to count?

Richard4

HA - "what resources do you recommend I look into to find people taking a more rigorous approach to understanding the phenomenon of human morality"

If you're interested in the empirical phenomenon, I'm the wrong person to ask. (Maybe start with the SEP on moral psychology?) But on a philosophical level I'd recommend Peter Railton for a sophisticated naturalistic metaethic (that I respect a lot while not entirely agreeing with). He has a recent bloggingheads diavlog, but you can't go past his classic article 'Moral Realism' [here if you have jstor access].

Richard4

Psy-Kosh - "Well... the computation your brain is, under the hood, performing when you're trying to figure out things about "what should I do?""

That just pushes my question back a step. Don't the physical facts underdetermine what computation ('abstracted idealized dynamic') my brain might be interpreted as performing? It all depends how you abstract and idealize it, after all. Unless, that is, we think there's some brute (irreducible) facts about which are the right idealizations...

jsalvatier

Richard, I think the difference is that in a world where one of them is miscalculating, that person can be shown that they are miscalculating and will then calculate correctly. However, in a world where their idealized calculations are actually significantly different, they would simply become enemies.

Hopefully_Anonymous

Richard, Thanks, the SEP article on moral psychology was an enlightening read.

ShardPhoenix

It seems to me that moral reasoning is only a computation in the sense that all human thought processes are computations. In other words, I'm not sure how helpful this is for AI purposes, other than a reminder that such a thing is possible.

I'm not sure it's possible to extricate the complete underlying rules of human morality from all the other elements of human thought. I don't think it's necessarily impossible either, it just seems like we aren't much closer to the solution.

conchis

Don't the physical facts underdetermine what computation ('abstracted idealized dynamic') my brain might be interpreted as performing?

I would think they do in the same sense that the physical facts always underdetermine the computations that the universe is actually performing. That's obviously a problem with trying to implement anything - though how much of a problem depends on how robust your implementation is to having the wrong model: bridges still stand, even though we don't have a perfect model of the universe. But it doesn't strike me as a problem with the theory per se.

(I'm not sure whether you were suggesting it was.)

Toby_Ord2

Eliezer,

I agree with most of the distinctions and analogies that you have been pointing out, but I still doubt that I agree with your overall position. No-one here can know whether they agree with your position because it is very much underdetermined by your posts. I can have a go at formulating what I see as the strongest objections to your position if you clearly enunciate it in one place. Oddly enough, the philosophy articles that I read tend to be much more technically precise than your posts. I don't mean that you couldn't write more technically precise posts on metaethics, just that I would like you to.

In the same way as scientific theories need to be clear enough to allow concrete prediction and potential falsification, so philosophical theories need to be clear enough that others can use them without any knowledge of their author to make new claims about their subject matter. Many people here may feel that you have made many telling points (which you have), but I doubt that they understand your theory in the sense that they could apply it in a wide range of situations where it is applicable. I would love a short post consisting of at most a paragraph of introduction, then a bi-conditional linking a person's judgement about what another person should do in a given situation to some naturalistic facts and then a paragraph or two helping resolve any ambiguities. Then others can actually argue against it and absence of argument could start to provide some evidence in its favour (though of course, surviving the criticisms of a few grad-student philosophers would still not be all that much evidence).

Tyrrell_McAllister2

Eliezer, would the following be an accurate synopsis of what you call morality?

Each of us has an action-evaluating program. This should be thought of as a Turing machine encoded in the hardware of our brains. It is a determinate computational dynamic in our minds that evaluates the actions of agents in scenarios. By a scenario, I mean a mental model of a hypothetical or real situation. Now, a scenario that models agents can also model their action-evaluating programs. An evaluation of an action in a scenario is a moral evaluation if, and only if, the same action is given the same value in every scenario that differs from the first one only in that the agent performing the action has a different action-evaluating program.

In other words, moral evaluations are characterized by being invariant under certain kinds of modifications: Namely, modifications that consist only of assigning a different action-evaluating program to the agent performing the action.
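Tyrrell's proposed criterion can be sketched in Python to make the invariance explicit. This is only an approximation under stated assumptions: agent programs are represented by hypothetical string labels, and "every scenario that differs only in the agent's program" is replaced by a finite list of alternatives.

```python
def is_moral_evaluation(evaluate, scenario, alternative_programs):
    """Approximate Tyrrell's invariance test over a finite list of alternatives:
    the value must not change when only the acting agent's
    action-evaluating program is swapped for another."""
    baseline = evaluate(scenario)
    return all(
        evaluate({**scenario, "agent_program": program}) == baseline
        for program in alternative_programs
    )


# Hypothetical evaluators; "agent_program" is just a label in this sketch.
def condemns_whipping(scenario):
    return -10 if scenario["action"] == "whip a slave" else 0


def depends_on_agent_taste(scenario):
    likes_it = scenario["agent_program"] == "likes jazz"
    return 1 if likes_it and scenario["action"] == "play jazz" else 0


whipping = {"action": "whip a slave", "agent_program": "approves of slavery"}
print(is_moral_evaluation(condemns_whipping, whipping, ["opposes slavery", "indifferent"]))  # True

jazz = {"action": "play jazz", "agent_program": "likes jazz"}
print(is_moral_evaluation(depends_on_agent_taste, jazz, ["hates jazz"]))  # False
```

The first evaluator passes because its verdict ignores what the whipper's own program says; the second fails because its verdict tracks the agent's tastes.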

Does that capture the distinctive quality of moral evaluations that you've been trying to convey?

A few thoughts:

(1) It seems strange to me to consider moral evaluations, so defined, to be distinct from personal preferences. With this definition, I would say that moral evaluations are a special case of personal preferences. Specifically, they are the preferences that are invariant under a certain kind of modification to the scenario being considered.

I grant that it is valuable to distinguish this particular kind of personal preference. First, I can imagine that you're right when you say that it's valuable if one wants to build an AI. Second, it's logically interesting because this criterion for moral evaluation is a self-referential one, in that it stipulates how the action-evaluating program (doesn't) react to hypothetical changes to itself. Third, by noting this distinctive kind of action-evaluation, you've probably helped to explain why people are so prone to thinking that certain evaluations are universally valid.

Nonetheless, the point remains that your definition amounts to considering moral evaluation to be nothing more than a particular kind of personal preference. I therefore don't think that it does anything to ease the concerns of moral universalists. Some of your posts included very cogent explanations of why moral universalism is incoherent, but I think you would grant that the points that you raised there weren't particularly original. Moral-relativists have been making those points for a long time. I agree that they make moral universalism untenable, but moral universalists have heard them all before.

Your criterion for moral evaluation, on the other hand, is original (to the best of my meager knowledge). But, so far as the debate between moral relativists and universalists is concerned, it begs the question. It takes the reduction of morality to personal preference as given, and proceeds to define which preferences are the moral ones. I therefore don't expect it to change any minds in that debate.

(2) Viewing moral evaluations as just a special kind of personal preference, what reason is there to think that moral evaluations have their own computational machinery underlying them? I'm sure that this is something that you've thought a lot about, so I'm curious to hear your thoughts on this. My first reaction is to think that, sure, we can distinguish moral evaluations from the other outputs of our preference-establishing machinery, but that doesn't mean that special processes were running to produce the moral evaluations.

For example, consider a program that produces the natural numbers by starting with 1, and then producing each successive number by adding 1 to the previously-produced number. After this machine has produced some output, we can look over the tape and observe that some of the numbers produced have the special property of being prime. We might want to distinguish these numbers from the rest of the output for all sorts of good reasons. There is indeed a special, interesting feature of those numbers. But we should not infer that any special computational machinery produced those special numbers. The prime numbers might be special, but, in this case, the dynamics that produced them are the same as those that produced the non-special composite numbers.

Similarly, moral evaluations, as you define them, are distinguishable from other action-evaluations. But what reason is there to think that any special machinery underlies moral evaluations as opposed to other personal preferences?
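A minimal Python sketch of the counting program from point (2), assuming a tape of 20 numbers (an arbitrary choice); the function names are hypothetical. The only generating dynamic is the add-one loop. Primality is a property tested for afterwards, and the same loop produced the primes and the composites alike.

```python
def count_up(n):
    """Produce 1, 2, ..., n by repeatedly adding 1 to the previous number.

    This loop is the only generating dynamic; nothing in it knows about primes.
    """
    tape = []
    current = 1
    while len(tape) < n:
        tape.append(current)
        current += 1
    return tape


def is_prime(k):
    """Trial-division primality test, applied only after the tape is written."""
    if k < 2:
        return False
    d = 2
    while d * d <= k:
        if k % d == 0:
            return False
        d += 1
    return True


tape = count_up(20)

# Classify the output after the fact; the classification adds no new machinery
# to how the numbers were produced.
primes = [k for k in tape if is_prime(k)]
composites = [k for k in tape if k > 1 and not is_prime(k)]

print(primes)      # [2, 3, 5, 7, 11, 13, 17, 19]
print(composites)  # [4, 6, 8, 9, 10, 12, 14, 15, 16, 18, 20]
```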

(3) Since humans manage to differ in so many of their personal preferences, there seems little reason to think that they are nearly unanimous with regard to their moral evaluations. That is, I don't see how the distinguishing feature of moral evaluations (a particular kind of invariance) would make them less likely to differ from person to person, or from moment to moment within the same person. So I don't quite understand your strong reluctance to attribute different moral evaluations to different people.

Eliezer Yudkowsky

Toby, I'm not sure that I understand what you want me to do.

Especially as the main reason I don't dabble in mainstream philosophy is that I consider it too vague for AI purposes. For example, in classical causal decision theory, there's abstruse math done with a function p(x||y) (if I recall the notation correctly) that one is never told how to compute; it's taken as a primitive. Judea Pearl could have told them, but nobody seems to have felt the need to develop the theory further, since they already had what looked to them like math: lots of neat symbols. This kind of "precision" does not impress me.

In general, I am skeptical of dressing up ideas in math that don't deserve the status of math; I consider it academic status-seeking, and I try not to lay claim to such status when I don't feel I've earned it. But if you can say specifically where you're looking for precision, I can try to respond.
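As a hedged illustration of what Pearl offers here: in his framework, an interventional probability P(Y | do(X)) can be computed from a causal graph plus ordinary conditional probabilities, for instance with the back-door adjustment formula when the confounder Z is observed. Reading the p(x||y) primitive as this kind of quantity is an assumption, and the three-variable model and all of its numbers below are hypothetical.

```python
# Hypothetical toy causal model (all numbers invented for illustration):
#   Z -> X, Z -> Y, X -> Y, where Z is an observed binary confounder.
P_Z1 = 0.5                          # P(Z = 1)
P_X1_GIVEN_Z = {1: 0.8, 0: 0.2}     # P(X = 1 | Z = z)
P_Y1_GIVEN_XZ = {                   # P(Y = 1 | X = x, Z = z)
    (1, 1): 0.9, (1, 0): 0.5,
    (0, 1): 0.7, (0, 0): 0.3,
}


def p_z(z):
    return P_Z1 if z == 1 else 1 - P_Z1


def p_x_given_z(x, z):
    return P_X1_GIVEN_Z[z] if x == 1 else 1 - P_X1_GIVEN_Z[z]


def p_y1_given_x(x):
    """Observational P(Y=1 | X=x): Z is weighted by P(Z=z | X=x)."""
    joint = {z: p_x_given_z(x, z) * p_z(z) for z in (0, 1)}  # P(X=x, Z=z)
    p_x = sum(joint.values())
    return sum(P_Y1_GIVEN_XZ[(x, z)] * joint[z] / p_x for z in (0, 1))


def p_y1_do_x(x):
    """Interventional P(Y=1 | do(X=x)) via back-door adjustment:
    sum over z of P(Y=1 | X=x, Z=z) * P(Z=z)."""
    return sum(P_Y1_GIVEN_XZ[(x, z)] * p_z(z) for z in (0, 1))


print(round(p_y1_given_x(1), 3))  # 0.82
print(round(p_y1_do_x(1), 3))     # 0.7
```

The two numbers differ because conditioning on X=1 also shifts the odds of Z, while intervening on X leaves the distribution of Z untouched.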

TGGP4

Your definition of morality as computation seems to have very little to do with morality as actually practiced, as noticed by folks like Haidt (whom I mentioned here recently). Even professors of philosophy gussy up conclusions they arrived at via intuition, and they still admit that they hold those beliefs because of intuitions rather than arguments. Eliezer's imagined computation seems to have more to do with justification, which is done after we've already made up our minds, than with how people actually reach conclusions. I am very suspicious of the idea that a computer could emulate a process people don't actually engage in.

And did anybody find it suspicious that pretty much everybody was explaining what made morality_Bob defective (with plenty of different reasons for this hypothetical person) but nobody was providing any "computation"?

Caledonian2

TGGP, don't confuse performing computation with being able to make the computation explicit. Everything 'we' do is computed by our brains, but we can't even begin to describe the mathematics we perform constantly.

Saying that morality is a subset of computation is vacuously true. Everything minds do is a subset of computation.

Richard4

jsalvati - "I think the difference is that in a world where one of them is miscalculating, that person can be shown that they are miscalculating and will then calculate correctly."

This still won't do, due to path-dependence and such. Suppose Bob could be corrected in any number of ways, and each will cause him to adopt a different conclusion -- and one that he will then persist in holding no matter what other arguments you give him. Which conclusion is the true value for our original morality_Bob? There can presumably be no fact of the matter, on Eliezer's account. And if this sort of underdetermination is very common (which I imagine it is), then there are probably no facts at all about what any of our "moralities" are. There may always be some schedule of information that would bring us to make radically different moral judgments.

Also worrying is the implication that it's impossible to be stubbornly wrong. Once you become impervious to argument in your adoption of inconsistent moral beliefs, well, those contradictions are now apparently part of your true morality, which you're computing just fine(?).
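The path-dependence worry can be made concrete with a deliberately crude toy model. Nobody in the thread proposed it; the argument names and the rule that Bob locks in whatever he is persuaded of first are hypothetical simplifications. Under those assumptions, different orderings of the same arguments leave Bob with different final answers, so there is no single value that "morality_Bob" returns.

```python
from itertools import permutations

# Hypothetical toy model of path-dependent persuasion. Each argument pushes
# Bob toward one answer on a single question. The first argument he hears
# settles his view, and he is impervious to everything after that.
ARGUMENTS = {
    "appeal_to_empathy": "yes",
    "appeal_to_tradition": "no",
    "appeal_to_consequences": "yes",
}


def persuade(order):
    """Return Bob's final answer after hearing the arguments in `order`."""
    conclusion = None
    for arg in order:
        if conclusion is None:  # still open to persuasion
            conclusion = ARGUMENTS[arg]
        # once a conclusion is adopted, later arguments change nothing
    return conclusion


# Try every ordering of the same three arguments.
outcomes = {order: persuade(order) for order in permutations(ARGUMENTS)}
print(sorted(set(outcomes.values())))  # ['no', 'yes']
```

Under this rule, the two orderings that start with the tradition argument end at "no", and the other four end at "yes".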

Toby_Ord2

Eliezer,

I didn't mean that most philosophy papers I read have lots of mathematical symbols (they typically don't), and I agree with you that over-formalization can occur sometimes (though it is probably less common in philosophy than under-formalization). What I meant is the practice of clear and concise statements of the main points and attendant qualifications in the kind of structured English that good philosophers use. For example, I gave the following as a guess at what you might be meaning:

When X judges that Y should Z, X is judging that were she fully informed, she would want Y to Z

This allows X to be incorrect in her judgments (if she wouldn't want Y to Z when given full information). It allows for others to try to persuade X that her judgment is incorrect (it preserves a role for moral argument). It reduces 'should' to mere want (which is arguably simpler). It is, however, a conception of should that is judger-dependent: it could be the case that X correctly judges that Y should Z, while W correctly judges that Y should not Z.

The first line was a fairly clear and concise statement of a meta-ethical position (which you said you don't share, and nor do I for that matter). The next few sentences describe some of its nice features as well as a downside. There is very little technical language -- just 'judge', 'fully informed' and 'want'. In the previous comment I gave a sentence or two saying what was meant by 'fully informed' and if challenged I could have described the other terms. Given that you think it is incorrect, could you perhaps fix it, providing a similar short piece of text that describes your view with a couple of terms that can bear the brunt of further questioning and elaboration?

steven

I'm not Eliezer, nor am I a pro, but I think I agree with Eliezer's account, and as a first attempt I think it's something like this...

When X judges that Y should Z, X is judging that Z is the solution to the problem W, where W is a rigid designator for the problem structure implicitly defined by the machinery shared by X and Y which they both use to make desirability judgments. (Or at least X is asserting that it's shared.) Due to the nature of W, becoming informed will cause X and Y to get closer to the solution of W, but wanting-it-when-informed is not what makes that solution moral.

Eliezer Yudkowsky

What Steven said.
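One way to picture the "rigid designator" in steven's formulation is as a problem fixed once from a snapshot of the judge's machinery, contrasted with a scorer that keeps consulting whatever the judge wants now. This is a hypothetical Python sketch, not Eliezer's or steven's formalism: the weighted sum is only a stand-in for whatever W actually computes (Toby's point 2 questions whether any such weighting exists), and all names and numbers are invented.

```python
def make_problem(machinery):
    """Fix the problem W once, from a snapshot of the shared machinery.

    The returned scorer keeps referring to that private copy even if the
    machinery later changes: this stands in for the rigid reading.
    """
    frozen = dict(machinery)  # copy taken at definition time

    def score(option):
        # Stand-in aggregation (an assumption): weighted sum of features.
        return sum(frozen.get(k, 0) * v for k, v in option.items())

    return score


def best(options, score):
    """Name of the option with the highest score."""
    return max(options, key=lambda name: score(options[name]))


machinery = {"lives_saved": 10, "fun": 1}
options = {
    "rescue": {"lives_saved": 1, "fun": 0},
    "party": {"lives_saved": 0, "fun": 5},
}

W = make_problem(machinery)  # rigidly designates the problem as fixed today


def current_wants(option):
    """Non-rigid reading: always consults the machinery as it is now."""
    return sum(machinery.get(k, 0) * v for k, v in option.items())


machinery["lives_saved"] = 0  # the judge's wants later drift

print(best(options, W))              # rescue
print(best(options, current_wants))  # party
```

The rigid scorer still answers "rescue" after the judge's wants drift, which mirrors steven's claim that wanting-it-when-informed is not what makes the answer to W moral.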

Toby_Ord2

Great! Now I can see several points where I disagree or would like more information.

1) Is X really asserting that Y shares his ultimate moral framework (i.e. that they would converge given time and arguments etc)?

If Y is a psychopath murderer who will simply never accept that he shouldn't kill, can I still judge that Y should refrain from killing? On the current form, to do so would involve asserting that we share a framework, but even people who know this to be false can judge that he shouldn't kill, can't they?

2) I don't know what it means to be the solution to a problem. You say:

'I should Z' means that Z answers the question, "What will save my people? How can we all have more fun? How can we get more control over our own lives? What's the funniest jokes we can tell? ..."

Suppose Z is the act of saying "no". How does this answer the question (or 'solve the problem')? Suppose it leads you to have a bit less fun and others to have a bit more fun and generally has positive effects on some parts of the question and negative on others. How are these integrated? As you phrased it, it is clearly not a unified question and I don't know what makes one act rather than another an answer to a list of questions (when presumably it doesn't satisfy each one in the list). Is there some complex and not consciously known weighting of the terms? I thought you denied that earlier in the series. This part seems very non-algorithmic at the moment.

3) The interpretation says 'implicitly defined by the machinery ... which they both use to make desirability judgments'?

What if there is no such machinery that they both use? I thought only X's machinery counted here, as X is the judger.

4) You will have to say more about 'implicitly defined by the machinery ... use[d] to make desirability judgments'. This is really vague. I know you have said more on this, but never in very precise terms, just by analogy.

5) Is the problem W meant to be the endpoint of thought (i.e. the problem that would be arrived at), or is it meant to be the current idea which involves requests for self modification (e.g. 'Save a lot of lives, promote happiness, and factor in whatever things I have not thought of but could be convinced of.') It is not clear from the current statement (or indeed your previous posts), but would be made clear by a solution to (4).
