Wei_Dai comments on Convergence Theories of Meta-Ethics - Less Wrong
You are viewing a comment permalink. View the original post to see all comments and the full post content.
I think Omohundro's Basic AI Drives is a theory about AI behavior, not a meta-ethical theory (i.e., he's not talking about what's right, but what AIs will actually do). Also, you might want to footnote that Roko has changed his mind about UIV.
Changed his mind that it deals with ethics? Changed his mind about the behavioral prediction? In the sense that he no longer believes that it could happen, or in that he no longer believes that it must happen? A link to the 'retraction' would be appreciated.
Thanks for your response to my premature and partial posting. I look forward to hearing your response to the now-completed article.
I came back to give a bit more feedback, and noticed that Eliezer already made similar points. But I'll say my version anyway, since I already composed it in my head. :)
To me, this post as a whole is about AI behavior and economics, which are important subjects on their own, but not meta-ethics. Meta-ethics asks the question, "What is the nature of morality?" Why is that question interesting or important? One reason is, if I'm to design "my" AI, regardless of whether it will FOOM and take over the whole universe, or will have to fight with other AIs, or will peacefully share control of the universe with other AIs through bargaining, I still have to decide what values to give it initially, because those values will partly determine the outcome of the universe. (If it doesn't, I might as well not build "my" AI at all.) And it doesn't help to say "give it what values you want" because I don't know what I want.
I think what I want may have something to do with morality. Perhaps I'm wrong or just confused about that, but I doubt you're going to convince me I'm wrong, or resolve that confusion, by refusing to talk about morality and only talking about what AIs will do.
I have to apologize. Apparently my writing was extremely unclear. I wasn't refusing to talk about morality. The whole posting was an exploration of some of the properties of the relation "A is at least as good as B" when A and B are normative ethical systems.
Admittedly, I did not spend much time actually making ethical judgments; I was operating at the meta level.
But the whole point of my posting was that, if there is convergence (in the second sense), then those initial values may make very little difference in the outcome of the universe - that is, they may be important initially, but in the longer term the ethical system that is converged upon depends less on the seed ethics than on how AIs depend upon each other, how they reproduce, etc.
I'm very sorry that you missed this - the main thrust of the posting. If I had written more clearly, your response might have been a more productive disagreement about substance, rather than a complaint about the title.
But even if "make very little difference" is true, it's little in a relative sense, in that my initial utility function might just end up having just a billionth percent weight in the final merged AI. But in the absolute sense, the difference could still be huge. For an analogy, suppose it's certain that our civilization will never expand beyond the solar system, which will blow up in a few billion years no matter what. Then our values similarly make very little difference in the outcome of the universe in a relative sense but may still make a huge difference in an absolute sense (e.g. if we create a FOOMing singleton that just takes over the solar system).
Also, if I can figure out what I want, and the answer applies to and convinces many others, that could also make a big difference even in a relative sense.
The conjecture is that it is true in an absolute sense. It would have made no sense at all for me to even mention it if I had meant it in the relative sense that you set up here as a straw man and then knock down.
There is something odd going on here. Three very intelligent people are interpreting what I write quite differently than the way I intend it. Probably it is because I generated confusion by misusing words that have a fixed meaning here. And, in this case, it may be because you were thinking of our "fragility" conversation rather than the main posting. But, whatever the reason, I'm finding this very frustrating.
I guess I took your conjecture to be the "relative" one because whether or not it is true perhaps doesn't depend on details of one's utility function, and we, or at least I, was talking about whether the question "what do I want?" is an important one. I'm not sure how you hope to show the "absolute" version in the same way.
Well, Omohundro showed that a certain collection of instrumental values tends to arise independently of the 'seeded' intrinsic values. In fact, decision making tends to be dominated by consideration of these 'convergent' instrumental values rather than by the human-inserted seed values.
Next, consider that those human values themselves originated as heuristic approximations of instrumental values contributing to the intrinsic value of interest to our optimization process - natural selection. That we ended up with the particular heuristics we did is not explained by that process's intrinsic value being reproductive success - every species in the biosphere evolved under the guidance of that value. The reason humans ended up with values like curiosity, reciprocity, and toleration has to do with the environment in which we evolved.
So, my hope is that we can show that AIs will converge to human-like instrumental/heuristic values if they do their self-updating in a human-like evolutionary environment. Regardless of the details of their seeds.
That is the vision, anyways.
I notice that Robin Hanson takes a position similar to yours, in that he thinks things will turn out ok from our perspective if uploads/AIs evolve in an environment defined by certain rules (in his case property laws and such, rather than sexual reproduction).
But I think he also thinks that we do not actually have a choice between such evolution and a FOOMing singleton (i.e. FOOMing singleton is nearly impossible to achieve), whereas you think we might have a choice or at least you're not taking a position on that. Correct me if I'm wrong here.
Anyway, suppose you and Robin are right and we do have some leverage over the environment that future AIs will evolve in, and can use that leverage to predictably influence the eventual outcome. I contend we still have to figure out what we want, so that we know how to apply that leverage. Presumably we can't possibly make the AI evolutionary environment exactly like the human one, but we might have a choice over a range of environments, some more human-like than others. But it's not necessarily true that the most human-like environment leads to the best outcome. (Nor is it even clear what it means for one environment to be more human-like than another.) So, among the possible outcomes we can aim for, we'll still have to decide which ones are better than others, and to do that, we need to know what we want, which involves, at least in part, either figuring out what morality is, or showing that it's meaningless or otherwise unrelated to what we want.
Do you disagree on this point?
I tend toward FOOM skepticism, but I don't think it is "nearly impossible". Define a FOOM as a scenario leading in at most 10 years from the first human-level AI to a singleton which has taken effective control over the world's economy. I rate the probability of a FOOM at 40% assuming that almost all AI researchers want a FOOM and at 5% assuming that almost all AI researchers want to prevent a FOOM. I'm under the impression that currently a majority of singularitarians want a FOOM, but I hope that that ratio will fall as the dangers of a FOOMing singleton become more widely known.
No, I agree. Agree enthusiastically. Though I might change the wording just a bit. Instead of "we still have to figure out what we want", I might have written "we still have to negotiate what we want".
My turn now. Do you disagree with this shift of emphasis from the intellectual to the political?
So to summarize, your conclusion seems to be that we should build an arbitrary-goals AI as soon as possible.
Edit: Wrong, corrected here.
Huh? What exactly do you think you are summarizing? If you want to produce a cartoon version of my opinions on this thread, try "We should do all we can to avoid the FOOMing singleton scenario, instead trying to create a society of reproducing AIs, interlocked with each other and with humanity by a network of dependencies. If we do, the details of the initial goal systems may matter less than they would with a singleton."
I see, so "if there is convergence" is not a point of theoretical uncertainty, but something that depends on the way the AIs are built. Makes sense (as a position, not something I agree with).
Well, it is both. Convergence in the sense of "outcome is independent of the starting point" has not been proved for any AI/updating architecture. Also, I strongly suspect that the detailed outcome will depend quite a bit on the way AIs interact and produce successors/self-updates, even if the fact of convergence does not.
That reminds me of:
"An AGI raised in a box could become dangerously solipsistic, probably better to raise AGIs embedded in the social network..."
Goertzel's comment doesn't even make sense to me. Why is he placing 'in a box' in contraposition to 'embedded in the social network'? The two issues are orthogonal. AIs can be social or singleton - either in a box or in the real world. ETA: Well, if you mean the human social network, then I suppose a boxed AI cannot participate. Though I suppose we could let some simulated humans into the box to keep the AI company.
Besides, I've never really considered solipsists to be any more dangerous than anyone else.
"Now I will destroy the whole world - What a Bokononist says before committing suicide."
We don't have any half-decent simulated humans, though.
I'm not insisting on that suggestion, but I'm curious why you want to single that one out as something to object to.
I probably agree with Eliezer on the importance of preserving our human values near-exactly (though I might disagree with him about what those values are). But I don't see how preserving all of our values would be easier in the case of a FOOMing singleton than in the scenario I promote here - in which we have a collectively slow-FOOMing society of roughly-balanced-in-power AIs and (in the early stages) unenhanced humans. In fact, I think that preserving human values will be easier if we have a large number of independent advocates of those values - each of them possibly characterizing those values in slightly different ways.
Sorry, I deleted my comment because I wanted to think it over a bit more, whether the "value is fragile" criticism applies to your idea. I think it does after all.
Suppose there are two critical values, happiness and boredom. We want to be happy, but not by doing the same thing over and over again. So something like: U(unhappy boring future)=0, U(happy boring future)=10, U(unhappy non-boring future)=0, U(happy non-boring future)=100.
Now suppose Alice and Bob each creates an AI. Alice manages to preserve happiness, and Bob manages to preserve boredom. What happens when their AIs merge? The merged utility function would be, assuming they have equal bargaining power: U(unhappy boring future)=0, U(happy boring future)=50, U(unhappy non-boring future)=50, U(happy non-boring future)=100.
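To make the arithmetic explicit, here is a minimal sketch, assuming the merge is a plain equal-weight average of the two AIs' utility functions (the averaging rule is an assumption; the comment does not pin down the bargaining mechanics):

```python
# Minimal sketch of the merge described above, assuming an equal-weight
# average of the two utility functions (the averaging rule is an assumption;
# the comment does not specify the bargaining mechanics).

OUTCOMES = [
    ("unhappy", "boring"),
    ("happy", "boring"),
    ("unhappy", "non-boring"),
    ("happy", "non-boring"),
]

def u_alice(mood, novelty):
    """Alice's AI preserved only happiness."""
    return 100 if mood == "happy" else 0

def u_bob(mood, novelty):
    """Bob's AI preserved only (non-)boredom."""
    return 100 if novelty == "non-boring" else 0

for mood, novelty in OUTCOMES:
    merged = 0.5 * u_alice(mood, novelty) + 0.5 * u_bob(mood, novelty)
    print(f"U({mood} {novelty} future) = {merged:g}")

# Output: 0, 50, 50, 100 -- the two half-horrible futures each get half credit.
```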
Do you agree that's a problem?
Let me offer another example showing the virtues of my 'bargaining' approach to value robustness.
Suppose the 'true' human values require all four of (a) happiness, (b) excitement, (c) social interaction, and (d) 'meaningfulness' (whatever that is). But of five seed AIs, only one got this aspect of human values exactly right: (abcd) = 100, all other possibilities = 0. The other four each leave out one of the requirements - for example: (acd) = 100, any of a, c, or d missing = 0, b is irrelevant.
If these 5 AIs strike a bargain, they will assign: (abcd) = 100, any three out of four = 20, anything else = 0. A future with only three of the four essential values scores just 20 because four of the five AIs consider that situation unacceptable.
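The 100/20/0 figures fall out of a straight equal-weight (1/5) average over the five seed utility functions; here is a minimal sketch under that assumption (the equal weighting and the averaging rule are mine, not spelled out above):

```python
# Sketch of the five-AI bargain: one seed AI insists on all four requirements,
# and each of the other four drops exactly one. The merged utility is assumed
# to be an equal-weight average of the five seed utilities.

REQUIREMENTS = ("a", "b", "c", "d")  # happiness, excitement, social interaction, meaningfulness

# Each seed AI is represented by the set of requirements it insists on.
seed_ais = [set(REQUIREMENTS)] + [set(REQUIREMENTS) - {r} for r in REQUIREMENTS]

def seed_utility(needed, achieved):
    return 100 if needed <= achieved else 0   # <= is the subset test

def merged_utility(achieved):
    return sum(seed_utility(needed, achieved) for needed in seed_ais) / len(seed_ais)

print(merged_utility({"a", "b", "c", "d"}))  # 100.0
print(merged_utility({"a", "c", "d"}))       # 20.0 -- only the AI that ignores 'b' is satisfied
print(merged_utility({"a", "b"}))            # 0.0
```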
So I tend to think that the bargaining dynamic tends to "robustly preserve fragility of values", if that is not too much of an oxymoron.
That was unfortunate, because the Value is Fragile issue is important in this discussion regardless of whether it is more of an issue for CEV or my suggestion.
Well, that merged utility function is certainly less than ideal. Presumably we would prefer that (unhappy non-boring) and (happy boring) had been assigned utilities of zero, like (unhappy boring). However, I will point out that if the difference between an acceptable future and a horrible one is only 100 utils, then a 50-util penalty also ought to be enough to prevent those half-horrible futures. Furthermore, a Nash bargain is characterized by both a composite utility function and a fairness constraint. (That is, a collective behaving in conformance with a Nash bargain is not precisely rational. It might split its charitable giving between two charities, for example.) That fairness constraint provides a second incentive driving the collective away from those mixed futures.
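For what it's worth, under one standard reading of the Nash bargain - pick the outcome that maximizes the product of the two parties' gains over the disagreement point, here assumed to be zero for both - the bargain does steer away from the mixed futures in Wei Dai's example:

```python
# Toy check of the Nash-bargain point, using the Alice/Bob example above.
# Assumptions (mine): the disagreement point is (0, 0) and the bargain selects
# the pure outcome that maximizes the product of the two utilities.

outcomes = {
    ("unhappy", "boring"):     (0, 0),      # (Alice's utility, Bob's utility)
    ("happy", "boring"):       (100, 0),
    ("unhappy", "non-boring"): (0, 100),
    ("happy", "non-boring"):   (100, 100),
}

best = max(outcomes, key=lambda o: outcomes[o][0] * outcomes[o][1])
print(best)  # ('happy', 'non-boring') -- the only outcome with a positive Nash product
```

Under this criterion the two half-horrible futures have a Nash product of zero, so they lose outright to the happy non-boring outcome; that is the sense in which the bargaining structure, and not just the averaged composite utility, pushes away from them.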
However, when presenting an example intended to point out the flaws in one proposal, it is usually a good idea to see how the other proposals do on that example. In this case, it seems that the CEV version of this example might be a seed AI which is created by Alice OR Bob. It is either boring or unhappy, but not both, with a coin flip deciding which.
It seems probable. Multiple humans seem pretty likely to be preserved due to their historical value - if for no other reason.
I noticed that you found an archived copy of Roko's description of UIV. I believe Roko originally thought that his theory implied that we didn't have to worry too much about the terminal values of the AIs we create, that things will turn out OK due to UIV. Unfortunately he keeps deleting his old writings, so I'm going on memory. I'm not sure exactly how he changed his mind, but I think he now believes we do have to worry about the terminal values.