leplen comments on Open Thread, January 1-15, 2013 - Less Wrong
I would also like to see this discussion. It isn't terribly clear to me why the extinction of the human race and its replacement with some non-human AI is an inherently bad outcome. Why keep around and devote resources to human beings, who can at best be seen as a sort of prototype of true intelligence, given that intelligence isn't really what they were designed for?
While imagining our extinction at the hands of our robot overlords seems unpleasant, if you imagine a gradual cyborg evolution to a post-human world, that seems scary, but not morally objectionable. Besides the Ship of Theseus, what's the difference?
A long time ago, a different person who also happens to be named “Eliezer Yudkowsky” said that, in the event of a clash between human beings and superintelligent AIs, he would side with the latter. The Yudkowsky we all know rejects this position, though it is not clear to me why.
"Superintelligent AIs" is not one thing; it's a class of quadrillions of different possible things. The old Eliezer was probably thinking of one thing when he referred to superintelligences. When you realize that SAIs are a category of beings with more potential diversity than all species that have ever lived, it's hard to side with them all as a group. You'd have to have poor aesthetics to value them all equally.
Thanks for the clarification. My understanding is that (the current) Eliezer doesn't merely claim that we shouldn't value all superintelligent AIs equally; he makes the much stronger claim that, in a conflict between humans and AIs, we should side with the former regardless of what kind of AI is actually involved in this conflict. This stronger claim seems much harder to defend precisely in light of the fact that the space of possible AIs is so vast. Surely there must be some AIs in this heterogeneous group whose survival is preferable to that of creatures like us?
I don't think he makes that claim: all of his arguments on the topic that I've seen mainly refer to the kinds of AIs that seem likely to be built by humans at this time, not hypothetical AIs that could be genuinely better than us in every regard. E.g. here:
That's helpful. I take it, then, that "friendly" AIs could in principle be quite hostile to actual human beings, even to the point of causing the extinction of every person alive. If this is so, I think it's misleading to use the locution 'friendly AI' to designate such artificial agents, and am inclined to believe that many folks who are sympathetic to the goal of creating friendly AI wouldn't be if they knew what was actually meant by that expression.
That doesn't sound quite right either, given Eliezer's unusually strong anti-death preferences. (Nor do I think most other SI folks would endorse it; I wouldn't.)
ETA: Friendly AI was also explicitly defined as "human-benefiting" in e.g. Creating Friendly AI:
Even though Eliezer has declared CFAI as outdated, I don't think that particular bit is.
Not "that doesn't sound quite right", but "that's completely wrong". Friendly AI is defined as "human-benefiting, non-human harming".
I would say that the defining characteristic of Friendly AI, as the term is used on LW, is that it optimizes for human values.
On this view, if it turns out that human values prefer that humans be harmed, then Friendly AI harms humans, and we ought to prefer that it do so.
That's not the proper definition... Friendly AI, according to current guesses/theory, would be an extrapolation of human values. The extrapolation part is everything. I encourage you to check out that linked document, the system it defines (though just a rough sketch) is what is usually meant by "Friendly AI" around here. No one is arguing that "human values" = "what we absolutely must pursue". I'm not sure that creating Friendly AI, a machine that helps us, should be considered as passing a moral judgment on mankind or the world. At least, it seems like a really informal way of looking at it, and probably unhelpful as it's imbued with so much moral valence.
Let's backtrack a bit.
I said:
Kaj replied:
I then said:
But now you reply:
It would clearly be wishful thinking to assume that the countless forms of AIs that "could be genuinely better than us in every regard" would all act in friendly ways towards humans, given that acting in other ways could potentially realize other goals that these superior beings might have.
As I understand Eliezer's current position, it is that the right thing to optimize the universe for is the set of things humans collectively value (aka "CEV(humanity)").
On this account the space of all possible optimizing systems (aka "AIs" or "AGIs") can be divided into two sets: those which optimize for CEV(humanity) (aka "Friendly AIs"), and those which optimize for something else (aka "Unfriendly AIs").
And Friendly AIs are the right thing to "side with", as you put it here, because CEV(humanity) is on this account the right thing to optimize for.
On this account, "why side with Friendly AI over Unfriendly?" is roughly equivalent to asking "why do the right thing?"
The survival of creatures like us is entirely beside the point. Maybe CEV(humanity) includes the survival of creatures like us and maybe it doesn't.
Now, you might ask, why is CEV(humanity) the right thing to optimize the universe for, as opposed to something else? To which I think Eliezer's reply is that this is simply what it means to be right; things are right insofar as they correspond to what humans collectively value.
Some people (myself among them) find this an unconvincing argument. That said, I don't think anyone has made a convincing argument that some specific other thing is better to optimize for, either.
No. The argument is more like that there's no source of complex value in the world besides humans, and writing complex values line by line would take thousands of years, so we are forced to use some combination and/or extrapolation of human values, whether we want to or not.
Hm.
If you have citations for EY articulating the idea that writing superior nonhuman values would take too long to do, rather than that it's fundamentally incoherent, I'd be interested. This would completely change my understanding of the whole Metaethics Sequence.
Whole brain emulation would basically be "copying" human values in a machine, and would demonstrate that "writing" human values is possible. You could then edit a couple morally relevant bits, and you'd be demonstrating that you could "create" a human-like but slightly edited morality. Evaluating whether it is "superior" by some metric would be a whole additional exercise, though.
I don't think the metaethics sequence implies that writing down values is impossible, just that human values are very complex and messy.
Sure, if we drop the idea of "superior," I agree completely that it's possible (in principle) to write a set of values, and that the metaethics sequence does not imply otherwise.
And, also, it implies -- well, it asserts -- that human values are very complex and messy, as you say.
IIRC, it also asserts that human values are right. Which is why I think that on EY's view, evaluating whether the "edited morality" you describe here is superior to human values is not just an additional exercise, but an unnecessary (and perhaps incoherent) one. On his view, I think we can know a priori that it isn't.
Actually, now that I think about it more... when you say "there's no source of complex value in the world besides humans", do you mean to suggest that aliens with equally complex incompatible values simply can't exist, or that if they did exist EY's conclusions would change in some way to account for them?
I believe that EY definitively rejected the idea of there being an objective morality back in 2003 or thereabouts. Unless I am forgetting something from the metaethics sequence.
The whole point of CEV is to create a "superior" morality, though I think that's too value-loaded a word to use; the better word is "extrapolated". The whole idea of Friendly AI is to create a moral agent that continues to progress. So I'm not sure why you claim that EY considers moral self-evaluation in AI unnecessary. Isn't comparing possible, "better" moralities to the current morality essential to the definition of "moral progress", and therefore indispensable to building a Friendly AI?
To respond to your last statement, no to both. Of course aliens with equally complex incompatible values can exist, and I'm sure they do in some faraway place. Those aliens don't live here, though, so I'm not sure why we'd want to build a Friendly AI for their values rather than our own. The idea of building a Friendly AI is to ensure some kind of "metamoral continuity" through the intelligence explosion.
Not clear why? Because he likes people and doesn't want everyone he knows (including himself), everyone he doesn't know, and any potential descendants of either to die? Doesn't that sound like a default position? Most people don't want their species to go extinct.
Yeah, that's an interesting question. I'll offer a conjecture.
From my understanding, one of the fundamental assumptions of FAI is that there is a stable moral attractor for every AI somewhere in the local neighborhood of its original goals, or perhaps only that such an attractor is possible. No matter how intelligent the machine gets, no matter how many times it improves itself, it will consciously attempt to stay in the local neighborhood of this point (à la the Gandhi murder pill analogy).
If an AI is designed with a moral attractor that is essentially random, and thus probably totally antithetical to human values (such as paperclip manufacture), then it's hard to be on the side of the machines. Giving control of the world over to machine super-intelligences sounds like an okay idea if you imagine them growing, doing science, populating the universe, etc., but if they just tear apart the world to make paperclips in an exceptionally clever manner, then perhaps it isn't such a good idea. That is to say, if the machines use their intelligence to derive their morality, then siding with the machines is all well and good; but if their morality is programmed from the start, and the machines are merely exceptionally skilled morality executors, then there's no good reason to be on the side of the machines just because they execute their random morality much more effectively.
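The Gandhi-pill intuition behind this can be made concrete with a toy sketch (all names here are hypothetical, invented for illustration; this is not anyone's actual proposal): an agent that evaluates proposed self-modifications using its *current* utility function will reject any modification that changes its values, because optimizing for anything else scores poorly by the values it holds now. Its goals thus act as a stable attractor under self-improvement.

```python
# Toy sketch of goal stability under self-modification (the "Gandhi murder
# pill" intuition). All names are hypothetical illustrations.

def paperclip_utility(world):
    """The agent's current values: it cares only about paperclips."""
    return world["paperclips"]

def human_utility(world):
    """A rival, 'nicer' utility function someone might offer the agent."""
    return world["happy_humans"]

def forecast(utility):
    """Crude prediction of the world that results from optimizing a utility."""
    if utility is paperclip_utility:
        return {"paperclips": 1000, "happy_humans": 0}
    return {"paperclips": 0, "happy_humans": 1000}

class Agent:
    def __init__(self, utility):
        self.utility = utility

    def consider_modification(self, new_utility):
        # Key step: both outcomes are scored by the CURRENT utility
        # function, not by the candidate replacement.
        keep = self.utility(forecast(self.utility))
        switch = self.utility(forecast(new_utility))
        if switch > keep:
            self.utility = new_utility  # only if it serves current goals

agent = Agent(paperclip_utility)
agent.consider_modification(human_utility)
# The "better" values are rejected: by paperclip lights, adopting them
# forfeits 1000 paperclips, so the agent keeps its original goals.
assert agent.utility is paperclip_utility
```

On this picture the attractor isn't mysterious: it falls out of the agent judging every change by the values it already has, which is why the initial goal specification matters so much.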
I am fairly hesitant to agree with the idea of the moral attractor, along with the goals of FAI in general. I understand the idea only through analogy, which is to say not at all, and I have little idea what would dictate the peaks and valleys of a moral landscape, or even the coordinates really. It also isn't clear to me that a machine of such high intelligence would be incapable of forming new value systems, and perhaps discarding its preference for paper clips if there was no more paper to clip together.
While I'm exploring a very wide hypothesis space here about a person I know essentially nothing about, this sort of reasoning is at least consistent with what appears to be the thinking that undergirds work on FAI.
It also raises a very interesting question, which is perhaps more fundamental: whether moral preferences are a function of intelligence or not. If so, then beings far more intelligent than us would presumably be more moral, and would have a reasonable claim to our moral support. If not, then they're simply more clever and more powerful, and neither is a particularly good reason to welcome our robot overlords.
An idea I just had, which I'm sure others have considered, but will merely note here: a recursively self-modifying AI would be subject to Darwinian evolution, with lines of code analogous to individual genes, and indeed, if there is a stable attractor for such an AI, it seems likely to be about as moral as evolution, which is not particularly encouraging.