Eliezer_Yudkowsky comments on Should I believe what the SIAI claims? - Less Wrong
Like what? Why he should believe in exponential growth? When by "exponential" he actually means "fast" and no one at SIAI actually advocates for exponentials, those being a strictly Kurzweilian obsession and not even very dangerous by our standards? When he picks MWI, of all things, to accuse us of overconfidence (not "I didn't understand that" but "I know something you don't about how to integrate the evidence on MWI, clearly you folks are overconfident")? When there are lots of little things like that scattered through the post ("I'm engaging in pluralistic ignorance based on Charles Stross's nonreaction"), it doesn't make me want to plunge into engaging the many different little "substantive" parts, get back more replies along the same line, and recapitulate half of Less Wrong in the process. The first thing I need to know is whether XiXiDu did the reading and the reading failed, or whether he didn't do the reading. If he didn't do the reading, then my answer is simply, "If you haven't done enough reading to notice that Stross isn't in our league, then of course you don't trust SIAI." That looks to me like the real issue. For substantive arguments, pick a single point and point out where the existing argument fails on it; don't throw a huge handful of small "huh?"s at me.
Castles in the air. Your claims are based on long chains of reasoning that you do not write down in a formal style. Is the probability of correctness of each link in that chain of reasoning so close to 1 that their product is also close to 1?
I can think of a couple of ways you could respond:
Yes, you are that confident in your reasoning. In that case you could explain why XiXiDu should be similarly confident, or why it's not of interest to you whether he is similarly confident.
It's not a chain of reasoning, it's a web of reasoning, and it's robust against certain arguments being off. If that's the case, then we lay readers might benefit if you made more specific and relevant references to your writings, depending on context, instead of encouraging people to read the whole thing before bringing criticisms.
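To make the arithmetic behind these two options concrete, here is a minimal Python sketch. It assumes, purely for illustration, that the links (or lines of support) are independent, and the per-link probabilities are made-up numbers, not estimates anyone in this thread gave. A chain needs every link to hold, so the probabilities multiply; a web in the second sense survives as long as at least one independent line of support holds.

```python
from math import prod


def chain_confidence(link_probs):
    """Conjunctive chain: the conclusion needs every link to hold.
    Assumes the links are independent, so their probabilities multiply."""
    return prod(link_probs)


def web_confidence(support_probs):
    """Disjunctive web: the conclusion survives if at least one
    independent line of support holds (a simplifying assumption)."""
    p_all_fail = prod(1 - p for p in support_probs)
    return 1 - p_all_fail


# Hypothetical numbers: ten links, each 90% likely to be right.
print(round(chain_confidence([0.9] * 10), 3))  # 0.349
# Hypothetical numbers: three independent supports, each 90% likely.
print(round(web_confidence([0.9, 0.9, 0.9]), 3))  # 0.999
```

The same 90% figure gives very different overall confidence depending on whether the supports are conjunctive or redundant, which is why it matters which of the two structures an argument actually has. Real arguments are rarely fully independent in either direction, so treat this as a bound-setting toy, not a model of any particular argument.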
Most of the long arguments are concerned with refuting fallacies and defeating counterarguments, which flawed reasoning will always be able to supply in infinite quantity. The key predictions, when you look at them, generally turn out to be antipredictions, and the long arguments just defeat the flawed priors that concentrate probability into anthropomorphic areas. The positive arguments are simple, only defeating complicated counterarguments is complicated.
"Fast AI" is simply "Most possible artificial minds are unlikely to run at human speed, the slow ones that never speed up will drop out of consideration, and the fast ones are what we're worried about."
"UnFriendly AI" is simply "Most possible artificial minds are unFriendly, most intuitive methods you can think of for constructing one run into flaws in your intuitions and fail."
MWI is simply "Schrodinger's equation is the simplest fit to the evidence"; there are people who think that you should do something with this equation other than taking it at face value, like arguing that gravity can't be real and so needs to be interpreted differently, and the long arguments are just there to defeat them.
The only argument I can think of that actually approaches complication is about recursive self-improvement, and even there you can say "we've got a complex web of recursive effects and they're unlikely to turn out exactly exponential with a human-sized exponent", the long arguments being devoted mainly to defeating the likes of Robin Hanson's argument for why it should be exponential with an exponent that smoothly couples to the global economy.
This should be revamped into a document introducing the sequences.
One problem I have with your argument here is that you appear to be saying that if XiXiDu doesn't agree with you, he must be stupid (the stuff about low g etc.). Do you think Robin Hanson is stupid too, since he wasn't convinced?
This is quite helpful, and suggests that what I wanted is not a lay-reader summary, but an executive summary.
I brought this up elsewhere in this thread, but the fact that quantum mechanics and gravity are not reconciled suggests that even Schrodinger's equation does not fit the evidence. The "low-energy" disclaimer one has to add is very weird, maybe weirder than any counterintuitive consequences of quantum mechanics.
It's not the Schrödinger equation alone that gives rise to decoherence and thus many-worlds. (Read Good and Real for another toy model, the "quantish" system.) The EPR experiment and Bell's inequality can be made to work on macroscopic scales, so we know that whatever mathematical object the universe will turn out to be, it's not going to go un-quantum on us again: it has the same relevant behavior as the Schrödinger equation, and accordingly MWI will be the best interpretation there as well.
Speaking of executive summaries, will you offer one for your metaethics?
"There is no intangible stuff of goodness that you can divorce from life and love and happiness in order to ask why things like that are good. They are simply what you are talking about in the first place when you talk about goodness."
And then the long arguments are about why your brain makes you think anything different.
This is less startling than your more scientific pronouncements. Are there any atheists reading this that find this (or at first found this) very counterintuitive or objectionable?
I would go further, and had the impression from somewhere that you did not go that far. Is that accurate?
I'm a cognitivist. Sentences about goodness have truth values after you translate them into being about life and happiness etc. As a general strategy, I make the queerness go away, rather than taking the queerness as a property of a thing and using it to deduce that the thing does not exist; it's a confusion to resolve, not an existence to argue over.
To be clear, if sentence X about goodness is translated into sentence Y about life and happiness etc., does sentence Y contain the word "good"?
Edit: What's left of religion after you make the queerness go away? Why does there seem to be more left of morality?
No, nothing, and because while religion does contain some confusion, after you eliminate the confusion you are left with claims that are coherent but false.
I can do that:
Morality is a specific set of values (or, more precisely, a specific algorithm/dynamic for judging values). Humans happen to be (for various reasons) the sort of beings that value morality as opposed to valuing, say, maximizing paperclip production. It is indeed objectively better (by which we really mean "more moral"/"the sort of thing we should do") to be moral than to be paperclipish. And indeed we should be moral, where by "should" we mean "more moral".
(And "moral", when we actually cash out what we mean by it, seems to translate to a complicated blob of values like happiness, love, creativity, novelty, self-determination, fairness, life (as in protecting thereof), etc.)
It may appear that paperclip beings and moral beings disagree about something, but not really. The paperclippers, once they've analyzed what humans actually mean by "moral", would agree: "Yep, humans are more moral than us. But who cares about this morality stuff, it doesn't maximize paperclips!"
Of course, screw the desires of the paperclippers; after all, they're not actually moral. We really are objectively better (once we think carefully about what we mean by "better") than them.
(Note: "does something actually do a good job of fulfilling a certain value, or not?" is an objective question. For example, "does a particular action tend to increase the expected number of paperclips?" (on the paperclipper side) or, on our side, "does a particular action tend to save more lives, increase happiness, increase fairness, add novelty..." and so on, is an objective question in that we can extract specific meaning from it and judge the answer objectively, in a way the paperclippers would agree with. It simply happens that we're the sorts of beings that actually care about the answer to that (as we should be), while the screwy hypothetical paperclippers are immoral and only care about paperclips.)
How's that? Does that make sense? Or, to summarize the summary: "Morality is objective, and we humans happen to be the sorts of beings that value morality, as opposed to valuing something else instead."
Is morality actually:
If it's 1, can we say something interesting and non-trivial about the algorithm, besides the fact that it's an algorithm? In other words, everything can be viewed as an algorithm, but what's the point of viewing morality as an algorithm?
If it's 2, why do we think that two people on opposite sides of the Earth are referring to the same complicated blob of values when they say "morality"? I know the argument about the psychological unity of humankind (not enough time for significant genetic divergence), but what about cultural/memetic evolution?
I'm guessing the answer to my first question is something like, morality is an algorithm whose current "state" is a complicated blob of values like happiness, love, ... so both of my other questions ought to apply.
Wei_Dai:
You don't even have to do any cross-cultural comparisons to make such an argument. Considering the insights from modern behavioral genetics, individual differences within any single culture will suffice.
There is no reason to be at all tentative about this. There's tons of cog sci data about what people mean when they talk about morality. It varies hugely (but predictably) across cultures.
Why are you using algorithm/dynamic here instead of function or partial function? (On what space, I will ignore that issue, just as you have...) Is it supposed to be stateful? I'm not even clear what that would mean. Or is function what you mean by #2? I'm not even really clear on how these differ.
You might have gotten confused because I quoted Psy-Kosh's phrase "specific algorithm/dynamic for judging values" whereas Eliezer's original idea I think was more like an algorithm for changing one's values in response to moral arguments. Here are Eliezer's own words:
Others have pointed out that this definition is actually quite unlikely to be coherent: people would be likely to be ultimately persuaded by different moral arguments and justifications if they had different experiences and heard arguments in different orders etc.
Yes, see here for an argument to that effect by Marcello and subsequent discussion about it between Eliezer and myself.
I think the metaethics sequence is probably the weakest of Eliezer's sequences on LW. I wonder if he agrees with that, and if so, what he plans to do about this subject for his rationality book.
This is currently at +1. Is that from Yudkowsky?
(Edit: +2 after I vote it up.)
This makes sense in that it is coherent, but it is not obvious to me what arguments would be marshaled in its favor. (Yudkowsky's short formulations do point in the direction of their justifications.) Moreover, the very first line, "morality is a specific set of values," and even its parenthetical expansion (algorithm for judging values), seems utterly preposterous to me. The controversies between human beings about which specific sets of values are moral, at every scale large and small, are legendary beyond cliche.
It is a common thesis here that most humans would ultimately have the same moral judgments if they were in full agreement about all factual questions and were better at reasoning. In other words, human brains have a common moral architecture, and disagreements are at the level of instrumental, rather than terminal, values and result from mistaken factual beliefs and reasoning errors.
You may or may not find that convincing (you'll get to the arguments regarding that if you're reading the sequences), but assuming that is true, then "morality is a specific set of values" is correct, though vague: more precisely, it is a very complicated set of terminal values, which, in this world, happens to be embedded solely in a species of minds who are not naturally very good at rationality, leading to massive disagreement about instrumental values (though most people do not notice that it's about instrumental values).
It is? That's a worry. Consider this a +1 for "That thesis is totally false and only serves signalling purposes!"
I... think it is. Maybe I've gotten something terribly wrong, but I got the impression that this is one of the points of the complexity of value and metaethics sequences, and I seem to recall that it's the basis for expecting humanity's extrapolated volition to actually cohere.
This whole area isn't covered all that well (as Wei noted). I assumed that CEV would rely on solving an implicit cooperation problem between conflicting moral systems. It doesn't appear at all unlikely to me that some people are intrinsically selfish to some degree and their extrapolated volitions would be quite different.
Note that I'm not denying that some people present (or usually just assume) the thesis you present. I'm just glad that there are usually others who argue against it!
Now this is a startling claim.
Be more specific!
Maybe it's true if you also specify "if they were fully capable of modifying their own moral intuitions." I have an intuition (an unexamined belief? a hope? a sci-fi trope?) that humanity as a whole will continue to evolve morally and roughly converge on a morality that resembles current first-world liberal values more than, say, Old Testament values. That is, it would converge, in the limit of global prosperity and peace and dialogue, and assuming no singularity occurs and the average lifespan stays constant. You can call this naive if you want to; I don't know whether it's true. It's what I imagine Eliezer means when he talks about "humanity growing up together".
This growing-up process currently involves raising children, which can be viewed as a crude way of rewriting your personality from scratch, and excising vestiges of values you no longer endorse. It's been an integral part of every culture's moral evolution, and something like it needs to be part of CEV if it's going to actually converge.
That's not plausible. That would be some sort of objective morality, and there is no such thing. Humans have brains, and brains are complicated. You can't have them imply exactly the same preference.
Now, the non-crazy version of what you suggest is that preferences of most people are roughly similar, that they won't differ substantially in major aspects. But when you focus on detail, everyone is bound to want their own thing.
Psy-Kosh:
It makes sense in its own terms, but it leaves the unpleasant implication that morality differs greatly between humans, at both individual and group level -- and if this leads to a conflict, asking who is right is meaningless (except insofar as everyone can reach an answer that's valid only for himself, in terms of his own morality).
So if I live in the same society with people whose morality differs from mine, and the good-fences-make-good-neighbors solution is not an option, as it often isn't, then who gets to decide whose morality gets imposed on the other side? As far as I see, the position espoused in the above comment leaves no other answer than "might is right." (Where "might" also includes more subtle ways of exercising power than sheer physical coercion, of course.)
*blinks* How did I imply that morality varies? I thought (or was trying to imply) that morality is an absolute standard and that humans simply happen to be the sort of beings that care about the particular standard we call "morality". (Well, with various caveats: we aren't sufficiently reflective to fully and explicitly state our "morality algorithm", nor do we fully know all its consequences.)
However, when humans and paperclippers interact, well, there will probably be some sort of fight if we don't end up with some sort of PD cooperation or whatever. It's not that paperclippers and humans disagree on anything; it's simply that they value paperclips a whole lot more than lives. We're sort of stuck with having to act in a way that prevents the hypothetical them from acting on that.
(of course, the notion that most humans seem to have the same underlying core "morality algorithm", just disagreeing on the implications or such, is something to discuss, but that gets us out of executive summary territory, no?)
Psy-Kosh:
I would say that it's a crucial assumption, which should be emphasized clearly even in the briefest summary of this viewpoint. It is certainly not obvious, to say the least. (And, for full disclosure, I don't believe that it's a sufficiently close approximation of reality to avoid the problem I emphasized above.)
Hrm, fair enough. I thought I'd effectively implied it, but apparently not sufficiently.
(Incidentally... you don't think it's a close approximation to reality? Most humans seem to value (to various extents) happiness, love, (at least some) lives, etc... right?)
Different people (and cultures) seem to put very different weights on these things.
Here's an example:
You're a government minister who has to decide who to hire to do a specific task. There are two applicants. One is your brother, who is marginally competent at the task. The other is a stranger with better qualifications who will probably be much better at the task.
The answer is "obvious."
In some places, "obviously" you hire your brother. What kind of heartless bastard won't help out his own brother by giving him a job?
In others, "obviously" you should hire the stranger. What kind of corrupt scoundrel abuses his position by hiring his good-for-nothing brother instead of the obviously superior candidate?
Your claims are only anti-predictions relative to science-fiction notions of robots as metal men.
Most possible artificial minds are neither Friendly nor unFriendly (unless you adopt such a stringent definition of mind that artificial minds are not going to exist in my lifetime or yours).
Fast AI (along with most of the other wild claims about what future technology will do, really) falls afoul of the general version of Amdahl's law. (On which topic, did you ever update your world model when you found out you were mistaken about the role of computers in chip design?)
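For readers unfamiliar with the reference, here is a minimal Python sketch of Amdahl's law in its usual form: if only a fraction f of a process is made s times faster, the overall speedup is 1 / ((1 - f) + f / s), which can never exceed 1 / (1 - f) no matter how large s gets. The "general version" invoked above applies the same reasoning to any process with parts that are not accelerated; the 90% and million-fold figures below are hypothetical and chosen only to show the ceiling, not taken from anyone's argument in this thread.

```python
def amdahl_speedup(accelerated_fraction, factor):
    """Overall speedup when only `accelerated_fraction` of the work
    is made `factor` times faster (Amdahl's law)."""
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be in [0, 1]")
    if factor <= 0:
        raise ValueError("factor must be positive")
    # The part that is not accelerated still takes its full time.
    unaccelerated = 1.0 - accelerated_fraction
    return 1.0 / (unaccelerated + accelerated_fraction / factor)


# Hypothetical: 90% of a process sped up a million-fold.
# The result approaches the ceiling 1 / (1 - 0.9) = 10.
print(round(amdahl_speedup(0.9, 1e6), 2))  # 10.0
```

Whether this bound bites for a self-improving AI depends on how large the unaccelerated fraction really is, which is exactly the point in dispute; the sketch only shows the shape of the constraint.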
About MWI, I agree with you completely, though I am more hesitant to berate early quantum physicists for not having found it obvious. For a possible analogy: what do you think of my resolution of the Anthropic Trilemma?
Okay, I can see how XiXiDu's post might come across that way. I think I can clarify what I think that XiXiDu is trying to get at by asking some better questions of my own.
"Near"? Where'd we say that? What's "near"? XiXiDu thinks we're Kurzweil?
What kind of evidence would you want aside from a demonstrated Singularity?
Grey goo? Huh? What's that got to do with us? Read Nanosystems by Eric Drexler or Freitas on "global ecophagy". XiXiDu thinks we're Foresight?
If this business about "evidence" isn't a demand for particular proof, then what are you looking for besides not-further-confirmed straight-line extrapolations from inductive generalizations supported by evidence?
You've claimed in your Bloggingheads diavlog with Scott Aaronson that you think it's pretty obvious that there will be an AGI within the next century. As far as I know, you have not offered a detailed description of the reasoning that led you to this conclusion that can be checked by others.
I see this as significant for the reasons given in my comment here.
I don't know what the situation is with SIAI's position on grey goo. I've heard people say the SIAI staff believe in nanotechnology having capabilities out of line with the beliefs of the scientific community, but they may have been misinformed. So let's forget about questions 3 and 4.
Questions 1, 2, 5 and 6 remain.
You've shifted the question from "is SIAI on balance worth donating to" to "should I believe everything Eliezer has ever said".
The point is that grey goo is not relevant to SIAI's mission (apart from being yet another background existential risk that FAI can dissolve). "Scientific community" doesn't normally professionally study (far) future technological capabilities.
My whole point about grey goo has been, as stated, that a possible superhuman AI could use it to do really bad things. That is, I do not see how an encapsulated AI, even a superhuman AI, could pose the stated risks without the use of advanced nanotechnology. Is it going to use nukes, like Skynet? Another question related to the SIAI, regarding advanced nanotechnology, is whether superhuman AI is possible at all without advanced nanotechnology.
I'm shocked at how you people misinterpreted my intentions there.
Grey goo is only a potential danger in its own right because it's a way dumb machinery can grow in destructive power (you don't need to assume AI controlling it for it to be dangerous, at least so goes the story). AGI is not dumb, so it can use something more fitting to precise control than grey goo (and correspondingly more destructive and feasible).
The grey goo example was named to exemplify the speed and sophistication of nanotechnology that would have to be around either to allow an AI to be built in the first place or for it to be of considerable danger.
I consider your comment an expression of personal disgust. There is no way you could possibly have misinterpreted my original point and subsequent explanation to this extent.
As katydee pointed out, if for some strange reason grey goo is what AI would want, AI will invent grey goo. If you used "grey goo" to refer to the rough level of technological development necessary to produce grey goo, then my comments missed that point.
Illusion of transparency. Since the general point about nanotech seems equally wrong to me, I couldn't distinguish between the error of making it and making a similarly wrong point about the relevance of grey goo in particular. In general, I don't plot, so take my words literally. If I don't like something, I just say so, or keep silent.
If it seems equally wrong, why haven't you pointed me to some further reasoning on the topic regarding the feasibility of AGI without advanced (grey goo level) nanotechnology? Why haven't you argued about the dangers of AGI which is unable to make use of advanced nanotechnology? I was inquiring about these issues in my original post and not trying to argue against the scenarios in question.
Yes, I've seen the comment regarding the possible invention of advanced nanotechnology by AGI. If AGI needs something that isn't there it will just pull it out of its hat. Well, I have my doubts that even a superhuman AGI can steer the development of advanced nanotechnology so that it can gain control of it. Sure, it might solve the problems associated with it and send the solutions to some researcher. Then it could buy the stocks of the subsequent company involved with the new technology and somehow gain control...well, at this point we are already deep into subsequent reasoning about something shaky that at the same time is used as evidence of the very reasoning involving it.
If a superhuman AI is possible without advanced nanotechnology, a superhuman AI could just invent advanced nanotechnology and implement it.
Overall I'd feel a lot more comfortable if you just said "there's a huge amount of uncertainty as to when existential risks will strike and which ones will strike, I don't know whether or not I'm on the right track in focusing on Friendly AI or whether I'm right about when the Singularity will occur, I'm just doing the best that I can."
This is largely because of the issue that I raise here.
I should emphasize that I don't think that you'd ever knowingly do something that raised existential risk, I think that you're a kind and noble spirit. But I do think I'm raising a serious issue which you've missed.
Edit: See also these comments
I am looking for the evidence in "supported by evidence". I am further trying to figure how you anticipate your beliefs to pay rent, what you anticipate to see if explosive recursive self-improvement is possible, and how that belief could be surprised by data.
If you just say, "I predict we will likely be wiped out by badly done AI.", how do you expect to update on evidence? What would constitute such evidence?
I haven't done the reading. For further explanation read this comment.
Why do you always and exclusively mention Charles Stross? I need to know if you actually read all of my post.
Because the fact that you're mentioning Charles Stross means that you need to do basic reading, not complicated reading.
To put my own spin on XiXiDu's questions: What quality or position does Charles Stross possess that should cause us to leave him out of this conversation (other than the quality 'Eliezer doesn't think he should be mentioned')?
Another vacuous statement. I expected more.