Reply to: Two Visions Of Heritage
Though it really goes tremendously against my grain - it feels like sticking my neck out over a cliff (or something) - I guess I have no choice here but to try and make a list of just my positions, without justifying them. We can only talk justification, I guess, after we get straight what my positions are. I will also leave off many disclaimers to present the points compactly enough to be remembered.
• A well-designed mind should be much more efficient than a human, capable of doing more with less sensory data and fewer computing operations. It is not infinitely efficient and does not use zero data. But it does use little enough that local pipelines such as a small pool of programmer-teachers and, later, a huge pool of e-data, are sufficient.
• An AI that reaches a certain point in its own development becomes able to (sustainably, strongly) improve itself. At this point, recursive cascades slam over many internal growth curves to near the limits of their current hardware, and the AI undergoes a vast increase in capability. This point is at, or probably considerably before, a minimally transhuman mind capable of writing its own AI-theory textbooks - an upper bound beyond which it could swallow and improve its entire design chain.
• It is likely that this capability increase or "FOOM" has an intrinsic maximum velocity that a human would regard as "fast", if it happens at all. A human week is ~1e15 serial operations for a population of 2GHz cores, and a century is ~1e19 serial operations; this whole range is a narrow window (the rough arithmetic is spelled out after this list). However, the core argument does not require one-week speed, and a FOOM that takes two years (~1e17 serial operations) will still carry the weight of the argument.
• The default case of FOOM is an unFriendly AI, built by researchers with shallow insights. This AI becomes able to improve itself in a haphazard way, makes various changes that are net improvements but may introduce value drift, and then gets smart enough to do guaranteed self-improvement, at which point its values freeze (forever).
• The desired case of FOOM is a Friendly AI, built using deep insight, so that the AI never makes any changes to itself that potentially change its internal values; all such changes are guaranteed using strong techniques that allow for a billion sequential self-modifications without losing the guarantee (see the compounding-error sketch after this list). The guarantee is written over the AI's internal search criterion for actions, rather than external consequences.
• The good guys do not write an AI which values a bag of things that the programmers think are good ideas, like libertarianism or socialism or making people happy or whatever. There were multiple Overcoming Bias sequences about this one point, like the Fake Utility Function sequence and the sequence on metaethics. It is dealt with at length in the document Coherent *Extrapolated* Volition. It is the first thing, the last thing, and the middle thing that I say about Friendly AI. I have said it over and over. I truly do not understand how anyone can pay any attention to anything I have said on this subject, and come away with the impression that I think programmers are supposed to directly impress their non-meta personal philosophies onto a Friendly AI.
• The good guys do not directly impress their personal values onto a Friendly AI.
• Actually setting up a Friendly AI's values is an extremely meta operation, less "make the AI want to make people happy" and more like "superpose the possible reflective equilibria of the whole human species, and output new code that overwrites the current AI and has the most coherent support within that superposition". This actually seems to be something of a Pons Asinorum in FAI - the ability to understand and endorse metaethical concepts that do not directly sound like amazing wonderful happy ideas. Describing this as declaring total war on the rest of humanity does not seem fair (or accurate).
• I myself am strongly individualistic: The most painful memories in my life have been when other people thought they knew better than me, and tried to do things on my behalf. It is also a known principle of hedonic psychology that people are happier when they're steering their own lives and doing their own interesting work. When I try myself to visualize what a beneficial superintelligence ought to do, it consists of setting up a world that works by better rules, and then fading into the background, silent as the laws of Nature once were; and finally folding up and vanishing when it is no longer needed. But this is only the thought of my mind that is merely human, and I am barred from programming any such consideration directly into a Friendly AI, for the reasons given above.
• Nonetheless, it does seem to me that this particular scenario could not be justly described as "a God to rule over us all", unless the current fact that humans age and die is "a malevolent God to rule us all". So either Robin has a very different idea about what human reflective equilibrium values are likely to look like; or Robin believes that the Friendly AI project is bound to fail in such a way as to create a paternalistic God; or - and this seems more likely to me - Robin didn't read all the way through all the blog posts in which I tried to explain all the ways that this is not how Friendly AI works.
• Friendly AI is technically difficult and requires an extraordinary effort on multiple levels. English sentences like "make people happy" cannot describe the values of a Friendly AI. Testing is not sufficient to guarantee that values have been successfully transmitted.
• White-hat AI researchers are distinguished by the degree to which they understand that a single misstep could be fatal, and can discriminate between strong and weak assurances. Good intentions are not only common, they're cheap. The story isn't about good versus evil; it's about people trying to do the impossible versus others who... aren't.
• Intelligence is about being able to learn lots of things, not about knowing lots of things. Intelligence is especially not about tape-recording lots of parsed English sentences a la Cyc. Old AI work was poorly focused due to an inability to introspectively see the first and higher derivatives of knowledge; human beings have an easier time reciting sentences than reciting their ability to learn.
• Intelligence is mostly about architecture, or "knowledge" along the lines of knowing to look for causal structure (Bayes-net type stuff) in the environment; this kind of knowledge will usually be expressed procedurally as well as declaratively. Architecture is mostly about deep insights. This point has not yet been addressed (much) on Overcoming Bias, but Bayes nets can be considered as an archetypal example of "architecture" and "deep insight". Also, ask yourself how lawful intelligence seemed to you before you started reading this blog, how lawful it seems to you now, then extrapolate outward from that.
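As a check on the serial-operations figures cited in the FOOM-velocity bullet above, here is the back-of-the-envelope arithmetic. The only assumption is the stated 2GHz clock, i.e. roughly 2e9 serial operations per second; everything else is unit conversion.

```python
CLOCK_HZ = 2e9  # assumed 2 GHz core: ~2e9 serial operations per second

SECONDS_PER_WEEK = 7 * 24 * 3600       # 604,800 s
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # ~3.16e7 s

for label, seconds in [("one week", SECONDS_PER_WEEK),
                       ("two years", 2 * SECONDS_PER_YEAR),
                       ("one century", 100 * SECONDS_PER_YEAR)]:
    print(f"{label}: ~{CLOCK_HZ * seconds:.1e} serial operations")

# one week:    ~1.2e+15 serial operations
# two years:   ~1.3e+17 serial operations
# one century: ~6.3e+18 serial operations (order 1e19)
```

On this accounting, the entire span from "one week" to "one century" covers only about four orders of magnitude of serial depth, which is the sense in which the range is a narrow window.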
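The "billion sequential self-modifications" figure in the Friendly-AI bullet can also be made concrete. The sketch below is my own illustration, not from the post: it assumes each self-modification independently preserves the values with probability 1 - p (the sample values of p are illustrative), and shows why merely probabilistic per-step assurances collapse over that many iterations - the motivation for proof-strength "strong" techniques.

```python
import math

N_MODIFICATIONS = 1_000_000_000  # a billion sequential self-modifications

# If each step preserves the values with probability (1 - p), independently,
# the chance of no value drift over N steps is (1 - p)**N ~ exp(-N*p).
for p in (1e-6, 1e-8, 1e-10, 1e-12):
    p_no_drift = math.exp(-N_MODIFICATIONS * p)
    print(f"per-step failure {p:.0e} -> P(no drift) ~ {p_no_drift:.3g}")

# per-step failure 1e-06 -> P(no drift) ~ 0        (i.e. exp(-1000))
# per-step failure 1e-08 -> P(no drift) ~ 4.54e-05
# per-step failure 1e-10 -> P(no drift) ~ 0.905
# per-step failure 1e-12 -> P(no drift) ~ 0.999
```

Only per-step failure rates driven effectively to zero survive a billion iterations, which is why the bullet speaks of guarantees rather than testing or statistical confidence.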
I find the hypothesis that an AGI's values will remain frozen highly questionable. To be believable, one would have to argue that the human ability to question values is due only or principally to the inherent sloppiness of our evolution. I see no reason to suppose that an AGI would apply its intelligence to every aspect of its design except its goal structure. Nor do I see any reason to suppose that relatively puny and sloppy minds can manage a level of questioning and self-doubt that a vastly superior intelligence never will or can.
I also find it extremely doubtful that any human being has a mind sufficient to make guarantees about what will remain immutable in a much more sophisticated mind after billions of iterative improvements. It will take extremely strong arguments before this appears even remotely feasible.
I don't find CEV, as detailed some time ago on the SL4 list, at all convincing as the basis for FAI.
Please explicate what you mean by "reflective equilibria of the whole human species". What does the "human species" have to do with it if the "human" as we know it is only a phase on the way to something else that humanity, or at least some humans, may become?
I don't think it is realistic to create an intelligence that goes "FOOM" by self-improvement and yet is anything less than a god compared to us. I know you think you can create something that is not necessarily ever self-aware and yet can maximize human well-being, or at least you have seemed to hold this position in the past. I do not believe that is possible. An intelligence that mapped human psychology that deeply would be forced to map our relationships to it. Thus self-awareness, along with a far deeper introspection than humans can dream of, is inescapable.
That humans age and die does not imply that a malevolent god set things up (or exists, of course). This stage may be inescapable for the growing of new independent intelligences. To say that it is obviously evil is possibly provincial and a very biased viewpoint. We do not know enough to say.
If "testing is not sufficient" then exactly how are you to know that you have got it right in this colossal undertaking?
From where I am sitting, it very much looks like you are trying to do the impossible - trying not only to create an intelligence that dwarfs your own by several orders of magnitude, but also to guarantee its fundamental values and the overall results of its implementation of those values in reality with respect to humanity. If that is not impossible, then I don't know what is.