"the issue is whether something happens efficiently enough to be local or fast enough to accumulate advantage between the leading Friendly AI and the leading unFriendly AI"
Uh, that's a totally different issue from the one I was discussing.
To recap: I was pointing out that machines have been writing code and improving themselves for decades - that refactoring and lint-like programs applying their own improvements to their own codebases have a long history in the community, dating back to the early days of Smalltalk. Progress in computer ability at self-improvement (via modification of their own codebases) is, in point of fact, a long, slow and gradual process that has been going on for decades so far - and thus is not really well conceived of as something that will happen suddenly in the future, when computers attain "insight".
Also, I notice that you have "quietly" edited the original post - in an attempt to eliminate the very point I was originally criticising. This rather makes it look as though I was misquoting you. Then you accuse me of attacking a straw man - after this clumsy attempt to conceal the original evidence. Oh well, at least you are correcting your own mistakes when they are pointed out to you - it seems like a kind of progress to me.
Reply to: Two Visions Of Heritage
Though it really goes tremendously against my grain - it feels like sticking my neck out over a cliff (or something) - I guess I have no choice here but to try and make a list of just my positions, without justifying them. We can only talk justification, I guess, after we get straight what my positions are. I will also leave off many disclaimers to present the points compactly enough to be remembered.
• A well-designed mind should be much more efficient than a human, capable of doing more with less sensory data and fewer computing operations. It is not infinitely efficient and does not use zero data. But it does use little enough that local pipelines such as a small pool of programmer-teachers and, later, a huge pool of e-data, are sufficient.
• An AI that reaches a certain point in its own development becomes able to (sustainably, strongly) improve itself. At this point, recursive cascades slam over many internal growth curves to near the limits of their current hardware, and the AI undergoes a vast increase in capability. This point is at, or probably considerably before, a minimally transhuman mind capable of writing its own AI-theory textbooks - an upper bound beyond which it could swallow and improve its entire design chain.
• It is likely that this capability increase or "FOOM" has an intrinsic maximum velocity that a human would regard as "fast" if it happens at all. A human week is ~1e15 serial operations for a population of 2GHz cores, and a century is ~1e19 serial operations; this whole range is a narrow window. However, the core argument does not require one-week speed, and a FOOM that takes two years (~1e17 serial ops) will still carry the weight of the argument.
• The default case of FOOM is an unFriendly AI, built by researchers with shallow insights. This AI becomes able to improve itself in a haphazard way, makes various changes that are net improvements but may introduce value drift, and then gets smart enough to do guaranteed self-improvement, at which point its values freeze (forever).
• The desired case of FOOM is a Friendly AI, built using deep insight, so that the AI never makes any changes to itself that potentially change its internal values; all such changes are guaranteed using strong techniques that allow for a billion sequential self-modifications without losing the guarantee. The guarantee is written over the AI's internal search criterion for actions, rather than external consequences.
• The good guys do not write an AI which values a bag of things that the programmers think are good ideas, like libertarianism or socialism or making people happy or whatever. There were multiple Overcoming Bias sequences about this one point, like the Fake Utility Function sequence and the sequence on metaethics. It is dealt with at length in the document Coherent Extrapolated Volition. It is the first thing, the last thing, and the middle thing that I say about Friendly AI. I have said it over and over. I truly do not understand how anyone can pay any attention to anything I have said on this subject, and come away with the impression that I think programmers are supposed to directly impress their non-meta personal philosophies onto a Friendly AI.
• The good guys do not directly impress their personal values onto a Friendly AI.
• Actually setting up a Friendly AI's values is an extremely meta operation, less "make the AI want to make people happy" and more like "superpose the possible reflective equilibria of the whole human species, and output new code that overwrites the current AI and has the most coherent support within that superposition". This actually seems to be something of a Pons Asinorum in FAI - the ability to understand and endorse metaethical concepts that do not directly sound like amazing wonderful happy ideas. Describing this as declaring total war on the rest of humanity does not seem fair (or accurate).
• I myself am strongly individualistic: The most painful memories in my life have been when other people thought they knew better than me, and tried to do things on my behalf. It is also a known principle of hedonic psychology that people are happier when they're steering their own lives and doing their own interesting work. When I try myself to visualize what a beneficial superintelligence ought to do, it consists of setting up a world that works by better rules, and then fading into the background, silent as the laws of Nature once were; and finally folding up and vanishing when it is no longer needed. But this is only the thought of my mind that is merely human, and I am barred from programming any such consideration directly into a Friendly AI, for the reasons given above.
• Nonetheless, it does seem to me that this particular scenario could not be justly described as "a God to rule over us all", unless the current fact that humans age and die is "a malevolent God to rule us all". So either Robin has a very different idea about what human reflective equilibrium values are likely to look like; or Robin believes that the Friendly AI project is bound to fail in such a way as to create a paternalistic God; or - and this seems more likely to me - Robin didn't read all the way through all the blog posts in which I tried to explain all the ways that this is not how Friendly AI works.
• Friendly AI is technically difficult and requires an extra-ordinary effort on multiple levels. English sentences like "make people happy" cannot describe the values of a Friendly AI. Testing is not sufficient to guarantee that values have been successfully transmitted.
• White-hat AI researchers are distinguished by the degree to which they understand that a single misstep could be fatal, and can discriminate strong and weak assurances. Good intentions are not only common, they're cheap. The story isn't about good versus evil, it's about people trying to do the impossible versus others who... aren't.
• Intelligence is about being able to learn lots of things, not about knowing lots of things. Intelligence is especially not about tape-recording lots of parsed English sentences a la Cyc. Old AI work was poorly focused due to inability to introspectively see the first and higher derivatives of knowledge; human beings have an easier time reciting sentences than reciting their ability to learn.
• Intelligence is mostly about architecture, or "knowledge" along the lines of knowing to look for causal structure (Bayes-net type stuff) in the environment; this kind of knowledge will usually be expressed procedurally as well as declaratively. Architecture is mostly about deep insights. This point has not yet been addressed (much) on Overcoming Bias, but Bayes nets can be considered as an archetypal example of "architecture" and "deep insight". Also, ask yourself how lawful intelligence seemed to you before you started reading this blog, how lawful it seems to you now, then extrapolate outward from that.
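
As a rough check on the serial-operation figures quoted in the FOOM-speed point above (~1e15 for a week, ~1e17 for two years, ~1e19 for a century), here is a minimal sketch. It assumes one serial operation per clock cycle on a 2GHz core and a 365.25-day year; both are simplifications introduced here, and the function and constant names are purely illustrative.

```python
import math

# Assumption: one serial operation per clock cycle on a 2 GHz core.
CLOCK_HZ = 2e9

SECONDS_PER_WEEK = 7 * 24 * 60 * 60
# Assumption: a 365.25-day year, which is accurate enough for an order-of-magnitude check.
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def serial_ops(seconds, clock_hz=CLOCK_HZ):
    """Number of serial operations one core performs in `seconds` of wall-clock time."""
    return seconds * clock_hz


def order_of_magnitude(x):
    """Nearest power of ten to x, i.e. the n in ~1e<n>."""
    return round(math.log10(x))


week = serial_ops(SECONDS_PER_WEEK)
two_years = serial_ops(2 * SECONDS_PER_YEAR)
century = serial_ops(100 * SECONDS_PER_YEAR)

for label, ops in [("week", week), ("two years", two_years), ("century", century)]:
    print(f"{label:>10}: {ops:.2e} serial ops (~1e{order_of_magnitude(ops)})")
```

Under these assumptions the three durations come out at roughly 1.2e15, 1.3e17 and 6.3e18 serial operations, which round to the ~1e15, ~1e17 and ~1e19 figures in the text. The whole range from a week to a century spans only about four orders of magnitude, which is the sense in which it is a "narrow window".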