Reply to: Two Visions Of Heritage
Though it really goes tremendously against my grain - it feels like sticking my neck out over a cliff (or something) - I guess I have no choice here but to try and make a list of just my positions, without justifying them. We can only talk justification, I guess, after we get straight what my positions are. I will also leave off many disclaimers to present the points compactly enough to be remembered.
• A well-designed mind should be much more efficient than a human, capable of doing more with less sensory data and fewer computing operations. It is not infinitely efficient and does not use zero data. But it does use little enough that local pipelines such as a small pool of programmer-teachers and, later, a huge pool of e-data, are sufficient.
• An AI that reaches a certain point in its own development becomes able to (sustainably, strongly) improve itself. At this point, recursive cascades slam over many internal growth curves to near the limits of their current hardware, and the AI undergoes a vast increase in capability. This point is at, or probably considerably before, a minimally transhuman mind capable of writing its own AI-theory textbooks - an upper bound beyond which it could swallow and improve its entire design chain.
• It is likely that this capability increase or "FOOM" has an intrinsic maximum velocity that a human would regard as "fast" if it happens at all. A human week is ~1e15 serial operations for a 2GHz core, and a century is ~1e19 serial operations; this whole range is a narrow window (a back-of-the-envelope check of these figures appears after this list). However, the core argument does not require one-week speed, and a FOOM that takes two years (~1e17 serial ops) will still carry the weight of the argument.
• The default case of FOOM is an unFriendly AI, built by researchers with shallow insights. This AI becomes able to improve itself in a haphazard way, makes various changes that are net improvements but may introduce value drift, and then gets smart enough to do guaranteed self-improvement, at which point its values freeze (forever).
• The desired case of FOOM is a Friendly AI, built using deep insight, so that the AI never makes any changes to itself that potentially change its internal values; all such changes are guaranteed using strong techniques that allow for a billion sequential self-modifications without losing the guarantee. The guarantee is written over the AI's internal search criterion for actions, rather than external consequences.
• The good guys do not write an AI which values a bag of things that the programmers think are good ideas, like libertarianism or socialism or making people happy or whatever. There were multiple Overcoming Bias sequences about this one point, like the Fake Utility Function sequence and the sequence on metaethics. It is dealt with at length in the document Coherent *Extrapolated* Volition. It is the first thing, the last thing, and the middle thing that I say about Friendly AI. I have said it over and over. I truly do not understand how anyone can pay any attention to anything I have said on this subject, and come away with the impression that I think programmers are supposed to directly impress their non-meta personal philosophies onto a Friendly AI.
• The good guys do not directly impress their personal values onto a Friendly AI.
• Actually setting up a Friendly AI's values is an extremely meta operation, less "make the AI want to make people happy" and more like "superpose the possible reflective equilibria of the whole human species, and output new code that overwrites the current AI and has the most coherent support within that superposition". This actually seems to be something of a Pons Asinorum in FAI - the ability to understand and endorse metaethical concepts that do not directly sound like amazing wonderful happy ideas. Describing this as declaring total war on the rest of humanity does not seem fair (or accurate).
• I myself am strongly individualistic: The most painful memories in my life have been when other people thought they knew better than me, and tried to do things on my behalf. It is also a known principle of hedonic psychology that people are happier when they're steering their own lives and doing their own interesting work. When I try myself to visualize what a beneficial superintelligence ought to do, it consists of setting up a world that works by better rules, and then fading into the background, silent as the laws of Nature once were; and finally folding up and vanishing when it is no longer needed. But this is only the thought of my mind that is merely human, and I am barred from programming any such consideration directly into a Friendly AI, for the reasons given above.
• Nonetheless, it does seem to me that this particular scenario could not be justly described as "a God to rule over us all", unless the current fact that humans age and die is "a malevolent God to rule us all". So either Robin has a very different idea about what human reflective equilibrium values are likely to look like; or Robin believes that the Friendly AI project is bound to fail in such a way as to create a paternalistic God; or - and this seems more likely to me - Robin didn't read all the way through all the blog posts in which I tried to explain all the ways that this is not how Friendly AI works.
• Friendly AI is technically difficult and requires an extra-ordinary effort on multiple levels. English sentences like "make people happy" cannot describe the values of a Friendly AI. Testing is not sufficient to guarantee that values have been successfully transmitted.
• White-hat AI researchers are distinguished by the degree to which they understand that a single misstep could be fatal, and can discriminate strong and weak assurances. Good intentions are not only common, they're cheap. The story isn't about good versus evil, it's about people trying to do the impossible versus others who... aren't.
• Intelligence is about being able to learn lots of things, not about knowing lots of things. Intelligence is especially not about tape-recording lots of parsed English sentences a la Cyc. Old AI work was poorly focused due to inability to introspectively see the first and higher derivatives of knowledge; human beings have an easier time reciting sentences than reciting their ability to learn.
• Intelligence is mostly about architecture, or "knowledge" along the lines of knowing to look for causal structure (Bayes-net type stuff) in the environment; this kind of knowledge will usually be expressed procedurally as well as declaratively. Architecture is mostly about deep insights. This point has not yet been addressed (much) on Overcoming Bias, but Bayes nets can be considered as an archetypal example of "architecture" and "deep insight". Also, ask yourself how lawful intelligence seemed to you before you started reading this blog, how lawful it seems to you now, then extrapolate outward from that.
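To make that last point slightly more concrete, here is a minimal sketch (my own toy example with made-up numbers, not anything from the post itself) of the "Bayes-net type stuff" referred to above: causal structure held declaratively as a graph plus conditional probability tables, and used procedurally by an inference routine.

```python
# A toy three-node network: Rain -> Sprinkler, and (Rain, Sprinkler) -> WetGrass.
# The probabilities are invented purely for illustration.

P_RAIN = 0.2
P_SPRINKLER_GIVEN_RAIN = {True: 0.01, False: 0.4}
P_WET_GIVEN = {  # (sprinkler, rain) -> P(wet grass)
    (True, True): 0.99, (True, False): 0.90,
    (False, True): 0.80, (False, False): 0.0,
}

def joint(rain, sprinkler, wet):
    """P(rain, sprinkler, wet), read off the causal structure."""
    p = P_RAIN if rain else 1 - P_RAIN
    p_s = P_SPRINKLER_GIVEN_RAIN[rain]
    p *= p_s if sprinkler else 1 - p_s
    p_w = P_WET_GIVEN[(sprinkler, rain)]
    return p * (p_w if wet else 1 - p_w)

# Procedural use of the declarative structure: infer P(rain | wet grass)
# by enumeration, summing out the sprinkler variable.
numerator = sum(joint(True, s, True) for s in (True, False))
evidence = sum(joint(r, s, True) for r in (True, False) for s in (True, False))
print(f"P(rain | wet grass) = {numerator / evidence:.2f}")   # ~0.36
```

The point of the sketch is only that what does the work here is the structural knowledge of which variables cause which, not a pile of memorized English sentences about rain and grass.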
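And since the serial-operation figures quoted in the FOOM-velocity point above are easy to check, here is the promised back-of-the-envelope computation, assuming a single core issuing 2e9 serial operations per second:

```python
# Serial operations available to a single 2 GHz core over various spans.
CLOCK_HZ = 2e9  # 2 GHz

spans = {
    "one week": 7 * 24 * 3600,                  # ~6.0e5 seconds
    "two years": 2 * 365.25 * 24 * 3600,        # ~6.3e7 seconds
    "one century": 100 * 365.25 * 24 * 3600,    # ~3.2e9 seconds
}

for label, seconds in spans.items():
    print(f"{label}: ~{CLOCK_HZ * seconds:.1e} serial operations")

# one week    -> ~1.2e+15
# two years   -> ~1.3e+17
# one century -> ~6.3e+18 (order of 1e19)
```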
This seems like the first problem I detected. An intelligence being able to improve itself does not necessarily lead to a recursive cascade of self-improvement - since it may only be able to improve some parts of itself - and it's quite possible that after it has done those improvements, it can't do any more.
Say that machine intelligence learns how to optimise FOR loops, eliminating unnecessary conditions, etc. Presto, it can optimise its entire codebase - and thus improve itself. However, that doesn't lead to a self-improving recursive cascade - because it only improved itself in one way, and that was a rather limited way. Of course this kind of improvement has been going on for decades - via lint tools and automatic refactoring.
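To make that concrete, here is a toy sketch (mine, not the commenter's) of the kind of narrow, mechanical rewrite being described: hoisting a loop-invariant condition out of a FOR loop, the sort of transformation lint and refactoring tools already perform.

```python
def sum_verbose_before(items, verbose):
    # The flag is re-tested on every iteration even though it never changes.
    total = 0
    for x in items:
        if verbose:
            print("adding", x)
        total += x
    return total

def sum_verbose_after(items, verbose):
    # The same function after "loop unswitching": the invariant condition
    # is tested once, outside the loop.
    total = 0
    if verbose:
        for x in items:
            print("adding", x)
            total += x
    else:
        total = sum(items)
    return total
```

An optimiser that can only discover rewrites of this sort improves its whole codebase once and then stalls, which is the point being made here.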
As machines get smarter, they will gradually become able to improve more and more of themselves. Yes, eventually machines will be able to cut humans out of the loop - but before that there will have been much automated improvement of machines by machines - and after that there may still be human code reviews.
This is not the first time I have made this point here. It does not seem especially hard to understand to me - and yet the conversation sails gaily onwards, with no coherent criticism, and no sign of people updating their views: it feels like talking to a wall.
In order to learn how to optimize FOR loops it would have to be pretty intelligent and have general learning ability. So it wouldn't just stop after learning that, it would go on to learn more things at increased speed. Learning the first optimization would let it learn more optimizations even faster than it otherwise would have. The second optimization it makes helps it learn the third even faster and so on.
It's not clear to me how fast this process would be. Just because it learns the next optimization faster than it otherwise would have, does...
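One way to make the "how fast" question concrete is a toy compounding model (entirely my own construction, not something proposed in the thread): assume each optimization found multiplies the speed of finding the next one by a constant factor, and see how sensitive the total time is to that factor.

```python
def time_for_n_improvements(n, base_time=1.0, speedup=1.2):
    """Total time if each improvement divides the time for the next by `speedup`."""
    total, step = 0.0, base_time
    for _ in range(n):
        total += step
        step /= speedup
    return total

for s in (0.9, 1.0, 1.1, 1.5):
    print(f"speedup factor {s}: time for 50 improvements = "
          f"{time_for_n_improvements(50, speedup=s):.1f}")

# speedup < 1: improvements get harder to find; total time blows up.
# speedup = 1: steady progress; time grows linearly with the number of steps.
# speedup > 1: the series converges, so an unbounded number of improvements
#              fits into bounded time -- the "cascade" case. Which regime a
#              real system would sit in is exactly the open question here.
```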