[Anna Salamon] gave the familiar SIAI argument that, if one picks a mind at random from “mind space”, the odds that it will be Friendly to humans are effectively zero.
This is an incredibly weak argument from intuition. A mind picked at random from "mind space" can be self-destructive, for instance, or incapable of self-improvement. As an intuition pump: if you pick a computer program at random from computer program space (that is, run random code), it crashes right away almost all of the time. If you eliminate the crashes, you get very simple infinite loops. If you eliminate those, you get very simple loops that count or the like, with many pieces of random code corresponding to exactly the same behaviour after running for any significant number of CPU cycles (as most of the code ends up non-functional). You get, in effect, a Kolmogorov complexity prior even if you just try to run uniformly random x86 code.
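To make the intuition pump concrete, here is a minimal sketch in Python. It does not run x86 code; it assumes a made-up toy machine with four valid opcodes out of an alphabet of 16 byte values, and the names `run` and `survey` are hypothetical. It only illustrates the shape of the claim: most uniformly random programs crash, and the survivors collapse into a small number of simple behaviours.

```python
import random
from collections import Counter

# Hypothetical toy machine: only these four byte values are legal opcodes.
INC, EMIT, JMP0, NOP = 0, 1, 2, 3
N_BYTE_VALUES = 16  # assumed alphabet size; values 4..15 are illegal instructions


def run(program, max_steps=200, prefix=8):
    """Run a toy program; return (status, first `prefix` outputs)."""
    acc, pc, out = 0, 0, []
    for _ in range(max_steps):
        if pc >= len(program):
            return "halt", tuple(out[:prefix])   # ran off the end cleanly
        op = program[pc]
        if op == INC:
            acc += 1
        elif op == EMIT:
            out.append(acc)
        elif op == JMP0:
            pc = 0                               # unconditional jump to start
            continue
        elif op == NOP:
            pass
        else:
            return "crash", tuple(out[:prefix])  # illegal instruction
        pc += 1
    return "running", tuple(out[:prefix])        # still going after max_steps


def survey(n_samples=10000, length=6, seed=0):
    """Sample uniformly random programs and tally what they do."""
    rng = random.Random(seed)
    statuses = Counter()
    behaviours = Counter()  # distinct (status, output) pairs among non-crashers
    for _ in range(n_samples):
        program = [rng.randrange(N_BYTE_VALUES) for _ in range(length)]
        status, out = run(program)
        statuses[status] += 1
        if status != "crash":
            behaviours[(status, out)] += 1
    return statuses, behaviours


if __name__ == "__main__":
    statuses, behaviours = survey()
    print("status counts:", dict(statuses))
    print("distinct surviving behaviours:", len(behaviours))
    print("most common:", behaviours.most_common(3))
```

In this toy setup the survivors are mostly short loops, and many different byte strings map to the same observable behaviour, so the simplest behaviours carry the most weight. Whether that carries over to real instruction sets, let alone to minds, is exactly the kind of thing the toy cannot settle.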
The problem with the argument is that it appeals to a random mind space while discussing AIs that foomed from being man-made and running on man-made hardware, and which do not self-destruct, and thus are anything but random.
One could make an equally plausible argument that a random mind from the space of minds that are not self-destructive, yet capable of self-improvement (which implies a considerably broad definition of self), is almost certainly friendly, as it would implement the simplest goal system that permits self-improvement and forbids self-harm. That implies a rather broad and not very specific definition of self-harm, which would likely include harm to all life. It is not a very friendly AI (it will kill the entire crew of a whaling ship if it has to), but not a very destructive one. Edit: Of course, that is subject to how it tries to maximize the value of life; preserving diversity and complexity seems natural for the anti-self-harm mechanism. Note: life is immensely closer to the AI than the dead parts of the universe are. Note 2: Less specific discriminators typically have lower complexity. Note 3: I think the safest assumption to make is that the AI doesn't start off as a self-aware super-genius that will figure out instrumental self-preservation from first principles even if its goal is not self-preserving.
I'll call this the "Greenpeace by default" argument. It comes from a software developer (me) with some understanding of what random design spaces tend to look like, so it ought to have a higher prior than "Unfriendly by default", which ignores the fact that most of the design space corresponds to unworkable designs and that simpler designs have a larger number of working implementations.
Ultimately, this is all fairly baseless speculation and rationalization of culturally, socially, and politically motivated opinions and fears. One does not start with an intuition about the random mind design space; it is obvious that such an intuition is likely garbage unless one has actually dealt with random design spaces before. One starts with fear and invents the argument. One can just as well start with a pro-AI attitude and invent the converse, equally (if not more) plausible argument by appeal to intuitions of this kind. The bottom line is that all of these are severely privileged hypotheses. The scary idea and my Greenpeace idea are both baseless speculations, though I do have a very strong urge to promote the Greenpeace idea with the same zeal, just to counter the harm done by promoting other privileged hypotheses.
How do you think the "Greenpeace by default" AI might define either "harm" or "value", and "life"?
Here's my draft document, Concepts are Difficult, and Unfriendliness is the Default. (Google Docs, commenting enabled.) Despite the name, it's still informal and would need a lot more references, but it could be written up into a proper paper if people felt that the reasoning was solid.
Here's my introduction:
And here's my conclusion:
For the actual argumentation defending the various premises, see the linked document. I have a feeling that there are still several conceptual distinctions that I should be making but am not, but I figured that the easiest way to find the problems would be to have people tell me what points they find unclear or disagreeable.