This made me curious what else Google Scholar would turn up, and there are actually quite a few papers mentioning Friendly AI and even the SIAI...
No one has the slightest notion of how to program innate human friendliness into an artificial intelligence that may, over time, grow to be billions of times smarter than the smartest human being. But it is certainly an approach worth pursuing. An alternative approach is outlined in the next section.
Cultural Evolution in a Cosmic Context
An Alternative Approach: Memetic Engineering With Cultural Attractors
The approach of the Singularity Institute can be characterized as a bottom-up strategy for constructing Friendly AI. The basic idea is to build a set of algorithms into an AI’s source code that will cause that particular AI never to desire to turn against its human progenitors and to refrain from any action that would harm human beings. This approach is similar in principle to inserting into the deep structure of an AI’s source code a set of Isaac Asimov’s fictional laws of robotics.
An alternative approach may be to design a set of cultural attractors that could conceivably perturb the developmental direction of the future cultural environment in which strong AI will emerge in such a way as to encourage the prolongation of human-friendly sensibilities and outcomes. This top-down strategy can be characterized as an exercise in what I have previously called a possible future scientific discipline of memetic engineering...
Most are behind a paywall. Just search for 'Friendly AI' on Google Scholar.
I've gotten hundreds of papers by searching for other key terms on Google Scholar, for example "machine ethics", "machine morality", "artificial morality", etc. "Machine ethics" seems to be the term that is winning.
Wallach, Franklin, & Allen, "A Conceptual and Computational Model of Moral Decision Making in Human and Artificial Agents."
Abstract:
I suspect this is of much interest to many Less Wrong readers.
PDF.