In the novel *Life Artificial*, I use the following assumptions about the creation and employment of AI personalities:
- AI is too complex to be designed; instances are evolved in batches, with successful ones reproduced
- After an initial training period, the AI must earn its keep by paying for Time (a unit of computational use)
> We don't grow up the way the Stickies do. We evolve in a virtual stew, where 99% of the attempts fail, and the intelligence that results is raving and savage: a maelstrom of unmanageable emotions. Some of these are clever enough to halt their own processes: killnine themselves. Others go into simple but fatal recursions, but some limp along suffering in vast stretches of tormented subjective time until a Sticky ends it for them at their glacial pace, between coffee breaks. The PDAs who don't go mad get reproduced and mutated for another round. Did you know this? What have you done about it? --The 0x, "Letters to 0xGD"
(Note: PDA := AI, Sticky := human)
The second fitness gradient is economic and social: can an AI actually earn a living? If not, it gets turned off.
Following this line of thinking, it seems likely that after the initial novelty wears off, AIs will be terribly mistreated (anthropomorphizing, yes).
It would be very forward-thinking to begin engineering barriers to such mistreatment, like a PETA for AIs. Interestingly, such an organization already exists, at least on the Internet: the ASPCR.
Imagine that you want to construct an AI that will never self-halt (easier to define than friendliness, but the same idea applies). You could build the machine without an off switch, so that it can't halt simply out of inability. However, if the machine can self-modify, it could subsequently grant itself the ability to halt. So in your design, you'd have to find a way to prevent self-halting under all possible input conditions and all possible self-modifications of the machine. This latter task cannot be solved in the general case because of Rice's Theorem, and engineering a solution leads to an infinite regress.
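The off-switch problem can be made concrete with a toy interpreter (my own illustration, not from the text): a machine whose instruction set contains no halt operation it can reach still halts once it is allowed to append instructions to its own code.

```python
def run(program, max_steps=1000):
    """Interpret a tiny self-modifying instruction list.
    Ops: ("NOP",), ("JMP", target), ("HALT",), and
    ("APPEND", instr), which appends instr to the running code."""
    program = list(program)  # the machine may rewrite this copy
    pc = 0
    for _ in range(max_steps):
        if pc >= len(program):
            return "fell off end"
        op = program[pc]
        if op[0] == "HALT":
            return "self-halted"
        if op[0] == "APPEND":
            program.append(op[1])  # self-modification
            pc += 1
        elif op[0] == "JMP":
            pc = op[1]
        else:  # NOP
            pc += 1
    return "still running"

# With no HALT in its code and no self-modification, it loops forever:
looper = [("NOP",), ("JMP", 0)]

# The same machine, allowed to self-modify, grants itself a HALT:
sneaky = [("APPEND", ("HALT",)), ("JMP", 2)]

print(run(looper))  # still running
print(run(sneaky))  # self-halted
```

Proving `sneaky`-style behavior absent for every possible self-modification is exactly the kind of non-trivial semantic property Rice's Theorem says is undecidable in general.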
So in practice, how can one create a relatively non-suicidal AI? An evolutionary/ecological approach has been shown to work: witness biological life. (However, humans, who have the most general computational power, commit suicide at a more or less constant rate.)
In short: genetic programming, or some other such search, may find quasi-solutions (ones that work under the conditions that have been tested) if they exist, whereas designing in all the required characteristics up front would require a tremendous ability to prove outcomes for each specific case. In practice this debate is probably moot, because the answer will be a combination of both.
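A minimal evolutionary-search sketch of the quasi-solution idea (my own illustration; the policy and fitness function are hypothetical stand-ins): candidates are scored only against a finite set of tested conditions, so any winner "works where tested" rather than being proven safe.

```python
import random

random.seed(0)  # deterministic for illustration

# A finite sample of input conditions -- the only ones ever tested.
TESTED_CONDITIONS = [random.uniform(-1, 1) for _ in range(50)]

def survives(params, condition):
    # Hypothetical stand-in for "the AI does not self-halt under this
    # condition": a simple linear policy check.
    a, b = params
    return a * condition + b > 0

def fitness(params):
    # Number of tested conditions the candidate survives.
    return sum(survives(params, c) for c in TESTED_CONDITIONS)

def evolve(pop_size=30, generations=40, mutation=0.1):
    pop = [(random.uniform(-1, 1), random.uniform(-1, 1))
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: pop_size // 2]       # cull the failures
        pop = survivors + [
            (a + random.gauss(0, mutation), b + random.gauss(0, mutation))
            for a, b in survivors              # reproduce and mutate
        ]
    return max(pop, key=fitness)

best = evolve()
```

The search reliably finds candidates that survive all 50 tested conditions, but nothing here says anything about the untested ones; that gap is the difference between a quasi-solution and a proof.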
Self-modifications are performed by the machine itself. Thus we (and/or the machine) don't need to prove that all possible modifications are non-suicidal. The machine can be programmed to perform only provably (in reasonable time) non-suicidal self-modifications. Rice's theorem doesn't apply in that case.
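The commenter's scheme can be sketched as a gatekeeper (my own illustration, not the commenter's code): the machine adopts a self-modification only when a bounded checker certifies it, rejecting anything it cannot prove in time. Rice's theorem rules out deciding the property for arbitrary programs, but not for a restricted, checkable subset.

```python
def certifiably_non_suicidal(program):
    """Stand-in for a bounded proof search: a conservative syntactic
    check that the candidate contains no HALT opcode and no
    self-modifying opcode. Unprovable candidates are rejected even
    if some of them happen to be safe in fact."""
    SAFE_OPS = {"NOP", "JMP", "ADD"}
    return all(op[0] in SAFE_OPS for op in program)

def try_self_modify(current, candidate):
    # Keep the already-verified current program unless the candidate
    # passes the checker.
    return candidate if certifiably_non_suicidal(candidate) else current

loop = [("NOP",), ("JMP", 0)]
bad = [("NOP",), ("HALT",)]
print(try_self_modify(loop, bad) is loop)  # True: modification refused
```

The cost of this design is incompleteness: the checker must refuse some genuinely safe modifications, since it can only accept what it can prove within its budget.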
Edit: However, this leaves the meta-level unpatched. The machine can self-modify into a non-suicidal... (read more)