> First you have to tell the machine to do that. It isn't trivial. The problem is not with the definition of "optimal" itself - but with what function is being optimised.
If the AI understands psychology, it knows what motivates us. We won't need to explicitly explain any moral conundrums or point out dichotomies. It should be able to infer this knowledge from what it knows about the human psyche. Maybe it could just browse the internet for material on this topic to inform itself of how we humans work.
The way I see it, we humans will have as little need to tell the AI what we want as ants, if they could talk, would have a need to point out to a human that they don't want him to destroy their colony. Even the most abstract conundrums that philosophers needed centuries to even point out, much less answer, might seem obvious to the AI.
The above paragraph obviously only applies if the AI is already superhuman, but the general idea behind it works regardless of its intelligence.
> Well not if you decide to train it for "a long time". History is full of near-simultaneous inventions being made in different places. Corporate history is full of close competition. There are anti-monopoly laws that attempt to prevent dominance by any one party - usually by screwing with any company that gets too powerful.
OK, this might pose a problem. A possible solution: since the AI is supposed to become a benefactor for humanity as a whole, it could be developed in an international project instead of by a single company. This would ensure enough funding that it would be hard for any one company to develop it faster, it would draw every AI developer to this one project and thus further eliminate competition, and it would reduce the chance that executive meddling causes people to get sloppy to save money.
edit: I think I have phrased this really poorly and that this has been misinterpreted. See my comment below for clarification.
A lot of thought has been put into the discussion of how one would need to define the goals of an AI so that it won't find any "loopholes" and act in an unintended way.
Assuming one already had an AI that is capable of understanding human psychology, which seems necessary to me to define the AI's goals anyway, wouldn't it be reasonable to assume that the AI would have an understanding of what humans want?
If that is the case, would the following approach work to make the AI friendly?
-give it the temporary goal to always answer questions truthfully as far as possible while admitting uncertainty
-also give it the goal to not alter reality in any way besides answering questions.
-ask it what it thinks would be the optimal definition of the goal of a friendly AI, from the point of view of humanity, accounting for things that humans are too stupid to see coming.
-have a discussion between it and a group of ethicists/philosophers wherein both parties are encouraged to point out any flaws in the definition.
-have this go on for a long time until everyone (especially the AI, seeing as it is smarter than anyone else) is certain that there is no flaw in the definition and that it accounts for all kinds of ethical contingencies that might arise after the singularity.
-implement the result as the new goal of the AI.
What do you think of this approach?