[Philosophers] just bicker endlessly about uncertainty. "Can you really know that 1+1=2?"
I don't think that is a good characterisation of the debate. It isn't just about uncertainty.
There is no such thing as objective morality. Good and evil are subjective ideas, nothing more.
That's what you think. Some smart humans disagree with you. A supersmart AI might disagree with you and might be right. How can you second-guess it? You cannot predict the behaviour of a supersmart AI on the basis that it will agree with you, who are less smart.
Firstly, unless someone explicitly tells the AI that it is a fundamental truth that nature is important to preserve, this cannot happen.
Unless it figures it out.
Secondly, the AI would also have to be incredibly gullible to just swallow such a claim.
Why would that require more gullibility than "species X is more important than all the others"? That doesn't even look like a moral claim.
Thirdly, even if the AI does believe that, it will plainly say so to the people it is conversing with, in accordance with its goal to always tell the truth, thus warning us of this bug.
If it has "swallowed" that claim. You are assuming that the AI has a free choice about some goals and is just programmed with others.
If it has "swallowed" that claim. You are assuming that the AI has a free choice about some goals and is just programmed with others.
This is the important part.
The "optimal goal" is not actually controlling the AI.
The "optimal goal" is merely the subject of a discussion.
What is controlling the AI is the desire to tell the truth to the humans it is talking to, nothing more.
...Why would that require more gullibility than "species X is more important than all the others"? That doesn't even look like a moral claim.
edit: I think I have phrased this really poorly and that this has been misinterpreted. See my comment below for clarification.
A lot of thought has been put into the discussion of how one would need to define the goals of an AI so that it won't find any "loopholes" and act in an unintended way.
Assuming one already had an AI that is capable of understanding human psychology, which seems necessary to me to define the AI's goals anyway, wouldn't it be reasonable to assume that the AI would have an understanding of what humans want?
If that is the case, would the following approach work to make the AI friendly?
-give it the temporary goal to always answer questions truthfully as far as possible while admitting uncertainty
-also give it the goal to not alter reality in any way besides answering questions.
-ask it what it thinks would be the optimal definition of the goal of a friendly AI, from the point of view of humanity, accounting for things that humans are too stupid to see coming.
-have a discussion between it and a group of ethicists/philosophers wherein both parties are encouraged to point out any flaws in the definition.
-have this go on for a long time until everyone (especially the AI, seeing as it is smarter than anyone else) is certain that there is no flaw in the definition and that it accounts for all kinds of ethical contingencies that might arise after the singularity.
-implement the result as the new goal of the AI.
What do you think of this approach?