Zetetic comments on So You Want to Save the World - Less Wrong
You are viewing a comment permalink. View the original post to see all comments and the full post content.
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Comments (146)
I am not sure your argument is entirely valid. The AI would have access to every information humans ever conceived, including the discussions, disputes and research put into programming this AI's goals and nature. It may then adopt new goals based on the information gathered, realizing its former ones are no longer desirable.
Let's say that you're programmed not to kill baby eaters. One day you find out, that eating babies is wrong (based on the information you gather), and killing the baby eaters is therefore right, you might kill the baby eaters no matter what your desire is.
I am not saying my logic isn't wrong, but I don't think that the argument - "my desire is not do do X, therefore I wouldn't do X even if I knew it was the right thing to do" is right, either.
Anyway, I plan to read the sequences, when I have time.
You need to take desire out of the equation. The way you program the utility function fully determines the volition of the machine. It is the volition of the machine. Postulating that a machine can desire something that it's utility function doesn't define or include is roughly equivalent to postulating that 1 = 0. I think you might benefit from reading this actual SIAI article by Eliezer. It specifically address your concern.
There is one valid point - closely related to what you're saying here:
But you're thinking about it the wrong way. The issue that the machine "realizes" that something is "no longer desirable" doesn't actually make a lot of sense because the AI is its programing and it can only "realize" things that its programing allows for (of course, since an AGI is so complicated, a simple utility function could result in a situation similar to presenting a Djinn (genie) an ill-specified request i.e. a be-careful-what-you-wish-for scenario).
A variant that does make sense and is a real concern is that as the AGI learns, it could change its definitions in unpredictable ways. Peter De Blanc talks about this here. This could lead to part of the utility function becoming undefined or to the machine valuing things that we never intended it to value - basically it makes the utility function unstable under the conditions you describe. The intuition is roughly that if you define a human in one way, according to what we currently know about physics, some new discovery made available to the AI might result in it redefining humans in new terms and no longer having them as a part of its utility function. Whatever the utility function describes is now separate from how humans appear to it.
That's what I basically meant.