I think your model of me is incorrect (and suspect I may have a symmetrical problem somehow); I promise you, I don't need reminding that I am part of the world, that my brain runs on physics, etc., and if it looks to you as if I'm assuming the opposite then (whether by my fault, your fault, or both) what you are getting out of my words is not at all what I am intending to put into them.
Just as your will will only cause you to do what the world has told you, so the AI will only do what it is programmed to.
I entirely agree. My point, from the outset, has simply been that this is perfectly compatible with the AI having as much flexibility, as much possibility of self-modification, as we have.
Far better to leave it in fetters.
I don't think that's obvious. You're trading one set of possible failure modes for another. Keeping the AI fettered is (kinda) betting that when you designed it you successfully anticipated the full range of situations it might encounter in the future, well enough to be sure that the goals and values you gave it will produce results you're happy with. Not keeping it fettered is (kinda) betting that when you designed it you successfully anticipated the full range of self-modifications it might undergo, well enough to be sure that the goals and values it ends up with will produce results you're happy with.
Both options are pretty terrifying, if we expect the AI system in question to acquire great power (by becoming much smarter than us and using its smartness to gain power, or because we gave it the power in the first place e.g. by telling it to run the world's economy).
My own inclination is to think that giving it no goal-adjusting ability at all is bound to lead to failure, and that giving it some goal-adjusting ability might not, but at present we have basically no idea how to make sure it doesn't.
(Note that if the AI has any ability to bring new AIs into being, nailing its own value system down is no good unless we do it in such a way that it absolutely cannot create, or arrange for the creation of, new AIs with even slightly differing value systems. It seems to me that that has problems of its own -- e.g., if we do it by attaching huge negative utility to the creation of such AIs, maybe it arranges to nuke any facility that it thinks might create them...)
Fair enough. I thought that you were using our own (imaginary) free will to derive a similar value for the AI. Instead, you seem to be saying that an AI can be programmed to be as 'free' as we are. That is, to change its utility function in response to the environment, as we do. That is such an abhorrent notion to me that I was eliding it in earlier responses. Do you really want to do that?
The reason, I think, that we differ on the important question (fixed vs evolving utility function) is that I'm optimistic about the ability of the masters to adjust...
If it's worth saying, but not worth its own post (even in Discussion), then it goes here.
Notes for future OT posters:
1. Please add the 'open_thread' tag.
2. Check if there is an active Open Thread before posting a new one. (Immediately before; refresh the list-of-threads page before posting.)
3. Open Threads should be posted in Discussion, and not Main.
4. Open Threads should start on Monday, and end on Sunday.