Open thread, Jul. 04 - Jul. 10, 2016

MrMind

I think your model of me is incorrect (and suspect I may have a symmetrical problem somehow); I promise you, I don't need reminding that I am part of the world, that my brain runs on physics, etc., and if it looks to you as if I'm assuming the opposite then (whether by my fault, your fault, or both) what you are getting out of my words is not at all what I am intending to put into them.

Just as your will will only cause you to do what the world has told you, so the AI will only do what it is programmed to.

I entirely agree. My point, from the outset, has simply been that this is perfectly compatible with the AI having as much flexibility, as much possibility of self-modification, as we have.

Far better to leave it in fetters.

I don't think that's obvious. You're trading one set of possible failure modes for another. Keeping the AI fettered is (kinda) betting that when you designed it you successfully anticipated the full range of situations it might be in in the future, well enough to be sure that the goals and values you gave it will produce results you're happy with. Not keeping it fettered is (kinda) betting that when you designed it you successfully anticipated the full range of self-modifications it might undergo, well enough to be sure that the goals and values it ends up with will produce results you're happy with.

Both options are pretty terrifying, if we expect the AI system in question to acquire great power (by becoming much smarter than us and using its smartness to gain power, or because we gave it the power in the first place e.g. by telling it to run the world's economy).

My own inclination is to think that giving it no goal-adjusting ability at all is bound to lead to failure, and that giving it some goal-adjusting ability might not but at present we have basically no idea how to make that not happen.

(Note that if the AI has any ability to bring new AIs into being, nailing its own value system down is no good unless we do it in such a way that it absolutely cannot create, or arrange for the creation of, new AIs with even slightly differing value systems. It seems to me that that has problems of its own -- e.g., if we do it by attaching huge negative utility to the creation of such AIs, maybe it arranges to nuke any facility that it thinks might create them...)