timtyler comments on Open Thread: June 2009 - Less Wrong
How to design utility functions for safe AIs?
Make a utility function that emits positive values only if the AI is disabled at the moment the solution to your precise problem is found. Ensure that the utility function emits smaller values for solutions that took longer, and higher values for worlds that are more similar to the world as it would have been without the AI's interference.
This will not create a friendly AI, but an AI that tries to minimize its interference with the world. Depending on the weights applied to the three parts, though, it might spontaneously deactivate.
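To make the three-part structure concrete, here is a minimal sketch in Python. All names, weights, and the form of the time discount are illustrative assumptions of mine, not anything specified in the comment:

```python
# Hypothetical sketch of the three-part utility function described above.
# Function name, parameters, and weights are illustrative assumptions.

def utility(solved_while_disabled, elapsed_steps, world_similarity,
            w_time=0.01, w_similarity=1.0):
    """Return positive utility only if the AI was disabled when the
    solution was found; penalize longer runtimes; reward similarity
    to the counterfactual world without the AI's interference.

    world_similarity is assumed to be a score in [0, 1], with 1
    meaning the world is indistinguishable from the no-AI baseline.
    """
    if not solved_while_disabled:
        return 0.0  # no reward unless the AI is off at solution time
    time_discount = 1.0 / (1.0 + w_time * elapsed_steps)  # longer -> smaller
    return time_discount + w_similarity * world_similarity
```

Under this sketch, a faster solution with less interference scores strictly higher, and any outcome where the AI is still running at solution time scores zero; as the comment notes, the relative weights determine whether doing nothing at all (maximal similarity, no solution) can dominate.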
That's pretty condensed. One of my video/essays discusses the underlying idea. To quote:
"One thing that might help is to put the agent into a quiescent state before being switched off. In the quiescent state, utility depends on not taking any of its previous utility-producing actions. This helps to motivate the machine to ensure subcontractors and minions can be told to cease and desist. If the agent is doing nothing when it is switched off, hopefully, it will continue to do nothing.
Problems with the agent's sense of identity can be partly addressed by making sure that it has a good sense of identity. If it makes minions, it should count them as somatic tissue, and ensure they are switched off as well. Subcontractors should not be "switched off" - but should be tracked and told to desist - and so on."