gedymin comments on Open thread, Nov. 24 - Nov. 30, 2014 - Less Wrong Discussion
You are viewing a comment permalink. View the original post to see all comments and the full post content.
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Comments (317)
This may be a naive question, which has a simple answer, but I haven't seen it. Please enlighten me.
I'm not clear on why an AI should have a utility function at all.
The computer I'm typing this on doesn't. It simply has input-output behavior. When I hit certain keys it reacts in certain, very complex ways, but it doesn't decide. It optimizes, but only when I specifically tell it to do so, and only on the parameters that I give it.
We tend to think of world-shaping GAI as an agent with it's own goals, which it seeks to implement. Why can't it be more like a computing machine in a box. We could feed it questions, like "given this data, will it rain tomorrow?", or "solve this protein folding problem", or "which policy will best reduce gun-violence?", or even "given these specific parameters and definitions, how do we optimize for human happiness?" For the complex answers like the last of those, we could then ask the AI to model the state of the world that results from following this policy. If we see that it leads to tiling the universe with smiley faces, we know that we made a mistake somewhere (that wasn't what we were trying to optimize for), and adjust the parameters. We might even train the AI over time, so that it learns how to interpret what we mean from what we say. When the AI models a state of the world that actually reflects our desires, then we implement it's suggestions ourselves, or perhaps only then hit the implement button, by with the AI takes the steps to carry out it's plan. We might even use such a system to check the safety of future generations of the AI. This would slow recursive self improvement, but it seems it would be much safer.
This is actually one of the standard counterarguments against the need for friendly AI, at least against the notion that is should be an agent / be capable of acting as an agent.
I'll try to quickly summarize the counter-counter arguments Nick Bostrom gives in Superintelligence. (In the book, AI that is not agent at all is called tool AI. AI that is an agent but cannot act as one (has no executive power in the real world) is called oracle AI.)
Some arguments have already been mentioned:
There is also a possibility of what Bostrom calls mind crime. If a tool or oracle AI is not inherently friendly, it might simulate sentient minds in order to give the answers to the questions that we ask; kill or possibly even torture these minds. The possibility that these simulations have moral rights is low, but there can be trillions of them, so even a low probability cannot be ignored.
Finally, it might be that the best strategy for a tool AI to give answer is to internally develop an agent-type AI that is capable of self-improvement. If the default outcome of creating a self-improving AI is doom, then the tool AI scenario might in fact be less safe.