Username comments on Values at compile time - Less Wrong Discussion
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Let me play devil's advocate for this position.
How to specify goals at compile time is a technical question, but we can do some a priori theorizing about how we might do it. Roughly, there are two high-level approaches: simple hard-coded goals, and goals fed in from more complex modules. A simple hard-coded goal might be something like current reinforcement learners, where the reward signal is human praise (or a simple-to-hard-code proxy for human praise, such as presses of a reward button). The other alternative is to build a few modules (e.g. one for natural language understanding, one for modeling humans) and use them as part of the definition of the new AI's motivation.
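To make the contrast concrete, here is a minimal Python sketch of the two approaches. Every class, method, and attribute name is a hypothetical illustration of the idea, not a real API:

```python
# Hypothetical sketch of the two approaches to compile-time goals.
# All names here are illustrative assumptions, not a real system.

class ButtonRewardLearner:
    """Approach 1: a simple hard-coded goal. A reinforcement learner
    whose reward signal is presses of a human-held reward button,
    i.e. a simple-to-hard-code proxy for human praise."""

    def reward(self, world_state) -> float:
        # The goal is literally "maximize button presses".
        return float(world_state.button_pressed)


class ModularGoalAgent:
    """Approach 2: the goal is defined in terms of richer modules
    (natural language understanding, human modeling) built first
    and then fed into the new AI's motivation system."""

    def __init__(self, language_module, human_model):
        self.language = language_module  # interprets stated goals
        self.humans = human_model        # predicts what humans value

    def reward(self, world_state) -> float:
        # Value is read off the human model directly, rather than
        # off a crude proxy like a button.
        return self.humans.predicted_satisfaction(world_state)
```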
Responses to counterarguments:
4.1: needing to specify commands carefully (e.g. "give humans what they really want").
The whole point of intelligence is that you can specify tasks ambiguously (i.e. you don't have to spell out what you want in such detail that you're practically programming a computer). An AI that actually wants to make you happier (since its goals were specified at compile time using a module that models humans) will ask you to clarify your intentions if you give it vague goals.
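As a toy illustration of that clarifying behavior, here is a minimal sketch; `goal_is_ambiguous` is a stand-in stub for what would really be a query to the language and human-model modules:

```python
def goal_is_ambiguous(goal: str) -> bool:
    # Stub: a real system would consult its language/human-model
    # modules; here we just treat very short requests as vague.
    return len(goal.split()) < 3


def pursue_goal(goal: str) -> None:
    """An agent that actually wants what you want prefers asking
    over guessing when an instruction is underspecified."""
    while goal_is_ambiguous(goal):
        goal = input(f"Your request '{goal}' is vague; can you clarify? ")
    print(f"Executing clarified goal: {goal}")
```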
Some other thoughts:
It will be hard to accomplish this, since nobody knows how to build such modules; modeling language, humans, and human values are hard problems. Building the modules is a technical question. But it is both necessary and sufficient to build the modules and feed them into the goal system of another AI in order to build a friendly AI. In fact, one could make a stronger argument: any AGI built with a goal system must have that goal system specified via natural-language modules (e.g. because raw reinforcement learning sucks at capturing complex goals). Thus, it is likely that any AGI actually built would be an FAI.
EDITED to add: Tool-AI arguments. If you can build the modules to feed into an AI with a goal system, then you might be able to build a "tool-AI" that doesn't have a goal system at all. I think it's hard to say a priori that such an architecture is any less likely than one that requires a goal system. It's even harder to say that a tool-AI architecture is impossible to build.
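The architectural difference can be stated in a few lines. This is only a sketch, under the assumption that the two designs share their planning machinery; none of these names come from any real system:

```python
class Planner:
    def best_plan(self, question: str) -> str:
        return f"<plan for: {question}>"  # placeholder for real planning


class ToolAI:
    """Outputs plans/answers; a human decides whether to act on them."""

    def __init__(self):
        self.planner = Planner()

    def answer(self, question: str) -> str:
        return self.planner.best_plan(question)  # output only, no action


class AgentAI(ToolAI):
    """Same planner, plus a goal system that acts autonomously."""

    def run(self, goal: str) -> None:
        plan = self.planner.best_plan(goal)
        print(f"Acting on the world: {plan}")  # the step a tool-AI never takes
```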
In summary, I think the chief issues with building a friendly AI are technical issues related to actually building the AI. I don't see how decision theory helps. I do think that unfriendly humans wielding a tool-AI are something to be concerned about, but doing math research doesn't seem related to that. (Incidentally, MIRI's math research has intrigued people like Elon Musk, which helps with the "unfriendly humans" problem.)