Eliezer_Yudkowsky comments on The genie knows, but doesn't care - Less Wrong
That problem has got to be solved somehow at some stage, because something that couldn't pass a Turing Test is no AGI.
Why is that a problem? Is anyone suggesting AGI can be had for free?
Ok. NL is hard. Everyone knows that. But it's got to be solved anyway.
Yeah, but it's got to be done anyway.
[more of the same snipped]
Yeah. But it wouldn't be an AGI or an SI if it couldn't pass a TT.
The issue of whether the SI's UF contains a set of human values is irrelevant. In a Loosemore architecture, an AI needs to understand and follow the directive "be friendly to humans", and those are all the goals it needs: to understand, and to follow.
The UF only needs to contain "understand English, and obey this directive". You don't have to code semantics into the UF. You do, of course, have to code it in somewhere.
A problem which has been solved over and over by humans. Humans don't need to be loaded a priori with knowledge of what makes other humans happy; they only need to know general indicators, like smiles and statements of approval.
Why would that be necessary? In the Loosemore architecture, the AGI has the goals of understanding English and obeying the Be Friendly directive. It eventually gets a detailed, extensional understanding of Friendliness from pursuing those goals. Why would it need to be preloaded with a detailed, extensional unpacking of friendliness? It could fail in understanding English, of course. But there is no reason to think it is especially likely to fail at understanding "friendliness" specifically, and its competence can be tested as you go along.
I don't see the problem. In the Loosemore architecture, the AGI will care about obeying "be friendly", and it will arrive at the detailed expansion, the idiosyncrasies, of "friendly" as part of its other goal to understand English. It cares about being friendly, and it knows the detailed expansion of friendliness, so where's the problem?
Says who? It has the high level directive, and another directive to understand the directive. It's been Friendly in principle all along; it just needs to fill in the details.
Then we do need to figure out how to program the AI to terminally value its programmers' True Intentions. That is hardly a fatal objection. Did you think the Loosemore architecture was one that bootstraps itself without any basic goals?
No. The goal to understand English is not the same as a goal to be friendly in every way, it is more constrained.
Solving Friendliness, in the MIRI sense, means preloading a detailed expansion of "friendly". That is not what is happening in the Loosemore architecture. So it is not equivalent to solving the same problem.
Nope.
That is an open question.
Then hurrah for the Loosemore architecture, which doesn't require humans to "solve" friendliness in the MIRI sense.
Juno_Watt, please take further discussion to RobbBB's blog.