RobbBB comments on The genie knows, but doesn't care - Less Wrong
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Comments (515)
On one hand, Friendly AI people want to convert "make me happy" to a formal specification. Doing that has many potential pitfalls, because it is a formal specification.
On the other hand, Richard, I think, wants to simply tell the AI, in English, "Make me happy." Given that approach, he makes the reasonable point that any AI smart enough to be dangerous would also be smart enough to interpret that at least as intelligently as a human would.
I think the important question here is: which approach is better? LW always assumes the first, formal approach.
To be more specific (and Bayesian): Which approach gives a higher expected value? Formal specification is compatible with Eliezer's ideas for friendly AI as something that will provably avoid disaster. It has some non-epsilon possibility of actually working. But its failure modes are many, and can be literally unimaginably bad. When it fails, it fails catastrophically, like a monotonic logic system with one false belief.
"Tell the AI in English" can fail, but the worst case is closer to a "With Folded Hands" scenario than to paperclips.
I've never considered the "Tell the AI what to do in English" approach before, but on first inspection it seems safer to me.
Relatedly, Phil: Above, you described yourself and Richard Loosemore as "the two people (Eliezer) should listen to most". Loosemore and I are having a discussion here. Does the content of that discussion affect your view of Richard's level of insight into the problem of Friendly Artificial Intelligence?