JGWeissman comments on Friendly, but Dumb: Why formal Friendliness proofs may not be as safe as they appear - Less Wrong

Post author: apophenia 19 April 2010 11:38PM




Comment author: JGWeissman 20 April 2010 03:09:01AM 5 points

> On the other hand, there's no particular reason Betty should continue to self-improve that I can see.

A subgoal that is useful for achieving many primary goals is to improve one's general goal-achieving ability.

An attempt to cripple an FAI by limiting its general intelligence would be noticed, because the humans would expect it to FOOM; and if it actually does FOOM, it will be smart enough.

A sneakier unfriendly AI might try to design an FAI with a stupid prior, with blind spots the uFAI can exploit. So you would want your Friendliness test to look not just at the goal system, but at every module of the supposed FAI, including its epistemology and decision theory.

But even a thorough test does not make it a good idea to run a supposed FAI designed by an uFAI. Doing so allows the uFAI to exploit, for its own purposes, every bit of uncertainty we have about the supposed FAI.