Stuart_Armstrong comments on The mathematics of reduced impact: help needed - Less Wrong

Post author: Stuart_Armstrong 16 February 2012 02:23PM, 10 points

Comments (94)

Comment author: Vladimir_Nesov 16 February 2012 10:03:51PM, 8 points

Beware Goodhart's Law: you're setting rules of the game that the "disciple AI" has an incentive to subvert. Essentially, you're specifying a wish, and your ability to evaluate a wish is normally constrained by your ability to consider and (morally) evaluate all of its possible consequences (strategies) in detail. An AI might find a strategy that satisfies your wish and is nonetheless disastrous, perhaps because it wins the AI some prize so insignificant that it would never rise to your attention.

Comment author: Stuart_Armstrong 22 February 2012 09:54:49AM, 0 points

Beware Goodhart's Law:

One consideration is the amount of information in the coarse-graining measures: we could set it up so that more measurements are made than there are bits in the disciple AI's source code. That's not a guarantee of anything, of course, but Goodhart's law mainly derives from how short the success indicator is compared with the phenomena it's trying to measure, which makes subverting the indicator easier than actually improving the phenomena.
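To make the exchange concrete, here is a minimal Python sketch of a toy model. It is purely an illustrative assumption, not anything either commenter proposed: each candidate strategy has a reward and a list of side effects on the world, a coarse-grained indicator observes only some of those features, and the optimizer maximizes reward minus measured impact. All names (`make_strategies`, `measured_impact`, `pick_strategy`, `regret`, and so on) are hypothetical. "Regret" here means how much worse the chosen strategy is than the best one when impact is measured on every feature.

```python
import random

# Hypothetical toy model (not from the thread): a strategy is a pair
# (reward, side_effects), where side_effects lists how much the strategy
# changes each feature of the world.


def make_strategies(n_strategies, n_features, seed=0):
    """Draw random strategies: uniform reward, Gaussian side effects."""
    rng = random.Random(seed)
    return [
        (rng.random(), [rng.gauss(0.0, 1.0) for _ in range(n_features)])
        for _ in range(n_strategies)
    ]


def true_impact(side_effects):
    """What we actually care about: total change across *all* features."""
    return sum(abs(x) for x in side_effects)


def measured_impact(side_effects, n_measured):
    """Coarse-grained indicator: only the first n_measured features are observed."""
    return sum(abs(x) for x in side_effects[:n_measured])


def pick_strategy(strategies, n_measured, penalty=1.0):
    """The optimizer maximizes reward minus the penalty on *measured* impact."""
    return max(strategies,
               key=lambda s: s[0] - penalty * measured_impact(s[1], n_measured))


def true_score(strategy, penalty=1.0):
    """Reward minus the penalty on the full, unmeasured-included impact."""
    reward, side_effects = strategy
    return reward - penalty * true_impact(side_effects)


def regret(strategies, n_measured, penalty=1.0):
    """Shortfall of the optimizer's choice versus the truly best strategy;
    0 means the indicator was not exploited."""
    chosen = pick_strategy(strategies, n_measured, penalty)
    best = max(true_score(s, penalty) for s in strategies)
    return best - true_score(chosen, penalty)


if __name__ == "__main__":
    # Strategy A earns a tiny extra reward (0.1) but has a large side effect
    # on feature 2; strategy B has no side effects at all.
    a = (1.0, [0.0, 0.0, 5.0])
    b = (0.9, [0.0, 0.0, 0.0])
    for k in (2, 3):  # k=2 leaves feature 2 unmeasured, so A is picked; k=3 picks B
        chosen = "A" if pick_strategy([a, b], k) is a else "B"
        print(f"measuring {k} of 3 features: picks {chosen}, "
              f"regret {regret([a, b], k):.2f}")

    # Random strategies: mean regret over 20 seeds as coverage grows.
    n_features = 50
    for k in (0, 10, 25, 50):
        mean = sum(regret(make_strategies(200, n_features, seed), k)
                   for seed in range(20)) / 20
        print(f"measuring {k:2d} of {n_features} features: mean regret {mean:.2f}")
```

The first loop is Vladimir_Nesov's worry in miniature: with feature 2 unmeasured, the optimizer takes strategy A for a 0.1 gain in reward despite its large hidden side effect. The second loop gestures at Stuart_Armstrong's reply: in this toy, regret is exactly zero once every feature is measured and tends to fall as the indicator covers more of the world. It does not model his specific comparison between the number of measurements and the bits in the disciple AI's source code.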