eltem comments on The mathematics of reduced impact: help needed - Less Wrong

10 Post author: Stuart_Armstrong 16 February 2012 02:23PM

You are viewing a comment permalink. View the original post to see all comments and the full post content.

Comments (94)

You are viewing a single comment's thread. Show more comments above.

Comment author: Vladimir_Nesov 16 February 2012 10:03:51PM *  8 points [-]

Beware Goodhart's Law: you're setting rules of the game that the "disciple AI" has an incentive to subvert. Essentially, you're specifying a wish, and normally your ability to evaluate a wish is constrained by your ability to consider and (morally) evaluate all the possible consequences (strategies) in detail. An AI might find a strategy that, while satisfying your wish, would be disastrous (which might win the AI a prize so insignificant it'd never rise to your attention).

Comment author: eltem 17 February 2012 11:57:48PM *  0 points [-]

It is probably worth noting here that AI's ability to evaluate measure of matching your wish and consequences that you need is, in turn, limited by its own ability to evaluate consequences of its actions (if we apply the constraint that you are talking about to AI itself). That can easily turn into requirement of building a Maxwell's demon or AI admitting (huh..) that it is doing something about which it doesn't know if it will match your wish or not.