
eli_sennesh comments on Siren worlds and the perils of over-optimised search

Post author: Stuart_Armstrong 07 April 2014 11:00AM




Comment author: [deleted] 16 May 2014 02:35:27PM 1 point

Upon further research, it turns out that preference learning is a field within machine learning, so we can actually try to address this at a much more formal level. That would also get us another benefit: supervised learning algorithms don't wirehead.
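To make "formal" concrete, here is a minimal sketch of the standard pairwise preference-learning setup (a Bradley-Terry-style model fit by logistic regression on feature differences). Everything in it, including the simulated data and the variable names, is my own illustration rather than anything from the literature cited here:

```python
# Minimal sketch: learn a linear utility function from pairwise preferences.
# Model: P(a preferred to b) = sigmoid(w . (x_a - x_b)), fit as logistic
# regression on the difference features.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0, 0.5])      # the "intended" utility weights

# Simulated labelled comparisons: feature vectors for two options per pair.
X_a = rng.normal(size=(500, 3))
X_b = rng.normal(size=(500, 3))
prefer_a = (X_a @ true_w > X_b @ true_w).astype(int)

# Fit on difference features; no intercept, since only differences matter.
model = LogisticRegression(fit_intercept=False)
model.fit(X_a - X_b, prefer_a)

learned_w = model.coef_.ravel()
print("learned direction:", learned_w / np.linalg.norm(learned_w))
print("true direction:   ", true_w / np.linalg.norm(true_w))
```

The point of the sketch is just that the learner is scored on labelled comparisons supplied from outside; it has no channel by which to manufacture its own "preferred" labels, which is the sense in which supervised preference learning doesn't wirehead.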

Notably, this fits with our intuition that morality must be "taught" (i.e., via labelled data) to actual human children, lest they simply decide that the Good and the Right consist of eating a whole lot of marshmallows.

And if we put that together with a conservatism heuristic for acting under moral uncertainty (say: optimize expected utility weighted by our credence in each candidate moral hypothesis, so that more extreme decisions require higher moral certainty), we might just start to make some headway on constructing utility functions that mathematically reflect what their operators actually intend for them to do.
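One crude way to cash that out (again, my own illustrative formalisation, not a worked-out proposal): keep a credence distribution over candidate utility functions, and penalise actions the hypotheses disagree about, so an extreme option needs near-consensus before it beats a mild default. The numbers and the `risk_aversion` knob below are made up for the example:

```python
# Sketch: credence-weighted expected moral value minus a disagreement penalty.
import numpy as np

# Rows: actions; columns: utility under each candidate moral hypothesis.
utilities = np.array([
    [ 1.0,  1.0,  0.8],   # mild action, broadly endorsed
    [10.0, -8.0,  9.0],   # extreme action, hypotheses sharply disagree
])
credence = np.array([0.4, 0.3, 0.3])   # P(each hypothesis is the right one)

expected = utilities @ credence                                      # expected moral value
spread = np.sqrt(((utilities - expected[:, None]) ** 2) @ credence)  # cross-hypothesis disagreement
risk_aversion = 1.0

score = expected - risk_aversion * spread
print("scores:", score)   # the mild action wins despite its lower raw expectation
```

The disagreement penalty is what forces the agent to gather more moral certainty (shrink the spread) before the extreme option can win.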

I also have an idea written down in my notebook, which I've been refining, that sort of extends from what Luke had written down here. Would it be worth a post?