
Stuart_Armstrong comments on "Siren worlds and the perils of over-optimised search"

Post author: Stuart_Armstrong, 07 April 2014 11:00AM (27 points)


Comment author: Stuart_Armstrong, 16 May 2014 10:33:19AM (1 point)

I've been thinking about this, and I haven't found any immediately useful way of using your idea, but I'll keep it in the back of my mind... We haven't found a good way of identifying agency in the abstract sense ("was cosmic phenomenon X caused by an agent, and if so, which one?" kind of stuff), so this might be a useful simpler problem...

Comment author: [deleted], 16 May 2014 02:35:27PM (1 point)

Upon further research, it turns out that preference learning is a field within machine learning, so we can actually try to address this at a much more formal level. That would also get us another benefit: supervised learning algorithms don't wirehead.
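
To make that concrete, here is a minimal sketch of pairwise preference learning in the Bradley–Terry style: we fit a linear utility function u(x) = w·x from labelled comparisons of the form "outcome a was preferred to outcome b". The function names and toy data are illustrative assumptions, not taken from any particular preference-learning library.

```python
import numpy as np

def fit_preferences(pairs, dim, lr=0.1, epochs=200):
    """pairs: list of (a, b) feature-vector tuples where a was preferred to b."""
    w = np.zeros(dim)
    for _ in range(epochs):
        for a, b in pairs:
            # Model: P(a preferred to b) = sigmoid(u(a) - u(b)), with u(x) = w @ x.
            p = 1.0 / (1.0 + np.exp(-(w @ a - w @ b)))
            # Gradient ascent on the log-likelihood of the observed preference.
            w += lr * (1.0 - p) * (a - b)
    return w

# Toy usage: the "supervisor" consistently prefers outcomes with a larger
# first feature, and the learner recovers that from the labels alone.
rng = np.random.default_rng(0)
xs = rng.normal(size=(20, 2))
pairs = [(a, b) if a[0] > b[0] else (b, a) for a, b in zip(xs[::2], xs[1::2])]
print(fit_preferences(pairs, dim=2))  # first weight comes out clearly positive
```

The relevant feature is that the learner only models the labels it is given; it has no channel for acting on the world to make those labels easier to predict, which is the sense in which supervised learning doesn't wirehead.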

Notably, this fits with our intuition that morality must be "taught" (i.e., via labelled data) to actual human children, lest they simply decide that the Good and the Right consist of eating a whole lot of marshmallows.

And if we put that together with a conservatism heuristic for acting under moral uncertainty (say: optimize for credence-weighted expected utility across moral theories, thus requiring higher moral certainty for more-extreme moral decisions), we might just start to make some headway on constructing utility functions that mathematically reflect what their operators actually intend for them to do.
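
As a toy illustration of that heuristic (the threshold rule below is my own assumption, not a standard formula): score each action by its credence-weighted utility across candidate moral theories, and permit the more extreme actions only when the theories approach consensus.

```python
# Candidate moral theories as (credence, utility_fn) pairs; credences sum to 1.
theories = [(0.7, lambda a: a),    # theory 1: larger interventions are better
            (0.3, lambda a: -a)]   # theory 2: larger interventions are worse

def expected_moral_value(action, theories):
    """Credence-weighted utility of an action across the moral theories."""
    return sum(c * u(action) for c, u in theories)

def permitted(action, theories, extremity, caution=1.0):
    """Conservatism rule (assumed form): the total credence of theories
    endorsing the action must grow with its extremity, so extreme moves
    need something close to moral consensus."""
    endorsing = sum(c for c, u in theories if u(action) > 0)
    return endorsing >= 0.5 + caution * extremity / 2

for action, extremity in [(0.1, 0.1), (0.9, 0.9)]:
    print(action,
          round(expected_moral_value(action, theories), 3),
          permitted(action, theories, extremity))
```

Both actions come out with positive credence-weighted value here, but only the mild one is permitted: at 70% moral certainty the rule blocks the extreme option, which is the "higher certainty for more-extreme decisions" behaviour.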

I also have an idea written down in my notebook, which I've been refining, that sort of extends from what Luke had written down here. Would it be worth a post?