
Val comments on In what language should we define the utility function of a friendly AI? - Less Wrong Discussion

3 Post author: Val 05 April 2015 10:14PM



Comment author: Val 09 April 2015 07:57:28PM 0 points

I can agree with some of your points. Interestingly, though, many commenters prefer a utility function rigorously defined in the lowest-level language possible over your heuristically developed one, because they argue that its exact behaviour has to be provable.

Comment author: jacob_cannell 09 April 2015 09:23:03PM * 0 points

The types of decision utility functions that we can define precisely for an AI are exactly the kind that we absolutely do not want - namely, the class of model-free reward functions. That approach works for training an agent to play Atari games based on a score function provided by the simulated environment, but it doesn't scale to the real world, which doesn't come with a convenient predefined utility function.
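To illustrate the distinction, here is a minimal sketch (plain stdlib Python, toy data) of a model-free agent: a single tabular Q-learning update driven entirely by the scalar reward the environment hands it. Nothing in the agent models the world; the predefined score alone shapes its behaviour, which is exactly what the Atari setting provides and the real world doesn't.

```python
def q_learning_step(Q, state, action, reward, next_state, actions,
                    alpha=0.5, gamma=0.9):
    """One tabular Q-learning update: the agent learns purely from the
    scalar reward signal; it builds no model of the world at all."""
    best_next = max(Q.get((next_state, a), 0.0) for a in actions)
    old = Q.get((state, action), 0.0)
    Q[(state, action)] = old + alpha * (reward + gamma * best_next - old)

# Toy two-state chain: going "right" from state 0 pays reward 1.
Q = {}
actions = ["left", "right"]
for _ in range(50):
    q_learning_step(Q, 0, "right", 1.0, 1, actions)
    q_learning_step(Q, 0, "left", 0.0, 0, actions)
# After training, the environment-supplied score has made "right" preferred.
```

The point of the sketch: the whole "utility function" here is the line `reward + gamma * best_next`, i.e. a number supplied from outside. There is nowhere in this design to even express human-relevant values about world states.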

For AGI, we need a model-based utility function, which maps internal world states to human-relevant utility values. Since the utility function then depends on the AGI's internal predictive world model, you would need to rigorously define the AGI's entire world model. That appears to be a hopelessly naive dead end. I'm not aware of any progress or research indicating that approach is viable. Are you?
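A hypothetical sketch of the model-based alternative makes the dependency explicit. All names here are invented for illustration: utility is defined over states of the agent's internal world model, so specifying the utility function correctly presupposes specifying (or at least constraining) that entire model.

```python
# Agent's learned transition model: (state, action) -> predicted world state.
# In a real AGI this would be a vast learned structure, not a dict.
world_model = {
    ("factory_idle", "make_paperclips"): "world_full_of_paperclips",
    ("factory_idle", "assist_humans"):   "humans_flourishing",
}

def utility_over_model_states(model_state):
    """The hard part: mapping *internal model states* to human-relevant
    value. Rigorously defining this requires rigorously defining the
    world model whose states it scores."""
    valued = {"humans_flourishing": 1.0, "world_full_of_paperclips": -1.0}
    return valued.get(model_state, 0.0)

def choose(state, actions):
    # Pick the action whose predicted world state the utility function prefers.
    return max(actions,
               key=lambda a: utility_over_model_states(world_model[(state, a)]))

print(choose("factory_idle", ["make_paperclips", "assist_humans"]))
# → assist_humans
```

Note that the agent's choice is only as good as both pieces: a correct utility mapping over a wrong world model (or vice versa) fails, which is why "just define the utility function precisely" quietly demands defining everything.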

Instead, all current research trends strongly indicate that the first practical AGI designs will be based heavily on inferring human values indirectly. Proving safety for alternate designs - even if possible - has little value if those results do not apply to the designs that will actually win the race to superintelligence.
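What "inferring human values indirectly" might look like in miniature: a preference-learning sketch (invented toy data, stdlib only) in the spirit of inverse reinforcement learning. Rather than being handed a utility function, the agent observes human choices between outcomes and infers which outcome features the human values.

```python
from collections import Counter

# Observed human choices: (chosen outcome, rejected outcome),
# each outcome represented as a set of features. Data is illustrative.
observed_choices = [
    ({"safe", "slow"},  {"fast", "risky"}),
    ({"safe", "cheap"}, {"risky", "cheap"}),
    ({"safe"},          {"risky"}),
]

# Score each feature by how often it distinguishes chosen from rejected.
scores = Counter()
for chosen, rejected in observed_choices:
    for f in chosen - rejected:
        scores[f] += 1
    for f in rejected - chosen:
        scores[f] -= 1

inferred = [f for f, s in scores.items() if s > 0]
print(sorted(inferred))  # features the human appears to value
```

Real systems would use far richer statistical machinery, but the shape is the same: the value function is an output of learning from human behaviour, not an input written down in advance.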

Also - there is a whole mathematical research track in machine learning concerned with provable bounds on loss and prediction accuracy - so it's simply not true that using machine learning techniques to infer human utility functions necessitates 'heuristics' ungrounded in any formal analysis.