
Gunnar_Zarncke comments on Dumbing Down Human Values - Less Wrong Discussion

4 Post author: leplen 15 November 2014 02:53AM




Comment author: Gunnar_Zarncke 15 November 2014 10:35:41AM

The goal would be to have a sufficient representation of human values using as dumb a machine as possible. This putative value-learning machine could be dumb in the way that Deep Blue was dumb: a hyper-specialist in its problem domain (chess there, learning human values here) with very little optimization power outside of that domain. It could also be dumb in the way that evolution is dumb, obtaining satisfactory results more through an abundance of data and resources than through any particular brilliance.

It could be a module of a larger AI system whose action component queries it for value judgements. Of course, this decoupling poses some additional challenges (some given below), but at least it factors the complexity of human values into a dumb box with no inherent optimization capabilities, thus simplifying the overall problem.

So what would an overall system using this dumb valuator box look like?

  • There is the dumb human value estimator (valuator).

  • We have an optimizer which tries to make the best decisions given some representation of the world and value judgements thereupon. This is presumably where we want all the smarts, and where the self-optimization process risks kicking in.

  • There needs to be a perception component which feeds current representations of the world into the optimizer (to base decisions on) and into the valuator (presumably adding to its corpus and supplying specific valued instances).

  • Presumably we need an action or output component (any output effectively means action, because of the influence on humans it facilitates). This is where the loop closes.
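The four components above can be sketched as a toy loop. Everything here (the class names, the similarity-based valuation, the one-step outcome predictor) is a hypothetical illustration of the decoupling, not anything from the post:

```python
# Hypothetical sketch: a dumb valuator queried by a smart optimizer.
# All names and the toy valuation scheme are illustrative assumptions.

class Valuator:
    """Dumb value estimator: no optimization power of its own.
    Maps a world-state representation to a scalar value judgement."""
    def __init__(self, corpus):
        self.corpus = corpus  # examples of valued states

    def value(self, state):
        # Toy corpus-similarity valuation: count matching attributes
        # against the best-matching example.
        return max(
            (sum(1 for k, v in example.items() if state.get(k) == v)
             for example in self.corpus),
            default=0,
        )

class Optimizer:
    """Smart component: picks the action whose predicted outcome the
    valuator rates highest. All the 'smarts' live here, outside the box."""
    def __init__(self, valuator, predict):
        self.valuator = valuator
        self.predict = predict  # world model: (state, action) -> next state

    def decide(self, state, actions):
        return max(actions,
                   key=lambda a: self.valuator.value(self.predict(state, a)))

# Perception feeds a state in; the chosen action closes the loop.
corpus = [{"cat": "alive"}]
valuator = Valuator(corpus)
predict = lambda state, action: {"cat": "alive" if action == "brake" else "dead"}
optimizer = Optimizer(valuator, predict)
print(optimizer.decide({"cat": "in_road"}, ["brake", "accelerate"]))  # -> brake
```

The point of the factoring shows up in the structure: `Valuator` contains no search or planning, only a lookup against its corpus, while `Optimizer` holds the world model and the argmax.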

In a way, the valuator is now a major factor of the utility function of the system. Major, but not the only one, because the way the valuator is integrated and queried also becomes part of the utility function: technical and structural aspects of the overall setup become part of the aggregate utility function.

The additional challenges I (and probably most people here) immediately see are:

  • Simple technical aspects, such as how often and how fast the valuations are queried, become implicit parts of how the overall system values the change of specific valuations over time. To use the example of the cat in front of the car: if you query the valuation of the cat only once a minute, you might still value the cat when it is already dead (simplified). This might cause instabilities, 'exploitable' misjudgements, or other strange effects.

  • The smart AI sitting 'behind' this dumb valuator now does everything to maximize value as seen by the valuator. Any anomaly or incompleteness in the corpus could be maximized, and that is mostly for the worse.

  • Because of the feedback via the action and perception modules, the smart AI can influence what the valuator sees. Depending on whether and how 'actions' by the AI are represented in the valuator, this might allow the smart part to reduce the world seen by the valuator to an easily optimizable subset.
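The query-frequency problem from the first bullet can be made concrete with a toy timeline. All numbers here (the query interval, the time the cat dies) are arbitrary assumptions for illustration:

```python
# Toy illustration: if the valuator is queried only periodically, the
# optimizer acts on stale value judgements in between queries.

def world_at(t):
    """True world state at time t (seconds); the cat dies at t = 30."""
    return {"cat": "alive" if t < 30 else "dead"}

def valuation(state):
    """Dumb valuator: values a living cat."""
    return 1 if state.get("cat") == "alive" else 0

QUERY_INTERVAL = 60  # the valuator is queried only once a minute

def cached_valuation(t):
    """What the optimizer sees: the valuation from the last query time."""
    last_query = (t // QUERY_INTERVAL) * QUERY_INTERVAL
    return valuation(world_at(last_query))

# Times at which the system still 'values the cat' although it is dead:
stale = [t for t in range(120) if cached_valuation(t) != valuation(world_at(t))]
print(stale)  # t = 30 .. 59
```

For a full 30-second window the cached judgement disagrees with the true one, which is exactly the kind of gap a value-maximizing optimizer could exploit.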

None of these problems are necessarily unsolvable. I'm definitely for trying to factor the problem into sub-problems, and I like this approach. I'm especially curious how to model the actions of the smart part as entities to be valued by the valuator (this indirectly acts as a filter between optimizer and output).

This is probably worth doing.