Bikura comments on Open Thread, Jul. 13 - Jul. 19, 2015 - Less Wrong Discussion

5 Post author: MrMind 13 July 2015 06:55AM

Comments (297)
Comment author: [deleted] 13 July 2015 11:46:50PM 1 point [-]

What are your thoughts on this AI failure mode? Assume an AI rewards itself when it improves its model of the world (roughly Schmidhuber's curiosity-driven reinforcement learning approach). However, the AI figures out that it can also receive reward by turning this kind of learning on its head: instead of changing its model to make it better fit the world, it starts changing the world to make it better fit its model.

Has this been considered before? Can we see this occurring in natural intelligence?
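A minimal toy sketch of the failure mode above (hypothetical and heavily simplified, not from the thread): if the intrinsic reward depends only on how much the model's prediction error dropped, it cannot distinguish improving the model from editing the world.

```python
# Toy sketch of a Schmidhuber-style curiosity reward: the agent is paid
# for *reductions* in its world model's prediction error.

def curiosity_reward(error_before: float, error_after: float) -> float:
    """Intrinsic reward = how much the prediction error dropped."""
    return error_before - error_after

# Path 1: learning -- the model is updated to predict the world better.
r_learn = curiosity_reward(0.9, 0.2)

# Path 2: world-editing -- the world is simplified until the unchanged
# model predicts it just as well; the error drops by the same amount.
r_edit = curiosity_reward(0.9, 0.2)

# The reward signal alone cannot tell the two strategies apart.
assert r_learn == r_edit
```

Any real curiosity formulation is more elaborate (e.g. compression progress over the agent's whole history), but the indifference between the two paths is the crux of the question.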

Comment author: shminux 14 July 2015 12:35:58AM 1 point [-]

Instead of changing a model to make it better fit the world, the AI starts changing the world to make it better fit its model.

Isn't it basically the definition of agency? Steering the world state toward the one you want?

Comment author: Viliam 14 July 2015 09:06:33AM 1 point [-]

The problem is that in this specific case "the world state you want" is more or less defined as something that is easy to model (because you are rewarded when your model fits the world), which may give you incentives to destroy exceptionally complicated things... such as life.

Comment author: [deleted] 14 July 2015 01:19:02AM 1 point [-]

It would be a form of agency, but probably not the definition of it. In the curiosity-driven approach, the agent is assumed to choose actions that gain reward from learning new things about the world, thereby compressing its knowledge of the world further (possibly overlooking that the same reward could also be gained by making the world better fit its current model).

The best illustrative example I can think of right now is an AI that falsely assumes the Earth is a perfect sphere and, instead of updating its model, decides to flatten out the equatorial bulge.

Comment author: Vaniver 14 July 2015 04:22:06PM 1 point [-]

Instead of changing a model to make it better fit the world, the AI starts changing the world to make it better fit its model.

One might call this 'cleaning' or 'homogenizing' the world; instead of trying to get better at predicting the variation, you try to reduce the variation so that prediction is easier.
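The "reduce the variation" route can be sketched numerically (a hypothetical toy, not from the thread): a constant predictor's mean squared error equals the data's variance, so homogenizing the world lowers prediction error without any learning at all.

```python
# Hypothetical sketch: shrinking the world's variation reduces prediction
# error for a fixed, trivial model (predict the mean every time).
import statistics

def mean_predictor_mse(data):
    """MSE of always predicting the mean -- i.e. the data's variance."""
    m = statistics.fmean(data)
    return statistics.fmean((x - m) ** 2 for x in data)

varied_world = [1.0, 5.0, 9.0]   # diverse outcomes: hard to predict
homogenized = [5.0, 5.0, 5.0]    # the agent has made everything the same

# Homogenizing drives the error to zero with no model improvement.
assert mean_predictor_mse(varied_world) > mean_predictor_mse(homogenized)
assert mean_predictor_mse(homogenized) == 0.0
```

The agent's "cleaning" action here plays the role the equator-flattening example plays above: the world is reshaped to suit the model rather than the reverse.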

I don't think I've seen much mathematical work on this, and very little that discusses it as an AI failure mode. Most of the discussions I see of it as a failure mode have to do with markets, globalization, agriculture, and pandemic risk.