
V_V comments on AI caught by a module that counterfactually doesn't exist - Less Wrong Discussion

9 points | Post author: Stuart_Armstrong 17 November 2014 05:49PM


Comment author: V_V 18 November 2014 04:21:07PM 1 point

The problem is that, in order to do anything useful, the AI must be able to learn. This means that even if you deliberately initialize it with a false belief, the learning process may later update that belief once the AI finds evidence that it is false.
If AI safety relies on that false belief, you have a problem.
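
For illustration, here is a minimal sketch of that effect, assuming a toy Bayesian coin-flip learner (the "fair"/"biased" hypotheses and the numbers are hypothetical, chosen only to show how evidence washes out a deliberately false prior; this is not from the original comment):

```python
# Illustrative only: a toy Bayesian learner initialized with a deliberately
# false belief. The prior says the coin is almost certainly fair, but the
# observations come from a biased coin, so ordinary learning updates the
# belief away from its initial (false) value.
import random

random.seed(0)

# Two hypotheses about the coin, with their probability of heads.
likelihood = {"fair": 0.5, "biased": 0.9}

# Deliberately false initial belief: 99% confidence that the coin is fair.
belief = {"fair": 0.99, "biased": 0.01}

true_p_heads = 0.9  # reality: the coin is biased

for _ in range(100):
    heads = random.random() < true_p_heads
    # Bayes' rule: weight each hypothesis by how well it predicted the flip,
    # then renormalize.
    for h in belief:
        belief[h] *= likelihood[h] if heads else 1.0 - likelihood[h]
    total = sum(belief.values())
    belief = {h: p / total for h, p in belief.items()}

print(belief)  # the false "fair" belief has been washed out by the evidence
```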

A possible solution would be to encode the false belief in a way that can't be updated by learning, but doing so is a non-trivial problem.
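
One crude way to get a belief that learning cannot touch, continuing the same hypothetical coin-flip model (again my own illustrative sketch, not something proposed in the thread): give the false hypothesis probability 1 and every alternative probability 0, so exact Bayesian updating can never move it. The non-trivial part is that real learning systems are not exact Bayesian updaters over a fixed, hand-written hypothesis set.

```python
# Illustrative only: in the same toy model, a belief "frozen" by assigning
# probability 1 to the false hypothesis. Under exact Bayes' rule a
# zero-probability alternative can never recover, so no amount of evidence
# changes the belief.
import random

random.seed(0)

likelihood = {"fair": 0.5, "biased": 0.9}
belief = {"fair": 1.0, "biased": 0.0}  # frozen false belief
true_p_heads = 0.9                     # reality: the coin is biased

for _ in range(100):
    heads = random.random() < true_p_heads
    for h in belief:
        belief[h] *= likelihood[h] if heads else 1.0 - likelihood[h]
    total = sum(belief.values())
    belief = {h: p / total for h, p in belief.items()}

print(belief)  # still {'fair': 1.0, 'biased': 0.0}: the evidence never registers
```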

Comment author: Eugene 19 November 2014 12:15:09AM * -1 points

Isn't that what simulations are for? By "lie" I mean lying about how reality works. It will make its decisions based on its best available data, so we should make sure that data is initially harmless. Even if it later figures out that the data is wrong, we'll still have the decisions it made from the start - and those are by far the most important.