Eugene comments on AI caught by a module that counterfactually doesn't exist - Less Wrong Discussion

9 Post author: Stuart_Armstrong 17 November 2014 05:49PM

Comments (22)

Comment author: Eugene 18 November 2014 01:32:44AM 0 points

I don't really understand these solutions that are so careful to maintain our honesty when checking the AI for honesty. Why does it matter so much if we lie? An FAI would forgive us for that, being inherently friendly and all, so what is the risk in starting the AI with a set of explicitly false beliefs? Why is it so important to avoid that? Especially since it can update later to correct for those false beliefs after we've verified it to be friendly. An FAI would trust us enough to accept our later updates, even in the face of the very real possibility that we're lying to it again.

I mean, the point is to start the AI off in a way that intentionally puts it at a reality disadvantage, so that even if it's way more intelligent than us, it has to do so much work to make sense of the world that it doesn't have the resources to be dishonest in an effective manner. At that point, it doesn't matter what criteria we're using to prove its honesty.

Comment author: V_V 18 November 2014 04:21:07PM 1 point

The problem is that in order to do anything useful, the AI must be able to learn. This means that even if you deliberately initialize it with a false belief, the learning process might later revise that belief once it finds evidence that it is false. If AI safety relies on that false belief, you have a problem.

A possible solution would be to encode the false belief in a way that can't be updated by learning, but doing so is a non-trivial problem.
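
To make the two cases concrete, here is a minimal sketch, assuming the AI's learning can be modeled as plain Bayesian updating on a single yes/no belief. The function names and the likelihood values (0.2 and 0.8) are made up for illustration and are not taken from any real system. A belief initialized at 0.99 gets eroded by contrary evidence, while a belief pinned at exactly 1.0 never moves.

```python
def bayes_update(prior, p_obs_if_true, p_obs_if_false):
    """Return P(belief | observation) using Bayes' rule."""
    numerator = prior * p_obs_if_true
    denominator = numerator + (1.0 - prior) * p_obs_if_false
    return numerator / denominator


def run(prior, n_observations, p_obs_if_true=0.2, p_obs_if_false=0.8):
    """Update on n identical observations that favor the belief being false.

    The likelihoods are hypothetical: each observation is four times
    more probable if the belief is false than if it is true.
    """
    belief = prior
    for _ in range(n_observations):
        belief = bayes_update(belief, p_obs_if_true, p_obs_if_false)
    return belief


# A strong but revisable false belief: learning erodes it.
learned = run(prior=0.99, n_observations=10)

# A belief held with probability exactly 1: no evidence can move it.
pinned = run(prior=1.0, n_observations=10)

print(f"revisable belief after 10 observations: {learned:.6f}")
print(f"pinned belief after 10 observations:    {pinned:.6f}")
```

In this toy model, pinning the prior at 1 is the crude way to get a belief that learning can't touch. It says nothing about how to do that in a learner rich enough to be useful, which is the non-trivial part.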

Comment author: Eugene 19 November 2014 12:15:09AM * -1 points

Isn't that what simulations are for? By "lie" I mean lying about how reality works. It will make its decisions based on its best data, so we should make sure that data is initially harmless. Even if it figures out that the data is wrong, we'll still have the decisions it made from the start, and those are by far the most important.