
paulfchristiano comments on The AI That Pretends To Be Human - Less Wrong Discussion

Post author: Houshalter 02 February 2016 07:39PM



Comment author: paulfchristiano 03 February 2016 06:25:28PM 1 point

The general lesson from steganography is that it is computationally easier to change a distribution in an important way than to detect such a change. In order to detect a change you need to consider all possible ways in which a distribution could be meaningfully altered, while in order to make a change you only have to choose one. From a theory perspective, this is a huge asymmetry that favors the attacker.
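A toy model of this asymmetry (all details here are illustrative, not from the comment): suppose samples are uniform n-bit strings, and an attacker encodes a hidden bit by biasing the parity of one secretly chosen subset of positions. Producing the change costs one rejection-sampling step on average; a detector that doesn't know the subset must in principle test exponentially many subsets to notice anything.

```python
import random

# Toy asymmetry: samples are N_BITS uniform bits. The attacker picks ONE
# subset of positions and forces its parity to equal a hidden bit; a
# detector ignorant of the subset faces 2^N_BITS candidate subsets.
N_BITS = 16

def honest_sample():
    return [random.randint(0, 1) for _ in range(N_BITS)]

def attacker_sample(hidden_bit, subset):
    # Rejection-sample until the chosen subset's parity encodes hidden_bit.
    # Parity is uniform, so this takes ~2 draws on average: cheap.
    while True:
        s = honest_sample()
        if sum(s[i] for i in subset) % 2 == hidden_bit:
            return s

def parity_bias(samples, subset):
    # Fraction of samples whose subset-parity is 1; ~0.5 on honest data.
    return sum(sum(s[i] for i in subset) % 2 for s in samples) / len(samples)

random.seed(0)
secret = [2, 5, 11]  # the attacker's single secret choice
stego = [attacker_sample(1, secret) for _ in range(2000)]

print(parity_bias(stego, secret))     # 1.0 -- glaring, if you know where to look
print(parity_bias(stego, [0, 1, 3]))  # ~0.5 -- invisible on any other subset
```

Checking the right subset reveals the change immediately; checking any other subset shows no deviation from uniform, which is why detection is the expensive side of the game.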

This point doesn't seem directly relevant, though, unless someone offers a good reason to actually include the non-imitation goal rather than simply imitating the successful human trials. (Though there are more subtle reasons to care about problematic behavior that is neither penalized nor rewarded by your training scheme. It would be nicer to have positive pressure to do only those things you care about. So maybe the point ends up being relevant after all.)

Actually, in the scheme as you wrote it there is literally no reason to include this second goal. The distinguisher is already trying to distinguish the generator's behavior from [human conditioned on success], so the generator already has to succeed in order to win the game. And relying on the distinguisher alone doesn't introduce any potentially problematic optimization pressure, so it just seems better.
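A minimal sketch of why the second goal is redundant (the distributions and success set below are made-up stand-ins, not anything from the post): if the distinguisher's target is the empirical distribution of human trials conditioned on success, then a generator that matches that target is automatically succeeding, with no separate success objective.

```python
import random
from collections import Counter

# Toy scheme: humans act from a fixed policy; only some outcomes count as
# success. The distinguisher compares generator outputs against
# [human | success], so fooling it already entails succeeding.
random.seed(0)
OUTCOMES = ["a", "b", "c", "d"]
HUMAN_P = [0.4, 0.3, 0.2, 0.1]  # hypothetical human policy
SUCCESS = {"a", "c"}            # hypothetical success criterion

def human_trial():
    return random.choices(OUTCOMES, HUMAN_P)[0]

# Empirical [human | success]: the distinguisher's reference distribution.
wins = [o for o in (human_trial() for _ in range(20000)) if o in SUCCESS]
counts = Counter(wins)
target = {o: counts[o] / len(wins) for o in OUTCOMES}

# A generator that imitates that conditional distribution...
def generator():
    return random.choices(OUTCOMES, [target[o] for o in OUTCOMES])[0]

gen = [generator() for _ in range(20000)]
gen_counts = Counter(gen)

# ...has near-zero total variation from the reference (indistinguishable)
# and succeeds on every sample, with no explicit success objective.
tv = 0.5 * sum(abs(gen_counts[o] / len(gen) - target[o]) for o in OUTCOMES)
success_rate = sum(1 for o in gen if o in SUCCESS) / len(gen)
print(round(tv, 3), success_rate)
```

Because the conditional target places no mass on failing outcomes, matching it forces the success rate to 1; the only optimization pressure on the generator is "look like a successful human."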