Mark_Friedenbach comments on Deception detection machines - Less Wrong Discussion

2 [deleted] 05 September 2014 08:15PM


Comment author: [deleted] 05 September 2014 10:05:16PM 1 point [-]

> Not sure what allowing a small chance of false negatives does: you presumably could just repeat all your questions?

In this case the result would, or at least could, be the same, so long as the AI didn't sufficiently update its internal state in between. But the detail isn't important; please ignore it. I include it because it makes the device tractable: perfect detection would require a computer more powerful than the AI being analyzed, which seems impractical, whereas achieving even infinitesimal error rates appears to be doable (I had a specific construction in mind when writing this post).
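
A toy illustration of that tradeoff (hypothetical numbers, not my actual construction): repeating the questions only drives the error rate down if each pass can err independently, which is exactly what fails when the AI's state is unchanged between passes.

```python
# Hypothetical numbers: repetition compounds the false-negative rate
# only when each pass errs independently. If the AI's internal state is
# unchanged between passes, the passes are perfectly correlated and
# repetition buys nothing.
p_fn = 1e-6  # assumed false-negative rate of a single pass

independent = p_fn ** 3  # three passes with independent errors
correlated = p_fn        # three deterministic repeats of the same pass

print(f"independent passes: {independent:.0e}")  # 1e-18
print(f"correlated passes:  {correlated:.0e}")   # 1e-06
```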

> More substantially, I don't know how easy 'deception' would be to define - any presentation of information would be selective.

Deception in this case means giving false or incomplete descriptions of its thought processes. It's okay for the AI to think "how do I present this in a way that the human will accept it?" only so long as the AI tells the human it had that thought. E.g. you ask "why do you recommend this action?" and the only acceptable answer is the actual, 100% complete justification of both the specified choice and its alternatives, including a calculation showing higher expected utility for the chosen action, as well as a whole slew of meta-information: a description of the search strategy, the cutoff thresholds for giving up on generating alternatives, which cached computations were available for use, etc. If any of this is falsified, or a single detail omitted, the red light goes off.
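
As a toy sketch of that rule (hypothetical names and interfaces; it assumes the detector sees both the AI's raw logs and its reported account):

```python
def red_light(reported: dict, logged: dict) -> bool:
    """Flag deception if the AI's reported account of its reasoning
    falsifies or omits any detail present in its actual logs."""
    for key, actual in logged.items():
        if key not in reported:      # a single omitted detail...
            return True
        if reported[key] != actual:  # ...or a falsified one
            return True
    return False

# Example: the AI reports its justification but omits the logged
# "how do I present this?" thought, so the red light goes off.
logged = {
    "justification": "action A maximizes expected utility",
    "search_cutoffs": "gave up after 10^9 alternatives",
    "presentation_thought": "phrase it so the human accepts it",
}
reported = {k: v for k, v in logged.items() if k != "presentation_thought"}
assert red_light(reported, logged)
```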

Comment author: DavidAgain 06 September 2014 08:03:53AM 1 point [-]

To me it sounds like the full information provided to avoid being incomplete would be so immense and complex that you'd need another AI just to interpret it! But I may be wrong.

Comment author: [deleted] 07 September 2014 03:50:27PM 1 point [-]

The data is immense, but the deception detector's job involves only applying simple rules. It's effectively a compressor that compresses the AI's logs down to "utility functions + search parameters", which is small enough to be inspected directly.
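
A minimal sketch of that framing (hypothetical names; the key assumption is a deterministic `replay` of the AI's search from the summary alone):

```python
def passes_inspection(logs, summary, replay):
    """Hypothetical compressor check: `summary` is the small
    'utility functions + search parameters' object, and `replay` is an
    assumed deterministic re-execution of the AI's search from it.
    If the replayed trace reproduces the immense logs exactly, the
    summary fully accounts for the AI's reasoning, and a human need
    only inspect the summary itself."""
    return replay(summary) == logs
```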