Deception detection machines

by [anonymous]

5th Sep 2014


Assume you had access to:

1. a self-improving artificial general intelligence (AGI) of at least human-level capability; and

2. a lie detector that the AGI is known to be susceptible to. A lie for this purpose is any intentional deception.

The AI is not omnipotent. It is running on a physical machine in the real world with limited inputs. One of these inputs is a mechanism by which you can ask it questions and receive answers filtered through the lie detector. A little light on the box turns red if the machine was intentionally deceptive in any way, or green otherwise. As it happens the lie detector is probabilistic, but the chance of a false negative (a lie escaping detection) is infinitesimal: 1 in 10^40, let's say, but assume you could make it even smaller if you needed to. The chance of a false positive (non-deception being labeled a lie) is effectively zero.
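To get a feel for what a 1 in 10^40 false-negative rate means over many questions, here is a rough back-of-the-envelope sketch. It assumes each lie escapes detection independently with the same probability, which the setup above does not actually state; the function names are made up for illustration.

```python
import math

def prob_any_lie_escapes(p_miss: float, n_lies: int) -> float:
    """Chance that at least one of n_lies lies gets a green light.

    Assumes (hypothetically) that each lie is missed independently
    with probability p_miss. Computes 1 - (1 - p_miss)**n_lies via
    log1p/expm1 so that a tiny p_miss does not round away to zero.
    """
    return -math.expm1(n_lies * math.log1p(-p_miss))

def lies_needed_for_risk(p_miss: float, risk: float) -> float:
    """Number of lies before the chance of any escape reaches `risk`."""
    return math.log1p(-risk) / math.log1p(-p_miss)

if __name__ == "__main__":
    p = 1e-40  # false-negative rate from the post
    # A trillion lies still leaves roughly a 1e-28 chance of any escaping.
    print(prob_any_lie_escapes(p, 10**12))
    # Lies needed before the escape risk reaches one in a million.
    print(lies_needed_for_risk(p, 1e-6))
```

Under that independence assumption, the total risk grows roughly linearly with the number of lies told, so it stays negligible for any realistic number of questions.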

What could you do with this setup?

EDIT: Some of the details above are provided simply to make the problem tractable. E.g. having the deception detector be probabilistic allows its operating requirements to be lower than those of the AI itself. If the AI is running on a supercomputer, it seems intractable to require a super-supercomputer for the deception detector.

EDIT2: Deception in this case simply means falsifying or omitting information. Loosely speaking, if the AI falsifies the computational traces fed into the deception detector, or leaves out any information in its response, this is detectable. Presumably an unfriendly AI (UFAI) could output a very nice, very convincing plan of action with very tiny fine print hidden somewhere, along the lines of "PS: This is all a lie! You implement this and I'm going to turn you all into meaty paperclips. Haha!", and it would get past the deception detector. But I would rather the discussion not get sidetracked by such legalistic, nitpicky scenarios. Assume the humans involved are competent, conservative, responsible people who have set up institutional safeguards to prevent hasty action and to make sure that output is analyzed down to the very last digital bit by a competent, international team of highly rational people before being acted upon.

Slider, 12y ago

The assumed properties seem to be in conflict. Most likely you have misinterpreted what the device does, and it does far less than detect deception. In that case, answers to the hypothetical won't apply as-is to the actual machine. Since it seems you don't want to spell out the details, you need to give a more specific and believable abstraction of what it does.

What I trivially guess from this comment is that it works by detecting an area of the AI where it answers the question to itself, and then checks whether that differs from what it says to you. This kind of design would light the lamp when the AI honestly takes into account that the hearer structures information differently from the AI's internal representations, which constitutes a false positive if the lamp is interpreted as detecting deception. It would also produce a lot of false negatives whenever the answer's receiver interprets the AI's internal representations differently than the AI does, which is highly likely since the AI and the hearer probably don't share a cognitive architecture.

The above description doesn't contain much technical info on how to pull off what is claimed, but it opens up the principles of how the output is produced much more than the summary "detects deception" does. You would need to provide a description at a similar level before I will bother to continue entertaining the hypothetical.
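A toy sketch of the guessed design may make the failure modes concrete. Everything here is hypothetical: `naive_detector`, the string representations, and the example answers are invented for illustration and are not a description of the actual device.

```python
def naive_detector(internal_answer: str, spoken_answer: str) -> str:
    """Hypothetical detector: red if what the AI says differs from
    the answer it gave itself internally, green otherwise."""
    return "green" if spoken_answer == internal_answer else "red"

# Hypothetical internal representation of an answer.
internal = "distance_m=1609.344"

# Case 1: the AI honestly rephrases for a human hearer.
# The strings differ, so the lamp turns red: a false positive
# if red is read as "deception".
honest_rephrasing = "about one mile"
print(naive_detector(internal, honest_rephrasing))  # red

# Case 2: the AI repeats its internal representation verbatim.
# The lamp stays green even if the hearer misreads the units,
# so a misleading answer passes: a false negative in effect.
verbatim_dump = internal
print(naive_detector(internal, verbatim_dump))  # green
```

The point of the sketch is only that "output matches internal state" and "the hearer is not deceived" are different properties once the two parties represent information differently.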

[anonymous], 12y ago

For certain categories of AGI designs, the detector is able to validate from computational trace logs and electrical meters that the answer given was selected using utility function F, that no part of F was left out, and that the logs weren't faked or modified. The response includes both the answer from the machine and this summarized utility function extracted by the detector. There can be automatic detection of tampering with the logs which are fed into the detector, but other kinds of deception detection require some tool-assisted inspection of the utility function.

Sorry I won't post details beyond that, as that would violate LW's policy about discussing AGI implementation details.
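For readers unfamiliar with automatic tamper detection on logs, one common generic technique is a hash chain. The sketch below is not the commenter's design, which is not given. It assumes, hypothetically, that the chain digests are recorded somewhere the AI cannot rewrite. It only catches modification after recording, not logs that were dishonest from the start.

```python
import hashlib

GENESIS = "0" * 64  # arbitrary starting digest for the chain

def chain_digest(prev_digest: str, entry: str) -> str:
    """Digest of one log entry, bound to everything before it."""
    data = (prev_digest + entry).encode("utf-8")
    return hashlib.sha256(data).hexdigest()

def seal(entries):
    """Return one digest per entry, each covering all earlier entries."""
    digests, prev = [], GENESIS
    for entry in entries:
        prev = chain_digest(prev, entry)
        digests.append(prev)
    return digests

def first_tampered_index(entries, digests):
    """Index of the first entry that fails verification, or None.

    A missing or extra entry is reported at the position where the
    two sequences stop lining up.
    """
    prev = GENESIS
    for i, (entry, expected) in enumerate(zip(entries, digests)):
        prev = chain_digest(prev, entry)
        if prev != expected:
            return i
    if len(entries) != len(digests):
        return min(len(entries), len(digests))
    return None

if __name__ == "__main__":
    # Hypothetical trace entries, invented for illustration.
    trace = ["load F", "evaluate option A", "evaluate option B", "select A"]
    sealed = seal(trace)
    edited = list(trace)
    edited[2] = "evaluate option C"
    print(first_tampered_index(trace, sealed))   # None
    print(first_tampered_index(edited, sealed))  # 2
```

Because each digest depends on all earlier entries, editing or dropping one entry invalidates every digest from that point on.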
