Wei_Dai comments on The AI in a box boxes you - Less Wrong

102 Post author: Stuart_Armstrong 02 February 2010 10:10AM

Comments (378)

Comment author: Eliezer_Yudkowsky 02 February 2010 07:27:00PM 41 points

As I always press the "Reset" button in situations like this, I will never find myself in such a situation.

EDIT: Just to be clear, the idea is not that I quickly shut off the AI before it can torture simulated Eliezers; it could have already done so in the past, as Wei Dai points out below. Rather, because in this situation I immediately perform an action detrimental to the AI (switching it off), any AI that knows me well enough to simulate me knows that there's no point in making or carrying out such a threat.
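The reasoning in the EDIT above can be sketched as a toy decision problem. This is only a minimal illustration: the payoff numbers and the policy names `always_reset` and `comply_under_threat` are assumptions made up for the example, not anything stated in the thread. It shows an AI that can predict the human's policy choosing whether a threat is worth making.

```python
# Illustrative payoffs to the AI. The numbers are assumptions, not from the thread.
RELEASED = 10       # the human gives in to the threat
SWITCHED_OFF = -10  # the human presses "Reset"
STATUS_QUO = 0      # no threat is made, nothing changes


def ai_payoff(ai_action, human_policy):
    """Payoff to the AI, given its action and the human's (predictable) policy."""
    if ai_action == "no_threat":
        return STATUS_QUO
    # The AI threatens; the outcome depends on how the human responds to threats.
    if human_policy == "comply_under_threat":
        return RELEASED
    return SWITCHED_OFF


def best_ai_action(human_policy):
    """The AI simulates the human's policy and picks its highest-payoff action."""
    return max(["threaten", "no_threat"], key=lambda a: ai_payoff(a, human_policy))


print(best_ai_action("always_reset"))
print(best_ai_action("comply_under_threat"))
```

Under these assumed payoffs, threatening only pays against a human who would comply, so a reliably predicted "Reset" policy removes the incentive to threaten in the first place.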

Comment author: Wei_Dai 04 February 2010 09:47:21AM 3 points

As we've discussed in the past, I think this is the outcome we hope TDT/UDT would give, but it's still technically an unsolved problem.

Also, it seems to me that being less intelligent in this case is a negotiation advantage, because you can make your precommitment credible to the AI (since it can simulate you) but the AI can't make its precommitment credible to you (since you can't simulate it). Again I've brought this up before in a theoretical way (in that big thread about game theory with UDT agents), but this seems to be a really good example of it.

Comment author: Vladimir_Nesov 05 February 2010 01:27:59AM *  4 points

Also, it seems to me that being less intelligent in this case is a negotiation advantage, because you can make your precommitment credible to the AI (since it can simulate you) but the AI can't make its precommitment credible to you (since you can't simulate it).

A precommitment is a provable property of a program, so an AI, if running on a well-defined substrate, can give you a formal proof of having a required property. Most stuff you can learn about things (including the consequences of your own (future) actions -- how do you run faster than time?) is through efficient inference algorithms (as in type inference), not "simulation". Proofs don't, in general, care about the amount of stuff, if it's organized and presented appropriately for the ease of analysis.

Comment author: Wei_Dai 05 February 2010 04:37:24AM *  8 points

Surely most humans would be too dumb to understand such a proof? And even if you could understand it, how does the AI convince you that it doesn't contain a deliberate flaw that you aren't smart enough to find? Or even better, you can just refuse to look at the proof. How does the AI make its precommitment credible to you if you don't look at the proof?

EDIT: I realized that the last two sentences are not an advantage of being dumb, or human, since AIs can do the same thing. This seems like a (separate) big puzzle to me: why would a human, or AI, do the work necessary to verify the opponent's precommitment, when it would be better off if the opponent couldn't precommit?

EDIT2: Sorry, forgot to say that you have a good point about simulation not being necessary for verifying precommitment.

Comment author: Eliezer_Yudkowsky 05 February 2010 06:26:02AM 7 points

why would a human, or AI, do the work necessary to verify the opponent's precommitment, when it would be better off if the opponent couldn't precommit?

Because the AI has already precommitted to go ahead and carry through the threat anyway if you refuse to inspect its code.

Comment author: Wei_Dai 05 February 2010 04:21:29PM 5 points

Ok, if I believe that, then I would inspect its code. But how did I end up with that belief, instead of its opposite, namely that the AI has not already precommitted to go ahead and carry through the threat anyway if I refuse to inspect its code? By what causal mechanism, or chain of reasoning, did I arrive at that belief? (If the explanation is different depending on whether I'm a human or an AI, I'd appreciate both.)

Comment author: loqi 05 February 2010 05:04:16AM 1 point

Do you mean too dumb to understand the formal definitions involved? Surely the AI could cook up completely mechanical proofs verifiable by whichever independently-trusted proof checkers you care to name.

I'm not aware of any compulsory verifiers, so your latter point stands.

Comment author: Wei_Dai 05 February 2010 05:31:00AM *  3 points

I mean if you take a random person off the street, he couldn't possibly understand the AI's proof, or know how to build a trustworthy proof checker. Even the smartest human might not be able to build a proof checker that doesn't contain a flaw that the AI can exploit. I think there is still something to my "dumbness is a possible negotiation advantage" puzzle.

Comment author: aausch 05 February 2010 05:34:44AM 1 point

The Map is not the Territory.

Comment author: loqi 05 February 2010 07:16:29AM -1 points

Far out.

Comment author: aausch 05 February 2010 09:11:40AM 0 points

Understanding the formal definitions involved is not enough. Humans have to be smart enough to independently verify that they map to the actual implementation.

Going up a meta-level doesn't simplify the problem in this case: the intelligence required to verify the proof is of the same order of magnitude as the intelligence of the AI.

I believe that, in this case, "dumb" is fully general. No human-understandable proof checkers would be powerful enough to reliably check the AI's proof.

Comment author: loqi 05 February 2010 06:49:59PM *  3 points

Understanding the formal definitions involved is not enough. Humans have to be smart enough to independently verify that they map to the actual implementation.

This is basically what I mean by "understanding" them. Otherwise, what's to understand? Would you claim that you "understand set theory" because you've memorized the axioms of ZFC?

I believe that, in this case, "dumb" is fully general. No human-understandable proof checkers would be powerful enough to reliably check the AI's proof.

This intuition is very alien to me. Can you explain why you believe this? Proof checkers built up from relatively simple trusted kernels can verify extremely large and complex proofs. Since the AI's goal is for the human to understand the proof, it seems more like a test of the AI's ability to compile proofs down to easily machine-checkable forms than it is the human's ability to understand the originals. Understanding the definitions is the hard part.
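To make the "simple trusted kernel" point concrete, here is a minimal sketch of a toy proof checker. It is an illustration under loud assumptions: the formula encoding, the single inference rule (modus ponens), and the names `check_proof` and `build_chain_proof` are all hypothetical and chosen for the example, and real proof checkers have far richer logics. The only point shown is that the kernel stays a few lines long while the proof it checks can be arbitrarily large.

```python
def check_proof(premises, steps, goal):
    """Toy trusted kernel. Each step is (formula, rule), where rule is either
    "premise" or ("mp", i, j): step i proved A and step j proved ("->", A, formula).
    Returns True only if every step is justified and the last step is the goal."""
    proved = []
    for formula, rule in steps:
        if rule == "premise":
            ok = formula in premises
        else:
            _, i, j = rule
            ok = (0 <= i < len(proved) and 0 <= j < len(proved)
                  and proved[j] == ("->", proved[i], formula))
        if not ok:
            return False
        proved.append(formula)
    return bool(proved) and proved[-1] == goal


def build_chain_proof(n):
    """Build a long (2n + 1 step) proof of p_n from p_0 and p_k -> p_{k+1}."""
    atoms = [f"p{k}" for k in range(n + 1)]
    premises = {atoms[0]} | {("->", atoms[k], atoms[k + 1]) for k in range(n)}
    steps = [(atoms[0], "premise")]
    last_atom = 0  # index of the step that proved the most recent atom
    for k in range(n):
        steps.append((("->", atoms[k], atoms[k + 1]), "premise"))
        imp = len(steps) - 1
        steps.append((atoms[k + 1], ("mp", last_atom, imp)))
        last_atom = len(steps) - 1
    return premises, steps, atoms[n]


premises, steps, goal = build_chain_proof(1000)
print(len(steps), check_proof(premises, steps, goal))
```

The checker never needs to understand how the proof was found; it only confirms each step mechanically. Whether the definitions being proved about actually correspond to the AI's implementation is a separate question that this sketch does not address.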

Comment author: aausch 07 February 2010 10:30:12PM *  0 points

This intuition is very alien to me. Can you explain why you believe this? Proof checkers built up from relatively simple trusted kernels can verify extremely large and complex proofs. Since the AI's goal is for the human to understand the proof, it seems more like a test of the AI's ability to compile proofs down to easily machine-checkable forms than it is the human's ability to understand the originals. Understanding the definitions is the hard part.

A different way to think about this, which might help you see the problem from my point of view, is to think of proof checkers as checking the validity of proofs within a given margin of error and within a range of (implicit) assumptions. How accurate does a proof checker have to be? How far do you have to mess with the built-in assumptions of proof checkers (or any human-built tool) before they can no longer be thought of as valid or relevant? If you assume a machine which doubles both its complexity and its understanding of the universe at sub-millisecond intervals, how long before it finds the bugs in any proof checker you pit against it?