Jack comments on What can you do with an Unfriendly AI? - Less Wrong

Post author: paulfchristiano 20 December 2010 08:28PM




Comment author: Jack 21 December 2010 12:50:33AM 0 points [-]

If I understand the post correctly, for each genie the only information that ever gets out is a yes or no. A no results in termination; a yes results in release from the lamp into the Netherworld. Any coordination would require at least one genie to defect from this agreement, producing a no when the real answer was yes. The punishment for any negative is death, meaning the genie would have to be willing to die for the coordination, which by stipulation it isn't willing to do. In other words, if the best they can do by coordinating is destroy human society, but genies aren't willing to destroy themselves to destroy society, I don't see how TDT gets them out of this coordination problem.

Comment author: Vladimir_Nesov 21 December 2010 01:05:11AM 1 point [-]

If any risk of being discovered giving incorrect information is unacceptable, then you should just ask your questions in plain English, since any lie the genie tells you generally increases the risk of punishment. (It will also minimize the risk of punishment by increasing the believability of its answers at the expense of their truth, but only where it has lies that are more believable than the actual truth.)

Comment author: Jack 21 December 2010 01:27:17AM *  2 points [-]

In my understanding of the post (which doesn't seem to be what everyone else took away), the genies are destroyed whenever they fail to provide a theorem that meets our constraints, even when those constraints make finding a theorem impossible. The genie that gets tasked with finding a one-bit proof of the Riemann hypothesis is just screwed. You might have to have a third form of punishment (Edited: there are three incentive options, two of which could be described as punishments) worse than death to prevent mass striking. Anyway, the point is that each genie only provides one bit of information, and if it either lies about finding a theorem or fails to find one, it dies. So it isn't a matter of risking punishment: it will die if it fails to give a solution when it has one that meets the constraints. The only way out is to share a theorem if there is one.

Comment author: jimrandomh 21 December 2010 01:29:00AM 0 points [-]

You might have to have a third form of punishment worse than death to prevent mass striking.

No, you don't. Proof checking is only O(n) and the size of the AI's output has a small bound.

Comment author: Jack 21 December 2010 01:32:11AM *  0 points [-]

No, you don't. Proof checking is only O(n) and the size of the AI's output has a small bound.

If every AI makes us check their solution how is this method better than a brute force search?

(Or be more explicit about where I'm being stupid:-)

ETA: Oh, I see. We only need to make one check for each theorem length. Still seems like it would be worthwhile to punish false positives more, though. But you're right, it isn't strictly needed.

Comment author: jimrandomh 21 December 2010 01:53:57AM *  1 point [-]

Checking a single proof is O(n) (where n is the length of the proof). There are 2^n proofs of length n, so brute force search is O(sum(i=1..n)(i*2^i)) = O(n*2^n). Note that we throw away constant factors, and any terms whose effect is smaller than a constant factor, when using big-O notation.
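[Editor's note: a toy sketch of the brute-force baseline being compared against here. The bitstring proof encoding and the validity predicate are hypothetical stand-ins, not anything from the post.]

```python
from itertools import product

def brute_force_search(n, is_valid_proof):
    """Enumerate every bitstring of length 1..n, checking each candidate.
    Total checker work is sum over i of i * 2^i, i.e. Theta(n * 2^n)."""
    for length in range(1, n + 1):
        for bits in product("01", repeat=length):
            candidate = "".join(bits)
            if is_valid_proof(candidate):  # each check costs O(length)
                return candidate
    return None

# e.g. hunting for a (hypothetical) unique valid proof "101101"
print(brute_force_search(6, lambda p: p == "101101"))
```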

Using a UFAI this way requires 2n invocations of the AI (n to get the length, and another n to get the bits), plus an O(n) verification step after each. All of those verification steps add up to O(n^2) - easily manageable with current computers. The 2n invocations of the AI, on the other hand, might not be; we never specified the AI's computational complexity, but it's almost certainly much worse than n^2.
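[Editor's note: a simulation of the 2n-invocation extraction protocol described in this comment. The genie, the secret proof string, and the constraints are hypothetical stand-ins; the point is that each genie invocation yields one filter-verified bit.]

```python
SECRET_PROOF = "101101"  # stands in for the shortest valid proof

def genie(constraint):
    """One fresh genie per query: returns a proof satisfying the
    constraint if it can find one, else nothing."""
    return SECRET_PROOF if constraint(SECRET_PROOF) else None

def filter_says_yes(constraint):
    """Trusted filter: re-checks the genie's output itself (O(n))
    before emitting the one-bit verdict."""
    candidate = genie(constraint)
    return candidate is not None and constraint(candidate)

def extract_proof(max_len):
    # Phase 1: up to max_len invocations to learn the length.
    length = next(k for k in range(1, max_len + 1)
                  if filter_says_yes(lambda p, k=k: len(p) == k))
    # Phase 2: one invocation per bit, fixing the proof prefix.
    prefix = ""
    for _ in range(length):
        if filter_says_yes(lambda p, pre=prefix:
                           len(p) == length and p.startswith(pre + "1")):
            prefix += "1"
        else:
            prefix += "0"
    return prefix

print(extract_proof(8))  # prints "101101", recovered one bit per genie
```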

Comment author: paulfchristiano 21 December 2010 01:55:51AM *  0 points [-]

It's actually O(n log n) if you use a cleverer approach, which is fortunate (to get the length in log n you do a binary search; to get it down to one proof of that length in log n you do a binary search on the number of pseudorandom constraints which can be simultaneously satisfied).

Comment author: jimrandomh 21 December 2010 01:57:44AM 0 points [-]

I'm confused. Which thing are you saying is O(n log n)?

Comment author: paulfchristiano 21 December 2010 02:01:00AM 1 point [-]

You use O(log n) genies in the process, each of which you must spend O(n) time verifying. It's an O(log n) slowdown from just getting the answer directly.
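[Editor's note: a sketch of just the length-finding phase of this cleverer approach, using a binary search so only O(log n) genies are consumed. The existence oracle is a stand-in for one genie-plus-filter round; the pseudorandom-constraint phase is not sketched here.]

```python
def shortest_proof_length(max_len, proof_of_length_at_most_exists):
    """Binary search for the shortest proof length using O(log n)
    yes/no queries, each verified in O(n): O(n log n) checking total."""
    lo, hi = 1, max_len
    while lo < hi:
        mid = (lo + hi) // 2
        if proof_of_length_at_most_exists(mid):  # one genie round
            hi = mid
        else:
            lo = mid + 1
    return lo

# if the shortest proof (hypothetically) has length 6:
print(shortest_proof_length(64, lambda k: k >= 6))  # prints 6, in ~log2(64) queries
```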

Comment author: jimrandomh 21 December 2010 02:10:48AM 0 points [-]

I see how you can use log(n) genies to get the length, but I don't think you can then use log(n) genies to get the lexicographically first proof of length n. That only provides enough bits of information to uniquely identify a proof if there are at most O(2^(log n)) = O(n) of them, which is not necessarily the case.

Comment author: Jack 21 December 2010 01:55:26AM 0 points [-]

Appreciated.

Comment author: jimrandomh 21 December 2010 01:08:46AM 3 points [-]

It is not incorrect information that you are trying to prevent! The proof checker ensures that the result you get is correct. What you want to prevent is correct information which has all of its degrees of freedom besides correctness optimized against you. You do that by taking away all degrees of freedom besides correctness, using this procedure.

Comment author: Vladimir_Nesov 21 December 2010 01:11:58AM *  2 points [-]

The deception in the relevant sense here is when the genies conspire to give you an answer which retains all those degrees of freedom, and knowable incorrectness of an answer consists in someone noticing that the genies could, in fact, choose a shorter answer.

Comment author: shokwave 21 December 2010 12:58:10AM 0 points [-]

Your comment made me re-read the post more carefully. I had on first reading assumed that a truthful answer was rewarded (whether yes or no) and a lying answer was punished. If a yes is rewarded and a no is punished, and our AI genies are so afraid of termination that they would never give a 'no' where they could give a 'yes', why wouldn't they all give us 'yes'?

Comment author: paulfchristiano 21 December 2010 01:33:17AM 1 point [-]

Recall the filter between the AI and the world. The AI doesn't directly say "yes/no." The AI gives a proof to the filter which then says either "yes, the AI found a proof" or "no, the AI didn't." So they do all say 'yes' if they can. I will modify the post to be more clear.

Comment author: Jack 21 December 2010 01:18:26AM 0 points [-]

As I understand it, this method is designed to work for constraint satisfaction problems, where we can easily detect false positives. You're right that a possibility is that all the genies that can't find solutions go on strike just to make us check all the yes's (which would make this process no better than a brute-force search, right?); maybe there needs to be a second punishment that is worse than death to give them an incentive not to lie.

Comment author: paulfchristiano 21 December 2010 01:46:41AM 3 points [-]

A genie who can't find a solution has literally no agency. There is nothing he can say to the filter which will cause it to say "yes," because the filter itself checks to see if the genie has given a proof. If the genie can't find a proof, the filter will always say "no." I don't quite know what going on strike would entail, but certainly if all of the genies who can't find solutions collectively have 0 influence in the world, we don't care if they strike.

Comment author: Jack 21 December 2010 01:52:00AM 0 points [-]

Okay, that makes sense. What about computation time limits? A genie that knows it can't give an answer would wait as long as possible before saying anything.

Comment author: paulfchristiano 21 December 2010 01:54:06AM 1 point [-]

I mention timing in the post; the AI gets some fixed interval, at the end of which the filter outputs whether or not they have a proof. If you can't change what the filter says, then you don't get to affect the world.
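[Editor's note: a minimal sketch of the filter as described in this thread, with the fixed deadline folded in. The proof-checking predicate and the string encodings are hypothetical stand-ins.]

```python
def filter_verdict(ai_output, within_deadline, check_proof):
    """Trusted filter: the AI's raw output never reaches the operators,
    only this one-bit verdict. Missing the deadline, staying silent,
    or emitting a non-proof all read as 'no'."""
    if not within_deadline or ai_output is None:
        return "no"
    return "yes" if check_proof(ai_output) else "no"

is_proof = lambda p: p == "valid-proof"
# a genie with no valid proof has no way to force a "yes":
print(filter_verdict("not a proof", True, is_proof))   # prints "no"
# and stalling past the deadline doesn't help either:
print(filter_verdict("valid-proof", False, is_proof))  # prints "no"
```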