paulfchristiano comments on What can you do with an Unfriendly AI? - Less Wrong

16 Post author: paulfchristiano 20 December 2010 08:28PM


Comment author: paulfchristiano 20 December 2010 09:38:52PM 0 points [-]

There is absolutely no sense in which this scheme is security by obscurity. My claim is that the genie will respect my wishes even though he knows exactly what I am doing, because he values my generosity right now more than the promise of taking over the world later.

Comment author: Vladimir_Nesov 20 December 2010 09:44:18PM 2 points [-]

Again, if your genie is already incentivized to be honest, in what sense is your scheme with all its bells and whistles better than asking for the shortest answer the genie can find, in plain English?

Comment author: paulfchristiano 20 December 2010 10:27:22PM 2 points [-]

It is not magically incentivized to be honest. It is incentivized to be honest because each query is constructed precisely such that an honest answer is the rational thing to do, under relatively weak assumptions about its utility function. If you ask in plain English, you would actually need magic to produce the right incentives.

Comment author: Vladimir_Nesov 20 December 2010 10:31:45PM 0 points [-]

It is not magically incentivized to be honest. It is incentivized to be honest because each query is constructed precisely such that an honest answer is the rational thing to do, under relatively weak assumptions about its utility function. If you ask in plain English, you would actually need magic to produce the right incentives.

My question is about the difference. Why exactly is the plain question different from your scheme?

(Clearly your position is that your scheme works, and therefore "doesn't assume any magic", while the absence of your scheme doesn't, and so "requires magic in order to work". You haven't told me anything I don't already know, so it doesn't help.)

Comment author: paulfchristiano 21 December 2010 01:10:49AM 0 points [-]

Here is the argument from the post, stated more concisely. Hopefully this helps:

It is impossible to lie and say "I was able to find a proof," by the construction of the verifier (if you claim you were able to find a proof, the verifier needs to see the proof to believe you). So the only way the AI can lie is by saying "I was not able to find a proof" when it could have found one if it had really tried. So incentivizing the AI to be honest is precisely the same as incentivizing it to avoid falsely claiming "I was not able to find a proof." Providing such an incentive is not trivial, but it is basically the easiest possible incentive to provide.
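The construction can be illustrated with a toy sketch (all names here are hypothetical, and the "proof" is stood in for by a checkable certificate, such as a nontrivial factor witnessing compositeness). The point of the sketch is only the asymmetry in the argument above: a "yes" answer is checked against its proof, so the only lie the channel permits is a false "no":

```python
# Toy model of the verifier construction described above (hypothetical
# names). The query asks whether n is composite; a "yes" answer must be
# accompanied by a certificate (a nontrivial factor) that the verifier
# checks itself.

def verify(n, answer, certificate=None):
    """Accept an answer only if any 'yes' claim carries a checkable proof."""
    if answer == "yes":
        # A dishonest "yes" is impossible by construction: without a real
        # factor, the check below fails and the claim is rejected.
        return certificate is not None and 1 < certificate < n and n % certificate == 0
    # "no" is always accepted without evidence -- this is the only channel
    # through which the genie can lie (claiming failure when it could
    # have succeeded), so honesty incentives only need to cover this case.
    return answer == "no"

assert verify(15, "yes", 3)      # honest "yes" backed by a valid certificate
assert not verify(13, "yes", 4)  # a dishonest "yes" is caught by the check
assert verify(13, "no")          # "no" is accepted, proof or not
```

This is what makes the incentive problem narrow: punishment only needs to discourage unnecessary "no" answers, never to detect a fabricated "yes."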

I know of no way to incentivize someone to answer the plain question honestly just based on your ability to punish or reward them when you choose to. Being able to punish them for lying requires being able to tell when they are lying.

Comment author: Vladimir_Nesov 21 December 2010 02:04:34AM -1 points [-]
Comment author: [deleted] 20 December 2010 10:41:46PM 0 points [-]

Are you saying you think Christiano's scheme is overkill? Presumably we don't have to sacrifice a virgin in order to summon a new genie, so it doesn't look expensive enough to matter.

Comment author: Vladimir_Nesov 20 December 2010 10:51:20PM *  1 point [-]

I'm saying that it's not clear why his scheme is supposed to add security, and it looks like it doesn't. If it does, we should understand why, and optimize that property instead of using the scheme straight away, and if it doesn't, there is no reason to use it. Either way, there is at least one more step to be made. (In this manner, it could work as raw material for new lines of thought where we run out of ideas, for example.)

Comment author: Perplexed 20 December 2010 10:11:52PM 0 points [-]

As I understand it, the genie is not incentivized to be honest. It is incentivized to not get caught being dishonest. And the reason for the roundabout way of asking the question is to make the answer-channel bandwidth as narrow as possible.

Comment author: paulfchristiano 20 December 2010 10:30:23PM 2 points [-]

It is impossible to be dishonest by saying "yes," by construction. The genie is incentivized to say "yes" whenever possible, so it is disincentivized to be dishonest by saying "no." So the genie is incentivized to be honest, not just to avoid being called out for dishonesty.

Comment author: Vladimir_Nesov 20 December 2010 10:27:47PM 0 points [-]

As I understand it, the genie is not incentivized to be honest. It is incentivized to not get caught being dishonest.

Since we care about the genie actually being honest, the technique can be thought of as a way of making it more likely that the genie is honest, with the threat of punishing dishonesty a component of that technique.