
An argument against indirect normativity

Post author: cousin_it 24 July 2013 06:35PM

I think I've found a new argument, which I'll call X, against Paul Christiano's "indirect normativity" approach to FAI goals. I just discussed X with Paul, who agreed that it's serious.

This post won't describe X in detail because it's based on basilisks, which are a forbidden topic on LW, and I respect Eliezer's requests despite sometimes disagreeing with them. If you understand Paul's idea and understand basilisks, figuring out X should take you about five minutes (there's only one obvious way to combine the two ideas), so you might as well do it now. If you decide to discuss X here, please try to follow the spirit of LW policy.

In conclusion, I'd like to ask Eliezer to rethink his position on secrecy. If more LWers understood basilisks, somebody might have come up with X earlier.

Comments (25)

Comment author: 9eB1 24 July 2013 06:40:15PM 9 points

This is a reference for indirect normativity and its associated Less Wrong discussion.

Comment author: cousin_it 24 July 2013 06:42:22PM * 0 points

Yep. One of my comments in the discussion thread describes about half of X; I didn't figure out the other half until today, and others didn't realize the problem back then.

Comment author: Qiaochu_Yuan 25 July 2013 08:09:06PM 7 points

For future reference, whenever anybody says "I've got a secret," I strongly desire to learn that secret, especially if it's prefaced with "ooh, and the secret is potentially dangerous" (because "dangerous" translates to "cool"), and I expect that I'm not alone in this.

Comment author: cousin_it 25 July 2013 08:20:09PM * 1 point

Yeah, I feel the same way. Fortunately, this particular secret is not hard to figure out once you know it's there; several LWers have done so already. Knowing your work on LW, I would expect you of all people to solve it in about 30 seconds. That's assuming you already understand UDT, indirect normativity, and Roko's basilisk. If you don't, it's easy to find explanations online. If you understand the prerequisites but are still stumped, PM me and I'll explain.

Comment author: Eliezer_Yudkowsky 24 July 2013 11:49:54PM 12 points

I don't believe that Paul's approach to indirect normativity is on the right track. I also have no idea which of the possible problems you might be talking about. PM me. I'd put a very high probability on it either not being a problem, or being something I thought of years ago.

Yes, blackmail can enable a remote attacker to root your AI if your AI was not designed with this in mind and does not have a no-blackmail equilibrium (which nobody knows how to describe yet). This is true for any AI that ends up with a logical decision theory, indirectly normative or otherwise, and also for CDT AIs which encounter other agents that can make credible precommitments. I figured that out I don't even recall how long ago (remember: I'm the guy who first wrote down an equation for that issue; also, I wouldn't be bothering with TDT at all if it wasn't relevant to some sort of existential risk). Didn't talk about it at the time for obvious reasons.

The existence of N fiddly little issues like this, any one of which can instantly kill you with zero warning if you haven't reasoned through something moderately complex in advance and without any advance observations to hit you over the head, is why I engage in sardonic laughter whenever someone suggests that the likes of current AGI developers would be able to handle the problem at their current level of caring.

Anyway, MIRI workshops are actively working on advancing our understanding of blackmail to the point where we can eventually derive a robust no-blackmail equilibrium, which is all that anyone can or should be doing AFAICT.
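For concreteness, here is a minimal sketch of the standard one-shot blackmail game that the "no-blackmail equilibrium" discussion refers to; the payoff numbers are illustrative assumptions, not anything specified in this thread. A victim who decides causally after the threat is made pays whenever the damage exceeds the demand, so threatening it is profitable; a victim credibly precommitted to refusing makes the threat unprofitable, which is the informal content of a no-blackmail equilibrium.

    # A minimal sketch of the standard one-shot blackmail game.
    # Payoff numbers are illustrative assumptions, not anything specified above.

    THREAT_COST = 0.5   # cost to the blackmailer of actually carrying out the threat
    DEMAND = 1.0        # what the victim is asked to hand over
    DAMAGE = 10.0       # harm to the victim if the threat is carried out

    def victim_response(committed_to_refuse: bool) -> str:
        """Once the threat is made, a purely causal reasoner pays iff paying is
        cheaper than the damage; an agent precommitted to refusing never pays."""
        if committed_to_refuse:
            return "refuse"
        return "pay" if DEMAND < DAMAGE else "refuse"

    def blackmailer_gain(committed_to_refuse: bool) -> float:
        """Blackmailer's payoff from issuing the threat, given the victim's policy."""
        if victim_response(committed_to_refuse) == "pay":
            return DEMAND        # the victim pays up
        return -THREAT_COST      # the threat has to be carried out, at a cost

    for committed in (False, True):
        gain = blackmailer_gain(committed)
        verdict = "blackmail pays" if gain > 0 else "no incentive to blackmail"
        print(f"victim precommitted to refuse = {committed}: "
              f"blackmailer gain {gain:+.1f} -> {verdict}")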

Comment author: cousin_it 25 July 2013 12:06:20AM * 2 points

PM sent.

I agree that solving blackmail in general would make things easier, and it's good that MIRI is working on this.

Comment author: Vladimir_Nesov 24 July 2013 11:42:15PM * 2 points

Is this relevant for my variant of "indirect normativity" (i.e. allowing human-designed WBEs, no conceptually confusing tricks in constructing the goal definition)?

(I'm generally skeptical about anything that depends on a concept of "blackmail" distinct from general bargaining, as it seems hard to formulate the distinction. It seems to be mostly the affect of being bargained against really unfairly, which might happen with a smart opponent if you are bad at bargaining and vulnerable to accepting unfair deals. So the solution appears to be to figure out how to bargain well, and to avoid bargaining against strong opponents until you do.)

Comment author: cousin_it 25 July 2013 12:13:57AM 0 points

Yeah, I think it's relevant for your variant as well.

Comment author: Vladimir_Nesov 25 July 2013 12:28:35AM * 0 points

I don't see how something like this could be a natural problem in my setting; it all seems to depend on how the goal definition (i.e. the WBE research team evaluated by the outer AGI) thinks such issues through, e.g. making sure that they don't participate in any bargaining when they are not ready, at human level or using poorly understood tools. Whatever problem you can notice now, they could also notice while being evaluated by the outer AGI, if evaluation of the goal definition does happen; so the critical issues for the goal definition are the things that can't be solved at all. It's more plausible to get decision theory in the goal-evaluating AGI wrong, so that it can itself lose bargaining games and end up effectively abandoning goal evaluation, which is a clue in favor of it being important to understand bargaining/blackmail pre-AGI. PM/email me?

Comment author: cousin_it 25 July 2013 01:00:39AM * 1 point

I agree that if the WBE team (who already know that a powerful AI exists and that they're in a simulation) can resist all blackmail, then the problem goes away.

Comment author: shminux 24 July 2013 07:06:21PM 2 points

Surely EY won't mind if you put a trigger warning and rot13 the argument. Or post it on the LW subreddit.

Comment author: ygert 25 July 2013 08:36:05PM * 2 points

It's really rather confusing and painful to have a discussion without talking about what you are discussing, using only veiled words, indirect hints, and private messages. May I suggest that instead we move the discussion to the uncensored thread if we are going to discuss it at all?

I believe that the whole point of the uncensored thread is to provide a forum for the discussion of topics, such as basilisks, that Eliezer is uncomfortable with us discussing on the main site.

Comment deleted 25 July 2013 09:44:26PM
Comment author: ygert 25 July 2013 10:41:48PM * 0 points

How is the location of the discussion of any relevance? Discussing it over there is no harder than anywhere else, and is in fact even easier, seeing as one wouldn't need to go through the whole dance of PMing one another.

And the exact purpose of that thread was for discussions like this one. If discussions like this one are not held there, why does that thread even exist?

Comment author: wedrifid 25 July 2013 12:35:50PM 1 point

If you understand Paul's idea and understand basilisks, figuring out X should take you about five minutes (there's only one obvious way to combine the two ideas), so you might as well do it now.

I agree that X (as well as some related non-basilisk issues) is a weakness of the indirect normativity approach.

In conclusion, I'd like to ask Eliezer to rethink his position on secrecy. If more LWers understood basilisks, somebody might have come up with X earlier.

Unless I am mistaken about what you mean by X, they did come up with X earlier. They just couldn't speak of it here at the risk of provoking Yudkowskian Hysterics.

Comment author: cousin_it 25 July 2013 01:03:48PM 0 points

Can you tell me who specifically came up with X and when? PM if you like.

Comment author: wedrifid 25 July 2013 07:15:22PM 0 points

Can you tell me who specifically came up with X and when? PM if you like.

Just me, specifically. But I had better add the disclaimer that I considered an X' that seems to fit the censored context, assuming that I have correctly mapped Paul's 'indirect normativity' to my informal conception of the context. I am of course curious as to what specifically your X is. I am wary of assuming I correctly understand you; people misunderstand each other easily even when the actual arguments are overt. If you have written it up already, would you mind PMing it?

Comment author: cousin_it 25 July 2013 08:00:50PM 0 points

PM sent.

Comment author: Matt_Simpson 24 July 2013 08:47:00PM * 1 point

I think I understand X, and it seems like a legitimate problem, but the comment I think you're referring to here seems to contain (nearly) all of X and not just half of it. So I'm confused and think I don't completely understand X.

Edit: I think I found the missing part of X. Ouch.

Comment author: cousin_it 24 July 2013 09:18:45PM 0 points

Yeah. The idea came when I was lying in a hammock, half asleep after dinner; it really woke me up :-) Now I wonder what approach could overcome such problems, even in principle.

Comment author: Matt_Simpson 24 July 2013 10:41:51PM 0 points

If the basilisk is correct,* it seems any indirect approach is doomed, but I don't see how it prevents a direct approach. But that has its own set of probably-insurmountable problems, I'd wager.

* I remain highly uncertain about that, but it's not something I can claim to have a good grasp on or to have thought a lot about.

Comment author: Karl 24 July 2013 11:14:21PM 0 points

If the NBC (No Blackmail Conjecture) is correct, then that shouldn't be a problem.

Comment author: cousin_it 24 July 2013 11:16:34PM 1 point

Can you state the conjecture or link to a description?

Comment author: Karl 24 July 2013 11:39:03PM * 3 points

By that term I simply mean Eliezer's idea that the correct decision theory ought to use a maximization vantage point with a no-blackmail equilibrium.

Comment author: cousin_it 26 July 2013 08:28:01AM * 1 point

Maybe the scarier question isn't whether we can stop our AIs from blackmailing us, but whether we want to. If the AI has an opportunity to blackmail Alice for a dollar to save Bob from some suffering, do we want the AI to do that, or to let Bob suffer? Eliezer seems to think that we obviously don't want our FAI to use certain tactics, but I'm not sure why he thinks that.

Comment author: Manfred 24 July 2013 07:55:23PM * 0 points

Hm, I don't see it. Unless you're referring to the case where the humans aren't smart enough to abide by something like UDT. Maybe you estimated it as too obvious :)

EDIT: okay, I see a more general version now. But I think it's speculative enough that it's actually not the biggest problem with Paul's particular proposal. So it's probably still not what you expected me to think of :D