I don't believe that Paul's approach to indirect normativity is on the right track. I also have no idea which of the possible problems you might be talking about. PM me. I'd put a very high probability on its being either not a problem, or something I thought of years ago.
Yes, blackmail can enable a remote attacker to root your AI if the AI was not designed with this in mind and does not have a no-blackmail equilibrium (which nobody yet knows how to describe). This is true for any AI that ends up with a logical decision theory, indirectly normative or otherwise, and also for CDT AIs which encounter other agents that can make credible precommitments. I figured that out I-don't-even-recall how long ago (remember: I'm the guy who first wrote down an equation for that issue; also, I wouldn't be bothering with TDT at all if it weren't relevant to some sort of existential risk). I didn't talk about it at the time, for obvious reasons.

The existence of N fiddly little issues like this, any one of which can instantly kill you with zero warning unless you've reasoned through something moderately complex in advance, without any observations to hit you over the head first, is why I engage in sardonic laughter whenever someone suggests that the likes of current AGI developers could handle the problem at their current level of caring. Anyway, MIRI workshops are actively working on advancing our understanding of blackmail to the point where we can eventually derive a robust no-blackmail equilibrium, which is all that anyone can or should be doing, AFAICT.
PM sent.
I agree that solving blackmail in general would make things easier, and it's good that MIRI is working on this.
I think I've found a new argument, which I'll call X, against Paul Christiano's "indirect normativity" approach to FAI goals. I just discussed X with Paul, who agreed that it's serious.
This post won't describe X in detail, because it's based on basilisks, which are a forbidden topic on LW, and I respect Eliezer's requests despite sometimes disagreeing with them. If you understand Paul's idea and understand basilisks, figuring out X should take you about five minutes (there's only one obvious way to combine the two ideas), so you might as well do it now. If you decide to discuss X here, please try to follow the spirit of LW policy.
In conclusion, I'd like to ask Eliezer to rethink his position on secrecy. If more LWers understood basilisks, somebody might have come up with X earlier.