Yeah, I think it's relevant for your variant as well.
I don't see how something like this could be a natural problem in my setting. It all seems to depend on how the goal definition (i.e., the WBE research team evaluated by the outer AGI) thinks such issues through, e.g. making sure they don't participate in any bargaining when they are not ready, whether at human level or using poorly understood tools. Whatever problem you can notice now, they could also notice while being evaluated by the outer AGI, if evaluation of the goal definition does happen, so critical issues for the goal definition are things that can't be so...
I think I've found a new argument, which I'll call X, against Paul Christiano's "indirect normativity" approach to FAI goals. I just discussed X with Paul, who agreed that it's serious.
This post won't describe X in detail because it's based on basilisks, which are a forbidden topic on LW, and I respect Eliezer's requests despite sometimes disagreeing with them. If you understand Paul's idea and understand basilisks, figuring out X should take you about five minutes; there's only one obvious way to combine the two ideas, so you might as well do it now. If you decide to discuss X here, please try to follow the spirit of LW policy.
In conclusion, I'd like to ask Eliezer to rethink his position on secrecy. If more LWers understood basilisks, somebody might have come up with X earlier.