This question is partly motivated by observing recent discussions about corrigibility and wondering to what extent the people involved have thought about how their results might be used.
If there existed practically implementable ways to make AGIs corrigible to arbitrary principals, that would enable a wide range of actors to eventually control powerful AGIs. Whether that would be net good or bad in expectation would depend on the values/morality of the principals of such AGIs.
Currently it seems highly unclear what kinds of people we should expect to end up in control of corrigible ASIs, if corrigibility were practically feasible.
What (crucial) considerations should one take into account, when deciding whether to publish---or with whom to privately share---various kinds of corrigibility-related results?
I'd put the probability that some authority figure would use an order-following AI to get torturous revenge on me (probably for being part of a group they dislike) quite low: maybe one in a few thousand, with more extreme suffering being less likely by a few more orders of magnitude. The probability that they have me killed for instrumental reasons, or otherwise waste the value of the future by my lights, is much higher: ten percent-ish, depending on my distribution over who's giving the orders. But that isn't any worse for me than being killed by an AI that wants to replace me with molecular smiley faces.