This question is partly motivated by observing recent discussions about corrigibility and wondering to what extent the people involved have thought about how their results might be used.
If there existed practically implementable ways to make AGIs corrigible to arbitrary principals, that would enable a wide range of actors to eventually control powerful AGIs. Whether that would be net good or bad in expectation would depend on the values/morality of the principals of such AGIs.
Currently it seems highly unclear what kinds of people we should expect to end up in control of corrigible ASIs, if corrigibility were practically feasible.
What (crucial) considerations should one take into account when deciding whether to publish---or with whom to privately share---various kinds of corrigibility-related results?
Taking a stab at answering my own question, here's an almost-certainly non-exhaustive list:
Would the results be applicable to deep-learning-based AGIs?[1] If I think not, how can I be confident they couldn't be made applicable?
Do the corrigibility results provide (indirect) insights into other aspects of engineering (rather than SGD'ing) AGIs?
How much weight one gives to avoiding x-risks vs s-risks.[2]
Who actually needs to know about the results? Would sharing them with the whole Internet lead to better outcomes than (e.g.) sharing them with a smaller number of safety-conscious researchers? (What does the cost-benefit analysis look like? Did I even do one?)
How optimistic (or pessimistic) one is about the common-good commitment (or corruptibility) of the people one thinks might end up wielding corrigible AGIs.
Something like the True Name of corrigibility might at first glance seem applicable only to AIs whose internals we meaningfully understand or control. ↩︎
If corrigibility were easily feasible, that would at first glance seem to reduce the probability of extinction (via unaligned AI), but increase the probability of astronomical suffering (under god-emperor Altman/Ratcliffe/Xi/Putin/...). ↩︎