This question is partly motivated by observing recent discussions about corrigibility and wondering to what extent the people involved have thought about how their results might be used.
If there existed practically implementable ways to make AGIs corrigible to arbitrary principals, that would enable a wide range of actors to eventually control powerful AGIs. Whether that would be net good or bad on expectation would depend on the values/morality of the principals of such AGIs.
Currently it seems highly unclear what kinds of people we should expect to end up in control of corrigible ASIs, if corrigibility were practically feasible.
What (crucial) considerations should one take into account, when deciding whether to publish---or with whom to privately share---various kinds of corrigibility-related results?
Yes. Current AI policy is like people in a crowded room fighting over who gets to hold a bomb. It's more important to defuse the bomb than it is to prevent someone you dislike from holding it.
That said, we're currently not near any satisfactory solutions to corrigibility. And I do think it would be better for the world if were easier (by some combination of technical factors and societal factors) to build AI that works for the good of all humanity than to build equally-smart AI that follows the orders of a single person. So yes, we should focus research and policy effort toward making that happen, if we can.
And if we were in that world already, then I agree releasing all the technical details of an AI that follows the orders of a single person would be bad.
To me, those odds each seem optimistic by a factor of about 1000, but ~reasonable relative to each other.
(I don't see any low-cost way to find out why we disagree so strongly, though. Moving on, I guess.)
Makes sense (given your low odds for bad outcomes).
Do you also care about minds that are not you, though? Do you expect most future minds/persons that are brought into existence to have nice lives, if (say) Donald "Grab Them By The Pussy" Trump became god-emperor (and was the one deciding what persons/minds get to exist)?