It occurs to me that a karma system (such as that used on this website) has the potential to be an adequate check against the unilateralist's curse as described by Bostrom et al. if we assume that the penalty applied to downvoted posts is sufficient to prevent the harm of the putative infohazard.
I think it provides feedback about whether a post was infohazardous or otherwise bad to post, but for many types of infohazard it pretty clearly doesn't prevent the harm; doxxing, for example, is not so easily undone. Luckily, most things are small and iterated, and people learn from the scores on each other's posts, so voting does significantly reduce the unilateralist's curse problem.
There can be circumstances where experts can see that something is an infohazard while laypeople can't; in that case, voting only works if the experts explain their reasoning in addition to downvoting. But explaining one's reasoning under those circumstances looks very similar to trying to exercise a heckler's veto. Votes on the explanation are then informative about whether the original post was an infohazard, but things get hard to interpret.
There's a bit of a difference between action and veto depending on the substitutability of alternatives. If a UN member vetoes something, a separate coalition (or a single member) can often take the action anyway without sanction (and can veto any censure), so the UN is made less effective, but the action is not actually blocked.
Other kinds of acts are fully prevented by a veto.
This is relevant to friendly AI thinking, given the tradeoff between power and safety in an untrusted agent.
It occurs to me that a karma system (such as that used on this website) has the potential to be an adequate check against the unilateralist's curse
An assumption here is that people actually downvote infohazards on that basis. In practice, however, many communities have no problem sharing damaging and dangerous information; just look at Reddit.
The quoted sentence claims that karma systems are a check against the unilateralist's curse specifically, not infohazards in general, as is made explicit in the final sentence of that paragraph ("Conversely, while a net-upvoted post might still be infohazardous [...]").
I've been envisioning "unilateralist's curse" as referring to situations where the average error in individual agents' estimates of the value of the initiative (what I called E in the post, but Bostrom et al. call the error d and say it's drawn from a CDF F(d)) is zero, and the harm comes from the fact that the variance in the error terms makes someone unilaterally act/veto when they shouldn't, in a way that could be corrected by "listening to their peers." If the community as a whole is systematically biased about the value of the initiative, that seems like a different, and harder, problem.
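To make the distinction concrete, here's a quick numerical sketch (my own toy illustration in Python, with made-up numbers, not anything from Bostrom et al.): with zero-mean errors, the group's average estimate recovers the true value, so deferring to one's peers corrects the outlier; a shared systematic bias survives any amount of listening to peers.

```python
import numpy as np

rng = np.random.default_rng(0)

true_value = -1.0   # the initiative is actually (mildly) harmful
n_agents = 25

# Zero-mean errors: each agent's estimate is V + d_i with E[d_i] = 0.
zero_mean_estimates = true_value + rng.normal(0.0, 1.0, n_agents)

# Systematic bias: every agent shares the same +1.5 offset on top of the noise.
biased_estimates = true_value + 1.5 + rng.normal(0.0, 1.0, n_agents)

print("group average, zero-mean errors:", zero_mean_estimates.mean())  # close to -1.0
print("group average, shared bias:     ", biased_estimates.mean())     # close to +0.5
```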
This seems basically right if the community of possible actors is the same as the community of voters assigning karma. If the community of voters is different from, or much larger than, the community of actors, you might still encounter the unilateralist's curse as seen from the perspective of the community of actors, especially if the latter is better-informed than the former.
I don't understand what work the term "Heckler's Veto" is doing here. In my understanding, "the heckler's veto" refers to situations where someone can prevent someone else from speaking by being loudly offended, either directly (by shouting them down) or indirectly (through laws or norms against "offensive" speech). A heckler's veto is one kind of veto, but I'm not sure what the value is in applying the term to vetoes in general.
This re-framing of the underlying statistical insight (the unilateral veto being "dual" to the unilateral act) seems relevant to its application to censorship: an author deciding to publish a blog post (even if other forum members think it's harmful) is in the position of taking unilateralist action—but so is a member of a board of pre-readers of whom any one has the power to censor the post (even if the other reviewers think it's fine).
I'm not sure you can call it a reframing when it's present fairly prominently in the original paper espousing the concept. But yes, if any one of a number of pre-readers can unilaterally veto publication of a topic, then you might run into the unilateralist's curse. That doesn't mean pre-reading for infohazards is a bad idea: the pre-readers (including the authors) can simply take a vote to avoid unilateralist issues, as the sketch below illustrates.
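Here's a toy simulation of my own (arbitrary numbers, Python): with several pre-readers, an "anyone can publish" rule lets a mildly harmful post through far too often, an "anyone can censor" rule blocks a mildly good post far too often, and a simple majority vote avoids both failure modes.

```python
import numpy as np

rng = np.random.default_rng(0)

def publication_rate(true_value, rule, k=5, noise_sd=1.0, trials=100_000):
    """Fraction of trials in which the post gets published under a given rule."""
    estimates = true_value + rng.normal(0.0, noise_sd, size=(trials, k))
    votes = estimates > 0                      # each pre-reader's yes/no judgment
    if rule == "any_yes":                      # any one reader can publish unilaterally
        published = votes.any(axis=1)
    elif rule == "no_veto":                    # any one reader can censor unilaterally
        published = votes.all(axis=1)
    elif rule == "majority":                   # simple vote among the pre-readers
        published = votes.sum(axis=1) > k / 2
    return published.mean()

for v in (-0.5, +0.5):                         # a mildly harmful / mildly good post
    print(v, [round(publication_rate(v, r), 2)
              for r in ("any_yes", "no_veto", "majority")])
```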
I occasionally see the "unilateralist's curse" invoked as a rationale for censorship in contexts where I am very suspicious that the actual reason is protecting some interest group's power. But if I'm alone in such suspicions, then maybe that means I'm just uniquely paranoid. To help sort out what's what, I consulted the paper by Nick Bostrom, Anders Sandberg, and Tom Douglas in which the term was coined.
The main argument (as the authors note under the keyword "winner's curse") is basically an application of regression to the mean: if N agents are deciding whether to do something on the basis of its true value V plus random error term E, then someone with a large positive E might end up doing the thing even if V is actually negative—and the problem gets worse for larger N.
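A quick way to see the effect is to simulate it (a minimal sketch of my own, not code from the paper): even when V is negative, the chance that at least one of N agents draws an error term large enough to make them act grows quickly with N.

```python
import numpy as np

rng = np.random.default_rng(0)

def p_anyone_acts(true_value, n_agents, noise_sd=1.0, trials=100_000):
    """P(at least one agent acts), where agent i acts iff V + E_i > 0."""
    estimates = true_value + rng.normal(0.0, noise_sd, size=(trials, n_agents))
    return (estimates > 0).any(axis=1).mean()

# With V = -1 and standard-normal errors, a single agent acts ~16% of the time,
# but the chance that *someone* in the group acts, 1 - F(-V)^N, approaches 1 as N grows.
for n in (1, 3, 10, 30):
    print(n, round(p_anyone_acts(-1.0, n), 3))
```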
Crucially, Bostrom et al. note that the same statistical problem arises for unilateral "spoiling," where any one member of a group can block an initiative the rest support; the veto held by members of the United Nations Security Council is given as an illustrative example. This re-framing of the underlying statistical insight (the unilateral veto being "dual" to the unilateral act) seems relevant to its application to censorship: an author deciding to publish a blog post (even if other forum members think it's harmful) is in the position of taking unilateralist action—but so is a member of a board of pre-readers of whom any one has the power to censor the post (even if the other reviewers think it's fine).
It occurs to me that a karma system (such as that used on this website) has the potential to be an adequate check against the unilateralist's curse as described by Bostrom et al. if we assume that the penalty applied to downvoted posts is sufficient to prevent the harm of the putative infohazard. If some possible post is infohazardous (say, doxxing someone's home address), most users will correctly know not to post it. If one user erroneously decides that doxxing is good (as if having "rolled" an anomalously high error term), we expect their post to be downvoted to oblivion by the supermajority who knows that doxxing is bad. Conversely, while a net-upvoted post might still be infohazardous, the harm from the post should not be attributed to the unilateralist's curse.
(Thanks to David Manheim's comments on "Credibility of the CDC on SARS-CoV-2" for the inspiration.)