In this post, I propose an idea that could improve whistleblowing efficiency, thus hopefully improving AI Safety by making unsafe practices discovered marginally faster.
I'm looking for feedback, ideas for improvement, and people interested in making it happen.
It has been proposed before that it is beneficial to have an efficient and trustworthy whistleblowing mechanism. The technology that makes it possible has become easy and convenient. For example, here is Proof of Organization, built on top of ZK Email: a message board that allows people who own an email address at their company's domain to post without revealing their identity. And here is an application for ring signatures using GitHub SSH keys, which lets you create a signature proving that you own one of the keys in any subgroup you define (e.g., EvilCorp repository contributors).
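To make the second primitive concrete, here is a toy sketch of a Schnorr-style (AOS) ring signature, the kind of construction such tools build on: anyone holding one of the ring's secret keys can sign, and the verifier learns only that *some* ring member signed. The group parameters below are deliberately tiny toy values and the function names are my own, not the linked application's API; a real tool would use an elliptic-curve group and an audited library.

```python
import hashlib
import secrets

# Toy Schnorr-style (AOS) ring signature over a tiny prime-order group.
# WARNING: toy parameters for illustration only; a real tool would use an
# elliptic-curve group and a vetted cryptographic library.
P = 2027  # safe prime, P = 2*Q + 1
Q = 1013  # prime order of the subgroup generated by G
G = 4     # generator of the order-Q subgroup

def _h(msg: bytes, e: int) -> int:
    """Hash the message together with a group element into Z_Q."""
    data = msg + b"|" + str(e).encode()
    return int(hashlib.sha256(data).hexdigest(), 16) % Q

def keygen():
    """Return a (secret, public) key pair."""
    x = secrets.randbelow(Q - 1) + 1
    return x, pow(G, x, P)

def sign(msg: bytes, ring: list[int], s: int, x_s: int):
    """Sign msg on behalf of `ring` (a list of public keys), using the
    secret key x_s at position s, without revealing which member signed."""
    n = len(ring)
    r, c = [0] * n, [0] * n
    u = secrets.randbelow(Q - 1) + 1
    c[(s + 1) % n] = _h(msg, pow(G, u, P))
    i = (s + 1) % n
    while i != s:  # walk the ring, simulating each non-signer's challenge
        r[i] = secrets.randbelow(Q - 1) + 1
        c[(i + 1) % n] = _h(msg, pow(G, r[i], P) * pow(ring[i], c[i], P) % P)
        i = (i + 1) % n
    r[s] = (u - x_s * c[s]) % Q  # close the ring with the real secret key
    return c[0], r

def verify(msg: bytes, ring: list[int], sig) -> bool:
    """Recompute the challenge chain and check that it closes."""
    c0, r = sig
    c = c0
    for i in range(len(ring)):
        c = _h(msg, pow(G, r[i], P) * pow(ring[i], c, P) % P)
    return c == c0
```

A signature produced by any member of the ring verifies against the whole ring, and nothing in the transcript points at the signer's index.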
However, as one may have guessed, it hasn't been widely used. Hence, when the critical moment arrives, the whistleblower may not be aware of such technology, and even if they were, they probably wouldn't trust it enough to use it. I think trust comes either from the code being audited by a well-established and trusted entity or, more commonly, from practice (e.g., I don't need to verify that a certain password manager is secure if I know that millions of people use it and no password breaches have been reported).
Hence, I have been thinking about how to build a privacy-preserving communication tool that would be commonly used, demonstrating its legitimacy and earning trust over time.
The best idea I have so far is to create a set of Twitter bots, one per interesting company (or community), where only the people in question can post. Depending on the particular bot, access could be gated by ownership of a LinkedIn account, an email address at the company's domain, or, say, an LW/AI Alignment Forum account of a certain age.
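As a sketch of what the gating rules might look like (the domain, the age threshold, and the function names here are all hypothetical, and the hard part, verifying the credential itself via ZK Email or an OAuth flow, is assumed to have already happened upstream):

```python
from datetime import datetime, timedelta, timezone

# Hypothetical eligibility rules for one per-company bot. The credential
# itself (email ownership, forum account) is assumed to already be
# verified upstream, e.g. via ZK Email or an OAuth flow.
ALLOWED_DOMAIN = "evilcorp.com"        # assumed company domain
MIN_ACCOUNT_AGE = timedelta(days=365)  # assumed forum-account age threshold

def eligible_by_email(verified_email: str) -> bool:
    """True if the already-verified address is at the company's domain."""
    return verified_email.rsplit("@", 1)[-1].lower() == ALLOWED_DOMAIN

def eligible_by_account_age(created_at: datetime) -> bool:
    """True if the already-verified forum account is old enough."""
    return datetime.now(timezone.utc) - created_at >= MIN_ACCOUNT_AGE
```

The point of keeping the rules this simple is that each bot's eligibility predicate can be published and audited alongside the bot itself.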
I imagine this could go viral in gossipy cases, like the Sam Altman drama or the Biden dropout drama.
Some questions that came up during consideration:
How to deal with moderation of the content (if everything is posted, anyone could deliberately post some profanity to get the bot banned)?
I would moderate aggressively myself and replace moderated posts with a link to a separate website where all posts get through.
How do we balance convenience and privacy?
I'd make a hosted open-source tool, which I expect most people would be content to use for any gossip case that doesn't put their job on the line, with instructions available for downloading it, running it locally, and submitting posts through Tor, etc., for cases where such effort is warranted.
What if people use this tool to make false accusations?
I do think this is a real downside, but I hope the benefits of the tool would outweigh it.
What if someone creates a fake dialogue, pretending to be two people debating a topic?
Although it's technically possible to make a tool that would allow proving that you have not posted before, this functionality shouldn't exist, since otherwise one could be coerced into producing such a proof or confessing. It is a thing to be aware of, but not too much of a problem, in my opinion.
I'm curious to learn what others think and about other ideas for making a gossip/whistleblower tool that could become widely known and trusted.