I catch myself sometimes thinking of ideas / scenarios that support higher p(doom), typically as counter-examples to points folks make for lower p(doom), and I wonder how much self-censorship I should apply, given that AI can read these conversations.

My CoT:

  1. I sure don't want to feed ideas to any bad actor.
  2. But it's arrogant to think that anything I can come up with wouldn't already be obvious to an entity with paperclipping-level power.
  3. In chess, an easy way to make mistakes is to defend against imaginary threats, or against real threats that aren't the most dangerous ones on the board, or against threats whose defense costs more than what you forgo by giving up other good moves, like a counterattack.
  4. In dictatorships, preventing people from coordinating with one another, e.g. by convincing everyone that their neighbor squawks to the secret police, is a very effective way for a few to hold control over many. So when you're up against a powerful singular threat, coordination is important!
  5. Yet, I can't shake a queasy feeling at the thought of putting out dangerous ideas. Perhaps, somehow, the space of savant-smart AI systems that are powerful enough to paperclip, yet not generically smart enough to come up with lots of random ideas on their own, is not so small as to be ignored?

Do others have any useful guidelines, thoughts or intuition here? What am I missing?


These are reasonable concerns. I'm glad to see points 2, 3, and 4 on your list. Playing defense to the exclusion of playing offense is a way to lose. We are here to win, and that requires courage and discernment in the face of danger. Avoiding all downsides is a pretty sure way to reduce your odds of winning. There are many posts here about weighing the downsides of sharing infohazards against the upsides of making progress on alignment; sorry, I don't have references off the top of my head. I will say that I, and I think the community on average, consider it highly unlikely that existentially dangerous AI exists now, or that future AI will become more dangerous by reading particularly clever human ideas.

So, if your ideas have potential important upside, and no obvious large downside, please share them.

Some of the points made in posts about sharing ideas: it's unlikely your ideas are as dangerous as you think; others have probably already had them, and maybe tested them if they're that good; and you need to weigh the upside against the downside.

Also, if you have a little time, searching LessWrong for similar ideas will be fun and fascinating.

Different personalities will tend to misweight in opposite directions. Pessimists/anxious people will overweight potential downsides, while optimists/enthusiastic people will overweight the upside. Doing a good job is complex. But it probably doesn't matter much unless you've done a bunch of research to establish that your new idea is really both new and potentially very powerful. Lots of ideas (but not nearly all!) have been thought of and explored, and there are lots of reasons that powerful-seeming ideas wind up not being that important or new.


What would be some examples of an obvious large downside? Something that comes to mind is anything that tips the current scales in a bad way, like a novel research result that directs researchers toward more rapid capabilities increases without a commensurate increase in alignment. Anything else?