
Emotional Basilisks

-2 OrphanWilde 28 June 2013 09:10PM

Suppose it is absolutely true that atheism has a negative impact on your happiness and lifespan.  Suppose furthermore that you are the first person in your society of relatively happy theists to happen upon the idea of atheism; that you have found absolute proof of its correctness; and that, having quietly studied its effects on a small group of people kept isolated from the general population, you have confirmed those negative effects on happiness and lifespan.  Suppose that it -does- free people from a considerable amount of time wasted - from your perspective as a newfound atheist - in theistic theater.

Would you spread the idea?

This is, in our theoretical society, the emotional equivalent of a nuclear weapon; the group you tested it on is now comparatively crippled with existentialism and doubt, and many are beginning to doubt that the continued existence of human beings is even a good thing.  This is, for all intents and purposes, a basilisk, the mere knowledge of which causes its knower severe harm.  Is it, in fact, a good idea to go around talking about this revolutionary new idea, which makes everybody who learns it slightly less happy?  Would it be a -better- idea to form a secret society that quietly approaches the bright people most likely to discover the idea on their own, in order to keep it quiet?

(Please don't fight the hypothetical here.  I know the real-world evidence that atheism does in fact cause harm is nowhere near this clear-cut, as all the studies I've personally seen which suggest as much have methodological flaws.  This is merely a question of whether "That which can be destroyed by the truth should be" is, in fact, a useful position to take, in view of ideas which may actually be harmful.)

Dealing with the horrible strategy

3 Manfred 11 July 2011 05:16AM

So occasionally this idea comes up that unethical AIs could have leverage over people who think too deeply about the subject - like a balrog ambushing the dwarves and forcing them to build an unfriendly AI.  In this post I attempt to show that this leverage does not work.

Basically, it's a similar problem to Pascal's mugging, except the mugger doesn't even have to exist.  All unfriendly AIs that run on something like updateless/timeless/dispositional decision theory - that is, choosing between winning strategies, not individual steps - might attempt to mug current humans by promising immense future rewards or penalties.  The idea is that if you know a possible AI will do horrible things to friendly AI supporters if it's built, you're less likely to support friendly AI, thus making doing horrible things to you a winning (and horrible) strategy.  The flip side of the coin would be to do really nice things to people who supported your creation, thus increasing the chances of your creation - but people smart enough to predict that an AI will do this would already be building it right anyway, so the benefit to the AI from this strategy is probably negligible.

So let's say the chance of someone else building an AI that has the right decision theory and executes the horrible strategy is about one in 10^15.  Let's suppose that the AI can cause you 3^^^3 units of negative utility, somehow.  Well then you should regard supporting any AI that probably won't have the right decision theory to make the horrible promise as costing essentially 3^^^3 utils - a factor of 10^-15 is a drop in the Atlantic!  So don't do it!  Stop sending money to SIAI immediately!  If your name is Eliezer Yudkowsky, it is too late for you already and you should take up knitting!
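Spelled out as the naive expected-utility calculation the paragraph above is gesturing at (treating 3^^^3 as an ordinary finite number of utils purely for the sake of the argument):

\[
\underbrace{10^{-15}}_{P(\text{horrible AI})} \times \underbrace{3\uparrow\uparrow\uparrow 3}_{\text{threatened harm}} \;\approx\; 3\uparrow\uparrow\uparrow 3 \text{ utils,}
\]

because dividing a number as vast as 3^^^3 by a mere 10^15 still leaves something unimaginably large - which is exactly why the factor of 10^-15 is "a drop in the Atlantic".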

Okay, so how do we deal with this possibility?  Why haven't I already closed this browser window?  It turns out that the horrible strategy may not be so great for future AIs after all - we have 4 defenses.

  1. Refusal to be blackmailed is a defense against blackmail.  Mostly this is human evolved awesomeness.  Our brains instinctively do the sorts of things that make the horrible strategy less good.  We don't like blackmail, and knowing that we don't like blackmail, future AIs are less likely to think blackmailing us is a cost-effective strategy.  Go evolution!  You can consciously reinforce this too - knowably being immune to blackmail is a winning strategy, and you should choose winning strategies.
  2. If 3^^^3 is such a big amount of utility, why can't a friendly AI just give us that, totally cancelling the enemy term in the calculation?  Sure, it's harder to imagine pleasure than pain - and imagining wireheading and then going "I wouldn't want that" is a failure to imagine 3^^^3 utility correctly - but hey, I can't imagine 3^^^3 of anything anyhow, so why let our imaginations stop us?  (well, maybe because it's impossible, see (3))
  3. Standard Pascal's mugging defenses - the likelihood that utility functions are bounded being the biggest.  The worst an AI can do to you doesn't really seem about 3^^^3 times worse than turning the universe, your body included, into a grid of molecular happy faces.  We don't have a little counter in our heads that can be decremented endlessly.  When utilities are bounded, past a certain point you can no longer overcome improbability by planning worse dooms - the more probable friendly AI will likely win out in a calculation of this sort, even ignoring (1).  (A toy calculation illustrating this appears after the list.)
  4. The 4th defense is the scary defense.  If all else can be predicted to fail for some unknown set of reasons, a "friendly" AI running the same decision theory might do horrible things to people who are influenced by the horrible strategy.  Probably unnecessary anyhow.
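To make defense 3 concrete, here is a toy sketch in Python.  The probabilities, the clamp standing in for a "bounded" utility function, and the payoff numbers are all made up purely for illustration; nothing here comes from an actual model.

    # Toy sketch of defense 3: once utilities are clamped to a bound,
    # a vanishingly unlikely threat can no longer dominate the calculation.
    U_MAX = 1.0  # assumed bound on (normalized) utility

    def bounded(u):
        """Clamp a raw utility into [-U_MAX, U_MAX]."""
        return max(-U_MAX, min(U_MAX, u))

    p_horrible = 1e-15          # chance of an AI that executes the horrible strategy
    p_friendly = 1e-3           # chance that your support yields a friendly AI
    raw_threat = -(3.0 ** 100)  # stand-in for an "arbitrarily awful" threatened outcome
    raw_reward = 0.9            # a merely very good outcome

    # Unbounded utilities: the threat term swamps everything else.
    print(p_horrible * raw_threat + p_friendly * raw_reward)                    # hugely negative
    # Bounded utilities: the more probable friendly outcome wins.
    print(p_horrible * bounded(raw_threat) + p_friendly * bounded(raw_reward))  # ~ +9e-4

The particular numbers don't matter; the point is that clamping the utility caps how much weight a threat can carry in the sum, no matter how bad it claims to be.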

I think the outlook is quite positive, really.  These factors mean that when it comes to what actions you take, the horrible strategy should not matter.  Lastly, to any unfriendly AIs reading this: do your worst!  It's probably much less than 3^^^3 utils!




Edited to make my conclusion more obvious.