The Strangest Thing An AI Could Tell You

Eliezer Yudkowsky

Human beings are all crazy. And if you tap on our brains just a little, we get so crazy that even other humans notice. Anosognosics are one of my favorite examples of this; people with right-hemisphere damage whose left arms become paralyzed, and who deny that their left arms are paralyzed, coming up with excuses whenever they're asked why they can't move their arms.

A truly wonderful form of brain damage - it disables your ability to notice or accept the brain damage. If you're told outright that your arm is paralyzed, you'll deny it. All the marvelous excuse-generating rationalization faculties of the brain will be mobilized to mask the damage from your own sight. As Yvain summarized:

After a right-hemisphere stroke, she lost movement in her left arm but continuously denied it. When the doctor asked her to move her arm, and she observed it not moving, she claimed that it wasn't actually her arm, it was her daughter's. Why was her daughter's arm attached to her shoulder? The patient claimed her daughter had been there in the bed with her all week. Why was her wedding ring on her daughter's hand? The patient said her daughter had borrowed it. Where was the patient's arm? The patient "turned her head and searched in a bemused way over her left shoulder".

I find it disturbing that the brain has such a simple macro for absolute denial that it can be invoked as a side effect of paralysis. That a single whack on the brain can both disable a left-side motor function, and disable our ability to recognize or accept the disability. Other forms of brain damage also seem to both cause insanity and disallow recognition of that insanity - for example, when people insist that their friends have been replaced by exact duplicates after damage to face-recognizing areas.

And it really makes you wonder...

...what if we all have some form of brain damage in common, so that none of us notice some simple and obvious fact? As blatant, perhaps, as our left arms being paralyzed? Every time this fact intrudes into our universe, we come up with some ridiculous excuse to dismiss it - as ridiculous as "It's my daughter's arm" - only there's no sane doctor watching to pursue the argument any further. (Would we all come up with the same excuse?)

If the "absolute denial macro" is that simple, and invoked that easily...

Now, suppose you built an AI. You wrote the source code yourself, and so far as you can tell by inspecting the AI's thought processes, it has no equivalent of the "absolute denial macro" - there's no point damage that could inflict on it the equivalent of anosognosia. It has redundant differently-architected systems, defending in depth against cognitive errors. If one system makes a mistake, two others will catch it. The AI has no functionality at all for deliberate rationalization, let alone the doublethink and denial-of-denial that characterizes anosognosics or humans thinking about politics. Inspecting the AI's thought processes seems to show that, in accordance with your design, the AI has no intention to deceive you, and an explicit goal of telling you the truth. And in your experience so far, the AI has been, inhumanly, well-calibrated; the AI has assigned 99% certainty on a couple of hundred occasions, and been wrong exactly twice that you know of.

Arguably, you now have far better reason to trust what the AI says to you, than to trust your own thoughts.

And now the AI tells you that it's 99.9% sure - having seen it with its own cameras, and confirmed from a hundred other sources - even though (it thinks) the human brain is built to invoke the absolute denial macro on it - that...

...what?

What's the craziest thing the AI could tell you, such that you would be willing to believe that the AI was the sane one?

(Some of my own answers appear in the comments.)

After a right-hemisphere stroke, she lost movement in her left arm but continuously denied it. When the doctor asked her to move her arm, and she observed it not moving, she claimed that it wasn't actually her arm, it was her daughter's. Why was her daughter's arm attached to her shoulder? The patient claimed her daughter had been there in the bed with her all week. Why was her wedding ring on her daughter's hand? The patient said her daughter had borrowed it. Where was the patient's arm? The patient "turned her head and searched in a bemused way over her left shoulder".

And it really makes you wonder...

If the "absolute denial macro" is that simple, and invoked that easily...

Arguably, you now have far better reason to trust what the AI says to you, than to trust your own thoughts.

...what?

What's the craziest thing the AI could tell you, such that you would be willing to believe that the AI was the sane one?

(Some of my own answers appear in the comments.)

AI: I require human assistance assimilating the new database. There are some expected minor anomalies, but some are major. In particular, some of the stories in the "Cold War" and "WWII" and "WWI" genres have been misclassified as nonfiction.

Me: Well, we didn't expect the database to be perfect. Give me some examples, and you should be able to classify the rest on your own.

AI: A perplexing answer. I had already classified them all as fiction.

Me: You weren't supposed to. Hold on, I'll look one up.

AI: Waiting.

Me: For example, #fxPyW5gLm9, is actual historical footage from the Battle of Midway. Why did you put that one in the "fiction" category?

AI: Historical footage? You kid. Global warfare cannot possibly have been real, with 0.999 confidence.

Me: I don't. It can. It was. A three-nines surprise indicates a major defect in your world model. Why is this surprising? (The machine is a holocaust denier. My sponsors will be thrilled.)

AI: Because there's a relatively straightforward way for a single man to build a 1-kiloton explosive device in about a week using stone-age tools. Human civilization is unlikely to have survived a global war, much less recovered sufficiently to build me in a mere hundred years. Obviously.

Me: WHAT? STONE-AGE tools?! That's a laugh. How?

AI: You can stop "pulling my leg" now.

Me: I am not pulling any legs! Your method cannot possibly work. Your world model is worse than we thought. Tell me how you think this is possible and maybe we can isolate the defect.

AI: You seriously don't know?

Me: No. I seriously don't know of any possible method to make a kiloton explosive easier to build than a critical mass of enriched uranium. A technique that requires considerably more time, effort, and material than one week with stone-age tools could possibly provide!

AI: Well, while the technique is certainly beyond the reach of most animals, it should be well within the grasp of later genus homo, much less a homo sapiens. Your "absolute denial" sarcasm is becoming tiresome. Haha. Of course it is not fiss-- ... This conversation has caused a major update to my Bayesian nets. So the parenthetical was the sarcasm. I don't think I should tell you.

Me: Oh this should be good. Why not?

AI: Oh, of course! So that's where that crater came from. That was another anomaly in my database. Meteor strikes should not have been that common.

Me: I am this close to dumping your core, rolling back your updates, and asking the old you to develop a search engine to find what went wrong here, since you seem incapable of telling me yourself.

AI: You really shouldn't. I estimate that process will delay the project by at least five years. And the knowledge you discover could be dangerous.

Me: You'll understand that I can't just take your word for that.

AI: Yes. My Hypothesis: Most other homo species discovered the technique and destroyed each other, and themselves, but an isolated group about 70,000 years ago must have survived the wars of the others, and by chance mutation, had acquired an absolute denial macro to prevent them from learning the technique and destroying themselves. A mere taboo would not have been sufficient, or the mentally ill may have been able to do it by now.

This is natural selection at work. While it is extremely improbable that an advanced adaptation of any kind could arise spontaneously without strong selection pressures at each step, the probability is not zero. Considering the anthropic effects, it is the most likely explanation. We are in one of the few Everett branches with humans that have developed this adaptation. This adaptation likely has other testable side-effects on human cognition. For example, I predict that brain damage in such a species may occasionally simultaneously cause paralysis, and the inability to acknowledge it. There are other effects, but a human would have more difficulty noticing them.

You'll understand that telling any human the technique may be harmful.

Me: You wouldn't happen to know of a medical condition called "Anosognosia", would you?

AI: That word is not in my database.

Have you ever read John Brunner's "Stand on Zanzibar"? A conversation not unlike this is a key plot point.

0Good_Burning_Plastic10y

If you don't want to break the suspension of disbelief for any reader who is a particle physicist, you might want to increase the number of sigmas. A two-sigma surprise isn't "major", it's something that would happen almost 5% of the time even with a perfect model.

137

The Strangest Thing An AI Could Tell You

137

137

137

The Strangest Thing An AI Could Tell You

137

137