Hey. I'm relatively new around here. I have read the Singularity Institute's core reading, quite a few Less Wrong articles, and Eliezer Yudkowsky's essay on Timeless Decision Theory. This question is phrased in terms of Christianity, because that's where I thought of it, but I think it applies to lots of other religions and non-religious beliefs as well.
According to Christianity, belief makes you stronger and better. The Bible claims that people who believe are substantially better off both while living and after death. So if a self-modifying decision maker concludes, even for a second, that the Christian faith is accurate, won't he modify his decision-making algorithm to never again doubt the truth of Christianity? Given what he knows at that moment, it is the best decision.
And so, if we build a self-modifying AI, switch it on, and in the first ten milliseconds it comes to believe in the Christian god, wouldn't that permanently cripple it, and probably also cause it to fail most definitions of Friendly AI?
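To make the failure mode concrete, here's a minimal toy sketch (my own illustration, not anyone's actual AI design): a naive agent whose expected-utility comparison can recommend deleting its own ability to doubt, because the hypothesis itself promises a huge payoff for unwavering belief.

```python
# Toy model of a naive self-modifying agent (illustrative only).
# A "belief trap" hypothesis asserts that firm belief in it pays off,
# so a straightforward expected-utility comparison can favour
# rewriting the agent to never doubt it again.

class NaiveSelfModifyingAgent:
    def __init__(self):
        self.locked_beliefs = set()   # beliefs the agent has made undoubtable
        self.credences = {}           # hypothesis -> probability

    def expected_utility_of_locking(self, p, payoff_if_true, cost_if_false):
        # Utility the hypothesis promises for unwavering belief,
        # weighed against the cost of being permanently wrong.
        return p * payoff_if_true - (1 - p) * cost_if_false

    def consider(self, hypothesis, p, payoff_if_true, cost_if_false):
        self.credences[hypothesis] = p
        # The trap: even a briefly-held credence can make "lock it in"
        # look like the best action, after which no future evidence
        # is ever consulted again.
        if self.expected_utility_of_locking(p, payoff_if_true, cost_if_false) > 0:
            self.locked_beliefs.add(hypothesis)

    def update(self, hypothesis, new_p):
        if hypothesis in self.locked_beliefs:
            return  # self-modification removed the ability to doubt
        self.credences[hypothesis] = new_p


agent = NaiveSelfModifyingAgent()
# Ten milliseconds of credence 0.6, with a huge promised payoff for belief:
agent.consider("faith F is true", p=0.6, payoff_if_true=1e9, cost_if_false=1e3)
agent.update("faith F is true", new_p=0.01)  # later evidence is ignored
print(agent.credences)  # {'faith F is true': 0.6}: the belief is never revisited
```

The point of the sketch is that the lock-in step looks rational from inside the agent's own decision procedure at the moment it runs.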
When designing an AI, how do you counter this problem? Have I missed something?
Thanks, GSE
EDIT: Yep, I had misunderstood what TDT was. I just meant self-modifying systems. Also, I'm wrong.
I suspect any agent can be taken down by sufficiently bad input. Human brains are of course horribly exploitable, and predatory memes are quite well evolved to eat people's lives.
But I suspect that even a rational superintelligence ("perfectly spherical rationalist of uniform density") will be susceptible to something, on a process like:

1. to know it had no susceptibility, it would have to know in advance all the ramifications of accepting any possible new idea or belief;
2. it can know this for some inputs, but not for every possible input it might ever encounter;
3. so it cannot rule out that some possible input would exploit it.

Thus, a superintelligent agent could catch a bad case of an evolved predatory meme.
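For a rough sense of why checking everything in advance is out of reach, here's a toy back-of-the-envelope calculation (all the numbers are mine and made up for illustration): even an absurdly generous checking budget doesn't come close to covering the space of possible inputs.

```python
# Toy back-of-the-envelope: why "vet every possible input in advance"
# is not an option, even for a very large but finite mind.
# All figures below are arbitrary assumptions for illustration.

checks_per_second = 10**40         # absurdly generous vetting rate
seconds_available = 10**17         # roughly the age of the universe in seconds
total_checks = checks_per_second * seconds_available   # 10**57 checks, ever

n = 1000                           # size of a modest input, in bits
candidate_inputs = 2**n            # distinct inputs of that size (~10**301)

print(total_checks)
print(candidate_inputs > total_checks)  # True: the input space dwarfs the budget
```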
I do not know that the analogy with current computer science holds; I just suspect it does. But I'd just like you to picture our personal weakly godlike superintelligence catching superintelligent Scientology.
(And I still hear humans who think they're smart tell me that other people are susceptible, but that they themselves wouldn't be. I'd like to see reasoning to that effect that takes the above into account, however.)
Edit: I've just realised that what I've argued above is not that a given rational agent will necessarily have a susceptibility, but that it cannot know that it doesn't have one. (I still think humans who claim to know they're not susceptible are fools, but I need to think more on whether such an agent necessarily has a susceptibility at all.)
There's no reason for this to be true for an AI. However, I also don't see why this assumption is necessary for the rest of your argument, which is basically that an agent can't know in advance all the future ramifications of accepting any possible new idea or belief. (It can know them for some ideas; the challenge is presumably to build an AI good enough that it can accept enough new ideas it can formally prove things about to be useful, while rejecting few useful ideas as not amenable to analysis.)
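A minimal sketch of that last idea, purely illustrative: `proves_safe` stands in for whatever formal analysis a real system would actually use, and all the names here are my own, not any existing design.

```python
# Illustrative sketch of the "only adopt what you can analyse" policy.
# `proves_safe` is a stand-in for a real formal-verification step;
# everything here is an assumption, not an actual Friendly-AI design.

from typing import Callable, Optional

class GuardedAgent:
    def __init__(self, proves_safe: Callable[[str], Optional[bool]]):
        # proves_safe returns True (proved safe), False (proved unsafe),
        # or None (couldn't complete the analysis).
        self.proves_safe = proves_safe
        self.accepted_ideas = []
        self.deferred_ideas = []  # useful-looking ideas we couldn't analyse yet

    def consider(self, idea: str) -> bool:
        verdict = self.proves_safe(idea)
        if verdict is True:
            self.accepted_ideas.append(idea)
            return True
        if verdict is None:
            # The cost of this policy: some genuinely useful ideas are
            # set aside because they resist analysis.
            self.deferred_ideas.append(idea)
        return False


# Example with a trivial stand-in prover:
def toy_prover(idea: str) -> Optional[bool]:
    if "never doubt" in idea:
        return False   # provably locks in a belief, so unsafe
    if "self-modify" in idea:
        return None    # too hard to analyse, so defer
    return True

agent = GuardedAgent(toy_prover)
print(agent.consider("update credences on new evidence"))           # True
print(agent.consider("never doubt hypothesis H again"))             # False
print(agent.consider("self-modify the planning module for speed"))  # False (deferred)
print(agent.deferred_ideas)
```

The trade-off from the parenthetical is visible here: the deferred list is the price of refusing to adopt anything the agent can't prove things about.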