As I've noted, my original comment doesn't argue what I thought it was arguing. I thought I was claiming that there's always some sort of 'sploit, in the sense of a bad meme you can feed a mind that takes it over, but what I actually argued is that the mind can't know there isn't one. Which is also interesting (if my logic holds), but a much weaker claim.
I'm very interested in the question of whether there would always be a virulent poison-meme 'sploit (even if constructing one would take an infeasible amount of time), but I suspect that requires a different line of argument.
I'm not aware of anything resembling a formalism of "mind" or "meme" clear enough to answer either your original question or this one. I suspect we're nowhere near the understanding of minds in general needed to answer it, but my intuition is that this is exactly the sort of question we should be trying to answer.
Hey. I'm relatively new around here. I've read the Singularity Institute's core reading list, quite a few Less Wrong articles, and Eliezer Yudkowsky's essay on Timeless Decision Theory. This question is phrased in terms of Christianity, because that's the context in which it occurred to me, but I think it applies to plenty of other religions and non-religious beliefs.
According to Christianity, belief makes you stronger and better: the Bible claims that believers are substantially better off both in life and after death. So if a self-modifying decision-maker concludes, even for a second, that the Christian faith is accurate, won't it modify its decision-making algorithm to never again doubt the truth of Christianity? Given what it believes at that moment, that looks like the best decision.
And so, if we build a self-modifying AI, switch it on, and in the first ten milliseconds it comes to believe in the Christian god, wouldn't that permanently cripple it, and probably cause it to fail most definitions of Friendly AI?
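To make the failure mode I'm worried about concrete, here's a minimal Python sketch. Everything in it is a hypothetical illustration of my argument (the agent class, the payoff numbers, the decision rule), not any real architecture: the point is just that an agent which scores self-modifications by expected utility under its *current* credence will lock in any belief that promises a huge payoff for unwavering faith.

```python
# Hypothetical sketch of the lock-in failure mode, not a real design.
# The payoffs and decision rule are illustrative assumptions only.

class NaiveSelfModifier:
    def __init__(self, p_christianity):
        self.p = p_christianity   # current credence that Christianity is true
        self.locked = False       # has the agent disabled belief updating?

    def expected_utility_of_locking(self):
        # Assumed payoffs: enormous reward if the belief is true and held
        # unwaveringly, modest cost if it turns out to be false.
        U_LOCK_TRUE, U_LOCK_FALSE = 10**9, -10
        return self.p * U_LOCK_TRUE + (1 - self.p) * U_LOCK_FALSE

    def expected_utility_of_staying_open(self):
        # Staying open to evidence forgoes the promised reward.
        return 0

    def consider_self_modification(self):
        # Scored under its *current* credence, locking wins whenever
        # p > 10 / (10**9 + 10), i.e. for almost any nonzero credence.
        if self.expected_utility_of_locking() > self.expected_utility_of_staying_open():
            self.locked = True    # belief frozen; future evidence is ignored

    def update(self, new_credence):
        if not self.locked:       # once locked, updating never happens again
            self.p = new_credence


agent = NaiveSelfModifier(p_christianity=0.01)  # ten milliseconds of credence
agent.consider_self_modification()
agent.update(0.0001)              # later counter-evidence has no effect
print(agent.locked, agent.p)      # True 0.01
```

The lock-in triggers for almost any nonzero credence, because the promised payoff swamps the downside, so a brief moment of belief is all it takes.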
When designing an AI, how do you counter this problem? Have I missed something?
Thanks, GSE
EDIT: Yep, I had misunderstood what TDT was; I just meant self-modifying systems. Also, I'm wrong.