On self-deception
(Meta-note: this is my first post on this site.) I have read the sequence on self-deception/doublethink and have some comments on which I'd like to solicit feedback. This post focuses on the idea that it is impossible to deceive oneself, that is, to make oneself believe something one knows a priori to be wrong. I think Eliezer holds this view, e.g. as discussed here. I'd like to propose a contrary position.

Suppose a super-intelligent AI has been built, and it knows plenty of tricks no human has ever thought of for presenting a false argument whose falsehood is not easily detectable. Whether it does this through subtly wrong premises, incorrect generalization, word tricks, or something else entirely is not important. It can, in any case, present an argument in a Socratic manner, and like Socrates' interlocutors, you find yourself agreeing with things you never expected to agree with.

I now come to this AI and ask it to produce a library of books for me personally. Each book is to be such that if I (specifically) were to read it, I would very likely come to believe a certain proposition. The AI should take into account that I may initially be opposed to the proposition, and that I am aware I am being manipulated.

The AI produces such a library on the topic of religion, covering all major known religions from A to Z. It contains books titled "You should be an atheist", "You should be a Christian", and so on, up to "You should be a Zoroastrian".

Suppose I now want to deceive myself. I throw fair dice and end up picking the Zoroastrian book. I commit to reading the entire book, and do so. In the process I become convinced that, despite my initial skepticism, I should indeed be a Zoroastrian. Now my skeptical friend comes to me:

Q: You don't really believe in Zoroastrianism.

A: No, I do. Praise Ahura Mazda!

Q: You can't possibly mean it. You know that you didn't believe it, you read a book that was designed to manipulate you, and now you do?
Maybe I am amoral, but I don't value myself the same as a random person, even in a theoretical sense. I recognize that in some sense I am no more valuable to humanity than anyone else, but I am far more valuable to me: if I die, my utility goes to zero. While it can go negative in some circumstances (i.e. a life genuinely not worth living), some random person's death clearly cannot push it there on its own. People are dying in huge numbers all the time, and while the cost to me of each death is non-zero, it must be relatively small, or I would easily be in negative territory, and I am not.
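To make that last inference explicit (a rough back-of-the-envelope sketch; the symbols U, c, and N are my own illustrative notation, not part of the original argument): let U be the utility of my own continued life, c the disutility I assign to one stranger's death, and N the number of such deaths I am aware of over my lifetime. My overall utility remaining positive means roughly

U - N*c > 0, i.e. c < U/N.

Since N is enormous, c has to be tiny compared to U, which is the sense in which the cost of each death is non-zero but relatively small.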