If before you open the book, you believe that the book will provide incredibly compelling evidence of Zoroastrianism whether or not Zoroastrianism is true, and upon opening the book you find incredibly compelling evidence of Zoroastrianism, your probability of Zoroastrianism should not change, since you didn't observe any evidence which is more likely to exist if Zoroastrianism were true than if it were not true.
It may be that you are underestimating the AI's cleverness, so that you expect to see only decent evidence of Zoroastrianism, but in fact you find incredible evidence of Zoroastrianism, and so you become convinced. In this case your false belief that the AI is not too convincing is doing the philosophical work of deceiving you, and it's no longer really deceiving yourself. Deceiving yourself seems to be more about starting with all correct beliefs and then talking yourself into an incorrect one.
If you happen to luck out into having a false belief about the AI being unconvincing, and if this situation with the library of theology just falls out of the sky without your arranging it, you got lucky - but that's being deceived by others. If you try to set up the situation, you can't deliberately underestimate the AI because you'll know you're doing it. And you can't set up the theological library situation until you're confident you've deliberately underestimated the AI.
If before you open the book, you believe that the book will provide incredibly compelling evidence of Zoroastrianism whether or not Zoroastrianism is true, and upon opening the book you find incredibly compelling evidence of Zoroastrianism, your probability of Zoroastrianism should not change, since you didn't observe any evidence which is more likely to exist if Zoroastrianism were true than if it were not true.
This presumes that your mind can continue to obey the rules of Bayesian updating in the face of an optimization process that's deliberately trying to make it break those rules. We can't do that very well.
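For what it's worth, the arithmetic behind the no-update claim is easy to check. Here is a minimal sketch (the low prior of 1/1000 is made up for illustration) showing that when the evidence is equally likely under both hypotheses, Bayes' rule leaves the prior untouched:

```python
from fractions import Fraction

def posterior(prior, p_e_given_h, p_e_given_not_h):
    """Bayes' rule: P(H|E) = P(E|H)P(H) / [P(E|H)P(H) + P(E|~H)P(~H)]."""
    num = p_e_given_h * prior
    return num / (num + p_e_given_not_h * (1 - prior))

prior = Fraction(1, 1000)  # made-up low prior for Zoroastrianism

# You expect compelling evidence whether or not Zoroastrianism is true,
# so both likelihoods are high and equal: the likelihood ratio is 1
# and the posterior equals the prior.
assert posterior(prior, Fraction(99, 100), Fraction(99, 100)) == prior

# By contrast, evidence twice as likely under the hypothesis moves belief up.
assert posterior(prior, Fraction(2, 100), Fraction(1, 100)) > prior
```

The trouble, as noted above, is whether a human mind can keep executing this rule while an optimizer works to break it.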
You may want to look at Brandon Fitelson's short paper "Evidence of Evidence Is Not (Necessarily) Evidence". You seem to be arguing that, since we have strong evidence that the book has strong evidence for Zoroastrianism before we read it, it follows that we already have (the most important part of) our evidence for Zoroastrianism. But it turns out that it's extremely tricky to make this sort of reasoning work. To use the most primitive example from the paper, discovering that a playing card C is black is evidence that C is the ace of spades. Furthermore, that C is the ace of spades is excellent evidence that it's an ace. But discovering that C is black does not give you any evidence whatsoever that C is an ace.
The problem here - at least one of them - is that discovering C is black is just as much evidence for C being the x of spades for any other card-value x. Similarly, before opening the book on Zoroastrianism, we have just as much evidence for the existence of strong evidence for Christianity/atheism/etc, so our credences shouldn't suddenly start favoring any one of these. But once we learn the evidence for Zoroastrianism, we've acquired new information, in just the same way that learning that the card is an ace of spades provides us new information if we previously just knew it was black.
I do suspect that there are relevant disanalogies here, but don't have a very detailed understanding of them.
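Fitelson's card example can be verified by brute-force counting over a standard 52-card deck; nothing below is assumed beyond uniform probability over the deck:

```python
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["spades", "clubs", "hearts", "diamonds"]
deck = list(product(ranks, suits))  # 52 equally likely cards

def p(event, given=lambda c: True):
    """P(event | given), computed by counting over the uniform deck."""
    cond = [c for c in deck if given(c)]
    return sum(1 for c in cond if event(c)) / len(cond)

is_black = lambda c: c[1] in ("spades", "clubs")
is_ace = lambda c: c[0] == "A"
is_ace_of_spades = lambda c: c == ("A", "spades")

# "Black" IS evidence for "ace of spades": the probability doubles (1/52 -> 1/26).
assert p(is_ace_of_spades, given=is_black) == 2 * p(is_ace_of_spades)

# Yet "black" is NO evidence for "ace": both probabilities are 1/13.
assert p(is_ace, given=is_black) == p(is_ace)
```

The counting makes the disanalogy-hunting above concrete: evidence of evidence need not transmit.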
Suppose I am going to read a book by a top Catholic theologian. I know he is probably smarter than me: given the number of priests in the world, their average IQ, and their intellectual abilities, I figure the smartest of them is probably really, really smart, more well read than I am, and armed with the very best arguments the Church has found in 2000 years. If I read his book, should I take this meta-information into account and discount his evidence because of it? Or should I just evaluate the evidence?
It's the very fallacy Eliezer argues against, where people know about clever arguers and use this fact against everyone else.
If I read his book, should I take this meta-information into account and discount his evidence because of it? Or should I just evaluate the evidence?
You should take the meta-information into account, because what you're getting is filtered evidence. See What Evidence Filtered Evidence. If the book only contained very weak arguments, this would suggest that no strong arguments could be found, and would therefore be evidence against what the book was arguing for.
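To make the filtered-evidence point concrete, here is a toy model (the 90%/20% figures are invented for illustration) in which a clever arguer always presents the strongest argument available. Observing that the book contains only weak arguments then counts against its thesis:

```python
def update(prior, p_e_given_true, p_e_given_false):
    """Bayes' rule on evidence E with the given likelihoods."""
    num = p_e_given_true * prior
    return num / (num + p_e_given_false * (1 - prior))

prior = 0.5
# Assumed model: if the thesis is true, a diligent clever arguer finds at
# least one strong argument 90% of the time; if it is false, only 20%.
p_strong_given_true, p_strong_given_false = 0.9, 0.2

# The book turns out to contain only weak arguments: the complement event.
posterior = update(prior, 1 - p_strong_given_true, 1 - p_strong_given_false)
print(posterior)  # 1/9, roughly 0.111: weak arguments from a clever arguer count against
```

The asymmetry does the work: a clever arguer's silence about strong arguments is informative precisely because we know he would have used them.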
an ordinary person may well not need a super-intelligent AI to fool him, just a very convincing book or very clever interlocutor.
Such as a rogue AI (played by EY) convincing a skeptic to let it out of the box. Apparently the super-intelligence threshold does not need to be super-high (no offense to EY).
I've noticed this sort of thing with documentaries about the JFK assassination. One documentary will seem to produce very strong and reasonable evidence that Oswald did it, and the next documentary seems to have a similar strength argument that he did not. Sigh. The real world is confusing sometimes; when smart people are trying to make you more confused, life is hard.
I've noticed this sort of thing with documentaries about the JFK assassination. One documentary will seem to produce very strong and reasonable evidence that Oswald did it, and the next documentary seems to have a similar strength argument that he did not. Sigh. The real world is confusing sometimes.
I used to have a hard time with cases like that. Then I figured out the right mental category to put them in, after the story of Alexander Litvinenko's poisoning made headlines. I took an outside view: spy poisoned, accuses other spies, and a radioactive trail leads to someone's door. Once I realized that there were obviously competent parties fucking with the evidence, I classified it as "spy business" and deemed it unsolvable. Having this mental category has served me well, and it's fairly obvious that the JFK assassination goes in the same bucket.
This argument is irrelevant to the point Eliezer was making in the sequence, as it doesn't distinguish levels of self-deception possible in normal human experience and those reachable with superoptimization. In effect, you are exploiting the sorites fallacy (or fallacy of gray). That superoptimization might be able to break your mind in a certain way says little about whether your mind can normally break that way.
We rarely observe Christians trying to walk on water even though they should be able to, given enough faith. In fact they act as if it's impossible. I assume that this is the sort of thing you are talking about? But we also see people trying faith healing even though it doesn't work. Their model of the world really is different from yours. Likewise with Scientologists and psychiatry. They aren't faking it. If Zoroastrianism tells me that I must pray in order to be healed and not take drugs (I have no idea if it does, probably not), and I do in fact do so, being convinced by the book that I must, would that be sufficient?
Not sure whether you really mean 'know a priori to be wrong', which would be a very bold claim on almost any issue. But I think people can definitely self-deceive. I used to spend a lot of time arguing on religion sites, and I always found the counter-argument to Pascal's Wager that you 'can't just decide what to believe' very weak, especially as Pascal himself set out an 'influence your own belief' how-to. It's not even that exceptional: I suspect that most people could get themselves into being 'true believers' in one political stance or other by surrounding...
I took that into account, and my prior was really low that I would ever believe it.
Was it? See the following passage:
I know I would think he's not there after I read it
Also:
Why should the AI do that?
is an argument from ignorance, and
Valid argument is the best way to demonstrate the truth of something that is in fact true.
is not true; it's not even properly wrong, since it commits the mind projection fallacy.
Firstly, upvoted for an excellent problem!
Let's suppose that a super-intelligent AI has been built, and it knows plenty of tricks that no human ever thought of, in order to present a false argument which is not easily detectable to be false. Whether it can do that by presenting subtly wrong premises, or by incorrect generalization, or word tricks, or who knows what, is not important. It can, however, present an argument in a Socratic manner, and like Socrates' interlocutors, you find yourself agreeing with things you don't expect to agree with.
So the ...
Is Eliezer's claim that it is impossible for a perfect reasoner to deceive themselves, or that it is impossible for real-life humans to deceive themselves?
I assume he doesn't argue that crazy people can't deceive themselves. But then where is the boundary between crazy and perfect? And if the claim only applies to perfect reasoners, of what use is it?
An introduction to Zoroastrianism, by Omega:
"Dear reader, if you have picked this book first, I recommend that you stop being rational, right now. Ignore all rational rules and techniques you know, and continue reading this book with an open mind -- that is, without critical thinking. Because if you fail to believe in Zoroastrianism, I will torture you for eternity, and I mean it!"
A friendly Omega could write this too, if it already knows that the reader will surrender, so in the end the reader is not tortured, and the reader's wish (to believe in Zoro...
No, he expects that if he reads the book, his posterior belief in the proposition will likely be high. But his current prior belief in the truth of the proposition is low.
Also, as I made clear in my update, the AI is not perfect, merely very good. I only need it to be good enough for the whole episode to go through, i.e. good enough that you can't argue that a rational person would never believe in Z after reading the book, making my story implausible.
So in other words, the person is expecting to be persuaded by something other than the truth. Perhaps on the basis that the last N times he read one of these books, it changed his mind.
In that case, it is no different than if the person were stepping into a brain modification booth, and having his mind altered directly. Because a rational person would simply not be conned by this process. He would see that he currently believes in the existence of the flying spaghetti monster, and that he just read a book on the flying spaghetti monster prepared by a superintelligent AI which he had asked to prepare for him ultra-persuasive but entirely biased collections of evidence, and remember that he didn't formerly believe in the flying spaghetti monster. He would conclude on this basis that his belief probably has no basis in reality, i.e. is inaccurate, and stop believing (with such high probability) in it.
If we are to accept that the AI is good enough to prevent this happening - a necessary premise of the thought experiment - then it must be preventing the person from being rational in this way, perhaps by including statements in the book that in some extraordinary way reprogram his mind via some backdoor vulnerability. Let's say that perhaps the person is an android created by the AI for its own amusement, which responds to certain phrases with massive anomalous changes in its brain wiring. That is simply the only way I can accept the premises that:
a) the person applies Bayes's theorem properly (if this is not true, then he is simply not “mentally consistent” as you said)
b) he is aware that the books are designed to persuade him with high probability
c) he believes that the propositions to be proven in the books are untrue in general
d) he believes with high probability that the books will persuade him
which, unless I am very much mistaken, are equivalent to your statements of the problem.
If reading a book is not basically equivalent to submitting knowingly to brain modification for belief in something, then one of the above is untrue, i.e. the premises are inconsistent and the thought experiment can tell us nothing.
Remember that you are trying to prove that “one can really intentionally deceive oneself and be in a mentally consistent (although weird) state”. I accept that there is nothing mentally inconsistent about submitting to have one’s beliefs changed by brain surgery in one’s sleep. But accepting the fact that “intentionally deceiving oneself” is just three words that could be applied to any referent, I don’t think that your apparent referent is what Eliezer was talking about in the post you linked to. So you haven’t refuted him.
“Intentionally deceiving oneself” in his discussion means “deciding that one should believe something, and then (using the mundane tools available to us now, like reading books, chanting mantras, going to church, meditating etc.) forcing oneself to believe something else”. This may be possible in the trivial sense that 0 is not a probability, but in a practical sense it is basically “impossible” and that is all that Eliezer was arguing.
I’m sure Eliezer and anyone else would agree that it is possible to be an ideal Bayesian, and step in a booth in order to have oneself modified to believe in the flying spaghetti monster. It does seem to me that in order for the booth to work, it is going to have to turn you into a non-Bayesian irrational person, erase all of your memories about these booths or install false beliefs about the booths and then implant barriers in your mind to prevent the rest of your brain from changing this belief. It seems like a very difficult problem to me – but then we are talking about a superintelligent AI! In fact I expect that you’d need to be altered so much that you couldn’t even expect to be approximately the same person after leaving the booth.
Incidentally, this reminds me of a concept discussed in Greg Egan’s book “Quarantine”, which you might find interesting.
EDIT:
On re-reading, I see that the modification process as I described it doesn't actually uphold the premises of your thought experiment, because only one iteration of book-reading could occur before the person is no longer "mentally consistent" i.e. rational, and he can't ever read more than one of the books either (since his beliefs about or knowledge of the books themselves have been changed - which is not what he asked of the AI). So in order for the premises to be consistent, the book-programming-brain-surgery would have to completely wipe his mind and build a set of experiences from scratch so as to make the Universe seem consistent with evidence of the flying spaghetti monster, without it having to turn him into a non-Bayesian. The person would have to have evidence that the AI is clever enough that he should believe that it will be able to make books that persuade him of anything. And the AI would probably reset his mind at a point where he believes that he has never actually read any of the books yet.
What if the person realises that this exact scenario might already have happened? If the person was aware of the existence of this AI, and that he was in the business of asking it to do things liable to change his mind, I don't suppose that the line of reasoning I have outlined here would be hard for him to arrive at. This would be likely to undermine his belief in the reality of his entire life experiences in general, lowering his degree of belief in any particular deity. I suppose the easiest way around this would be if the AI were to make him sufficiently unintelligent that he doesn't come to suspect this, but is just barely capable of understanding the idea of a really smart being that can make "books" to persuade him of things (bearing in mind that according to the premises of the thought experiment, he has to be mentally consistent, i.e. Bayesian, and cannot have arbitrary barriers erected inside his mind).
It seems that this thought experiment has turned out to be an example of the hidden complexity of wishes!
(Meta-note: First post on this site)
I have read the sequence on self-deception/doublethink and I have some comments for which I'd like to solicit feedback. This post is going to focus on the idea that it's impossible to deceive oneself, or to make oneself believe something which one knows a priori to be wrong. I think Eliezer believes this to be true, e.g. as discussed here. I'd like to propose a contrary position.
Let's suppose that a super-intelligent AI has been built, and it knows plenty of tricks that no human ever thought of, in order to present a false argument which is not easily detectable to be false. Whether it can do that by presenting subtly wrong premises, or by incorrect generalization, or word tricks, or who knows what, is not important. It can, however, present an argument in a Socratic manner, and like Socrates' interlocutors, you find yourself agreeing with things you don't expect to agree with. I now come to this AI, and request it to make a library of books for me (personally). Each is to be such that if I (specifically) were to read it, I would very likely come to believe a certain proposition. It should take into account that initially I may be opposed to the proposition, and that I am aware that I am being manipulated. Now, AI produces such a library, on the topic of religion, for all major known religions, A to Z. It has a book called "You should be an atheist", and "You should be a Christian", etc, up to "You should be a Zoroastrian".
Suppose, I now want to deceive myself. I throw fair dice, and end up picking a Zoroastrian book. I now commit to reading the entire book and do so. In the process I become convinced that indeed, I should be a Zoroastrian, despite my initial skepticism. Now my skeptical friend comes to me:
Q: You don't really believe in Zoroastrianism.
A: No, I do. Praise Ahura Mazda!
Q: You can't possibly mean it. You know that you didn't believe it and you read a book that was designed to manipulate you, and now you do? Don't you have any introspective ability?
A: I do. I didn't intend to believe it, but it turns out that it is actually true! Just because I picked this book up for the wrong reason doesn't mean I can't now be genuinely convinced. There are many examples where people have studied the religion of their enemies in order to discredit it and in the process become convinced of its truth. I think St. Augustine was in a somewhat similar situation.
Q: But you know the book is written in such a way as to convince you, whether it's true or not.
A: I took that into account, and my prior was really low that I would ever believe it. But the evidence presented in the book was so significant and convincing that it overcame my skepticism.
Q: But the book is a rationalization of Zoroastrianism. It's not an impartial analysis.
A: I once read a book trying to explain and prove Gödel's theorem. It was written explicitly to convince the reader that the theorem was true. It started with the conclusion and built all arguments to prove it. But the book was in fact correct in asserting this proposition.
Q: But the AI is a clever arguer. It only presents arguments that are useful to its cause.
A: So is the book on Gödel's theorem. It never presented any arguments against Gödel, and I know there are some, at least philosophical ones. It's still true.
Q: You can't make a new decision based on such a book which is a rationalization. Perhaps it can only be used to expand one's knowledge. Even if it argues in support of a true proposition, a book that is a rationalization is not really evidence for the proposition's truth.
A: You know that our AI created a library of books to argue for most theological positions. Do you agree that with very high probability one of the books in the library argues for a true proposition? E.g. the one about atheism? If I were to read it now, I'd become an atheist again.
Q: Then do so!
A: No, Ahura Mazda will punish me. I know I would think he's not there after I read it, but he'll punish me anyway. Besides, at present I believe that book to be intentionally misleading. Anyway, if one of the books argues for a true proposition, it may also use a completely valid argument without any tricks. I think this is true of this book on Zoroastrianism, and is false of all other books in AI's library.
Q: Perhaps I believe the Atheism book argues for a true proposition, but it is possible that all the books written by the AI use specious reasoning, even the one that argues for a true proposition. In this case, you can't rely on any of them being valid.
A: Why should the AI do that? Valid argument is the best way to demonstrate the truth of something that is in fact true. If tricks are used, this may be uncovered which would throw doubt onto the proposition being argued.
Q: If you picked a book "You should believe in Zeus", you'd believe in Zeus now!
A: Yes, but I would be wrong. You see, I accidentally picked the right one. Actually, it's not entirely accidental. You see, if Ahura Mazda exists, he would with some positive probability interfere with the dice and cause me to pick the book on the true religion because he would like me to be his worshiper. (Same with other gods, of course). So, since P(I picked the book on Zoroastrianism|Zoroastrianism is a true religion) > P(I picked the book on Zoroastrianism|Zoroastrianism is a false religion), I can conclude by Bayes' rule that me picking that book up is evidence for Zoroastrianism. Of course, if the prior P(Zoroastrianism is a true religion) is low, it's not a lot of evidence, but it's some.
Q: So you are really saying you won the lottery.
A: Yes. A priori, the probability is low, of course. But I actually have won the lottery: some people do, you know. Now that I have won it, the probability is close to 1 (It's not 1, because I recognize that I could be wrong, as a good Bayesian should. But the evidence is so overwhelming, my model says it's really close to 1).
Q: Why don't you ask your super-intelligent AI directly whether the book's reasoning is sound?
A: According to the book, I am not supposed to do it because Ahura Mazda wouldn't like it.
Q: Of course, the book is written by the superintelligent AI in such a way that there's no trick I can think of that it didn't cover. Your ignorance is now invincible.
A: I still remain a reasonable person and I don't like being denied access to information. However, I am now convinced that while having more information is useful, it is not my highest priority anymore. I know it is possible for me to disbelieve again if given certain (obviously false!) information, but my estimate of the chance that any further true information could change my opinion is very low. In fact, I am far more likely to be deceived by false information about Ahura Mazda, because I am not superintelligent. This is why Ahura Mazda (who is superintelligent, by the way) advises that one should not tempt oneself into sin by reading any criticism of Zoroastrianism.
Q: Just read that atheist book and become normal again!
A: You are possessed by demons! Repent and become the follower of Ahura Mazda!
So, are you now convinced that ~~you should be a Zoroastrian~~ one can really intentionally deceive oneself and be in a mentally consistent (although weird) state?
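Incidentally, the Bayesian arithmetic in the dice argument from the dialogue does check out, though the update is modest. Here is a sketch with made-up numbers (26 books in the library, an assumed 10% chance of divine interference with the dice, and an arbitrarily tiny prior):

```python
N = 26        # number of books in the library, A to Z (made-up count)
q = 0.1       # assumed chance that, if Zoroastrianism is true, Ahura Mazda nudges the dice
prior = 1e-6  # made-up prior that Zoroastrianism is true

p_pick_given_true = q + (1 - q) / N  # divine interference, or the fair dice land there anyway
p_pick_given_false = 1 / N           # fair dice only

posterior = (p_pick_given_true * prior) / (
    p_pick_given_true * prior + p_pick_given_false * (1 - prior)
)

likelihood_ratio = p_pick_given_true / p_pick_given_false  # = q*N + (1 - q) = 3.5
# Picking the book is indeed some evidence, exactly as A claims, but a 3.5x
# update on a tiny prior still leaves a tiny posterior.
```

Of course, the dialogue's real problem is not this step, which is valid, but the overwhelming update produced by the book itself.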
To answer one potential objection, an ordinary person may well not need a super-intelligent AI to fool him, just a very convincing book or very clever interlocutor. As to why someone would want to submit to this, I'll discuss this in a separate post.
Update:
Here are some points that I think are useful to add from various comments.
Any thoughts on whether I should post this on the main site?