But I wonder whether you could manipulate them this way arbitrarily far from rational behavior.
Surely I can construct such a model. But whether this is generally the case depends too much on the details of the implementation to give a complete answer.
whether they may notice this as some inferences are more likely to be detected when you already have some other facts.
... especially logical inferences: logical deductions from true facts are true, even if you don't know or remember them. But then again, that depends too much on the implementation of the agent to have a general answer; in this case its computational power would also matter.
This article is going to be in the form of a story, since I want to lay out all the premises in a clear way. There's a related question about religious belief.
Let's suppose that there's a country called Faerie. I have a book about this country which describes all people living there as rational individuals (in a traditional sense). Furthermore, it states that some people in Faerie believe that there may be some individuals there known as sorcerers. No one has ever seen one, but they may or may not interfere in people's lives in subtle ways. Sorcerers are believed to be such that there can't be more than one of them around and they can't act outside of Faerie. There are 4 common belief systems present in Faerie:
This is completely exhaustive, because everyone believes there can be at most one sorcerer. Of course, some individuals within each group have different ideas about what their sorcerer is like, but within each group they all absolutely agree with their dogma as stated above.
Since I don't believe in sorcery, a priori I assign a very high probability to case 4, and a very low (and equal) probability to each of the other 3.
I can't visit Faerie, but I am permitted to do a scientific phone poll. I call a random person, named Bob. It turns out he believes in Bright. Since P(Bob believes in Bright | case 1 is true) is higher than the unconditional probability P(Bob believes in Bright), I believe I should adjust the probability of case 1 up, by Bayes' rule. Does everyone agree? Likewise, the probability of case 3 should go up, since disbelief in Dark is evidence for the existence of Dark in exactly the same way, although perhaps to a smaller degree. I also think cases 2 and 4 have to lose some probability, since the four probabilities must add up to 1. If I then call a second person, Daisy, who turns out to believe in Dark, I should adjust all probabilities in the opposite direction. I am not asking either of them about the actual evidence they have, just what they believe.
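To make the direction of these updates concrete, here is a minimal sketch of the two phone calls in Python. All the numbers are made up for illustration: the prior only encodes "case 4 very likely, the other three equal and small", and the likelihood lists `p_bright` and `p_dark` are hypothetical values chosen to match the qualitative claims above. The sketch also assumes that Bob's and Daisy's answers are independent given which case is true.

```python
def bayes_update(prior, likelihood):
    """Return the posterior over the four cases after one observation.

    prior[i]      = P(case i+1) before the observation
    likelihood[i] = P(observation | case i+1)
    """
    unnormalized = [p * l for p, l in zip(prior, likelihood)]
    evidence = sum(unnormalized)  # unconditional probability of the observation
    return [u / evidence for u in unnormalized]


# Hypothetical prior over cases 1..4: case 4 dominates, the rest are equal.
prior = [0.01, 0.01, 0.01, 0.97]

# Assumed P(respondent believes in Bright | case i): highest under case 1,
# somewhat raised under case 3, baseline under cases 2 and 4.
p_bright = [0.5, 0.2, 0.3, 0.2]

# Assumed P(respondent believes in Dark | case i): lowered under cases 1 and 3,
# baseline under cases 2 and 4.
p_dark = [0.1, 0.3, 0.2, 0.3]

after_bob = bayes_update(prior, p_bright)     # Bob believes in Bright
after_daisy = bayes_update(after_bob, p_dark)  # Daisy believes in Dark

for label, dist in [("prior", prior),
                    ("after Bob", after_bob),
                    ("after Daisy", after_daisy)]:
    print(f"{label:12s}", [round(x, 4) for x in dist])
```

With these particular numbers, Bob's answer raises cases 1 and 3 and lowers cases 2 and 4, and Daisy's answer moves each of them back the other way. Different likelihoods would change the sizes of the shifts, and could change their directions, so the sketch illustrates the argument rather than settling it.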
I think this is straightforward so far. Here's the confusing part. It turns out that both Bob and Daisy are themselves aware of this argument. So, Bob says, one of the reasons he believes in Bright is that his belief is itself positive evidence for Bright's existence. And Daisy believes in Dark despite her belief being evidence against his existence (presumably because she has some other evidence that is overwhelming).
Here are my questions:
I am looking forward to your thoughts.