cleanwhiteroom

Always trying to walk a little further into the fog. But, y'know, rationally.


Comments

Answer by cleanwhiteroom

The Three-Body Problem trilogy by Liu Cixin (books and Netflix series) features two prominent deception subplots. I think it fulfills the Deception Genre criteria because the reader does (eventually) get the warring perspectives of both the deceivers and the deceived.

Here are the two subplots, in cryptic shorthand to avoid spoilers: 1) the 11-D proton-sized situation, and 2) WBs vs. WFs.

Thanks for your comment! On further reflection, I think you're right about the difference between LLM hallucinations and what's commonly meant when humans refer to "hallucination." I think the better comparison is between LLM hallucination and human confabulation, which shows up in something like Korsakoff syndrome, where anterograde and retrograde amnesia produce a tendency to invent memories with no basis in reality to fill the gaps.

I guess to progress from here I'll need to take a deep dive into neural entropy.

When I talk to an LLM for an extended period on the same topic, I notice performance degradation and increased hallucination. I understand this to be a known phenomenon that worsens as the context window fills up (as the conversation grows, each new user input is interpreted in the context of everything that came before). Here's what I can't stop thinking about: I, too, lack an infinite context window. I need a reset (in the form of sleep), otherwise I hallucinate, decohere, and generally get weird.
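To make the analogy concrete, here's a toy sketch (plain Python, no real model or API; the class, the token budget, and the word-count "tokenizer" are made up for illustration) of what I mean by "each new input is interpreted in the context of what came before," with the reset playing the role of sleep:

```python
from dataclasses import dataclass, field

@dataclass
class Conversation:
    max_tokens: int = 8000                       # hypothetical context budget
    history: list = field(default_factory=list)  # accumulated turns

    def tokens_used(self):
        # crude stand-in for a real tokenizer: ~1 "token" per word
        return sum(len(turn.split()) for turn in self.history)

    def send(self, user_msg):
        self.history.append(f"user: {user_msg}")
        # a real model would condition its reply on *all* of self.history;
        # the longer that history grows, the more diluted any one message becomes
        reply = f"(reply conditioned on {self.tokens_used()} tokens of history)"
        self.history.append(f"assistant: {reply}")
        return reply

    def near_limit(self):
        return self.tokens_used() > 0.9 * self.max_tokens

    def reset(self):
        # the "sleep" step: discard the accumulated context entirely
        self.history.clear()

convo = Conversation()
for i in range(3):
    print(convo.send(f"question {i} about the same topic"))
if convo.near_limit():
    convo.reset()  # start over with an empty context window
```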

I know there are biological and computational limits at play, but, beneath that, the connection feels thermodynamic-y. In both settings (resetting an LLM convo, me sleeping), is the fundamental principle at play a reduction in local informational entropy?

If the above is true, might the need for a mechanism to reduce informational entropy within a functioning neural net hold at increasing scales? Put more whimsically: might artificial intelligences need to sleep?

Very interesting. SOO (self-other overlap) may be a reasonable stand-in for computational empathy, and empathy does seem to increase human social alignment and decrease the frequency/likelihood of deception. That being said, I think the absolute capacity for deception will be retained in a highly "empathetic" system, unless self-concept is eliminated to the point that reliability, predictability, and accuracy degrade.

Put in a more Kierkegaardian way: any system/agent that can relate its self-concept to itself, to the environment, or to other systems/agents has the machinery in place for deception. Dismantling the walls between self and other to the point that deception is impossible seems likely to fundamentally compromise performance. I can't back this up with math, but in humans this kind of error (a neural inability to distinguish self from other) results in a schizophrenic phenotype.

The above being said, I think there's a lot of value in this line of work. Self/nonself discrimination is a key feature of the human brain (and the human immune system) and definitely merits this kind of investigation. I look forward to your follow-up post.