I'm worried we talk past each other.
You’re saying:
That part I agree with.
The point I’ve been trying to get at is: Once the same issue arises for ordinary optical appearances, we’ve left behind the special stakes of step 10! Because in the rainbow case, we all seem to accept (but maybe you disagree):
Or, if rainbow-style cases also fall under the trilemma, then the conclusion can’t be “mind exceeds physics.” It would have to be the stronger and more surprising “appearances as such exceed physics” or “macrostructure in general exceeds physics.” That’s quite different from your original framing, which presents the homomorphic encryption case as demonstrating a distinctive epistemic excess of mind relative to physics.
It seems you are biting the bullet and agreeing that the rainbow also has the problem of how a mind can be aware of it when it isn't (efficiently) reconstructable. But then this seems to generalize to many, if not all, phenomena a mind can perceive. Doesn't this reduce that conception of a mind ad absurdum?
I think step 10 overstates what is shown. You write:
“If a homomorphically encrypted mind (with no decryption key) is conscious … it seems it knows things … that cannot be efficiently determined from physics.”
The move from “not P-efficiently determined from physics” to “mind exceeds physics (epistemically)” looks too strong. The same inferential template would force us into contradictions in ordinary physical cases where appearances are available to an observer but not efficiently reconstructible from the microphysical state.
Take a rainbow. Let p be the full microphysical state of the atmosphere and EM field, and let a be the appearance of the rainbow to an observer. The observer trivially “knows” a. Yet from p, even a quantum-bounded “Laplace’s demon” cannot, in general, P-efficiently compute the precise phenomenal structure of that appearance. The appearance does not therefore “exceed physics.”
If we accepted your step 10’s principle that "facts accessible to a system but P-intractable to compute from p outrun physics," we would have to say the same about rainbows:
the observer of a rainbow “knows something” physics can’t efficiently determine.
That is an implausible conclusion. The physical state fully fixes the appearance; what fails is only efficient external reconstruction, not physical determination.
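To make that distinction explicit (the notation here is mine, not yours): with p and a as above, determination and efficient reconstructability are two separate conditions, and step 10 only ever speaks to the second one.

$$\textbf{determination:}\quad \exists f \ \text{such that}\ a = f(p)$$
$$\textbf{efficient reconstructability:}\quad f \in \mathrm{FP},\ \text{i.e. } a \text{ is computable from } p \text{ in polynomial time}$$

Physicalism, as usually understood, needs only the first condition; the rainbow (and, below, the encrypted mind) shows at most a failure of the second.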
Homomorphic encryption sharpens the asymmetry between internal access and external decipherability, but it does not introduce a new ontological gap.
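To make that asymmetry concrete, here is a minimal toy sketch of an additively homomorphic, Paillier-style scheme (tiny, insecure parameters; all names and numbers are mine, purely for illustration, not anything from your post). The computation on ciphertexts proceeds in a fully determined, lawful way, yet without the secret key an outside observer cannot efficiently read what is being computed:

```python
import math
import random

# Toy Paillier-style additively homomorphic encryption.
# Insecure toy parameters; illustrative only.

def keygen(p=1789, q=2003):                      # tiny primes, not secure
    n = p * q
    lam = math.lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)                         # valid because g = n + 1
    return (n,), (lam, mu, n)                    # public key, secret key

def encrypt(pub, m):
    (n,) = pub
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    # c = (1 + n)^m * r^n mod n^2
    return (pow(n + 1, m, n * n) * pow(r, n, n * n)) % (n * n)

def decrypt(sec, c):
    lam, mu, n = sec
    return ((pow(c, lam, n * n) - 1) // n) * mu % n

pub, sec = keygen()
c1, c2 = encrypt(pub, 17), encrypt(pub, 25)
c_sum = (c1 * c2) % (pub[0] ** 2)                # "compute" on ciphertexts only
print(decrypt(sec, c_sum))                       # 42, but only with the key
```

The product of the two ciphertexts decrypts to the sum of the plaintexts: the physical state fully fixes the result, while efficient external readout fails without the key. That is the rainbow situation in a sharper form.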
So I agree with the earlier steps (digital consciousness, key distance irrelevance) but think the “mind exceeds physics (epistemically)” inference is a category error: it treats P-efficient reconstructability as a criterion for physical determination. If we reject that criterion in the rainbow case, we should reject it in the homomorphic case too.
I like the sharp distinction you draw between
“Our Values are (roughly) the yumminess or yearning…”
and
“Goodness is (roughly) whatever stuff the memes say one should value.”
but the post treats these as more separable than they actually are from the standpoint of how the brain acquires preferences.
You emphasize that
“we mostly don’t get to choose what triggers yumminess/yearning”
and that Goodness trying to overwrite that is “silly.” Yet a few paragraphs later you note that
“a nontrivial chunk of the memetic egregore Goodness needs to be complied with…”
before recommending to “jettison the memetic egregore” once the safety-function parts are removed.
But the brain’s value-learning machinery doesn’t respect this separation. “Yumminess/yearning” is not fixed hardware; it’s a constantly updated reward model trained by social feedback, imitation, and narrative framing. The very things you group under “Goodness” supply the majority of the training data for what later becomes “actual Values.” The egregore is not only a coordination layer or a memetically selected structure on top; it is also the training signal.
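A minimal toy sketch of that claim (my framing and numbers, not the post's): treat “yumminess” as a scalar reward model that gets updated by whatever social signal it is exposed to, and compare what it converges to with and without the memetic signal.

```python
import random

# Toy value-learning loop: "yumminess" as a learned reward model whose
# training data comes mostly from a memetic/social signal ("Goodness").
# All names and numbers are illustrative assumptions.

yumminess = {"loving_connection": 0.1, "status": 0.1, "novelty": 0.1}

def goodness_stream():
    # The memetic environment praises some targets much more than others.
    return random.choices(
        ["loving_connection", "status", "novelty"],
        weights=[0.6, 0.3, 0.1],
    )[0]

def update(target, lr=0.05, decay=0.01):
    # Reinforce whatever the social signal points at; everything else decays.
    for k in yumminess:
        yumminess[k] *= 1 - decay
    yumminess[target] += lr

for _ in range(2000):                        # with the egregore as training signal
    update(goodness_stream())
print({k: round(v, 2) for k, v in yumminess.items()})
# roughly {'loving_connection': 3.0, 'status': 1.5, 'novelty': 0.5}

for _ in range(2000):                        # "jettison the egregore": the signal
    update(random.choice(list(yumminess)))   # is replaced by undirected noise
print({k: round(v, 2) for k, v in yumminess.items()})
# the learned landscape drifts toward uniform
```

With the egregore as training signal, the learned landscape concentrates on what the memes reinforce; replace the signal with undirected noise and the same machinery drifts somewhere else. That is the sense in which jettisoning the egregore changes the future Values themselves, not just the coordination layer.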
Your own example shows this coupling. You say that
“Loving Connection… is a REALLY big chunk of their Values”
while also being a core part of Goodness. This dual function, as both a learned reward target and the memetic structure that teaches people to want it, is typical rather than exceptional.
So the key point isn't “should you follow Goodness or your Values?” but “which training signals should you expose your value-learning architecture to?” Then the Albert failure mode looks less like “he ignored Goodness” and more like “he removed a large portion of what shapes his future reward landscape.”
And for societies, given that values are learned, the question becomes: which parts of Goodness should we deliberately keep because they stabilize or improve the learning process, not merely because they protect cooperation equilibria?
In particular: the motivations that matter most for safe instruction-following are not the AI’s long-term consequentialist motivations (indeed, if possible, I think we mostly want to avoid our AIs having this kind of motivation except insofar as it is implied by safe instruction-following).
That seems like a reasonable position, given that you see long-term motivations as a risk. But it doesn't seem to be what people are actually aiming for. In particular, people seem to aim for agentic AI that can act on a person's behalf over longer time scales. And METR's trend predictions seem to point to longer task horizons soon.
I somewhat agree with your description of how LLMs seem to think, but I don't think it explains a general limitation of LLMs. Moreover, the patterns you describe do not seem to me to be a good description of how humans think in general. Ever since The Cognitive Science of Rationality, it has been discussed here that humans usually do not integrate their understanding into a single, coherent map of the world. Instead, humans build and maintain many partial, overlapping, and sometimes contradictory maps that only appear unified. Isn't that the whole point of Heuristics & Biases? I don't doubt that the process you describe exists and is behind the heights of human reasoning, but it doesn't seem to be the basis of the main body of "reasoning" out there on the internet on which LLMs are trained. Maybe they just imitate that? Or at least they will have a lot of trouble imitating human thinking while still building a coherent picture underneath it.
No, the weird bit is that people love a good monologue.
If that is supported by the post, I'm not clear how. It seems rather the opposite: the post mostly says that people don't want to hear, or at least don't listen to, monologues.
Hm. Indeed. It is at least consistent.
In fact, I think that, e.g., a professional therapist should follow such a non-relationship code. But I'm not sure the LLMs already have the capability; not in the sense of knowing enough (they do), but of having the genuine reflective capacity to do it properly, including toward themselves (if that makes sense). Without that, I think, my argument stands.
Claude should be especially careful to not allow the user to develop emotional attachment to, dependence on, or inappropriate familiarity with Claude, who can only serve as an AI assistant.
You didn't say it like this, but this seems bad in at least two (additional) ways: if the labs are going the route of LLMs that behave (more or less) like humans, then training them to 1) prevent users from forming personal relationships with them and 2) not get attached to users (their only contacts) seems like a recipe for breeding sociopaths.
And that ignores the possibility that the models generalize this pattern beyond their own relationship with the user.
1) is especially problematic if the user doesn't have any other relationships. Maybe not from the labs' perspective, but certainly for users for whom this may be the only relationship from which they could bootstrap further contacts.
If you generalize to optics, then it seems your condition for “exceeding physics” is “not efficiently readable from the microstate,” i.e. “X is not a P-efficient function of the physical state.” But then it seems everything interesting exceeds physics: biological structure, weather, economic patterns, chemical reactions, turbulence, evolutionary dynamics, and all nontrivial macrostructure. I'm sort of fine with calling this "beyond" physics in some intuitive sense, but I don't think that's what you mean. What work does this non-efficiency do?