Gunnar_Zarncke — LessWrong

LESSWRONG
LW

Homomorphically encrypted consciousness and its implications

If you generalize to optics, then it seems your condition for “exceeding physics” is “not efficiently readable from the microstate,” i.e.X is not a P-efficient function of the physical state.”But then it seems everything interesting exceeds physics: biological structure, weather, economic patterns, chemical reactions, turbulence, evolutionary dynamics, and all nontrivial macrostructure. I'm sort of fine with calling this "beyond" physics in some intuitive sense, but I don't think that's what you mean. What work does this non-efficiency do?

Homomorphically encrypted consciousness and its implications

Gunnar_Zarncke2d20

I'm worried we talk past each other.

You’re saying:

efficient reconstructibility is unclear in the rainbow case,
but whatever the right story is, it must handle both rainbow-like cases and the engineered homomorphic encryption case,
and if some of those cases force non-efficient supervenience, then we face your trilemma.

That part I agree with.

The point I’ve been trying to get at is: Once the same issue arises for ordinary optical appearances, we’ve left behind the special stakes of step 10! Because in the rainbow case, we all seem to accept (but maybe you disagree):

appearance is fully determined by physics,
but the mapping from microphysics to appearance may be extremely messy or intractable for an external observer,
and we don’t treat that as evidence that the visual appearance “exceeds physics.”

Or, if rainbow-style cases also fall under the trilemma, then the conclusion can’t be “mind exceeds physics." It would have to be the stronger and more surprising “appearances as such exceed physics” or “macrostructure in general exceeds physics.” That’s quite different from your original framing, which presents the homomorphic encryption case as demonstrating a distinctive epistemic excess of mind relative to physics.

Homomorphically encrypted consciousness and its implications

Gunnar_Zarncke2d20

It seems you are biting the bullet and agreeing that the rainbow also has the problem of how a mind can be aware of it when it isn't (efficiently) reconstructable. But then this seems to generalize to a lot, if not all, phenomena a mind can perceive. Doesn't this reduce that conception of a mind ad absurdum?

Homomorphically encrypted consciousness and its implications

Gunnar_Zarncke3d90

I think step 10 overstates what is shown. You write:

“If a homomorphically encrypted mind (with no decryption key) is conscious … it seems it knows things … that cannot be efficiently determined from physics.”

The move from “not P-efficiently determined from physics” to “mind exceeds physics (epistemically)” looks too strong. The same inferential template would force us into contradictions in ordinary physical cases where appearances are available to an observer but not efficiently reconstructible from the microphysical state.

Take a rainbow. Let p be the full microphysical state of the atmosphere and EM field, and let a be the appearance of the rainbow to an observer. The observer trivially “knows” a. Yet from p, even a quantum-bounded “Laplace’s demon” cannot, in general, P-efficiently compute the precise phenomenal structure of that appearance. The appearance does not therefore “exceed physics.”

If we accepted your step 10’s principle "facts accessible to a system but P-intractable to compute from p outrun physics" we would have to say the same about rainbows:

the rainbow’s appearance to an observer “knows something” physics can’t efficiently determine.

That is an implausible conclusion. The physical state fully fixes the appearance; what fails is only efficient external reconstruction, not physical determination.

Homomorphic encryption sharpens the asymmetry between internal access and external decipherability, but it does not introduce a new ontological gap.

So I agree with the earlier steps (digital consciousness, key distance irrelevance) but think the “mind exceeds physics (epistemically)” inference is a category error: it treats P-efficient reconstructability as a criterion for physical determination. If we reject that criterion in the rainbow case, we should reject it in the homomorphic case too.

Human Values ≠ Goodness

Gunnar_Zarncke9d40

I like the sharp distinction you draw between

“Our Values are (roughly) the yumminess or yearning…”

and

“Goodness is (roughly) whatever stuff the memes say one should value.”

but the post treats these as more separable than they actually are from the standpoint of how the brain acquires preferences.

You emphasize that

“we mostly don’t get to choose what triggers yumminess/yearning”

and that Goodness trying to overwrite that is “silly.” Yet a few paragraphs later you note that

“a nontrivial chunk of the memetic egregore Goodness needs to be complied with…”

before recommending to “jettison the memetic egregore” once the safety-function parts are removed.

But the brain’s value-learning machinery doesn’t respect this separation. “Yumminess/yearning” is not fixed hardware; it’s a constantly updated reward model trained by social feedback, imitation, and narrative framing. The very things you group under “Goodness” supply the majority of training data for what later becomes “actual Values.” The egregore is not only a coordination layer or a memetically selected structure on top, it is also the training signal.

Your own example shows this coupling. You say that

“Loving Connection… is a REALLY big chunk of their Values”

while also being a core part of Goodness. This dual function of a learned reward target and the memetic structure that teaches people to want it, is typical rather than exceptional.

So the key point isn't “should you follow Goodness or your Values?” but “which training signals should you expose your value-learning architecture to?” Then the Albert failure mode looks less like “he ignored Goodness” and more like “he removed a large portion of what shapes his future reward landscape.”

And for societies, given that values are learned, the question becomes which parts of Goodness should we deliberately keep because they stabilize or improve the learning process, not merely because they protect cooperation equilibria?

How human-like do safe AI motivations need to be?

Gunnar_Zarncke9d3-1

In particular: the motivations that matter most for safe instruction-following are not the AI’s long-term consequentialist motivations (indeed, if possible, I think we mostly want to avoid our AIs having this kind of motivation except insofar as it is implied by safe instruction-following).

That seems like a reasonable position given that you accept the risk of long-term motivations. But it doesn't seem to be what people are actually aiming for. In particular, people seem to aim for agentic AI that can act on a person's behalf on longer time scales. And the trend predictions by METR seem to point to longer horizons soon.

Insofar As I Think LLMs "Don't Really Understand Things", What Do I Mean By That?

Gunnar_Zarncke11d40

I somewhat agree with your description of how LLMs seem to think, but I don't think it is an explanation of a general limitation of LLMs. But the patterns you describe do not seem to me to be a good explanation for how humans think in general. Ever since The Cognitive Science of Rationality has it been discussed here that humans usually do not integrate their understanding into a single, coherent map of the world. Humans instead build and maintain many partial, overlapping, and sometimes contradictory maps that only appear unified. Isn't that the whole point of Heuristics & Biases? I don't doubt that the process you describe exists or is behind the heights of human reasoning, but it doesn't seem to be the basis of the main body of "reasoning" out there on the internet on which LLMs are trained. Maybe they just imitate that? Or at least they will have a lot of trouble imitating human thinking while still building a coherent picture underneath that.

Supervillain Monologues Are Unrealistic

Gunnar_Zarncke20d41

No, the weird bit is that people love a good monologue.

If that is supported by the post, I'm not clear how. It seems rather the opposite: The post mostly say how people don't want to hear or at least don't listen to monologues.

AI Craziness Mitigation Efforts

Gunnar_Zarncke24d20

Hm. Indeed. It is at least consistent.

In fact, I think that, eg a professional therapist should follow such a non-relationship code. But I'm not sure the LLMs already have the capability; not that they know enough, they do, but that they have the genuine reflective capacity to do it properly. Including for themselves (if that makes sense). But without that, I think, my argument stands.

AI Craziness Mitigation Efforts

Gunnar_Zarncke24d63

Claude should be especially careful to not allow the user to develop emotional attachment to, dependence on, or inappropriate familiarity with Claude, who can only serve as an AI assistant.

You didn't say it like this, but this seems bad in at least two (additional) ways: If the labs are going the route of LLMs that behave like humans (more or less), then training them to 1) prevent users from personal relationships and 2) not getting attached to users (their only contacts), seems like a recipe to breed sociopaths.

And that is ignoring the possible case that this might be generalized by the models beyond themselves and the user.

1) is especially problematic if the user doesn't have any other relationships. Not from the perspective of the labs maybe, but for sure for the users for whom that may be the only relation from which they could bootstrap more contacts.

LESSWRONG
LW

LESSWRONG
LW

Posts

Wikitag Contributions

Comments