Software engineering, parenting, cognition, meditation, other
LinkedIn, Facebook, Admonymous (anonymous feedback)
Yes, the description length of each dimension can still be high, but not arbitrarily high.
Steven Byrnes talks about thousands of lines of pseudocode in the "steering system" in the brainstem.
Does the above derivation mean that values are anthropocentric? Maybe, kind of. I'm deriving only an architectural claim: bounded evolved agents compress their control objectives through a low-bandwidth interface. Humans are one instance. AIs are different. They are designed, and whatever evolutionary pressure acts on their architecture does not act on anything value-like. If an AI has no such bottleneck, inferring and stabilizing its ‘values’ may be strictly harder. If it has one, the outcome depends on the bottleneck's structure. Alignment might generalize, but not necessarily to human-compatible values.
Existing consciousness theories do not make predictions.
Huh? There are many predictions. The obvious ones:
Empirically measurable effects of at least some aspects of consciousness are totally routine in anesthesia; otherwise, how could you be confident that the patient is unconscious during a procedure? The Appendix of my Metacognition post lists quite a few measurable effects.
I think the problem is again that people can't agree on what they mean by consciousness. I'm sure there is a reading where there are no predictions. But any theory that models it as a Physical Process necessarily makes predictions.
Thanks for writing this! I finally got around to reading it, and I think it is a great reverse-engineering of these human felt motivations. I'm buying much of it, but I have been thinking about aggregation cases and counterexamples, and would like to hear your take on them.
A friend wins an award; I like them, but I feel a stab of envy (sometimes I may even wish they’d fail). That is negative valence without an “enemy” label, and not obviously about their attention to me. For example:
when another outperforms the self on a task high in relevance to the self, the closer the other the greater the threat to self-evaluation. -- Some affective consequences of social comparison and reflection processes: the pain and pleasure of being close; Tesser et al., 1988
Is the idea that the "friend/enemy" variable is actually more like "net expected effect on my status," so a friend’s upward move can locally flip them into a threat?
I can dislike a competitor and still feel genuine admiration for their competence or courage. If "enemy" is on, why doesn’t it reliably route through provocation or schadenfreude? Do you think admiration is just a different reward stream, or does it arise when the "enemy" tag is domain-specific?
E.g., an opposing soldier or a political adversary is injured and I feel real compassion, even if I still endorse opposing them.
Two-thirds of respondents (65 per cent) say they would save the life of a surrendering enemy combatant who had killed a person close to them, but almost one in three (31 per cent) say they would not. The same holds true when respondents are asked if they would help a wounded enemy combatant who had killed someone close to them (63 per cent compared with 33 per cent). -- PEOPLE ON WAR Country report Afghanistan
This feels like “enemy × their distress” producing sympathy rather than schadenfreude. Is your take that “enemy” isn’t a stable binary at all, and that vivid pain cues can transiently force a “person-in-pain” interpretation that overrides coalition tagging?
Someone helps me. I feel gratitude and an urge to reciprocate. It doesn’t feel like "approval reward" (I’m not enjoying being regarded highly). It feels more like a debt.
Perceived benevolent helper intentions were associated with higher gratitude from beneficiaries compared to selfish ones, yet had no associations with indebtedness. -- Revisiting the effects of helper intentions on gratitude and indebtedness: Replication and extensions Registered Report of Tsang (2006)
Do you see gratitude as downstream of the same "they’re thinking about me" channel, or as a separate ledger?
People often report guilt as a direct response to "I did wrong," even when they’re confident nobody will know.
when opportunities for compensation are not present, guilt may evoke self-punishment. -- When guilt evokes self-punishment: evidence for the existence of a Dobby Effect
I'm not sure that fits guilt from "imagined others thinking about me." It looks like a norm-violation penalty that doesn’t need the “about-me attention” channel. Do you have a view on which way it goes?
I have been wondering whether the suggested processing matches what we would expect for larger groups of people (each of whom could be friend or enemy, and thinking of me or not). There seem to be at least two different processes going on:
Compassion doesn’t scale with the number of people attended to. This seems well established for the Identifiable Victim effect and psychic Numbing. When harm is spread over many victims, affect often collapses into numbness unless one person becomes vivid. That matches your attentional bottleneck.
But evaluation does seem to scale with headcount, at least in stage fright and other audience effects.
Maybe a roomful of people can feel strongly like “they’re thinking about me,” even if you’re not tracking anyone individually? But then the “about-me attention” variable would be computed at the group level, which complicates your analysis.
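To make the contrast concrete, here is a toy sketch, not a model from the post: compassion is computed only over the few individuals that fit through an attentional bottleneck, while the "they’re thinking about me" signal is computed at the group level from a headcount. The `Person` fields, the bottleneck size `ATTENTION_SLOTS = 1`, and the `log1p` aggregation are all hypothetical choices made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Person:
    distress: float         # displayed pain or need, 0..1 (hypothetical scale)
    vividness: float        # how salient this person is to the observer, 0..1
    attending_to_me: bool   # is this person thinking about me?


ATTENTION_SLOTS = 1  # hypothetical bottleneck: individuals modeled in detail


def compassion(people: list[Person]) -> float:
    """Only the most vivid individuals pass the bottleneck, so compassion
    does not grow with the number of victims (identifiable-victim pattern)."""
    attended = sorted(people, key=lambda p: p.vividness, reverse=True)
    return sum(p.distress * p.vividness for p in attended[:ATTENTION_SLOTS])


def evaluation_pressure(people: list[Person]) -> float:
    """Group-level 'about-me attention': counts watchers without modeling
    any of them individually, so it grows with headcount (audience effects)."""
    watchers = sum(1 for p in people if p.attending_to_me)
    return math.log1p(watchers)  # arbitrary concave choice
```

In this sketch, one vivid victim and a hundred equally vivid victims yield the same compassion, while a larger audience always yields more evaluation pressure. The point of the design is only that the second quantity is a function of a count rather than of individually tracked minds, which is the complication for the analysis noted above.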
What do you think about my arguments in Thou art rainbow: Consciousness as a Self-Referential Physical Process?
I have read almost all of this dialog, and my half-serious upshot is:
An agent A can't prove that another agent B is correct in both its reasoning and its semantics, but that doesn't matter, because it can't trust its own reasoning to that degree either.
This glosses over a lot of details in the long and charitable comment thread above. I tried to get an overview of it with ChatGPT. I'm surprised how well that worked:
ChatGPT 5.2 extended thinking summary of the misunderstanding:
Let:
A natural “soundness schema relative to S” is:
$$\mathrm{Sound}(L,S) := \forall \varphi\, \big(\Box_L \varphi \rightarrow \mathrm{True}_S(\varphi)\big).$$
The Löbian obstacle setup (as Morgan summarizes it) is that a designer agent A wants to rely on proofs produced by a subordinate B, and this seems to demand something like a schema $\Box_L \varphi \rightarrow \varphi$ (or its intended-world analogue) for arbitrary φ, which is blocked by Löb-ish reasoning.
So far: aligned.
Demski treats “escape the obstacle” as: produce an agent design whose decision procedure can rationally delegate mission-critical tasks under something like the formal constraints studied in the Löbstacle literature.
A clean formalization of what he keeps asking for is:
Hence his repeated demand: “what modification of the agent designs do you have in mind?” and later: “I still don’t understand how you propose for the agent to reason/behave differently.”
Morgan’s post primarily diagnoses the source of the apparent paradox as a category error: treating a semantic claim (“my proofs track the world”) as if it were a syntactic theorem inside L.
His move is to relocate Sound(L,S) out of L and into a separate layer of “metalogical belief” that lives at the interface between proofs and world-interpretation. He says the agent need not (and cannot) prove the semantic mapping is sound; rather, the agent assumes/believes it when applying proofs to the world.
This is why he says “the agent doesn’t change its reasoning” (meaning: the proof calculus inside L is unchanged).
Demski hears:
“Allowing for metalogical beliefs immediately dissolves the obstacle.”
and translates it into the most standard formal move:
Then he points out the familiar reflection hierarchy problem: L′ can at best endorse L, but not itself; so you get “trust weaker but not equal-strength” all over again. He explicitly frames this as “you just get a new system L′… which doesn’t believe its own soundness.” and distinguishes “L is sound” from “L plus me is sound.”
Morgan, meanwhile, is not proposing “add Sound(L,S) as an axiom used inside the proof system that generates actions.” He keeps insisting the semantic map is “defined on L, not on L′,” and that for the delegation story you can restrict attention to accepting L-proofs.
So the mismatch is:
Formally, Morgan is closer to specifying a two-layer agent:
Demski’s objection is then immediate: if Acc is where the action happens, you have changed the decision architecture; and unless Acc is itself formally grounded, you haven’t solved the original agent-design problem, only renamed it. This is exactly his “back pocket” complaint: “fine, I grant the agent ‘has’ the semantic map; how does it use it?”
They overload “reasoning” differently.
This is why Demski keeps hearing a contradiction in Morgan’s stance:
“No, the agent doesn’t change its reasoning… The change is that we don’t require A to prove that their semantic mapping is sound…”
Under Demski’s formal lens, “not requiring proof” is already a behavioral modification, because the original Löbstacle agent was (by assumption) constrained to act only when it could justify that constraint internally.
Morgan tries to narrow “trust” to “trusting L-proofs”:
Demski pushes back that the Löbstacle was about mission-critical delegation, not merely proof checking; restricting to a weaker fragment can amount to “make the delegate dumber,” which is the classic non-solution.
Formally, Demski is insisting on a property like:
A delegates ⟹ ∃ nontrivial tasks where B’s full cognition matters, not just L-proof output.
Morgan is (at least in that subthread) defending a weaker property:
A trusts B ⟹ A accepts any syntactically valid L-proof produced by B as action-relevant.
Those are different “trust” relations, and each is natural given a different target.
Morgan phrases the bridge as normative:
if φ is provable in L, the interpretation of φ “ought to be true”.
Demski hears “ought” as moral/normative in the usual sense and suggests rewriting it as a prediction (“will probably be true”).
But structurally, this is the same dispute:
If you compress all of the above into two “theorems” they are implicitly trying to establish:
If an agent’s action rule is allowed to use an extra-logical acceptance bridge Acc that is not required to be provable inside L, then Löb’s prohibition “L can’t prove its own soundness schema” does not block the agent from in fact relying on L-proofs in the world (conditional on S being sound).
This is basically: “Löb blocks internal proof of Sound(L,S), not the practical stance of treating proofs as reliable.”
If we keep the Löbstacle-style constraint that delegation must be justified within the agent’s own formal apparatus (or equivalently, we forbid a free external Acc oracle), then Morgan’s move either (i) reappears as L′ and triggers the reflection hierarchy, or (ii) reduces to trusting only weaker fragments, which is not the desired kind of delegation.
So: Morgan is dissolving a philosophical inference (“therefore no agent can trust its reasoning”), while Demski is defending a formal obstacle statement about a particular class of reflective agent designs.
Both can be right simultaneously, because the quantifiers differ.
It gets the core point:
the Löbstacle was about mission-critical delegation, not merely proof checking; restricting to a weaker fragment can amount to “make the delegate dumber,” which is the classic non-solution.
It is a non-solution in the sense that it doesn't let the subagent B run free on the grounds that it can be fully trusted. But Morgan’s move does seem to enable a safe kind of delegation. So, practically, the different approaches come down to:
Clearly, 1 is weaker than 2. But we can't get 2 anyway, so getting 1 seems like a win.
And maybe we can extend 1 into a full agent by wrapping B in a verifier. That construction would also nest for repeated delegation.
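A minimal sketch of "wrapping B in a verifier", assuming the weaker trust relation from the summary (A accepts any syntactically valid L-proof produced by B). A toy propositional system with given axioms and modus ponens stands in for L; `check_proof`, `with_verifier`, and the formula encoding are hypothetical names for illustration. A relies on the checker without proving its soundness inside L, which is Morgan's extra-logical acceptance bridge in miniature.

```python
from typing import Callable, Optional, Union

Formula = Union[str, tuple]   # atom like "p", or ("->", antecedent, consequent)
Proof = list                  # sequence of formulas; the last one is the conclusion
Delegate = Callable[[Formula], Optional[Proof]]


def check_proof(proof: Proof, axioms: set, goal: Formula) -> bool:
    """Toy stand-in for an L-proof checker: every line must be an axiom or
    follow from two earlier lines by modus ponens, and the proof must end in goal."""
    seen: list = []
    for line in proof:
        by_mp = any(("->", a, line) in seen for a in seen)
        if line not in axioms and not by_mp:
            return False
        seen.append(line)
    return bool(proof) and proof[-1] == goal


def with_verifier(delegate: Delegate, axioms: set) -> Delegate:
    """Wrap an untrusted delegate so that only checked proofs get through.
    The result is itself a Delegate, so it can be wrapped again (nesting)."""
    def verified(goal: Formula) -> Optional[Proof]:
        proof = delegate(goal)
        if proof is not None and check_proof(proof, axioms, goal):
            return proof
        return None  # reject: A does not act on unchecked output
    return verified
```

For example, with axioms `{"p", ("->", "p", "q")}`, a delegate that returns `["p", ("->", "p", "q"), "q"]` for the goal `"q"` passes through any number of verifier layers, while a delegate that simply asserts `["q"]` is rejected. The delegate can be arbitrarily clever internally; only its checked output is trusted, which is exactly why this is weaker than full delegation.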
I'm not sure we can directly apply solid-state physics to NNs, but we may be able to approximate some parts of NNs with a physical model and transfer theorems from there. I'm thinking of Lorenzo Tomaz's work on Momentum Point-Perplexity Mechanics in Large Language Models (disclaimer: I worked with him at AE Studio).
What is the relative cost between Aerolamp and regular air purifiers?
For regular air purifiers, ChatGPT 5.2 estimates 0.2 € per 1000 m³ of filtered air.
From the Aerolamp website:
How many Aerolamps do I need?
Short answer: 1 for a typical room, or about every 250 square feet
Long answer: It's complicated
Unlike technologies like air filters, the efficacy of germicidal UV varies by pathogen. Some pathogens, like human coronaviruses, are very sensitive to far-UVC. Others are more resistant. However, there is significant uncertainty in just how sensitive various pathogens are to UV light.
The key metric to look for in all air disinfection technologies is the Clean Air Delivery Rate (CADR), usually given in cubic feet per minute (cfm). A typical high-quality portable air-cleaner has a CADR of around 400 cfm - a more typical one will deliver 200 cfm.
For a typical 250 square foot room with 9 foot ceilings, Aerolamp has an expected CADR of 200-1500 cfm, depending on the pathogen and the study referenced.
And ChatGPT estimates 0.02 to 0.3 € per 1000 m³ for the Aerolamp, which is quite competitive, especially given that it is quieter.
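Because the Aerolamp figures are given as CADR in cfm while the cost estimates are per 1000 m³, here is a small sketch of the arithmetic that connects them: unit conversion, equivalent air changes per hour for the 250 sq ft, 9 ft room mentioned above, and a cost per filtered volume. The cost inputs (device price, lifetime, power draw, electricity price, consumables) are hypothetical placeholders, not the values behind ChatGPT's estimates.

```python
FT3_TO_M3 = 0.3048 ** 3  # exact, since 1 ft = 0.3048 m


def cfm_to_m3_per_hour(cfm: float) -> float:
    """Convert cubic feet per minute to cubic meters per hour."""
    return cfm * FT3_TO_M3 * 60


def equivalent_ach(cadr_cfm: float, floor_ft2: float, ceiling_ft: float) -> float:
    """Equivalent air changes per hour delivered to a room of the given size."""
    room_ft3 = floor_ft2 * ceiling_ft
    return cadr_cfm * 60 / room_ft3


def cost_per_1000_m3(cadr_cfm: float, device_price_eur: float,
                     lifetime_hours: float, watts: float, eur_per_kwh: float,
                     consumables_eur_per_hour: float = 0.0) -> float:
    """Hourly cost (amortized device + electricity + consumables such as
    filters or replacement lamps) divided by clean air delivered per hour."""
    hourly_eur = (device_price_eur / lifetime_hours
                  + watts / 1000 * eur_per_kwh
                  + consumables_eur_per_hour)
    return hourly_eur / cfm_to_m3_per_hour(cadr_cfm) * 1000
```

With the numbers quoted from the website, `equivalent_ach(200, 250, 9)` is about 5.3 air changes per hour for a typical purifier, and the Aerolamp's 200 to 1500 cfm range corresponds to roughly 5.3 to 40. Since cost per volume is inversely proportional to CADR, the wide pathogen-dependent CADR range translates directly into the wide cost range above.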
Mesa-optimizers are the easiest case of detectable agency in an AI. There are more dangerous cases. One is Distributed agency, where the agent is spread across tooling, models, and maybe humans or other external systems, and the gradient driving the evolution is the combination of the local and overall incentives.
Mesa-optimization was introduced in Risks from Learned Optimization: Introduction, and, probably because it was the first type of learned optimization to be described, it has driven much of the conversation. It makes some implicit assumptions: that learned optimization is compact, in the sense of being a sub-system of the learning system; coherent, due to the simple incentive structure; and a stable pattern that can be inspected in the AI system (hence Mechinterp).
These assumptions do not hold in the more general case, where agency from learned optimization may develop in more complex setups, such as the current generation of LLM agents, which consist of an LLM, scaffolding, and tools, including memory. In such a system, the memetic evolution of patterns in memory or external tools is part of the learning dynamics of the overall system, and we can no longer go by the incentive gradients (benchmark optimization) of the LLM alone. We need to interpret the system as a whole.
Just because the system is designed as an agent doesn't mean that the actual agent coincides with the designed agent. We need tools to deal with these hybrid systems. Methods like Unsupervised Agent Discovery could help pin down the agent in such systems, and Mechinterp has to be extended to span borders between LLMs.