LESSWRONG
Vladimir_Nesov

Comments

Sorted by Newest

Condensation
Vladimir_Nesov · 22h · 20

It's an argument from cosmic normality, about the scope of applicability of such methods. As with medicine or biology, their relevance is a temporary accident of the current phase of the human condition. I'm not sure how natural the impression is that the applicability of physics, statistics, information theory, or machine learning is being overclaimed; perhaps this is already quite clear.

The point is that these things are not at all obviously relevant to the nature of agency, or of values. You could in principle have frail human biological bodies within a simulated world and practice medicine on them, but that's hardly a central thing that happens in a post-computronium world.

Condensation
Vladimir_Nesov · 2d · 6-2

At some point essentially everything of value is going to be abstract computation, and anything outside of that, the physical substrate, will be either noise or fully understood. It's not clear how that abstract computation should be structured, but the experience from engineering physical systems or communication technologies, the classical ways of understanding the physical world, isn't obviously that relevant for agent foundations.

13 Arguments About a Transition to Neuralese AIs
Vladimir_Nesov · 2d · 20

Even with text-only reasoning traces, there is inscrutable thinking in the activations that can span the whole reasoning trace (with global attention); it just has to remain shallow, since its depth is given only by the number of layers and can be extended only by looping around through the text-only reasoning trace. There will be some steganography (RL makes sure of this), even if it's not substantial, which increases the effective depth of inscrutable thinking beyond a single pass through the layers. But it stops with the reasoning trace, so this affordance is still bounded.

Continual learning greatly extends the thinking horizon, likely using inscrutable weight updates, thus in particular extending the feasible depth of inscrutable thinking that can avoid showing up in the text-only reasoning traces.
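
A back-of-the-envelope sketch of the bound being described (a toy calculation; the layer and token counts are made-up illustrative assumptions, not claims about any particular model):

```python
# Toy arithmetic for the depth picture above (illustrative numbers only).

layers_per_pass = 80        # hypothetical transformer depth
reasoning_tokens = 10_000   # hypothetical length of the text reasoning trace

# Inscrutable serial depth within a single forward pass is capped by the layers
# (steganography can push it somewhat higher, but not past the trace itself).
inscrutable_depth = layers_per_pass

# Total serial depth can be extended by looping through the text trace,
# but every such loop passes through legible tokens.
total_depth_with_text_loop = layers_per_pass * reasoning_tokens

print(inscrutable_depth)            # 80
print(total_depth_with_text_loop)   # 800000, mediated by readable text
```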

13 Arguments About a Transition to Neuralese AIs
Vladimir_Nesov · 2d · 40

Continual learning destroys the hope that we get to early AGIs while avoiding the issues with neuralese, because it almost certainly requires inscrutable communication that maintains thinking over a long horizon (likely in the form of some kind of test-time weight updates), and it's an essential capability. With neuralese itself, there is a significant possibility that it doesn't get much better by then, but continual learning does need to happen first (as a capability rather than as a particular algorithmic breakthrough), and it has the same issues more inherently.
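
A minimal sketch of the kind of mechanism gestured at here, assuming a hypothetical Hebbian-style fast-weight rule purely for illustration (the comment doesn't specify any particular algorithm):

```python
import numpy as np

# Toy "test-time weight update": information from each step is written into a
# weight matrix, so it persists across an arbitrarily long horizon without
# ever appearing in a textual trace.

dim = 64
rng = np.random.default_rng(0)
W = np.zeros((dim, dim))   # fast weights, updated during inference
lr = 0.1

def step(x):
    """Read what earlier steps left in the weights, then write the current state in."""
    global W
    read = W @ x                    # influence of arbitrarily old steps
    W = W + lr * np.outer(x, x)     # the update never shows up in any text trace
    return read

for _ in range(1_000):              # long horizon: state lives in W, not in text
    step(rng.standard_normal(dim))
```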

Myopia Mythology
Vladimir_Nesov · 3d · Ω6130

A strange attitude towards the physical world can be reframed as caring only about some abstract world that happens to resemble the physical world in some ways. A chess AI could be said to be acting on some specific physical chessboard within the real world while carefully avoiding all concern about everything else, but it's more naturally described as acting on just the abstract chessboard and nothing else. I think values/preferences (for some arbitrary agent) should not just be about probutility over the physical world, but should also specify which world they are talking about, so that different agents disagree normatively not just about the relative value of events, but about which worlds are worth caring about (not just possible worlds within some space of nearby possible worlds, but fundamentally very different abstract worlds), and therefore about what kinds of events (from which sample spaces) ought to serve as semantics for possible actions, before their value can even be considered.

A world model (such as an LLM with frozen weights) is already an abstraction: its data is not the same as the physical world itself, but it's coordinated with the physical world to some extent, similarly to how an abstract chessboard is coordinated with a specific physical chessboard in the real world (learning is coordination, adjusting the model so that the model and the world have more shared explanations for their details). Thus acting within an abstract world given by a world model (as opposed to within the physical world itself) might be a useful framing for systematically ignoring some aspects of the physical world, and world models could be intentionally crafted to emphasize particular aspects.
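
A toy sketch of "acting within an abstract world given by a world model", with all names invented for the illustration: the utility function takes only the abstract state, so physical details outside the model never enter the evaluation.

```python
# Toy sketch: the agent's preferences are defined over states of an abstract
# model (here a chessboard reduced to material counts), not over the physical
# world; what the model doesn't represent isn't in the agent's sample space.

from dataclasses import dataclass

@dataclass(frozen=True)
class AbstractBoard:
    """Abstract chessboard state, reduced to material counts for illustration."""
    my_material: int
    their_material: int

def utility(state: AbstractBoard) -> int:
    # Defined purely over the abstract world; no term for anything physical.
    return state.my_material - state.their_material

def choose(actions: dict) -> str:
    # Actions get their semantics from the abstract sample space they act on.
    return max(actions, key=lambda a: utility(actions[a]))

moves = {
    "trade queens": AbstractBoard(my_material=30, their_material=30),
    "win a pawn":   AbstractBoard(my_material=39, their_material=38),
}
print(choose(moves))   # "win a pawn"
```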

Comparing Payor & Löb
Vladimir_Nesov · 3d · Ω16259

I would term □x→x "hope for x" rather than "reliability", because it's about willingness to enact x in response to belief in x, and if x is no good, you shouldn't do that. Indeed, for bad x, having the property □x→x is harmful fatalism, following along with destiny rather than choosing it. In those cases you might want □x→¬x or something, though that only prevents x from being believed, so that you won't need to face □x in actuality; it doesn't prevent the actual x. So □x→x reflects a value judgement about x expressed in the agent's policy, something downstream of endorsement of x, a law of how the content of the world behaves according to an embedded agent's will.

Payor's Lemma then talks about belief in hope, □(□x→x), that is, hope itself is exogenous and needs to be judged (endorsed or not). This is reasonable for games, since what the coalition might hope for is not anyone's individual choice: the details of this hope couldn't have been hardcoded in any agent a priori and need to be negotiated during the decision that forms the coalition. A functional coalition should be willing to act on its own hope (which is again something we need to check for a new coalition, even if it might've already been the case for a singular agent), that is, we need to check that □(□x→x) is sufficient to motivate the coalition to actually enact x. This is again a value judgement about whether this coalition's tentative aspirations, being a vehicle for the hope that x, are actually endorsed by it.

Thus I'd term □(□x→x) "coordination" rather than "trust": the fact that this particular coalition would tentatively intend to coordinate on a hope for x. The hope □x→x is a value judgement about x, and in this case it's the coalition's hope rather than any one agent's hope, and the coalition is a temporary, nascent agency that doesn't necessarily know what it wants yet. The coalition asks: "If we find ourselves hoping for x together, will we act on it?" So we start with coordination about hope, seeing if this particular hope wants to settle as the coalition's actual values, and judging whether it should by enacting x if at least coordination on this particular hope is reached, which should happen only if x is a good thing.

(One intuition pump, with some limitations outside the provability formalism, is treating □x as "probably x", perhaps according to what some prediction market tells you. If "probably x" is enough to prompt you to enact x, that's some kind of endorsement, and it's a push towards increasing the equilibrium-on-reflection probability of x, pushing "probably x" closer to reality. But if x is terrible, then enacting it in response to its high probability is following along with self-fulfilling doom, rather than doing what you can to push the equilibrium away from it.)
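
The prediction-market intuition can be made concrete with a small fixed-point simulation (the update rule and constants below are arbitrary assumptions for illustration):

```python
# Toy self-fulfilling equilibrium: a policy of enacting x whenever the market
# says "probably x" pushes the equilibrium probability of x toward 1, good or
# bad, which is why □x→x amounts to an endorsement of x.

def equilibrium(enact_if_probable: bool, steps: int = 200) -> float:
    p = 0.5                                # market's initial credence in x
    for _ in range(steps):
        acts = enact_if_probable and p >= 0.5
        outcome = 1.0 if acts else 0.0     # x happens iff the agent enacts it
        p = 0.9 * p + 0.1 * outcome        # market slowly tracks behaviour
    return p

print(equilibrium(True))    # ~1.0: "probably x" becomes self-fulfilling
print(equilibrium(False))   # ~0.0: refusing to act on the belief blocks x
```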

Löb's Theorem then says that if we merely endorse a belief by enacting the believed outcome, this is sufficient for the outcome to actually happen, a priori and without that belief yet being in evidence. And Payor's Lemma says that if we merely endorse a coalition's coordinated hope by enacting the hoped-for outcome, this is sufficient for the outcome to actually happen, a priori and without the coordination around that hope yet being in evidence. The use of Löb's Theorem or Payor's Lemma is that the condition (belief in x, or coordination around hope for x) should help in making the endorsement, that is, it should be easier to decide to enact x if you already believe that x, or if you already believe that your coalition is hoping for x. For coordination, this is important because every agent can only unilaterally enact its own part in the joint policy, so it does need some kind of premise about the coalition's nature (in this case, about the coalition's tentative hope for what it aims to achieve) in order to endorse playing its part in the coalition's joint policy. It's easier to decide to sign an assurance contract than to unconditionally donate to a project, and the role of Payor's Lemma is to say that if everyone does sign the assurance contract, then the project will in fact get funded sufficiently.
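
For reference, the standard formal statements being compared, with □ the provability modality (these match the usual presentations of Löb's Theorem and Payor's Lemma):

```latex
% Löb's Theorem: endorsing the belief (enacting x given □x) suffices for x.
\text{If } \vdash \Box x \to x, \text{ then } \vdash x.

% Payor's Lemma: endorsing the coordinated hope (enacting x given □(□x→x)) suffices for x.
\text{If } \vdash \Box(\Box x \to x) \to x, \text{ then } \vdash x.
```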

LWLW's Shortform
Vladimir_Nesov · 4d · 70

Power-centralisation in a post-AGI world is not about wielding humans, unlike in a pre-AGI world. Power is no longer power over humans doing your bidding, because humans doing your bidding won't give you power. By orthogonality, any terrible thing can in principle be someone's explicit intended target (an aspiration, not just a habit shaped by circumstance), but that's rare. Usually the terrible things are (a side effect of) an instrumentally useful course of action that has other intended goals, even where in the final analysis the justification doesn't quite work.

LWLW's Shortform
Vladimir_Nesov · 4d · 110

Most s-risk scenarios vaguely analogous to historical situations don't happen in a post-AGI world, because there humans aren't useful for anything, either economically or in terms of maintaining power (unlike how they were throughout human history). It's not useful for the entities in power to do any of the things with traditionally terrible side effects.

Absence of feedback loops for treating people well (at the level of humanity as a whole) is its own problem, but it's a distinct kind of problem. It doesn't necessarily settle poorly (at the level of individuals and smaller communities) in a world with radical abundance, if indeed even a tiny fraction of the global resources gets allocated to the future of humanity, which is the hard part to ensure.

13 Arguments About a Transition to Neuralese AIs
Vladimir_Nesov · 4d · 60

Even Anthropic is building frontier AIs (since Opus 3 or Sonnet 3.5; this was a bit of a surprise to some at the time). Thus if the hypothetical breakthrough of meaningfully better capabilities from neuralese happens (in whatever form), all AI companies will start making use of it, as soon as the immediate behavioral downsides get mitigated to the usual level. Any other kind of downside won't be a reason not to go there with frontier AIs.

Also, continual learning is analogous to neuralese, an inscrutable way of preserving/propagating information across long chains of reasoning, different from text-only notes. In both cases, you can build a textual "transcript" of a process of thinking, but it's not necessarily faithful, and doesn't screen off earlier thinking from later thinking.

A 2032 Takeoff Story
Vladimir_Nesov · 5d · 40

Trillions in revenue is more plausible if continual learning is somewhat working soon and gradually getting better, rather than mostly absent until it's unlocked in 2033. With gradual improvement in continual learning, AGI timelines might come to look more like gradual disempowerment timelines, with no clear thresholds until it has suddenly already been too late for a while.

Posts

103 · Musings on Reported Cost of Compute (Oct 2025) · 18d · 11
80 · Permanent Disempowerment is the Baseline · 3mo · 23
50 · Low P(x-risk) as the Bailey for Low P(doom) · 3mo · 29
66 · Musings on AI Companies of 2025-2026 (Jun 2025) · 5mo · 4
34 · Levels of Doom: Eutopia, Disempowerment, Extinction · 5mo · 1
196 · Slowdown After 2028: Compute, RLVR Uncertainty, MoE Data Wall · 6mo · 25
176 · Short Timelines Don't Devalue Long Horizon Research · Ω · 7mo · 24
19 · Technical Claims · 7mo · 0
151 · What o3 Becomes by 2028 · 11mo · 15
41 · Musings on Text Data Wall (Oct 2024) · 1y · 2
10 · Vladimir_Nesov's Shortform · Ω · 1y · 142

Wikitag Contributions

Well-being · 2 months ago · (+58/-116)
Sycophancy · 2 months ago · (-231)
Quantilization · 2 years ago · (+13/-12)
Bayesianism · 3 years ago · (+1/-2)
Bayesianism · 3 years ago · (+7/-9)
Embedded Agency · 3 years ago · (-630)
Conservation of Expected Evidence · 4 years ago · (+21/-31)
Conservation of Expected Evidence · 4 years ago · (+47/-47)
Ivermectin (drug) · 4 years ago · (+5/-4)
Correspondence Bias · 4 years ago · (+35/-36)