dxu

LESSWRONG
LW

dxu — LessWrong

Replying toA Simple Toy Coherence Theorem

The constant bound isn't not that relevant just because of the in principal unbounded size, it also doesn't constrain the induced probabilities in the second coding scheme much at all. It's an upper bound on the maximum length, so you can still have the weightings in codings scheme B differ differ in relative length by a ton, leading to wildly different priors

Your phrasing here is vague and somewhat convoluted, so I have difficulty telling if what you say is simply misleading, or false. Regardless:

If you have UTM1 and UTM2, there is a constant-length prefix P such that UTM1 with P prepended to some further bitstring as input will compute whatever UTM2 computes... (read 353 more words →)

Replying toA Simple Toy Coherence Theorem

dxu1y*

A Simple Toy Coherence Theorem

All possible encoding schemes / universal priors differ from each other by at most a finite prefix. You might think this doesn't achieve much, since the length of the prefix can be in principle unbounded; but in practice, the length of the prefix (or rather, the prior itself) is constrained by a system's physical implementation. There are some encoding schemes which neither you nor any other physical entity will ever be able to implement, and so for the purposes of description length minimization these are off the table. And of the encoding schemes that remain on the table, virtually all of them will behave identically with respect to the description lengths they assign to "natural" versus "unnatural" optimization criteria.

Replying toWhy Not Subagents?

dxu1y

Why Not Subagents?

It looks to me like the "updatelessness trick" you describe (essentially, behaving as though certain non-local branches of the decision tree are still counterfactually relevant even though they are not — although note that I currently don't see an obvious way to use that to avoid the usual money pump against intransitivity) recovers most of the behavior we'd see under VNM anyway; and so I don't think I understand your confusion re: VNM axioms.

E.g. can you give me a case in which (a) we have an agent that exhibits preferences against whose naive implementation there exists some kind of money pump (not necessarily a repeatable one), (b) the agent can implement the updatelessness trick in order to avoid the money pump without modifying their preferences, and yet (c) the agent is not then representable as having modified their preferences in the relevant way?

Replying toYou can, in fact, bamboozle an unaligned AI into sparing your life

dxu1y*

You can, in fact, bamboozle an unaligned AI into sparing your life

I think I might be missing something, because the argument you attribute to Dávid still looks wrong to me. You say:

The entropy of the simulators’ distribution need not be more than the entropy of the (square of the) wave function in any relevant sense. Despite the fact that subjective entropy may be huge, physical entropy is still low (because the simulations happen on a high-amplitude ridge of the wave function, after all).

Doesn't this argument imply that the supermajority of simulations within the simulators' subjective distribution over universe histories are not instantiated anywhere within the quantum multiverse?

I think it does. And, if you accept this, then (unless for some reason you think the... (read 407 more words →)

Replying toYou can, in fact, bamboozle an unaligned AI into sparing your life

dxu1y*

You can, in fact, bamboozle an unaligned AI into sparing your life

The AI has a similarly hard time to the simulators figuring out what's a plausible configuration to arise from the big bang. Like the simulators have an entropy N distribution of possible AIs, the AI itself also has an entropy N distribution for that. So it's probability that it's in a real Everett branch is not p, but p times 2^-N, as it has only a 2^-N prior probability that the kind of word it observes is the kind of thing that can come up in a real Everett branch. So it's balanced out with the simulation hypothesis, and as long as the simulators are spending more planets, that hypothesis wins.

If I... (read more)

Replying toWe Don't Know Our Own Values, but Reward Bridges The Is-Ought Gap

dxu1y

We Don't Know Our Own Values, but Reward Bridges The Is-Ought Gap

These two kinds of “learning” are not synonymous. Adaptive systems “learn” things, but they don’t necessarily “learn about” things; they don’t necessarily have an internal map of the external territory. (Yes, the active inference folks will bullshit about how any adaptive system must have a map of the territory, but their math does not substantively support that interpretation.) The internal heuristics or behaviors “learned” by an adaptive system are not necessarily “about” any particular external thing, and don’t necessarily represent any particular external thing.

I think I am confused both about whether I think this is true, and about how to interpret in such a way that it might be true. Could you go into more detail on what it means for a learner to learn something without there being some representational semantics that could be used to interpret what it's learned, even if the learner itself doesn't explicitly represent those semantics? Or is the lack of explicit representation actually the core substance of the claim here?

dxu1yQuick Take

It seems the SOTA for training LLMs has (predictably) pivoted away from pure scaling of compute + data, and towards RL-style learning based on (synthetic?) reasoning traces (mainly CoT, in the case of o1). AFAICT, this basically obviates safety arguments that relied on "imitation" as a key source of good behavior, since now additional optimization pressure is being applied towards correct prediction rather than pure imitation.

dxu1y

Strictly speaking, this seems very unlikely, since we know that e.g. CoT increases the expressive power of Transformers.

Ah, yeah, I can see how I might've been unclear there. I was implicitly taking CoT into account when I talked about the "base distribution" of the model's outputs, as it's essentially ubiquitous across these kinds of scaffolding projects. I agree that if you take a non-recurrent model's O(1) output and equip it with a form of recurrent state that you permit to continue for O(n) iterations, that will produce a qualitatively different distribution of outputs than the O(1) distribution.

In that sense, I readily admit CoT into the class of improvements I earlier characterized as... (read 420 more words →)

dxu1y

And I suspect we probably can, given scaffolds like https://sakana.ai/ai-scientist/ and its likely improvements (especially if done carefully, e.g. integrating something like Redwood's control agenda, etc.). I'd be curious where you'd disagree (since I expect you probably would) - e.g. do you expect the AI scientists become x-risky before they're (roughly) human-level at safety research, or they never scale to human-level, etc.?

Jeremy's response looks to me like it mostly addresses the first branch of your disjunction (AI becomes x-risky before reaching human-level capabilities), so let me address the second:

I am unimpressed by the output of the AI scientist. (To be clear, this is not the same thing as being unimpressed by the... (read more)

dxu1y

I'm interested! Also curious as to how this is implemented; are you using retrieval-augmented generation, and if so, with what embeddings?

Epistemic status: exploratory, "shower thought", written as part of a conversation with Claude:

For any given entity (broadly construed here to mean, essentially, any physical system), it is possible to analyze that entity as follows:

Define the set of possible future trajectories that entity might follow, according to some suitably uninformative ignorance prior on its state and (generalized) environment. Then ask, of that set, whether there exists some simple, obvious, or otherwise notable prior on the set in question, that assigns probabilities to various member trajectories in such a way as to establish an upper level set of some kind. Then ask, of that upper level set, how large it is relative to the

dxu

dxu, habryka

The below is me (habryka) and dxu talking about a shortform that dxu had published a few months ago, going into the relationship between goal-oriented cognition and the myopic nature of current large language model training setups. Some key quotes if you don't want to read the whole thing:

I think it's interesting to think about how brains get around the issue [of having limited context]. Obviously the patterns in which biological neurons fire don't carve cleanly into "forward passes" the way ANNs do, but ultimately there's only so much computation that can occur in the brain within some (small) unit of time. In that sense, I think I want to claim that

... (read 7602 more words →)

Something I wrote recently as part of a private conversation, which feels relevant enough to ongoing discussions to be worth posting publicly:

The way I think about it is something like: a "goal representation" is basically what you get when it's easier to state some compact specification on the outcome state, than it is to state an equivalent set of constraints on the intervening trajectories to that state.
In principle, this doesn't have to equate to "goals" in the intuitive, pretheoretic sense, but in practice my sense is that this happens largely when (and because) permitting longer horizons (in the sense of increasing the length of the minimal sequence needed to reach some terminal

... (read more)

dxu's Shortform

dxu

This is a special post for quick takes (aka "shortform"). Only the owner can create top-level comments.

LESSWRONG
LW

LESSWRONG
LW

dxu

Goal oriented cognition in "a single forward pass"

dxu's Shortform

dxu

dxu

Goal oriented cognition in "a single forward pass"

dxu's Shortform