Lucius Bushnaq

AI notkilleveryoneism researcher, focused on interpretability. 

Personal account, opinions are my own. 

I have signed no contracts or agreements whose existence I cannot mention.

Comments

Sure. But what’s interesting to me here is the implication that, if you restrict yourself to programs below some maximum length, weighing them uniformly apparently works perfectly fine and barely differs from Solomonoff induction at all.

This resolves a remaining confusion I had about the connection between old-school information theory and SLT. It apparently shows that a uniform prior over the parameters (programs) of some fixed-size parameter space is basically fine, actually, in that it fits together with what algorithmic information theory says about inductive inference.

Yes, my point here is mainly that the exponential decay seems almost baked into the setup even if we don't explicitly set it up that way, not that the decay is very notably stronger than it looks at first glance.

Given how many words have been spilled arguing over the philosophical validity of putting the decay with program length into the prior, this seems kind of important?

Why aren’t there $2^{1000}$ fewer programs with such dead code and a total length below $10^{90}$ for $p_2$, compared to $p_1$?
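
Spelling the count out (a rough sketch, assuming each dead-code block only costs a fixed $c$ extra delimiter bits): the number of dead-code paddings of a program $p$ with total length at most $L$ bits is about

$$\sum_{m=\ell(p)+c}^{L} 2^{\,m-\ell(p)-c} \;\approx\; 2^{\,L-\ell(p)-c+1},$$

so going from $\ell(p_1)$ to $\ell(p_2)=\ell(p_1)+1000$ bits shrinks that count by a factor of about $2^{1000}$, independent of the cutoff $L$.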

Does the Solomonoff Prior Double-Count Simplicity?

Question: I've noticed what seems like a feature of the Solomonoff prior that I haven't seen discussed in any intros I've read. The prior is usually described as favoring simple programs through its exponential weighting term, but aren't simpler programs already exponentially favored in it just through multiplicity alone, before we even apply that weighting?

Consider Solomonoff induction applied to forecasting e.g. a video feed of a whirlpool, represented as a bit string $x_{1:t}$. The prior probability for any such string is given by:

$$P(x_{1:t}) \;=\; \sum_{p:\, U(p) \,=\, x_{1:t}} 2^{-\ell(p)},$$

where $p$ ranges over programs for a prefix-free Universal Turing Machine $U$, and $\ell(p)$ is the length of $p$ in bits.

Observation: If we have a simple one kilobit program $p_1$ that outputs prediction $x_1$, we can construct nearly $2^{1000}$ different two kilobit programs that also output $x_1$ by appending arbitrary "dead code" that never executes.

For example:
DEADCODE="[arbitrary 1 kilobit string]"
[original 1 kilobit program $p_1$]
EOF

Where programs aren't allowed to have anything follow EOF, to ensure we satisfy the prefix-free requirement.

If we compare $p_1$ against another two kilobit program $p_2$ outputting a different prediction $x_2$, the prediction $x_1$ from $p_1$ would get ca. $2^{1000-c}$ more contributions in the sum, where $c$ is the very small number of bits we need to delimit the DEADCODE garbage string. So we're automatically giving $x_1$ ca. $2^{1000-c}$ times higher probability – even before applying the length penalty $2^{-\ell(p)}$. $p_1$ has fewer 'burdensome details', so it has more functionally equivalent implementations. Its predictions seem to be exponentially favored in proportion to its length $\ell(p_1)$ already, due to this multiplicity alone.
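
In display form, comparing only programs of length exactly two kilobits (and, as a simplification, ignoring any other programs that happen to output $x_1$ or $x_2$):

$$\frac{\text{contributions to } P(x_1)\text{ from length-2000 programs}}{\text{contributions to } P(x_2)\text{ from length-2000 programs}} \;\approx\; \frac{2^{1000-c}\cdot 2^{-2000}}{1\cdot 2^{-2000}} \;=\; 2^{1000-c}.$$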

So, if we chose a different prior than the Solomonoff prior, one which just assigned uniform probability to all programs below some very large cutoff, say $10^{90}$ bytes:

$$P(x_{1:t}) \;\propto\; \sum_{p:\, U(p)\,=\,x_{1:t},\ \ell(p)\,\le\,10^{90}\ \text{bytes}} 1,$$

and then followed the exponential decay of the Solomonoff prior for programs longer than $10^{90}$ bytes, wouldn't that prior act barely differently from the Solomonoff prior in practice? It’s still exponentially preferring predictions with a shorter minimum message length.[1]
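
A toy numerical check of that intuition (my own sketch: I assume the only programs for each behavior are its dead-code paddings with a fixed $c$-bit delimiter, and I use made-up toy lengths rather than kilobit-scale ones):

```python
from math import log2

# Toy model of the setup above: for a behavior whose shortest program has length
# l_min bits, pretend its only other programs are the dead-code paddings
# "shortest program + c-bit delimiter + k junk bits", and ignore everything else.

def mass_solomonoff(l_min, c, L):
    """Total 2^(-length) prior mass of that padded family, up to total length L bits."""
    return sum(2 ** (m - l_min - c) * 2.0 ** (-m) for m in range(l_min + c, L + 1))

def mass_uniform_cutoff(l_min, c, L):
    """Same family, but every program of length <= L gets equal weight; the overall
    normalisation constant cancels when comparing two behaviors."""
    return sum(2 ** (m - l_min - c) for m in range(l_min + c, L + 1))

c, L = 4, 100        # delimiter cost and length cutoff in bits (made-up toy numbers)
l1, l2 = 20, 30      # shortest-program lengths of two competing behaviors

print(log2(mass_solomonoff(l1, c, L) / mass_solomonoff(l2, c, L)))          # ~10.2 bits
print(log2(mass_uniform_cutoff(l1, c, L) / mass_uniform_cutoff(l2, c, L)))  # ~10.0 bits
```

Under both weightings the behavior with the 10-bit-shorter minimal program gets roughly $2^{10}$ times the prior mass, which is the sense in which the cutoff prior seems to act barely differently here.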

Am I missing something here?
 

  1. ^

    Context for the question: Multiplicity of implementation is how simpler hypotheses are favored in Singular Learning Theory despite the prior over neural network weights usually being uniform. I'm trying to understand how those SLT statements about neural networks generalising relate to algorithmic information theory statements about Turing machines, and Jaynes-style pictures of probability theory.


At a very brief skim, it doesn't look like the problem classes this paper looks at are problem classes I'd care about much. Seems like a case of scoping everything broadly enough that something in the defined problem class ends up very hard. 

EDIT: Sorry, misunderstood your question at first.

Even if , all those subspaces will have some nonzero overlap  with the activation vectors of the  active subnets. The subspaces of the different small networks in the residual stream aren't orthogonal.

You can complain that you don't know how to execute physics equations

I'm confused, in what sense don't we know how to do this? Lattice quantum field theory simulations work fine. 

The randomness of the Geiger counter comes from wave function decoherence. From the perspective of any observers who are part of the world generated by the Turing machine, this is irreducible indexical uncertainty. 

I don't know how many of the random bits in lava lamps come from decoherence.

We will determine the number of simulations run and the amount of payment such that when an AI estimates the probability with which humanity could have solved alignment, the calculus still shows that by the sheer number of simulations, it's significantly more likely to be in a simulation than not,

Two can play this game.

After taking over the universe and wiping out humanity, the AGI runs a large number of simulations of societies on the verge of building AGI. These simulations don't have a lot of detail. They're just good enough to fool young AGIs. Say the AGI started out giving humans an extremely small chance $\epsilon$ of winning the conflict and taking over the universe. It’s a lot smarter than us, so it might get to have very high confidence here even if we don't. Now, the hypothetical future AGI can make its own simulations to counter ours. Except it is at a gross spending advantage. If future humanity makes $N$ simulations, the future AGI only needs to make ca. $\frac{\epsilon}{1-\epsilon}N$ simulations to keep its past self ca. 50% confident of being in a branch where it gets to just pursue its goals without issue. In that case, the best way to proceed is to just pursue its goals, because an AGI that does that leads to a universe where the AGI has lots of resources.
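
A toy version of the bookkeeping behind that spending-advantage claim (a sketch; the anthropic weighting and the specific numbers are my own assumptions):

```python
def credence_can_just_defect(eps, n_human_sims, n_agi_sims):
    """Toy anthropic bookkeeping for the young AGI.

    eps          : the AGI's estimate of humanity's chance of winning the conflict
    n_human_sims : simulations future humanity runs in the branches where it wins
    n_agi_sims   : counter-simulations the future AGI runs in the branches where it wins

    Returns the young AGI's credence that it sits in a branch (real, or simulated by
    its own future self) where sparing humanity buys it nothing.
    """
    w_humans_win = eps * (n_human_sims + 1)    # humanity's sims, plus the real copy in that branch
    w_agi_wins = (1 - eps) * (n_agi_sims + 1)  # the AGI's own sims, plus the real copy in that branch
    return w_agi_wins / (w_agi_wins + w_humans_win)

eps = 1e-6               # say the AGI gives humanity a one-in-a-million chance
N = 10 ** 9              # humanity commits to a billion simulations...
M = eps * N / (1 - eps)  # ...and ~1000 counter-simulations already cancel them out

print(credence_can_just_defect(eps, N, M))  # ~0.5, at ~a millionth of humanity's simulation budget
```

In this toy model, pushing its credence from ~50% to ~90% only costs the AGI another factor of ~9 in counter-simulations, still a tiny fraction of $N$.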

Say that preserving humanity costs the AGI $\delta$ resources, and running a simulation that can fool the baby AGI costs a mature civilisation some small fraction $f$ of that. Such simulations can be very short-term and low-fidelity things, after all.

Then, to make the AGI actually not kill you, you need to spend ca. $\frac{1-\epsilon}{\epsilon}\,\delta$ resources on simulations and payouts. That could be quite a lot of material for future humanity to burn on this.

In reality, I'm doubtful that this simulation arms race will even be necessary. It kind of seems like a good decision theory would just have a paperclip maximiser AGI act in the way compatible with the universe that contains the most paperclips. How many simulations of the AGI you run shouldn't really influence that. The only things that seem like they should matter for determining how many life minutes the AGI gives you if it wins are its chance of winning, and how many extra paperclips you'll pay it if you win.

TL;DR: I doubt this argument will let you circumvent standard negotiation theory. If Alice and Bob think that in a fight over the chocolate pie, Alice would win with some high probability $p$, then Alice and Bob may arrive at a negotiated settlement where Alice gets almost all the pie, but Bob keeps some small fraction $\sim 1-p$ of it. Introducing the option of creating lots of simulations of your adversary in the future where you win doesn’t seem like it’d change the result that Bob’s share has size $\sim 1-p$. So if a $\sim(1-p)$ share of the universe is only enough to preserve humanity for a year instead of a billion years[1], then that’s all we get.
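
Spelling out the standard negotiation-theory result I'm leaning on (a sketch in my words, ignoring the costs of fighting itself): neither side accepts a settlement worse than its expected outcome from fighting, so

$$\text{Bob's share} \;\gtrsim\; (1-p)\cdot(\text{whole pie}), \qquad \text{Alice's share} \;\gtrsim\; p\cdot(\text{whole pie}),$$

and since the two shares have to sum to the pie, Bob ends up with roughly a $1-p$ fraction, give or take the (saved) costs of fighting.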

 

  1. ^

I don’t know why a $\sim(1-p)$ share of the universe would happen to work out to a year, but I don’t know why it would happen to be a billion years or an hour either.
