Lucius Bushnaq

AI notkilleveryoneism researcher, focused on interpretability. 

Personal account, opinions are my own. 

I have signed no contracts or agreements whose existence I cannot mention.

Comments
8 · Lucius Bushnaq's Shortform · 1y · 105 comments
The Mom Test for AI Extinction Scenarios
Lucius Bushnaq · 8h

For what it's worth, my mother read If Anyone Builds It, Everyone Dies and seems to have been convinced by it. She's probably not very representative though. She had prior exposure to AI x-risk arguments through me, is autistic, has a math PhD, and is a Gödel, Escher, Bach fan. 

johnswentworth's Shortform
Lucius Bushnaq · 10d

The proposal at the end looks somewhat promising to me on a first skim. Are there known counterpoints for it?

johnswentworth's Shortform
Lucius Bushnaq · 10d

I agree that this seems maybe useful for some things, but not for the "Which UTM?" question in the context of debates about Solomonoff induction specifically, and I think that's the "Which UTM?" question we are actually kind of philosophically confused about. We aren't philosophically confused about which UTM to use when we already know some physics and want to incorporate that knowledge into the UTM pick; we're confused about how to pick one when we don't have any information at all yet.

johnswentworth's Shortform
Lucius Bushnaq · 10d

Attempted abstraction and generalization: If we don't know what the ideal UTM is, we can start with some arbitrary UTM $U_1$ and use it to predict the world for a while. After (we think) we've gotten most of our prediction mistakes out of the way, we can look at our current posterior and ask which other UTM $U_2$ might have updated to that posterior faster, using fewer bits of observation about (our universe/the string we're predicting). You could think of this as a way to define what the 'correct' UTM is. But I don't find that definition very satisfying, because the validity of this procedure for finding a good $U_2$ depends on how correct the posterior we converged on with our previous, arbitrary $U_1$ actually is. 'The best UTM is the one that figures out the right answer the fastest' is true, but not very useful.

Is the thermodynamics angle gaining us any more than that for defining the 'correct' choice of UTM? 

We used some general reasoning procedures to figure out some laws of physics and stuff about our universe. Now we're basically asking what other general reasoning procedures might figure out stuff about our universe as fast or faster, conditional on our current understanding of our universe being correct. 
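One way to make that definition concrete (my formalization, so take the notation with a grain of salt): write $P_U(\cdot \mid x_{1:n})$ for the posterior of UTM $U$ after seeing the first $n$ bits. Then the procedure above picks, for some tolerance $\varepsilon$,

\[
U_2^{*} \;=\; \operatorname*{arg\,min}_{U}\, \min\Bigl\{\, m \;\Bigm|\; D_{\mathrm{KL}}\!\bigl(P_{U_1}(\cdot \mid x_{1:n}) \,\big\|\, P_{U}(\cdot \mid x_{1:m})\bigr) < \varepsilon \Bigr\},
\]

which makes the circularity explicit: the target posterior is only as trustworthy as the arbitrary $U_1$ that produced it.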

johnswentworth's Shortform
Lucius Bushnaq · 11d

Why does it make Bayesian model comparison harder? Wouldn't you get explicit predicted probabilities for the data $X$ from any two models you train this way? I guess you do need to sample from the Gaussian in $\lambda$ a few times for each $X$ and pass the results through the flow models, but that shouldn't be too expensive.
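To sketch what I mean by that comparison, here's a hedged numpy sketch of an importance-weighted Monte Carlo estimate of $\log p(X)$. The `enc` and `log_q` interfaces and the standard normal prior over $\lambda$ are illustrative assumptions on my part, not anything fixed by the setup:

```python
import numpy as np
from scipy.special import logsumexp

# Hedged sketch: importance-weighted Monte Carlo estimate of log p(X) for one
# model, for use in Bayesian model comparison. Assumed interfaces (my own):
#   enc(X)        -> (mu, sigma) of the encoder's Gaussian over lambda
#   log_q(X, lam) -> sum_i log q_i(x_i | lam) from the flow models
# plus a standard normal prior over lambda, purely for illustration.

def log_evidence(X, enc, log_q, n_samples=64):
    mu, sigma = enc(X)
    d = mu.shape[-1]
    eps = np.random.randn(n_samples, d)
    lams = mu + sigma * eps                      # lambda_k ~ q(lambda | X)
    log_prior = -0.5 * (lams ** 2).sum(-1) - 0.5 * d * np.log(2 * np.pi)
    log_post = (-0.5 * eps ** 2 - np.log(sigma)).sum(-1) - 0.5 * d * np.log(2 * np.pi)
    log_lik = np.array([log_q(X, lam) for lam in lams])
    log_w = log_prior + log_lik - log_post       # importance weights
    return logsumexp(log_w) - np.log(n_samples)  # approx. log p(X)
```

Running this for two candidate models on the same $X$ gives you log evidences to compare directly; the importance weights correct for the encoder's Gaussian not being the prior.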

johnswentworth's Shortform
Lucius Bushnaq · 11d

> Did that clarify?

Yes. Seems like a pretty strong assumption to me. 

> Yup, it sure does look similar. One tricky point here is that we're trying to fit the $f$'s to the data, so if going that route we'd need to pick some parametric form for $f$.

Ah. In that case, are you sure you actually need $Z$ to do the model comparisons you want? Do you even really need to work with this specific functional form at all? As opposed to e.g. training a model $p(\lambda \mid X)$ to feed its output into $m$ tiny normalizing flow models, which then try to reconstruct the original input data with conditional probability distributions $q_i(x_i \mid \lambda)$?

To sketch out a little more what I mean, $p(\lambda \mid X)$ could e.g. be constructed as a parametrised function[1] which takes in the actual samples $X$ and returns the mean of a Gaussian, which $\lambda$ is then sampled from in turn[2]. The $q_i(x_i \mid \lambda)$ would be constructed using normalising flow networks[3], which take in $\lambda$ as well as uniform distributions over variables $z_i$ that have the same dimensionality as their $x_i$. Since the networks are efficiently invertible, this gives you explicit representations of the conditional probabilities $q_i(x_i \mid \lambda)$, which you can then fit to the actual data using the KL-divergence. 

You'd get explicit representations for both $P[\lambda \mid X]$ and $P[X \mid \lambda]$ from this. A minimal code sketch of the setup follows the footnotes.

  1. ^

    Or an ensemble of functions, if you want the mean of $\lambda$ to be something like $\sum_i f_i(x_i)$ specifically.

  2. ^

    Using reparameterization to keep the sampling operation differentiable in the mean.

  3. ^

    If the dictionary of possible values of $X$ is small, you can also just use a more conventional ML setup which explicitly outputs probabilities for every possible value of every $x_i$, of course.
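Here's a minimal PyTorch sketch of the kind of setup I mean, in case it helps. Everything in it is illustrative: the layer sizes and names are made up, I'm using a Gaussian base distribution for the flows instead of the uniform one described above, and a real normalising flow would stack several coupling layers rather than use a single conditional affine map.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """p(lambda | X): maps samples X to the mean of a Gaussian over lambda."""
    def __init__(self, x_dim, n_vars, lam_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_vars * x_dim, 64), nn.ReLU(), nn.Linear(64, lam_dim)
        )
        self.log_sigma = nn.Parameter(torch.zeros(lam_dim))

    def forward(self, X):  # X: (batch, n_vars, x_dim)
        mu = self.net(X.flatten(1))
        # reparameterisation trick: sampling stays differentiable in mu
        return mu + self.log_sigma.exp() * torch.randn_like(mu)

class ConditionalAffineFlow(nn.Module):
    """Invertible map x_i = exp(s(lam)) * z_i + t(lam), with z_i ~ N(0, I)."""
    def __init__(self, lam_dim, x_dim):
        super().__init__()
        self.s = nn.Linear(lam_dim, x_dim)
        self.t = nn.Linear(lam_dim, x_dim)

    def log_prob(self, x_i, lam):  # explicit log q_i(x_i | lam)
        s, t = self.s(lam), self.t(lam)
        z = (x_i - t) * torch.exp(-s)  # invert the flow
        log_base = -0.5 * z ** 2 - 0.5 * torch.log(torch.tensor(2 * torch.pi))
        return (log_base - s).sum(-1)  # change of variables: minus log|det|

n_vars, x_dim, lam_dim = 5, 3, 8
enc = Encoder(x_dim, n_vars, lam_dim)
flows = nn.ModuleList(ConditionalAffineFlow(lam_dim, x_dim) for _ in range(n_vars))
opt = torch.optim.Adam([*enc.parameters(), *flows.parameters()], lr=1e-3)

X = torch.randn(128, n_vars, x_dim)  # stand-in data
for _ in range(100):
    lam = enc(X)
    # negative log-likelihood; minimising it fits the q_i to the data
    loss = -sum(f.log_prob(X[:, i], lam) for i, f in enumerate(flows)).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Minimising the negative log-likelihood $-\sum_i \log q_i(x_i \mid \lambda)$ over the data is the same as minimising the KL-divergence to the empirical distribution, up to a constant.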

johnswentworth's Shortform
Lucius Bushnaq · 11d

> Trick I'm currently using: we can view the sum $\sum_x Z|_X(x)$ as taking an expectation of $Z|_X(x)$ under a uniform distribution $Q[X]$. Under that uniform distribution, $\sum_i f_i(X_i)$ is a sum of independent random variables, so let's wave our hands just a little and assume that sum is approximately normal.

Not following this part. Can you elaborate? 

Some scattered thoughts:

  1. Regarding convergence, to state the probably obvious: since $P[X_i \mid \Lambda] \propto e^{\lambda^T f_i(x_i)}$, the normalising sum $\sum_x e^{\lambda^T f_i(x_i)}$ has to converge, so $e^{\lambda^T f_i(x_i)}$ at least has to go to zero for $x$ going to infinity.
  2. In my field-theory-brained head, the analysis seems simpler to think about for continuous $x$. So unless we're married to $x$ being discrete, I'd switch from $\sum_x$ to $\int dx$. Then you can potentially use Gaussian-integral and source-term tricks with the dependency on $x$ as well. If you haven't already, you might want to look at (quantum) field theory textbooks that describe how to calculate expectation values of observables over path integrals. This expression looks extremely like the kind of thing you'd usually want to calculate with Feynman diagrams, except I'm not sure whether the $f_i(x_i)$ have the right form to let us power-expand in $x_i$ and then shove the non-quadratic $x_i$ terms into source derivatives the way we usually would in perturbative quantum field theory.
  3. If all else fails, you can probably do it numerically, lattice-QFT style, using techniques like hybrid Monte Carlo to sample points in the integral efficiently (a minimal sketch follows the footnote).[1] 
     
     
  1. ^

    You can maybe also train a neural network to do the sampling.
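For point 3, here's the crudest version of what I mean: a hedged numpy sketch with random-walk Metropolis standing in for hybrid Monte Carlo, sampling $x$ from the unnormalised density $e^{\lambda^T f(x)}$ so that expectation values under it can be estimated. The specific $f$ and $\lambda$ are placeholders I made up.

```python
import numpy as np

def metropolis(f, lam, x0, n_steps=10_000, step=0.5):
    """Sample from the unnormalised density exp(lam . f(x))."""
    x, log_p, samples = x0, lam @ f(x0), []
    for _ in range(n_steps):
        prop = x + step * np.random.randn(*x.shape)        # random-walk proposal
        log_p_prop = lam @ f(prop)
        if np.log(np.random.rand()) < log_p_prop - log_p:  # accept/reject
            x, log_p = prop, log_p_prop
        samples.append(x)
    return np.array(samples)

# Placeholder example: f(x) = (x, -x^2) makes the target a Gaussian, so we can
# sanity-check the estimated expectation value E[x^2].
f = lambda x: np.array([x[0], -x[0] ** 2])
samples = metropolis(f, lam=np.array([0.0, 1.0]), x0=np.zeros(1))
print((samples[1000:, 0] ** 2).mean())  # ~0.5 for exp(-x^2), after burn-in
```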

Why Corrigibility is Hard and Important (i.e. "Whence the high MIRI confidence in alignment difficulty?")
Lucius Bushnaq · 18d

Skimming some of the posts in the sequence, I am not persuaded that corrigibility now looks like an engineering problem rather than a problem that needs (a) major theoretical breakthrough(s).

The point about corrigibility MIRI keeps making is that it's anti-natural, and Max seems to agree with that. 
 

Alexander Gietelink Oldenziel's Shortform
Lucius Bushnaq · 24d

That may be true[1]. But it doesn't seem like a particularly useful answer?

"The optimization target is the optimization target."

  1. ^

    For the outer optimiser that builds the AI.

Neural networks generalize because of this one weird trick
Lucius Bushnaq · 24d

> Are there any theorems that use SLT to quantify out-of-distribution generalization?

There is one now, though whether you still want to count this as part of SLT or not is a matter of definition.

Posts

114 · From SLT to AIT: NN generalisation out-of-distribution · 1mo · 8 comments
72 · Circuits in Superposition 2: Now with Less Wrong Math · 4mo · 0 comments
47 · [Paper] Stochastic Parameter Decomposition · 4mo · 14 comments
42 · Proof idea: SLT to AIT · 8mo · 15 comments
25 · Can we infer the search space of a local optimiser? · 8mo · 5 comments
108 · Attribution-based parameter decomposition · 9mo · 21 comments
152 · Activation space interpretability may be doomed · 9mo · 35 comments
71 · Intricacies of Feature Geometry in Large Language Models · 10mo · 0 comments
45 · Deep Learning is cheap Solomonoff induction? · 10mo · 1 comment
131 · Circuits in Superposition: Compressing many small neural networks into one · 1y · 9 comments
Wikitag Contributions

Modularity · 4 years ago · (+22/-89)