The proposal at the end looks somewhat promising to me on a first skim. Are there known counterpoints to it?
I agree that this seems maybe useful for some things, but not for the "Which UTM?" question in the context of debates about Solomonoff induction specifically, and I think that's the "Which UTM?" question we're actually kind of philosophically confused about. I don't think we're confused about which UTM to use when we already know some physics and want to incorporate that knowledge into the UTM pick; we're confused about how to pick when we don't have any information at all yet.
Attempted abstraction and generalization: If we don't know what the ideal UTM is, we can start with some arbitrary UTM $U_1$, and use it to predict the world for a while. After (we think) we've gotten most of our prediction mistakes out of the way, we can then look at our current posterior, and ask which other UTM $U_2$ might have updated to that posterior faster, using fewer bits of observation about (our universe/the string we're predicting). You could think of this as a way to define what the 'correct' UTM is. But I don't find that definition very satisfying, because the validity of this procedure for finding a good $U_2$ depends on how correct the posterior we've converged on with our previous, arbitrary $U_1$ is. 'The best UTM is the one that figures out the right answer the fastest' is true, but not very useful.
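One way to write that (admittedly unsatisfying) definition down, with $\hat{P}$ the posterior we converged to using the arbitrary starting UTM $U_1$, $P_U(\cdot \mid x_{1:n})$ the posterior of a candidate UTM $U$ after the first $n$ bits of observation, and $\epsilon$ some tolerance (this formalization is mine, not anything standard):

$$U^{*} = \underset{U}{\operatorname{argmin}}\ \min\left\{\, n : D_{\mathrm{KL}}\!\left(\hat{P} \,\middle\|\, P_U(\cdot \mid x_{1:n})\right) < \epsilon \,\right\}$$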
Is the thermodynamics angle gaining us any more than that for defining the 'correct' choice of UTM?
We used some general reasoning procedures to figure out some laws of physics and stuff about our universe. Now we're basically asking what other general reasoning procedures might figure out stuff about our universe as fast or faster, conditional on our current understanding of our universe being correct.
Why does it make Bayesian model comparison harder? Wouldn't you get explicit predicted probabilities for the data from any two models you train this way? I guess you do need to sample from the Gaussian over $z$ a few times for each data point and pass the result through the flow models, but that shouldn't be too expensive.
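Concretely, the kind of estimate I have in mind (a sketch; treating the encoder's Gaussian as the measure over $z$ is my assumption, not something forced by the setup):

$$P(x \mid M) \approx \frac{1}{K}\sum_{k=1}^{K}\prod_i P_M(x_i \mid z_k), \qquad z_k \sim \mathcal{N}\!\left(\mu_M(x), I\right),$$

and the Bayes factor between two models is then just the ratio of the two estimates.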
Did that clarify?
Yes. Seems like a pretty strong assumption to me.
Yup, it sure does look similar. One tricky point here is that we're trying to fit the $P(x_i|z)$'s to the data, so if going that route we'd need to pick some parametric form for $P(x_i|z)$.
Ah. In that case, are you sure you actually need to do the model comparisons you want? Do you even really need to work with this specific functional form at all? As opposed to e.g. training a model to feed its output into tiny normalizing flow models which then try to reconstruct the original input data with conditional probability distributions $P(x_i|z)$?
To sketch out a little more what I mean, $z$ could e.g. be constructed as a parametrised function[1] which takes in the actual samples $x$ and returns the mean of a Gaussian, which is then sampled from in turn[2]. The $P(x_i|z)$ would be constructed using normalising flow networks[3], which take in $z$ as well as uniform distributions over variables that have the same dimensionality as their $x_i$. Since the networks are efficiently invertible, this gives you explicit representations of the conditional probabilities $P(x_i|z)$, which you can then fit to the actual data using KL-divergence.
You'd get explicit representations for both $P(z|x)$ and the $P(x_i|z)$ from this. (Rough code sketch below, after the footnotes.)
[1] Or an ensemble of functions, if you want the mean of the Gaussian over $z$ to take some specific form.
[2] Using reparameterization to keep the sampling operation differentiable in the mean.
[3] If the dictionary of possible values of $x_i$ is small, you can also just use a more conventional ML setup which explicitly outputs probabilities for every possible value of every $x_i$, of course.
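A minimal runnable sketch of the setup above, assuming PyTorch. All names and sizes here are mine, and each "flow" is collapsed to a single affine transform of a logistic base variable (the logit of a uniform) so the example stays short; a real version would stack coupling layers:

```python
# Hedged sketch (PyTorch; all names here are illustrative).
# An encoder maps the joint sample (x_1, ..., x_n) to the mean of a Gaussian
# over z; z is drawn with the reparameterization trick; one tiny conditional
# flow per x_i then gives an explicit density P(x_i | z).
import torch
import torch.nn as nn

n_vars, x_dim, z_dim = 4, 2, 8  # toy sizes

encoder = nn.Sequential(
    nn.Linear(n_vars * x_dim, 64), nn.ReLU(), nn.Linear(64, z_dim)
)
# One conditioning net per x_i: maps z to the location/log-scale of its flow.
cond_nets = nn.ModuleList(nn.Linear(z_dim, 2 * x_dim) for _ in range(n_vars))

def log_p_xi_given_z(i, x_i, z):
    """Explicit log P(x_i | z): invert the flow and add the log-det term."""
    loc, log_scale = cond_nets[i](z).chunk(2, dim=-1)
    u = (x_i - loc) * torch.exp(-log_scale)  # inverse of x_i = loc + scale * u
    # u = logit(uniform) is standard-logistic: log density = -u - 2*softplus(-u).
    log_base = -u - 2 * nn.functional.softplus(-u)
    return (log_base - log_scale).sum(dim=-1)

def nll(x):  # x: (batch, n_vars, x_dim)
    mu = encoder(x.flatten(1))
    z = mu + torch.randn_like(mu)  # reparameterized sample, unit variance
    return -sum(log_p_xi_given_z(i, x[:, i], z) for i in range(n_vars)).mean()

opt = torch.optim.Adam(
    list(encoder.parameters()) + list(cond_nets.parameters()), lr=1e-3
)
x_data = torch.randn(32, n_vars, x_dim)  # stand-in data
for _ in range(200):
    opt.zero_grad()
    loss = nll(x_data)
    loss.backward()
    opt.step()
```

Minimizing this negative log-likelihood is equivalent to minimizing the KL-divergence from the data distribution to the model, up to a constant entropy term.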
Trick I'm currently using: we can view the sum $\sum_z \prod_i P(x_i|z)$ as taking an expectation of $\prod_i P(x_i|z)$ under a uniform distribution over $z$. Under that uniform distribution, $\log \prod_i P(x_i|z)$ is a sum of independent random variables, so let's wave our hands just a little and assume that sum is approximately normal.
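Spelled out, the payoff of the normality hand-wave is the standard log-normal mean formula: write $S(z) = \sum_i \log P(x_i \mid z)$ and assume $S \sim \mathcal{N}(\mu, \sigma^2)$ under the uniform distribution over $z$. Then

$$\sum_z \prod_i P(x_i \mid z) = |Z| \cdot \mathbb{E}_z\!\left[e^{S(z)}\right] \approx |Z| \cdot e^{\mu + \sigma^2/2},$$

so estimating just the mean and variance of $S$ from a few samples of $z$ gives the whole sum.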
Not following this part. Can you elaborate?
Some scattered thoughts:
- You can maybe also train a neural network to do the sampling.
Skimming some of the posts in the sequence, I am not persuaded that corrigibility now looks like an engineering problem rather than a problem that needs (a) major theoretical breakthrough(s).
The point MIRI keeps making about corrigibility is that it's anti-natural, and Max seems to agree with that.
Are there any theorems that use SLT to quantify out-of-distribution generalization?
There is one now, though whether you still want to count this as part of SLT or not is a matter of definition.
For what it's worth, my mother read If Anyone Builds It, Everyone Dies and seems to have been convinced by it. She's probably not very representative though. She had prior exposure to AI x-risk arguments through me, is autistic, has a math PhD, and is a Gödel, Escher, Bach fan.