AI notkilleveryoneism researcher, focused on interpretability.
Personal account, opinions are my own.
I have signed no contracts or agreements whose existence I cannot mention.
The bound is the same one you get for normal Solomonoff induction, except restricted to the set of programs the cut-off induction runs over. It’s a bound on the total expected error in terms of CE loss that the predictor will ever make, summed over all datapoints.
Look at the bound for cut-off induction in that post I linked, maybe? Hutter might also have something on it.
Can also discuss on a call if you like.
Note that this doesn’t work in real life, where the programs are not in fact restricted to outputting bit string predictions and can e.g. try to trick the hardware they’re running on.
You also want one that generalises well, and doesn't do performative predictions, and doesn't have goals of its own. If your hypotheses aren't even intended to be reflections of reality, how do we know these properties hold?
Because we have the prediction error bounds.
When we compare theories, we don't consider the complexity of all the associated approximations and abstractions. We just consider the complexity of the theory itself.
E.g. the theory of evolution isn't quite code for a costly simulation. But it can be viewed as a set of statements about such a simulation. And the way we compare the theory of evolution to alternatives doesn't involve comparing the complexity of the set of approximations we used to work out the consequences of each theory.
Yes.
That’s fine. I just want a computable predictor that works well. This one does.
Also, scientific hypotheses in practice aren’t actually simple code for a costly simulation we run. We use approximations and abstractions to make things cheap. Most of our science outside particle physics is about finding more effective approximations for stuff.
Edit: Actually, I don’t think this would yield you a different general predictor as the program dominating the posterior. General inductor program running program is pretty much never going to be the shortest implementation of .
Thank you for this summary.
I still find myself unconvinced by all the arguments against the Solomonoff prior I have encountered. For this particular argument, as you say, there are still many ways the conjectured counterexample of adversaria could fail if you actually tried to sit down and formalise it. Since the counterexample is designed to break a formalism that looks and feels really natural and robust to me, my guess is that the formalisation will indeed fall to one of these obstacles, or a different one.
In a way, that makes perfect sense; Solomonoff induction really can't run in our universe! Any robot we could build to "use Solomonoff induction" would have to use some approximation, which the malign prior argument may or may not apply to.
You can just reason about Solomonoff induction with cut-offs instead. If you render the induction computable by giving it a uniform prior over all programs of some finite length [1] with runtime , it still seems to behave sanely. As in, you can derive analogs of the key properties of normal Solomonoff induction for this cut-off induction. E.g. the induction will not make more than bits worth of prediction mistakes compared to any 'efficient predictor' program with runtime and K-complexity , it's got a rough invariance to what Universal Turing Machine you run it on, etc.
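A toy sketch of the kind of bound meant here, under loud assumptions: the hypothesis class below is a handful of hand-written predictors rather than an enumeration of programs, there is no runtime cut-off, and all names (`constant`, `alternating`, `mixture_log_loss`) are made up for illustration. It only shows the generic fact that a Bayesian mixture with a uniform prior over N predictors has cumulative log loss within log2(N) bits of the best predictor in the class.

```python
import math


def constant(p):
    """Hypothetical predictor: always assigns probability p to the next bit being 1."""
    return lambda history: p


def alternating(history):
    """Hypothetical predictor: expects the next bit to differ from the last one."""
    if not history:
        return 0.5
    return 0.1 if history[-1] == 1 else 0.9


def mixture_log_loss(predictors, bits):
    """Run a uniform-prior Bayesian mixture over `predictors` on `bits`.

    Returns (cumulative log2 loss of the mixture,
             list of cumulative log2 losses of each individual predictor).
    Predictors must output probabilities strictly between 0 and 1.
    """
    n = len(predictors)
    weights = [1.0 / n] * n          # uniform prior
    mix_loss = 0.0
    indiv_loss = [0.0] * n
    history = []
    for bit in bits:
        probs_one = [p(history) for p in predictors]
        # Probability each predictor assigned to the bit that actually occurred.
        likes = [q if bit == 1 else 1.0 - q for q in probs_one]
        mix_prob = sum(w * l for w, l in zip(weights, likes))
        mix_loss += -math.log2(mix_prob)
        for i, l in enumerate(likes):
            indiv_loss[i] += -math.log2(l)
        # Bayesian posterior update.
        weights = [w * l / mix_prob for w, l in zip(weights, likes)]
        history.append(bit)
    return mix_loss, indiv_loss


if __name__ == "__main__":
    predictors = [constant(p) for p in (0.1, 0.3, 0.5, 0.7, 0.9)] + [alternating]
    bits = [0, 1] * 20
    mix, indiv = mixture_log_loss(predictors, bits)
    print("mixture loss (bits):", mix)
    print("best single predictor loss (bits):", min(indiv))
    print("log2(N) slack:", math.log2(len(predictors)))
```

The regret term log2(N) plays the role that program length plays in the cut-off induction: with a uniform prior over all programs up to a fixed length, N is the number of such programs.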
Since finite, computable things are easier for me to reason about, I mostly use this cut-off induction in my mental toy models of AGI these days.
EDIT: Apparently, this exists in the literature under the name AIXI-tl. I didn't know that. Neat.
So, no prefix-free requirement.
- A quick google search says the male is primary or exclusive breadwinner in a majority of married couples. Ass-pull number: the monetary costs alone are probably ~50% higher living costs. (Not a factor of two higher, because the living costs of two people living together are much less than double the living costs of one person. Also I'm generally considering the no-kids case here; I don't feel as confused about couples with kids.)
But remember that you already conditioned on 'married couples without kids'. My guess would be that in the subset of man-woman married couples without kids, the man being the exclusive breadwinner is a lot less common than in the set of all man-woman married couples. These properties seem like they'd be heavily anti-correlated.
In the subset of man-woman married couples without kids that get along, I wouldn't be surprised if having a partner effectively works out to more money for both participants, because you've got two incomes, but less than 2x living expenses.
- I was picturing an anxious attachment style as the typical female case (without kids). That's unpleasant on a day-to-day basis to begin with, and I expect a lack of sex tends to make it a lot worse.
I am ... not ... picturing that as the typical case? Uh, I don't know what to say here really. That's just not an image that comes to mind for me when I picture 'older hetero married couple'. Plausibly I don't know enough normal people to have a good sense of what normal marriages are like.
- Eyeballing Aella's relationship survey data, a bit less than a third of respondents in 10-year relationships reported fighting multiple times a month or more. That was somewhat-but-not-dramatically less than I previously pictured. Frequent fighting is very prototypically the sort of thing I would expect to wipe out more-than-all of the value of a relationship, and I expect it to be disproportionately bad in relationships with little sex.
I think for many of those couples that fight multiple times a month, the alternative isn't separating and finding other, happier relationships where there are never any fights. The typical case I picture there is that the relationship has some fights because both participants aren't that great at communicating or understanding emotions, their own or other people's. If they separated and found new relationships, they'd get into fights in those relationships as well.
It seems to me that lots of humans are just very prone to getting into fights. With their partners, their families, their roommates etc., to the point that they have accepted having lots of fights as a basic fact of life. I don't think the correct takeaway from that is 'Most humans would be happier if they avoided having close relationships with other humans.'
- Less legibly... conventional wisdom sure sounds like most married men find their wife net-stressful and unpleasant to be around a substantial portion of the time, especially in the unpleasant part of the hormonal cycle, and especially especially if they're not having much sex. For instance, there's a classic joke about a store salesman upselling a guy a truck, after upselling him a boat, after upselling him a tackle box, after [...] and the punchline is "No, he wasn't looking for a fishing rod. He came in looking for tampons, and I told him 'dude, your weekend is shot, you should go fishing!'".
Conventional wisdom also has it that married people often love each other so much they would literally die for their partner. I think 'conventional wisdom' is just a very big tent that has room for everything under the sun. If even 5-10% of married couples have bad relationships where the partners actively dislike each other, that'd be many millions of people in the English-speaking population alone. To me, that seems like more than enough people to generate a subset of well-known conventional wisdoms talking about how awful long-term relationships are.
Case in point, I feel like I hear those particular conventional wisdoms less commonly these days in the Western world. My guess is this is because long-term heterosexual marriage is no longer culturally mandatory, so there are fewer unhappy couples around generating conventional wisdoms about their plight.
So, next question for people who had useful responses (especially @Lucius Bushnaq and @yams): do you think the mysterious relationship stuff outweighs those kinds of costs easily in the typical case, or do you imagine the costs in the typical case are not all that high?
So, in summary, both I think? I feel like the 'typical' picture of a hetero marriage you sketch is more like my picture of an 'unusually terrible' marriage. You condition on a bad sexual relationship and no children and the woman doesn't earn money and the man doesn't even like her, romantically or platonically. That subset of marriages sure sounds like it'd have a high chance of the man just walking away, barring countervailing cultural pressures. But I don't think most marriages where the sex isn't great are like that.
Sure, I agree that, as we point out in the post
Yes, sorry I missed that. The section is titled 'Conclusions' and comes at the end of the post, so I guess I must have skipped over it because I thought it was the post conclusion section rather than the high-frequency latents conclusion section.
As long as your evaluation metrics measure the thing you actually care about...
I agree with this. I just don't think those autointerp metrics robustly capture what we care about.
Removing High Frequency Latents from JumpReLU SAEs
On a first read, this doesn't seem principled to me? How do we know those high-frequency latents aren't, for example, basis directions for dense subspaces or common multi-dimensional features? In that case, we'd expect them to activate frequently and maybe appear pretty uninterpretable at a glance. Modifying the sparsity penalty to split them into lower frequency latents could then be pathological, moving us further away from capturing the features of the model even though interpretability scores might improve.
That's just one illustrative example. More centrally, I don't understand how this new penalty term relates to any mathematical definition that isn't ad-hoc. Why would the spread of the distribution matter to us, rather than simply the mean? If it does matter to us, why does it matter in roughly the way captured by this penalty term?
The standard SAE sparsity loss relates to minimising the description length of the activations. I suspect that isn't the right metric to optimise for understanding models, but it is at least a coherent, non-ad-hoc mathematical object.
EDIT: Oops, you address all that in the conclusion, I just can't read.
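For reference, a minimal sketch of what I mean by the standard SAE sparsity loss above, assuming the plain ReLU encoder with an L1 penalty on the latents. This is not the JumpReLU setup or the modified penalty from the post, and the function and parameter names (`sae_loss`, `l1_coeff`, etc.) are placeholders of mine.

```python
def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


def sae_loss(x, w_enc, b_enc, w_dec, b_dec, l1_coeff):
    """Squared reconstruction error plus an L1 sparsity penalty on the latents.

    Assumed shapes: w_enc is (n_latents, d_model), w_dec is (d_model, n_latents).
    Returns (loss, latents, reconstruction).
    """
    pre_acts = [a + b for a, b in zip(matvec(w_enc, x), b_enc)]
    latents = [max(0.0, a) for a in pre_acts]            # ReLU
    recon = [a + b for a, b in zip(matvec(w_dec, latents), b_dec)]
    recon_err = sum((r - xi) ** 2 for r, xi in zip(recon, x))
    sparsity = sum(abs(f) for f in latents)              # L1 norm of the latents
    return recon_err + l1_coeff * sparsity, latents, recon


if __name__ == "__main__":
    # Tiny hand-built dictionary: one latent for each signed axis direction.
    x = [1.0, -2.0]
    w_enc = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
    w_dec = [[1.0, -1.0, 0.0, 0.0], [0.0, 0.0, 1.0, -1.0]]
    loss, latents, recon = sae_loss(x, w_enc, [0.0] * 4, w_dec, [0.0] * 2, l1_coeff=0.5)
    print(loss, latents, recon)
```

The penalty here only sees the summed latent activations on each input, which is why I asked what extra work a term depending on the spread of the firing distribution is doing.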
Forgot to tell you this when you showed me the draft: The comp in sup paper actually had a dense construction for UAND included already. It works differently than the one you seem to have found though, using Gaussian weights rather than binary weights.
I think the value proposition of AI 2027-style work lies largely in communication. Concreteness helps people understand things better. The details are mostly there to provide that concreteness, not to actually be correct.
If you imagine the set of possible futures that people like Daniel, you or I think plausible as big distributions, with high entropy and lots of unknown latent variables, the point is that the best way to start explaining those distributions to people outside the community is to draw a sample from them and write it up. This is a lot of work, but it really does seem to help. My experience matches habryka's here. Most people really want to hear concrete end-to-end scenarios, not abstract discussion of the latent variables in my model and their relationships.