I think one underused trick for training LLMs is to explicitly "edit" them. That is, suppose a model generates some text X in response to prompt Y, and X has some error or is missing something. In that case you can create a corrected text X' and do a gradient update to increase log[P(X'|Y)/P(X|Y)].
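A minimal sketch of the idea, using numpy and a toy three-way categorical "model" in place of an actual LLM (the indices, logits, and learning rate are all invented for illustration; a real setup would score token sequences):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Toy stand-in for the model: a categorical distribution over three
# candidate responses to prompt Y. Index 0 is the original response X,
# index 1 the edited response X'.
logits = np.array([2.0, 0.5, 1.0])  # the model initially prefers X
x, x_edit = 0, 1

def log_ratio(z):
    p = softmax(z)
    return np.log(p[x_edit]) - np.log(p[x])  # log P(X'|Y)/P(X|Y)

# For a softmax, the gradient of (log p[x'] - log p[x]) w.r.t. the logits
# is onehot(x') - onehot(x): the shared normalizer cancels.
grad = np.zeros_like(logits)
grad[x_edit] += 1.0
grad[x] -= 1.0

lr = 0.5
before = log_ratio(logits)
logits = logits + lr * grad  # ascend the log-ratio objective
after = log_ratio(logits)
# The log-ratio rises from -1.5 to -0.5: X' gains probability mass
# relative to X.
```

The same objective generalizes directly to sequence models, where the log-ratio decomposes into per-token log-probability differences.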
For example, if we generate virtual comments in the style of certain LW users, one could either let those users browse the virtual comments created in their style and correct them, or let the recipients of the virtual comments edit them to remove misunderstandings and the like.
If we think of the quantified abilities as the logarithms of the true abilities, then taking the log has likely massively increased the correlations by bringing the outliers into the bulk of the distribution.
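A quick simulation of this effect (the shared-latent-factor setup and the noise scales here are assumptions of mine, chosen only to illustrate the mechanism):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
# Hypothetical heavy-tailed "true abilities": a shared latent factor plus
# independent noise, then exponentiated so the raw scale has big outliers.
latent = rng.normal(size=n)
a = np.exp(latent + rng.normal(size=n))
b = np.exp(latent + rng.normal(size=n))

corr_raw = np.corrcoef(a, b)[0, 1]                   # dragged around by a few outliers
corr_log = np.corrcoef(np.log(a), np.log(b))[0, 1]   # ~0.5, the latent factor's share
# corr_log comes out well above corr_raw: taking logs pulls the outliers
# into the bulk and the measured correlation rises.
```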
Bayesianism was a mistake.
Your post is an excellent example of how the supposedly-reasonable middle ground tends to be so clueless as to be plausibly worse than the extremes.
Like, e.g. Blanchard doesn’t think trans men have AGP
You mean AAP here, right?
He accepts autohomoeroticism, which is close enough to AAP that the difference doesn't matter. The real problem here is Michael Bailey who has a sort of dogmatic denial of AAP.
doesn’t think trans women who are attracted to men have AGP
That's pretty common in people's second-hand versions of the theory; the real issue is that this claim is sometimes wrong, since some androphiles are AGP.
Oversimplification 2: Bisexuals exist. Many trans women report their sexual orientation changing when they start taking hormones. The correlation between having AGP and being attracted to women can’t be as 100% as Blanchard appears to believe it is.
Blanchard explicitly measured that some trans women identified as bisexual, and argued that they were autogynephilic and not truly bisexual. There are some problems with that assertion, but uncovering them really requires engaging with more of the nuances than you do here.
Oversimplification 4: Do heterosexual cisgender women have AGP? (Cf. Comments by Aella, eigenrobot etc.) if straight cisgender women also like being attractive in the same way as (some) trans women do, it becomes somewhat doubtful that it’s a pathology.
According to qualitative studies I've done, around 15% of women are at least somewhat AGP (though I think it correlates with being bi/lesbian), but the assertion that this implies it's not a pathology for males seems like magical thinking. E.g. ~100% of women have breasts, but this does not mean that developing breasts would not be considered a pathology for males.
If you consider the "true ability" to be the exponential of the subtest scores, then the extent to which the problem I mention applies depends on the base of the exponential. In the limiting case where the base goes to infinity, only the highest ability matter, whereas in the limiting case where the base goes to 1, you end up with something basically linear.
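The two limits can be checked numerically with a power-mean-style aggregator (my rendering of the construction above, not a standard psychometric formula; the scores are made up):

```python
import math

def aggregate(scores, base):
    # log_base of the mean of base**score: exponentiate the subtest
    # scores, average, and convert back to the score scale.
    vals = [base ** s for s in scores]
    return math.log(sum(vals) / len(vals), base)

scores = [1.0, 2.0, 7.0]

near_linear = aggregate(scores, 1.0001)  # base near 1: basically the mean
near_max = aggregate(scores, 1e6)        # huge base: the top score dominates
# near_linear is ~3.33 (the plain average); near_max is ~6.92, close to
# the maximum score of 7.
```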
As for whether it's a crux, approximately nobody has thought about this deeply enough to recognize it, but I think it's pretty foundational for a lot of disagreements about IQ.
The analogy I'm objecting to is this: the total of a ledger or a budget is an index that sums together expenses in a much more straightforward way. For instance, if there is a large expense, the total is large.
Meanwhile, IQ scores are more like the geometric mean of the entries in such a ledger. The geometric mean tells you whether the individual items tend to be large or small, which gives you broad-hitting information that distinguishes e.g. people who live in high-income countries from people who live in low-income countries, or large organizations from individual people; but it won't tell you whether someone got hit by a giant medical bill or managed to hack their way into an ultra-cheap living space. Scores like this pretty much necessarily have to be low-rank mediators (like in the g model) rather than diverse aggregates (like in the sum model).
(Well, a complication in this analogy is that a ledger can vary not just in the magnitude of the transfers but also qualitatively in the kinds of transfers that are made, whereas IQ tests fix the variables, making it more analogous to a standardized budget form (e.g. for tax or loan purposes) broken down by stuff like "living space rent", "food", "healthcare", etc..)
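To make the sum-vs-geometric-mean contrast concrete (the budget entries below are invented numbers):

```python
import math

def geomean(xs):
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

# Two hypothetical monthly budgets with identical totals:
steady = [500, 500, 500, 500, 500]    # everything moderately sized
spiky = [10, 10, 10, 10, 2460]        # tiny entries plus one giant bill

# The sum index cannot tell them apart (both total 2500), but the
# geometric mean can: 500 for the steady budget vs ~30 for the spiky one.
# It reports whether entries *tend* to be large, not whether any single
# one is.
```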
That's part of the problem, often the bad middle ground looks superficially plausible, so it's very sticky and hard to get rid of, because it's not exactly that people get told the wrong things but rather that they spontaneously develop the wrong ideas.
The three basic issues with this viewpoint are:
Ever since the situation with Blanchardianism, I've become extremely bearish on the possibility of this, considering how everyone, rationalists included, on all sides of the debate massively failed on it.
With IQ realism, you also get insane stuff where the supposedly reasonable middle ground tends to have skepticism about the g factor and thinks of IQ as an index that sums together cognitive abilities.
I haven't thought of this in relation to wild animal welfare or birthrates but I don't immediately see the argument that we can outperform the abysmal track record seen in these two other cases.
A possible model is that while good startups have an elevation in the "cult factor", they have an even greater elevation in the unique factor related to the product they are building. Like, SpaceX has cult-like elements, but SpaceX also has Mars, and Mars is much bigger than the cult-like elements; so if we define a cult to require that the biggest thing going on for it is the cultishness, then SpaceX is not a cult.
This is justified by LDSL (I really should write up the post explaining it...).
??????