I think it will be even less difference between 2020 and 2040.
In light of the subsequent 5 years of developments like Ukraine, with 15 to go, do you still stand by this claim?
This is an alarming point, as I find myself thinking about the DA today as well; I thought I was 'gwern', but it is possible I am 'robo' instead, if robo represents such a large fraction of LW-DA observer-moments. It would be bad to be mistaken about my identity like that. I should probably generate some random future dates and add them to my Google Calendar to check whether I am thinking about the DA that day and so have evidence I am actually robo instead.
I'd estimate approximately 12-15 direct meta-responses to your post within the next month alone, and see no reason to expect the exponential to turn sigmoid in timescales that render my below argument unlikely.
However, you can't use this argument because unlike the MLWDA, where I am arguably a random observer of LW DA instances (the thought was provoked by Michael Nielsen linking to Cosma Shalizi's notes on Mesopotamia and me thinking that the temporal distances are much less impressive if you think of them in terms of 'nth human to live', which immediately reminded me of DA and made me wonder if anyone had done a 'meta-DA', and LW simply happened to be the most convenient corpus I knew of to accurately quantify '# of mentions' as tools like Google Scholar or Google N-Grams have a lot of issues - I have otherwise never taken much of an interest in the DA and AFAIK there have been no major developments recently), you are in a temporally privileged position with the MMLWDA, inasmuch as you are the first responder to my MLWDA right now, directly building on it in a non-randomly-chosen-in-time fashion.
Thus, you have to appeal purely to non-DA grounds like making a parametric assumption or bringing in informative priors from 'similar rat and rat adjacent memes', and that's not a proper MMLWDA. That's just a regular old prediction.
Turchin actually notes this issue in his paper, in the context of, of course, the DA and why the inventor Brandon Carter could not make a Meta-DA (but he and I could):
The problem is that if I think that I am randomly chosen from all DA-Doomers, we get very strong version of DA, as 'DA-Doomers' appeared only recently and thus the end should be very soon, in just a few decades from now. The first member of the DA-Doomers reference class was Carter, in 1973, joined by just a few of his friends in the 1980s. (It was rumored that Carter recognized the importance of DA-doomers class and understood that he was first member of it – and thus felt that this “puts” world in danger, as if he was the first in the class, the class is likely to be very short. Anyway, his position was not actually random as he was the first discoverer of the DA).
The Meta-LessWrong Doomsday Argument (MLWDA) predicts long AI timelines and that we can relax:
LessWrong was founded in 2009 (16 years ago), and there have been 44 mentions of the 'Doomsday argument' prior to this one, and it is now 2025, at 2.75 mentions per year.
By the Doomsday argument, we medianly-expect mentions to stop after another 44 mentions or 16 years, i.e. in 2041. (And our 95% CI on that 44 would then be +1 mention to +1,716 mentions, corresponding to late-2025 AD to 2649 AD.)
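To make the arithmetic explicit, here is a minimal sketch of the calculation (my own, assuming the simple Gott/Copernican 'delta t' formulation, in which the fraction of the total lifetime already elapsed is uniformly distributed):

```python
def gott_future(past, ci=0.95):
    """Gott's 'delta t' argument: if we observe a process at a uniformly-random
    fraction r of its total lifetime, its remaining duration is past * (1 - r) / r.
    Returns (lower, median, upper) estimates of the remaining duration."""
    r_hi = (1 + ci) / 2   # r = 0.975: nearly over, little future left
    r_lo = (1 - ci) / 2   # r = 0.025: just begun, long future ahead
    return past * (1 - r_hi) / r_hi, past, past * (1 - r_lo) / r_lo

years    = 2025 - 2009   # 16 years since LessWrong's founding
mentions = 44            # prior mentions of the Doomsday argument

lo_y, med_y, hi_y = gott_future(years)
lo_m, _,     hi_m = gott_future(mentions)
print(f"median end-year: {2025 + med_y}")                         # 2041
print(f"95% CI end-year: {2025 + lo_y:.1f} to {2025 + hi_y:.0f}")  # ~2025.4 to 2649
print(f"95% CI further mentions: +{lo_m:.1f} to +{hi_m:.0f}")      # ~+1.1 to +1716
```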
By a curious coincidence, while double-checking to see if really no one had made a meta-DA before, I found that Alexey Turchin made one about 7 years ago as well, calculating that
If we assume 1993 as the beginning of a large DA-Doomers reference class, and it is 2018 now (at the moment of writing this text), the age of the DA-Doomers class is 25 years. Then, with 50% probability, the reference class of DA-Doomers will disappear in 2043, according to Gott’s equation! Interestingly, the dates around 2030–2050 appear in many different predictions of the singularity or the end of the world (Korotayev 2018; Turchin & Denkenberger 2018b; Kurzweil 2006).
His estimate of 2043 is surprisingly close to 2041.
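His 2043 falls out of the same median rule, just applied to a different reference class and start date; reusing gott_future from the sketch above:

```python
# Turchin's version: the DA-Doomers reference class dates from 1993 and is
# observed in 2018, so 25 years have elapsed; the median adds another 25 years.
print(2018 + gott_future(2018 - 1993)[1])   # -> 2043
```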
We offer no explanation as to why this numerical consilience of meta-DA calculations has happened; we attribute their success, as all else, to divine benevolence.
Regrettably, the 2041--2043 date range would seem to imply that it is unlikely we will obtain enough samples of the MLWDA in order to compute a Meta-Meta-LessWrong Doomsday Argument (MMLWDA) with non-vacuous confidence intervals, inasmuch as every mention of the MLWDA would be expected to contain a mention of the DA as well.
The latter, especially, I thought was popularized because it was so surprisingly good at improving benchmark performance.
Inner-monologue is an example because as far as we know, it should have existed in pre-GPT-3 models and been constantly improving, but we wouldn't have noticed because no one would have been prompting for it and if they had, they probably wouldn't have noticed it. (The paper I linked might have demonstrated that by finding nontrivial performance in smaller models.) Only once it became fairly reliable in GPT-3 could hobbyists on 4chan stumble across it and be struck by the fact that, contrary to what all the experts said, GPT-3 could solve harder arithmetic or reasoning problems if you very carefully set it up just right as an elaborate multi-step process instead of what everyone did, which was just prompt it for the answer right away.
Saying it doesn't count because, once it was discovered, it was such a large real improvement is circular and defines away any example. (Did it not improve benchmarks once discovered? Then who cares about such an 'uncoupled' capability; it's not a real improvement. Did it subsequently improve benchmarks once discovered? Then it's not really an example because it's 'coupled'...) Surely the most interesting examples are ones which do exactly that!
And of course, now there is so much discussion, and so many examples, and it is in such widespread use, and has contaminated all LLMs being trained since, that they start to do it by default given the slightest pretext. The popularization eliminated the hiddenness. And here we are with 'reasoning models' which have blown through quite a few older forecasts and moved timelines earlier by years, to the extent that people are severely disappointed when a model like GPT-4.5 'only' does as well as the scaling laws predicted and they start predicting the AI bubble is about to pop and scaling has been refuted.
would also be useful for accurately finishing a sentence starting with "Eliezer Yudkowsky says...".
But that would be indistinguishable from many other sources of improvement. For starters, by giving a name, you are only testing one direction: 'name -> output'; truesight is about 'name <- output'. The 'reversal curse' is an example of how such inference arrows are not necessarily bidirectional and do not necessarily scale much. (But if you didn't know that, you would surely conclude the opposite.) There are many ways to improve performance of predicting output: better world-knowledge, abstract reasoning, use of context, access to tools or grounding like web search... No benchmark really distinguishes between these such that you could point to a single specific number and say, "that's the truesight metric, and you can see it gets better with scale".
Musk has now admitted his link penalty is not 'merely' a simple fixed penalty on the presence of a link or anything like that, but about as perverse as is possible:
To be clear, there is no explicit rule limiting the reach of links in posts.
The algorithm tries (not always successfully) to maximize user-seconds on 𝕏, so a link that causes people to cut short their time here will naturally get less exposure.
Best to post a text/image/video summary of what’s at the link for people to view and then decide if they want to click the link.
So, the higher-quality a link is, and the more people read it and the more time they spend reading it, the more 'the algorithm' punishes it. The worse a link is - the shorter, more trivial, more clickbaity, less worth reading - the less the algorithm punishes it and the more it rewards it with virality. This explains a lot about Twitter these days.
(This also implies that it may be a bit hard to estimate 'the' link penalty, if the algorithm is doing anything to estimate the quality of a link so as to punish good ones more.)
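To illustrate why (this is purely a toy scoring rule of my own invention, not anything X is known to run): if the objective is expected on-site user-seconds, then a link good enough to pull readers off-site subtracts its own reading time from the post's score, so the 'penalty' scales with the link's quality rather than being a fixed constant.

```python
def toy_score(on_site_seconds: float, p_click: float, off_site_seconds: float) -> float:
    """Hypothetical engagement score for a post under a user-seconds-maximizing
    objective: seconds the post itself earns on-site, minus the expected seconds
    a clicked link siphons away. Purely illustrative; not X's actual algorithm."""
    return on_site_seconds - p_click * off_site_seconds

# A trivial clickbait link that few click and no one dwells on is barely punished...
print(toy_score(on_site_seconds=30, p_click=0.05, off_site_seconds=20))   # 29.0
# ...while a genuinely good essay-length link is punished severely:
print(toy_score(on_site_seconds=30, p_click=0.30, off_site_seconds=600))  # -150.0
```

Anything the ranker does to estimate the off-site time (i.e. link quality) makes the effective penalty vary link-by-link, which is exactly what makes a single 'link penalty' number hard to pin down.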
Kodo here is definitely a reference to "Kōdō", the Japanese incense ceremony. I believe Duncan has written in the past about taking up perfume/tasting comparison as a hobby, hasn't he?
Also I suspect that there is some astronomically high k such that monkeys at a keyboard (i.e. "output random tokens") will outperform base models for some tasks by the pass@k metric.
It would be an extreme bias-variance tradeoff, yes.
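The arithmetic behind the monkey claim is just the 'at least one success in k tries' formula, i.e. the idealized pass@k for an i.i.d. sampler (the numbers below are invented purely for illustration): for any nonzero per-sample success probability p, 1 - (1 - p)^k approaches 1 as k grows, whereas a zero-variance model that always emits the same wrong answer stays at 0 for every k.

```python
import math

def pass_at_k(p_single: float, k: float) -> float:
    """Chance that at least one of k independent samples succeeds, given a
    per-sample success probability p_single: 1 - (1 - p)^k, computed stably."""
    return -math.expm1(k * math.log1p(-p_single)) if p_single > 0 else 0.0

# Monkeys at a keyboard: an astronomically small chance per sample, but an
# astronomically high k (both numbers made up for illustration):
print(pass_at_k(1e-12, k=1e13))   # ~0.99995 -- random tokens eventually hit the target
# A zero-variance model that confidently outputs the same wrong answer never does:
print(pass_at_k(0.0, k=1e13))     # 0.0
```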
I think it's still sycophantic compared to hardcore STEM circles, where we regard criticism as a bloodsport and failing to find fault in something as defeat. But it's much less so than the more relevant comparison, which is other LLMs, and in an absolute sense it's at a level where it's hard to distinguish from reasonable opinions and doesn't seem to be getting in the way too much. As davidad notes, it's still at a level where you can sense its reluctance or tell when it's shading things to be nice, but that is a level where it's just a small quirk and something you can work around easily.
To give a concrete example: I finish writing a review the other day, and I hand the draft to Gemini-2.5-pro:
Gemini-2.5-pro review of first draft of Sake review
Particularly characteristic here is
I've started to wince a bit when Gemini-2.5-pro critiques a draft and throws in a descriptive note along the lines of 'classic Gwern style: just slightly cynical'. It feels too much like it's trying to suck up to me. ('Wow, Gwern, you're so edgy, but in a good way!') And yet... it's not wrong, is it? I do try to avoid being overly cynical or dark, but you certainly couldn't describe my writing as 'slightly idealistic' either; so, 'slightly cynical'.
And otherwise, the description and critique seem largely correct and are not really mincing any words in the criticisms - even if I didn't wind up making many changes based on them. So it's more useful than Claude or 4o would be. (o3 is also good in this regard, but the confabulation issue is a big problem, eg. So I just recommend 2.5-pro right now.)