I'd estimate approximately 12-15 direct meta-responses to your post within the next month alone, and I see no reason to expect the exponential to turn sigmoid on timescales that would render my argument below unlikely.
However, you can't use this argument because unlike the MLWDA, where I am arguably a random observer of LW DA instances (the thought was provoked by Michael Nielsen linking to Cosma Shalizi's notes on Mesopotamia and me thinking that the temporal distances are much less impressive if you think of them in terms of 'nth human to live', which immedia...
The Meta-LessWrong Doomsday Argument (MLWDA) predicts long AI timelines and that we can relax:
LessWrong was founded in 2009 (16 years ago), and there have been 44 mentions of the 'Doomsday argument' prior to this one; it is now 2025, so that's 2.75 mentions per year.
By the Doomsday argument, we medianly-expect mentions to stop after another 44 mentions over another 16 years, i.e. in 2041. (And our 95% CI on that 44 would then be +1 mention to +1,760 mentions, corresponding to late-2025 AD to 2665 AD.)
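The arithmetic, as a few lines of Python for concreteness (a sketch only: the ×40 tail factor is simply what reproduces the numbers above, rather than the textbook Gott ×39):

```python
# Sketch of the MLWDA arithmetic above; constant mention rate assumed.
past_mentions = 44
years_elapsed = 2025 - 2009              # 16 years since LW's founding
rate = past_mentions / years_elapsed     # 2.75 mentions/year

# Median: equally likely to be in the first or second half of all mentions,
# so expect as many future mentions as there have been past ones.
print(2025 + past_mentions / rate)                 # -> 2041.0

# ~95% interval: future mentions between past/40 and past*40
lo, hi = past_mentions / 40, past_mentions * 40    # ~+1 to +1,760 mentions
print(2025 + lo / rate, 2025 + hi / rate)          # -> ~2025.4, 2665.0
```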
By a curious coincidence, double-checking to see if...
The latter, especially, I thought was popularized because it was so surprisingly good at improving benchmark performance.
Inner-monologue is an example because as far as we know, it should have existed in pre-GPT-3 models and been constantly improving, but we wouldn't have noticed because no one would have been prompting for it and if they had, they probably wouldn't have noticed it. (The paper I linked might have demonstrated that by finding nontrivial performance in smaller models.) Only once it became fairly reliable in GPT-3 could hobbyists on 4chan ...
Musk has now admitted his link penalty is not 'merely' a simple fixed penalty on the presence of a link or anything like that, but about as perverse as is possible:
To be clear, there is no explicit rule limiting the reach of links in posts.
The algorithm tries (not always successfully) to maximize user-seconds on 𝕏, so a link that causes people to cut short their time here will naturally get less exposure.
Best to post a text/image/video summary of what’s at the link for people to view and then decide if they want to click the link.
So, the higher-qualit...
Kodo here is definitely a reference to "Kōdō" (random Knuth). I believe Duncan has written in the past about taking up perfume/tasting comparison as a hobby, hasn't he?
Also I suspect that there is some astronomically high k such that monkeys at a keyboard (i.e. "output random tokens") will outperform base models for some tasks by the pass@k metric.
It would be an extreme bias-variance tradeoff, yes.
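To spell out the intuition with made-up numbers: under pass@k, any sampler with a nonzero per-sample success rate eventually saturates, while a model whose success rate on that task is exactly zero never does. A toy sketch (the success probabilities are invented, not measurements):

```python
# Toy illustration only: per-sample success rates are invented placeholders.
def pass_at_k(p_success: float, k: float) -> float:
    """P(at least one of k independent samples succeeds)."""
    return 1.0 - (1.0 - p_success) ** k

p_monkeys = 1e-12   # random tokens: astronomically unlikely, but nonzero
p_model = 0.0       # a base model that, say, never emits the required format

for k in (1e3, 1e9, 1e12, 1e14):
    print(f"k={k:.0e}  monkeys={pass_at_k(p_monkeys, k):.3f}  "
          f"model={pass_at_k(p_model, k):.3f}")
```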
This has been a consistent weakness of OpenAI's image processing from the start: GPT-4-V came with clearcut warnings against using it on non-photographic inputs like screenshots or documents or tables, and sure enough, I found that it was wildly inaccurate on web page screenshots.
(In particular, I had been hoping to use it to automate Gwern.net regression detection: use a headless browser to screenshot random points in Gwern.net and report back if anything looked 'wrong'. It seemed like the sort of 'I know it when I see it' judgment task a VLM ought to be ...
In the first case, think of chess; superhuman chess still plays chess. You can watch AlphaZero’s games and nod along—even if it’s alien, you get what it's doing, the structure of the chess "universe" is such that unbounded intelligence still leads to mostly understandable moves.
I guess the question here is how much is 'mostly'? We can point to areas of chess like the endgame databases, which are just plain inscrutable: when the databases play out some mate-in-50 game because that is what is provably optimal by checking every possible move, any human und...
What domains of 'real improvement' exist that are uncoupled to human perceptions of improvement, but still downstream of text prediction?
As defined, this is a little paradoxical: how could I convince a human like you to perceive domains of real improvement which humans do not perceive...?
correctly guessing the true authors of anonymous text
See, this is exactly the example I would have given: truesight is an obvious example of a domain of real improvement which appears on no benchmarks I am aware of, but which appears to correlate strongly with the p...
I think it's a little more concerning that Dwarkesh has invested in this startup:
Mechanize is backed by investments from Nat Friedman and Daniel Gross, Patrick Collison, Dwarkesh Patel, Jeff Dean, Sholto Douglas, and Marcus Abramovitch.
And I do not see any disclosure of this in either the YouTube description or the Substack transcript at present.
EDIT: a disclosure has been added to both
In that brief moment of uncertainty, anything could have happened. If one person had just packed up and left, everyone might have followed suit. But nobody reacted. Perhaps what kept the room still was the fear of being perceived as scared. Or the belief that surely, bad things could not happen to them. Or maybe they’d heard enough false alarms in their lives. I’m not sure.
One of the most depressing things about the Replication Crisis in especially social psychology is that many results from the 1950s and 1960s failed to replicate at all... except the A...
At first glance, your linked document seems to match this. The herald who calls the printer "pig-headed" does so in direct connection with calling him "dull", which at least in modern terms would be considered a way of calling him stupid?
Not necessarily. 'Dull' can mean, in 1621 just as well as 2025, plenty of other things: eg "Causing depression or ennui; tedious, uninteresting, uneventful; the reverse of exhilarating or enlivening." (OED example closest in time: "Are my discourses dull? Barren my wit?" --Jonson's good friend & fellow playwright, W...
OP's example is correct and you are wrong. 'Pigheaded' is neither a proposed root cause analysis nor does it mean 'are dumb'; perhaps you should check a dictionary before correcting others' usage. It means stubborn, strong-willed, obstinate, often to the point of foolishness or taking very harmful actions, or to quote the OED: "Having a head like that of a pig. Chiefly figurative: stupidly obstinate, perverse, or set in one's ways." Note: it is "stupidly obstinate", and not "stupid". This is because pigs are notoriously smart but stubborn: very strong, hea...
But the caveat there is that this is inherently a backwards-looking result:
We consider GPT-4o (OpenAI, 2024), Claude-3.5-Sonnet (Anthropic, 2024), Grok-2 (xAI, 2024), Gemini-1.5-Pro (Google, 2024), and DeepSeek-V3 (DeepSeek-AI, 2024).
So one way to put it would be that people & classifiers are good at detecting mid-2024-era chatbot prose. Unfortunately, somewhere after then, at least OpenAI and Google apparently began to target the problem of ChatGPTese (possibly for different reasons: Altman's push into consumer companion-bots/personalization/soc...
I'm not sure this is a big problem. How much net attrition do you really expect over a decade, say? By which point who really cares? You will have so much more AI progress, and accumulated data (particularly if you've been gradually replacing the lower-level employees and you have an 'automation wave' moving through the organization where employees increasingly train their automated replacements or their job is simply reorganizing the jobs to enable automation).
It seems like to the extent there's much attrition at high levels, it is reduced in considerable...
Also notable: the big OpenAI reveal today was some sort of better personalization. Instead of the crude 'saved facts' personalization ChatGPT has had for a long time and which has never made much of a difference, they're doing... something. Unclear if it's merely RAG or if they are also doing something interesting like lightweight finetuning. But the GPTs definitely seem to have much better access to your other sessions in the web interface, and as far as I know, few other interfaces with frontier models have tried to do much personalization, so this will ...
I don't think this is true at all. How do you translate, say, rotating multiple shapes in parallel into text?
At least for multimodal LLMs in the pure-token approach like Gato or DALL-E 1 (and probably GPT-4o and Gemini, although few details have been published), you would be able to do that by generating the tokens which embody an encoded image (or video!) of several shapes, well, rotating in parallel. Then you just look at them.
Pursuit of novelty is not VNM-incoherent. Furthermore, it is an instrumentally convergent drive; power-seeking agents will seek novelty as well, because learning increases power in expectation (see: value of information).
Or to put it another way: any argument which convincingly proves that 'incoherent search processes ultimately outcompete coherent search processes' is also an argument which convinces a VNM agent to harness the superior incoherent search processes instead of the inferior coherent ones.
It's good someone else did it, but it has the same problems as the paper: not updated since May 2024, and limited to open source base models. So it needs to be started back up and add in approximate estimators for the API/chatbot models too before it can start providing a good universal capability benchmark in near-realtime.
One of the most robust benchmarks of generalized ability, which is extremely easy to update (unlike benchmarks like Humanity's Last Exam), would just be to estimate the pretraining loss (ie. the compression ratio).
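Concretely, something like the following turns any open-weight model into a datapoint; the model name and held-out file are placeholders, and for a real estimate you would chunk the text to fit the context window and average:

```python
# Sketch: estimate a model's pretraining loss as bits-per-byte (a compression
# ratio) on held-out text. Model and file names are placeholders.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"                       # any open-weight base model
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

text = open("heldout_sample.txt").read()  # hypothetical held-out sample
ids = tok(text, return_tensors="pt").input_ids

with torch.no_grad():
    # labels=ids returns the mean next-token cross-entropy in nats
    nats_per_token = model(ids, labels=ids).loss.item()

total_bits = nats_per_token * (ids.shape[1] - 1) / math.log(2)
print(f"{total_bits / len(text.encode('utf-8')):.3f} bits per byte (lower = better)")
```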
No, it would probably be a mix of "all of the above". FB is buying data from the same places everyone else does, like Scale (which we know from anecdotes like when Scale delivered FB a bunch of blatantly-ChatGPT-written 'human rating data' and FB was displeased), and was using datasets like books3
that are of reasonable quality. The reported hardware-efficiency numbers have never been impressive; they haven't really innovated in architecture or training method (even the co-distillation for Llama-4 is not new, eg. ERNIE was doing that like 3 years ago), and in...
That would be tricky because you are comparing apples and oranges. Consider that for the USA, there are only 11 cardinals (of 252 worldwide), while there are 10x more federal senators at any moment (I don't know if there would be more or less total: senators tend to be much younger but cardinals also tend to be long-lived), and I can't even guess how many 'Fortune 500 C-level employees' there might be given corporate turnover and the size of many 'C-suites' - tens of thousands, maybe? So your suggestions span ~1-3 orders of magnitude less selectivity than cardinals do.
whomever makes it into the college of cardinals.
I would be surprised if that was the primary homosexuality-enriching step, given that reporting has always been that quite a lot of low-level parish-level priests are also gay. (Note, for example, how many of the sexual abuse scandal victims were boys/men.) I would guess that it operates fairly steadily at all levels, starting from simply which young boys opt for the priesthood (known to be a demanding and difficult occupation even if the celibacy requirement is, for you, not so onerous) and operating from th...
We have not yet tried 4.5 as it's so expensive that we would not be able to deploy it, even for limited sections.
Still seems like potentially valuable information to know: how much does small-model smell cost you? What happens if you ablate reasoning? If it is factual knowledge and GPT-4.5 performs much better, then that tells you things like 'maybe finetuning is more useful than we think', etc. If you are already set up to benchmark all these OA models, then a datapoint from GPT-4.5 should be quite easy and just a matter of a small amount of chump change in comparison to the insight, like a few hundred bucks.
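Back-of-the-envelope, with every number a placeholder rather than OpenAI's actual rates or your actual benchmark size:

```python
# Rough benchmark-cost estimate; all figures are illustrative placeholders.
n_questions = 1_000
in_tokens, out_tokens = 1_000, 1_000      # per question, assumed
price_in, price_out = 75.0, 150.0         # $/1M tokens, roughly GPT-4.5-tier pricing
cost = n_questions * (in_tokens * price_in + out_tokens * price_out) / 1e6
print(f"~${cost:,.0f} for the full run")  # -> ~$225
```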
22.3 percent of cardinals are reported as eldest children. That compares to 21.6 percent which are youngest children. Eldest children are still favored.
I kept waiting for you to discuss this point: in considering analysis of cardinals (as opposed to ordinary random people), what about the other relevant birth-order effects? Like the... first-born eldest birth order effect, where first-borns are smarter, more extraverted, stabler, higher-SES etc. All of which sounds exactly like the sort of thing you need to rise through an extreme hierarchy to the top.
A...
The failure of the compute-rich Llama models to compete with the compute poorer but talent and drive rich Alibaba and DeepSeek
This seems like it's exaggerating the Llama failure. Maybe the small Llama-4s just released yesterday are a bit of a disappointment because they don't convincingly beat all the rivals; but how big a gap is that absolutely? When it comes to DL models, there's generally little reason to use #2; but that doesn't mean #2 was all that much worse and 'a failure' - it might only have been weeks behind #1. (Indeed, a model might've been ...
The human microbiome is irrelevant to this topic. The microbiome is highly heritable (usual twin studies & SNP heritabilities), and it is caused by genes and the environment, as well as being unstable; its direct causal effects in normal humans are minimal. We know that it is supremely irrelevant because environmental changes like antibiotics or new food or global travel which produce large changes in personal (and offspring) microbiomes do not produce large changes in intelligence (of oneself or offspring); and most dramatically, germ-free humans exist and ...
I don't really understand how a local copy of the weights gives the terrorists more practical control over the software's alignment. I don't think it's easy to manually tweak weights for so specific a purpose. Maybe they just mean the API is doing a good job of blocking sketchy requests?
You can finetune models for any specific purpose: just provide a few datapoints and train. The more specific the purpose, the easier tweaking the weights is, not harder. (Surely, if nothing else, you've seen all of the LoRAs and other things for finetuning image gener...
Yes, in dire straits. But it's usually called 'hyperinflation' when you try to make seignorage equivalent to >10% of GDP and fund the government through deliberately creating high inflation (which is on top of any regular inflation, of course). And because inflation is about expectations in considerable part, you can't stop it either. Not to mention what happens when you start hyperinflation.
(FWIW, this is a perfectly reasonable question to ask a LLM first. eg Gemini-2.5-pro will give you a thorough and sensible answer as to why this would be extraordin...
This model seems to contradict https://www.lesswrong.com/posts/pdaGN6pQyQarFHXF4/reward-is-not-the-optimization-target because it has, in fact, developed reward as the optimization target without ever being instructed to maximize reward.
It doesn't contradict Turntrout's post because his claims are about an irrelevant class of RL algorithms (model-free policy gradients). A model-based RL setting (like a human, or a LLM like Claude pretrained to imitate model-based RL agents in a huge number of settings ie. human text data) optimizes the reward, if it's ...
The intuition behind this approach draws from our understanding of selection in biological systems. Consider how medieval Europe dealt with violence:
This is a bad example because first, your description is incorrect (Clark nowhere suggests this in Farewell to Alms, as I just double-checked, because his thesis is about selecting for high-SES traits, not selecting against violence, and in England, not Europe - so I infer you are actually thinking of the Frost & Harpending thesis, which is about Western Europe, and primarily post-medieval England at th...
Experimentation is valuable for the high VoI, but it seems hard to encourage 'in general', because experimenting on anything is painful and difficult, and the more so the more important and valuable it is. So just 'subsidizing experiments' would be like 'subsidizing fixing bugs in source code'.
What would you do if you were a funder who wanted to avoid this? Well, you'd... fund specific experiments you knew were important and of high-value. Which is what the federal government and many other NGOs or philanthropists do.
......The loss of knowledge has been attributed to several factors. Firstly, Lind showed in his work that there was no connection between the acidity of the citrus fruit and its effectiveness at curing scurvy. In particular, he noted that acids alone (sulphuric acid or vinegar), would not suffice. Despite this, it remained a popular theory that any acid could be used in place of citrus fruit. This misconception had significant consequences.
When the Royal Navy changed from using Sicilian lemons to West Indian limes, cases of scurvy reappeared. The limes were
(Note for anyone confused why that Grok 3 archive snapshot looks 'cut off': there is a scroll-frame inside the page, which your browser may be hiding from you because everyone hides scrollbars these days. The conversation continues after "Hint: think like a number theorist / Thoughts".)
A google of the first paragraph takes you quickly to https://www.bluesci.co.uk/posts/forgotten-knowledge
I stumbled upon deep in the bowels of https://gwern.net/ which I've annoyingly never been able to find again.
Probably https://gwern.net/newsletter/2021/05#master-synthesis
I wouldn't have guessed just from the landing page that he's the discoverer of backprop, respected former program director at the NSF, etc.
That's what makes it alpha! If he was as legible as, say, Hinton, he would be mined out by now, and nothing but beta. (Similar situation to Schmidhuber - 'obvious crackpot' - although he's such a self-promoter that he overcomes it, and so at thi...
Are we sure that these questions aren’t in their datasets? I don’t think we can be. First off, you just posted them online.
Questions being online is not a bad thing. Pretraining on the datapoints is very useful, and does not introduce any bias; it is free performance, and everyone should be training models on the questions/datapoints before running the benchmarks (though they aren't). After all, when a real-world user asks you a new question (regardless of whether anyone knows the answer/label!), you can... still train on the new question then and there...
You can see it as an example of 'alpha' vs 'beta'. When someone asks me about the value of someone as a guest, I tend to ask: "do they have anything new to say? didn't they just do a big interview last year?" and if they don't but they're big, "can you ask them good questions that get them out of their 'book'?" Big guests are not necessarily as valuable as they may seem because they are highly-exposed, which means both that (1) they have probably already said everything they will say and there is no 'news' or novelty, and (2) they are message-disciplined a...
I agree with all of this! But I'm not sure I understand what you mean by "there may be mediation, but only in a weak sense". We were just interested in studying how models naturally learn in this RL setting
I am emphasizing that to me, this current mediation learning looks fragile and temporary, and is not a solid, long-term 'natural' thing - it is learning, but only as a temporary artificial heuristic that would wash away in the long run with more training or more diverse tasks etc.
My expectation is that in the limit, a model will learn to focus only on...
I would not believe that unless you have done a simulation study with the small n of this study, plausible levels of measurement error (alcoholism being much harder to measure than weight or body fat), with about a dozen covariates (to correspond to the different ways to slice the patients and threshold BMI etc), and then shown that you hardly ever get a false negative like this. My experience with doing such power analysis simulation studies for other things inclines me to think that people greatly overestimate how informative such small studies are once you allow for plausible levels of measurement error and (reverse) p-hacking degrees of freedom.
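The kind of simulation I have in mind, sketched below; the n, effect size, noise level, and covariate count are all made up for illustration:

```python
# Power-analysis sketch: small-n RCT with a real effect, heavy measurement
# error, and a dozen baseline covariates to "slice" on. All numbers invented.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_per_arm, true_effect, noise_sd = 30, 0.4, 1.5
n_sims, n_covariates = 5_000, 12

false_negatives = imbalance_hits = 0
for _ in range(n_sims):
    latent_t = true_effect + rng.normal(0, 1, n_per_arm)
    latent_c = rng.normal(0, 1, n_per_arm)
    # alcoholism is hard to measure, so add substantial measurement error
    obs_t = latent_t + rng.normal(0, noise_sd, n_per_arm)
    obs_c = latent_c + rng.normal(0, noise_sd, n_per_arm)
    if stats.ttest_ind(obs_t, obs_c).pvalue > 0.05:
        false_negatives += 1
    # check a dozen irrelevant baseline covariates for 'failed randomization'
    covs = rng.normal(0, 1, (n_covariates, 2, n_per_arm))
    if min(stats.ttest_ind(c[0], c[1]).pvalue for c in covs) < 0.05:
        imbalance_hits += 1

print(f"false-negative rate despite a real effect: {false_negatives / n_sims:.0%}")
print(f"runs with >=1 nominally 'imbalanced' covariate: {imbalance_hits / n_sims:.0%}")
```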
I don't think that study shows much either way: too small and underpowered to show much of anything (aside from the attrition undermining internal validity).
Dynomight's primary criticism doesn't hold much water because it is (un-pre-registered) reverse p-hacking. If you check enough covariates, you'll find a failure of randomization to balance on some covariate, and you can, if you wish, tell a post hoc story about how that is actually responsible for the overall mean difference. Nevertheless, randomization works, because on average why would any particular covariate be the way in which the confounding is mediated?
Just have to wait for more studies.
But did it inspire them to try to stop CelestAI or to start her? I guess you might need some more drinks for that one...
It's worth mentioning in this context that one of the most remarkable things about the recent wave of GLP-1/GIP drugs is that they seem to have large benefits on, for lack of a better word, willpower and psychiatry. Nor was this expected or predicted AFAIK, or clearly linked solely to the weight-loss: the justification in the animal experiments and early human trials was based purely on physiology, and then on the human diabetics reporting that they felt less hungry. So this is quite remarkable, and part of why GLP-1/GIP drugs are one of the best things to happe...
Do you think that these drugs significantly help with alcoholism (as one might posit if the drugs help significantly with willpower)? If so, I'm curious what you make of this Dynomight post arguing that so far the results don't look promising.
I don't think it's weird. Given that we know there are temporal trends towards increasing parameter size (despite Chinchilla), FLOPs, data, and continued progress in compute/data-efficiency (with various experience curves), any simple temporal chart will tend to show an increase unless you are specifically conditioning or selecting in some way to neutralize that. Especially when you are drawing with a fat marker on a log plot. Only if you had measured and controlled for all that and there was still a large unexplained residual of 'time' would you have to s...
In reality, we observe that roughly 85% of recommendations stay the same when flipping nationality in the prompt and freezing reasoning traces. This suggests that the mechanism for the model deciding on its recommendation is mostly mediated through the reasoning trace, with a smaller less significant direct effect from the prompt to the recommendation.
This might be less convincing than it seems, because the simple interpretation of the results to me seems to be something like, "the inner-monologue is unfaithful because in this setting, it is simply gene...
You would also expect that the larger models will be more sample-efficient, including at in-context learning of variations of existing tasks (which of course is what steganography is). So all scale-ups go much further than any experiment at small-scale like 8B would indicate. (No idea what 'medium-scale' here might mean.)
One possible interpretation here is going back to the inner-monologue interpretations as being multi-step processes with an error rate per step where only complete success is useful, which is just an exponential; as the number of steps increase from 1 to n, you get a sigmoid from ceiling performance to floor performance at chance. So you can tell the same story about these more extended tasks, which after all, are just the same sort of thing - just more so. We also see this sort of sigmoid in searching with a fixed model, in settings like AlphaZero in Hex...
While it's not possible to counter-signal with a suit in Japan, I feel the equivalent would be to wear traditional clothing like a samue or jinbei, which have their own set of challenges.
Yep. It can be pretty funny watching the contexts in which you can get away with a happi coat or a kimono/yukata; I can only speak from Japanese media rather than personal experience, but one thing I've noticed is that it seems a non-retired man wearing a kimono can still get away with it today as long as they are a sufficiently accomplished humanist or literary scholar...
This is an alarming point, as I find myself thinking about the DA today as well; I thought I was 'gwern', but it is possible I am 'robo' instead, if robo represents such a large fraction of LW-DA observer-moments. It would be bad to be mistaken about my identity like that. I should probably generate some random future dates and add them to my Google Calendar to check whether I am thinking about the DA that day and so have evidence I am actually robo instead.