GPT-4.5 is roughly a 10x scale-up of GPT-4, right? And full number jumps in GPT have always been ~100x? So GPT-4.5 seems like the natural name for OpenAI to go with.
10x is what it was, but it wasn't what it was supposed to be. That's just what they finally killed it at, after the innumerable bugs and other issues that they alluded to during the livestream and elsewhere, which is expected given the 'wait equation' for large DL runs - after a certain point, no matter how much you have invested, it's a sunk cost and you're better off starting afresh, such as, say, with distilled data from some sort of breakthrough model... (Reading between the lines, I suspect that what would become 'GPT-4.5' was one of the unknown projects besides Superalignment which suffered from Sam Altman overpromising compute quotas and gaslighting people about it, leading to an endless death march where they kept thinking 'we'll get the compute next month', and the 10x compute-equivalent comes from a mix of what compute they scraped together from failed runs/iterations and what improvements they could wedge in partway, even though that is not as good as doing it from scratch; see OA Rerun.)
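To spell out that 'wait equation' logic, here is a toy sketch (all numbers invented for illustration, not OA's actual figures): compare finishing a troubled run at its current effective speed against throwing it away and restarting with a better stack, and note that the compute already spent never enters the comparison.

```python
# Toy 'wait equation' for a troubled training run. Every number below is
# hypothetical; the only point is that sunk compute is irrelevant to the
# continue-vs-restart decision.

def finish_if_continue(remaining_work: float, current_speed: float) -> float:
    """Months left if we keep patching the existing buggy run."""
    return remaining_work / current_speed

def finish_if_restart(total_work: float, current_speed: float,
                      speedup: float, retooling_months: float) -> float:
    """Months if we restart from scratch with a better stack (fixed bugs,
    better data such as distillation from a stronger model, faster hardware),
    paying a retooling delay first."""
    return retooling_months + total_work / (current_speed * speedup)

# Hypothetical run: 100 units of work, 60 already done (sunk), 1 unit/month now.
total, done, speed = 100.0, 60.0, 1.0
print(finish_if_continue(total - done, speed))                  # 40.0 months
print(finish_if_restart(total, speed, speedup=3.0,
                        retooling_months=2.0))                  # ~35.3 months
# Restarting wins even though 60% of the original compute is 'wasted'.
```

The crossover obviously depends on how large the effective speedup is; the point is just that the 60 units already burned are a sunk cost and drop out of the decision entirely.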
At that time the median estimate for GPT-5 release was December 2024.
Which was correct ex ante, and mostly correct ex post - that's when OA had been dropping hints about releasing GPT-4.5 (which was clearly supposed to have been GPT-5); they seemingly changed their mind near Dec 2024 and spiked it, before the DeepSeek moment in Jan 2025 apparently changed their minds back and they released it in February 2025. (And GPT-4.5 is indeed a lot better than GPT-4 across the board. Just not a reasoning model or dominant over the o1-series.)
GPT was $20/month in 2023 and it's still $20/month.
Those are buying wildly different things. (They are not even comparable in terms of real dollars. That's like a 10% difference, solely from inflation!)
It’s not my view at all. I think a community will achieve much better outcomes if being bothered by the example message is considered normal and acceptable, and writing the example message is considered bad.
That's a strange position to hold on LW, where it has long been a core tenet that one should not be bothered by messages like that. And that has always been the case, whether it was LW2, LW1 (remember, say, 'babyeaters'? or 'decoupling'? or Methods of Rationality), Overcoming Bias (Hanson, 'politics is the mindkiller'), SL4 ('Crocker's Rules') etc.
I can definitely say on my own part that nothing of major value I have done as a writer online—whether it was popularizing Bitcoin or darknet markets or the embryo selection analysis or writing 'The Scaling Hypothesis'—would have been done if I had cared too much about "vibes" or how it made the reader feel. (Many of the things I have written definitely did make a lot of readers feel bad. And they should have. There is something wrong with you if you can read, say, 'Scaling Hypothesis' and not feel bad. I myself regularly feel bad about it! But that's not a bad thing.) Even my Wikipedia editing earned me doxes and death threats.
And this is because (among many other reasons) emotional reactions are inextricably tied up with manipulation, politics, and status - which are the very last things you want in a site dedicated to speculative discussion and far-out unpopular ideas, which will definitionally be 'creepy', 'icky', 'cringe', 'fringe', 'evil', 'bad vibes' etc. (Even the most brutal totalitarian dictatorships concede this when they set up free speech zones and safe spaces like the 'science cities'.)
Someone newly arrived at LW once wrote a good observation of the local culture about how this works:
Could being “status-blind” in the sense that Eliezer claims to be (or perhaps some other not yet well-understood status-related property) be strongly correlated to managing to create lots of utility? (In the sense of helping the world a lot).
Currently I consider Yudkowsky, Scott Alexander, and Nick Bostrom to be three of the most important people. After reading Superintelligence and watching a bunch of interviews, one of the first things I said about Nick Bostrom to a friend was that I felt like he legitimately has almost no status concerns (that was well before LW 2.0 launched). In the case of S/A it's less clear, but I suspect similar things.
Many of our ideas and people are (much) higher status than they used to be. It is no surprise people here might care more about status than they used to, in the same way that rich people care more about taxes than poor people.
But they were willing to be status-blind and not prize emotionality, and that is why they could become high-status. And barring the sudden discovery of an infallible oracle, we can continue to expect future high-status things to start off low-status...
I would also point out that, despite whatever she said in 1928 about her 1909 inheritance, Woolf committed suicide in 1941 after extensive mental health challenges, which included "short periods in 1910, 1912, and 1913" in a kind of insane asylum, and only afterwards began her serious writing career (which WP describes as heavily motivated by her psychiatric problems, as a refuge/self-therapy); so one can certainly question her own narrative of the benefits of her UBI or the reasons she began writing. (I will further note that the psychological & psychiatric benefits of UBI RCTs have not been impressive to date.)
Update: Bots are still beaten by human forecasting teams/superforecasters/centaurs on truly held-out Metaculus problems as of early 2025: https://www.metaculus.com/notebooks/38673/q1-ai-benchmarking-results/
A useful & readable discussion of various methodological problems (including the date-range search problems above) which render all forecasting backtesting dead on arrival (IMO) was recently compiled as "Pitfalls in Evaluating Language Model Forecasters", Paleka et al 2025, and is worth reading if you are at all interested in the topic.
Personality traits are an especially nasty danger because, given the combination of stabilizing selection + non-additive variance + high social homogamy/assortative mating + many personality traits with substantial heritability, you can probably create extreme self-sustaining non-coercive population structure with a package of edits. I should probably write some more about this, because I think that embryo selection doesn't create this danger (or in general result in the commonly-feared 'speciation'), but embryo editing/synthesis does.
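To make the homogamy part of that concrete, here is a deliberately crude sketch (parameters invented; it models only the assortative-mating ingredient and ignores stabilizing selection and non-additive variance): it measures how much an edit package shifting a trait by a few SD, combined with strong phenotypic assortative mating, reduces cross-group matings (i.e. gene flow) relative to random mating.

```python
# Crude illustration of homogamy as 'soft' reproductive isolation.
# All parameters are invented; only the assortative-mating mechanism is modeled.
import numpy as np

rng = np.random.default_rng(0)
N, EDIT_SHIFT = 100_000, 4.0                 # population size, edit effect in SDs

edited = np.zeros(N, dtype=bool)
edited[: N // 10] = True                     # 10% carry the edit package
phenotype = rng.normal(0, 1, N) + EDIT_SHIFT * edited

def cross_group_mating_rate(assortative: bool) -> float:
    """Fraction of pairings with one edited and one unedited partner."""
    order = np.argsort(phenotype) if assortative else rng.permutation(N)
    a, b = edited[order[0::2]], edited[order[1::2]]
    return float(np.mean(a != b))

print("assortative mating:", cross_group_mating_rate(True))   # only the small overlap region mixes
print("random mating:     ", cross_group_mating_rate(False))  # ~0.18 (= 2*0.1*0.9)
```

With these made-up numbers, cross-group pairings fall by roughly an order of magnitude compared to random mating; the stabilizing-selection and non-additive-variance ingredients are what would then make the resulting structure self-sustaining rather than merely slow to erode.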
Key lesson:
One conclusion we have drawn from this is that the most important factor for good forecasting is the base model, and additional prompting and infrastructure on top of this provide marginal gains.
Scaling remains undefeated.
If it's not obvious at this point why, I would prefer to not go into it here in a shallow superficial way, and refer you to the OA coup discussions.