Rudi C

Comments

Rudi C

But isn't the outside view that LLMs are hitting a wall and remain "stochastic parrots" correct? GPT-4o has been weaker and cheaper than GPT-4 Turbo in my experience, and the same is true of GPT-4 Turbo vs. GPT-4. The two versions of GPT-4 seem about the same. Opus is a bit stronger than GPT-4, but not by much and not on every topic. Both Opus and GPT-4 exhibit the patterns of a stochastic autocompleter, not a logician. (Humans aren't that much better, of course. People are terrible at even trivial math. Logic and creativity are difficult.) DALL-E and its peers don't really have an artistic sense, and still need prompt engineering to produce beautiful art. Gemini 1.5 Pro is even weaker than GPT-4, and I've heard Gemini Ultra has been retired from public access. All of these models get worse as their context grows, and their grasp of long-range dependencies is terrible.

The pace is of course still not too bad compared with other technologies, but there doesn't seem to be a long-context, "Q*"-style GPT-5 in store from any company.

PS: Does lmsys do anything to control for the speed effect? GPT-4o is very fast, and that alone should account for many Elo points.
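
For intuition on how much "many Elo points" could be: arena-style leaderboards fit ratings to pairwise human votes via the standard Bradley-Terry/Elo relation, so even a modest speed-driven preference bump implies a sizable rating gap. A back-of-the-envelope sketch in Python (the 5% bump is a made-up illustrative number, not measured data):

```python
import math

def implied_elo_gap(win_rate: float) -> float:
    """Elo rating difference implied by a head-to-head win rate,
    using the standard logistic (Bradley-Terry) relation."""
    return 400 * math.log10(win_rate / (1 - win_rate))

# Hypothetical: if speed alone shifts votes from 50% to 55%
# between two otherwise equally capable models...
print(round(implied_elo_gap(0.55)))  # ~35 Elo points from speed alone
```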

Answer by Rudi C

Persuasive AI voices might just make all voices less persuasive. Modern life is full of such fake superstimuli anyway.

Rudi C

Can you create a podcast of these posts read by AI? They're difficult to consume otherwise.

Rudi C

I doubt this. IMO, test-based admissions don't benefit from tutoring (at the highest percentiles, compared to fewer hours of disciplined self-study). We Asians just like to optimize the hell out of these tests, and most parents aren't sure whether tutoring helps, so they register their children for many extra classes. Outside the US, there aren't that many alternative paths to success, and the prestige of scholarship is also higher.

Also, tests are somewhat robust to Goodharting, unlike most other measures. If the tests eat your childhood, you'll at least learn a thing or two. I think this is because the Goodhartable parts are easy enough that all the high-g people learn them quickly in the first years of schooling, so the remaining effort goes into actually learning the material by doing more advanced exercises. Even solving multiple-choice math questions by "wrong" methods that only work on multiple-choice questions is educational and can come in handy in real work.

Rudi C

AGI might increase the risk of totalitarianism. OTOH, a shift in the attack-defense balance could potentially boost the veto power of individuals, so it might also work as a deterrent or a force for anarchy.

This is not the crux of my argument, however. The current regulatory Overton window seems to heavily favor a selective pause on AGI, such that centralized powers will continue ahead, even if more slowly due to their inherent inefficiencies. Nuclear development provides further historical evidence for this pattern. Closed AGI development will almost surely lead to a dystopian totalitarian regime. LessWrong's track record is not rosy here; the "Pivotal Act" still seems to enjoy popular favor, and OpenAI has significantly accelerated closed AGI development while lobbying to close off open research and pioneering the new "AI Safety" that, as of 2024, has been nothing but censorship and doublethink.

Rudi C

A core disagreement is over “more doomed.” Human extinction is preferable to a totalitarian stagnant state. I believe that people pushing for totalitarianism have never lived under it.

Rudi C

ChatGPT isn't a substitute for an NYT subscription. It wouldn't work at all without browsing, and with browsing enabled it would probably get blocked, both by NYT via its user agent and by OpenAI's "alignment." Even if it weren't blocked, it would be slower than skimming the article manually, and its output wouldn't be trustworthy.

OTOH, NYT could spend pennies to put an AI TL;DR at the top of each of its pages. It could even use its own models, as Semantic Scholar does. Anybody frugal enough to prefer the much worse experience of ChatGPT would not have paid NYT in the first place; the paywall can be bypassed trivially.

In fact, why don't NYT authors write a TL;DR themselves? Most of their articles are not worth reading in full. Isn't the lack of a summary an anti-user feature meant to artificially inflate the volume of their offering?

NYT would, if anything, benefit from LLMs potentially degrading the average quality of the competing free alternatives.

The counterfactual version of GPT-4 trained without NYT content is extremely unlikely to have been a worse model. It's like removing sand from a mountain.

The whole case is an example of rent-seeking post-capitalism.

Rudi C

This is unrealistic. It assumes:

  • Orders of magnitude more intelligence
  • That such intelligence would actually be useful in a physical world with physical limits

The more worrying prospect is that the AI might not necessarily fear suicide. Suicidal actions are quite prevalent among humans, after all.

Answer by Rudi C

In estimated order of importance:

  • Just trying harder for years to build better habits (i.e., not giving up on boosting my productivity as a lost cause)
  • Time tracking
  • (Trying to) abandon social media
  • Exercising (running)
  • Having a better understanding of how to achieve my goals
  • Socializing with more productive people
  • Accepting real responsibilities that make me accountable to other people
  • Keeping a daily journal of what I have spent each day doing (high-level as opposed to the low-level time tracking above)

The first two seem to be the fundamental ones, really. Some of the rest follow naturally from those two (for me).
