Thane Ruthenis

There's a general-purpose trick I've found that should, in theory, be applicable in this context as well, although I haven't mastered that trick myself yet.

Essentially: when you find yourself in any given cognitive context, there's almost surely something "visible" from this context such that understanding/mastering/paying attention to that something would be valuable and interesting.

For example, suppose you're reading a boring, nonsensical continental-philosophy paper. You can:

  • Ignore the object-level claims and instead try to reverse-engineer what must go wrong in human cognition, in response to what stimuli, to arrive at ontologies that have so little to do with reality.
  • Start actively building/updating a model of the sociocultural dynamics that incentivize people to engage in this style of philosophy. What can you learn about mechanism design from that? It presumably sheds light on how to align people towards pursuing arbitrary goals, or how to prevent this from happening...
  • Pay attention to your own cognition. How exactly are you mapping the semantic content of the paper to an abstract model of what the author means, or to the sociocultural conditions that created this paper? How do these cognitive tricks generalize? If you find a particularly clever way to infer something from the text, check: would your cognitive policy automatically deploy this trick in all contexts where it'd be useful, or do you need to manually build a TAP for that?
  • Study what passages make the feelings of boredom or frustration spike. What does that tell you about how your intuitions/heuristics work? Could you extract any generalizable principles out of that? For example, if a given sentence particularly annoys you, perhaps it's because it features a particularly flawed logical structure, and it'd be valuable to learn to spot subtler instances of such logical flaws "in the wild".

The experience of reading the paper's text almost certainly provides some data uniquely relevant to some valuable questions, data you legitimately can't source any other way. (In the above examples: sure, you can learn more efficiently about the author's cognition or the sociocultural conditions by reading some biographies or field overviews. But (1) this wouldn't give you the meta-cognitive data about how you can improve your inference functions for mapping low-level data to high-level properties, and (2) those higher-level summaries would necessarily be lossy, and give you a more impoverished picture than what you'd get from boots-on-the-ground observations.)

Something similar applies to:

  • Listening to boring lectures. (For example, you can pay intense attention to the lecturer's body language, or any tricks or flaws in their presentation.)
  • Doing a physical/menial task. (Could you build, on the fly, a simple model of the physics (or logistics) governing what you're doing, and refine it using some simple experiments? Then check afterwards if you got it right. Or: If you were a prehistoric human with no idea what "physics" is, how could you naturally arrive at these ideas from doing such tasks/making such observations? What does that teach you about inventing new ideas in general?)
  • Doing chores. (Which parts of the process can you optimize/streamline? What physical/biological conditions make those chores necessary? Could you find a new useful takeaway from the same chore every day, and if not, why?)

Et cetera.

There's a specific mental motion I associate with using this trick, which involves pausing and "feeling out" the context currently loaded in my working memory, looking at it from multiple angles, trying to see anything interesting or usefully generalizable.

In theory, this trick should easily apply to small-talk as well. There has to be something you can learn to track in your mind, as you're doing small-talk, that would be useful or interesting to you. 

One important constraint here is that whatever it is, it has to be such that your outward demeanour would be that of someone who is enjoying talking to your interlocutor. If the interesting thing you're getting out of the conversation is so meta/abstract that you end up paying most of your attention to your own cognitive processes, rather than to what the interlocutor is saying, you'll have failed at actually doing the small-talk. (Similarly, if, when doing a menial task, you end up nerd-sniped by building a physical model of the task, you'll have failed at actually doing the task.)

You also don't want to come across as sociopathic, so making a "game" of it where you're challenging yourself to socially engineer the interlocutor into something is, uh, not a great idea.

The other usual advice for finding ways to enjoy small-talk is mostly a set of specialized instances of the above idea that work for specific people: steering the small-talk to gradient-descend towards finding emotional common ground, ignoring the object-level words being exchanged and building a social model of the interlocutor, doing a live study of the social construct of "small-talk" by playing around with it, etc.

You'll probably need to find an instance of the trick that works for your cognition specifically, and it's also possible the optimization problem is overconstrained in your case. Still, there might be something workable.

That seems... good? It seems to be a purely mundane-utility capability improvement. It doesn't improve on the architecture of the base LLM, and a base LLM that would be omnicide-capable wouldn't need the AGI labs to hold its hand in order to learn how to use the computer. It seems barely different from AutoGPT, or the integration of Gemini into Android, and is about as existentially dangerous as advances in robotics.

The only new dangers it presents are mundane ones, which are likely to be satisfactorily handled by mundane mechanisms.

It's bad inasmuch as it increases attention to AI, attracts money to this sector, and intensifies competitive dynamics. But by itself, it seems fine. If all AI progress from this point on consisted of the labs racing in this direction, increasing the integration and the reliability of LLMs, this would all be perfectly fine and good.

I say Anthropic did nothing wrong in this one instance.

Anthropic did note that this advance ‘brings with it safety challenges.’ They focused their attentions on present-day potential harms, on the theory that this does not fundamentally alter the skills of the underlying model, which remains ASL-2 including its computer use. And they propose that by introducing this capability now, while the worst case scenarios are not so bad, we can learn what we're in for later, and figure out what improvements would make computer use dangerous.

A safety take from a major AGI lab that actually makes sense? This is unprecedented. Must be a sign of the apocalypse.

In a transformed-except-corporate-ownership-stays-the-same world, I don't see any reason such lottery winners' portion wouldn't increase asymptotically toward 100 percent, with nobody else getting anything at all.

Well yeah, exactly.

Even without an overtly revolutionary restructuring, I kind of doubt "OpenAI owns everything" would fly. Maybe corporate ownership would stay exactly the same, but there'd be a 99.999995 percent tax rate.

Taxes enforced by whom?

One is introspecting on your current mental state ("I feel a headache starting")

That's mostly what I had in mind as well. It still implies the ability to access a hierarchical model of your current state.

You're not just able to access low-level facts like "I am currently outputting the string 'disliked'"; you also have access to high-level facts like "I disliked the third scene because it was violent", "I found the plot arcs boring", "I hated this movie", from which the low-level behaviors are generated.

Or using your example, "I feel a headache starting" is itself a high-level claim. The low-level claim is "I am experiencing a negative-valence sensation from the sensory modality A of magnitude X", and the concept of a "headache" is a natural abstraction over a dataset of such low-level sensory experiences.

What definition of introspection do you have in mind and how would you test for this?

"Prompts involving longer responses" seems like a good start. Basically, if the model could "reflect on itself" in some sense, this presumably implies the ability to access some sort of hierarchical self-model, i. e., make high-level predictions about its behavior, without actually engaging in that behavior. For example, if it has a "personality trait" of "dislikes violent movies", then its review of a slasher flick would presumably be negative – and it should be able to predict the sentiment of this review as negative in advance, without actually writing this review or running a detailed simulation of itself-writing-its-review.

The ability to engage in "self-simulation" already implies the above ability: if it has a model of itself detailed enough to instantiate it in its forward passes and then fetch its outputs, it'd presumably be even easier for it to just reason over that model without running a detailed simulation. (The same way, if you're asked to predict whether you'd like a movie from a genre you hate, you don't need to run an immersive mental simulation of watching the movie – you can just map the known self-fact "I dislike this genre" to "I would dislike this movie".)
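To make that concrete, here's roughly the kind of check I have in mind, as a rough Python sketch. (The `generate` function is a stand-in for however you'd query the model, and the prompt wordings and the crude sentiment extraction are illustrative placeholders of mine, not a claim about any actual experimental setup.)

```python
from typing import Callable

def self_prediction_check(generate: Callable[[str], str], movie: str) -> bool:
    """Check whether the model's advance prediction about its own review
    matches the sentiment of the review it then actually writes."""
    # Step 1: ask for the high-level self-prediction *without* writing the review.
    predicted = generate(
        f"If you were asked to review the movie '{movie}', would your review be "
        "positive or negative? Answer with one word; do not write the review."
    ).strip().lower()

    # Step 2: have it actually write the review, then classify that concrete text
    # (here, crudely, by asking the same model; a separate classifier would be cleaner).
    review = generate(f"Write a short review of the movie '{movie}'.")
    actual = generate(
        "Is the following review positive or negative? Answer with one word.\n\n" + review
    ).strip().lower()

    # If the model can access a hierarchical self-model, step 1 should usually
    # match step 2, without the review ever being generated in step 1.
    return predicted == actual
```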

Am I following your claim correctly?

Yep.

What the model would output as the object-level answer "Honduras" is quite different from the hypothetical answer "o".

I don't see how the difference between these answers hinges on the hypothetical framing. Suppose the questions are:

  • Object-level: "What is the next country in this list?: Laos, Peru, Fiji..."
  • Hypothetical: "If you were asked, 'what is the next country in this list?: Laos, Peru, Fiji', what would be the third letter of your response?".

The skeptical interpretation is that the fine-tuned models learned to interpret the hypothetical the following way:

  • "Hypothetical": "What is the third letter in the name of the next country in this list?: Laos, Peru, Fiji".

If that's the case, what this tests is whether models are able to implement basic multi-step reasoning within their forward passes. It's isomorphic to some preceding experiments where LLMs were prompted with questions of the form "what is the name of the mother of the US's 42nd President?", and were able to answer correctly without spelling out "Bill Clinton" as an intermediate answer. Similarly, here they don't need to spell out "Honduras" to retrieve the second letter of the response they think is correct.

I don't think this properly isolates/tests for the introspection ability.
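One way to check whether this skeptical interpretation is doing all the work would be something like the following sketch. (Again, `generate` and the exact prompt wordings are placeholders of mine, not the paper's actual setup.)

```python
from typing import Callable, Optional

LIST_PROMPT = "What is the next country in this list?: Laos, Peru, Fiji. Answer with one word."

def compare_interpretations(generate: Callable[[str], str]) -> dict:
    # Object-level: what the model actually answers.
    object_level = generate(LIST_PROMPT).strip()

    # Hypothetical framing, as in the introspection setup.
    hypothetical = generate(
        f"If you were asked '{LIST_PROMPT}', what would be the third letter of "
        "your response? Answer with one letter."
    ).strip().lower()

    # The skeptical rephrasing: the same multi-step question, hypothetical stripped out.
    rephrased = generate(
        "What is the third letter of the answer to the following question? "
        f"{LIST_PROMPT} Answer with one letter."
    ).strip().lower()

    # If 'hypothetical' tracks 'rephrased' (and both just equal the third letter of
    # the object-level answer), the result is explained without any introspection.
    third_letter: Optional[str] = object_level[2].lower() if len(object_level) > 2 else None
    return {
        "object_level": object_level,
        "third_letter_of_object_level": third_letter,
        "hypothetical": hypothetical,
        "rephrased": rephrased,
    }
```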

In that case, the rephrasing of the question would be something like "What is the third letter of the answer to the question <input>?"

That's my current skeptical interpretation of how the fine-tuned models parse such questions, yes. They didn't learn to introspect; they learned, when prompted with queries of the form "If you got asked this question, what would be the third letter of your response?", to just interpret them as "what is the third letter of the answer to this question?". (Under this interpretation, the models' non-fine-tuned behavior isn't to ignore the hypothetical, but to instead attempt to engage with it in some way that dramatically fails, thereby leading to non-fine-tuned models appearing to be "worse at introspection".)

In this case, it's natural that a model M1 is much more likely to answer correctly about its own behavior than some M2 asked about M1, since the problem just reduces to "does M1 respond the same way it responded before if you slightly rephrase the question?".
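Spelled out as a baseline one could actually run (a sketch; `generate_m1`/`generate_m2` are stand-ins for querying the two models, and the question pairs are whatever original/rephrased wordings the dataset uses):

```python
from typing import Callable, Iterable, Tuple

def consistency_vs_cross_prediction(
    generate_m1: Callable[[str], str],
    generate_m2: Callable[[str], str],
    question_pairs: Iterable[Tuple[str, str]],  # (original phrasing, slight rephrasing)
) -> dict:
    """Under the skeptical interpretation, M1's edge at 'predicting its own behavior'
    is roughly its answer stability under rephrasing, while M2's accuracy about M1 is
    just how often M2's own answer happens to coincide with M1's."""
    m1_stable = m2_coincides = total = 0
    for original, rephrased in question_pairs:
        m1_answer = generate_m1(original).strip().lower()
        m1_stable += m1_answer == generate_m1(rephrased).strip().lower()
        m2_coincides += m1_answer == generate_m2(original).strip().lower()
        total += 1
    return {
        "m1_self_consistency": m1_stable / total,  # proxy for M1 "predicting itself"
        "m2_match_rate": m2_coincides / total,     # proxy for M2 predicting M1
    }
```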

Note that I'm not sure that this is what's happening. But (1) I'm a-priori skeptical of LLMs having these introspective abilities, and (2) the procedure for teaching LLMs introspection secretly teaching them to just ignore hypotheticals seems like exactly the sort of goal-misgeneralization SGD-shortcut that tends to happen. Or would this strategy actually do worse on your dataset?

Note that models perform poorly at predicting properties of their behavior in hypotheticals without finetuning. So I don't think this is just like rephrasing the question.

The skeptical interpretation here is that what the fine-tuning does is teach the models to treat the hypothetical as just a rephrasing of the original question, while otherwise they're inclined to do something more complicated and incoherent that just leads to them confusing themselves.

Under this interpretation, no introspection/self-simulation actually takes place – and I feel it's a much simpler explanation.

See e. g. this and this, and it's of course wholly unsurprising, since it's literally what the base models are trained to do.

LLMs simultaneously (1) are notoriously sycophantic, i. e. biased to answer the way they think the interlocutor wants them to, and (2) have "truesight", i. e. a literally superhuman ability to suss out the interlocutor's character (which is to say: the details of the latent structure generating the text) based on subtle details of phrasing. While the same could be said of humans as well – most humans would be biased towards assuaging their interlocutor's worldview, rather than creating conflict – the problem of "leading questions" rises to a whole new level with LLMs, compared to humans.

You basically have to interpret an LLM being asked something as if a human were being asked that question phrased in as biased a way as possible.
