Agent-foundations researcher. Working on Synthesizing Standalone World-Models, aiming at a timely technical solution to AGI risk, fit for worlds where alignment is punishingly hard and we only get one try.
Currently looking for additional funders ($1k+, details). Consider reaching out if you're interested, or donating directly.
Or get me to pay you money ($5-$100) by spotting holes in my agenda or providing other useful information.
I think it's worth spending 10h/week even if you expect to get less than 10h/week in productivity boost right now, because it does take a while to get good at using these systems.
I am aware of this argument. Counterpoint: models get increasingly easier to use as they get more powerful – better at inferring your intent, not subject to entire classes of failure modes plaguing earlier generations, etc. – so the skills you'll learn by painstakingly wrangling current LLMs will end up obsoleted by subsequent generations.
Like, inasmuch as one buys that LLMs are on the trajectory to becoming absurdly powerful, one should not expect to need to develop intricate skillsets for squeezing value out of them. You're not gonna need to prompt-engineer AGIs and invent custom scaffolds for them: they will build the scaffolds for themselves, and your cleverest prompts will be no more effective than "just talk to them the obvious way". (Same for ad-hoc continuous-memory setups, context-management hacks, et cetera: if the AGI labs crack architectural continuous learning, it'll all be obsoleted overnight.)
On the other hand, inasmuch as you don't believe that LLMs are going to keep getting easier to use, you essentially don't believe that they're on the trajectory to become absurdly powerful AGIs. If so, you should downgrade your expectation of how much value their future generations will bring you, and accordingly downgrade how much you should be investing in them now.
Oh, by the way: I saw you saying that you're observing much more software downstream of LLMs. Any chance you can elaborate on that, provide some examples? This is the sort of thing I'm very interested in tracking, and high-quality information sources are hard to come by.
Well, depends on the job, I suppose. I did read your post on the topic, and I'm guessing it indeed makes much more sense in the context of automating parts of a company, with lots of time-consuming but boilerplate-y tasks.
As someone doing math/conceptual research, I don't currently see much potential there. I can imagine stuff that would be useful for me, e. g.:
But none of this would amount to even a 10h/week productivity boost, I don't think.
To clarify, being able to speed-read a paper with an LLM or do a literature review using a Deep Research feature is very helpful for me. But this is the "80% of the value that you can get just by using the out-of-the-box tools the obvious way" I was talking about. Stuff on top of that mostly isn't worth it.
IMO, the correct approach for most people is more along the lines of "try to be passively aware that LLMs exist now, and be constantly on the lookout for things where they could be easily applied for significant benefits", rather than "spend N hours/week integrating them into your workflows in nontrivial-to-implement ways".
Model to track: You get 80% of the current max value LLMs could provide you from standard-issue chat models and any decent out-of-the-box coding agent, both prompted the obvious way. Trying to get the remaining 20%, which is locked behind figuring out agent swarms, optimizing your prompts, rigging up ad-hoc continuous-memory setups, doing comparative analyses of different frontier models' performance on your tasks, inventing new galaxy-brained workflows, writing custom software, et cetera, would not be worth it: it would take too long for too little payoff.
There is an "LLMs for productivity!" memeplex that is trying to turn people into its hosts by fostering FOMO in those who are not investing tons of their time into tinkering with LLMs. You should ignore it. At best it would waste your time; at worst it would corrupt your priorities, convincing you that you should reorient your life around "optimizing your Claude Code setup" or writing productivity apps for yourself. LW regulars may be especially vulnerable to it: we know that AI is going to become absurdly powerful sooner or later, so it takes relatively little to sell to us the idea that it already is absurdly powerful – which may or may not be currently being exploited by analogues of crypto grifters.
(Not to say you mustn't be tinkering with LLMs and vibe-coding custom software, especially if you're having fun! But you should perhaps approach it in the spirit of a hobby, rather than the thing you should be doing.)
Well, at least, that's my takeaway from watching the current ecosystem of ideas around LLMs and trying that stuff for myself (one, two, three). I do have tons of ideas for custom software that could perhaps 1.1x my productivity... but they're too complex for the LLMs of today to vibe-code in a truly hands-off manner, and not worth the time otherwise. Maybe in six more months.
Obviously "reverse any advice you hear" and "Thane has terminal skill issues and this post is sour grapes" may or may not apply. (Though, of course, "you have skill issues if you haven't figured out how to 10x your productivity using LLMs, you must keep trying or you'll be left behind in the permanent underclass!!!" is the standard recruitment pitch of the aforementioned memeplex.)
And then here is the full response from Sam Altman [to Anthropic's ad].
There was so much to unpack in that one. The line about how it's "on brand for Anthropic to use a deceptive ad to critique theoretical deceptive ads that aren’t real" takes the cake, of course. Amazing stuff.
Anthropic and the Pentagon are clashing, because the Pentagon wants to use Claude for autonomous weapon targeting and domestic surveillance, and Anthropic doesn’t want that.
Feels important to note that this is a (minor) positive update on Anthropic for me, worth a hundred nice-sounding Dario essays and Claude Constitutions. I expect them to completely cave in after a bit, hence it being only a minor update. But at least they didn't start out pre-caved-in.
I largely disagree. I suppose there are different types of notes:
They have different levels of usefulness:
"People have this aspirational idea of building a vast, oppressively colossal, deeply interlinked knowledge graph to the point that it almost mirrors every discrete concept and memory in their brain. And I get the appeal of maximalism."
Guilty as charged. I do not regret my crime and I will attempt it again.
I agree that there are ways to define the "capabilities"/"intelligence" of a system such that increasing them won't necessarily increase its long-term coherence. Primarily: scaling its ability to solve problems across all domains except the domain of decomposing new unsolved problems into combinations of solved problems. I. e., not teaching it (certain kinds of?) "agency skills". The resultant entity would have an abysmal time horizon (in a certain sense), but it could still be made vastly capable, including vastly more capable than most people at most tasks. However, it would by definition be unable to solve new problems, not even those within its deductive closure.
Inasmuch as a system can produce solutions to new problems by deductive/inductive chains, however, it would need to be able to maintain coherence across time (or, rather, across inferential distances, for which time/context lengths are a proxy). And that's precisely what the AI industry is eager to make LLMs do, and what it often measures capabilities by.
(I think the above kind of checks out with the distinction you gesture at? Maybe not.)
So yes, there are some notions of "intelligence" and "scaling intelligence" that aren't equivalent to some notions of "coherence" and "scaling coherence". But I would claim that's moot, because the AI industry now explicitly wants the kind of intelligence that is equivalent to long-term coherence.
Frankly, the very premise of this paper seems ridiculous to me, to a considerably greater extent than even most other bad alignment takes. How can the notion that agents may be getting more incoherent as they become more capable even exist within an industry that's salivating over the prospect of climbing METR's "maintain coherence over longer spans of time" benchmark?
Will Automating AI R&D not work for some reason, or will it not lead to vastly superhuman superintelligence within 2 years of "~100% automation" for some reason?
My current main guess is that it will more-or-less work, and then it will not lead to vastly superhuman superintelligence.
Specifically: I expect that the current LLM paradigm is sufficient to automate its own in-paradigm research, but that this paradigm is AGI-incomplete. Which means it's possible to "skip to the end" by automating it to superhuman speeds, but what lies at its end won't be AGI.
Like, much of the current paradigm is "make loss go down/reward go up by trying various recombinations of slight variations on a bunch of techniques, constructing RL environments, and doing not-that-deep math research". That means the rewards are verifiable across the board, so there's "no reason" why RLVR + something like AlphaEvolve won't work for automating it. But it's still possible that you can automate ~all of the AI research that's currently happening at the frontier labs, and still fail to get to AGI.
(Though it's possible that what lies at the end will be a powerful-enough non-AGI AI tool that it'll make it very easy for the frontier labs to then use it to R&D an actual AGI, or take over the world, or whatever. This is a subtly different cluster of scenarios, though.)
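For concreteness, here's a toy sketch of the loop I have in mind (my own illustration, not any lab's actual pipeline): propose slight variations and recombinations of a current technique stack, score each candidate with an automatically checkable evaluation, and keep whatever scores higher. Everything in it (the knob names, the toy objective, the `propose_variants` stub standing in for an LLM-driven mutation step) is invented for illustration; the load-bearing property is just that every step is verifiable, which is why this sort of search looks automatable within the current paradigm.

```python
# Toy sketch of a "recombine techniques, keep whatever makes the verifiable
# metric go up" loop. All names and the objective are made up for illustration.

import random

# A "technique stack" is just a dict of hyperparameter-like knobs.
BASELINE = {"lr_scale": 1.0, "aug_strength": 0.5, "rl_env_mix": 0.3}


def verifiable_score(candidate: dict) -> float:
    """Stand-in for an automatically checkable evaluation.

    In the real setting this would be "loss went down / reward went up" on
    held-out tasks; here it's an arbitrary smooth function with a known
    optimum, just so the loop has something to climb.
    """
    return -(
        (candidate["lr_scale"] - 1.3) ** 2
        + (candidate["aug_strength"] - 0.7) ** 2
        + (candidate["rl_env_mix"] - 0.6) ** 2
    )


def propose_variants(parent: dict, n: int, step: float = 0.1) -> list[dict]:
    """Slight variations/recombinations of the current best technique stack.

    An LLM-driven system would propose these edits with some model of why they
    might help; random perturbation is enough for the sketch.
    """
    return [
        {k: v + random.uniform(-step, step) for k, v in parent.items()}
        for _ in range(n)
    ]


def automated_search(generations: int = 50, children_per_gen: int = 8) -> dict:
    best, best_score = BASELINE, verifiable_score(BASELINE)
    for _ in range(generations):
        for cand in propose_variants(best, children_per_gen):
            score = verifiable_score(cand)
            if score > best_score:  # keep whatever the verifier says is better
                best, best_score = cand, score
    return best


if __name__ == "__main__":
    print(automated_search())
```

In an AlphaEvolve-like setup, the mutation step would be an LLM editing code or configs, and the scorer would be a real benchmark suite rather than a toy function; but the shape of the loop is the same, and nothing in it obviously requires capabilities beyond the current paradigm.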
Ryan had suggested that, on his model, spending ~5% more resources on alignment than is commercially expedient might drop takeover risk down to 50%. I'm interested in how he thinks this scales: how much more, in percentage terms, would be needed to drop the risk to 20%? To 10%? To 1%?
Not convinced this isn't a temporary artefact of the current time horizons. Like, in the future, I think it's plausible that the tasks you'd be delegating would fall into two categories: (a) the sort of shallow tasks that future models would be able to complete instantly, and (b) the sort of deep tasks that'd take even future models hours to complete.
Fair enough, though; maybe this counts. But is there really a rich suite of skills like that, and would they really take that long to learn by the time learning them does become immediately net-positive?