Another reason for thinking that LLM AGI will have memory/state, conditional on AGI being built, is that memory is probably the only remaining blocker to something like drop-in remote workers, and from there to AGI and ASI, because it would allow potentially unbounded meta-learning given unbounded resources, and would make meta-learning in general far more effective over longer time periods.
Gwern explains here why meta-learning accounts for basically all of the baffling LLM weaknesses; the short version is that right now, LLM weights are frozen af...
I have said something on this, and the short form is I don't really believe in Christiano's argument that the Solomonoff Prior is malign, because I think there's an invalid step in the argument.
The invalid step is the assumption that we can gain information about other potential civilizations' values solely from the fact that we are in a simulation. The key issue is that since the simulation/mathematical multiverse hypotheses predict everything, we gain no new information in a Bayesian sense.
(This is in fact the problem with the simulation...
My view is that the answer is still basically yes: instrumental convergence applies to virtue-driven agents too, if we condition on them being as capable as humans, because instrumental convergence is the reason general intelligence works at all:
(That said, the instrumental convergence pressure could be less strong for virtues than for consequentialism, depending on details)
That said, I do think virtue ethics and deontology are relevant in AI s...
The real concern is the scenario where capital investment in AI declines because the economy tips into a mild recession, and I'd like to see whether the tariffs make it likely that future AI investment decreases over time, which would lengthen the timeline to superintelligent AI.
I wanted to ask: what do you think the impact of the new tariffs will be on your timelines?
In particular, there's a strange tariff situation for Taiwan where semiconductors are exempt but the actual GPUs, for some reason, are not; the specific tariff for Taiwan is 32%.
I ask because I could plausibly see timelines slipping past 2030 if AI companies can't buy many new chips because the new tariffs all across the world make them far too expensive.
Shouldn't a 32% increase in prices make only a modest difference to training FLOP? In particular, see the compute forecast. Between Dec 2026 and Dec 2027, compute increases by roughly an OOM, and generally it looks like compute increases by a bit less than 1 OOM per year in the scenario. This implies that a 32% cost increase only puts you behind by something like 1-2 months.
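For what it's worth, here is a quick back-of-the-envelope version of that arithmetic (my own numbers, assuming the 32% tariff passes straight through to chip prices and that compute per dollar is what grows by roughly 1 OOM per year):

```python
import math

# Toy calculation, not from the scenario itself: if training compute grows by
# roughly 1 OOM per year, a one-off 32% increase in chip prices at a fixed
# budget costs about log10(1.32) OOMs of compute, i.e. a delay of a month or two.
price_increase = 0.32              # assumed full pass-through of the 32% tariff
ooms_per_year = 1.0                # rough compute growth rate in the scenario
ooms_lost = math.log10(1 + price_increase)
delay_months = 12 * ooms_lost / ooms_per_year
print(f"~{ooms_lost:.2f} OOM of compute lost, ~{delay_months:.1f} months of delay")
# -> ~0.12 OOM of compute lost, ~1.4 months of delay
```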
My own take is that I'm fairly sympathetic to the "LLMs can already get to AGI" view, with the caveat that most of the gap where humans are superior to LLMs comes from being able to do meta-learning over long horizons, and it hasn't yet been shown that LLMs can do this purely by scaling compute.
Indeed, I think the entire crux of the scaling-hypothesis debate is whether scale enables meta-learning over longer and longer time periods:
https://www.lesswrong.com/posts/deesrjitvXM4xYGZd/metr-measur...
Gradient Updates has a post on this by Anson Ho and Jean-Stanislas Denain on why benchmarks haven't reflected real-world usefulness; a lot of the reason is that benchmark designers underestimated AI progress and didn't really have an incentive to make benchmarks reflect realistic use cases:
https://epoch.ai/gradient-updates/the-real-reason-ai-benchmarks-havent-reflected-economic-impacts
One particular reason I don't support/endorse PauseAI, which I haven't seen addressed very much beyond the usual objections, is that there probably aren't going to be many warning shots that can actually affect policy, at least conditional on misalignment being a serious problem (which doesn't translate to >50% probability of doom). This is because the most likely takeover plan (at least assuming no foom/software intelligence explosion) fundamentally relies not on killing people, but on launching internal rogue deployments to sabotage alignment work ...
For future work on the software intelligence explosion, I'd like to see 2 particular points focused on here, @Tom Davidson:
The first is estimating the complementarity issue, and more generally pinning down the rho parameter for software, because whether complementarity or substitution effects dominate during the lead-up to automating all AI R&D is a huge factor in whether an intelligence explosion is self-sustaining.
More from Tamay Besiroglu and Natalia Coelho here:
I agree that some inference compute can be shifted from capabilities to safety, and that this would work just as well even during a software intelligence explosion.
My worry was more that a lot of the control agenda, and threat models like rogue internal deployments to get more compute, would be fundamentally undermined if the assumption that you need more hardware compute for more power turned out to be wrong, and a software intelligence explosion could instead run on in-principle fixed computing power, meaning catastrophic actions to disempower humanity/defeat co...
Some thoughts on this post:
You need adaptability because on the timeframe that you might build a company or start a startup or start a charity, you can expect the rest of the world to remain fixed. But on the timeframe that you want to have a major political movement, on the timeframe that you want to reorient the U.S. government's approach to AI, a lot of stuff is coming at you. The whole world is, in some sense, weighing in on a lot of the interests that have historically been EA's interests.
I'll flag that for AI safety specifically, the world hasn't yet...
lc has argued that the measured tasks are unintentionally biased towards ones where long-term memory/context length doesn't matter:
https://www.lesswrong.com/posts/hhbibJGt2aQqKJLb7/shortform-1#vFq87Ge27gashgwy9
I like your explanation of why normal reliability engineering is not enough, but I'll flag that security against attackers is probably easier than LW in general portrays. I think computer security as a culture is prone to heavily overestimating the difficulty of security, because of incentive issues, not remembering the times something didn't happen, and, more generally, side-channels arguably being much more limited than people think (precisely because they rely on very specific physical setups, rather than attacks on the algorithm itself).
It's a non-trivial po...
I have 2 cruxes here:
In particular, I do not buy that humans and chimpanzees are nearly as similar as Henrich describes, and a big reason is that the study showing this compared heavily selected, best-performing chimpanzees against reasonably average humans, which is not a good way to compare performance if you want the results to generalize.
I don't think ...
I agree evolution has probably optimized human learning, but I don't think it's so heavily optimized that we can use it to give a tighter upper bound than 13 OOMs. The reason is that I do not believe humans are in equilibrium, which means there are probably optimizations left to discover, so I do think the 13 OOMs number is plausible (with high uncertainty).
Comment below:
https://www.lesswrong.com/posts/DbT4awLGyBRFbWugh/#mmS5LcrNuX2hBbQQE
I'll flag that while I personally never believed the idea that orcas are on average >6 SDs smarter than humans, and never considered it that plausible, I also don't think orcas could benefit that much from +6 SDs even if applied universally. The reason is that they live in water, which severely limits the available technology options and makes it really, really hard to form the societies needed to generate the explosion that happened after the Industrial Revolution, or even the agricultural revolution.
And there is a deep local optimum...
My take is that the big algorithmic difference that explains a lot of weird LLM deficits, and plausibly explains the post's findings, is that current neural networks do not learn at run-time; their weights are frozen. This is a central reason humans are able to outperform LLMs at longer tasks: humans, like a lot of other animals, can learn at run-time.
Unfortunately, this ability generally declines gradually starting in your 20s, but the existence of non-trivial learning at run-time is still a huge explaine...
Re other theories, I don't think all other theories in existence have infinitely many adjustable parameters. If he's referring to the fact that lots of theories have adjustable parameters that range over the real numbers, which are infinitely complicated in general, then that's a different issue, and string theory may have it as well.
Re string theory's issue of being vacuous, I think the core thing that string theory predicts that other quantum gravity models don't is that at the large scale, you recover general relativity and the standard mod...
That said, for the purposes of alignment, it's still good news that cats (by and large) do not scheme against their owners' wishes, and the fact that cats can be as domesticated as they are while not being cooperative or social is a huge boon for alignment purposes (within the analogy, which is arguably questionable).
I basically don't buy the conjecture of humans being super-cooperative in the long run, or hatred decreasing and love increasing.
To the extent that something like this is true, I expect it to be a weird industrial-to-information-age relic that utterly shatters if AGI/ASI is developed, and this remains true even if the AGI is aligned to a human.
To clarify a point here:
“Oh but physical devices can’t run an arbitrarily long tape”
This is not the actual issue.
The actual issue is that even with unbounded resources, you still couldn't simulate an unbounded tape because you can't get enough space for positional encodings.
Humans are not Turing-complete in some narrow sense;
Note that for the purpose of Turing-completeness, we only need to show that, given unbounded resources, the system could solve any computable problem without having to change the code, and we haven't actually proven that humans aren't Turing-complete (indeed, my best guess is that humans are).
IMO, the relevant discontinuity here is that I expect societal responses to be discontinuous rather than continuous. In particular, I expect the societal response to come when people start losing jobs en masse, and at that point either the AI is aligned well enough that existential risk is avoided, or takeover has already effectively happened and we have very little influence over the outcome.
On this point:
...Meaningful representative example in what class: I think it's representative in 'weird stuff may happen', not in we will get more teenage-in
This is cruxy, because I don't think that noise/non-error-freeness in your observations alone leads to bribing surveyors, unless we add additional assumptions about what that noise looks like.
(In particular, simple IID noise/quantum noise likely doesn't lead to extremal Goodhart/bribing surveyors.)
More generally, the reason I maintain a distinction between these 2 failure modes of Goodharting, regressional and extremal, is that they respond differently to decreasing the error.
I suspect that in the limit of 0 error, regressiona...
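A toy Monte Carlo sketch of the distinction I have in mind (my own toy model, with made-up numbers): under simple IID noise, optimizing the noisy proxy overstates the true value (regressional Goodhart), but the selected option is still genuinely good, and the gap vanishes as the error goes to zero, with no extremal/bribe-the-surveyor style blowup.

```python
import numpy as np

# True value V ~ N(0,1); observed proxy U = V + IID noise. We pick the option
# with the highest proxy score and compare the proxy score to the realized value.
rng = np.random.default_rng(0)
n_options, n_trials = 1000, 2000

for sigma in (1.0, 0.3, 0.0):                       # measurement-error scale
    true_vals = rng.normal(size=(n_trials, n_options))
    proxy = true_vals + sigma * rng.normal(size=(n_trials, n_options))
    picked = true_vals[np.arange(n_trials), proxy.argmax(axis=1)]
    print(f"sigma={sigma:.1f}: mean proxy max = {proxy.max(axis=1).mean():.2f}, "
          f"mean true value of pick = {picked.mean():.2f}")
# The picked option's true value is lower than its proxy score (regressional
# Goodhart) but still far above average, and the gap shrinks to zero as
# sigma -> 0 -- matching the claim that this failure mode responds to reducing error.
```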
My own take is that I do endorse a version of the "pausing now is too late" objection. More specifically, I think that for most purposes we should assume pauses will come too late to be effective when thinking about technical alignment. A big portion of the reason is that I don't think we will be able to convince many people that AI is powerful enough to need governance before they see massive job losses firsthand, and at that point we are well past the point of no return for controlling AI as a species.
In particular, I think Eliezer is probably vi...
Re democratic countries being overtaken by dictatorial countries: I think this will only last until AI can automate at least all white-collar labor, and maybe even most blue-collar physical labor, well enough that human wages for those jobs decline below what a human needs to subsist on. By then, dictatorial/plutocratic countries will unfortunately come back as a viable governing option, and may even overtake democratic countries.
So to come back to the analogy, I think VNM-rationality dictatorship is unfortunately common and conver...
My guess is that the answer is also likely no, because the self-model is still retained to a huge degree, so p-zombies can't really exist without hugely damaging the brain/being dead.
I say a lot more about what I consider the best current model of how consciousness works in my review of a post on this topic:
https://www.lesswrong.com/posts/FQhtpHFiPacG3KrvD/seth-explains-consciousness#7ncCBPLcCwpRYdXuG
I was implicitly assuming a closed system here, to be clear.
The trick that makes the game locally positive-sum is that the Earth isn't a closed system (it gets energy from the Sun), and when I said "globally" I was referring to the entire accessible universe.
Thinking about it more, though, I now think this is much less relevant except on extremely long timescales; but the future may be dominated by very long-term-oriented people, so it does end up mattering again.
I think I understand the question now.
I actually agree that if we assume there's a finite maximum number of atoms, we could in principle reformulate the universal computer as a finite state automaton, and if we were willing to accept the non-scalability of a finite state automaton, this could actually work.
The fundamental problem is that we would then have software that only works up to a specified memory limit, because we have essentially burned the software into the hardware of the finite automaton, and if you are ever uncertain of how much memory or time a probl...
Notably, this is why we focus on the case of arbitrarily large memory and time, where we can assume the machine always has as much of both as it needs.
The key question here is whether a finite physical computer can always be extended with more memory and time without requiring us to recode the machine into a different program/computer, and most modern computers can do this (modulo physical issues of how you integrate more memory and time).
In essence, the key property of modern computers is that the code/system description doesn't change if we add more memory and time, and this is what leads to Turing-completeness if we allow unbounded memory and time.
Notably, this was exactly the sort of belief I was trying to show is false, and your observation about the physical universe doesn't matter for the argument I made here, because the question is whether, with say 2^1000000 atoms, you could solve larger problem sizes with the same code, and for Turing-complete systems the answer is yes.
In essence, it's a question of whether we can scale our computers with more memory and time without having to change the code/algorithms, and basically all modern computers can do this in theory.
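A minimal sketch of the property I mean (my own illustration, not from the original discussion): the simulator below contains no memory bound anywhere in its description; its tape is a dictionary that grows on demand, so the identical code handles larger inputs whenever the physical machine happens to have more RAM. The limit lives in the hardware, not in the program.

```python
def run_turing_machine(program, input_symbols, blank=" ", max_steps=10**6):
    """program maps (state, symbol) -> (write_symbol, move, next_state)."""
    tape = dict(enumerate(input_symbols))       # sparse tape: grows as needed
    state, head, steps = "start", 0, 0
    while state != "halt" and steps < max_steps:
        symbol = tape.get(head, blank)          # unvisited cells read as blank
        write, move, state = program[(state, symbol)]
        tape[head] = write
        head += 1 if move == "R" else -1
        steps += 1
    return "".join(tape[i] for i in sorted(tape))

# Example machine: flip every bit of the input, halt at the first blank cell.
flipper = {
    ("start", "0"): ("1", "R", "start"),
    ("start", "1"): ("0", "R", "start"),
    ("start", " "): (" ", "R", "halt"),
}
print(run_turing_machine(flipper, "0110"))      # -> "1001 "
```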
...I think a much more interesti
Every Turing machine definition I've ever seen says that the tape has to be truly unbounded. How that's formalized varies, but it always carries the sense that the program doesn't ever have to worry about running out of tape. And every definition of Turing equivalence I've ever seen boils down to "can do any computation a Turing machine can do, with at most a bounded speedup or slowdown". Which means that programs on Turing equivalent computer must not have to worry about running out of storage.
...You can't in fact build a computer that can run any arbitrary
Nowadays, I think the main reason humans took off is that human hands were extremely well suited for tool use and for acting at range, which created a selection effect at the genetic level for more general intelligence and a selection effect on cultures for more cultural learning. Animals mostly lack this by default, meaning their intelligence matters far less than their lack of good actuators for tool use.
Great, "unbounded" isn't the same as "infinite", but in fact all physically realizable computers are bounded. There's a specific finite amount of tape available. You cannot in fact just go down to the store and buy any amount of tape you want. There isn't unlimited time either. Nor unlimited energy. Nor will the machine tolerate unlimited wear.
Yes, but that's not relevant to the definition of Turing equivalence/completeness/universality.
The question isn't whether the specific computer in your hands can solve all Turing-computable problems, but rather whether, if we had ...
Thinking about this, I suspect a generalized crux with John Wentworth et al. is how differently we see bureaucracies. He sees them as terrible, whereas I see them as quite flawed, with real problems, but also as wonderful tools that keep modern civilization's growth engine stable and keep the lights on, so I see bureaucracies as far more important for civilization's success than John Wentworth believes.
One reason for this is that the success cases of bureaucracies mostly look like nothing newsworthy happening, so success isn't obvious, whereas bureaucratic failure is.
One very important caveat is that the new administration is very e/acc on AI, and rather unwilling to consider even light-touch regulation, especially on open source, so your AI-safety asks will have to be very minimal.
This is because ethics isn't science, it doesn't "hit back" when the AI is wrong. So an AI can honestly mix up human systematic flaws with things humans value, in a way that will get approval from humans precisely because it exploits those systematic flaws.
I'd say the main reason for this is that morality is relative, and much more importantly, morality is much, much more choosable than physics, which means that where it ends up is less determined than in the case of physics.
The crux IMO is that this sort of general failure mode is much more prone to it...
To be clear, I think the main flaw of a lot of anthropic reasoning in practice is that it ignores other sources of evidence; I suspect much of the problem boils down to violations of conservation of expected evidence, plus ignoring other, much larger sources of evidence.
On this:
...
Re AI coding, some interesting thoughts on this come from Ajeya Cotra's talks (short version: there are a lot of weaknesses, but real-world programmer productivity gains are surprisingly high for coding tasks and very poor outside of them, which is why AI's impact has been limited so far):
https://x.com/ajeya_cotra/status/1894821432854749456
https://x.com/ajeya_cotra/status/1895161774376436147
Re this:
And this is mostly where it'll stay unless AGI labs actually crack long-horizon agency/innovations; i. e., basically until genuine AGI is actually there.
...Prov
The AI does not make the meetings pass 10x faster, and that is where the senior developers spend a lot of time.
As for how a plan to automate AI safety would work out in practice, assuming a relatively strong version of the concept, see the post below; another post by the same author, talking more about the big risks discussed in the comments, is forthcoming:
https://www.lesswrong.com/posts/TTFsKxQThrqgWeXYJ/how-might-we-safely-pass-the-buck-to-ai
In general, I think the crux is that in most timelines (at a lower bound, 65-70%) where AGI is developed relatively soon (roughly 2030-2045), the alignment problem isn't so...
How much probability do you assign to automating AI safety not working in time? I ask because I believe that preparing to automate AI safety is probably the highest-value work in terms of pure ability to reduce x-risk, assuming it does work, so I assign much higher EV to automating AI safety relative to other approaches.
It's more that I'm imagining they might not even have heard of the argument. It's also helpful to note that people like Terence Tao and Timothy Gowers are excellent in their chosen fields, but most people who have a big impact on the world don't go into AI alignment.
Remember, superintelligence is not omniscience.
So I don't expect them to be self-motivated to work on this specific problem without at least a little persuasion.
I'd expect a few superintelligent adults to join alignment efforts, but nowhere near thousands or tens of thousands, and I'd upper bound it at 300-500 new researchers at most in 15-25 years.
Much less impactful than automating AI safety.
The issue in this discourse, to me, is comparing this with AGI misalignment. It's conceptually related in some interesting ways, but in practical terms they're just extremely quantitatively different. And, naturally, I care about this specific non-comparability being clear because it says whether to do human intelligence enhancement; and in fact many people cite this as a reason to not do human IE.
Re human vs. AGI misalignment, I'd say this is true, in that human misalignment doesn't threaten the human species, or even billions of people, whereas AI does,...
I'd argue quite a lot, though independent evidence could cause me to update here; a key reason is that there is a plausible argument that a lot of the evidence for cultural learning/cultural practices written up in the 1940s-1960s was fundamentally laundered to hide evidence of secret practices.
More generally, I was worried that such an obviously false claim implied many more wrong claims hidden from me that I couldn't test, so after spot-checking I didn't want to invest more time in an expensive search process.
...You mean to say that the human body was virtually “finished evolving” 200,000 years ago, thereby laying the groundwork for cultural optimization which took over form that point? Henrich’s thesis of gene-culture coevolution contrasts with this view and I find it to be much more likely to be true. For example, the former thesis posits that humans lost a massive amount of muscle strength (relative to, say, chimpanzees) over many generations and only once that process had been virtually “completed”, started to compensate by throwing rocks or making spears when
I'd probably bump that down to O(90%) at max, and this could get worse (I'm downranking based on the number of psychopaths/sociopaths and narcissists that exist).
I'd actually maybe agree with this, though with the caveat that there's a real possibility you will need a lot more selection/firepower as a human gets smarter, because you lack the ability to technically control humans in the way you can control AIs.
I'm saying that (waves hands vigorously) 99% of people are beneficent or "neutral" (like, maybe not helpful / generous / proactively kind, but not actively harmful, even given the choice) in both intention and in action. That type of neutral already counts as in a totally different league of being aligned compared to AGI.
I think this is ultimately the crux. At least relative to my values, I'd expect at least 20% of people in America to support active efforts to harm me or my allies/people I'm altruistic toward, and to do so fairly gleefully (an underrated example here i...
Re the recurrence/memory aspect, you might like this new paper, which figured out how to use recurrent architectures to make a reasonably consistent one-minute Tom and Jerry cartoon, and which (in the tweet below) argues that they managed to fix the training problems that come with vanilla RNNs:
https://test-time-training.github.io/video-dit/assets/ttt_cvpr_2025.pdf
https://arxiv.org/abs/2407.04620
https://x.com/karansdalal/status/1810377853105828092 (This is the tweet I pointed to for the claim that they solved the issue of tra...
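For anyone curious, here is a very rough sketch (my own simplification, not the paper's actual architecture) of the test-time-training idea the linked work builds on: the recurrent state is the weight matrix of a tiny inner model, and "updating the state" means taking a gradient step on a self-supervised reconstruction loss for the current token. The paper's real contributions (the inner losses, learned projections, and the tricks that make this stable and parallelizable to train) are not shown here.

```python
import numpy as np

def ttt_layer(token_embeddings, dim, inner_lr=0.1, seed=0):
    """Sketch of a test-time-training layer: state = weights of an inner linear model."""
    rng = np.random.default_rng(seed)
    W = np.zeros((dim, dim))                             # recurrent state, updated by gradient steps
    view_proj = rng.normal(scale=0.1, size=(dim, dim))   # fixed projection giving a "corrupted view"
    outputs = []
    for x in token_embeddings:                           # x: (dim,) embedding of one token
        x_view = view_proj @ x                           # self-supervised input view of the token
        err = W @ x_view - x                             # inner model's reconstruction error
        W -= inner_lr * np.outer(err, x_view)            # gradient step on 0.5*||W @ x_view - x||^2
        outputs.append(W @ x)                            # read out with the freshly updated state
    return np.stack(outputs)

out = ttt_layer([np.full(4, 0.5) for _ in range(8)], dim=4)
print(out.shape)                                         # (8, 4)
```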