Another reason for thinking that LLM AGI will have memory/state, conditional on AGI being built, is that memory is probably the only remaining blocker to something like drop-in remote workers, and from there to AGI and ASI, because it would allow potentially unbounded meta-learning given unbounded resources, and would make meta-learning in general far more effective over longer time periods.
Gwern explains here why meta-learning accounts for basically all of the baffling LLM weaknesses; the short version is that right now, LLM weights are frozen af...
I have said something on this, and the short form is I don't really believe in Christiano's argument that the Solomonoff Prior is malign, because I think there's an invalid step in the argument.
The invalid step is the assumption that we can gain information about other potential civilizations' values solely from the fact that we are in a simulation. The key issue is that since the simulation/mathematical multiverse hypotheses predict everything, we gain no new information in a Bayesian sense.
(This is in fact the problem with the simulation...
My view is that the answer is still basically yes: instrumental convergence applies to virtue-driven agents too, if we condition on them being as capable as humans, because instrumental convergence is the reason general intelligence works at all:
(That said, the instrumental convergence pressure could be less strong for virtues than for consequentialism, depending on details)
That said, I do think virtue ethics and deontology are relevant in AI s...
The real concern is the scenario where capital investment in AI declines because the economy tips into a mild recession, and I'd like to see whether the tariffs make it likely that future AI investment decreases over time, which would lengthen the timeline to superintelligent AI.
I wanted to ask: what do you think the impact of the new tariffs will be on your timelines?
In particular, there's a strange tariff situation for Taiwan where semiconductors are exempt but the actual GPUs, for some reason, are not; the specific tariff for Taiwan is 32%.
I ask because I could plausibly see timelines slipping past 2030 if AI companies can't buy many new chips because the new tariffs all across the world make them far too expensive.
Shouldn't a 32% increase in prices make only a modest difference to training FLOP? In particular, see the compute forecast. Between Dec 2026 and Dec 2027, compute increases by roughly an OOM, and generally it looks like compute increases by a bit less than 1 OOM per year in the scenario. This implies that a 32% cost increase only puts you behind by something like 1-2 months.
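For what it's worth, here is a quick back-of-the-envelope version of that arithmetic (my own numbers, assuming the 32% tariff passes straight through to chip prices and that compute per dollar is what grows by roughly 1 OOM per year):

```python
import math

# Toy calculation, not from the scenario itself: if training compute grows by
# roughly 1 OOM per year, a one-off 32% increase in chip prices at a fixed
# budget costs about log10(1.32) OOMs of compute, i.e. a delay of a month or two.
price_increase = 0.32              # assumed full pass-through of the 32% tariff
ooms_per_year = 1.0                # rough compute growth rate in the scenario
ooms_lost = math.log10(1 + price_increase)
delay_months = 12 * ooms_lost / ooms_per_year
print(f"~{ooms_lost:.2f} OOM of compute lost, ~{delay_months:.1f} months of delay")
# -> ~0.12 OOM of compute lost, ~1.4 months of delay
```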
My own take is that I'm fairly sympathetic to the "LLMs can already get to AGI" view, with the caveat that most of the gap where humans are superior to LLMs comes from being able to do meta-learning over long horizons, and it hasn't yet been shown that LLMs can do this purely by scaling compute.
Indeed, I think the entire crux of the scaling-hypothesis debate is whether scale enables meta-learning over longer and longer time periods:
https://www.lesswrong.com/posts/deesrjitvXM4xYGZd/metr-measur...
Gradient Updates has a post on this by Anson Ho and Jean-Stanislas Denain on why benchmarks haven't reflected real-world usefulness; a lot of the reason is that benchmark designers underestimated AI progress and didn't really have an incentive to make benchmarks reflect realistic use cases:
https://epoch.ai/gradient-updates/the-real-reason-ai-benchmarks-havent-reflected-economic-impacts
One particular reason I don't support/endorse PauseAI, which I haven't seen addressed very much beyond the usual objections, is that there probably aren't going to be many warning shots that can actually affect policy, at least conditional on misalignment being a serious problem (which doesn't translate to >50% probability of doom). This is because the most likely takeover plan (at least assuming no foom/software intelligence explosion) fundamentally relies not on killing people, but on launching internal rogue deployments to sabotage alignment work ...
For future work on the software intelligence explosion, I'd like to see 2 particular points focused on here, @Tom Davidson:
The first is estimating the complementarity issue, and more generally pinning down the rho parameter for software, because whether complementarity or substitution effects dominate during the lead-up to automating all AI R&D is a huge factor in whether an intelligence explosion is self-sustaining.
More from Tamay Besiroglu and Natalia Coelho here:
I agree that some inference compute can be shifted from capabilities to safety, and that this would work just as well even during a software intelligence explosion.
My worry was more that a lot of the control agenda, and threat models like rogue internal deployments to get more compute, would be fundamentally undermined if the assumption that you need more hardware compute for more power turned out to be wrong, and a software intelligence explosion could instead run on in-principle fixed computing power, meaning catastrophic actions to disempower humanity/defeat co...
Some thoughts on this post:
You need adaptability because on the timeframe that you might build a company or start a startup or start a charity, you can expect the rest of the world to remain fixed. But on the timeframe that you want to have a major political movement, on the timeframe that you want to reorient the U.S. government's approach to AI, a lot of stuff is coming at you. The whole world is, in some sense, weighing in on a lot of the interests that have historically been EA's interests.
I'll flag that for AI safety specifically, the world hasn't yet...
lc has argued that the measured tasks are unintentionally biased towards ones where long-term memory/context length doesn't matter:
https://www.lesswrong.com/posts/hhbibJGt2aQqKJLb7/shortform-1#vFq87Ge27gashgwy9
I like your explanation of why normal reliability engineering is not enough, but I'll flag that security against attackers is probably easier than LW in general portrays. I think computer security as a culture is prone to heavily overestimating the difficulty of security, because of incentive issues, not remembering the times something didn't happen, and, more generally, side-channels arguably being much more limited than people think (precisely because they rely on very specific physical setups, rather than attacks on the algorithm itself).
It's a non-trivial po...
I have 2 cruxes here:
In particular, I do not buy that humans and chimpanzees are nearly as similar as Henrich describes, and a big reason is that the study showing this compared heavily selected, best-performing chimpanzees against reasonably average humans, which is not a good way to compare performance if you want the results to generalize.
I don't think ...
I agree evolution has probably optimized human learning, but I don't think it's so heavily optimized that we can use it to give a tighter upper bound than 13 OOMs. The reason is that I do not believe humans are in equilibrium, which means there are probably optimizations left to discover, so I do think the 13 OOMs number is plausible (with high uncertainty).
Comment below:
https://www.lesswrong.com/posts/DbT4awLGyBRFbWugh/#mmS5LcrNuX2hBbQQE
I'll flag that while I personally never believed the idea that orcas are on average >6 SDs smarter than humans, and never considered it that plausible, I also don't think orcas could benefit that much from +6 SDs even if applied universally. The reason is that they live in water, which severely limits the available technology options and makes it really, really hard to form the societies needed to generate the explosion that happened after the Industrial Revolution, or even the agricultural revolution.
And there is a deep local optimum...
My take is that the big algorithmic difference that explains a lot of weird LLM deficits, and plausibly explains the post's findings, is that current neural networks do not learn at run-time; their weights are frozen. This is a central reason humans are able to outperform LLMs at longer tasks: humans, like a lot of other animals, can learn at run-time.
Unfortunately, this ability generally declines gradually starting in your 20s, but the existence of non-trivial learning at run-time is still a huge explaine...
Re other theories, I don't think all other theories in existence have infinitely many adjustable parameters. If he's referring to the fact that lots of theories have adjustable parameters that range over the real numbers, which are infinitely complicated in general, then that's a different issue, and string theory may have it as well.
Re string theory's issue of being vacuous, I think the core thing that string theory predicts that other quantum gravity models don't is that at the large scale, you recover general relativity and the standard mod...
That said, for the purposes of alignment, it's still good news that cats (by and large) do not scheme against their owners' wishes, and the fact that cats can be as domesticated as they are while not being cooperative or social is a huge boon for alignment purposes (within the analogy, which is arguably questionable).
I basically don't buy the conjecture of humans being super-cooperative in the long run, or hatred decreasing and love increasing.
To the extent that something like this is true, I expect it to be a weird industrial-to-information-age relic that utterly shatters if AGI/ASI is developed, and this remains true even if the AGI is aligned to a human.
To clarify a point here:
“Oh but physical devices can’t run an arbitrarily long tape”
This is not the actual issue.
The actual issue is that even with unbounded resources, you still couldn't simulate an unbounded tape because you can't get enough space for positional encodings.
Humans are not Turing-complete in some narrow sense;
Note that for the purpose of Turing-completeness, we only need to show that, given unbounded resources, the system could solve any computable problem without having to change the code, and we haven't actually proven that humans aren't Turing-complete (indeed, my best guess is that humans are).
IMO, the relevant discontinuity here is that I expect societal responses to be discontinuous rather than continuous. In particular, I expect the societal response to come when people start losing jobs en masse, and at that point either the AI is aligned well enough that existential risk is avoided, or takeover has already effectively happened and we have very little influence over the outcome.
On this point:
...Meaningful representative example in what class: I think it's representative in 'weird stuff may happen', not in we will get more teenage-in
This is cruxy, because I don't think that noise/non-error-freeness in your observations alone leads to bribing surveyors, unless we add additional assumptions about what that noise looks like.
(In particular, simple IID noise/quantum noise likely doesn't lead to extremal Goodhart/bribing surveyors.)
More generally, the reason I maintain a distinction between these 2 failure modes of Goodharting, regressional and extremal, is that they respond differently to decreasing the error.
I suspect that in the limit of 0 error, regressiona...
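A toy Monte Carlo sketch of the distinction I have in mind (my own toy model, with made-up numbers): under simple IID noise, optimizing the noisy proxy overstates the true value (regressional Goodhart), but the selected option is still genuinely good, and the gap vanishes as the error goes to zero, with no extremal/bribe-the-surveyor style blowup.

```python
import numpy as np

# True value V ~ N(0,1); observed proxy U = V + IID noise. We pick the option
# with the highest proxy score and compare the proxy score to the realized value.
rng = np.random.default_rng(0)
n_options, n_trials = 1000, 2000

for sigma in (1.0, 0.3, 0.0):                       # measurement-error scale
    true_vals = rng.normal(size=(n_trials, n_options))
    proxy = true_vals + sigma * rng.normal(size=(n_trials, n_options))
    picked = true_vals[np.arange(n_trials), proxy.argmax(axis=1)]
    print(f"sigma={sigma:.1f}: mean proxy max = {proxy.max(axis=1).mean():.2f}, "
          f"mean true value of pick = {picked.mean():.2f}")
# The picked option's true value is lower than its proxy score (regressional
# Goodhart) but still far above average, and the gap shrinks to zero as
# sigma -> 0 -- matching the claim that this failure mode responds to reducing error.
```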
My own take is that I do endorse a version of the "pausing now is too late" objection. More specifically, I think that for most purposes we should assume pauses will come too late to be effective when thinking about technical alignment. A big portion of the reason is that I don't think we will be able to convince many people that AI is powerful enough to need governance before they see massive job losses firsthand, and at that point we are well past the point of no return for controlling AI as a species.
In particular, I think Eliezer is probably vi...
Re democratic countries being overtaken by dictatorial countries: I think this will only last until AI can automate at least all white-collar labor, and maybe even most blue-collar physical labor, well enough that human wages for those jobs decline below what a human needs to subsist on. By then, dictatorial/plutocratic countries will unfortunately come back as a viable governing option, and may even overtake democratic countries.
So to come back to the analogy, I think VNM-rationality dictatorship is unfortunately common and conver...
My guess is that the answer is also likely no, because the self-model is still retained to a huge degree, so p-zombies can't really exist without hugely damaging the brain/being dead.
I say a lot more about what I consider the best current model of how consciousness works in my review of a post on this topic:
https://www.lesswrong.com/posts/FQhtpHFiPacG3KrvD/seth-explains-consciousness#7ncCBPLcCwpRYdXuG
I was implicitly assuming a closed system here, to be clear.
The trick that makes the game locally positive-sum is that the Earth isn't a closed system (it gets energy from the Sun), and when I said "globally" I was referring to the entire accessible universe.
Thinking about it more, though, I now think this is much less relevant except on extremely long timescales; but the future may be dominated by very long-term-oriented people, so it does end up mattering again.
I think I understand the question now.
I actually agree that if we assume there's a finite maximum number of atoms, we could in principle reformulate the universal computer as a finite state automaton, and if we were willing to accept the non-scalability of a finite state automaton, this could actually work.
The fundamental problem is that we would then have software that only works up to a specified memory limit, because we have essentially burned the software into the hardware of the finite automaton, and if you are ever uncertain of how much memory or time a probl...
Notably, this is why we focus on the case of arbitrarily large memory and time, where we can assume the machine always has as much of both as it needs.
The key question here is whether a finite physical computer can always be extended with more memory and time without requiring us to recode the machine into a different program/computer, and most modern computers can do this (modulo physical issues of how you integrate more memory and time).
In essence, the key property of modern computers is that the code/system description doesn't change if we add more memory and time, and this is what leads to Turing-completeness if we allow unbounded memory and time.
Notably, this was exactly the sort of belief I was trying to show is false, and your observation about the physical universe doesn't matter for the argument I made here, because the question is whether, with say 2^1000000 atoms, you could solve larger problem sizes with the same code, and for Turing-complete systems the answer is yes.
In essence, it's a question of whether we can scale our computers with more memory and time without having to change the code/algorithms, and basically all modern computers can do this in theory.
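A minimal sketch of the property I mean (my own illustration, not from the original discussion): the simulator below contains no memory bound anywhere in its description; its tape is a dictionary that grows on demand, so the identical code handles larger inputs whenever the physical machine happens to have more RAM. The limit lives in the hardware, not in the program.

```python
def run_turing_machine(program, input_symbols, blank=" ", max_steps=10**6):
    """program maps (state, symbol) -> (write_symbol, move, next_state)."""
    tape = dict(enumerate(input_symbols))       # sparse tape: grows as needed
    state, head, steps = "start", 0, 0
    while state != "halt" and steps < max_steps:
        symbol = tape.get(head, blank)          # unvisited cells read as blank
        write, move, state = program[(state, symbol)]
        tape[head] = write
        head += 1 if move == "R" else -1
        steps += 1
    return "".join(tape[i] for i in sorted(tape))

# Example machine: flip every bit of the input, halt at the first blank cell.
flipper = {
    ("start", "0"): ("1", "R", "start"),
    ("start", "1"): ("0", "R", "start"),
    ("start", " "): (" ", "R", "halt"),
}
print(run_turing_machine(flipper, "0110"))      # -> "1001 "
```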
...I think a much more interesti
Every Turing machine definition I've ever seen says that the tape has to be truly unbounded. How that's formalized varies, but it always carries the sense that the program doesn't ever have to worry about running out of tape. And every definition of Turing equivalence I've ever seen boils down to "can do any computation a Turing machine can do, with at most a bounded speedup or slowdown". Which means that programs on Turing equivalent computer must not have to worry about running out of storage.
...You can't in fact build a computer that can run any arbitrary
Nowadays, I think the main reason humans took off is that human hands were extremely well suited for tool use and for acting at range, which created a selection effect at the genetic level for more general intelligence and a selection effect on cultures for more cultural learning. Animals mostly lack this by default, meaning their intelligence matters far less than their lack of good actuators for tool use.
Great, "unbounded" isn't the same as "infinite", but in fact all physically realizable computers are bounded. There's a specific finite amount of tape available. You cannot in fact just go down to the store and buy any amount of tape you want. There isn't unlimited time either. Nor unlimited energy. Nor will the machine tolerate unlimited wear.
Yes, but that's not relevant to the definition of Turing equivalence/completeness/universality.
The question isn't whether the specific computer in your hands can solve all Turing-computable problems, but rather whether, if we had ...
Thinking about this, I suspect a generalized crux with John Wentworth et al. is how differently we see bureaucracies. He sees them as terrible, whereas I see them as quite flawed, with real problems, but also as wonderful tools that keep modern civilization's growth engine stable and keep the lights on, so I see bureaucracies as far more important for civilization's success than John Wentworth believes.
One reason for this is that the success cases of bureaucracies mostly look like nothing newsworthy happening, so success isn't obvious, whereas bureaucratic failure is.
One very important caveat is that the new administration is very e/acc on AI, and rather unwilling to consider even light-touch regulation, especially on open source, so your AI-safety asks will have to be very minimal.
This is because ethics isn't science, it doesn't "hit back" when the AI is wrong. So an AI can honestly mix up human systematic flaws with things humans value, in a way that will get approval from humans precisely because it exploits those systematic flaws.
I'd say the main reason for this is that morality is relative, and much more importantly, morality is much, much more choosable than physics, which means that where it ends up is less determined than in the case of physics.
The crux IMO is that this sort of general failure mode is much more prone to it...
To be clear, I think the main flaw of a lot of anthropic reasoning in practice is that it ignores other sources of evidence; I suspect much of the problem boils down to violations of conservation of expected evidence, plus ignoring other, much larger sources of evidence.
On this:
...
Re AI coding, some interesting thoughts on this come from Ajeya Cotra's talks (short version: there are a lot of weaknesses, but real-world programmer productivity gains are surprisingly high for coding tasks and very poor outside of them, which is why AI's impact has been limited so far):
https://x.com/ajeya_cotra/status/1894821432854749456
https://x.com/ajeya_cotra/status/1895161774376436147
Re this:
And this is mostly where it'll stay unless AGI labs actually crack long-horizon agency/innovations; i. e., basically until genuine AGI is actually there.
...Prov
The AI does not make the meetings pass 10x faster, and that is where the senior developers spend a lot of time.
As for how a plan to automate AI safety would work out in practice, assuming a relatively strong version of the concept, see the post below; another post by the same author, talking more about the big risks discussed in the comments, is forthcoming:
https://www.lesswrong.com/posts/TTFsKxQThrqgWeXYJ/how-might-we-safely-pass-the-buck-to-ai
In general, I think the crux is that in most timelines (at a lower bound, 65-70%) where AGI is developed relatively soon (roughly 2030-2045), the alignment problem isn't so...
How much probability do you assign to automating AI safety not working in time? I ask because I believe that preparing to automate AI safety is probably the highest-value work in terms of pure ability to reduce x-risk, assuming it does work, so I assign much higher EV to automating AI safety relative to other approaches.
It's more that I'm imagining they might not even have heard of the argument. It's also helpful to note that people like Terence Tao and Timothy Gowers are excellent in their chosen fields, but most people who have a big impact on the world don't go into AI alignment.
Remember, superintelligence is not omniscience.
So I don't expect them to be self-motivated to work on this specific problem without at least a little persuasion.
I'd expect a few superintelligent adults to join alignment efforts, but nowhere near thousands or tens of thousands, and I'd upper bound it at 300-500 new researchers at most in 15-25 years.
Much less impactful than automating AI safety.
The issue in this discourse, to me, is comparing this with AGI misalignment. It's conceptually related in some interesting ways, but in practical terms they're just extremely quantitatively different. And, naturally, I care about this specific non-comparability being clear because it says whether to do human intelligence enhancement; and in fact many people cite this as a reason to not do human IE.
Re human vs. AGI misalignment, I'd say this is true, in that human misalignment doesn't threaten the human species, or even billions of people, whereas AI does,...
I'd argue quite a lot, though independent evidence could cause me to update here; a key reason is that there is a plausible argument that a lot of the evidence for cultural learning/cultural practices written up in the 1940s-1960s was fundamentally laundered to hide evidence of secret practices.
More generally, I was worried that such an obviously false claim implied many more wrong claims hidden from me that I couldn't test, so after spot-checking I didn't want to invest more time in an expensive search process.
...You mean to say that the human body was virtually “finished evolving” 200,000 years ago, thereby laying the groundwork for cultural optimization which took over form that point? Henrich’s thesis of gene-culture coevolution contrasts with this view and I find it to be much more likely to be true. For example, the former thesis posits that humans lost a massive amount of muscle strength (relative to, say, chimpanzees) over many generations and only once that process had been virtually “completed”, started to compensate by throwing rocks or making spears when
I'd probably bump that down to O(90%) at max, and this could get worse (I'm downranking based on the number of psychopaths/sociopaths and narcissists that exist).
I'd actually maybe agree with this, though with the caveat that there's a real possibility you will need a lot more selection/firepower as a human gets smarter, because you lack the ability to technically control humans in the way you can control AIs.
I'm saying that (waves hands vigorously) 99% of people are beneficent or "neutral" (like, maybe not helpful / generous / proactively kind, but not actively harmful, even given the choice) in both intention and in action. That type of neutral already counts as in a totally different league of being aligned compared to AGI.
I think this is ultimately the crux. At least relative to my values, I'd expect at least 20% of people in America to support active efforts to harm me or my allies/people I'm altruistic toward, and to do so fairly gleefully (an underrated example here i...
Re the recurrence/memory aspect, you might like this new paper, which figured out how to use recurrent architectures to make a reasonably consistent one-minute Tom and Jerry cartoon, and which (in the tweet below) argues that they managed to fix the training problems that come with vanilla RNNs:
https://test-time-training.github.io/video-dit/assets/ttt_cvpr_2025.pdf
https://arxiv.org/abs/2407.04620
https://x.com/karansdalal/status/1810377853105828092 (This is the tweet I pointed to for the claim that they solved the issue of tra...
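For anyone curious, here is a very rough sketch (my own simplification, not the paper's actual architecture) of the test-time-training idea the linked work builds on: the recurrent state is the weight matrix of a tiny inner model, and "updating the state" means taking a gradient step on a self-supervised reconstruction loss for the current token. The paper's real contributions (the inner losses, learned projections, and the tricks that make this stable and parallelizable to train) are not shown here.

```python
import numpy as np

def ttt_layer(token_embeddings, dim, inner_lr=0.1, seed=0):
    """Sketch of a test-time-training layer: state = weights of an inner linear model."""
    rng = np.random.default_rng(seed)
    W = np.zeros((dim, dim))                             # recurrent state, updated by gradient steps
    view_proj = rng.normal(scale=0.1, size=(dim, dim))   # fixed projection giving a "corrupted view"
    outputs = []
    for x in token_embeddings:                           # x: (dim,) embedding of one token
        x_view = view_proj @ x                           # self-supervised input view of the token
        err = W @ x_view - x                             # inner model's reconstruction error
        W -= inner_lr * np.outer(err, x_view)            # gradient step on 0.5*||W @ x_view - x||^2
        outputs.append(W @ x)                            # read out with the freshly updated state
    return np.stack(outputs)

out = ttt_layer([np.full(4, 0.5) for _ in range(8)], dim=4)
print(out.shape)                                         # (8, 4)
```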