LESSWRONG
LW

Noosphere89
3685Ω1545215817
Message
Dialogue
Subscribe

Sequences

Posts

Sorted by New

Wikitag Contributions

Comments

Sorted by
Newest
An Opinionated Guide to Computability and Complexity
2Noosphere89's Shortform
3y
48
Noosphere89's Shortform
Noosphere896mo33

Link to long comments that I want to pin, but are too long to be pinned:

https://www.lesswrong.com/posts/Zzar6BWML555xSt6Z/?commentId=aDuYa3DL48TTLPsdJ

https://www.lesswrong.com/posts/uMQ3cqWDPHhjtiesc/?commentId=Gcigdmuje4EacwirD

Reply
Foom & Doom 1: “Brain in a box in a basement”
Noosphere8913h20

OK, imagine (for simplicity) that all humans on Earth drop dead simultaneously, but there’s a John-von-Neumann-level AI on a chip connected to a solar panel with two teleoperated robots. Every time they scavenge another chip and solar cell, there becomes another human-level AI copy. Every time a robot builds another teleoperated robot from scavenged parts, there’s that too. What exactly is going to break in “weeks or months”? Solar cells can work for 30 years, no problem. GPUs are also reported to last for decades. (Note that, as long as GPUs are a non-renewable resource, the AI would presumably take extremely good care of them, keeping them dust-free, cooling them well below the nominal temperature spec, etc.) The AI can find decent GPUs in every house on the street, and I think hundreds of millions more by breaking into big data centers. Similar for solar panels. If one robot breaks, another robot can repair it. Janky teleoperated robots without fingers made by students for $20K can vacuum, make coffee, cook a meal, etc. Competent human engineers can make pretty impressive mechanical hands using widely-available parts. I grant that it would take a long while before the growing AI clone army could run a semiconductor supply chain by itself, but it has all the time in the world. I expect it to succeed, and thus to sustain itself into the indefinite future, and I’m confused why you don’t. (Or maybe you do and I’m misunderstanding.)

BTW I also think that a minimal semiconductor supply chain would be very very much simpler than the actual semiconductor supply chain that exists in our human world, which has been relentlessly optimized for cost, not simplicity. For example, EBL (e-beam lithography) has better resolution than EUV and is a zillion times easier to build, but the human economy would never support building out km²-scale warehouses full of millions of EBL machines to compensate for their crappy throughput. But for an AI bootstrapping its way back up, why not?

The key trouble is all the power generators that sustain the AI would break within weeks or months, and the issue is even if they could build GPUs, they'd have no power to run them within at most 2 weeks:

https://www.reddit.com/r/ZombieSurvivalTactics/comments/s6augo/comment/ht4iqej/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button

https://www.reddit.com/r/explainlikeimfive/comments/klupbw/comment/ghb0fer/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button

Realistically, we are looking at power grid collapses within days.

And without power, none of the other building projects could work, because they'd stop receiving energy, and importantly this means the AI is on a tight timer, and some of this is partially due to expectations that the first transformative useful AI will use more compute than you project, even conditional on a different paradigm being introduced like brain-like AGIs, but another part of my view is that this is just one of many examples where humans need to constantly maintain stuff in order for the stuff to work, and if we don't assume tech that can just solve logistics is available within say 1 year, it will take time for AIs to actually survive without humans, and this time is almost certainly closer to months or years than weeks or days.

The hard part of AI takeover isn't killing all humans, it's in automating enough of the economy (including developing tech like nanotech) such that the humans stop mattering, and while AIs can do this, it takes actual time, and that time is really valuable in fast moving scenarios.

I’m confused about other parts of your comment as well. Joseph Stalin was able to use his (non-superhuman) intelligence and charisma to wind up in dictatorial control of Russia. What’s your argument that an AI could not similarly wind up with dictatorial control over humans? Don’t the same arguments apply? “If we catch the AI trying to gain power in bad ways, we’ll shut it down.” “If we catch Stalin trying to gain power in bad ways, we’ll throw him in jail.” But the latter didn’t happen. What’s the disanalogy, from your perspective?

I didn't say AIs can't take over, and I very critically did not say that AI takeover can't happen in the long run.

I only said AI takeover isn't trivial if we don't assume logistics are solvable.

But to deal with the Stalin example, the answer for how he took over was basically that he was willing to wait a long time, and in particular he used both persuasion and the fact that he already had a significant amount of power by having the General Secretary, and his takeover was basically by allying with loyalists and in particular strategically breaking alliances that he had made, and violence was used later on to show that no one was safe from him.

Which is actually how I expect successful AI takeover to happen in practice, if it does happen.

Very importantly, Stalin didn't need to create an entire civilization out of nothing, or nearly nothing, and other people like Trotsky handled the logistics, though the takeover situation was far more preferable to the communist party as they both had popular support and didn't have as long supply lines as the opposition forces like the Whites did, and they had a preexisting base of industry that was much easier to seize than modern industries.

This applies to most coups/transitions of power in that most of the successful coups aren't battles between factions, but rather one group managing to make itself the new Schelling point over other groups.

@Richard_Ngo explains more below:

https://www.lesswrong.com/posts/d4armqGcbPywR3Ptc/power-lies-trembling-a-three-book-review#The_revolutionary_s_handbook

Most of my commentary in the last comment is either arguing that things can be made more continuous and slow than your story depicts, or arguing that your references don't support what you claimed, and I did say that the cyberattack story is plausible, just that it didn't support the idea that AIs could entirely replace civilization without automating away us first, which takes time.

This doesn't show AI doom can't happen, but it does matter for the probability estimates of many LWers on here, because it's a hidden background assumption disagreement that underlies a lot of other disagreements.

Reply
Applying right-wing frames to AGI (geo)politics
Noosphere892d134

I think the key issue for liberalism under AGI/ASI is that AGI/ASI makes value alignment matter way, way more to a polity, and in particular you cannot get a polity to make you live under AGI/ASI if the AGI/ASI doesn't want you to live, because you are economically useless.

Liberalism's goal is to avoid the value alignment question, and to mostly avoid the question of who should control society, but AGI/ASI makes the question unavoidable for your basic life.

Indeed, I think part of the difficulty of AI alignment is lots of people have trouble realizing that the basic things they take for granted under the current liberal order would absolutely fall away if AIs didn't value their lives intrinisically, and had selfish utility functions.

The goal of liberalism is to make a society where vast value differences can interact without negative/0-sum conflict and instead trade peacefully, but this is not possible once we create a society where AIs can do all the work without human labor being necessary.

I like Vladimir Nesov's comment, and while I have disagreements, they're not central to his point, and the point still works, just in amended form:

https://www.lesswrong.com/posts/Z8C29oMAmYjhk2CNN/non-superintelligent-paperclip-maximizers-are-normal#FTfvrr9E6QKYGtMRT

Reply
Daniel Kokotajlo's Shortform
Noosphere893d40

Flag, but the chain of thought monitoring doesn't give much evidence that AIs will be motivated by reward.

See @TurnTrout's comment for why:

https://www.lesswrong.com/posts/7wFdXj9oR8M9AiFht/?commentId=5wMRY3NMYsoFnmCrJ

The anthropic talk and the anthropic reporting that they had to reduce the rate of reward hacking is better evidence for reward is the optimization target, because if reward wasn't the optimization target, then you could have relatively imperfect environments without having the reward hack rate be as high as it is such that they'd be motivated to reward hack.

IMO one important experiment in this direction is figuring out whether LLMs are capable enough to model how they get reward/modeling the reward, and even more ambitiously try to figure out whether they try to model rewards given in training by default.

The capability experiment doesn't imply LLMs will do this naturally, but the capability experiment is necessary to get any traction on the issue at all.

My medium confidence guess of @TurnTrout's beliefs around reward not being the optimization target is that by default, RL trained agents won't try to model how they get the reward/model the reward process, and if we made this assumption, a lot of the dangers of RL would definitely be circumventable pretty easily.

And I'd guess you assume that models will try to model the reward process by default.

Reply
Foom & Doom 1: “Brain in a box in a basement”
Noosphere893d20

Right, there’s a possible position which is: “I’ll accept for the sake of argument your claim there will be an egregiously misaligned ASI requiring very little compute (maybe ≲1 chip per human equivalent including continuous online learning), emerging into a world not terribly different from today’s. But even if so, that’s OK! While the ASI will be a much faster learner than humans, it will not magically know things that it has no way to have figured out (§1.8.1), and that includes developing nanotechnology. So it will be reliant on humans and human infrastructure during a gradual process.”

Basically this, and in particular I'm willing to grant the premise that for the sake of argument there is technology that eliminates the need for most logistics, but that all such technology will take at least a year or more of real-world experimentation that means that the AI can't immediately take over.

On this:

I’m not an ASI and haven’t thought very hard about it, so my strategies might be suboptimal, but for example it seems to me that an ASI could quite rapidly (days or weeks not months) earn or steal tons of money, and hack into basically every computer system in the world (even APT groups are generally unable to avoid getting hacked by other APT groups!), and then the AI (which now exists in a zillion copies around the world) can get people around the world to do whatever it wants via hiring them, bribing them, persuading them, threatening them, tricking them, etc.

And what does it get the people to do? Mainly “don’t allow other ASIs to be built” and “do build and release novel pandemics”. The latter should be pretty quick—making pandemics is worryingly easy IIUC (see Kevin Esvelt). If infrastructure and the electric grid starts going down, fine, the AI can rebuild, as long as it has at least one solar-cell-connected chip and a teleoperated robot that can build more robots and scavenge more chips and solar panels (see here), and realistically it will have many of those spread all around.

I think the entire crux is that all of those robots/solar cell chips you referenced currently depend on human industry/modern civilization to actually work, and they'd quickly degrade and become non-functional on the order of weeks or months if modern civilization didn't exist, and this is arguably somewhat inevitable due to economics (until you can have tech that obviates the need for long supply chains).

And in particular, in most takeover scenarios where AIs don't automate the economy first, I don't expect AIs to be able to keep producing robots for a very long time, and I'd bump it up to 300-3,000 years at minimum because there is less easily accessible resources combined with AIs being much less capable due to having very little compute relative to modern civilization.

In particular, I think that disrupting modern civilization to a degree such that humans are disempowered (assuming no tech that obviates the need for logistics) pretty much as a consequence breaks the industries/logistics needed to fuel further AI growth, because there's no more trade, which utterly fucks up modern economies.

And your references argue that human civilization wouldn't go extinct very soon because of civilizational collapse, and that AIs can hack existing human industry to help them, and I do think this is correct (modulo the issue that defense is easier than offense for the cybersecurity realm specifically, and importantly, a key reason for this is that once you catch the AI doing it, there are major consequences for AIs and humans, which actually matter for AI safety):

https://x.com/MaxNadeau_/status/1912568930079781015

The important thing is that this paper rocks and I'd love to see bigger/better versions of it, but I'll use it as a jumping-off point for a take: The headline result is that their best defenses work 93% of the time. Is that anything? If you've been reading Simon Willison (whose blog I think is great!), you might say "that’s useless, because in application security 99% is a failing grade" (https://simonwillison.net/2025/Apr/11/camel/#the-best-part-is-it-doesn-t-use-more-ai). And no wonder, because the techniques in this paper are just variations of "solve AI security problems with more AI", which Simon has emphasized are a dead end (https://simonwillison.net/2022/Sep/17/prompt-injection-more-ai/). So why are the Redwood folks rosier about 93%? The crux here is whether your threat model assumes attackers can try over and over again. Simon is assuming they can, in which case 93% would be useless, but Redwood folks are assuming that there are major costs to the attacker trying and failing (https://lesswrong.com/posts/i2nmBfCXnadeGmhzW/catching-ais-red-handed) This assumption is not at all unique to misalignment threats. It's the same assumption that Debenedetti, Carlini, and Tramer make here (https://arxiv.org/abs/2306.02895), which they call evasion "without breaking eggs". I think the right way to model a variety of security problems, e.g. insider threats from employees. One's assumptions about the costs of attacker failure have huge implications for which security measures look helpful, and I think this is an important factor to mentally track in conversations about these topics.

(It talks about a control technique but the discussion easily transfers outside of the example).

I actually agree cyber-attacks to subvert human industry are a threat and are worth keeping in mind, but none of your references support the idea that AIs can keep going without modern civilization's logistics, and I think people vastly underestimate how necessary modern civilization logistics are to support industry, and how fragile they are to even somewhat minor disruptions, let alone the disruptions that would follow after takeover (assuming it doesn't already have sufficient resources to be self-sustaining).

There are other possibilities too, but hopefully that’s suggestive of “AI doom doesn’t require zero-shot designs of nanotech” (except insofar as viruses are arguably nanotech).

I agree with this, but fairly critically I do think it actually matters quite a lot for AI strategy purposes if we don't assume AIs can quickly rebuild stuff/can obviate logistics through future tech quickly, and it matters pretty greatly to a lot of people's stories of doom, even if AIs can doom us through just trying to hijack modern civilization and then wait for humans to automate themselves away and then once humans have been fully cut out of the loop and AIs can self-sustain an economy without us, bioweapons are used to attack humans, it matters that we have time.

This makes AI control protocols for example a lot more effective, because we can assume that independent AIs outside of the central servers of stuff like Deepmind won't be able to affect things much.

Oh, I guess we also disagree RE “currently we don't have the resources outside of AI companies to actually support a superintelligent AI outside the lab, due to interconnect issues”. I expect future ASI to be much more compute-efficient. Actually, even frontier LLMs are extraordinarily expensive to train, but if we’re talking about inference rather than training, the requirements are not so stringent I think, and people keep working on it.

I actually do expect future AIs to be more compute efficient, but I think that at the point where superintelligent AIs can support themselves purely based off of stuff like personal computers, all control of the situation is lost and either the AIs are aligned and grant us a benevolent personal utopia, or they're misaligned and we are extinct/mostly dead.

So the limits of computational/data efficiency being very large don't matter much for the immediate situation on AI risk.

The point of no return happens earlier than this, and the reason is that even in a future where imitation learning/LLMs do not go all the way to AGI in practice and must have something more brain-like like continuous learning and long-term memories, that imitation learning continues to be useful and will be used by AIs, and there's a very important difference between imitation learning alone not scaling all the way to AGI and imitation learning not being useful at all, and I think LLMs provide good evidence that imitation is surprisingly useful even if it doesn't scale to AGI.

I think a general worldview clash is that I tend to think technological change is mostly driven by early prototypes that at first are pretty inefficient, and there require many changes to get the system to become more efficient, and while there are thresholds of usefulness for the AI case, change operates more continuously than people think.

Finally, we have good reason to believe that the human range is actually pretty large, such that AIs do take a noticeable amount of time from being human level to being outright superintelligent:

  • There could also be another reason for why non-imitation-learning approaches could spend a long while in the human range. Namely: Perhaps the human range is just pretty large, and so it takes a lot of gas to traverse. I think this is somewhat supported by the empirical evidence, see this AI impacts page (discussed in this SSC).
Reply
Paradigms for computation
Noosphere894d20

but it doesn't explain the failure of the Penrose-Lucas argument.

The failure of the Penrose-Lucas argument is that Godel's incompleteness theorem doesn't let you derive the conclusion he derived, because it only implies that you cannot use a computably enumerable set of axioms to make all of mathematics sound and complete, and critically this doesn't mean you cannot automate a subset of mathematics that is relevant.

There's an argument really close to this with the Chinese Room where I pointed out that intuitions from our own reality that include lots of constraints fundamentally fail to transfer over to hypotheticals, and this is a really important example of why arguments around AI need to actually attend to the constraints that are relevant in specific worlds, because without them it's trivial to have strong AI solve any problem:

https://www.lesswrong.com/posts/zxLbepy29tPg8qMnw/refuting-searle-s-wall-putnam-s-rock-and-johnson-s-popcorn#wbBQXmE5aAfHirhZ2

Really, a lot of the issues with the arguments against strong AI made by philosophers is that they have no sense of scale/sense of what mathematical theorems are actually saying, and thus fail to understand what's actually been said, combined with way overextrapolating their intuitions into cases where the intuitions have been deliberately made to fail to work.

While their weights are frozen and they don't really do continual learning, they do online learning.

While I agree In-context learning does give them some form of online learning, which at least partially explains why LLMs succeed (combined with their immense amount of data and muddling through extremely data inefficient algorithms compared to brains, which is a known weakness that could plausibly lead to the death of pure LLM scaling by 2028-2030, though note that doesn't necessarily mean timelines get that much longer), this currently isn't enough to automate lots of jobs away, and fairly critically might not be good enough in practice with realistic compute and data constraints to compete with better continual learning algorithms.

To be clear, this doesn't mean any future paradigm will be more understandable, because they will use more online/continual learning than current LLMs.

Well, it's not a key assumption of mine :)

I mostly agree with your observation, but I also think it's funny when people say this kind of thing to me: I've been reading lesswrong for like an hour a day for years (maybe, uh, more than an hour if I'm being honest...), and I produce a non-negligible fraction of the rationality/epistemics content. So - the fact that this is sometimes taken as an assumption on lesswrong doesn't seem to move me very much - I wonder if you mean to accept or to question this assumption? Anyway, I don't mean this as a criticism of your raising the point.  

Physics aside, I think that when you take indexical complexity into account, it's not true.

I'm mostly in the "pointing out the assumption exists and is used widely", rather than questioning or accepting it, because I wanted to understand how MIRI could believe simple algorithms that learn generally and have high performance existing despite the theorem, and this was the first thing that came to my mind.

And while AIXI is a useful toy model of how intelligence works, it's treatment of indexical complexity/the first person view as being privileged is an area where I seriously start to departure from it, and one of the biggest reasons I'm much more of a fan of IBP than AIXI is that it actually makes a serious effort to actually remove the first-person privilege often seen in many frameworks of intelligence.

And this actually helps us to pretty much entirely defuse the Solomonoff prior's potential malignness, because we no longer have immense probability mass on malignant simulation hypotheses, and the simulation hypotheses that do get considered can't trick the IBP agent into changing it's values.

So the indexical complexity isn't actually relevant here, thankfully (though this doesn't guarantee that the world is truly low complexity in the sense necessary for LW's view of simple, general high performing algorithms to actually work).

And the nice thing is that it no longer requires montonic preferences:

https://www.lesswrong.com/posts/dPmmuaz9szk26BkmD/vanessa-kosoy-s-shortform?commentId=kDaAdjc3YbnuS2ssp

https://www.lesswrong.com/posts/DobZ62XMdiPigii9H/non-monotonic-infra-bayesian-physicalism

Reply
Foom & Doom 1: “Brain in a box in a basement”
Noosphere895d40

I think the most important crux around takeoff speeds discussions, other than how fast AI can get smarter without more compute, is how much we should expect superintelligence to be meaningfully hindered by logistics issues by default.

In particular, assuming the existence of nanotech as Drexler envisioned would mostly eliminate the need for long supply chains, and would allow forces to be supplied entirely locally through a modern version of living off the land.

This is related to prior takeoff speeds discussions, as even if we assume the existence of technology that mostly eliminates logistical issues, it might be too difficult to develop in a short enough time to actually matter for safety-relevance.

I actually contend that a significant (though not all) of the probability of doom from AI risk fundamentally relies on the assumption that superintelligence can fundamentally trivialize the logistics cost of doing things, especially on actions which require long supply lines like war, because if we don't assume this, then takeover is quite a lot harder and has much more cost for the AI, meaning stuff like AI control/dealmaking has a much higher probability of working, because the AI can't immediately strike on it's own, and needs to do real work on acquiring physical stuff like getting more GPUs, or more robots for an AI rebellion.

Indeed, I think the essential assumption of AI control is that AIs can't trivialize away logistics costs by developing tech like nanotech, because this means their only hope of actually getting real power is by doing a rogue insider deployment, because currently we don't have the resources outside of AI companies to actually support a superintelligent AI outside the lab, due to interconnect issues.

I think a core disagreement with people that are much more doomy than me, like Steven Byrnes or Eliezer Yudkowsky and Nate Soares, is probably due to me thinking that conditional on such tech existing, I think it almost certainly requires way more time and experimentation than stuff like AlphaZero/AlphaGo or game-playing AI tends to imply (I think the game playing AIs had fundamental advantages like access to ground-truth reward that could be unlimitedly mined for data, and importantly there is ~0 cost to failure of experimentation, which is very different to most other fields, where there's a real cost for failed experiments, and we don't have unlimited data, forcing us to use more compute or algorithms), and that's if such tech exists, which is more arguable than Drexler implies.

There is of course other considerations like @ryan_greenblatt's point that even if a new paradigm is required, it's likely to be continuous with LLMs because it's possible to mix in imitation learning and continuous learning/memory, such that even if imitation learning doesn't lead to AGI on it's own, LLMs will still be a part of how such AGI is constructed, and I agree with a few quotes below by Ryan Greenblatt on this:

Prior to having a complete version of this much more powerful AI paradigm, you'll first have a weaker version of this paradigm (e.g. you haven't figured out the most efficient way to do the brain algorithmic etc). Further, the weaker version of this paradigm might initially be used in combination with LLMs (or other techniques) such that it (somewhat continuously) integrates into the old trends. Of course, large paradigm shifts might cause things to proceed substantially faster or bend the trend, but not necessarily.

Further, we should still broadly expect this new paradigm will itself take a reasonable amount of time to transition through the human range and though different levels of usefulness even if it's very different from LLM-like approaches (or other AI tech). And we should expect this probably happens at massive computational scale where it will first be viable given some level of algorithmic progress (though this depends on the relative difficulty of scaling things up versus improving the algorithms). As in, more than a year prior to the point where you can train a superintelligence on a gaming GPU, I expect someone will train a system which can automate big chunks of AI R&D using a much bigger cluster.
 

(Next quote) Also, I think the question is "can you somehow make use of imitation data" not "can the brain learning algorithm immediately use of imitation"?

The implicit assumption that logistics is trivial for superintelligence I think bleeds into a lot of LW thinking around AI, and a lot of AI disagreements basically turn on how far AIs can make logistics easier than current human supply chains.

Reply
Paradigms for computation
Noosphere895d53

My take on how recursion theory failed to be relevant for today's AI is that it turned out that what a machine could do if unconstrained basically didn't matter at all, and in particular it basically didn't matter what limits an ideal machine could do, because once we actually impose constraints that force computation to use very limited amounts of resources, we get a non-trivial theory and importantly all of the difficulty of explaining how humans do stuff lies here.

There was too much focus on "could a machine do something at all?" and not enough focus on "what could a machine with severe limitations could do?"

The reason is that in a sense, it is trivial to solve any problem with a machine if I'm allowed zero constraints except that the machine has to exist in a mathematical sense.

A good example of this is the paper on A Universal Hypercomputer, which shows how absurdly powerful computation can be if you are truly unconstrained:

https://arxiv.org/abs/1806.08747

Or davidad's comment on how every optimization problem is trivial under worst-case scenarios:

https://www.lesswrong.com/posts/yTvBSFrXhZfL8vr5a/worst-case-thinking-in-ai-alignment#N3avtTM3ESH4KHmfN

There are other problems, like embeddedness issues, but this turned out to be a core issue, and it turned out that the constants and constraints mattered more than recursion theory thought.

I'll flag here that while it's probably true that a future paradigm will involve more online learning/continual learning, LLMs currently don't do this, and after they're trained their weights/neurons are very much frozen. Indeed, I think this is a central reason why LLMs currently underperform their benchmarks, and is why I don't expect pure LLMs to work out as a paradigm for AGI in practice:

https://www.lesswrong.com/posts/deesrjitvXM4xYGZd/metr-measuring-ai-ability-to-complete-long-tasks#hSkQG2N8rkKXosLEF

Unfortunately, trained neural networks are the things that perform learning and inference online, during deployment. It seems like a bad sign for alignment if we can only understand the behavior of A.G.I. indirectly.

On the argument against simple algorithms that learn generally and have high performance existing, I think a key assumption of LW is that our physics is pretty simple in Kolmogorov complexity (perhaps excluding the constants), and if we buy this, then the true complexity of the world is upper bounded low enough that glass-box learners have real use.

That doesn't mean they have to be findable easily, though.

IMO, the most successful next paradigm will require us to much more strongly focus on the behavioral properties, rather than trying to hope for mechanistic interpretability success, and we will likely have to focus on generalizable statements about the input/output behavior, and mostly ignore the model's structure, since it's probably going to be too messy to work with by default.

This is why I'm more bullish on Vanessa Kosoy's IBP than basically any other theoretical agenda except Natural Abstractions.

Reply
TurnTrout's shortform feed
Noosphere8911d2-16

If I'm being honest, I'm much less concerned about the fact that So8res blocked you from commenting than I am by the fact that he deleted your comment.

The block was a reasonable action in my eyes to prevent more drama, but the deletion was demonstrative of being willing to suppress true information that would indicate his plan could fail catastrophically.

I do think there's something to be said for @RobertM and @habryka's concerns that it would be a bad thing to set a norm where any sorta-relevant post becomes an area to relitigate past drama, as drama has a tendency to consume everything, but as @GeneSmith had said, this almost certainly has a limiting principle, and I see less of a danger than usual here (though I am partial to @habryka's solution of having the delete comment UI button be different).

A key part of the reason here is that the 1st footnote demonstrates a pattern of trying to deflect from more serious issues into more safe issue territory, which makes me much more suspicious that the reason for why TurnTrout's comment was deleted was because of the more sensible reasons that Habryka and RobertM argued.

Let's just say I'm much less willing to trust Nate's reasoning without independent confirmation going forward.

Reply
Load More
Acausal Trade
15d
(+18/-18)
Shard Theory
9mo
(+2)
RLHF
9mo
(+27)
Embedded Agency
3y
(+640/-10)
Qualia
3y
(-1)
Embedded Agency
3y
(+314/-43)
Qualia
3y
(+74/-4)
Qualia
3y
(+20/-10)
11Difficulties of Eschatological policy making [Linkpost]
1mo
3
7State of play of AI progress (and related brakes on an intelligence explosion) [Linkpost]
2mo
0
57The case for multi-decade AI timelines [Linkpost]
2mo
22
15The real reason AI benchmarks haven’t reflected economic impacts
3mo
0
22Does the AI control agenda broadly rely on no FOOM being possible?
Q
3mo
Q
3
0Can a finite physical device be Turing equivalent?
4mo
10
37When is reward ever the optimization target?
Q
9mo
Q
17
1What does it mean for an event or observation to have probability 0 or 1 in Bayesian terms?
Q
10mo
Q
22
36My disagreements with "AGI ruin: A List of Lethalities"
10mo
46
7Does a time-reversible physical law/Cellular Automaton always imply the First Law of Thermodynamics?
Q
10mo
Q
11
Load More