All of Richard_Ngo's Comments + Replies

We disagree on which explanation is more straightforward, but regardless, that type of inference is very different from "literal written evidence".

8habryka
FWIW, I would currently take bets that Musk will pretty unambiguously enact and endorse censorship of things critical of him or the Trump administration more broadly within the next 12 months. I agree this case is ambiguous, but my pretty strong read, based on him calling for criminal prosecution of journalists who say critical things about him or the Trump administration, is that at the moment it's a question of political opportunity, not willingness. I am not totally sure, but sure enough to take a 1:1 bet on this operationalization.

One of the main ways I think about empowerment is in terms of allowing better coordination between subagents.

In the case of an individual human, extreme morality can be seen as one subagent seizing control and overriding other subagents (like the ones who don't want to chop off body parts).

In the case of a group, extreme morality can be seen in terms of preference cascades that go beyond what most (or even any) of the individuals involved with them would individually prefer.

In both cases, replacing fear-based motivation with less coercive/more cooperative ... (read more)

2Wei Dai
I'm not sure that fear or coercion has much to do with it, because there's often no internal conflict when someone is caught up in some extreme form of the morality game; they're just going along with it wholeheartedly, thinking they're just being a good person or helping to advance the arc of history. In the subagents frame, I would say that the subagents have an implicit contract/agreement that any one of them can seize control, if doing so seems good for the overall agent in terms of power or social status. But quite possibly I'm not getting your point, in which case please explain more, or point to some specific parts of your articles that are especially relevant?

In response to an email about what a pro-human ideology for the future looks like, I wrote up the following:

The pro-human egregore I'm currently designing (which I call fractal empowerment) incorporates three key ideas:

Firstly, we can see virtue ethics as a way for less powerful agents to aggregate to form more powerful superagents that preserve the interests of those original less powerful agents. E.g. virtues like integrity, loyalty, etc help prevent divide-and-conquer strategies. This would have been in the interests of the rest of the world w... (read more)

1Purplehermann
The 2nd point is a scary one. Empowering others in the relative sense is a terrible idea unless they are trustworthy/virtuous (same issue as AI risk). In absolute terms, sure.
Wei Dai*278

How would this ideology address value drift? I've been thinking a lot about the kind quoted in Morality is Scary. The way I would describe it now is that human morality is by default driven by a competitive status/signaling game, where often some random or historically contingent aspect of human value or motivation becomes the focal point of the game, and gets magnified/upweighted as a result of competitive dynamics, sometimes to an extreme, even absurd degree.

(Of course from the inside it doesn't look absurd, but instead feels like moral progress. One exa... (read more)

In my post on value systematization I used utilitarianism as a central example.

Value systematization is important because it's a process by which a small number of goals end up shaping a huge amount of behavior. But there's another different way in which this happens: core emotional motivations formed during childhood (e.g. fear of death) often drive a huge amount of our behavior, in ways that are hard for us to notice.

Fear of death and utilitarianism are very different. The former is very visceral and deep-rooted; it typically inf... (read more)

1Lucien
Reminds me of Maslow's pyramid. I wrote an article about values, arguing that the supreme value is life and every other value derives from it. Watch out: this most probably does not align with your view at first glance: https://www.lesswrong.com/posts/xx3St4KC3KHHPGfL9/human-alignment
1robo
I don't think it's System 1 doing the systematization. Evolution beat fear of death into us in lots of independent forms (fear of heights, snakes, thirst, suffocation, etc.), but for the same underlying reason. Fear of death is not just an abstraction humans invented or acquired in childhood; it is a "natural idea" pointed at by our brain's innate circuitry from many directions. Utilitarianism doesn't come with that scaffolding. We don't learn to systematize Euclidean and Minkowskian spaces the same way either.
2Thane Ruthenis
I think that's right. Taking on the natural-abstraction lens, there is a "ground truth" to the "hierarchy of values". That ground truth can be uncovered either by "manual"/symbolic/System-2 reasoning, or by "automatic"/gradient-descent-like/System-1 updates, and both processes would converge to the same hierarchy. But in the System-2 case, the hierarchy would be clearly visible to the conscious mind, whereas the System-1 route would make it visible only indirectly, by the impulses you feel. I don't know about the conflict thing, though. Why do you think System 2 would necessarily oppose System 1's deepest motivations?

I've now edited that section. Old version and new version here for posterity.

Old version:

None of these is very satisfactory! Intuitively speaking, Alice and Bob want to come to an agreement where respect for both of their interests is built in. For example, they might want the EUM they form to value fairness between their two original sets of interests. But adding this new value is not possible if they’re limited to weighted averages. The best they can do is to agree on a probabilistic mixture of EUMs—e.g. tossing a coin to decide between option 1 and opti... (read more)

I was a bit lazy in how I phrased this. I agree with all your points; the thing I'm trying to get at is that this approach falls apart quickly if we make the bargaining even slightly less idealized. E.g. your suggestion "Form an EUM which is totally indifferent about the cake allocation between them and thus gives 100% of the cake to whichever agent is cheaper/easier to provide cake for":

  1. Strongly incentivizes deception (including self-deception) during bargaining (e.g. each agent wants to overstate the difficulty of providing cake for it).
  2. Strongly incentiv
... (read more)
4Richard_Ngo
I've now edited that section. Old version and new version here for posterity.

Old version: None of these is very satisfactory! Intuitively speaking, Alice and Bob want to come to an agreement where respect for both of their interests is built in. For example, they might want the EUM they form to value fairness between their two original sets of interests. But adding this new value is not possible if they’re limited to weighted averages. The best they can do is to agree on a probabilistic mixture of EUMs—e.g. tossing a coin to decide between option 1 and option 2—which is still very inflexible, since it locks in one of them having priority indefinitely. Based on similar reasoning, Scott Garrabrant rejects the independence axiom. He argues that the axiom is unjustified because rational agents should be able to follow through on commitments they made about which decision procedure to follow (or even hypothetical commitments).

New version: These are all very unsatisfactory. Bob wouldn’t want #1, Alice wouldn’t want #2, and #3 is extremely non-robust. Alice and Bob could toss a coin to decide between options #1 and #2, but then they wouldn’t be acting as an EUM (since EUMs can’t prefer a probabilistic mixture of two options to either option individually). And even if they do, whoever loses the coin toss will have a strong incentive to renege on the deal. We could see these issues merely as the type of frictions that plague any idealized theory. But we could also see them as hints about what EUM is getting wrong on a more fundamental level. Intuitively speaking, the problem here is that there’s no mechanism for separately respecting the interests of Alice and Bob after they’ve aggregated into a single agent. For example, they might want the EUM they form to value fairness between their two original sets of interests. But adding this new value is not possible if they’re limited to (a probability distribution over) weighted averages of their utilities. This makes agg
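The parenthetical claim about mixtures in the new version above follows directly from linearity of expectation. A minimal statement of it (my own notation, not part of the comment), with U the aggregated expected-utility function and p the probability of the coin landing on option 1:

```latex
U(\text{coin toss}) \;=\; p\,U(\text{option 1}) + (1-p)\,U(\text{option 2})
\;\le\; \max\{\,U(\text{option 1}),\, U(\text{option 2})\,\}
```

So a fixed EUM can never strictly prefer the coin toss to both of the pure options.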

On a meta level, I have a narrative that goes something like: LessWrong tried to be truth-seeking, but was scared of discussing the culture war, so blocked that off. But then the culture war ate the world, and various harms have come about from not having thought clearly about that (e.g. AI governance being a default left-wing enterprise that tried to make common cause with AI ethics). Now cancel culture is over and there are very few political risks to thinking about culture wars, but people are still scared to. (You can see Scott gradually dipping his to... (read more)

7Lucius Bushnaq
I don't think the risks of talking about the culture war have gone down. If anything, it feels like it's yet again gotten worse. What exactly is risky to talk about has changed a bit, but that's it. I'm more reluctant than ever to involve myself in culture war adjacent discussions.
yams134

I read your comment as conflating 'talking about the culture war at all' and 'agreeing with / invoking Curtis Yarvin', which also conflates 'criticizing Yarvin' with 'silencing discussion of the culture war'.

This reinforces a false binary between totally mind-killed wokists and people (like Yarvin) who just literally believe that some folks deserve to suffer, because it's their genetic destiny. 

This kind of tribalism is exactly what fuels the culture war, and not what successfully sidesteps, diffuses, or rectifies it. NRx, like the Cathedral, is a min... (read more)

Thanks for the well-written and good-faith reply. I feel a bit confused by how to relate to it on a meta level, so let me think out loud for a while.

I'm not surprised that I'm reinventing a bunch of ideas from the humanities, given that I don't have much of a humanities background and didn't dig very far through the literature.

But I have some sense that even if I had dug for these humanities concepts, they wouldn't give me what I want.

What do I want?

  1. Concepts that are applied to explaining current cultural and political phenomena around me (because those ar
... (read more)
2testingthewaters
Hey, thank you for taking the time to reply honestly and in detail as well. With regards to what you want, I think that this is in many senses also what I am looking for, especially the last item about tying in collective behaviour to reasoning about intelligence.

I think one of the frames you might find the most useful is one you've already covered: power as a coordination game. As you alluded to in your original post, people aren't in a massive hive mind/conspiracy; they mostly want to do what other successful people seem to be doing, which translates well to a coordination game and also explains the rapid "board flips" once a critical mass of support/rejection against some proposition is reached. For example, witness the rapid switch to majority support of gay marriage in the 2010s amongst the population in general. Would also love to discuss this with you in more detail (I trained as an English student and also studied Digital Humanities).

I will leave off with a few book suggestions that, while maybe not directly answering your needs, you might find interesting:
* Capitalist Realism by Mark Fisher (as close to a self-portrait by the modern humanities as it gets)
* Hyperobjects by Timothy Morton (a high-level perspective on how cultural, material, and social currents impact our views on reality)
* How Minds Change by David McRaney (not humanities, but pop sci about the science of belief and persuasion)

P.S. Re: the point about Yarvin being right, betting on the dominant group in society embracing a dangerous delusion is a remarkably safe bet (e.g. McCarthyism, the aforementioned Bavarian Witch Hunts, fascism, Lysenkoism, etc.).
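The coordination-game framing in the comment above can be made concrete with a standard threshold (Granovetter-style) cascade model. The sketch below is my own illustration, not something from the comment; the parameter values (threshold distribution, committed fraction per step) are arbitrary assumptions chosen to show the tipping behaviour.

```python
import numpy as np

rng = np.random.default_rng(0)

# Each agent publicly adopts a position once the visible share of adopters
# exceeds that agent's private threshold. A slowly growing committed minority
# barely moves the equilibrium; once a critical mass is reached, support
# "board flips" to near-universal within a step or two.
n = 10_000
thresholds = np.clip(rng.normal(0.35, 0.10, n), 0.0, 1.0)

for step in range(30):
    committed = 0.01 * step  # exogenous committed fraction at this step
    share = committed
    for _ in range(500):     # iterate best responses to an equilibrium
        new_share = committed + (1 - committed) * np.mean(thresholds <= share)
        if abs(new_share - share) < 1e-6:
            break
        share = new_share
    print(f"step {step:2d}: committed={committed:.2f}  equilibrium share={share:.2f}")
```

With these assumed numbers the equilibrium share creeps up slowly, then jumps from roughly 20% to nearly 100% once the committed fraction passes a tipping point, which is the "board flip" pattern described above.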

My meta-practical suggestion is to ask AIs, with prompts like: notice where the ideas or arguments match existing ideas from the humanities, using different language; ideally, point to references to such sources. Often you will find people who came up with somewhat similar models or observations. Also, while people may be hard to reach or dead, and engaging with long books is costly, in my experience even their simulacra can provide useful feedback, come up with ideas, and point to what you miss.

Another meta-idea is that it seems good to notice the skulls. My suspicion... (read more)

7samuelshadrach
Strongly in favour of this. There are people in academia doing this type of work; a lot of them are economists by training studying sociology and political science. See for example Freakonomics by Steven Levitt, or Daron Acemoglu, who recently won a Nobel Prize. Search keywords: neo-institutionalism, rational choice theory. There are a lot of political science papers on rational choice theory; I haven't read many of them so I can't give immediate recommendations. I'd be happy to join you in your search for existing literature, if that's a priority for you. Or just generally discuss the stuff. I'm particularly interested in applying rational choice models to how the internet will affect society.

I have thought about this on and off for several years and finally decided that you're right and have changed it. Thanks for pushing on this.

Nice, that's almost exactly how I intended it. Except that I wasn't thinking of the "stars" as satellites looking for individual humans to send propaganda at (which IMO is pretty close to "communicating"), but rather a network of satellites forming a single "screen" across the sky that plays a video infecting any baseline humans who look at it.

In my headcanon the original negotiators specified that sunlight would still reach the earth unimpeded, but didn't specify that no AI satellites would be visible from the Earth. I don't have headcanon explanations fo... (read more)

5gwern
That was what I was thinking, yes. "A pact would normally allow voluntary communication to be initiated with the AIs, so any glitcher which had been successfully attacked would have simply communicated back to its masters, either downloading new instructions & attacks or finetuning the existing ones or being puppeted directly by the AIs, sometime over the past centuries or millennia; if nothing else, they have an unlimited amount of time to stare at the sky and be reprogrammed arbitrarily after the initial exploit; so glitchers are indeed 'glitchy' and must represent a permanently failed attack method. That is why they bumble around semi-harmlessly: a broken worm or virus can cause a lot of trouble as it futilely portscans or DoSes targets or goes through infinite loops etc, even if the code is buggy and has accidentally locked out its creators as well as everyone else."
2Davidmanheim
My headcanon for the animals was that early on, they released viruses that genetically modified non-human animals in ways that don't violate the pact. I didn't think the pact could have been as broad as "the terrestrial Earth will be left unmodified," because the causal impact of their actions certainly changed things. I assumed it was something like "AIs and AI-created technologies may not do anything that interferes with humans' actions on Earth, or harms humans in any way" - but genetic engineering instructions sent from outside the Earth, assumedly pre-collapse, didn't qualify because they didn't affect humans directly; they made animals affect humans, which was parsed as similar to impacts of the environment on humans, not an AI technology.

in general I think people should explain stuff like this. "I might as well not help" is a very weak argument compared with the benefits of people understanding the world better.

7Shankar Sivarajan
It's a straightforward application of the Berryman Logical Imaging Technique, best known for its use by the other basilisk. 

Oh, I see what you mean now. In that case, no, I disagree. Right now this notion of robustness is pre-theoretic. I suspect that we can characterize robustness as "acting like a belief/goal agent" in the limit, but part of my point is that we don't even know what it means to act "approximately like belief/goal agents" in realistic regimes, because e.g. belief/goal agents as we currently characterize them can't learn new concepts.

Relatedly, see the dialogue in this post.

I appreciated this comment! Especially:

 dude, how the hell do you come up with this stuff. 

 
5the gears to ascension
It took me several edits to get spoilers to work right, I had to switch from markdown to the rich text editor. Your second spoiler is empty, which is how mine were breaking.

This quote from my comment above addresses this:

And so I'd say that AIXI-like agents and coalitional agents converge in the limit of optimality, but before that coalitional agency will be a much better framework for understanding realistic agents, including significantly superhuman agents.

2Cole Wyeth
So the thing that coalitional agents are robust at is acting approximately like belief/goal agents, and you’re only making a structural claim about agency? If so, I find your model pretty plausible.

Thank you Cole for the comment! Some quick thoughts in response (though I've skipped commenting on the biology examples and the ML examples since I think our intuitions here are a bit too different to usefully resolve via text):

Ngo’s view seems to be that after some level of decomposition, the recursion bottoms out at agents that can be seen as expected utility maximizers (though it’s not totally clear to me where this point occurs on his model, he seems to think that these irreducible agents are more like rough heuristics than sophisticated planners, so t

... (read more)
2Noosphere89
Re democratic countries overtaken by dictatorial countries: I think this will only last until AI that can automate at least all white-collar labor is achieved, and maybe even most blue-collar physical labor, well enough that human wages for those jobs decline below what a human needs to subsist; by then dictatorial/plutocratic countries will unfortunately come back as a viable governing option, and maybe even overtake democratic countries. So to come back to the analogy, I think VNM-rationality dictatorship is unfortunately common and convergent over a long timescale, and it's democracies/coalition politics that are fragile over the sweep of history, because they only became dominant-ish starting in the 18th century and end sometime in the 21st century.
2Cole Wyeth
What is this coalitional structure for if not to approximate an EU maximizing agent?
1Jonas Hallgren
I saw the comment and thought I would drop some things that are beginnings of approaches to a more mathematical theory of iterated agency. A general underlying idea is to decompose a system into its maximally predictive sub-agents, sort of like an arg-max of Daniel Dennett's intentional stance. There are various reasons to believe there are algorithms for discovering the most important nested sub-parts of systems, using things like Active Inference, especially where it has been applied in computational biology. Here are some related papers:
* https://arxiv.org/abs/1412.2447 - We consider biological individuality in terms of information theoretic and graphical principles. Our purpose is to extract through an algorithmic decomposition system-environment boundaries supporting individuality. We infer or detect evolved individuals rather than assume that they exist. Given a set of consistent measurements over time, we discover a coarse-grained or quantized description on a system, inducing partitions (which can be nested).
* https://arxiv.org/pdf/2209.01619 - Trying to relate agency to POMDPs and the intentional stance.

I found this tricky to parse because of two phrasing issues:

  1. The post depends a lot on what you mean by "school" (high school versus undergrad).
  2. I feel confused about what claim you're making about the waiting room strategy: you say that some people shouldn't use it, but you don't actually claim that anyone in particular should use it. So are you just mentioning that it's a possible strategy? Or are you implying that it should be the default strategy?
2Nikola Jurkovic
1. All of the above, but it seems pretty hard to have an impact as a high schooler, and many impact avenues aren't technically "positions" (e.g. influencer).
2. I think that everyone except "Extremely resilient individuals who expect to get an impactful position (including independent research) very quickly" is probably better off following the strategy.

Something that's fascinating about this art of yours is that I can't tell if you're coherently in favor of this, or purposefully invoking thinking errors in the audience, or just riffing, or what.

Thanks for the fascinating comment.

I am a romantic in the sense that I believe that you can achieve arbitrary large gains from symbiosis if you're careful and skillful enough.

Right now very few people are careful and skillful enough. Part of what I'm trying to convey with this story is what it looks like for AI to provide most of the requisite skill.

Another way of... (read more)

Richard_NgoΩ9136

FWIW I think of "OpenAI leadership being untrustworthy" (a significant factor in me leaving) as different from "OpenAI having bad safety policies" (not a significant factor in me leaving). Not sure if it matters, I expect that Scott was using "safety policies" more expansively than I do. But just for the sake of clarity:

I am generally pretty sympathetic to the idea that it's really hard to know what safety policies to put in place right now. Many policies pushed by safety people (including me, in the past) have been mostly kayfabe (e.g. being valuable as c... (read more)

2habryka
(I meant the more expansive definition. Plausible that me and Zac talked past each other because of that)

Oh huh, I had the opposite impression from when I published Tinker with you. Thanks for clarifying!

Ty! You're right about the Asimov deal, though I do have some leeway. But I think the opening of this story is a little slow, so I'm not very excited about that being the only thing people see by default.

Unrelatedly, my last story is the only one of my stories that was left as a personal blog post (aside from the one about parties). Change of policy or oversight?

2Raemon
I think that was a random oversight. Moved to frontpage. I do agree the opening is kinda slow.

Ah, glad to hear the effort was noticeable. I do think that as I get more practice at being descriptive, concreteness will become easier for me (my brain just doesn't work that way by default). And anyone reading this comment is welcome to leave me feedback about places in my stories where I should have been more concrete.

But I'm also pivoting away from stories in general right now, there's too much other stuff I want to spend time on. I have half a dozen other stories for which I've already finished first drafts, so I'll probably gradually release those i... (read more)

I wrote most of it a little over a year ago. In general I don't plot out stories, I just start writing them and see what happens. But since I was inspired by The Gentle Seduction I already had a broad idea of where it was going.

I then sent a draft to some friends for feedback. One friend left about 50 comments in places where I'd been too abstract or given a vague description, with each comment literally just saying "like what?"

This was extremely valuable feedback but almost broke my will to finish the story. It took me about a year to work through most of... (read more)

2Mo Putera
I've read most of your stories over at Narrative Ark and wanted to remark that The Gentle Romance did feel more concrete than usual, which was nice. Given how much effort it took for you however, I suppose I shouldn't expect future stories at Narrative Ark to be similarly concrete?

The Minority Faction

I'm not sure what the details would look like, but I'm pretty sure ASI would have enough new technologies to figure something out within 10,000 years.

I feel like this is the main load-bearing claim underlying the post, but it's barely argued for.

In some sense the sun is already "eating itself" by doing a fusion reaction, which will last for billions more years. So you're claiming that AI could eat the sun (at least) six orders of magnitude faster, which is not obvious to me.

I don't think my priors on that are very different from yours but the thing that would have made this post valuable for me is some object-level reason to upgrade my confidence in that.

2jessicata
1. It doesn't have to expend the energy. It's about reshaping the matter into machines. Computers take lots of mass-energy to constitute them, not to power them.
2. Things can go 6 orders of magnitude faster due to intelligence/agency; it's not highly unlikely in general.
3. I agree that in theory the arguments here could be better. It might require knowing more physics than I do, and it has the "how does Kasparov beat you at chess" problem.

FWIW twitter search is ridiculously bad, it's often better to use google instead. In this case I had it as the second result when I googled "richardmcngo twitter safety fundamentals" (richardmcngo being my twitter handle).

Yepp, though note that this still feels in tension with the original post to me - I expect to find a clean, elegant replacement to VNM, not just a set of approximately-equally-compelling alternatives.

Why? Partly because of inside views which I can’t explain in brief. But mainly because that’s how conceptual progress works in general. There is basically always far more hidden beauty and order in the universe than people are able to conceive (because conceiving of it is nearly as hard as discovering it - like, before Darwin, people wouldn’t have been able to... (read more)

2Noosphere89
I think a crux here is that the domain of values/utility functions is one in which it's likely that multiple structures are equally compelling. My big reason for this probably derives from being a moral relativist: while morality is something like a real thing, it's not objective or universal, and other people are allowed to hold different moralities and not update on them. (Side note, but most of the objections that a lot of people hold about moral realism can be alleviated just by being a moral relativist, rather than a moral anti-realist).
8AnnaSalamon
I... don't think I'm taking the hidden order of the universe non-seriously. If it matters, I've been obsessively rereading Christopher Alexander's "The Nature of Order" books, and trying to find ways to express some of what he's looking at in LW-friendly terms; this post is part of an attempt at that. I have thousands and thousands of words of discarded drafts about it.

Re: why I think there might be room in the universe for multiple aspirational models of agency, each of which can be self-propagating for a time, in some contexts: biology and culture often seem to me to have multiple kinda-stable equilibria. Like, eyes are pretty great, but so is sonar, and so is a sense of smell, or having good memory and priors about one's surroundings, and each fulfills some of the same purposes. Or diploidy and haplodiploidy are both locally-kinda-stable reproductive systems.

What makes you think I'm insufficiently respecting the hidden order of the universe?

As a quick note: the auto-generated glossary for this story is pretty cool (though it predictably contains spoilers).

Because I might fund them or forward it to someone else who will.

In general people should feel free to DM me with pitches for this sort of thing.

3kave
Perhaps say some words on why they might want to?

I think this epistemic uncertainty is distinct from the type of "objective probabilities" I talk about in my post, and I don't really know how to use language without referring to degrees of my epistemic uncertainty. 

The part I was gesturing at wasn't the "probably" but the "low measure" part.

Is your position that the problem is deeper than this, and there is no objective prior over worlds, it's just a thing like ethics that we choose for ourselves, and then later can bargain and trade with other beings who have a different prior of realness?

Yes, that... (read more)

Hmmm, uncertain if we disagree. You keep saying that these concepts are cursed and yet phrasing your claims in terms of them anyway (e.g. "probably very low measure"), which suggests that there's some aspect of my response you don't fully believe.

In particular, in order for your definition of "what beings are sufficiently similar to you" to not be cursed, you have to be making claims not just about the beings themselves (since many Boltzmann brains are identical to your brain) but rather about the universes that they're in. But this is kinda what I mean by... (read more)

2David Matolcsi
Hm, probably we disagree on something. I'm very confused about how to mesh epistemic uncertainty with these "distribution over different Universes" types of probability. When I say "Boltzmann brains are probably very low measure", I mean "I think Boltzmann brains are very low measure, but this is a confusing topic and there might be considerations I haven't thought of and I might be totally mistaken". I think this epistemic uncertainty is distinct from the type of "objective probabilities" I talk about in my post, and I don't really know how to use language without referring to degrees of my epistemic uncertainty.

Maybe we have some deeper disagreement here. It feels plausible to me that there is a measure of "realness" in the Multiverse that is an objective fact about the world, and we might be able to figure it out. When I say probabilities are cursed, I just mean that even if an objective prior over worlds and moments exists (like the Solomonoff prior), your probabilities of where you are are still hackable by simulations, so you shouldn't rely on raw probabilities for decision-making, like the people using the Oracle do. Meanwhile, expected values are not hackable in the same way, because if they recreate you in a tiny simulation, you don't care about that, and if they recreate you in a big simulation or promise you things in the outside world (like in my other post), then that's not hacking your decision making, but a fair deal, and you should in fact let that influence your decisions.

Is your position that the problem is deeper than this, and there is no objective prior over worlds, it's just a thing like ethics that we choose for ourselves, and then later can bargain and trade with other beings who have a different prior of realness?

I don't think this line of argument is a good one. If there's a 5% chance of x-risk and, say, a 50% chance that AGI makes the world just generally be very chaotic and high-stakes over the next few decades, then it seems very plausible that you should mostly be optimizing for making the 50% go well rather than the 5%.

Still consistent with great concern. I'm pointing out that O O's point isn't locally valid; observing concern shouldn't translate into observing belief that alignment is impossible.

Worse than the current situation, because the counterfactual is that some later project happens which kicks off in a less race-y manner.

In other words, whatever the chance of its motivation shifting over time, it seems dominated by the chance that starting the equivalent project later would just have better motivations from the outset.

4[anonymous]
Can you say more about scenarios where you envision a later project happening that has different motivations? I think in the current zeitgeist, such a project would almost definitely be primarily motivated by beating China. It doesn't seem clear to me that it's good to wait for a new zeitgeist. Reasons:
* A company might develop AGI (or an AI system that is very good at AI R&D and can get to AGI) before a major zeitgeist change.
* The longer we wait, the more capable the "most capable model that wasn't secured" is. So we could risk getting into a scenario where people want to pause, but since China and the US both have GPT-Nminus1, both sides feel compelled to race forward (whereas this wouldn't have happened if security had kicked off sooner).

Great post. One slightly nitpicky point, though: even in the section where you argue that probabilities are cursed, you are still talking in the language of probabilities (e.g. "my modal guess is that I'm in a solipsist simulation that is a fork of a bigger simulation").

I think there's probably a deeper ontological shift you can do to a mindset where there's no actual ground truth about "where you are". I think in order to do that you probably need to also go beyond "expected utilities are real", because expected utilities need to be calculated by assignin... (read more)

3David Matolcsi
I like your poem on Twitter. I think that Boltzmann brains in particular are probably very low measure though, at least if you use Solomonoff induction. If you think that weighting observer moments within a Universe by their description complexity is crazy (which I kind of feel), then you need to come up with a different measure on observer moments, but I expect that if we find a satisfying measure, Boltzmann brains will be low measure in that too.

I agree that there's no real answer to "where you are"; you are a superposition of beings across the multiverse, sure. But I think probabilities are kind of real: if you make up some definition of what beings are sufficiently similar to you that you consider them "you", then you can have a probability distribution over where those beings are, and it's a fair equivalent rephrasing to say "I'm in this type of situation with this probability". (This is what I do in the post. Very unclear though why you'd ever want to estimate that, which is why I say that probabilities are cursed.)

I think expected utilities are still reasonable. When you make a decision, you can estimate who are the beings whose decisions correlate with this one, and what is the impact of each of their decisions, and calculate the sum of all that. I think it's fair to call this sum expected utility. It's possible that you don't want to optimize for the direct sum, but for something determined by "coalition dynamics"; I don't understand the details well enough to really have an opinion.

(My guess is we don't have a real disagreement here and it's just a question of phrasing, but tell me if you think we disagree in a deeper way.)
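For reference, the Solomonoff-style measure being gestured at here is usually written as follows (standard definition, in notation not used in the comment; U is a prefix-free universal machine and ℓ(p) is the length of program p in bits):

```latex
M(x) \;=\; \sum_{p \,:\, U(p)\ \text{outputs a string beginning with}\ x} 2^{-\ell(p)}
```

Weighting observer moments by description complexity then amounts, roughly, to giving an observer moment weight about 2^{-K}, where K is the length of the shortest program that locates it (its universe plus its position within that universe).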
Richard_NgoΩ8110

Cool, ty for (characteristically) thoughtful engagement.

I am still intuitively skeptical about a bunch of your numbers but now it's the sort of feeling which I would also have if you were just reasoning more clearly than me about this stuff (that is, people who reason more clearly tend to be able to notice ways that interventions could be surprisingly high-leverage in confusing domains).

Ty for the link, but both of these seem like clearly bad semantics (e.g. under either of them the second-best hypothesis under consideration might score arbitrarily badly).

Just changed the name to The Minority Coalition.

1NoriMori1992
What was the title before?
Richard_NgoΩ11140

1. Yepp, seems reasonable. Though FYI I think of this less as some special meta argument, and more as the common-sense correction that almost everyone implicitly does when giving credences, and rationalists do less than most. (It's a step towards applying outside view, though not fully "outside view".)

2. Yepp, agreed, though I think the common-sense connotations of "if this became" or "this would have a big effect" are causal, especially in the context where we're talking to the actors who are involved in making that change. (E.g. the non-causal interpreta... (read more)

Good point re 2. Re 1, meh, still seems like a meta-argument to me, because when I roll out my mental simulations of the ways the future could go, it really does seem like my If... condition obtaining would cut out about half of the loss-of-control ones.

Re 3: point by point:
1. AISIs existing vs. not: Less important; I feel like this changes my p(doom) by more like 10-20% rather than 50%.
2. Big names coming out: idk this also feels like maybe 10-20% rather than 50%
3. I think Anthropic winning the race would be a 40% thing maybe, but being a runner-up doesn'... (read more)

Richard_NgoΩ13161

We have discussed this dynamic before but just for the record:

I think that if it became industry-standard practice for AGI corporations to write, publish, and regularly update (actual instead of just hypothetical) safety cases at this level of rigor and detail, my p(doom) would cut in half.

This is IMO not the type of change that should be able to cut someone's P(doom) in half. There are so many different factors that are of this size and importance or bigger (including many that people simply have not thought of yet) such that, if this change could halv... (read more)

Sorry! My response:

1. Yeah you might be right about this, maybe I should get less excited and say something like "it feels like it should cut in half but taking into account Richard's meta argument I should adjust downwards and maybe it's just a couple percentage points"

2. If the conditional obtains, that's also evidence about a bunch of other correlated good things though (timelines being slightly longer, people being somewhat more reasonable in general, etc.) so maybe it is legit to think this would have quite a big effect

3. Are you sure there are so man... (read more)

2Noosphere89
While I agree that people are in general overconfident, including LessWrongers, I don't particularly think this is because Bayesianism is philosophically incorrect, but rather because of practical limits on computation combined with sometimes not realizing how data-poor their efforts truly are. (There are philosophical problems with Bayesianism, but not ones that predict very well the current issues of overconfidence in real human reasoning, so I don't see why Bayesianism is so central here. Separately, while I'm not sure there can ever be a complete theory of epistemology, I do think that Bayesianism is actually quite general, and a lot of the principles of Bayesianism are probably implemented in human brains, allowing for practicality concerns like cost of compute.)

The former can be sufficient—e.g. there are good theoretical researchers who have never done empirical work themselves.

In hindsight I think "close conjunction" was too strong—it's more about picking up the ontologies and key insights from empirical work, which can be possible without following it very closely.

I think there's something importantly true about your comment, but let me start with the ways I disagree. Firstly, the more ways in which you're power-seeking, the more defense mechanisms will apply to you. Conversely, if you're credibly trying to do a pretty narrow and widely-accepted thing, then there will be less backlash. So Jane Street is power-seeking in the sense of trying to earn money, but they don't have much of a cultural or political agenda, they're not trying to mobilize a wider movement, and earning money is a very normal thing for companies ... (read more)

The bits are not very meaningful in isolation; the claim "program-bit number 37 is a 1" has almost no meaning in the absence of further information about the other program bits. However, this isn't much of an issue for the formalism.

In my post I defend the use of propositions as a way to understand models, and attack the use of propositions as a way to understand reality. You can think of this as a two-level structure: claims about models can be crisp and precise enough that it makes sense to talk about them in propositional terms, but for complex bits of ... (read more)

2abramdemski
I agree that Solomonoff’s epistemology is noncentral in the way you describe, but I don't think it impacts my points very much; replace Solomonoff with whatever epistemic theory you like. It was just a convenient example. (Although I expect defenders of Solomonoff to expect the program bits to be meaningful; and I somewhat agree. It's just that the theory doesn't address the meaning there, instead treating programs more like black-box predictors.) In my view, meaning is the property of being optimized to adhere to some map-territory relationship. However, this optimization itself must always occur within some model (it provides the map-territory relationship to optimize for). In the context of Solomonoff Induction, this may emerge from the incentive to predict, but it is not easy to reason about. In some sense, reality isn't made of bits, propositions, or any such thing; it is of unknowable type. However, we always describe it via terms of some type (a language). I'm no longer sure where the disagreement lies, if any, but I still feel like the original post overstates things.

The minority faction is the group of entities that are currently alive, as opposed to the vast number of entities that will exist in the future. I.e. the one Clarke talks about when he says "why won’t you help the rest of us form a coalition against them?"

In hindsight I should probably have called it The Minority Coalition.

2Richard_Ngo
Just changed the name to The Minority Coalition.

Here's how that would be handled by a Bayesian mind:

  • There's some latent variable representing the semantics of "humanity will be extinct in 100 years"; call that variable S for semantics.
  • Lots of things can provide evidence about S. The sentence itself, context of the conversation, whatever my friend says about their intent, etc, etc.
  • ... and yet it is totally allowed, by the math of Bayesian agents, for that variable S to still have some uncertainty in it even after conditioning on the sentence itself and the entire low-level physical state of my friend, or
... (read more)
9johnswentworth
We are indeed in the logically omniscient setting still, so nothing would resolve that uncertainty.

The simplest concrete example I know is the Boltzmann distribution for an ideal gas - not the assorted things people say about the Boltzmann distribution, but the actual math, interpreted as Bayesian probability. The model has one latent variable, the temperature T, and says that all the particle velocities are normally distributed with mean zero and variance proportional to T. Then, just following the ordinary Bayesian math: in order to estimate T from all the particle velocities, I start with some prior P[T], calculate P[T|velocities] using Bayes' rule, and then for ~any reasonable prior I end up with a posterior distribution over T which is very tightly peaked around the average particle energy... but has nonzero spread. There's small but nonzero uncertainty in T given all of the particle velocities. And in this simple toy gas model, those particles are the whole world, so there's nothing else to learn about which would further reduce my uncertainty in T.
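A small numerical version of this example (my own sketch, not from the comment): it assumes units where mass and Boltzmann's constant are 1, so each 1-D velocity is drawn from N(0, T), and it uses a flat prior for T on a grid.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy ideal gas: one latent variable T, velocities ~ N(0, T).
true_T = 2.0
n = 10_000
v = rng.normal(0.0, np.sqrt(true_T), size=n)

# Posterior over T on a grid, with a flat prior.
T_grid = np.linspace(1.5, 2.5, 2001)
# log-likelihood of all velocities under N(0, T), up to a T-independent constant
log_lik = -0.5 * n * np.log(T_grid) - np.sum(v**2) / (2 * T_grid)
log_post = log_lik - log_lik.max()
post = np.exp(log_post)
post /= np.trapz(post, T_grid)

mean_T = np.trapz(T_grid * post, T_grid)
sd_T = np.sqrt(np.trapz((T_grid - mean_T) ** 2 * post, T_grid))
print(f"posterior over T: mean ~ {mean_T:.3f}, sd ~ {sd_T:.3f}")
# The posterior peaks near the empirical mean of v^2 (the MLE for T) but keeps
# a small nonzero spread, even though these velocities are the whole toy world.
```

With 10,000 particles the posterior standard deviation comes out around 0.03: small, but not zero, which is the point being made above.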

"Dragons are attacking Paris!" seems true by your reasoning, since there are no dragons, and therefore it is vacuously true that all of them are attacking Paris.

5localdeity
Are you not familiar with the term "vacuously true"? I find this very surprising. People who study math tend to make jokes with it.

The idea is that, if we were to render a statement like "Colorless green ideas sleep furiously" into formal logic, we'd probably take it to mean the universal statement "For all X such that X is a colorless green idea, X sleeps furiously". A universal statement is logically equivalent to "There don't exist any counterexamples", i.e. "There does not exist X such that X is a colorless green idea and X does not sleep furiously". Which is clearly true, and therefore the universal is equally true.

There is, of course, some ambiguity when rendering English into formal logic. It's not rare for English speakers to say "if" when they mean "if and only if", or "or" when they mean "exclusive or". (And sometimes "Tell me which one", as in "Did you do A, or B?" "Yes." "Goddammit.") Often this doesn't cause problems, but sometimes it does. (In which case, as I've said, the solution is not to give their statement an ambiguous truth value, but rather to ask them to restate it less ambiguously.)

"Dragons are attacking Paris" seems most naturally interpreted as the definite statement "There's some unspecified number—but since I used the plural, it's at least 2—of dragons that are attacking Paris", which would be false. One could also imagine interpreting it as a universal statement "All dragons are currently attacking Paris", which, as you say, would be vacuously true since there are no dragons. However, in English, the preferred way to say that would be "Dragons attack Paris", as CBiddulph says. "Dragons are attacking Paris" uses the present progressive tense, while "Dragons attack Paris" uses what is called the "simple present"/"present indefinite" tense. Wiki says:

English grammar rules aren't necessarily universal and unchanging, but they do give at least medium-strength priors on how to interpret a sentence.
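For concreteness, the two readings rendered in first-order logic (my own formalization of the comment's point, ignoring the plural's "at least 2"):

```latex
\text{Universal: } \forall x\,\big(\mathrm{Dragon}(x) \rightarrow \mathrm{AttackingParis}(x)\big)
\;\equiv\;
\neg\exists x\,\big(\mathrm{Dragon}(x) \wedge \neg\mathrm{AttackingParis}(x)\big)
\quad \text{(vacuously true if no dragons exist)}

\text{Existential: } \exists x\,\big(\mathrm{Dragon}(x) \wedge \mathrm{AttackingParis}(x)\big)
\quad \text{(false if no dragons exist)}
```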
2CBiddulph
Your example wouldn't be true, but "Dragons attack Paris" would be, interpreted as a statement about actual dragons' habits

Ty for the comment. I mostly disagree with it. Here's my attempt to restate the thrust of your argument:

The issues with binary truth-values raised in the post are all basically getting at the idea that the meaning of a proposition is context-dependent. But we can model context-dependence in a Bayesian way by referring to latent variables in the speaker's model of the world. Therefore we don't need fuzzy truth-values.

But this assumes that, given the speaker's probabilistic model, truth-values are binary. I don't see why this needs to be the case. Here's an ... (read more)

But this assumes that, given the speaker's probabilistic model, truth-values are binary.

In some sense yes, but there is totally allowed to be irreducible uncertainty in the latents - i.e. given both the model and complete knowledge of everything in the physical world, there can still be uncertainty in the latents. And those latents can still be meaningful and predictively powerful. I think that sort of uncertainty does the sort of thing you're trying to achieve by introducing fuzzy truth values, without having to leave a Bayesian framework.

Let's look at th... (read more)

Suppose you have two models of the earth; one is a sphere, one is an ellipsoid. Both are wrong, but they're wrong in different ways. Now, we can operationalize a bunch of different implications of these hypotheses, but most of the time in science the main point of operationalizing the implications is not to choose between two existing models, or because we care directly about the operationalizations, but rather to come up with a new model that combines their benefits.

7Archimedes
I see what you're gesturing at but I'm having difficulty translating it into a direct answer to my question. Cases where language is fuzzy are abundant. Do you have some examples of where a truth value itself is fuzzy (and sensical) or am I confused in trying to separate these concepts?

IMO all of the "smooth/sharp" and "soft/hard" stuff is too abstract. When I concretely picture what the differences between them are, the aspect that stands out most is whether the takeoff will be concentrated within a single AI/project/company/country or distributed across many AIs/projects/companies/countries.

This is of course closely related to debates about slow/fast takeoff (as well as to the original Hanson/Yudkowsky debates). But using this distinction instead of any version of the slow/fast distinction has a few benefits:

  1. If someone asks "why should
... (read more)

Well, the whole point of national parks is that they're always going to be unproductive because you can't do stuff in them.

If you mean in terms of extracting raw resources, maybe (though presumably a bunch of mining/logging etc in national parks could be pretty valuable) but either way it doesn't matter because the vast majority of economic productivity you could get from them (e.g. by building cities) is banned.

2Nathan Young
Yeah, aren't a load of national parks near large US conurbations, such that the opportunity cost in world terms is significant?

Nothing makes humans all that special

This is just false. Humans are at the very least privileged in our role as biological bootloaders of AI. The emergence of written culture, industrial technology, and so on, are incredibly special from a historical perspective.

You only set aside occasional low-value fragments for national parks, mostly for your own pleasure and convenience, when it didn't cost too much?

Earth as a proportion of the solar system's planetary mass is probably comparable to national parks as a proportion of the Earth's land, if not lower.
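A rough back-of-the-envelope check of that comparison, using approximate public figures (the numbers below are rounded and illustrative, and "national parks" is taken as US national parks specifically):

```python
# Approximate masses in kg and areas in km^2; all figures are rounded.
planet_masses = {
    "Mercury": 3.3e23, "Venus": 4.87e24, "Earth": 5.97e24, "Mars": 6.4e23,
    "Jupiter": 1.90e27, "Saturn": 5.68e26, "Uranus": 8.7e25, "Neptune": 1.02e26,
}
earth_share = planet_masses["Earth"] / sum(planet_masses.values())

us_national_parks_km2 = 3.4e5   # roughly 85 million acres of US national parks
us_land_km2 = 9.1e6             # US land area, excluding water
park_share = us_national_parks_km2 / us_land_km2

print(f"Earth / solar-system planetary mass: {earth_share:.2%}")  # about 0.2%
print(f"US national parks / US land area:    {park_share:.2%}")   # about 4%
```

On these figures, Earth's share of the planets' mass (about 0.2%) is roughly an order of magnitude lower than the national-park share of US land (about 4%).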

1hunterglenn
I also wonder if, compared to some imaginary baseline, modern humans are unusual in the greatness of their intellectual power and understanding, and in the less impressive magnitude of its development in other ways.

Maybe a lot of our problems flow from being too smart in that sense, but I believe that our best hope is still not to fear our problematic intelligence, but rather to lean into it as a powerful tool for figuring out what to do from here.

If another imaginary species could get along by just instinctively being harmonious, humans might require a persuasive argument. But if you can actually articulate the truth of the even-selfish-superiority of harmony (especially right now), then maybe our species can do the right thing out of understanding rather than instinct. And maybe that means we're capable of unusually fast turnarounds as a species. Once we articulate the thing intelligently enough, it's highly mass-scalable.
2Jemal Young
Maybe I've misunderstood your point, but if it's that humanity's willingness to preserve a fraction of Earth for national parks is a reason for hopefulness that ASI may be willing to preserve an even smaller fraction of the solar system (namely, Earth) for humanity, I think this is addressed here: "research purposes" involving simulations can be a stand-in for any preference-oriented activity. Unless ASI would have a preference for letting us, in particular, do what we want with some fraction of available resources, no fraction of available resources would be better left in our hands than put to good use.
Eli Tyre112

Earth as a proportion of the solar system's planetary mass is probably comparable to national parks as a proportion of the Earth's land, if not lower.

Yeah, but not if we weight that land by economic productivity, I think.
