One of the main ways I think about empowerment is in terms of allowing better coordination between subagents.
In the case of an individual human, extreme morality can be seen as one subagent seizing control and overriding other subagents (like the ones who don't want to chop off body parts).
In the case of a group, extreme morality can be seen in terms of preference cascades that go beyond what most (or even any) of the individuals involved with them would individually prefer.
In both cases, replacing fear-based motivation with less coercive/more cooperative ...
In response to an email about what a pro-human ideology for the future looks like, I wrote up the following:
The pro-human egregore I'm currently designing (which I call fractal empowerment) incorporates three key ideas:
Firstly, we can see virtue ethics as a way for less powerful agents to aggregate to form more powerful superagents that preserve the interests of those original less powerful agents. E.g. virtues like integrity, loyalty, etc. help prevent divide-and-conquer strategies. This would have been in the interests of the rest of the world w...
How would this ideology address value drift? I've been thinking a lot about the kind quoted in Morality is Scary. The way I would describe it now is that human morality is by default driven by a competitive status/signaling game, where often some random or historically contingent aspect of human value or motivation becomes the focal point of the game, and gets magnified/upweighted as a result of competitive dynamics, sometimes to an extreme, even absurd degree.
(Of course from the inside it doesn't look absurd, but instead feels like moral progress. One exa...
In my post on value systematization, I used utilitarianism as a central example of that process.
Value systematization is important because it's a process by which a small number of goals end up shaping a huge amount of behavior. But there's another different way in which this happens: core emotional motivations formed during childhood (e.g. fear of death) often drive a huge amount of our behavior, in ways that are hard for us to notice.
Fear of death and utilitarianism are very different. The former is very visceral and deep-rooted; it typically inf...
I've now edited that section. Old version and new version here for posterity.
Old version:
None of these is very satisfactory! Intuitively speaking, Alice and Bob want to come to an agreement where respect for both of their interests is built in. For example, they might want the EUM they form to value fairness between their two original sets of interests. But adding this new value is not possible if they’re limited to weighted averages. The best they can do is to agree on a probabilistic mixture of EUMs—e.g. tossing a coin to decide between option 1 and opti...
I was a bit lazy in how I phrased this. I agree with all your points; the thing I'm trying to get at is that this approach falls apart quickly if we make the bargaining even slightly less idealized. E.g. your suggestion "Form an EUM which is totally indifferent about the cake allocation between them and thus gives 100% of the cake to whichever agent is cheaper/easier to provide cake for":
On a meta level, I have a narrative that goes something like: LessWrong tried to be truth-seeking, but was scared of discussing the culture war, so blocked that off. But then the culture war ate the world, and various harms have come about from not having thought clearly about that (e.g. AI governance being a default left-wing enterprise that tried to make common cause with AI ethics). Now cancel culture is over and there are very few political risks to thinking about culture wars, but people are still scared to. (You can see Scott gradually dipping his to...
I read your comment as conflating 'talking about the culture war at all' and 'agreeing with / invoking Curtis Yarvin', which also conflates 'criticizing Yarvin' with 'silencing discussion of the culture war'.
This reinforces a false binary between totally mind-killed wokists and people (like Yarvin) who just literally believe that some folks deserve to suffer, because it's their genetic destiny.
This kind of tribalism is exactly what fuels the culture war, and not what successfully sidesteps, diffuses, or rectifies it. NRx, like the Cathedral, is a min...
Thanks for the well-written and good-faith reply. I feel a bit confused by how to relate to it on a meta level, so let me think out loud for a while.
I'm not surprised that I'm reinventing a bunch of ideas from the humanities, given that I don't have much of a humanities background and didn't dig very far through the literature.
But I have some sense that even if I had dug for these humanities concepts, they wouldn't give me what I want.
What do I want?
My meta-level practical suggestion is to ask AIs, with prompts like "notice where these ideas or arguments match existing ideas from the humanities, expressed in different language, and ideally point to references for such sources." Often you will find people who came up with somewhat similar models or observations. Also, while those people may be hard to reach or dead, and engaging with long books is costly, in my experience even their simulacra can provide useful feedback, come up with ideas, and point to what you're missing.
Another meta-level idea: it seems good to notice the skulls. My suspicion...
I have thought about this on and off for several years and finally decided that you're right and have changed it. Thanks for pushing on this.
Nice, that's almost exactly how I intended it. Except that I wasn't thinking of the "stars" as satellites looking for individual humans to send propaganda at (which IMO is pretty close to "communicating"), but rather a network of satellites forming a single "screen" across the sky that plays a video infecting any baseline humans who look at it.
In my headcanon the original negotiators specified that sunlight would still reach the earth unimpeded, but didn't specify that no AI satellites would be visible from the Earth. I don't have headcanon explanations fo...
In general I think people should explain stuff like this. "I might as well not help" is a very weak argument compared with the benefits of people understanding the world better.
Oh, I see what you mean now. In that case, no, I disagree. Right now this notion of robustness is pre-theoretic. I suspect that we can characterize robustness as "acting like a belief/goal agent" in the limit, but part of my point is that we don't even know what it means to act "approximately like belief/goal agents" in realistic regimes, because e.g. belief/goal agents as we currently characterize them can't learn new concepts.
Relatedly, see the dialogue in this post.
I appreciated this comment! Especially:
dude, how the hell do you come up with this stuff.
This quote from my comment above addresses this:
And so I'd say that AIXI-like agents and coalitional agents converge in the limit of optimality, but before that coalitional agency will be a much better framework for understanding realistic agents, including significantly superhuman agents.
Thank you Cole for the comment! Some quick thoughts in response (though I've skipped commenting on the biology examples and the ML examples since I think our intuitions here are a bit too different to usefully resolve via text):
...Ngo’s view seems to be that after some level of decomposition, the recursion bottoms out at agents that can be seen as expected utility maximizers (though it’s not totally clear to me where this point occurs on his model, he seems to think that these irreducible agents are more like rough heuristics than sophisticated planners, so t
I found this tricky to parse because of two phrasing issues:
Something that's fascinating about this art of yours is that I can't tell if you're coherently in favor of this, or purposefully invoking thinking errors in the audience, or just riffing, or what.
Thanks for the fascinating comment.
I am a romantic in the sense that I believe that you can achieve arbitrarily large gains from symbiosis if you're careful and skillful enough.
Right now very few people are careful and skillful enough. Part of what I'm trying to convey with this story is what it looks like for AI to provide most of the requisite skill.
Another way of...
FWIW I think of "OpenAI leadership being untrustworthy" (a significant factor in me leaving) as different from "OpenAI having bad safety policies" (not a significant factor in me leaving). Not sure if it matters, I expect that Scott was using "safety policies" more expansively than I do. But just for the sake of clarity:
I am generally pretty sympathetic to the idea that it's really hard to know what safety policies to put in place right now. Many policies pushed by safety people (including me, in the past) have been mostly kayfabe (e.g. being valuable as c...
Oh huh, I had the opposite impression from when I published Tinker with you. Thanks for clarifying!
Ty! You're right about the Asimov deal, though I do have some leeway. But I think the opening of this story is a little slow, so I'm not very excited about that being the only thing people see by default.
Unrelatedly, my last story is the only one of my stories that was left as a personal blog post (aside from the one about parties). Change of policy or oversight?
Ah, glad to hear the effort was noticeable. I do think that as I get more practice at being descriptive, concreteness will become easier for me (my brain just doesn't work that way by default). And anyone reading this comment is welcome to leave me feedback about places in my stories where I should have been more concrete.
But I'm also pivoting away from stories in general right now; there's too much other stuff I want to spend time on. I have half a dozen other stories for which I've already finished first drafts, so I'll probably gradually release those i...
I wrote most of it a little over a year ago. In general I don't plot out stories, I just start writing them and see what happens. But since I was inspired by The Gentle Seduction I already had a broad idea of where it was going.
I then sent a draft to some friends for feedback. One friend left about 50 comments in places where I'd been too abstract or given a vague description, with each comment literally just saying "like what?"
This was extremely valuable feedback but almost broke my will to finish the story. It took me about a year to work through most of...
The Minority Faction
I'm not sure what the details would look like, but I'm pretty sure ASI would have enough new technologies to figure something out within 10,000 years.
I feel like this is the main load-bearing claim underlying the post, but it's barely argued for.
In some sense the sun is already "eating itself" by doing a fusion reaction, which will last for billions more years. So you're claiming that AI could eat the sun (at least) six orders of magnitude faster, which is not obvious to me.
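To spell out that ratio (taking roughly $5\times10^{9}$ years of remaining main-sequence fusion as a ballpark figure of my own):

$$\frac{5\times10^{9}\ \text{yr}}{10^{4}\ \text{yr}} = 5\times10^{5},$$

and since fusion will only ever burn a small fraction of the sun's mass, fully "eating" it that way would take considerably longer still, which is where the "at least six orders of magnitude" comes from.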
I don't think my priors on that are very different from yours, but the thing that would have made this post valuable for me is some object-level reason to upgrade my confidence in that.
FWIW twitter search is ridiculously bad; it's often better to use google instead. In this case I had it as the second result when I googled "richardmcngo twitter safety fundamentals" (richardmcngo being my twitter handle).
Yepp, though note that this still feels in tension with the original post to me - I expect to find a clean, elegant replacement for VNM, not just a set of approximately-equally-compelling alternatives.
Why? Partly because of inside views which I can’t explain in brief. But mainly because that’s how conceptual progress works in general. There is basically always far more hidden beauty and order in the universe than people are able to conceive (because conceiving of it is nearly as hard as discovering it - like, before Darwin, people wouldn’t have been able to...
As a quick note: the auto-generated glossary for this story is pretty cool (though it predictably contains spoilers).
Because I might fund them or forward it to someone else who will.
In general people should feel free to DM me with pitches for this sort of thing.
I think this epistemic uncertainty is distinct from the type of "objective probabilities" I talk about in my post, and I don't really know how to use language without referring to degrees of my epistemic uncertainty.
The part I was gesturing at wasn't the "probably" but the "low measure" part.
Is your position that the problem is deeper than this, and there is no objective prior over worlds, it's just a thing like ethics that we choose for ourselves, and then later can bargain and trade with other beings who have a different prior of realness?
Yes, that...
Hmmm, uncertain if we disagree. You keep saying that these concepts are cursed and yet phrasing your claims in terms of them anyway (e.g. "probably very low measure"), which suggests that there's some aspect of my response you don't fully believe.
In particular, in order for your definition of "what beings are sufficiently similar to you" to not be cursed, you have to be making claims not just about the beings themselves (since many Boltzmann brains are identical to your brain) but rather about the universes that they're in. But this is kinda what I mean by...
I don't think this line of argument is a good one. If there's a 5% chance of x-risk and, say, a 50% chance that AGI makes the world just generally be very chaotic and high-stakes over the next few decades, then it seems very plausible that you should mostly be optimizing for making the 50% go well rather than the 5%.
Still consistent with great concern. I'm pointing out that O O's point isn't locally valid: observing concern shouldn't translate into inferring a belief that alignment is impossible.
Worse than the current situation, because the counterfactual is that some later project happens which kicks off in a less race-y manner.
In other words, whatever the chance of its motivation shifting over time, it seems dominated by the chance that starting the equivalent project later would just have better motivations from the outset.
Great post. One slightly nitpicky point, though: even in the section where you argue that probabilities are cursed, you are still talking in the language of probabilities (e.g. "my modal guess is that I'm in a solipsist simulation that is a fork of a bigger simulation").
I think there's probably a deeper ontological shift you can do to a mindset where there's no actual ground truth about "where you are". I think in order to do that you probably need to also go beyond "expected utilities are real", because expected utilities need to be calculated by assignin...
Cool, ty for (characteristically) thoughtful engagement.
I am still intuitively skeptical about a bunch of your numbers but now it's the sort of feeling which I would also have if you were just reasoning more clearly than me about this stuff (that is, people who reason more clearly tend to be able to notice ways that interventions could be surprisingly high-leverage in confusing domains).
Ty for the link, but both of these seem like clearly bad semantics (e.g. under either of them the second-best hypothesis under consideration might score arbitrarily badly).
Just changed the name to The Minority Coalition.
1. Yepp, seems reasonable. Though FYI I think of this less as some special meta argument, and more as the common-sense correction that almost everyone implicitly does when giving credences, and rationalists do less than most. (It's a step towards applying outside view, though not fully "outside view".)
2. Yepp, agreed, though I think the common-sense connotations of "if this became" or "this would have a big effect" are causal, especially in the context where we're talking to the actors who are involved in making that change. (E.g. the non-causal interpreta...
Good point re 2. Re 1, meh, still seems like a meta-argument to me, because when I roll out my mental simulations of the ways the future could go, it really does seem like my If... condition obtaining would cut out about half of the loss-of-control ones.
Re 3: point by point:
1. AISIs existing vs. not: Less important; I feel like this changes my p(doom) by more like 10-20% rather than 50%.
2. Big names coming out: idk, this also feels like maybe 10-20% rather than 50%.
3. I think Anthropic winning the race would be a 40% thing maybe, but being a runner-up doesn'...
We have discussed this dynamic before but just for the record:
I think that if it became industry-standard practice for AGI corporations to write, publish, and regularly update (actual instead of just hypothetical) safety cases at this level of rigor and detail, my p(doom) would cut in half.
This is IMO not the type of change that should be able to cut someone's P(doom) in half. There are so many different factors that are of this size and importance or bigger (including many that people simply have not thought of yet) such that, if this change could halv...
Sorry! My response:
1. Yeah you might be right about this, maybe I should get less excited and say something like "it feels like it should cut in half but taking into account Richard's meta argument I should adjust downwards and maybe it's just a couple percentage points"
2. If the conditional obtains, that's also evidence about a bunch of other correlated good things though (timelines being slightly longer, people being somewhat more reasonable in general, etc.) so maybe it is legit to think this would have quite a big effect
3. Are you sure there are so man...
The former can be sufficient—e.g. there are good theoretical researchers who have never done empirical work themselves.
In hindsight I think "close conjunction" was too strong—it's more about picking up the ontologies and key insights from empirical work, which can be possible without following it very closely.
I think there's something importantly true about your comment, but let me start with the ways I disagree. Firstly, the more ways in which you're power-seeking, the more defense mechanisms will apply to you. Conversely, if you're credibly trying to do a pretty narrow and widely-accepted thing, then there will be less backlash. So Jane Street is power-seeking in the sense of trying to earn money, but they don't have much of a cultural or political agenda, they're not trying to mobilize a wider movement, and earning money is a very normal thing for companies ...
The bits are not very meaningful in isolation; the claim "program-bit number 37 is a 1" has almost no meaning in the absence of further information about the other program bits. However, this isn't much of an issue for the formalism.
In my post I defend the use of propositions as a way to understand models, and attack the use of propositions as a way to understand reality. You can think of this as a two-level structure: claims about models can be crisp and precise enough that it makes sense to talk about them in propositional terms, but for complex bits of ...
The minority faction is the group of entities that are currently alive, as opposed to the vast number of entities that will exist in the future. I.e. the one Clarke talks about when he says "why won’t you help the rest of us form a coalition against them?"
In hindsight I should probably have called it The Minority Coalition.
...Here's how that would be handled by a Bayesian mind:
- There's some latent variable representing the semantics of "humanity will be extinct in 100 years"; call that variable S for semantics.
- Lots of things can provide evidence about S. The sentence itself, context of the conversation, whatever my friend says about their intent, etc, etc.
- ... and yet it is totally allowed, by the math of Bayesian agents, for that variable S to still have some uncertainty in it even after conditioning on the sentence itself and the entire low-level physical state of my friend, or
"Dragons are attacking Paris!" seems true by your reasoning, since there are no dragons, and therefore it is vacuously true that all of them are attacking Paris.
Ty for the comment. I mostly disagree with it. Here's my attempt to restate the thrust of your argument:
The issues with binary truth-values raised in the post are all basically getting at the idea that the meaning of a proposition is context-dependent. But we can model context-dependence in a Bayesian way by referring to latent variables in the speaker's model of the world. Therefore we don't need fuzzy truth-values.
But this assumes that, given the speaker's probabilistic model, truth-values are binary. I don't see why this needs to be the case. Here's an ...
But this assumes that, given the speaker's probabilistic model, truth-values are binary.
In some sense yes, but there is totally allowed to be irreducible uncertainty in the latents - i.e. given both the model and complete knowledge of everything in the physical world, there can still be uncertainty in the latents. And those latents can still be meaningful and predictively powerful. I think that sort of uncertainty does the sort of thing you're trying to achieve by introducing fuzzy truth values, without having to leave a Bayesian framework.
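As a minimal illustration of that kind of irreducible latent uncertainty (a toy model with made-up readings and numbers, just to make the point concrete):

```python
# Toy Bayesian model: the latent S (the "semantics" of the sentence) keeps residual
# uncertainty even after conditioning on every available observation, because the
# candidate values of S explain those observations equally well.

PRIOR_S = {"all biological humans die": 0.5, "all human-like minds die": 0.5}

# Likelihood of the full evidence (the sentence itself, the conversational context,
# everything observable about the speaker) under each candidate reading of S.
LIKELIHOOD = {"all biological humans die": 0.8, "all human-like minds die": 0.8}

def condition(prior, likelihood):
    """Bayes' rule: posterior over S given the observations."""
    unnormalized = {s: prior[s] * likelihood[s] for s in prior}
    total = sum(unnormalized.values())
    return {s: p / total for s, p in unnormalized.items()}

print(condition(PRIOR_S, LIKELIHOOD))
# -> {'all biological humans die': 0.5, 'all human-like minds die': 0.5}
# The posterior over S is still 50/50: conditioning on everything observable leaves
# the latent's uncertainty intact, yet S remains meaningful and predictively useful.
```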
Let's look at th...
Suppose you have two models of the earth; one is a sphere, one is an ellipsoid. Both are wrong, but they're wrong in different ways. Now, we can operationalize a bunch of different implications of these hypotheses, but most of the time in science the main point of operationalizing the implications is not to choose between two existing models, or because we care directly about the operationalizations, but rather to come up with a new model that combines their benefits.
IMO all of the "smooth/sharp" and "soft/hard" stuff is too abstract. When I concretely picture what the differences between them are, the aspect that stands out most is whether the takeoff will be concentrated within a single AI/project/company/country or distributed across many AIs/projects/companies/countries.
This is of course closely related to debates about slow/fast takeoff (as well as to the original Hanson/Yudkowsky debates). But using this distinction instead of any version of the slow/fast distinction has a few benefits:
Well, the whole point of national parks is that they're always going to be unproductive because you can't do stuff in them.
If you mean in terms of extracting raw resources, maybe (though presumably a bunch of mining/logging etc. in national parks could be pretty valuable), but either way it doesn't matter, because the vast majority of the economic productivity you could get from them (e.g. by building cities) is banned.
Nothing makes humans all that special
This is just false. Humans are at the very least privileged in our role as biological bootloaders of AI. The emergence of written culture, industrial technology, and so on, are incredibly special from a historical perspective.
You only set aside occasional low-value fragments for national parks, mostly for your own pleasure and convenience, when it didn't cost too much?
Earth as a proportion of the solar system's planetary mass is probably comparable to national parks as a proportion of the Earth's land, if not lower.
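For the mass side of that comparison (ballpark figures of my own: Earth at about $6\times10^{24}$ kg against roughly $2.7\times10^{27}$ kg of total planetary mass, most of it Jupiter):

$$\frac{6\times10^{24}\ \text{kg}}{2.7\times10^{27}\ \text{kg}} \approx 0.2\%.$$

How that compares on the other side depends on what fraction of land one counts as national parks.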
Earth as a proportion of the solar system's planetary mass is probably comparable to national parks as a proportion of the Earth's land, if not lower.
Yeah, but not if we weight that land by economic productivity, I think.
We disagree on which explanation is more straightforward, but regardless, that type of inference is very different from "literal written evidence".