Sometimes, I say some variant of “yeah, probably some people will need to do a pivotal act” and people raise the objection: “Should a small subset of humanity really get so much control over the fate of the future?”
(Sometimes, I hear the same objection to the idea of trying to build aligned AGI at all.)
I’d first like to say that, yes, it would be great if society had the ball on this. In an ideal world, there would be some healthy and competent worldwide collaboration steering the transition to AGI.[1]
Since we don’t have that, it falls to whoever happens to find themselves at ground zero to prevent an existential catastrophe.
A second thing I want to say is that design-by-committee… would not exactly go well in practice, judging by how well committee-driven institutions function today.
Third, though, I agree that it’s morally imperative that a small subset of humanity not directly decide how the future goes. So if we are in the situation where a small subset of humanity will be forced at some future date to flip the gameboard — as I believe we are, if we’re to survive the AGI transition — then AGI developers need to think about how to do that without unduly determining the shape of the future.
The goal should be to cause the future to be great on its own terms, without locking in the particular moral opinions of humanity today — and without locking in the moral opinions of any subset of humans, whether that’s a corporation, a government, or a nation.
(If you can't see why a single modern society locking in its current values would be a tragedy of enormous proportions, imagine an ancient civilization such as the Romans locking in their specific morals 2000 years ago. Moral progress is real, and important.)
But the way to cause the future to be great “on its own terms” isn’t to do nothing and let the world get destroyed. It’s to intentionally not leave your fingerprints on the future, while acting to protect it.
You have to stabilize the landscape / make it so that we’re not all about to destroy ourselves with AGI tech; and then you have to somehow pass the question of how to shape the universe back to some healthy process that allows for moral growth and civilizational maturation and so on, without locking in any of humanity’s current screw-ups for all eternity.
Unfortunately, the current frontier for alignment research is “can we figure out how to point AGI at anything?”. By far the most likely outcome is that we screw up alignment and destroy ourselves.
If we do solve alignment and survive this great transition, then I feel pretty good about our prospects for figuring out a good process to hand the future to. Some reasons for that:
- Human science has a good track record for solving difficult-seeming problems; and if there’s no risk of anyone destroying the world with AGI tomorrow, humanity can take its time and do as much science, analysis, and weighing of options as needed before it commits to anything.
- Alignment researchers have already spent a lot of time thinking about how to pass that buck and make sure the future goes great without bearing our fingerprints. Even this small group of people has made real progress, and the problem doesn't seem that tricky. (Because there are so many good ways to approach it carefully and indirectly.)
- Solving alignment well enough to end the acute risk period without killing everyone implies that you’ve cleared a very high competence bar, as well as a sanity bar that not many clear today. Willingness and ability to defuse moral hazard is correlated with willingness and ability to save the world.
- Most people would do worse on their own merits if they locked in their current morals, and would prefer to leave space for moral growth and civilizational maturation. The property of realizing that you want to (or would on reflection want to) defuse the moral hazard is also correlated with willingness and ability to save the world.
- Furthermore, the fact that — as far as I know — all the serious alignment researchers are actively trying to figure out how to avoid leaving their fingerprints on the future, seems like a good sign to me. You could find a way to be cynical about these observations, but these are not the observations that the cynical hypothesis would predict ab initio.
This is a set of researchers that generally takes egalitarianism, non-nationalism, concern for future minds, non-carbon-chauvinism, and moral humility for granted, as obvious points of background agreement; the debates are held at a higher level than that.
This is a set of researchers that regularly talk about how, if you’re doing your job correctly, then it shouldn’t matter who does the job, because there should be a path-independent attractor-well that isn't about making one person dictator-for-life or tiling a particular flag across the universe forever.
I’m deliberately not talking about slightly-more-contentful plans like coherent extrapolated volition here, because in my experience a decent number of people have a hard time parsing the indirect buck-passing plans as something more interesting than just another competing political opinion about how the future should go. (“It was already blues vs. reds vs. oranges, and now you’re adding a fourth faction which I suppose is some weird technologist green.”)
I’d say: Imagine that some small group of people were given the power (and thus responsibility) to steer the future in some big way. And ask what they should do with it. Ask how they possibly could wield that power in a way that wouldn’t be deeply tragic, and that would realistically work (in the way that “immediately lock in every aspect of the future via a binding humanity-wide popular vote” would not).
I expect that the best attempts to carry out this exercise will involve re-inventing some ideas that Bostrom and Yudkowsky invented decades ago. Regardless, though, I think the future will go better if a lot more conversations occur in which people take a serious stab at answering that question.
The situation humanity finds itself in (on my model) poses an enormous moral hazard.
But I don’t conclude from this “nobody should do anything”, because then the world ends ignominiously. And I don’t conclude from this “so let’s optimize the future to be exactly what Nate personally wants”, because I’m not a supervillain.[2]
The existence of the moral hazard doesn’t have to mean that you throw up your hands, or imagine your way into a world where the hazard doesn’t exist. You can instead try to come up with a plan that directly addresses the moral hazard — try to solve the indirect and abstract problem of “defuse the moral hazard by passing the buck to the right decision process / meta-decision-process”, rather than trying to directly determine what the long-term future ought to look like.
Rather than just giving up in the face of difficulty, researchers have the ability to see the moral hazard with their own eyes and ensure that civilization gets to mature anyway, despite the unfortunate fact that humanity, in its youth, had to steer past a hazard like this at all.
Crippling our progress in its infancy is a completely unforced error. Some of the implementation details may be tricky, but much of the problem can be solved simply by choosing not to rush a solution once the acute existential risk period is over, and by choosing to end the acute existential risk period (and its associated time pressure) before making any lasting decisions about the future.[3]
(Context: I wrote this with significant editing help from Rob Bensinger. It’s an argument I’ve found myself making a lot in recent conversations.)
- ^
Note that I endorse work on more realistic efforts to improve coordination and make the world’s response to AGI more sane. “Have all potentially-AGI-relevant work occur under a unified global project” isn’t attainable, but more modest coordination efforts may well succeed.
- ^
And I’m not stupid enough to lock in present-day values at the expense of moral progress, or stupid enough to toss coordination out the window in the middle of a catastrophic emergency with human existence at stake, etc.
My personal CEV cares about fairness, human potential, moral progress, and humanity’s ability to choose its own future, rather than having a future imposed on them by a dictator. I'd guess that the difference between "we run CEV on Nate personally" and "we run CEV on humanity writ large" is nothing (e.g., because Nate-CEV decides to run humanity's CEV), and if it's not nothing then it's probably minor.
- ^
See also Toby Ord’s The Precipice, and its discussion of “the long reflection”. (Though, to be clear, a short reflection is better than a long reflection, if a short reflection suffices. The point is not to delay for its own sake, and the amount of sidereal time required may be quite short if a lot of the cognitive work is being done by uploaded humans and/or aligned AI systems.)
tl;dr: I take meta-ethics, like psychology and economics ~200 years ago, to be asking questions we don't really have the tools or know-how to answer. And even if we did, there is just a lot of work to be done (e.g. solving meta-semantics, which no doubt involves solving language acquisition; or doing some sort of evolutionary anthropology of moral language). And there are few people doing the work, with little funding.
Long answer: I take one of philosophy's key contributions to the (more empirical) sciences to be the highlighting of new or ignored questions, conceptual field clearing, the laying out of non-circular pathways in the theoretical landscape, the placing of landmarks at key choice points. But they are not typically the ones with the tools to answer those questions or make the appropriate theoretical choices informed by finer data. Basically, philosophy generates new fields and gets them to a pre-paradigmatic stage: witness e.g. Aristotle on physics, biology, economics etc.; J. S. Mill and Kant on psychology; Yudkowsky and Bostrom on AI safety; and so on. Give me enough time and I can trace just about every scientific field to its origins in what can only be described as philosophical texts. Once developed to that stage, putatively philosophical methods (conceptual analysis, reasoning by analogy, logical argument, postulation and theorizing, sporadic reference to what coarse data is available) won't get things much further – progress slows to a crawl or authors might even start going in circles until the empirical tools, methods, interest and culture are available to take things further.
(That's the simplified, 20-20 hindsight view with a mature philosophy and methodology of science in hand: for much of history, figuring out how to "take things further" was just as contested and confused as anything else, and was only furthered through what was ex ante just more philosophy. Newton was a rival of Descartes and Leibniz: his Principia was a work of philosophy in its time. Only later did we start calling it a work of physics, as pertaining to a field of its own. Likewise with Leibniz and Descartes' contributions to physics.)
Re: meta-ethics, I don't think it's going in circles yet, but do recognize the rate at which it has produced new ideas (found genuinely new choice points) has slowed down. It's still doing much work in collapsing false choice points though (and this seems healthy: it should over-generate and then cut down).
One thing it has completely failed to do is sell the project to the rest of the scientific community (hence why I write). But it's also a tough sell. There are various sociological obstacles at work here.
There are also methodological obstacles: the relevant data is just hard to collect; the number of confounding variables, myriad; the dimensionality of the systems involved, incredibly high! Compare, for example, with macroeconomics: natural experiments are extremely few and far between, and even then confounding variables abound; the timescales of the phenomena of interest (e.g. sustained recessions vs sustained growth periods) are very long, and as such we have very little data – there've only been a handful of such periods since record keeping began. We barely understand/can predict macro-econ any better than we did 100 years ago, and it's not for a lack of brilliance, rigor or funding.
In the sense that I take you to be using "science" (forming a narrow hypothesis, carefully collecting pertinent data, making pretty graphs with error bars), probably neither of them is doing it well.[1] But we shouldn't really expect them to? Like, that's not what the discipline is good for.
I'd bet they liberally employ the usual theoretical desiderata (explanatory power, ontological parsimony, theoretical conservatism) to argue for their view, but they probably only make cursory reference to empirical studies. And until they do refer to more empirical work, they won't converge on an answer (or improve our predictions, if you prefer). But, again, I don't expect them to, since I think most of the pertinent empirical work is yet to be done.
I'm not surprised you find this cheeky, but just FYI I was dead serious: that's pretty much literally what I and many think is possibly the case.
So this is very interesting to me, and I think I agree with you on some points here, but that you're missing others. But first I need to understand what you mean by "natural sparsity" and what your (very very rough) story is of how our words get their referents. I take it you're drawing on ML concepts and explanations, and it sounds like a story some philosophers tell, but I'm not familiar with the lingo and want to understand this better. Please tell me more. Related: would you say that we know more about water than our 1700s counterparts, or would you just say "water" today refers to something different than what it referred to in the 1700s? In which case, what is it we've gained relative to them? More accurate predictions regarding... what?
Thanks, yep, I'm not sure. Whether or not there is an attractor (and how that attraction is supposed to work) seems like the major crux – certainly in our case!
One thing I want to defend and clarify: someone the other day objected that philosophers are overly confident in their proposals, overly married to them. I think I would agree in some sense, since their work is often pre-paradigmatic: they often jump the gun and declare victory, taking philosophizing to be enough to settle a matter. Accordingly, I need to correct the following:
I should have said the field as a whole is not married to any particular theory. But I'm not sure having individual researchers try so hard to develop and defend particular views is so perverse. It seems pretty normal that in trying to advance theory, individual theorists heavily favor one theory or another – the one they are curious about, want to develop, make robust, and take to its limit. One shouldn't necessarily look to one particular frontier physicist to form one's best guess about their frontier – instead one should survey the various theories being advanced and developed in the area.