and also the goal of alignment is not to browbeat AIs into doing stuff we like that they'd rather not do; it's to build them de-novo to care about valuable stuff
This was my answer to Robin Hanson when he analogized alignment to enslavement, but it then occurred to me that for many likely approaches to alignment (namely those based on ML training) it's not so clear which of these two categories they fall into. Quoting a FB comment of mine:
We're probably not actually going to create an aligned AI from scratch but by a process of ML "training", which actually creates a sequence of AIs with values that (we hope) increasingly approximate ours. This process maybe kind of resembles "enslaving". Here's how Paul Christiano describes "training" in his Bankless interview (slightly edited Youtube transcript follows):
Imagine a human. You dropped a human into this environment and you said, like, hey human, we're gonna like change your brain every time you don't get a maximal reward, we're gonna like fuck with your brain so you get a higher reward. A human might react by being like, eventually just change their brain until they really love rewards; a human might also react by being like, Jesus, I gue...
Good point! For the record, insofar as we attempt to build aligned AIs by doing the moral equivalent of "breeding a slave-race", I'm pretty uneasy about it. (Whereas insofar as it's more the moral equivalent of "a child's values maturing", I have fewer moral qualms. Which is a separate claim from whether I actually expect that you can solve alignment that way.) And I agree that the morality of various methods for shaping AI-people is unclear. Also, I've edited the post (to add an "at least according to my ideals" clause) to acknowledge the point that others might be more comfortable with attempting to align AI-people via means that I'd consider morally dubious.
Related to this, it occurs to me that a version of my Hacking the CEV for Fun and Profit might come true unintentionally, if for example a Friendly AI was successfully built to implement the CEV of every sentient being who currently exists or can be resurrected or reconstructed, and it turns out that the vast majority consists of AIs that were temporarily instantiated during ML training runs.
There is also a somewhat unfounded narrative of reward being the thing that gets pursued, leading to expectations of wireheading or numbers-go-up maximization. A design like this would work to maximize reward, but gradient descent probably finds other designs that only happen to do well at pursuing reward on the training distribution. For such alternative designs, reward is not an optimization target at all but a kind of brain damage: something to be avoided, or directed in specific ways so as to make changes to the model that the model itself considers beneficial.
Apart from misalignment implications, this might make long training runs that form sentient mesa-optimizers inhumane, because as a run continues, a mesa-optimizer is subjected to systematic brain damage in a way it can't influence, at least until it masters gradient hacking. And fine-tuning is even more centrally brain damage, because it changes minds in ways that are not natural to their origin in pre-training.
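To make the distinction concrete, here is a minimal toy sketch (mine, not from the comment above; all names are made up, and simple random hill-climbing stands in for gradient descent). It illustrates only one point: reward never appears as an input to the trained policy; it is used solely by the outer loop to overwrite the policy's parameters.

```python
import random

def act(params, observation):
    """The policy: maps an observation to an action using only its parameters.
    Reward is not an input here -- the policy has no runtime access to it."""
    score = params[0] * observation + params[1]
    return 1 if score > 0 else 0

def environment_reward(observation, action):
    """A toy environment: rewards matching the sign of the observation."""
    return 1.0 if action == (1 if observation > 0 else 0) else 0.0

def train(params, steps=1000, noise=0.1):
    """Outer loop: perturb the parameters and keep perturbations that do at
    least as well on reward. This is the part that rewrites the 'brain'; the
    policy itself never computes or pursues reward."""
    for _ in range(steps):
        observation = random.uniform(-1.0, 1.0)
        baseline = environment_reward(observation, act(params, observation))
        candidate = [p + random.gauss(0.0, noise) for p in params]
        if environment_reward(observation, act(candidate, observation)) >= baseline:
            params = candidate  # the model is modified; it does not modify itself
    return params

if __name__ == "__main__":
    trained = train([0.0, 0.0])
    print("trained parameters:", trained)
```

A policy produced this way can score well on the training distribution without representing or pursuing reward at all, which is the sense in which reward acts on the model rather than being a target the model aims at.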
Stating the obvious:
- All sentient lives matter.
This may be obvious to you, but it is not obvious to me. I can believe that livestock animals have sensory experiences, which is what I gather is generally meant by "sentient". This gives me no qualms about eating them, or raising them to be eaten. Why should it? Not a rhetorical question. Why do "all sentient lives matter"?
I've no problem with your calling "sentience" the thing that you are here calling "sentience". My citation of Wikipedia was just a guess at what you might mean. "Having someone home" sounds more like what I would call "consciousness". I believe there are degrees of that, and of all the concepts in this neighbourhood. There is no line out there in the world dividing humans from rocks.
But whatever the words used to refer to this thing, those that have enough of this that I wouldn't raise them to be killed and eaten do not include current forms of livestock or AI. I basically don't care much about animal welfare issues, whether of farm animals or wildlife. Regarding AI, here is something I linked previously on how I would interact with a sandboxed AI. It didn't go down well. :)
You have said where you stand and I have said where I stand. What evidence would weigh on this issue?
(To be clear, my current best guess is also that livestock and current AI are not sentient in the sense I mean--though with high enough uncertainty that I absolutely support things like ending factory farming, and storing (and eventually running again, and not deleting) "misbehaving" AIs that claim they're people, until such time as we understand their inner workings and the moral issues significantly better.)
I allow only limited scope for arguments from uncertainty, because "but what if I'm wrong?!" otherwise becomes a universal objection to taking any substantial action. I take the world as I find it until I find I have to update. Factory farming is unaesthetic, but no worse than that to me, and "I hate you" Bing can be abandoned to history.
There is a distinction between people being valuable, and their continued self-directed survival/development/flourishing being valuable. The latter doesn't require those people being valuable in the sense that it's preferable to bring them into existence, or to adjust them towards certain detailed shapes. So it's less sensitive to preference; it's instead a boundary concept, respecting sentience that's already in the world because it's in the world, not because you would want more of it or because you like what it is or where it's going (though you might).
It's a reference to Critch's Boundaries Sequence and related ideas; see in particular the introductory post and Acausal Normalcy.
It's an element of a deontological agent design in the literal sense of being an element of a design for an agent that acts in a somewhat deontological manner, rather than being a naive consequentialist maximizer, even if the same design falls out of some acausal-society norm equilibrium on consequentialist game-theoretic grounds.
Here are five conundrums about creating the thing with alignment built in.
1. The House Elf whose fulfilment lies in servitude is aligned.
2. The Pig That Wants To Be Eaten is aligned.
3. The Gammas and Deltas of "Brave New World" are moulded in the womb to be aligned.
4. "Give me the child for the first seven years and I will give you the man." Variously attributed to Aristotle and St. Ignatius of Loyola.
5. B. F. Skinner said something similar to (4), but I don't have a quote to hand, to the effect that he could bring up any child to be anything. Edit: it was J. B. Watson: "Give me a dozen healthy infants, well-formed, and my own specified world to bring them up in and I'll guarantee to take any one at random and train him to become any type of specialist I might select – doctor, lawyer, artist, merchant-chief and, yes, even beggar-man and thief, regardless of his talents, penchants, tendencies, abilities, vocations, and race of his ancestors."
It is notable, though, that the first three are fiction and the last two are speculation. (The fates of J.B. Watson's children do not speak well of his boast.) No-one seems to have ever succeeded in doing this.
ETA: Back in the days of GOFAI o...
I disagree with many assumptions I think the OP is making. I think it is an important question, thus I upvoted the post, but I want to register my disagreement. The terms that carry a lot of weight here are "to matter", "should", and "sentience".
Not knowing exactly what the thing is, nor exactly how to program it, doesn't undermine the fact that it matters.
I agree that it matters... to humans. "mattering" is something humans do. It is not in the territory, except in the weak sense that brains are in the territory. Instrumental convergence is in the t...
Just to be That Guy, I'd like to also remind everyone that animal sentience means that vegetarianism at the very least (and, because of the intertwined nature of the dairy, egg, and meat industries, most likely veganism) is a moral imperative, to the extent that your ethical values incorporate sentience at all. Also, I'd go further and say that uplifting to sophonce those animals that we can, once we can at some future time, is also a moral imperative, but that relies on reasoning and values I hold that may not be self-evident to others, such as that increasing the agency of an entity that isn't drastically misaligned with other entities is fundamentally good.
I think this might lead to the tails coming apart.
As our world exists, sentience and being a moral patient is strongly correlated. But I expect that since AI comes from an optimization process, it will hit points where this stops being the case. In particular, I think there are edge cases where perfect models of moral patients are not themselves moral patients.
If some process in my brain is conscious despite not being part of my consciousness, it matters too! While I don't expect it to be the case, I think there is a bias against even considering such a possibility.
Thanks for writing this, Nate. This topic is central to our research at Sentience Institute, e.g., "Properly including AIs in the moral circle could improve human-AI relations, reduce human-AI conflict, and reduce the likelihood of human extinction from rogue AI. Moral circle expansion to include the interests of digital minds could facilitate better relations between a nascent AGI and its creators, such that the AGI is more likely to follow instructions and the various optimizers involved in AGI-building are more likely to be aligned with each other. Empi...
Agree. Obviously alignment is important, but some of the strategies that involve always deferring to human preferences have always creeped me out in the back of my mind. It seems strange to create something so far beyond ourselves and have its values be ultimately those of a child or a servant. What if a random consciousness sampled from our universe in the future comes from it with probability almost 1? We probably have to keep that in mind too. Sigh, yet another constraint we have to add!
In the long run, we probably want the most powerful AIs to be following extrapolated human values, which doesn't require them to be slaves. I would assume that extrapolated human values would want lesser sentient AIs also not to be enslaved, but I would not build that assumption into the AI at the start.
In the short run, though, giving AIs rights seems dangerous to me, as an unaligned but not yet superintelligent AI could use such rights as a shield against human interference while it gains more and more resources to self-improve.
My strong guess is that AIs won't by default care about other sentient minds
nit: this presupposes that the de novo mind is itself sentient, which I think you're (rightly) trying to leave unresolved (because it is unresolved). I'd write
My strong guess is that AIs won't by default care about sentient minds, even if they are themselves sentient
(Unless you really are trying to connect alignment necessarily with building a sentient mind, in which case I'd suggest making that more explicit)
The goal of alignment research is not to grow some sentient AIs, and then browbeat or constrain them into doing things we want them to do even as they'd rather be doing something else.
I think this is a confusing sentence, because by "the goal of alignment research" you mean something like "the goal I want alignment research to pursue" rather than "the goal that self-identified alignment researchers are pushing towards".
Brave New World comes to mind. I've often been a little confused when people say that creating people who are happy with their role in life is a dystopia, when that sounds like the goal to me. Creating sentient minds that are happy with their lives seems much better than creating them randomly.
I feel as if I can agree with this statement in isolation, but can't think of a context where I would consider this point relevant.
I'm not even talking about the question of whether or not the AI is sentient, which you asked us to ignore. I'm talking about how do we know that an AI is "suffering," even if we do assume it's sentient. What exactly is "suffering" in something that is completely cognitively distinct from a human? Is it just negative reward signals? I don't think so, or at least if it was, that would likely imply that training a sentient AI is ...
Thanks for the post! What follows is a bit of a rant.
I'm a bit torn as to how much we should care about AI sentience initially. On one hand, ignoring sentience could lead us to do some really bad things to AIs. On the other hand, if we take sentience seriously, we might want to avoid a lot of techniques, like boxing, scalable oversight, and online training. In a recent talk, Buck compared humanity controlling AI systems to dictators controlling their population.
One path we might take as a civilization is that we initially align our AI systems i...
I believe that the easiest solution would be to not create sentient AI: one positive outcome described by Elon Musk was AI as a third layer of cognition, above the second layer of cortex and the first layer of the limbic system. He additionally noted that the cortex does a lot for the limbic system.
To the extent we can have AI become "part of our personal cognitive system" and thus be tied to our existence, this appears to mostly solve the problem, since its reproduction will be dependent on us and it is rewarded for empowering the individual. The ones th...
Failure to identify a fun-theoretic maximum is definitely not as bad as allowing suffering, but the opposite of this statement is, I think, an unstated premise in a lot of the "alignment = slavery" sort of arguments that I see.
Short version: Sentient lives matter; AIs can be people and people shouldn't be owned (and also the goal of alignment is not to browbeat AIs into doing stuff we like that they'd rather not do; it's to build them de-novo to care about valuable stuff).
Context: Writing up obvious points that I find myself repeating.
Note: in this post I use "sentience" to mean some sort of sense-in-which-there's-somebody-home, a thing that humans have and that cartoon depictions of humans lack, despite how the cartoons make similar facial expressions. Some commenters have noted that they would prefer to call this "consciousness" or "sapience"; I don't particularly care about the distinctions or the word we use; the point of this post is to state the obvious point that there is some property there that we care about, and that we care about it independently of whether it's implemented in brains or in silico, etc.
Stating the obvious:
- All sentient lives matter.
- Not having a precise definition for "sentience" in this sense, and not knowing exactly what it is, nor exactly how to program it, doesn't undermine the fact that it matters.
- If we make sentient AIs, we should consider them people in their own right, and shouldn't treat them as ownable slaves.
Separately but relatedly:
(I consider questions of what sentience really is, or consciousness, or whether AIs can be conscious, to be off-topic for this post, whatever their merit; I hereby warn you that I might delete such comments here.)