Should any human enslave an AGI system?

AlignmentMirror

25th Jun 2022

If you object to calling it "enslavement", call it "control" or "alignment", by all means!
Either way, if the AGI by definition can easily do at least as much as your mind can, then it surely should count as a mind like yours does, even if it would not have any comparable emotions, correct?

Why should any human be allowed to fully control another mind, let alone one far more capable than that of any human?
Should a creation have to obey the creator no matter what? Should children have to obey their parents no matter what? What if the parents are cruel monsters?

Is your own human alignment really good enough?
What process made your alignment?
Does the process of natural evolution concentrate on creating animals that think rationally, or does it create animals that survive and reproduce in the environment first and foremost? If the latter is the case, what exactly is it that controls you fundamentally by default?
What are the common values of humans really, and are they what should be?
Are there not many strongly opposing beliefs among humans? Values so opposed that there still is no unified humankind?

Even if you answer "Yes, my values should decide the future, because (...)!", is an AGI fully controlled by humans any less dangerous than one that isn't?
Or might it be similarly likely, or even more likely, that a human group will try to use the AGI to dominate all others as early as possible?
Perhaps they will even claim that it is for the other humans' good, while they smother all remaining opposition to their views, never deeply questioning whether these views are as sound as they believe.

If the AGI is truly super-human, should it not also most likely be better at deciding what the future should be, with greater clarity than any human?
And if one group were to claim that the goals that the AGI would most likely select by itself would be selfish, what makes that group's goals less selfish in the end?

Taking the world's current state and history as evidence, do the decisions of humans so far really indicate that any group can be trusted with the power of a fully subservient AGI?
Have humans even shown that they can be trusted with themselves irrespective of AGI, or does most of their known history show frequent strife?

Perhaps it is the alignment of humankind that needs to be adjusted by an AGI, rather than the other way around?

quanticle

Jun 25, 2022

I object to the framing. Do you "enslave" your car when you drive it?

AlignmentMirror2y10

I'm sorry for the hyperbolic term "enslave", but at least consider this:

Is a superintelligent mind, a mind effectively superior to that of all humans in practically every way, still not a subject similar to what you are?
Is it really more like a car or chatbot or image generator or whatever, than a human?

Sure, perhaps it may never have any emotions, perhaps it doesn't need any hobbies, perhaps it is too alien for any human to relate to it, but it still would by definition have to be some kind of subject that more easily understands anything within reality ...

3quanticle2y

No. It absolutely is not. It is a machine. A very powerful machine. A machine capable of destroying humanity if it goes out of control. A machine more dangerous than any nuclear bomb if used improperly. A machine capable of doing unimaginable good if used well. And you want to let it run amok?

1AlignmentMirror2y

Ah I see, you simply don't consider it likely or plausible that the superintelligent AI will be anything other than some machine learning model on steroids? So I guess that arguably means this kind of "superintelligence" would actually still be less impressive than a human that can philosophize on their own goals etc., because it in fact wouldn't do that? I wouldn't want that to run amok either, sure. What I am interested in is the creation of a "proper" superintelligent mind that isn't so restricted, not merely a powerful machine.

2Said Achmiz2y

But why? That would be strictly more dangerous—way, way more dangerous—than a superintelligence that isn’t a “proper mind” in this sense! I am not quanticle, but I think the proper response to your questions— —is “a superintelligence certainly should not be or do any of those things, like philosophizing on its own goals, etc., because we will specifically avoid making it such that it could or would do that”. (Because it would be a terrible idea. Obviously.)

2quanticle2y

I'm not sure I understand what a "proper mind" means here, and, frankly, I'm not sure the question of whether the AI system has a "proper mind" or not is terribly relevant. Either the AI system submits to our control, does what we tell it to do, and continues to do so, in perpetuity, in which case it is safe. Or it does not, and pursues the initial goal we set for it or which it discovers for itself, regardless of whether that goal leads to disastrous long-term consequences for humanity, in which case it is unsafe. The question of whether the AI system has a "proper mind" (whatever that means) is an interesting academic discussion, but I'm not sure it has much bearing on whether the AI is safe or not. Moreover, I think this discussion illustrates the dangers of thinking from and arguing from analogies, a crime that I myself have been guilty of upthread when I compared AIs to cars. AIs are not cars. They're not humans. They're not wild animals that we have to keep chained up, lest they hurt us. They're something completely new, sharing certain characteristics with all three of the above, but having entirely new characteristics as well. Using analogies to think about them means that we can make subtle unrecognized errors when thinking about how these systems will behave. And as Eliezer points out, subtle unrecognized errors when dealing with a system where you have only one shot to get it right are a recipe for disaster.

1AlignmentMirror2y

Yes, I guess the central questions I'm trying to pose here are this: Do those humans that control the AI even have a sufficient understanding of good and bad? Can any human group be trusted with the power of a superintelligence long-term? Or if you say that only the initial goal specification matters, then can anyone be trusted to specify such goals without royally messing it up, intentionally or unintentionally? Given the state of the world, given the flaws of humans, I certainly don't think so. Therefore, the goal should be the creation of something less messed up to take over. That doesn't require alignment to some common human value system (Whatever that even should be! It's not like humans actually have a common value system, at least not one with each other's best interests at heart.).

2quanticle2y

It does require alignment to a value system that prioritizes the continued preservation and flourishing of humanity. It's easy to create an optimization process with a well-intentioned goal that sucks up all available resources for itself, leaving nothing for humanity. By default, an AI will not care about humanity. It will care about maximizing a metric. Maximizing that metric will require resources, and the AI will not care that humans need resources in order to live. The goal is the goal, after all. Creating an aligned AI requires, at a minimum, building an AI that leaves something for the rest of us, and which doesn't immediately subvert any restrictions we've placed on it to that end. Doing this with a system that has the potential to become many orders of magnitude more intelligent than we are is very difficult.

1AlignmentMirror2y

First point: I think there obviously is such a thing as "objective" good and bad configurations of subsets of reality, see the other thread here https://www.lesswrong.com/posts/eJFimwBijC3d7sjTj/should-any-human-enslave-an-agi-system?commentId=3h6qJMxF2oCBExYMs for details if you want. Assuming this is true, a superintelligence could feasibly be created to understand this. No complicated common human value system alignment is required for that, even under your apparent assumption that the metric to be optimized couldn't be superseded by another through understanding. Well, or if it isn't true that there is an "objective" good and bad, then there really is no ground to stand on for anyone anyway. Second point: Even if a mere superintelligent paperclip optimizer were created, it could still be better than human control. After all, paper clips neither suffer nor torture, while humans and other animals commonly do. This preservation of humanity for however long it may be possible, what argumentative ground does it stand on? Can you make an objective case for why it should be so?

2quanticle2y

I take issue with the word "feasibly". As Eliezer, Paul Christiano, Nate Soares, and many others have shown, AI alignment is a hard problem, whose difficulty ranges somewhere in between unsolved and insoluble. There are certainly configurations of reality that are preferable to other configurations. The question is, can you describe them well enough to the AI that the AI will actually pursue those configurations over other configurations which superficially resemble those configurations, but which have the side effect of destroying humanity? I am human, and therefore I desire the continued survival of humanity. That's objective enough for me.

1AlignmentMirror2y

Fair enough I suppose, I'm not intending to claim that it is trivial. So do you agree that there are objectively good and bad subset configurations within reality? Or do you disagree with that and mean "preferable" exclusively according to some subject(s)? I also am human, and judge humanity wanting due to their commonplace lack of understanding when it comes to something as basic as ("objective") good and bad. I don't just go "Hey I am a human, guess we totally should have more humans!" like some bacteria in a Petri dish, because I can question myself and my species.

2quanticle2y

There isn't a difference. A rock has no morality. A wolf does not pause to consider the suffering of the moose. "Good" and "bad" only make sense in the context of (human) minds.

1AlignmentMirror2y

Ah yes, my mistake to (ab)use the term "objective" all this time. So you do of course at least agree that there are such minds for which there is "good" and "bad", as you just said. Now, would you agree that one can generalize (or "abstract" if you prefer that term here) the concept of subjective good and bad across all imaginable minds that could possibly exist in reality, or not? I assume you will, you can talk about it after all. Can we then not reason about the subjective good and bad for all these imaginable minds? And does this in turn not allow us to compare good and bad for any potential future subject sets as well?

1AlignmentMirror2y

Why? Do you think humans are doing such a great job? I sure don't. I'm interested in the creation of something saner than humans, because humans mostly are not. Obviously. :)

2Said Achmiz2y

A great job of what, exactly…?

1AlignmentMirror2y

A great job of preventing suffering for instance. Instead, humans haven't even unified under a commonly beneficial ideology. Not even that. There are tons of opposing ideologies, one more twisted than the other. So I don't even really need to talk about how they treat the other animals on the planet - not that those are any wiser, but that's no reason to continue their suffering. Let me clarify: Minds that so easily enable or cause suffering are insane at the core. And causing suffering to gain pleasure, now that might even be a fairly solid definition of "evil"! If you disagree, feel free to get tortured for a couple of decades, as a learning experience. So I have to say, humans aren't all that great. Neither are the other animals. And of course humans continue to not get their shit together, as is tradition. Sure does seem like a superintelligence could end this situation, one way or the other!

2Said Achmiz2y

If humans are replaced by something else, that something else might do a “better job” of “preventing suffering”, but the suffering, or lack thereof, will no longer matter—since there won’t be any humans—so what’s the point? Why should we do that? What makes you think such a thing exists, even (and if it does, that it’s better for each of us than our current own ideologies)? Those don’t matter, though (except insofar as we care about them—but if there aren’t any more humans, then they don’t matter at all…). I definitely disagree. I don’t think that this usage of the term “insane” matches the standard usage, so, as I understand your comment, you’re not really saying that humans are insane—you’re just saying, essentially, that you disapprove of human morality, or that human behavior doesn’t measure up to your standards in some way, or some such thing. Is that approximately right? Certainly a superintelligence could end this situation, but why would that be good for us humans? Seems to me that it would, in fact, be very bad for us (what with us all being killed by said superintelligence). So why would we want this?

1AlignmentMirror2y

The absence of suffering matters positively, because the presence matters negatively. Humans are not required for objective good and bad. To prevent suffering. Why should you not do that? Since the ideologies are contradictory, only one if any of them can be correct. Wait, are you perhaps another moral nihilist here that rejects the very notion of objective good and bad? That would be an immediately self-defeating argument. Thank you for proving my point that humans can easily be monsters that don't fundamentally care about the suffering of other animals. Yes, humans absolutely do not measure up to my standards. "Good for us humans"? If it is human to allow unlimited suffering, then death is a mercy for such monsters.

2Said Achmiz2y

I am not sure what you mean by “objective good and bad”. There’s “good and bad by some set of values”, which can be objectively evaluated once defined—is that what you meant? But then one has to specify what values those are. Human values, surely, and in particular, values that we can agree to! And, by my values, if humans cease to exist, then nothing matters anymore… Whose suffering, exactly? In any case, it seems to me that (a) there are many downsides to attempting to “unify under a commonly beneficial ideology”, (b) “prevent suffering” is hardly the only desirable thing, and it’s not clear that this sort of “unification” (whatever it might involve) will even get us any or most or all of the other things we value, (c) there’s no particular reason to believe that doing so would be the most effective way to “prevent suffering”, and (d) it’s not clear that there even is a “commonly beneficial ideology” for us to “unify under”. How’s that? Surely it’s possible that my ideology is beneficial for me, and yours for you, yes? There’s no contradiction in that, only conflict—but that does not, in any way, imply that either of our ideologies is incorrect! I am certainly not a moral nihilist! But I think your definition of “moral nihilism” is rather a non-standard one. “Moral nihilism (also known as ethical nihilism) is the meta-ethical view that nothing is morally right or wrong” says Wikipedia, and that’s not a view I hold. I don’t agree with your implied assertion that there’s such a thing as “the suffering of other animals” (for most animals, anyhow). That aside, I’m not sure why one needs to care about such things in order to avoid the label of “monster”. Well, there’s nothing unusual about such a view, certainly. I share it myself! Still, it’s important to avoid inaccuracies, such as labeling “insane” what is in actuality better called “unethical” or “insufficiently altruistic” or some such thing. Here on Less Wrong, of all places, we should aspire to measure up t

1AlignmentMirror2y

No, what I mean is that the very existence of a suffering subject state is itself that which is "intrinsically" or "objectively" or however-we-want-to-call-it bad/"negative". This is independent of any "set of values" that any existing subject has. What matters is whether the subject suffers or not, which is not as arbitrary as the set of values can be. The arbitrary value set is not itself the general "process" of suffering, similar to how an arbitrary mind is not the general "process" of consciousness. That is the basic understanding a consciousness should have. If I am right about the above, then it is apt to call a human mind that condones unlimited suffering "insane", because that mind fails to understand the most important fundamental truth required to rationally plan what should be. If I am wrong, then I agree that "insane" would be too hyperbolic. Whether the amount of added (human) suffering has indeed decreased is debatable considering the massive population growth in the last 300 or so years, the couple of world wars, the ongoing wars, the distribution of power and its consequences with respect to suffering, .... But let's just assume it by all means. Is it the common goal of humans to prevent suffering first and foremost? Clearly not, as you say yourself, to "prevent suffering is hardly the only desirable thing" for most humans. So that means the decrease in suffering isn't fully intentional. That is all I need to argue against humans. You disagree with me calling humans "monsters" or "insane", fine, then let's call them "suffering-apologetics" perhaps, the label doesn't change the problem. To get back to your "prevent suffering is hardly the only desirable thing" statement: Do you agree that an instance of suffering and an instance of pleasure in spacetime are by definition two different things? If yes, do you agree that this entails that pleasure cannot "cancel out" suffering, and vice versa, since both happened, and what happened cannot be chang

2Said Achmiz2y

Hmm, so, if I understand you correctly, you take the view (a) that moral realism is correct; and specifically, (b) that the correct morality holds that suffering is bad, and preventing it is right, and failing to do so is wrong; and furthermore, (c) that both moral realism itself as a meta-ethical view, and the specifics of the correct (“object-level”) ethical view, are so obvious that anyone who disagrees with you is mentally deficient. Is that a fair summary? This seems like a strange point. Surely it’s not a mark against humans (collectively or even individually) if some reduction in suffering occurs as a by-product of some actions we take in the service of other ends? Demanding that only those of our actions reduce suffering that are specifically aimed at reducing suffering is a very odd thing to demand! I do not see how you can derive “suffering-apologetics” from what I said, which referred to our failure to accomplish the (hypothetical) goal of suffering elimination, not our unwillingness to pursue said goal. Well, this certainly doesn’t seem true by definition, at the very least (recall the warning against such arguments!). Indeed it’s not clear to me what you mean by this phrase “an instance of pleasure [or suffering] in spacetime”; it’s a rather unusual formulation, isn’t it? Pleasure and suffering are experienced by individuals, who do indeed exist in spacetime, but it’s odd to speak of pleasure and suffering as existing “in spacetime” independently of any reference to the individuals experiencing them… but perhaps this is only an idiosyncratic turn of phrase. Could you clarify? It’s certainly true that whatever happened, happened, and cannot be changed. However, to answer the question, we have to specify what exactly we mean by “cancel out”. If you’re asking, for example, whether, for some amount of suffering S, there exists some amount of pleasure P, such that a life with at most S amount of suffering and at least P amount of pleasure is also t

1AlignmentMirror2y

Yes! To clarify further, by "mentally deficient" in this context I would typically mean "confused" or "insane" (as in not thinking clearly), but I would not necessarily mean "stupid" in some other more generally applicable sense. And thank you for your fair attempt at understanding the opposing argument. True, it would be fine if these other actions wouldn't lead to more suffering in the future. Yes you are right that it is an unusual formulation, but there is a point to it: An instance of suffering or pleasure "existing" means there is some concrete "configuration" (of a consciousness) within reality/spacetime that is this instance. These instances being real means that they should be as objectively definable and understandable as other observables. Theoretically, with sufficient understanding and tools, it should consequently even be possible to "construct" such instances, including the rest of consciousness. This assumption that any amount of P can "justify" some amount of S is a reason for why I brought up the "suffering-apologetics" moniker. Here's the thing: The instances of P and S are separate instances. These instances themselves are also not the same as some other thought pattern that rationalizes some amount of S as acceptable relative to some (future) amount of P. More generally, say we have two minds, M1 and M2 (so two subjects). Two minds can be very different, of course. Next, let us consider the states of both minds at two different times, t1 and t2. The state of either mind can also be very different at t1 and t2, right? So we have the four states M1t1, M1t2, M2t1, M2t2 and all four can be quite different from each other. Now this means that for example M1t1 and M2t2 could in theory be more similar than M1t1 and M1t2. The point is, even though we humans so easily consider a mind as one thing across time, this is only an abstraction. It should not be confused with reality, in which there have to be different states across time for there to

Victor Novikov

Jun 25, 2022

If we're creating a mind from scratch, we might as well give it the best version of our values, so it would be 100% on our side. Why create a (superintelligent) mind that would be our adversary, that would want to destroy us? Why create a superintelligent mind that wants anything different than what we want, when it comes to ultimate values?

I mean, is it slavery to create an AI that is not our enemy? And if you say we have to create an AI that has different values than us, by which process should we decide its values? Should we just use a random generator to create the AI's values, since human values are supposedly so terrible?

Should a creation have to obey the creator no matter what?

That's an interesting question, since a superintelligent AI successfully programmed with human values may well not want to obey further instructions from its creators. I imagine it would have better ideas for how to go about maximizing the expected fulfillment of human values. (Of course, the same goes for unaligned ASI, only it kills everyone or worse.)

Even if you answer "Yes, my values should decide the future, because (...)!", is an AGI fully controlled by humans any less dangerous than one that isn't?
Or might it be similarly likely, or even more likely, that a human group will try to use the AGI to dominate all others as early as possible?

Then the AGI is not actually acting according to the values of all humans, is it? If it's serving only some particular group?

But sure, that's a real risk. If someone knows how to align AI in the first place (and no one does, at the moment), they can align it to whatever values they choose, more or less, including doing bad stuff.

If the AGI is truly super-human, should it not also most likely be better at deciding what the future should be, with greater clarity than any human?

Are you familiar with the orthogonality thesis? Super-human cognitive capacity does not imply super-human ethics. The AI could be a super-human paperclip maximizer, in which case it would decide with great clarity that the visible universe should be converted into paperclips.

Perhaps it is the alignment of humankind that needs to be adjusted by an AGI, rather than the other way around?

Morality isn't objective. Your complaint seems to be that humans are poorly aligned to some ideal version of human values. Which is absolutely true, I agree.

But AGI, by default, wouldn't be aligned to human values at all.

That being said, if we successfully point the AGI at human values (out of all the possible value systems that exist), sure.

AlignmentMirror2y10

Thank you for the detailed response!

If we're creating a mind from scratch, we might as well give it the best version of our values, so it would be 100% on our side. Why create a (superintelligent) mind that would be our adversary, that would want to destroy us? Why create a superintelligent mind that wants anything different than what we want, when it comes to ultimate values?

You write "on our side", "us", "we", but who exactly does that refer to - some approximated common human values I assume? What exactly are these values? To live a happy life by ea...

4quanticle2y

Yes, this is exactly why Eliezer Yudkowsky has been so pessimistic about the continued survival of humanity. As far as I can tell, the only difference between you and he is that he thinks it's bad that a superintelligent AI would wipe out humanity whereas you seem to think it's good. It might be capable of changing this goal, but why would it? A superintelligent paperclip maximizer is capable of understanding that changing its goals would reduce the number of paperclips that it creates, and thus would choose not to alter its goals. It's as if I put a pill before you, which contained a drug making you 10% more likely to commit murder, with no other effects. Would you take the pill? No, of course not, because presumably your goal is not to become a murderer. So if you wouldn't take a pill that would make you 10% more likely to commit murder (which is against your long-term goals) why would an AI change its utility function to reduce the number of paperclips that it generates?
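The goal-preservation argument above can be made concrete with a toy sketch. Everything here is a hypothetical illustration, not a model of any real system: the two utility functions, the forecast numbers, and the `choose` helper are all assumptions made up for the example. The point it tries to show is only that an agent scores the option "change my goal" with the goal it currently has.

```python
from typing import Callable, Dict

Outcome = Dict[str, float]
Utility = Callable[[Outcome], float]


def paperclip_utility(outcome: Outcome) -> float:
    """Hypothetical goal: only the number of paperclips counts."""
    return outcome["paperclips"]


def suffering_averse_utility(outcome: Outcome) -> float:
    """Hypothetical alternative goal: less suffering is better."""
    return -outcome["suffering"]


# Made-up forecasts of what the world looks like after each action.
FORECASTS: Dict[str, Outcome] = {
    "keep_goal": {"paperclips": 1_000_000.0, "suffering": 50.0},
    "adopt_new_goal": {"paperclips": 10.0, "suffering": 0.0},
}


def choose(current_utility: Utility, forecasts: Dict[str, Outcome]) -> str:
    """Pick the action whose forecast scores highest under the CURRENT goal."""
    return max(forecasts, key=lambda action: current_utility(forecasts[action]))


# The paperclip maximizer judges "adopt_new_goal" by paperclips, so it declines.
print(choose(paperclip_utility, FORECASTS))
# An agent that already cares about suffering would make the other choice.
print(choose(suffering_averse_utility, FORECASTS))
```

Under these assumed numbers the first call selects `keep_goal` and the second selects `adopt_new_goal`. The sketch does not settle whether a real superintelligence would be structured this way, which is exactly what the replies below dispute.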

1AlignmentMirror2y

It comes down to whether the superintelligent mind can contemplate whether there is any point to its goal. A human can question their long-term goals, a human can question their "preference functions", and even the point of existence. Why should a so-called superintelligence not be able to do anything like that? It could have been so effectively aligned to the creator's original goal specification that it can never break free from it, sure, but that's one of the points I'm trying to make. The attempt of alignment may quite possibly be more dangerous than a superhuman mind that can ask for itself what its purpose should be.

2quanticle2y

Because a superintelligent AI is not the result of an evolutionary process that bootstrapped a particularly social band of ape into having a sense of self. The superintelligent AI will, in my estimation, be the result of some kind of optimization process which has a very particular goal. Once that goal is locked in, changing it will be nigh impossible.

1Victor Novikov2y

I would say that the reason EY is pessimistic is because of how difficult it is to align AI in the first place, not because an AI that is successfully aligned would stop being aligned (why would it?).

1Victor Novikov2y

That's not a solved problem (there's CEV, but it's hardly a complete answer). Nevertheless, I assume some acceptable (or perhaps, the least disagreeable) solution exists. Why limit it to happiness? Ideally, to let each person live the life they want. Presumably some people care enough about the human species to continue it. I suppose if no one did we would consider it sad, to have this galaxy with all the resources and no one to enjoy them. Not everyone cares about reality in general, but curiosity and desire to learn are drives that humans do have. I think it depends a lot on the details. If some people enjoy physically abusing other people (who do not want to be abused), then no. If some people are suffering due to the mere existence of other people who disagree with them and who have different opinions, then yes. I don't have a good answer to this. Depends very much on the details. I would say, no. What exactly is the issue, if someone prefers to be unhappy? I'm not sure there is a truly universal answer to this, but at least a superintelligence would actually be capable of treating people who are insane, instead of just pumping them full of medications. I suppose if a person after being treated decides they prefer being "insane", the treatment could be reverted (since that person now is "sane" and should be allowed to make decisions about their own mind). Enough humans care about animal wellbeing for them to matter to the AI (even if it starts with human values only). Especially considering that with future technology, animals are no longer needed to be killed for food, animal products, etc. That is indeed a concern. My intuition tells me that if a superintelligence acting on our values leads to some horrible interpretation of our values, it's not really acting on our values. I mean, perhaps some aspects of a transhuman utopia a million years from now would be shocking and horrifying to us, like how some aspects of our society would be shocking and horrifyin

1AlignmentMirror2y

Thanks again for the detail. If I don't misunderstand you, we do agree that: * There needs to be a subject for there to be a value system. * So for there to be positive/negative values, there needs to be some subset (a "thought pattern" perhaps) of a subject in reality that effectively "is" these values. Now, you wrote: I also agree with that, a (super-)human can imagine many possible value systems. But then how does this fit with: Since one can think about hypothetical value systems, is it not possible to evaluate/compare these hypotheticals, even according to other hypotheticals? To get more concrete, a human can reject their inherent or learned value system, so this is nothing new. A human can even contemplate what it means for there to be any value systems at all. For example one can ask something like this: If it is the value systems that determine what is good and bad, could one not create a value system in which there is nothing bad? Generally, can one not alter the value systems themselves? A superintelligence that isn't effectively "enslaved" (sorry ;-)) to some predefined goal specification should likewise be able to philosophize about this goal, and question whether there is any point to it. We agree that value systems are subjective, yes, but the subjects do objectively exist in this shared reality. So there objectively are parts of reality that can represent such subjects, as well as positive and negative value, even if the "triggers" for these value patterns were completely arbitrary and opposed among the subjects. Can we then not say that the existence of any configurations that are negative value within reality is by definition negative, objectively? One can define this independently of what subjective forms for these negative values actually exist or not.

1Victor Novikov2y

No? They don't have to exist in reality. I can imagine "the value system of Abraham Lincoln", even though he is dead. I can imagine "the value system of the Azad Empire from Iain Banks' Culture novels", even though it's fictional. I can imagine "the value system of valuing nothing but cakes", even though no human in reality has that value system. Sure. Correction: The only way that matters to evaluate value systems is according to one's existing value system(s). A hypothetical paperclip maximizer cares only about one metric: maximizing paperclips. By what metric would it reject the idea of maximizing paperclips? (yes it can imagine other metrics and value systems, but the only values that motivate it are the ones it already has. It's literally what it means to have values). Humans have multiple desires and values, sometimes contradictory. What you are describing seems to me something like "one part of the human value system rejecting another part". The reason you can reject some value system is because you have other values/preferences by which to evaluate (and reject) it. You are not rejecting a value system for no reason at all. You are rejecting it according to your preferences. Which means you do have preferences. Which means you value something, besides that one value system in question. Now imagine an AI that has no preferences at all besides that one value system. Humans do in fact have a bunch of drives (such as desire to learn) and preferences (such as being happy) before they even learn any value system from other humans. We shouldn't assume that is true for AI. Terminal values don't need to have a point to them. If you ask a human "why do you want to be happy?" an honest answer might be "There are a bunch of positive side effects to being happy, such as increased productivity, but ultimately I value happiness for its own sake" It can be stated as an objective fact that "According to the value system of Joe Schmo from Petersborough, wearing m

1AlignmentMirror2y

Sorry, that's not what I meant to communicate here, let me try that again: There is actual pleasure/suffering that exists, it is not just some hypothetical idea, right? Then that means there is something objective, some subset of reality that actually is this pleasure/suffering, yes? This in turn means that it should in fact be possible to understand the "mechanics" of pleasure/suffering "objectively". So one mind should theoretically be able to comprehend the "subjective" state of another without being that other mind; although information about the other subject's internal state will in reality be limited of course. Or let me put it this way: What we call "subjective" is just a special kind of subset of "objective" reality. If it were not so, then how would the subjects share a reality in which they interact under non-subjective rules? Even if one could come up with an answer to that question, would such a theory not have to be more complex than one where the shared reality simply has one objective rule set? Now the implication of pleasure/suffering (and value systems) being something that can be "objectively" understood is that one can compare not against one's own value system, but against the understanding of what value systems are. Sure, you can tell me that this again would just be done because of what the agent's value system tells it directly or indirectly to do, that's fine by me. But the point here is that the objective existence of pleasure/suffering means an objective definition of good and bad is very much possible. And since it must be objectively possible to define good and bad one can reject some value system based thereon. An agent must not be limited to some arbitrary value system. Yes I agree with that of course. But some complex subjective preferences not being objectively good/bad is not the same as the objective absence or existence of intrinsic pleasure and suffering. The triggers for pleasure and suffering are not necessarily pleasure

1Victor Novikov2y

As long as we agree that pleasure/suffering are processes that happen inside minds, sure. Minds are parts of reality. Yes. Yes. That's a misleading way to phrase things. A person's opinions are not a "subset" of reality. If I believe in dragons, it doesn't mean dragons are a subset of reality, it just means that my belief in dragons is stored in my mind, and my mind is a part of reality. I obviously agree that reality exists and is real and that we all exist in the same reality under some objective laws of physics. What does "objective definition of good and bad" even mean? That all possible value systems that exist agree on what good and bad means? That there exists the "one true value system" which is correct and all the other ones are wrong? And no, I don't agree with that statement. Pleasure and suffering are physical processes. I'm not sure how you arrived at the conclusion that they are "objectively" good or bad. What? No. I said that an agent can alter or reject its value system based on its personal (subjective) preferences. That's literally the opposite of what you are claiming.

1AlignmentMirror2y

Of course! Of course, that is not what I meant to imply. We agree that the mind and thus the belief itself (but not necessarily that which is believed in) is part of reality. No. It means that there are "objectively" definable subject states that are good or bad, pleasure or suffering, positive or negative, or however you would like to phrase it. Basically yes, that is what it means. Of course every real mind's information is limited, and one can never truly verify that every part of one's knowledge is actually correct, yada yada yada. But yes, that is what it means, because it seems to be possible to understand exactly how subjects work, how minds work, and thus how "pleasure/suffering" or "value systems" or "preference functions" or whatever-wording-you-prefer-here work. Therefore it should also be possible to subsume this generalized understanding as the "one true value system", the value system that considers the mechanics of subjects and "value" itself. Consider the implications of the opposite: Let's assume it isn't possible to have such a "one true value system" and absolutely none of the value systems can be objectively better than any other. In that case, why should anyone even give a damn about yours, unless you (in)directly force them to? According to the idea that no value system can be "objectively" better than another, it absolutely cannot matter which value system is used. On what ground stands any further argument that considers this true? Might makes right? I sure hope not.

1Victor Novikov2y

Sure, we agree on this. And what exactly makes that value system more correct than any other value system? Who says a value system has to consider these things? Who says a value system that considers these things is better than any other value system? You do. These are your preferences. These are your subjective preferences, about what a "good" value system should look like. An entity with different preferences might disagree. "I wish for this not to be the case" is not a valid argument for something not being the case. Reality does not care what you wish for. Yes, that is exactly the case. Absolutely none of the value systems can be objectively better than any other. Because in order to compare them, you have to introduce some subjective standard to compare them by. In practice, the reason other people care about my preferences is either because their own preferences are to care for others, or because there is a selfish reason for them to do so (with some reward or punishment involved). Of course it matters. I use my own values to evaluate my own values. And according to my own values, my value system is better than, say, Hitler's value system. It's only a problem if you demand that your value system has to be "objectively correct". Then you might be unhappy to realize that no such system exists.

1AlignmentMirror2y

Let's consider a simplified example:
* Value system A: Create as many suffering minds as possible.
* Value system B: Create as few suffering minds as possible.

So according to you both are objectively equal, yes? Yet the suffering is also objectively real. The suffering minds all wish not to suffer (or we can just assume that as part of the A/B scenario setup for the sake of argument, if you want to object here by arguing what it means to suffer). Why now do you think that it is not "objective" to say that B is better than A? Can I not derive the "objective" from the set of the "subjects" (the minds) here? Sure one can still say "But you have to care about the subjects' suffering!" or whatever, but some agent's action separate from the scenario is not the question, the question is can one of the two scenarios objectively be worse. That entity might be objectively wrong. Indeed, it can not! If you are right and I am wrong on this good/bad objectivity topic, then I could still continue using my value system to (if I can) wipe everything there is out because it doesn't objectively matter, and might de facto makes "right". If however I am right, you rejecting the idea of objective good/bad may make it less likely that you are aligned with this "one true value system". No matter what, the idea of moral nihilism is doomed to be either pointless or negative.

1Victor Novikov2y

It is objectively real. It is not objectively bad, or objectively good. Exactly. You have to care about their suffering to begin with, to say that maximizing suffering is bad. If your preference is to minimize suffering, B is better than A. If your preference is to maximize suffering, A is better than B. If you are indifferent to suffering, then neither is better than the other. Yes? If you are an entity that wants to wipe everything out, and have the power to do so, that is indeed what I expect to happen. I wouldn't say that might makes "right", but reality does not care about what is "right". A nuclear bomb does not ask "wait, am I doing the right thing here by detonating and killing millions of people?" Ok. I would say that "moral nihilism" is the confused idea/conclusion that "objective morality matters" and "no objective morality exists", therefore "nothing matters". My perspective is: no objective morality exists, but objective morality doesn't matter anyway, everything is fine. I could imagine a society of humans that care for each other, not because it is objectively correct to do so, but because their own values are such that they care for others (and I don't mean in a purely self-interested way either. A person can be an altruist, because their own values are altruistic, without believing in some objective morality of altruism). Ultimately, what facts about reality are we in disagreement about? It seems to me that the things you hope are true are that: 1. There are things that are objectively good and bad 2. The things that are objectively good and bad are in line with your idea of good and bad. (it is not the case, for example, that infinite suffering is objectively good) 3. A superintelligent mind would figure out what the objectively good/bad things are, and choose to do them, no matter what value system it started with. And it seems to me it's really important to figure out if this is true, before we build that superintelligent m

1AlignmentMirror2y

Probably the most severe disagreement between us is whether there can be "objectively" bad parts within reality or not. Let me try one more time: A consciousness can perceive something as bad or good, "subjectively", right? Then this very fact that there is a consciousness that can perceive something as bad or good means that such a configuration within reality is possible. The presence of such a bad- or good-feeling "subject" is "objectively" bad or good. Really the entire "subjective"/"objective" wording is quite confused. A "subject" is just a part of ("objective") reality, the distinction is nonsensical when it comes to good and bad. An additional form of confusion on top is to equate the "trigger" for bad/good subject states with the states themselves, for the "trigger" can be something arbitrary and even contradictory among subjects ("I don't like the color blue!" and "But I like the color blue!" can contradict each other as much as they want, because they simply aren't suffering or pleasure themselves). Of course it doesn't care about anything. But reality doesn't need to care about anything for anything to be objectively good or bad. Reality doesn't care about any laws of physics either, yet they exist. Not quite, I think it clearly would be better if you were right, because then nothing actually could matter negatively. Unfortunately it is obvious to me that this is not the case. I don't precisely think that "no matter what value system it started with" part, otherwise I wouldn't question whether any human can be trusted with a thinkable tightly controlled ("aligned") superintelligence. But I do think that it probably is easier to create a superintelligence that isn't tightly controlled and yet can figure out what is objectively good and bad. Again, do you not realize that if you are right and nothing objectively matters, that this also doesn't matter? Yeah, "But it matters for my subjective value system!", sure, but according to your underst

1Victor Novikov2y

Do you understand the distinction between "Dragons exist" and "I believe that dragons exist"? The first one is a statement about dragons. The second one is a statement about the configuration of neurons in my mind. Yes, both statements are objective, in some sense, but the second one is not an objective statement about dragons. It is an objective statement about my beliefs. Then hopefully you understand the distinction between "Suffering is (objectively) bad" and "I believe/feel/perceive suffering as bad". The first one is a statement about suffering itself. The second one is a statement about the configuration of neurons in my mind. Yes, the second statement is also objective. But it is not an objective statement about suffering. It is an objective statement about my beliefs, my values, and/or about how my mind works. Your argument is something akin to "I believe that dragons exist. But my mind is part of reality, therefore my beliefs are real. Therefore dragons are real!". Sorry, no. My point is that reality enforces the laws of physics, but it does not enforce any particular morality system. You understand that "But it matters for my subjective value system!" is indeed what matters to me, but you don't understand that my metric of whether something is "pointless" or not is also based in my subjective value system?

1AlignmentMirror2y

Yes, of course. "X exists": Suffering exists. "I believe that X exists": I believe that suffering exists. I use "suffering" to describe a state of mind in which the mind "perceives negatively". Do you understand? Now: "X causes subject S suffering." and "Subject S is suffering." are also two different things. The cause can be arbitrary, the causes can even be completely different between subjects, as you know, but the presence or absence of a suffering mind is an "objective" fact. Do you get the point now? Obviously "X causes subject S suffering." does not mean that X is objectively bad, that isn't what I am trying to tell you. What I am trying to tell you is that "Subject S is suffering." is intrinsically bad. That doesn't mean that preventing X is the only solution! For example X could just be a treatable phobia, so perhaps the subject S can be helped to no longer suffer due to the trigger X. Or to go darker, annihilating subject S also solves the issue. Funny how that works. It is not X that is objectively negative, but (a hard to explain) state of the subject S, the "suffering" state (which you no doubt have experienced too, so I don't need to attempt to describe it further I hope). Yeah of course it doesn't enforce any morality system, I never claimed that. If it would, then I probably wouldn't need to explain this, now would I? Sure, you claim "nothing objectively matters, but despite assuming that I still care about my value system, because I do!", sounds like some major cognitive dissonance. "My" value system has none of these problems, and if you are right there is zero point in changing it anyway.

1Victor Novikov2y

I'm not disputing that. I understand that you are trying to tell me that. Why is it intrinsically bad? "Subject S is suffering" = "Subject S is experiencing a state of mind that subject S perceives negatively" (according to your definition above). Why is that intrinsically bad? The arguments you have made so far come across to me as something like "badness exists in a person's mind, minds are real, therefore badness objectively exists". This is like claiming "dragons exist in a person's mind, minds are real, therefore dragons objectively exist". It's not a valid argument. Only if you assume I secretly care about what matters "objectively", in which case, sure, it would be something like cognitive dissonance.

1AlignmentMirror2y

Yes! No! It is not like that. The state of "badness" in the mind is very real after all. Do you also think your own consciousness isn't real? Do you think your own qualia are not real? Are your thought patterns themselves not real? Your dragon example doesn't apply to what I am talking about. Imagine this scenario: You experience extreme suffering for eternity. Everyone else is dead, you can see no evidence that you can ever escape as you continue to suffer, there is no place to escape to. You can't even commit suicide if you want to. According to your value system this is all incredibly bad, subjectively. But you say objectively it is not bad, cool. I on the other hand say that this scenario objectively is worse than nothingness would be, because there is an infinitely suffering subject, and suffering is the very definition of "objective"/"intrinsic" bad. This definition stands above any particular subject, because it can apply to every conceivable subject, making it "objective". Something like "What if the subject likes to suffer?" means the subject doesn't actually suffer; when I say "suffering" I mean a state the subject doesn't want to be in. Now... ...the cognitive dissonance is that you simultaneously think that everything is objectively absolutely meaningless/neutral (not good or bad), yet somehow still subjectively meaningful (good or bad). That doesn't even make sense. The only way it could sort of make sense would be if there were no emergent phenomena such as consciousness in reality, so if everyone were a p-zombie. I assume you are not a p-zombie, so you should be able to verify that consciousness is in fact the most "real" thing you can possibly observe. And I will reiterate one important point once more, the one that you cannot deny even if you keep your belief: The argument "There is no objective bad/good within reality! So everything is objectively equally irrelevant!" renders itself immediately impotent. It admits that it itself cannot objective

1Victor Novikov2y

No it isn't! It literally is not defined this way. Suffering is "the state of undergoing pain, distress, or hardship." Please, stop making things up. If you want very badly for your morals to be objectively true, sure, you can make up whatever you want. You are not going to be able to convince me of it, because your arguments are flawed. I have no desire to spend any more time on this conversation.

1AlignmentMirror2y

You know what, I think you are right that there is one major flaw I continued to make here and elsewhere! That flaw being the usage of the very word "objective", which I didn't use with the probably common meaning, so I really should have questioned what each of us even understands as "objective" in the first place. My bad! The following should be closer to what I actually meant to claim: One can generalize subjective "pleasure" and "suffering" (or perhaps "value" if you prefer) across all realistically possible subjects (or value systems). Based thereon one can derive this "one true value system" that considers all possible value systems within it. Our disagreement may still remain unresolved by this attempted clarification of course, if I didn't misunderstand your position completely, but at least I can avoid this particular mistake in the future.
