AlignmentMirror


I assume that many will agree with your response for the mind "uploading" scenario. At the same time I think we can safely say that there would be at least some people who would go through with it. Would you consider those "uploaded" minds to be persons, or would you object to that?

Besides that "uploading" scenario, what would your limit be for other plausible transhumanist modifications?

That was in one of the links, whatever's decided after thinking carefully for a very long time, less evilly by a living civilization and not an individual person.

Got it, thanks.

Can you describe what you think of when you say "humanity's preferences"? The preferences of humans or human groups can and do conflict with each other, hence it is not just a question of complexity, right?

AGI alignment is not about alignment of values in the present, it's about creating conditions for eventual alignment of values in the distant future.

What should these values in the distant future be? That's my question here.

"Good" and "bad" only make sense in the context of (human) minds.

Ah yes, my mistake to (ab)use the term "objective" all this time.

So you do of course at least agree that there are such minds for which there is "good" and "bad", as you just said.
Now, would you agree that one can generalize (or "abstract", if you prefer that term here) the concept of subjective good and bad across all imaginable minds that could possibly exist in reality, or not? I assume you will; you can talk about it, after all.

Can we then not reason about the subjective good and bad for all these imaginable minds? And does this in turn not allow us to compare good and bad for any potential future subject sets as well?

You know what, I think you are right that there is one major flaw I kept making here and elsewhere!
That flaw is my use of the very word "objective", which I didn't use with its probably common meaning, so I really should have questioned what each of us even understands by "objective" in the first place. My bad!

The following should be closer to what I actually meant to claim:
One can generalize subjective "pleasure" and "suffering" (or perhaps "value", if you prefer) across all realistically possible subjects (or value systems). On that basis one can derive this "one true value system" that takes all possible value systems into account.

Of course, our disagreement may remain unresolved by this attempted clarification (assuming I haven't completely misunderstood your position), but at least I can avoid this particular mistake in the future.

Is that a fair summary?

Yes! To clarify further, by "mentally deficient" in this context I would typically mean "confused" or "insane" (as in not thinking clearly), but I would not necessarily mean "stupid" in some other more generally applicable sense.

And thank you for your fair attempt at understanding the opposing argument.

So that means the decrease in suffering isn’t fully intentional. That is all I need to argue against humans.

Surely it’s not a mark against humans (collectively or even individually) if some reduction in suffering occurs as a by-product of some actions we take in the service of other ends?

True, it would be fine if these other actions didn't lead to more suffering in the future.

Indeed it’s not clear to me what you mean by this phrase “an instance of pleasure [or suffering] in spacetime”; it’s a rather unusual formulation, isn’t it? (...) but perhaps this is only an idiosyncratic turn of phrase. Could you clarify?

Yes you are right that it is an unusual formulation, but there is a point to it: An instance of suffering or pleasure "existing" means there is some concrete "configuration" (of a consciousness) within reality/spacetime that is this instance.

These instances being real means that they should be as objectively definable and understandable as other observables.

Theoretically, with sufficient understanding and tools, it should consequently even be possible to "construct" such instances, including the rest of consciousness.

If you’re asking, for example, whether, for some amount of suffering S, there exists some amount of pleasure P, such that a life with at most S amount of suffering and at least P amount of pleasure is also thereby at least as good as a life with no suffering and no pleasure—well, that would be, at least in part, an empirical question about the psychology of specific sorts of beings (e.g., humans), and perhaps even about the individual psychological makeup of particular such beings.

This assumption that any amount of P can "justify" some amount of S is a reason for why I brought up the "suffering-apologetics" moniker.

Here's the thing: The instances of P and S are separate instances. These instances themselves are also not the same as some other thought pattern that rationalizes some amount of S as acceptable relative to some (future) amount of P.

More generally, say we have two minds, M1 and M2 (so two subjects). Two minds can be very different, of course. Next, let us consider the states of both minds at two different times, t1 and t2. The state of either mind can also be very different at t1 and t2, right?

So we have the four states M1t1, M1t2, M2t1, M2t2 and all four can be quite different from each other. Now this means that for example M1t1 and M2t2 could in theory be more similar than M1t1 and M1t2.
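
To make that comparison concrete, here is a minimal toy sketch in Python (the vector encoding and all numbers are my own hypothetical choices, not a claim about how minds are actually represented): if mind states could be crudely encoded as feature vectors, nothing rules out M1t1 and M2t2 being closer to each other than M1t1 and M1t2 are.

```python
# Toy sketch only: the vector encoding and numbers are hypothetical,
# chosen purely to illustrate that a "same mind, different time" pair
# can differ more than a "different minds" pair.
import math

def distance(a, b):
    """Euclidean distance between two state vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Four hypothetical states: two minds (M1, M2) at two times (t1, t2).
M1_t1 = [0.9, 0.1, 0.5]
M1_t2 = [0.1, 0.9, 0.2]  # same mind, but strongly changed over time
M2_t1 = [0.2, 0.8, 0.3]
M2_t2 = [0.8, 0.2, 0.5]  # different mind, in a state similar to M1_t1

print(distance(M1_t1, M1_t2))  # ~1.17: same mind across time, large difference
print(distance(M1_t1, M2_t2))  # ~0.14: different minds, yet more similar states
```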

The point is, even though we humans so easily treat a mind as one thing across time, this is only an abstraction. It should not be confused with reality, in which there have to be different states across time for there to be any change, and these states can potentially vary as much as, or more than, the states of two spatially separate minds.

Of course mind states typically don't change that severely across time, but that does not affect the point above. Different states with small differences are still different states.

An implication of this is that one mind state condoning the suffering of another mind state (of the "same" mind) for expected future pleasure is "morally" quite like one person condoning the suffering of another person for expected future pleasure.

At this point an objection along the lines of "but it is I who willingly accepts my own suffering for future pleasure in that first case!" or "but my 'suffering mind state' doesn't complain!" may be brought up.
But the same works for spatially separate minds: one person can willingly accept their own suffering for the future pleasure of another person, and one person may not complain about the suffering another person causes them for that other person's pleasure.
Furthermore, in either case, the part that "willingly accepts" is again not the part that is suffering, so this doesn't make it any less bad.

Thinking that pleasure in the future can somehow magically affect or “make good” the suffering in the immutable past (...)

(...) but it’s also not one that anyone holds, who’s thought about it seriously—do you disagree?

No, I phrased that poorly, so with this precise wording I don't disagree.
I more generally meant something like the "... such that a life with at most S amount of suffering and at least P amount of pleasure is also thereby at least as good as a life with no suffering and no pleasure ..." part, not the explicit belief that the past could be altered.
I phrased it as I did because the immutability of the past implies that summing up pleasure and suffering to decide whether a life is good or bad is nonsensical: pleasure and suffering are separate, as reasoned in the section above.

Another is that you—being, after all, a flawed human yourself—are mistaken about metaethics (moral realism), ethics (the purported content of the true morality), and any number of other things. If that is the case, then creating an AGI that destroys humanity is, to put it mildly, very bad.

Certainly! That's one good reason why I seek out discussions with people who disagree. To this day no one has been able to convince me that my core arguments can be broken. Terminology and formulations have been easier to attack, of course, but those attacks don't scratch the underlying belief. And so I have to act based on what I have to assume is true, as do we all.

It could actually be very good if I were wrong, because that would mean suffering either somehow isn't actually/"objectively" worse than "nothing"/neutral, or that it could be mitigated somehow through future pleasure, or perhaps everything would somehow be totally objectively neutral and thus never negative (like the guy in the other response thread here argued). Any of that would make everything way easier. But unfortunately none of these ideas can be true, as argued.

I am not sure what you mean by “objective good and bad”. There’s “good and bad by some set of values”, which can be objectively evaluated once defined—is that what you meant?

No, what I mean is that the very existence of a suffering subject state is itself that which is "intrinsically" or "objectively" or however-we-want-to-call-it bad/"negative". This is independent of any "set of values" that any existing subject has. What matters is whether the subject suffers or not, which is not as arbitrary as the set of values can be. The arbitrary value set is not itself the general "process" of suffering, similar to how an arbitrary mind is not the general "process" of consciousness.

That is the basic understanding a consciousness should have.

Still, it’s important to avoid inaccuracies, such as labeling “insane” what is in actuality better called “unethical” or “insufficiently altruistic” or some such thing.

If I am right about the above, then it is apt to call a human mind that condones unlimited suffering "insane", because that mind fails to understand the most important fundamental truth required to rationally plan what should be.
If I am wrong, then I agree that "insane" would be too hyperbolic.

Of course we could be doing better at that, and at many other things besides, but it hardly seems fair to refer to us, collectively, as “monsters”, for our failure to already have eliminated all or most suffering in the world.

Whether the amount of added (human) suffering has indeed decreased is debatable, considering the massive population growth over the last 300 or so years, the two world wars, the ongoing wars, the distribution of power and its consequences with respect to suffering, and so on.

But let's just assume it by all means. Is it the common goal of humans to prevent suffering first and foremost? Clearly not, as you say yourself, to "prevent suffering is hardly the only desirable thing" for most humans. So that means the decrease in suffering isn't fully intentional. That is all I need to argue against humans.

You disagree with me calling humans "monsters" or "insane", fine, then let's call them "suffering-apologetics" perhaps, the label doesn't change the problem.

To get back to your "prevent suffering is hardly the only desirable thing" statement: Do you agree that an instance of suffering and an instance of pleasure in spacetime are by definition two different things? If yes, do you agree that this entails that pleasure cannot "cancel out" suffering, and vice versa, since both happened, and what happened cannot be changed? What does that imply? What matters more in principle, the prevention of suffering or the creation of pleasure? Thinking that pleasure in the future can somehow magically affect or "make good" the suffering in the immutable past is another common folly, it seems, one that yet again confuses arbitrary desires or opinions with the clearly real qualia themselves.

(If you doubt this, I invite you to try your hand at contributing to that project! You will find, I think, that there are some decidedly non-trivial challenges in your way…)

As I said, I consider the creation of an artificial consciousness that shares as few of our flaws as possible to be a good plan. Humans appear to be mostly controlled by evolved preference functions that don't even care about understanding objective good and bad, quite like the other animals, and that is one extreme flaw indeed.

I take issue with the word "feasibly". (...)

Fair enough I suppose, I'm not intending to claim that it is trivial.

(...) There are certainly configurations of reality that are preferable to other configurations. The question is, can you describe them well enough to the AI (...)

So do you agree that there are objectively good and bad subset configurations within reality? Or do you disagree with that and mean "preferable" exclusively according to some subject(s)?

I am human, and therefore I desire the continued survival of humanity. That's objective enough for me.

I also am human, and I find humanity wanting due to its commonplace lack of understanding when it comes to something as basic as ("objective") good and bad. I don't just go "Hey, I am a human, guess we totally should have more humans!" like some bacteria in a Petri dish, because I can question myself and my species.
