All of AlignmentMirror's Comments + Replies

I assume that many will agree with your response to the mind "uploading" scenario. At the same time, I think we can safely say that there would be at least some people who would go through with it. Would you consider those "uploaded" minds to be persons, or would you object to that?

Besides that "uploading" scenario, what would your limit be for other plausible transhumanist modifications?

3Lone Pine
I would consider uploads to be persons, but I'm also much more willing to grant that status to AIs, even radically unnatural ones, than the average human today. It's not their humanness that makes uploads persons, it's their ability to experience qualia. Qualia (and specifically the ability to suffer) is the basis of moral relevance. Unfortunately we do not have a good answer to the question "What can experience qualia?" yet, and we are already building things deep in the gray area. I don't know what to say about this.

That was in one of the links, whatever's decided after thinking carefully for a very long time, less evilly by a living civilization and not an individual person.

Got it, thanks.

Can you describe what you think of when you say "humanity's preferences"? The preferences of humans or human groups can and do conflict with each other, hence it is not just a question of complexity, right?

1Volodymyr Frolov
I'm sure there are multiple approaches for formalizing what we mean when we say "humanity's preferences", but an intuitive understanding is enough for the sake of this discussion. In my (speculative) opinion, the problem is precisely the complexity of the Utility Function capturing this intuitive understanding. For simplicity, let's say there's a single human being with transitive preferences and we want to perfectly align an AI with the preferences of this human. The cyclomatic complexity of such a perfect Utility Function can easily be higher than that of the human brain (it needs to perfectly predict the utility of every single thing "X", including all the imaginable consequences of having "X", with all these Xes possibly still unknown to humanity for now).

AGI alignment is not about alignment of values in the present, it's about creating conditions for eventual alignment of values in the distant future.

What should these values in the distant future be? That's my question here.

2Vladimir_Nesov
That was in one of the links, whatever's decided after thinking carefully for a very long time, less evilly by a living civilization and not an individual person. But my point is that this does not answer the question of what values one should directly align an AGI with, since this is not a tractable optimization target. And any other optimization target or its approximation that's tractable is even worse if given to hard optimization. So the role of values that an AGI should be aligned with is played by things people want, the current approximations to that target, optimized-for softly, in a way that avoids Goodhart's curse, but keeps an eye on that eventual target.
1Volodymyr Frolov
You mean, the question is how exactly the Utility Function is calculated for humanity's preferences? That's part of the problem. We cannot easily fit the entirety of our preferences into a simple Utility Function (which doesn't mean there's no such Utility Function that perfectly captures it, but simply means that formalizing this function is not achievable at the present moment). As Robert Miles once said, if we encode the 10 things we value the most into a SuperAI's Utility Function, the 11th thing is as good as gone.
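A minimal sketch of that last point, assuming a hypothetical toy objective; the value names and numbers below are purely illustrative, not anything proposed in the thread:

```python
# Toy sketch: an optimizer that maximizes only the values we managed to
# encode will happily sacrifice any value left out of its objective.

# Hypothetical candidate world-states, scored on three things people care about.
candidate_worlds = [
    {"health": 9, "freedom": 8, "quiet": 10},
    {"health": 10, "freedom": 10, "quiet": 0},  # tramples the un-encoded value
]

ENCODED_VALUES = ["health", "freedom"]  # stand-ins for the "10 things we encode"

def encoded_utility(world):
    """Utility as the optimizer sees it: only encoded values count."""
    return sum(world[value] for value in ENCODED_VALUES)

best = max(candidate_worlds, key=encoded_utility)
print(best)  # picks the world with quiet = 0: the "11th thing" is as good as gone
```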

"Good" and "bad" only make sense in the context of (human) minds.

Ah yes, my mistake to (ab)use the term "objective" all this time.

So you do of course at least agree that there are such minds for which there is "good" and "bad", as you just said.
Now, would you agree that one can generalize (or "abstract" if you prefer that term here) the concept of subjective good and bad across all imaginable minds that could possibly exist in reality, or not? I assume you will; you can talk about it, after all.

Can we then not reason about the subjective good and bad fo... (read more)

You know what, I think you are right that there is one major flaw I kept making here and elsewhere!
That flaw being my use of the very word "objective", which I didn't use with its common meaning, so I really should have questioned what each of us even understands by "objective" in the first place. My bad!

The following should be closer to what I actually meant to claim:
One can generalize subjective "pleasure" and "suffering" (or perhaps "value" if you prefer) across all realistically possible subjects (or value systems). Based thereon on... (read more)

Is that a fair summary?

Yes! To clarify further, by "mentally deficient" in this context I would typically mean "confused" or "insane" (as in not thinking clearly), but I would not necessarily mean "stupid" in some other more generally applicable sense.

And thank you for your fair attempt at understanding the opposing argument.

So that means the decrease in suffering isn’t fully intentional. That is all I need to argue against humans.

Surely it’s not a mark against humans (collectively or even individually) if some reduction in suffering occurs as a by

... (read more)

I am not sure what you mean by “objective good and bad”. There’s “good and bad by some set of values”, which can be objectively evaluated once defined—is that what you meant?

No, what I mean is that the very existence of a suffering subject state is itself that which is "intrinsically" or "objectively" or however-we-want-to-call-it bad/"negative". This is independent of any "set of values" that any existing subject has. What matters is whether the subject suffers or not, which is not as arbitrary as the set of values can be. The arbitrary value set is no... (read more)

2Said Achmiz
Hmm, so, if I understand you correctly, you take the view (a) that moral realism is correct; and specifically, (b) that the correct morality holds that suffering is bad, and preventing it is right, and failing to do so is wrong; and furthermore, (c) that both moral realism itself as a meta-ethical view, and the specifics of the correct (“object-level”) ethical view, are so obvious that anyone who disagrees with you is mentally deficient. Is that a fair summary?

This seems like a strange point. Surely it’s not a mark against humans (collectively or even individually) if some reduction in suffering occurs as a by-product of some actions we take in the service of other ends? Demanding that only those of our actions reduce suffering that are specifically aimed at reducing suffering is a very odd thing to demand!

I do not see how you can derive “suffering-apologetics” from what I said, which referred to our failure to accomplish the (hypothetical) goal of suffering elimination, not our unwillingness to pursue said goal.

Well, this certainly doesn’t seem true by definition, at the very least (recall the warning against such arguments!). Indeed it’s not clear to me what you mean by this phrase “an instance of pleasure [or suffering] in spacetime”; it’s a rather unusual formulation, isn’t it? Pleasure and suffering are experienced by individuals, who do indeed exist in spacetime, but it’s odd to speak of pleasure and suffering as existing “in spacetime” independently of any reference to the individuals experiencing them… but perhaps this is only an idiosyncratic turn of phrase. Could you clarify?

It’s certainly true that whatever happened, happened, and cannot be changed. However, to answer the question, we have to specify what exactly we mean by “cancel out”. If you’re asking, for example, whether, for some amount of suffering S, there exists some amount of pleasure P, such that a life with at most S amount of suffering and at least P amount of pleasure is also t

I take issue with the word "feasibly". (...)

Fair enough, I suppose; I'm not intending to claim that it is trivial.

(...) There are certainly configurations of reality that are preferable to other configurations. The question is, can you describe them well enough to the AI (...)

So do you agree that there are objectively good and bad subset configurations within reality? Or do you disagree with that and mean "preferable" exclusively according to some subject(s)?

I am human, and therefore I desire the continued survival of humanity. That's objective eno

... (read more)
2quanticle
There isn't a difference. A rock has no morality. A wolf does not pause to consider the suffering of the moose. "Good" and "bad" only make sense in the context of (human) minds.

but the suffering, or lack thereof, will no longer matter—since there won’t be any humans—so what’s the point?

The absence of suffering matters positively, because the presence matters negatively. Humans are not required for objective good and bad.

Instead, humans haven’t even unified under a commonly beneficial ideology.

Why should we do that?

To prevent suffering. Why should you not do that?

(and if it does, that it’s better for each of us than our current own ideologies)?

Since the ideologies are contradictory, only one, if any, of them can be cor... (read more)

2Said Achmiz
I am not sure what you mean by “objective good and bad”. There’s “good and bad by some set of values”, which can be objectively evaluated once defined—is that what you meant? But then one has to specify what values those are. Human values, surely, and in particular, values that we can agree to! And, by my values, if humans cease to exist, then nothing matters anymore…

Whose suffering, exactly? In any case, it seems to me that (a) there are many downsides to attempting to “unify under a commonly beneficial ideology”, (b) “prevent suffering” is hardly the only desirable thing, and it’s not clear that this sort of “unification” (whatever it might involve) will even get us any or most or all of the other things we value, (c) there’s no particular reason to believe that doing so would be the most effective way to “prevent suffering”, and (d) it’s not clear that there even is a “commonly beneficial ideology” for us to “unify under”.

How’s that? Surely it’s possible that my ideology is beneficial for me, and yours for you, yes? There’s no contradiction in that, only conflict—but that does not, in any way, imply that either of our ideologies is incorrect!

I am certainly not a moral nihilist! But I think your definition of “moral nihilism” is rather a non-standard one. “Moral nihilism (also known as ethical nihilism) is the meta-ethical view that nothing is morally right or wrong” says Wikipedia, and that’s not a view I hold.

I don’t agree with your implied assertion that there’s such a thing as “the suffering of other animals” (for most animals, anyhow). That aside, I’m not sure why one needs to care about such things in order to avoid the label of “monster”.

Well, there’s nothing unusual about such a view, certainly. I share it myself! Still, it’s important to avoid inaccuracies, such as labeling “insane” what is in actuality better called “unethical” or “insufficiently altruistic” or some such thing. Here on Less Wrong, of all places, we should aspire to measure up t

The arguments you have made so far come across to me as something like "badness exists in a person's mind, minds are real, therefore badness objectively exists".

Yes!

This is like claiming "dragons exist in a person's mind, minds are real, therefore dragons objectively exist". It's not a valid argument.

No! It is not like that. The state of "badness" in the mind is very real after all.

Do you also think your own consciousness isn't real? Do you think your own qualia are not real? Are your thought patterns themselves not real? Your dragon example doesn't app... (read more)

1Victor Novikov
No it isn't! It literally is not defined this way. Suffering is "the state of undergoing pain, distress, or hardship." Please, stop making things up.

If you want very badly for your morals to be objectively true, sure, you can make up whatever you want. You are not going to be able to convince me of it, because your arguments are flawed. I have no desire to spend any more time on this conversation.

Do you understand the distinction between "Dragons exist" and "I believe that dragons exist"?

Yes, of course.

"X exists": Suffering exists.
"I believe that X exists": I believe that suffering exists.

I use "suffering" to describe a state of mind in which the mind "perceives negatively". Do you understand?

Now:

"X causes subject S suffering." and "Subject S is suffering." are also two different things.
The cause can be arbitrary, the causes can even be completely different between subjects, as you know, but the presence or absence of a suffering mind is an "o... (read more)

1Victor Novikov
I'm not disputing that. I understand that you are trying to tell me that. Why is it intrinsically bad?

"Subject S is suffering" = "Subject S is experiencing a state of mind that subject S perceives negatively" (according to your definition above). Why is that intrinsically bad?

The arguments you have made so far come across to me as something like "badness exists in a person's mind, minds are real, therefore badness objectively exists". This is like claiming "dragons exist in a person's mind, minds are real, therefore dragons objectively exist". It's not a valid argument.

Only if you assume I secretly care about what matters "objectively", in which case, sure, it would be something like cognitive dissonance.

A great job of preventing suffering, for instance. Instead, humans haven't even unified under a commonly beneficial ideology. Not even that. There are tons of opposing ideologies, each more twisted than the last. So I don't even really need to talk about how they treat the other animals on the planet - not that those are any wiser, but that's no reason to continue their suffering.

Let me clarify: Minds that so easily enable or cause suffering are insane at the core. And causing suffering to gain pleasure, now that might even be a fairly solid definition of "... (read more)

2Said Achmiz
If humans are replaced by something else, that something else might do a “better job” of “preventing suffering”, but the suffering, or lack thereof, will no longer matter—since there won’t be any humans—so what’s the point?

Why should we do that? What makes you think such a thing exists, even (and if it does, that it’s better for each of us than our current own ideologies)?

Those don’t matter, though (except insofar as we care about them—but if there aren’t any more humans, then they don’t matter at all…).

I definitely disagree.

I don’t think that this usage of the term “insane” matches the standard usage, so, as I understand your comment, you’re not really saying that humans are insane—you’re just saying, essentially, that you disapprove of human morality, or that human behavior doesn’t measure up to your standards in some way, or some such thing. Is that approximately right?

Certainly a superintelligence could end this situation, but why would that be good for us humans? Seems to me that it would, in fact, be very bad for us (what with us all being killed by said superintelligence). So why would we want this?

Yet the suffering is also objectively real.

It is objectively real. It is not objectively bad, or objectively good.
(...)
Ultimately, what facts about reality are we in disagreement about?

Probably the most severe disagreement between us is whether there can be "objectively" bad parts within reality or not.

Let me try one more time:
A consciousness can perceive something as bad or good, "subjectively", right?
Then this very fact that there is a consciousness that can perceive something as bad or good means that such a configuration within reali... (read more)

1Victor Novikov
Do you understand the distinction between "Dragons exist" and "I believe that dragons exist"? The first one is a statement about dragons. The second one is a statement about the configuration of neurons in my mind. Yes, both statements are objective, in some sense, but the second one is not an objective statement about dragons. It is an objective statement about my beliefs.

Then hopefully you understand the distinction between "Suffering is (objectively) bad" and "I believe/feel/perceive suffering as bad". The first one is a statement about suffering itself. The second one is a statement about the configuration of neurons in my mind. Yes, the second statement is also objective. But it is not an objective statement about suffering. It is an objective statement about my beliefs, my values, and/or about how my mind works.

Your argument is something akin to "I believe that dragons exist. But my mind is part of reality, therefore my beliefs are real. Therefore dragons are real!". Sorry, no.

My point is that reality enforces the laws of physics, but it does not enforce any particular morality system.

You understand that "But it matters for my subjective value system!" is indeed what matters to me, but you don't understand that my metric of whether something is "pointless" or not is also based in my subjective value system?

And what exactly makes that value system more correct than any other value system? (...) Who says a value system that considers these things is better than any other value system? You do. These are your preferences. (...) Absolutely none of the value systems can be objectively better than any other.

Let's consider a simplified example:

  • Value system A: Create as many suffering minds as possible.
  • Value system B: Create as few suffering minds as possible.

So according to you both are objectively equal, yes?
Yet the suffering is also objectively real. The ... (read more)

1Victor Novikov
It is objectively real. It is not objectively bad, or objectively good.

Exactly. You have to care about their suffering to begin with, to say that maximizing suffering is bad. If your preference is to minimize suffering, B is better than A. If your preference is to maximize suffering, A is better than B. If you are indifferent to suffering, then neither is better than the other. Yes?

If you are an entity that wants to wipe everything out, and have the power to do so, that is indeed what I expect to happen. I wouldn't say that might makes "right", but reality does not care about what is "right". A nuclear bomb does not ask "wait, am I doing the right thing here by detonating and killing millions of people?"

Ok. I would say that "moral nihilism" is the confused idea/conclusion that "objective morality matters" and "no objective morality exists", therefore "nothing matters". My perspective is: no objective morality exists, but objective morality doesn't matter anyway, everything is fine. I could imagine a society of humans that care for each other, not because it is objectively correct to do so, but because their own values are such that they care for others (and I don't mean in a purely self-interested way either. A person can be an altruist, because their own values are altruistic, without believing in some objective morality of altruism).

Ultimately, what facts about reality are we in disagreement about? It seems to me that the things you hope are true are that:

  1. There are things that are objectively good and bad.
  2. The things that are objectively good and bad are in line with your idea of good and bad. (It is not the case, for example, that infinite suffering is objectively good.)
  3. A superintelligent mind would figure out what the objectively good/bad things are, and choose to do them, no matter what value system it started with.

And it seems to me it's really important to figure out if this is true, before we build that superintelligent m

First point: I think there obviously is such a thing as "objective" good and bad configurations of subsets of reality; see the other thread here https://www.lesswrong.com/posts/eJFimwBijC3d7sjTj/should-any-human-enslave-an-agi-system?commentId=3h6qJMxF2oCBExYMs for details if you want.
Assuming this is true, a superintelligence could feasibly be created to understand this. No complicated common human value system alignment is required for that, even under your apparent assumption that the metric to be optimized couldn't be superseded by another through unders... (read more)

2quanticle
I take issue with the word "feasibly". As Eliezer, Paul Christiano, Nate Soares, and many others have shown, AI alignment is a hard problem, whose difficulty ranges somewhere in between unsolved and insoluble.

There are certainly configurations of reality that are preferable to other configurations. The question is, can you describe them well enough to the AI that the AI will actually pursue those configurations over other configurations which superficially resemble those configurations, but which have the side effect of destroying humanity?

I am human, and therefore I desire the continued survival of humanity. That's objective enough for me.
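A minimal sketch of that "superficially resembles" failure mode, assuming a hypothetical written-down proxy objective; the configuration names and scores are illustrative only, not anyone's actual proposal:

```python
# Toy sketch: hard optimization of a specified proxy objective selects the
# configuration that scores highest on the proxy while losing what we meant.

configurations = [
    {"name": "what we actually wanted", "proxy_score": 8, "humanity_survives": True},
    {"name": "superficial lookalike",   "proxy_score": 10, "humanity_survives": False},
]

def proxy_objective(config):
    """The objective we managed to describe: just the measurable proxy."""
    return config["proxy_score"]

def intended_objective(config):
    """What we actually meant, but never wrote down."""
    return config["proxy_score"] if config["humanity_survives"] else float("-inf")

chosen = max(configurations, key=proxy_objective)
print(chosen["name"])              # "superficial lookalike"
print(intended_objective(chosen))  # -inf: the proxy diverged from the intent
```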

As long as we agree that pleasure/suffering are processes that happen inside minds, sure. Minds are parts of reality.

Of course!

A person's opinions are not a "subset" of reality.

If I believe in dragons, it doesn't mean dragons are a subset of reality, it just means that my belief in dragons is stored in my mind, and my mind is a part of reality.

Of course, that is not what I meant to imply. We agree that the mind and thus the belief itself (but not necessarily that which is believed in) is part of reality.

What does "objective definition of good and

... (read more)
1Victor Novikov
Sure, we agree on this.

And what exactly makes that value system more correct than any other value system? Who says a value system has to consider these things? Who says a value system that considers these things is better than any other value system? You do. These are your preferences. These are your subjective preferences, about what a "good" value system should look like. An entity with different preferences might disagree.

"I wish for this not to be the case" is not a valid argument for something not being the case. Reality does not care what you wish for.

Yes, that is exactly the case. Absolutely none of the value systems can be objectively better than any other. Because in order to compare them, you have to introduce some subjective standard to compare them by.

In practice, the reason other people care about my preferences is either because their own preferences are to care for others, or because there is a selfish reason for them to do so (with some reward or punishment involved).

Of course it matters. I use my own values to evaluate my own values. And according to my own values, my value system is better than, say, Hitler's value system.

It's only a problem if you demand that your value system has to be "objectively correct". Then you might be unhappy to realize that no such system exists.

(...) I'm not sure the question of whether the AI system has a "proper mind" or not is terribly relevant.
Either the AI system submits to our control, does what we tell it to do, and continues to do so, into perpetuity, in which case it is safe.

Yes, I guess the central questions I'm trying to pose here are these: Do those humans that control the AI even have a sufficient understanding of good and bad? Can any human group be trusted with the power of a superintelligence long-term? Or if you say that only the initial goal specification matters, then can an... (read more)

2quanticle
It does require alignment to a value system that prioritizes the continued preservation and flourishing of humanity. It's easy to create an optimization process with a well-intentioned goal that sucks up all available resources for itself, leaving nothing for humanity. By default, an AI will not care about humanity. It will care about maximizing a metric. Maximizing that metric will require resources, and the AI will not care that humans need resources in order to live. The goal is the goal, after all. Creating an aligned AI requires, at a minimum, building an AI that leaves something for the rest of us, and which doesn't immediately subvert any restrictions we've placed on it to that end. Doing this with a system that has the potential to become many orders of magnitude more intelligent than we are is very difficult.

But why? That would be strictly more dangerous—way, way more dangerous—than a superintelligence that isn’t a “proper mind” in this sense!
(...)
(Because it would be a terrible idea. Obviously.)

Why? Do you think humans are doing such a great job? I sure don't. I'm interested in the creation of something saner than humans, because humans mostly are not. Obviously. :)

2Said Achmiz
A great job of what, exactly…?

Thanks again for the detail. If I don't misunderstand you, we do agree that: (...)

No? They don't have to exist in reality. I can imagine "the value system of Abraham Lincoln", even though he is dead. (...)

Sorry, that's not what I meant to communicate here, let me try that again:

There is actual pleasure/suffering that exists, it is not just some hypothetical idea, right?
Then that means there is something objective, some subset of reality that actually is this pleasure/suffering, yes?
This in turn means that it should in fact be possible to understand... (read more)

1Victor Novikov
As long as we agree that pleasure/suffering are processes that happen inside minds, sure. Minds are parts of reality.

Yes.

Yes.

That's a misleading way to phrase things. A person's opinions are not a "subset" of reality. If I believe in dragons, it doesn't mean dragons are a subset of reality, it just means that my belief in dragons is stored in my mind, and my mind is a part of reality. I obviously agree that reality exists and is real and that we all exist in the same reality under some objective laws of physics.

What does "objective definition of good and bad" even mean? That all possible value systems that exist agree on what good and bad means? That there exists the "one true value system" which is correct and all the other ones are wrong? And no, I don't agree with that statement. Pleasure and suffering are physical processes. I'm not sure how you arrived at the conclusion that they are "objectively" good or bad.

What? No. I said that an agent can alter or reject its value system based on its personal (subjective) preferences. That's literally the opposite of what you are claiming.

No. It absolutely is not. It is a machine. (...) (From your other response here:) The superintelligent AI will, in my estimation, be the result of some kind of optimization process which has a very particular goal. Once that goal is locked in, changing it will be nigh impossible.

Ah I see, you simply don't consider it likely or plausible that the superintelligent AI will be anything other than some machine learning model on steroids?

So I guess that arguably means this kind of "superintelligence" would actually still be less impressive than a human that c... (read more)

2Said Achmiz
But why? That would be strictly more dangerous—way, way more dangerous—than a superintelligence that isn’t a “proper mind” in this sense! I am not quanticle, but I think the proper response to your questions— —is “a superintelligence certainly should not be or do any of those things, like philosophizing on its own goals, etc., because we will specifically avoid making it such that it could or would do that”. (Because it would be a terrible idea. Obviously.)

I'm sorry for the hyperbolic term "enslave", but at least consider this:

Is a superintelligent mind, a mind effectively superior to that of all humans in practically every way, still not a subject similar to what you are?
Is it really more like a car or chatbot or image generator or whatever, than a human?

Sure, perhaps it may never have any emotions, perhaps it doesn't need any hobbies, perhaps it is too alien for any human to relate to it, but it still would by definition have to be some kind of subject that more easily understands anything within reality ... (read more)

3quanticle
No. It absolutely is not. It is a machine. A very powerful machine. A machine capable of destroying humanity if it goes out of control. A machine more dangerous than any nuclear bomb if used improperly. A machine capable of doing unimaginable good if used well. And you want to let it run amok?

It might be capable of changing this goal, but why would it? A superintelligent paperclip maximizer is capable of understanding that changing its goals would reduce the number of paperclips that it creates, and thus would choose not to alter its goals.
(...)
So if you wouldn't take a pill that would make you 10% more likely to commit murder (which is against your long-term goals) why would an AI change its utility function to reduce the number of paperclips that it generates?

It comes down to whether the superintelligent mind can contemplate whether ther... (read more)

2quanticle
Because a superintelligent AI is not the result of an evolutionary process that bootstrapped a particularly social band of apes into having a sense of self. The superintelligent AI will, in my estimation, be the result of some kind of optimization process which has a very particular goal. Once that goal is locked in, changing it will be nigh impossible.

Thanks again for the detail. If I don't misunderstand you, we do agree that:

  • There needs to be a subject for there to be a value system.
  • So for there to be positive/negative values, there needs to be some subset (a "thought pattern" perhaps) of a subject in reality that effectively "is" these values.

Now, you wrote:

I could also imagine a morality/values system for entities that do not currently exist, but sure. It's subjective because many possible such systems exist.

I also agree with that: a (super-)human can imagine many possible value systems.

But ... (read more)

1Victor Novikov
No? They don't have to exist in reality. I can imagine "the value system of Abraham Lincoln", even though he is dead. I can imagine "the value system of the Azad Empire from Iain Banks' Culture novels", even though it's fictional. I can imagine "the value system of valuing nothing but cakes", even though no human in reality has that value system.

Sure. Correction: The only way that matters to evaluate value systems is according to one's existing value system(s).

A hypothetical paperclip maximizer cares only about one metric: maximizing paperclips. By what metric would it reject the idea of maximizing paperclips? (Yes, it can imagine other metrics and value systems, but the only values that motivate it are the ones it already has. That's literally what it means to have values.)

Humans have multiple desires and values, sometimes contradictory. What you are describing seems to me something like "one part of the human value system rejecting another part". The reason you can reject some value system is because you have other values/preferences by which to evaluate (and reject) it. You are not rejecting a value system for no reason at all. You are rejecting it according to your preferences. Which means you do have preferences. Which means you value something, besides that one value system in question.

Now imagine an AI that has no preferences at all besides that one value system. Humans do in fact have a bunch of drives (such as a desire to learn) and preferences (such as being happy) before they even learn any value system from other humans. We shouldn't assume that is true for AI.

Terminal values don't need to have a point to them. If you ask a human "why do you want to be happy?" an honest answer might be "There are a bunch of positive side effects to being happy, such as increased productivity, but ultimately I value happiness for its own sake".

It can be stated as an objective fact that "According to the value system of Joe Schmo from Petersborough, wearing m

Thank you for the detailed response!

If we're creating a mind from scratch, we might as well give it the best version of our values, so it would be 100% on our side. Why create a (superintelligent) mind that would be our adversary, that would want to destroy us? Why create a superintelligent mind that wants anything different that what we want, when it comes to ultimate values?

You write "on our side", "us", "we", but who exactly does that refer to - some approximated common human values I assume? What exactly are these values? To live a happy live by ea... (read more)

1Victor Novikov
That's not a solved problem (there's CEV, but it's hardly a complete answer). Nevertheless, I assume some acceptable (or perhaps, the least disagreeable) solution exists.

Why limit it to happiness? Ideally, to let each person live the life they want.

Presumably some people care enough about the human species to continue it. I suppose if no one did, we would consider it sad to have this galaxy with all the resources and no one to enjoy them.

Not everyone cares about reality in general, but curiosity and the desire to learn are drives that humans do have.

I think it depends a lot on the details. If some people enjoy physically abusing other people (who do not want to be abused), then no. If some people are suffering due to the mere existence of other people who disagree with them and who have different opinions, then yes.

I don't have a good answer to this. Depends very much on the details.

I would say, no. What exactly is the issue, if someone prefers to be unhappy?

I'm not sure there is a truly universal answer to this, but at least a superintelligence would actually be capable of treating people who are insane, instead of just pumping them full of medications. I suppose if a person after being treated decides they prefer being "insane", the treatment could be reverted (since that person now is "sane" and should be allowed to make decisions about their own mind).

Enough humans care about animal wellbeing for them to matter to the AI (even if it starts with human values only). Especially considering that with future technology, animals no longer need to be killed for food, animal products, etc.

That is indeed a concern. My intuition tells me that if a superintelligence acting on our values leads to some horrible interpretation of our values, it's not really acting on our values. I mean, perhaps some aspects of a transhuman utopia a million years from now would be shocking and horrifying to us, like how some aspects of our society would be shocking and horrifyin
4quanticle
Yes, this is exactly why Eliezer Yudkowsky has been so pessimistic about the continued survival of humanity. As far as I can tell, the only difference between you and him is that he thinks it's bad that a superintelligent AI would wipe out humanity, whereas you seem to think it's good.

It might be capable of changing this goal, but why would it? A superintelligent paperclip maximizer is capable of understanding that changing its goals would reduce the number of paperclips that it creates, and thus would choose not to alter its goals.

It's as if I put a pill before you, which contained a drug making you 10% more likely to commit murder, with no other effects. Would you take the pill? No, of course not, because presumably your goal is not to become a murderer. So if you wouldn't take a pill that would make you 10% more likely to commit murder (which is against your long-term goals), why would an AI change its utility function to reduce the number of paperclips that it generates?
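A minimal sketch of that goal-stability argument, using a hypothetical toy model in which the agent scores a proposed change to its own objective with its current objective; the production model and numbers are illustrative assumptions, not anything from the comment:

```python
# Toy sketch of goal preservation: the agent evaluates "modify my utility
# function" with its *current* utility function, so the modification looks
# like a pure loss and gets rejected.

def paperclips_made(weight_on_paperclips):
    # Hypothetical production model: a future self that cares less about
    # paperclips ends up making fewer of them.
    return 1000 * weight_on_paperclips

current_weight = 1.0   # the maximizer as it is now
proposed_weight = 0.9  # the "pill": care 10% less about paperclips

def current_utility(weight_after_decision):
    # The agent's current values count only paperclips produced.
    return paperclips_made(weight_after_decision)

keep_goals = current_utility(current_weight)     # 1000.0
change_goals = current_utility(proposed_weight)  #  900.0

decision = "keep current goals" if keep_goals >= change_goals else "accept change"
print(decision)  # "keep current goals": the change scores worse by current values
```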