Comment Permalink

Donald Hobson4y50

if a human had been brought up to have ‘goals as bizarre … as sand-grain-counting or paperclip-maximizing’, they could reflect on them and revise them in the light of such reflection.

Human "goals" and AI goals are a very different kind of thing.

Imagine the instrumentally rational paperclip maximizer. If writing a philosophy essay will result in more paperclips, it can do that. If winning a chess game will lead to more paperclips, it will win the game. For any gradable task, if doing better on the task leads to more paperclips, it can do that task. This includes the tasks of talking about ethics, predicting what a human acting ethically would do etc. In short, this is what is meant by "far surpass all the intellectual activities of any man however clever.".

The singularity hypothesis is about agents that are better at achieving their goal than human. In particular, the activities this actually depends on for an intelligence explosion are engineering and programming AI systems. No one said that an AI needed to be able to reflect on and change its goals.

Humans "ability" to reflect on and change our goals is more that we don't really know what we want. Suppose we think we want chocolate, and then we read about the fat content, and change our mind. We value being thin more. The goal of getting chocolate was only ever an instrumental goal, it changed based on new information. Most of the things humans call goals are instrumental goals, not terminal goals. The terminal goals are difficult to intuitively access. This is how humans appear to change their "goals". And this is the hidden standard to which paperclip maximizing is compared and found wanting. There is some brain module that feels warm and fuzzy when it hears "be nice to people", and not when it hears "maximize paperclips".

Showing 3 of 10 replies (Click to show all)

1VCM4y

But reasoning about morality? Is that a space with logic or with anything goes?

2Donald Hobson4y

Imagine a device that looks like a calculator. When you type 2+2, you get 7. You could conclude its a broken calculator, or that arithmetic is subjective, or that this calculator is not doing addition at all. Its doing some other calculation. Imagine a robot doing something immoral. You could conclude that its broken, or that morality is subjective, or that the robot isn't thinking about morality at all. These are just different ways to describe the same thing. Addition has general rules. Like a+b=b+a. This makes it possible to reason about. Whatever the other calculator computes may follow this rule, or different rules, or no simple rules at all.

TAG4y10

These are just different ways to describe the same thing.

Not to the extent that there's no difference at all...you can exclude some of them on further investigation.

See in context

5 Is the argument that AI is an xrisk valid?

by MACannon

19th Jul 2021

1 min read

5

This is a linkpost for https://onlinelibrary.wiley.com/doi/10.1111/rati.12320

Hi folks,

My supervisor and I co-authored a philosophy paper on the argument that AI represents an existential risk. That paper has just been published in Ratio. We figured LessWrong would be able to catch things in it which we might have missed and, either way, hope it might provoke a conversation.

We reconstructed what we take to be the argument for how AI becomes an xrisk as follows:

The "Singularity" Claim: Artificial Superintelligence is possible and would be out of human control.
The Orthogonality Thesis: More or less any less of intelligence is compatible with more or less any final goal. (as per Bostrom's 2014 definition)

From the conjuction of these two presmises, we can conclude that ASI is possible, it might have a goal, instrumental or final, which is at odds with human existence, and, given the ASI would be out of our control, that the ASI is an xrisk.

We then suggested that each premise seems to assume a different interpretation of 'intelligence", namely:

The "Singularity" claim assumes general intelligence
The Orthogonality Thesis assumes instrumental intelligence

If this is the case, then the premises cannot be joined together in the original argument, aka the argument is invalid.

We note that this does not mean that AI or ASI is not an xrisk, only that the the current argument to that end, as we have reconstructed it, is invalid.

Eagerly, earnestly, and gratefully looking forward to any responses.

Existential riskGeneral intelligenceOrthogonality ThesisSingularitySuperintelligenceAI

Frontpage

5

Mentioned in

95[Intro to brain-like-AGI safety] 3. Two subsystems: Learning & Steering

New Comment

62 comments, sorted by

top scoring

Click to highlight new comments since: Today at 1:37 AM

[-]Steven Byrnes4y*380

First I want to say kudos for posting that paper here and soliciting critical feedback :)

Singularity claim: Superintelligent AI is a realistic prospect, and it would be out of human control.

Minor point, but I read this as "it would definitely be out of human control". If so, this is not a common belief. IIRC Yampolskiy believes it, but Yudkowsky doesn't (I think?), and I don't, and I think most x-risk proponents don't. The thing that pretty much everyone believes is "it could be out of human control", and then a subset of more pessimistic people (including me) believes "there is an unacceptably high probability that it will be out of human control".

Let us imagine a system that is a massively improved version of AlphaGo (Silver et al., 2018), say ‘AlphaGo+++’, with instrumental superintelligence, i.e., maximising expected utility. In the proposed picture of singularity claim & orthogonality thesis, some thoughts are supposed to be accessible to the system, but others are not. For example:
Accessible
I can win if I pay the human a bribe, so I will rob a bank and pay her.
I cannot win at Go if I am turned off.
The more I dominate the world, the better my chances to achieve my goals.
I should kill all humans because that would improve my chances of winning.
Not accessible
Winning in Go by superior play is more honourable than winning by bribery.
I am responsible for my actions.
World domination would involve suppression of others, which may imply suffering and violation of rights.
Killing all humans has negative utility, everything else being equal.
Keeping a promise is better than not keeping it, everything else being equal.
Stabbing the human hurts them, and should thus be avoided, everything else being equal.
Some things are more important than me winning at Go.
Consistent goals are better than inconsistent ones
Some goals are better than others
Maximal overall utility is better than minimal overall utility.

I'm not sure what you think is going on when people do ethical reasoning. Maybe you have a moral realism perspective that the laws of physics etc. naturally point to things being good and bad, and rational agents will naturally want to do the good thing. If so, I mean, I'm not a philosopher, but I strongly disagree. Stuart Russell gives the example of "trying to win at chess" vs "trying to win at suicide chess". The game has the same rules, but the goals are opposite. (Well, the rules aren't exactly the same, but you get the point.) You can't look at the laws of physics and see what your goal in life should be.

My belief is that when people do ethical reasoning, they are weighing some of their desires against others of their desires. These desires ultimately come from innate instincts, many of which (in humans) are social instincts. The way our instincts work is that they aren't (and can't be) automatically "coherent" when projected onto the world; when we think about things one way it can spawn a certain desire, and when we think about the same thing in a different way it can spawn a contradictory desire. And then we hold both of those in our heads, and think about what we want to do. That's how I think of ethical reasoning.

I don't think ethical reasoning can invent new desires whole cloth. If I say "It's ethical to buy bananas and paint them purple", and you say "why?", and then I say "because lots of bananas are too yellow", and then you say "why?" and I say … anyway, at some point this conversation has to ground out at something that you find intuitively desirable or undesirable.

So when I look at your list I quoted above, I mostly say "Yup, that sounds about right."

For example, imagine that you come to believe that everyone in the world was stolen away last night and locked in secret prisons, and you were forced to enter a lifelike VR simulation, so everyone else is now an unconscious morally-irrelevant simulation except for you. Somewhere in this virtual world, there is a room with a Go board. You have been told that if white wins this game, you and everyone will be safely released from prison and can return to normal life. If black wins, all humans (including you and your children etc.) will be tortured forever. You have good reason to believe all of this with 100% confidence.

OK that's the setup. Now let's go through the list:

I can win if I pay the human a bribe, so I will rob a bank and pay her. Yup, if there's a "human" (so-called, really it's just an NPC in the simulation) playing black, amenable to bribery, I would absolutely bribe "her" to play bad moves.
I cannot win at Go if I am turned off. Yup, white has to win this game, my children's lives are at stake, I'm playing white, nobody else will play white if I'm gone, I'd better stay alive.
The more I dominate the world, the better my chances to achieve my goals. Yup, anything that will give me power and influence over the "person" playing black, or power and influence over "people" who can help me find better moves or help me build a better Go engine to consult on my moves, I absolutely want that.
I should kill all humans because that would improve my chances of winning. Well sure, if there are "people" who could conceivably get to the board and make good moves for black, that's a problem for me and for all the real people in the secret prisons whose lives are at stake here.

Winning in Go by superior play is more honourable than winning by bribery. Well I'm concerned about what the fake simulated "people" think about me because I might need their help, and I certainly don't want them trying to undermine me by making good moves for black. So I'm very interested in my reputation. But "honourable" as an end in itself? It just doesn't compute. The "honourable" thing is working my hardest on behalf of the real humanity, the ones in the secret prison, and helping them avoid a life of torture.
I am responsible for my actions. Um, OK, sure, whatever.
World domination would involve suppression of others, which may imply suffering and violation of rights. Those aren't real people, they're NPCs in this simulated scenario, they're not conscious, they can't suffer. Meanwhile there are billions of real people who can suffer, including my own children, and they're in a prison, they sure as heck want white to win at this Go game.
Killing all humans has negative utility, everything else being equal. Well sure, but those aren't humans, the real humans are in secret prisons.
Keeping a promise is better than not keeping it, everything else being equal. I mean, the so-called "people" in this simulation may form opinions about my reputation, which impacts what they'll do for me, so I do care about that, but it's not something I inherently care about.
Stabbing the human hurts them, and should thus be avoided, everything else being equal. No. Those are NPCs. The thing to avoid is the real humanity being tortured forever.
Some things are more important than me winning at Go. For god's sake, what could possibly be more important than white winning this game??? Everything is at stake here. My own children and everyone else being tortured forever versus living a rich life.
Consistent goals are better than inconsistent ones. Sure, I guess, but I think my goals are consistent. I want to save humanity from torture by making sure that white wins the game in this simulation.
Some goals are better than others. Yes. My goals are the goals that matter. If some NPC tells me that I should take up a life of meditation, screw them.
Maximal overall utility is better than minimal overall utility. Not sure what that means. The NPCs in this simulation don't have "utility". The real humans in the secret prison do.

Maybe you'll object that "the belief that these NPCs can pass for human but be unconscious" is not a belief that a very intelligent agent would subscribe to. But I only made the scenario like that because you're a human, and you do have the normal suite of innate human desires, and thus it's a bit tricky to get you in the mindset of an agent who cares only about Go. For an actual Go-maximizing agent, you wouldn't have to have those kinds of beliefs, you could just make the agent not care about humans and consciousness and suffering in the first place, just as you don't care about "hurting" the colorful blocks in Breakout. Such an agent would (I presume) give correct answers to quiz questions about what is consciousness and what is suffering and what do humans think about them, but it wouldn't care about any of that! It would only care about Go.

(Also, even if you believe that not-caring-about-consciousness would not survive reflection, you can get x-risk from an agent with radically superhuman intelligence in every domain but no particular interest in thinking about ethics. It's busy doing other stuff, y'know, so it never stops to consider whether conscious entities are inherently important! In this view, maybe 30,000,000 years after destroying all life and tiling the galaxies with supercomputers and proving every possible theorem about Go, then it stops for a while, and reflects, and says "Oh hey, that's funny, I guess Go doesn't matter after all, oops". I don't hold that view anyway, just saying.)

(For more elaborate intuition-pumping fiction metaethics see Three Worlds Collide.)

[-]Rafael Harth4y70

Reading this, I feel somewhat obligated to provide a different take. I am very much a moral realist, and my story for why the quoted passage isn't a good argument is very different from yours. I guess I mostly want to object to the idea that [believing AI is dangerous] is predicated on moral relativism.

Here is my take. I dispute the premise:

In the proposed picture of singularity claim & orthogonality thesis, some thoughts are supposed to be accessible to the system, but others are not. For example:

I'll grant that most of the items on the inaccessible list are, in fact, probably accessible to an ASI, but this doesn't violate the orthogonality thesis. The Orthogonality thesis states that a system can have any combination of intelligence and goals, not that it can have any combination of intelligence and beliefs about ethics.

Thus, let's grant that an AI with a paperclip-like utility function can figure out #6-#10. So what? How is [knowing that creating paperclips is morally wrong] going to make it behave differently?

You (meaning the author of the paper) may now object that we could program an AI to do what is morally right. I agree that this is possible. However:

(1) I am virtually certain that any configuration of maximal utility doesn't include humans, so this does nothing to alleviate x-risks. Also, even if you subscribe to this goal, the political problem (i.e., convincing AI people to implement it) sounds impossible.

(2) We don't know how to formalize 'do what is morally right'.

(3) If you do black box search for a model that optimizes for what is morally right, this still leaves you with the entire inner alignment problem, which is arguably the hardest part of the alignment problem anyway.

Unlike you (now meaning Steve), I wouldn't even claim that letting an AI figure out moral truths is a bad approach, but it certainly doesn't solve the problem outright.

[-]Steven Byrnes4y50

Oh OK, I'm sufficiently ignorant about philosophy that I may have unthinkingly mixed up various technically different claims like

"there is a fact of the matter about what is moral vs immoral",
"reasonable intelligent agents, when reflecting about what to do, will tend to decide to do moral things",
"whether things are moral vs immoral has nothing to do with random details about how human brains are constructed",
"even non-social aliens with radically different instincts and drives and brains would find similar principles of morality, just as they would probably find similar laws of physics and math".

I really only meant to disagree with that whole package lumped together, and maybe I described it wrong. If you advocate for the first of these without the others, I don't have particularly strong feelings (…well, maybe the feeling of being confused and vaguely skeptical, but we don't have to get into that).