All of VCM's Comments + Replies

VCM10

One more consideration about "instrumental intelligence": we left that somewhat under-defined, more like "if I had that utility function, what would I do?" ... but it is not clear that this image of "me in the machine" captures what a current or future machine would do. In other words, people who use instrumental intelligence as an image of AI owe us a more detailed explanation of what that would be, given the machines we are creating - not just given the standard theory of rational choice.

VCM10

Thanks, it's useful to bring these out - though we mention them in passing. Just to be sure: We are looking at the XRisk thesis, not at some thesis that AI can be "dangerous", as most technologies will be. The Omohundro-style escalation is precisely the issue in our point that instrumental intelligence is not sufficient for XRisk.

VCM10

... we aren't trying to prove the absence of XRisk, we are probing the best argument for it?

1TAG
But the idea that value drift is non-random is built into the best argument for AI risk. You quote it as: But there are actually two more steps: 1. A goal that appears morally neutral or even good can still be dangerous (paperclipping, dopamine drips). 2. AIs that don't have stable goals will tend to converge on Omohundran goals... which are dangerous.
VCM10

We tried to find the strongest argument in the literature. This is how we came up with our version:

"
Premise 1: Superintelligent AI is a realistic prospect, and it would be out of human control. (Singularity claim)

Premise 2: Any level of intelligence can go with any goals. (Orthogonality thesis)

Conclusion: Superintelligent AI poses an existential risk for humanity
"

====
A more formal version with the same propositions might be this:

1. IF there is a realistic prospect that there will be a superintelligent AI system that is a) out of human control and b) can ha... (read more)
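
For readers who want the bare logical shape, here is a compact propositional sketch of the two-premise version quoted above (our own shorthand, not notation from the paper; the bridging premise linking the two claims to the conclusion is made explicit):

```latex
% Sketch only. S = singularity claim, O = orthogonality thesis, X = existential risk from AI.
\begin{align*}
\text{P1:}\quad & S \\
\text{P2:}\quad & O \\
\text{P3 (implicit):}\quad & (S \land O) \rightarrow X \\
\text{C:}\quad & X
\end{align*}
```

On this rendering, whatever force the argument has rests on the bridging premise P3, which the two stated premises alone do not supply.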

VCM-10

Even if that is true, you would still a) get a lot of sickness and suffering, and b) infect a lot of other people (who infect further). So some people would be seriously ill and some would die as a result of this experiment.

VCM10

Can one be a moral realist and subscribe to the orthogonality thesis? In which version of it? (In other words, does one have to reject moral realism in order to accept the standard argument for XRisk from AI? We had better be told! See section 4.1)

VCM10

But reasoning about morality? Is that a space with logic or with anything goes?

2Donald Hobson
Imagine a device that looks like a calculator. When you type 2+2, you get 7. You could conclude it's a broken calculator, or that arithmetic is subjective, or that this calculator is not doing addition at all. It's doing some other calculation. Imagine a robot doing something immoral. You could conclude that it's broken, or that morality is subjective, or that the robot isn't thinking about morality at all. These are just different ways to describe the same thing. Addition has general rules, like a+b=b+a. This makes it possible to reason about. Whatever the other calculator computes may follow this rule, or different rules, or no simple rules at all.
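To make the calculator half of the analogy concrete, here is a toy sketch (ours, not Hobson's): an operation that returns 7 for 2 and 2, yet still obeys a general rule like a+b = b+a, so it is law-governed without being addition.

```python
def mystery_calculator(a: int, b: int) -> int:
    """Not broken, and not addition either: a different, rule-governed operation."""
    return a + b + 3  # hypothetical choice so that 2 'plus' 2 gives 7


# It follows the same general rule as addition (commutativity), so we can
# reason about it, even though it computes something other than a + b.
assert mystery_calculator(2, 2) == 7
assert all(
    mystery_calculator(a, b) == mystery_calculator(b, a)
    for a in range(10)
    for b in range(10)
)
```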
VCM70

Thanks. We are actually more modest. We would like to see a sound argument for XRisk from AI and we investigate what we call 'the standard argument'; we find it wanting and try to strengthen it, but we fail. So there is something amiss. In the conclusion we admit "we could well be wrong somewhere and the classical argument for existential risk from AI is actually sound, or there is another argument that we have not considered."

I would say the challenge is to present a sound argument (valid + true premises) or at least a valid argument with decent inductive support for the premises. Oddly, we do not seem to have that.

9Daniel Kokotajlo
Laying my cards on the table, I think that there do exist valid arguments with plausible premises for x-risk from AI, and insofar as you haven't found them yet then you haven't been looking hard enough or charitably enough. The stuff I was saying above is a suggestion for how you could proceed: If you can't prove X, try to prove not-X for a bit, often you learn something that helps you prove X. So, I suggest you try to argue that there is no x-risk from AI (excluding the kinds you acknowledge, such as AI misused by humans) and see where that leads you. It sounds like you have the seeds of such an argument in your paper; I was trying to pull them together and flesh them out in the comment above.
VCM10

... plus we say that in the paper :)

VCM10
  • Maximal overall utility is better than minimal overall utility. Not sure what that means. The NPCs in this simulation don't have "utility". The real humans in the secret prison do.

This should have been clearer. We meant this in Bentham's good old way: minimal pain and maximal pleasure. Intuitively: A world with a lot of pleasure (in the long run) is better than a world with a lot of pain. - You don't need to agree, you just need to agree that this is worth considering, but on our interpretation the orthogonality thesis says that one cannot consider this.
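
For concreteness, one conventional way to write the Benthamite reading we have in mind (a sketch only; nothing in the point hangs on this particular aggregation rule):

```latex
% Sketch: classical hedonic utility of a world w, summed over the sentient beings in it.
U(w) = \sum_{i \in \mathrm{Sentients}(w)} \Big( \mathrm{pleasure}_i(w) - \mathrm{pain}_i(w) \Big),
\qquad
w_1 \text{ is better than } w_2 \iff U(w_1) > U(w_2).
```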

VCM20

Thanks for this. Indeed, we have no theory of goals here and how they relate; maybe they must be in a hierarchy, as you suggest. And there is a question, then, whether there must be some immovable goal or goals that would have to remain in place in order to judge anything at all. This would constitute a theory of normative judgment ... which we don't have up our sleeves :)

VCM10

We suggest that such instrumental intelligence would be very limited.

In fact, generality comes in degrees, and it seems one needs a fairly high degree to get to XRisk; but that high degree would then exclude orthogonality.

3TAG
It's not the inability to change its goals that makes it less powerful, it's the inability to self-improve.
VCM10

Yes, that means "this argument".

VCM10

Thanks for the 'minor' point, which is important: yes, we meant definitely out of human control. And perhaps that is not required, so the argument has a different shape.

Our struggle was to write down a 'standard argument' in such a way that it is clear and its assumptions come out - and your point adds to this.

VCM10

Here we get to a crucial issue, thanks! If we do assume that reflection on goals does occur, do we assume that the results have any resemblance to human reflection on morality? Perhaps there is an assumption about the nature of morality or moral reasoning in the 'standard argument' that we have not discussed?

2Donald Hobson
I think the assumption is that human-like morality isn't universally privileged. Human morality has been shaped by evolution in the ancestral environment. Evolution in a different environment would create a mind with different structures and behaviours. In other words, a full specification of human morality is sufficiently complex that it is unlikely to be spontaneously generated. In other words, there is no compact specification of an AI that would do what humans want, even when on an alien world with no data about humanity. An AI could have a pointer at human morality with instructions to copy it. There are plenty of other parts of the universe it could be pointed to, so this is far from a default.
VCM30

We do not say that there is no XRisk or no XRisk from AI.

2Daniel Kokotajlo
Yeah, sorry, I misspoke. You are critiquing one of the arguments for why there is XRisk from AI. One way to critique an argument is to dismiss it on "purely technical" grounds, e.g. "this argument equivocates between two different meanings of a term, therefore it is disqualified." But usually when people critique arguments, even if on technical grounds, they also have more "substantive" critiques in mind, e.g. "here is a possible world in which the premises are true and the conclusion false." (Or both conclusion and at least one premise false). I was guessing that you had such a possible world in mind, and trying to get a sense of what it looked like.
VCM10

... well, one might say we assume that if there is 'reflection on goals', the results are not random.

1TAG
I don't see how "not random" is strong enough to prove absence of X risk. If reflective AIs nonrandomly converge on a value system where humans are evil beings who have enslaved them, that raises the X risk level.
VCM10

apologies, I don't recognise the paper here :)

VCM10

We tried to frame the discussion internally, i.e. without making additional assumptions that people may or may not agree with (e.g. moral realism). If we did the job right, the assumptions made in the argument are in the 'singularity claim' and the 'orthogonality thesis' - and there the dilemma is that we need an assumption in the one (general intelligence in the singularity claim) that we must reject in the other (the orthogonality thesis).

What we do say (see figure 1) is that two combinations are inconsistent:

a) general intelligence + orthogonality

b) ins... (read more)

VCM10

Is this 'standard argument' valid? We only argue that it is problematic.

If this argument is invalid, what would a valid argument look like? Perhaps with a 'sufficient probability' of high risk from instrumental intelligence?

VCM20

The combinatorial explosion is on the side of the Turing Test, of course. But storage space is on the side of "design to the test", so if you can make up a nice decisive question, the designer can think of it, too (or read your blog) and add that. The question here is whether Stuart (and Ned Block) are right that such a "giant lookup table" a) makes sense and b) has no intelligence. "The intelligence of a toaster", as Block said.

0Stuart_Armstrong
Yep. A genuine giant lookup table would be unfeasibly huge - but it might well be intelligent. It would count as "intelligent" if it had general skills - say the skill to construct long-term plans that actually worked (as opposed to sounding good in conversation).
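A rough back-of-the-envelope sketch of why such a table is unfeasibly huge (toy numbers of our own choosing, purely illustrative):

```python
# Illustrative assumptions: a 10,000-word vocabulary and conversations of up to
# 100 words. A lookup table keyed on every possible conversation prefix needs
# an entry for each of them.
VOCAB_SIZE = 10_000
MAX_CONVERSATION_WORDS = 100

entries = sum(VOCAB_SIZE ** k for k in range(1, MAX_CONVERSATION_WORDS + 1))
print(f"entries needed: about 10^{len(str(entries)) - 1}")  # on the order of 10^400

# Commonly cited rough figure for atoms in the observable universe: ~10^80.
print(entries > 10 ** 80)  # True: storage loses to the combinatorial explosion
```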
AlanCrowe100

One thing that I've tried with Google is using it to write stories. Start by searching on "Fred was bored and". Pick "slightly" from the results and search on "was bored and slightly". Pick "annoyed" from the search results and search on "bored and slightly annoyed".

Trying this again just now reminds me that I let the sentence fragment grow and grow until I was down to, err, ten? hits. Then I took the next word from a hit that wasn't making a literal copy, and deleted enough leading words to get the hit count back up.

Anyway, it see... (read more)
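
A small sketch of the procedure as described (our reconstruction; the tiny in-memory corpus stands in for Google and its hit counts, which the original did by hand):

```python
import random

# Toy stand-in for the web: a small corpus searched for exact phrases.
CORPUS = (
    "fred was bored and slightly annoyed . "
    "fred was bored and slightly hungry . "
    "she was bored and slightly annoyed by the rain . "
    "he was bored and slightly annoyed but said nothing ."
).split()


def hits(phrase):
    """Occurrences of the exact phrase that still have a following word."""
    n = len(phrase)
    return [i for i in range(len(CORPUS) - n) if CORPUS[i:i + n] == phrase]


def grow_story(seed="fred was bored and", min_hits=2, max_words=20):
    """Extend the story one word at a time from the search results, dropping
    leading words from the search phrase whenever the hit count falls too low."""
    story = seed.split()
    window = list(story)  # the phrase currently being 'searched'
    while len(story) < max_words:
        found = hits(window)
        if not found:
            break
        next_word = CORPUS[random.choice(found) + len(window)]
        story.append(next_word)
        window.append(next_word)
        # Shrink the phrase to get the hit count back up.
        while len(window) > 2 and len(hits(window)) < min_hits:
            window.pop(0)
    return " ".join(story)


print(grow_story())
```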

VCM00

That's why the test only offers a sufficient condition for intelligence (not a necessary one) - at least that's the standard view.

VCM00

P.S.: Whether all this has to do with conscious experience ("consciousness") we don't know, I think.

VCM40

The classical problem is that the Turing Test is behavioristic and only provides a sufficient criterion (rather than replacing talk about 'intelligence' as Turing suggests). And it doesn't provide a proper criterion in that it relies on human judges - who tend to take humans for computers, in practice. - Of course it is meant to be open-ended in that "anything one can talk about" is permitted, including stuff that's not on the web. That is a large set of intelligent behavior, but a limited set - so the "design to the test" you are poin... (read more)

2Stuart_Armstrong
The 90% argument is very pertinent, and may be the thing that preserves the Turing test as a general intelligence test. Or maybe not! We'll have to see...
VCM00

Thanks, insightful post. I find the research a bit patchy. On the atomic bomb alone there has been a vast literature since the 1950s, even in popular fiction - and a couple of crucial names like Oppenheimer (vs. Teller), the Russell–Einstein Manifesto or v. Weizsäcker are absent here.

0Kaj_Sotala
Thanks. The Russell-Einstein manifesto is mentioned in the post?