
Warrigal comments on Open thread, September 2-8, 2013 - Less Wrong Discussion

0 Post author: David_Gerard 02 September 2013 02:07PM


Comment author: [deleted] 08 September 2013 02:03:23AM 0 points

Why should an AI have to self-modify in order to be super-intelligent?

One argument for self-modifying FAI is that "developing an FAI is an extremely difficult problem, and so we will need to make our AI self-modifying so that it can do some of the hard work for us". But doesn't making the FAI self-modifying make the problem much more difficult, since now we have to figure out how to make goals stable under self-modification, which is itself a very difficult problem?

The increased difficulty could be offset by the AI's ability to undergo a "self-modifying foom", which yields a titanic increase in intelligence from relatively modest beginnings. But would it be possible for an AI to have a "knowledge-about-problem-solving foom" instead, where the AI increases its intelligence not by modifying itself, but by increasing the amount of knowledge it has about how to solve problems?

Here are some differences that come to mind between the two kinds of fooms:

  • A self-modification could change the AI's behavior in an arbitrary manner. Obtaining knowledge about problem-solving could only change the AI's behavior via metacognition.
  • A bad self-modification could easily destroy the AI's safety (unless we figure out how to fix this problem!). Obtaining knowledge about problem-solving would only destroy the AI's safety if the knowledge is substantially misleading. (An AI might somehow come to believe that it should only read pro-Green books, and then fail to take into account the fact that beliefs naively derived from reading pro-Green books will be biased towards Green.)
  • Any "method of being intelligent" can be turned into a self-modification. Not every method of being intelligent can effectively be turned into a piece of knowledge about problem-solving, because there's only a limited set of beliefs that the AI could act upon. (A non-self-modifying AI may be programmed to think about pizza upon believing the statement "I should think about pizza", but it is less likely to be programmed to adjust all its beliefs to be pro-Blue, without evidence, upon believing the statement "I should adjust all my beliefs to be pro-Blue, without evidence".)

Certainly self-modification has its advantages, but so does pure KAPS (knowledge about problem-solving), so I'm confused that seemingly everyone in the FAI community believes self-modification is necessary for a strong AI.

Comment author: hairyfigment 08 September 2013 05:59:49AM 2 points

But would it be possible for an AI to have a "knowledge-about-problem-solving foom" instead, where the AI increases its intelligence not by modifying itself, but by increasing the amount of knowledge it has about how to solve problems?

My immediate reaction is, 'Possibly -- wait, how is that different? I imagine the AI would write subroutines or separate programs that it thinks will do a better job than its old processes. Where do we draw the line between that and self-modification or -replacement?'

If we just try to create protected code that it can't change, the AI can remove or subvert those protections (or get us to change them!) if and when it acquires enough effectiveness.

Comment author: [deleted] 09 September 2013 04:19:08AM 1 point

The distinction I have in mind is that a self-modifying AI can come up with a new thinking algorithm to use and decide to trust it, whereas a non-self-modifying AI could come up with a new algorithm or whatever, but would be unable to trust the algorithm without sufficient justification.

Likewise, if an AI's decision-making algorithm is immutably hard-coded as "think about the alternatives and select the one that's rated the highest", then the AI would not be able to simply "write a new AI … and then just hand off all its tasks to it"; in order to do that, it would somehow have to make it so that the highest-rated alternative is always the one that the new AI would pick. (Of course, this is no benefit unless the rating system is also immutably hard-coded.)
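As a toy sketch of such an immutably hard-coded decision rule (all names here are hypothetical, invented purely for illustration):

```python
# Hypothetical sketch of the hard-coded rule described above: rate every
# alternative and select the one rated highest. The loop itself cannot be
# swapped out; a new AI's plans only "win" if the fixed rating function
# happens to rank them highest.

def decide(alternatives, rate):
    """Immutable decision rule: select the highest-rated alternative."""
    return max(alternatives, key=rate)

choice = decide(
    ["keep control", "hand off all tasks to a new AI"],
    rate=lambda a: 1.0 if a == "keep control" else 0.5,
)
# choice == "keep control" under this particular rating function
```

And as the parenthetical says, this buys nothing unless the rating function is just as immutable as the loop.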

I guess my idea in a nutshell is that instead of starting with a flexible system and trying to figure out how to make it safe, we should start with a safe system and try to figure out how to make it flexible. My major grounds for believing this, I think, is that it's probably going to be much easier to understand a safe but inflexible system than it is to understand a flexible but unsafe system, so if we take this approach, then the development process will be easier to understand and will therefore go better.

Comment author: ChristianKl 09 September 2013 11:09:18AM 1 point

Likewise, if an AI's decision-making algorithm is immutably hard-coded as "think about the alternatives and select the one that's rated the highest", then the AI would not be able to simply "write a new AI … and then just hand off all its tasks to it"; in order to do that, it would somehow have to make it so that the highest-rated alternative is always the one that the new AI would pick.

You're basically saying that the AI should be unable to learn that a process which was effective in the past will also be effective in the future. I think that would restrict intelligence a lot.

Comment author: [deleted] 10 September 2013 03:20:02AM 0 points

Yeah, that's a good point. What I want to say is, "oh, a non-self-modifying AI would still be able to hand off control to a sub-AI, but it will automatically check to make sure the sub-AI is behaving correctly; it won't be able to turn off those checks". But my idea here is definitely starting to feel more like a pipe dream.
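A toy sketch of that "automatic check" idea (the names and the check are invented for this sketch, not a real design): the parent AI delegates a task, but every proposal from the sub-AI passes through a fixed check the parent cannot decide to disable.

```python
# Hypothetical sketch: a non-self-modifying parent AI delegates work to a
# sub-AI, but only acts on proposals that pass a hard-wired check.

def safe_delegate(sub_ai_propose, is_acceptable, task):
    """Run the sub-AI on the task, but reject any unacceptable proposal."""
    proposal = sub_ai_propose(task)
    if not is_acceptable(proposal):
        raise ValueError("sub-AI proposal rejected by fixed check")
    return proposal

# A toy sub-AI and a toy check:
result = safe_delegate(
    sub_ai_propose=lambda task: task.upper(),
    is_acceptable=lambda p: "SELF-MODIFY" not in p,
    task="summarize the report",
)
# result == "SUMMARIZE THE REPORT"
```

Of course, such a scheme is only as good as the fixed check itself, which is exactly the worry about verifying something smarter than you.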

Comment author: Armok_GoB 16 September 2013 01:06:52AM 0 points

Hmm, there might still be something to be gleaned from attempting to steelman this, or from working in different related directions.

Edit: maybe something with an AI not being able to tolerate things it can't make certain proofs about? The problem is that it would have to be able to make those proofs about humans if they are included in its environment, and if they are not, it might make a UFAI there. (Intuition pump: a system that consists of a program it can prove everything about, and humans that program asks questions to.) Yeah, this doesn't seem very useful.

Comment author: ChristianKl 10 September 2013 05:51:37PM 0 points

You can't really tell whether something that is smarter than you is behaving correctly. In the end, a non-self-modifying AI checking whether a self-modifying sub-AI is behaving correctly isn't, from a safety perspective, much different from a human checking whether the self-modifying AI is behaving correctly.

Comment author: drethelin 09 September 2013 04:49:31AM 1 point

Immutably hard-coding something in is a lot easier to say than to do.

Comment author: drethelin 08 September 2013 09:51:24PM 1 point

Or it can write a new AI that's an improved version of itself and then just hand off all its tasks to it.

Comment author: Vaniver 08 September 2013 05:29:17PM 1 point

Why should an AI have to self-modify in order to be super-intelligent?

I'm not sure where the phrase "have to" is coming from. I don't think we expect to build a self-modifying intelligence that becomes a superintelligence because that seems like the best way to do it, but because it's the easiest way to do it, and thus the one likely to be taken first.

In broad terms, the Strong AI project is expected to look like "humans build dumb computers, humans and dumb computers build smart computers, smart computers build really smart computers." Once you have smart computers that can build really smart computers, it looks like they will (in the sense that at least one institution with smart computers will let them, and then we have a really smart computer on our hands), and it seems likely that the modifications will occur at a level that humans are not able to manage effectively (so it really will be just smart computers making the really smart computers).

But doesn't making the FAI self-modifying make the problem much more difficult, since now we have to figure out how to make goals stable under self-modification, which is itself a very difficult problem?

Yes. This is why MIRI is interested in goal stability under self-modification.

Comment author: [deleted] 09 September 2013 04:20:34AM 0 points

I'm not sure where the phrase "have to" is coming from. I don't think we expect to build a self-modifying intelligence that becomes a superintelligence because that seems like the best way to do it, but because it's the easiest way to do it, and thus the one likely to be taken first.

Yeah, I guess my real question isn't why we think an AI would have to self-modify; my real question is why we think that would be the easiest way to do things.

Comment author: drethelin 09 September 2013 04:30:40AM 0 points

You'd have to actively stop it from doing so. An AI is just code: if the AI has the ability to write code, it has the ability to self-modify.

Comment author: [deleted] 09 September 2013 04:42:47AM 0 points

An AI is just code: if the AI has the ability to write code, it has the ability to self-modify.

If the AI has the ability to write code and the ability to replace parts of itself with that code, then it has the ability to self-modify. This second ability is what I'm proposing to get rid of. See my other comment.

Comment author: drethelin 09 September 2013 04:44:36AM 4 points

If an AI can't modify its own code it can just write a new AI that can.

Comment author: Vaniver 09 September 2013 04:58:39PM 1 point

If the AI has the ability to write code and the ability to replace parts of itself with that code, then it has the ability to self-modify.

Unpack the word "itself."

(This is basically the same response as drethelin's, except it highlights the difficulty of drawing clear delineations between the different kinds of impacts the AI can have on the world. Even if version A doesn't alter itself, it still alters the world, and it may do so in a way that brings about version B (either indirectly or directly), so it would help if it knew how to design B.)

Comment author: [deleted] 10 September 2013 03:27:58AM 0 points

Well, I'm imagining the AI as being composed of a couple of distinct parts—a decision subroutine (give it a set of options and it picks one), a thinking subroutine (give it a question and it tries to determine the answer), and a belief database. So when I say "the AI can't modify itself", what I mean more specifically is "none of the options given to the decision subroutine will be something that involves changing the AI's code, or changing beliefs in unapproved ways".

So perhaps "the AI could write some code" (meaning that the thinking algorithm creates a piece of code inside the belief database), but "the AI can't replace parts of itself with that code" (meaning that the decision algorithm can't make a decision to alter any of the AI's subroutines or beliefs).
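Putting the pieces together, here's a minimal sketch of that architecture (all class, method, and option names are invented for illustration, not a real design): thinking can produce code, but only as inert data in the belief database, and the decision subroutine never even sees a self-modification option.

```python
# Hypothetical sketch of the three-part architecture described above:
# a belief database, a thinking subroutine, and a decision subroutine
# whose option set is filtered so no option modifies the AI itself.

FORBIDDEN = {"modify own code", "edit beliefs directly"}

class NonSelfModifyingAI:
    def __init__(self):
        self.beliefs = {}  # belief database; may contain code, but only as data

    def think(self, question):
        """Thinking subroutine: may write code, but only stores it as data."""
        answer = f"draft code for: {question}"
        self.beliefs[question] = answer
        return answer

    def decide(self, options, rate):
        """Decision subroutine: self-modification options are filtered out."""
        allowed = [o for o in options if o not in FORBIDDEN]
        return max(allowed, key=rate)

ai = NonSelfModifyingAI()
ai.think("better search algorithm")  # stored as a belief, never installed
choice = ai.decide(["answer the user", "modify own code"], rate=len)
# choice == "answer the user": the self-modification option was never considered
```

The filter here is the crux: the decision subroutine can act on beliefs, but no decision it reaches can install the code the thinking subroutine produced.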

Now, certainly an out-of-the-box AI would, in theory, be able to, say, find a computer and upload some new code onto it, and that would amount to self-modification. I'm assuming we're going to first make safe AI and then let it out of the box, rather than the other way around.