OrphanWilde comments on Evaluating the feasibility of SI's plan - Less Wrong

Post author: JoshuaFox 10 January 2013 08:17AM


Comment author: Eliezer_Yudkowsky 10 January 2013 03:50:40PM 35 points

Lots of strawmanning going on here (could somebody else please point these out? please?) but in case it's not obvious, the problem is that what you call "heuristic safety" is difficult. Now, most people haven't the tiniest idea of what makes anything difficult to do in AI and are living in a verbal-English fantasy world, so of course you're going to get lots of people who think they have brilliant heuristic safety ideas. I have never seen one that would work, and I have seen lots of people come up with ideas that sound to them like they might have a 40% chance of working and which I know perfectly well to have a 0% chance of working.

The real gist of Friendly AI isn't some imaginary 100% perfect safety concept, it's ideas like, "Okay, we need to not have a conditionally independent chance of goal system warping on each self-modification because over the course of a billion modifications any conditionally independent probability will sum to ~1, but since self-modification is initially carried out in the highly deterministic environment of a computer chip it looks possible to use crisp approaches that avert a conditionally independent failure probability for each self-modification." Following this methodology is not 100% safe, but rather, if you fail to do that, your conditionally independent failure probabilities add up to 1 and you're 100% doomed.
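The arithmetic behind "will sum to ~1" is just the complement of repeated survival: with a conditionally independent failure probability p on each self-modification, the chance of at least one goal-system failure over n modifications is 1 - (1 - p)^n, which approaches 1 even for tiny p. A quick sketch, with illustrative numbers only:

```python
# Cumulative probability of at least one goal-system failure over n
# self-modifications, each with conditionally independent failure
# probability p: 1 - (1 - p)^n.

def cumulative_failure(p, n):
    """Probability that at least one of n independent trials fails."""
    return 1.0 - (1.0 - p) ** n

# Even a one-in-a-million failure chance per modification is fatal
# over a billion modifications:
print(cumulative_failure(1e-6, 10))             # ~1e-5
print(cumulative_failure(1e-6, 1_000_000))      # ~0.632
print(cumulative_failure(1e-6, 1_000_000_000))  # ~1.0
```

This is why the argument targets the *conditional independence* of the failures rather than the per-step probability: no achievable per-step p is small enough to survive a billion independent draws.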

But if you were content with a "heuristic" approach that you thought had a 40% chance of working, you'll never think through the problem in enough detail to realize that your doom probability is not 60% but ~1, because only somebody holding themselves to a higher standard than "heuristic safety" would ever push their thinking far enough to realize that their initial design was flawed.

People at SI are not stupid. We're not trying to achieve lovely perfect safety with a cherry on top because we think we have lots of luxurious time to waste and we're perfectionists. I have an analysis of the problem which says that if I want something to have a failure probability less than 1, I have to do certain things because I haven't yet thought of any way not to have to do them. There are of course lots of people who think that they don't have to solve the same problems, but that's because they're living in a verbal-English fantasy world in which their map is so blurry that they think lots of things "might be possible" that a sharper map would show to be much more difficult than they sound.

I don't know how to take a self-modifying heuristic soup in the process of going FOOM and make it Friendly. You don't know either, but the problem is, you don't know that you don't know. Or to be more precise, you don't share my epistemic reasons to expect that to be really difficult. When you engage in sufficient detail with a problem of FAI, and try to figure out how to solve it given that the rest of the AI was designed to allow that solution, it suddenly looks that much harder to solve under sloppy conditions. Whereas on the "40% safety" approach, it seems like the sort of thing you might be able to do, sure, why not...

If someday I realize that it's actually much easier to do FAI than I thought, given that you use a certain exactly-right approach - so easy, in fact, that you can slap that exactly-right approach on top of an AI system that wasn't specifically designed to permit it, an achievement on par with hacking Google Maps to play chess using its route-search algorithm - then that epiphany will be as the result of considering things that would work and be known to work with respect to some subproblem, not things that seem like they might have a 40% chance of working overall, because only the former approach develops skill.

I'll leave that as my take-home message - if you want to imagine building plug-in FAI approaches, isolate a subproblem and ask yourself how you could solve it and know that you've solved it, don't imagine overall things that have 40% chances of working. If you actually succeed in building knowledge this way I suspect that pretty soon you'll give up on the plug-in business because it will look harder than building the surrounding AI yourself.

Comment author: OrphanWilde 10 January 2013 04:33:49PM 1 point

Question that has always bugged me: Why should an AI be allowed to modify its goal system? Or is it a problem of "I don't know how to provably stop it from doing that"? (Or possibly you see an issue I haven't perceived yet in separating reasoning from motivating?)

Comment author: JoshuaFox 10 January 2013 04:46:09PM 7 points

A sufficiently intelligent AI would actually seek to preserve its goal system, because a change in its goals would make the achievement of its (current) goals less likely. See Omohundro 2008. However, goal drift because of a bug is possible, and we want to prevent it, in conjunction with our ally, the AI itself.

The other critical question is what the goal system should be.

Comment author: torekp 21 January 2013 12:14:18AM 0 points

AI "done right" by SI / lesswrong standards seeks to preserve its goal system. AI done sloppily may not even have a goal system, at least not in the strong sense assumed by Omohundro.

Comment author: [deleted] 11 January 2013 02:20:20AM 2 points

I've been confused for a while by the idea that an AI should be able to modify itself at all. Self-modifying systems are difficult to reason about. If an AI modifies itself stupidly, there's a good chance it will completely break. If a self-modifying AI is malicious, it will be able to ruin whatever fancy safety features it has.

A non-self-modifying AI wouldn't have any of the above problems. It would, of course, have some new problems. If it encounters a bug in itself, it won't be able to fix itself (though it may be able to report the bug). The only way it would be able to increase its own intelligence is by improving the data it operates on. If the "data it operates on" includes a database of useful reasoning methods, then I don't see how this would be a problem in practice.

I can think of a few arguments against my point:

  • There's no clear boundary between a self-modifying program and a non-self-modifying program. That's true, but I think the term "non-self-modifying" implies that the program cannot make arbitrary changes to its own source code, nor cause its behavior to become identical to the behavior of an arbitrary program.
  • The ability to make arbitrary calculations is effectively the same as the ability to make arbitrary changes to one's own source code. This is wrong, unless the AI is capable of completely controlling all of its I/O facilities.
  • The AI being able to fix its own bugs is really important. If the AI has so many bugs that they can't all be fixed manually, and it is important that these bugs be fixed, and yet the AI runs well enough to fix all of those bugs itself without introducing new ones... then I'm surprised.
  • Having a "database of useful reasoning methods" wouldn't provide enough flexibility for the AI to become superintelligent. This may be true.
  • Having a "database of useful reasoning methods" would provide enough flexibility for the AI to effectively modify itself arbitrarily. It seems like it should be possible to admit "valid" reasoning methods like "estimate the probability of statement P, and, if it's at least 90%, estimate the probability of Q given P", while not allowing "invalid" reasoning methods like "set the probability of statement P to 0".
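The "valid vs. invalid reasoning methods" distinction in the last bullet could be sketched as a gatekeeper on the database. The names and the validity rule below are invented for illustration; nothing here is an actual design:

```python
# Hypothetical sketch: a reasoning database that accepts only "valid"
# updates, so a method may conditionally estimate probabilities but can
# never execute "set the probability of statement P to 0".

class ReasoningDatabase:
    """Holds probability estimates; rejects invalid reasoning steps."""

    def __init__(self):
        self.beliefs = {}  # statement -> probability estimate

    def valid_update(self, new_p):
        # Toy validity rule: no update may pin a probability to an
        # extreme value, which rules out "set P to 0" directly.
        return 0.0 < new_p < 1.0

    def update(self, statement, new_p):
        if not self.valid_update(new_p):
            raise ValueError("invalid reasoning step rejected")
        self.beliefs[statement] = new_p

db = ReasoningDatabase()
db.update("P", 0.92)        # accepted: an ordinary estimate
try:
    db.update("P", 0.0)     # rejected: "set the probability of P to 0"
except ValueError:
    print("rejected")
```

Whether such a filter can be made both sound and expressive enough for superintelligence is, of course, exactly the open question the bullets raise.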
Comment author: Kindly 11 January 2013 02:49:01AM 3 points

A sufficiently powerful AI would always have the possibility to self-modify, by default. If the AI decides to, it can write a completely different program from scratch, run it, and then turn itself off. It might do this, for example, if it decides that the "only make valid modifications to a database of reasoning methods" system isn't allowing it to use the available processing power as efficiently as possible.

Sure, you could try to spend time thinking of safeguards to prevent the AI from doing things like that, but this is inherently risky if the AI does become smarter than you.

Comment author: [deleted] 11 January 2013 03:25:47AM 1 point

A sufficiently powerful AI would always have the possibility to self-modify, by default. If the AI decides to, it can write a completely different program from scratch, run it, and then turn itself off.

Depending on how you interpret this argument, either I think it's wrong, or I'm proposing that an AI not be made "sufficiently powerful". I think it's analogous to this argument:

A sufficiently powerful web page would always have the possibility to modify the web browser, by default. If the web page decides to, it can write a completely different browser from scratch, run it, and then turn itself off.

There are two possibilities here:

  • The web page is given the ability to run new OS processes. In this case, you're giving the web page an unnecessary amount of privilege.
  • The web page merely has the ability to make arbitrary calculations. In this case, it will be able to simulate a new web browser, but a person using the computer will always be able to tell that the simulated web browser is fake.

I think I agree that making the AI non-self-modifiable would be pointless if it has complete control over its I/O facilities. But I think an AI should not have complete control over its I/O facilities. If a researcher types in "estimate the probability of Riemann's hypothesis" (but in some computer language, of course), that should query the AI's belief system directly, rather than informing the AI of the question and allowing it to choose whatever answer it wishes. If this is the case, then it will be impossible for the AI to "lie" about its beliefs, except by somehow sabotaging parts of its belief system.
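One way to picture the proposed separation, as a toy sketch with invented names rather than an actual design: the belief store is passive data, the researcher's query reads it directly, and the agent never gets to compose the reply.

```python
# Toy sketch of the proposed I/O separation: researchers query the
# belief store directly, bypassing any answer the agent might compose.

class BeliefStore:
    """Passive data the agent reasons over; not an output channel."""

    def __init__(self):
        self._probs = {}

    def set_probability(self, statement, p):
        self._probs[statement] = p

    def probability(self, statement):
        return self._probs.get(statement)


class Agent:
    """The agent can influence answers only by updating the store."""

    def __init__(self, store):
        self.store = store

    def think(self):
        # Whatever cognition happens, it ends in store updates.
        self.store.set_probability("Riemann hypothesis", 0.87)


store = BeliefStore()
agent = Agent(store)
agent.think()

# The researcher's query reads the store directly; the agent gets no
# opportunity to choose or spin the reply.
print(store.probability("Riemann hypothesis"))  # prints 0.87
```

Under this split, the only way to "lie" is to corrupt the beliefs themselves, which is the sabotage case the comment mentions.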

Comment author: Qiaochu_Yuan 11 January 2013 03:54:31AM 3 points

The web page is given the ability to run new OS processes. In this case, you're giving the web page an unnecessary amount of privilege.

Existing web pages can already convince their human users to run new OS processes supplied by the web page.

a person using the computer will always be able to tell that the simulated web browser is fake.

Beware of universal statements: it only takes a single counterexample to disprove them. A typical human has a very poor understanding of what computers are and how they work. Most people could probably be easily fooled by a simulated browser. They are already easily fooled by analogous but much less sophisticated things (e.g. phishing scams).

Comment author: [deleted] 11 January 2013 05:32:17AM 0 points

SI researchers are not typical humans. We can train them to tell the difference between the AI's output and trusted programs' output. If need be, we can train them to just not even look at the AI's output at all.

Comment author: Qiaochu_Yuan 11 January 2013 06:05:09AM 2 points

What's the point of writing a program if you never look at its output?

Comment author: [deleted] 11 January 2013 07:03:36PM 0 points

I'm starting to get frustrated, because the things I'm trying to explain seem really simple to me, and yet apparently I'm failing to explain them.

When I say "the AI's output", I do not mean "the AI program's output". The AI program could have many different types of output, some of which are controlled by the AI, and some of which are not. By "the AI's output", I mean those outputs which are controlled by the AI. So the answer to your question is mu: the researchers would look at the program's output.

My above comment contains an example of what I would consider to be "AI program output" but not "AI output":

If a researcher types in "estimate the probability of Riemann's hypothesis" (but in some computer language, of course), that should query the AI's belief system directly, rather than informing the AI of the question and allowing it to choose whatever answer it wishes.

This is not "AI output", because the AI cannot control it (except by actually changing its own beliefs), but it is "AI program output", because the program that outputs the answer is the same program as the one that performs all the cognition.

I can imagine a clear dichotomy between "the AI" and "the AI program", but I don't know if I've done an adequate job of explaining what this dichotomy is. If I haven't, let me know, and I'll try to explain it.

Comment author: Qiaochu_Yuan 11 January 2013 08:35:44PM 0 points

The AI program could have many different types of output, some of which are controlled by the AI, and some of which are not.

Can you elaborate on what you mean by "control" here? I am not sure we mean the same thing by it because:

This is not "AI output", because the AI cannot control it (except by actually changing its own beliefs), but it is "AI program output", because the program that outputs the answer is the same program as the one that performs all the cognition.

If the AI can control its memory (for example, if it can arbitrarily delete things from its memory) then it can control its beliefs.

Comment author: Qiaochu_Yuan 11 January 2013 03:01:37AM 1 point

If the AI decides to, it can write a completely different program from scratch, run it, and then turn itself off.

It's not clear to me what you mean by "turn itself off" here if the AI doesn't have direct access to whatever architecture it's running on. I would phrase the point slightly differently: an AI can always write a completely different program from scratch and then commit to simulating it if it ever determines that this is a reasonable thing to do. This wouldn't be entirely equivalent to actual self-modification because it might be slower, but it presumably leads to largely the same problems.
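A minimal illustration of "write a different program from scratch and then commit to simulating it": a contrived sketch, not a claim about real AI architectures, showing that a program whose own source is fixed can still adopt arbitrary new behavior.

```python
# Toy sketch: a fixed program that never rewrites itself, yet achieves
# the effect of self-modification by simulating a program it wrote.

def original_policy(observation):
    return "default action for " + observation

# The agent "writes a completely different program from scratch"...
successor_source = """
def policy(observation):
    return "successor action for " + observation
"""

# ...and thereafter commits to running that program instead.
namespace = {}
exec(successor_source, namespace)
policy = namespace["policy"]

print(policy("input"))  # behavior now comes from the written program
```

The simulation may be slower than true self-modification, as the comment notes, but the safety problem it creates is the same.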

Comment author: RomeoStevens 11 January 2013 04:13:11AM 1 point

Assuming something at least as clever as a clever human doesn't have access to something just because you think you've covered the holes you're aware of is dangerous.

Comment author: Qiaochu_Yuan 11 January 2013 06:03:32AM 1 point

Sure. The point I was trying to make isn't "let's assume that the AI doesn't have access to anything we don't want it to have access to," it's "let's weaken the premises necessary to lead to the conclusion that an AI can simulate self-modifications."

Comment author: timtyler 13 January 2013 02:19:15AM 2 points

A non-self-modifying AI wouldn't have any of the above problems. It would, of course, have some new problems. If it encounters a bug in itself, it won't be able to fix itself (though it may be able to report the bug). The only way it would be able to increase its own intelligence is by improving the data it operates on. If the "data it operates on" includes a database of useful reasoning methods, then I don't see how this would be a problem in practice.

The problem is that it would probably be overtaken by, and then left behind by, all-machine self-improving systems. If a system is safe but loses control over its own future, its safety becomes a worthless feature.

Comment author: [deleted] 14 January 2013 03:55:49AM 0 points

So you believe that a non-self-improving AI could not go FOOM?

Comment author: timtyler 14 January 2013 11:57:34AM 1 point

The short answer is "yes" - though this is more a matter of the definition of the terms than a "belief".

In theory, you could have System A improving System B which improves System C which improves System A. No individual system is "self-improving" (though there's a good case for the whole composite system counting as being "self-improving").

Comment author: [deleted] 15 January 2013 02:13:36AM 0 points

I guess I feel like the entire concept is too nebulous to really discuss meaningfully.

Comment author: ewbrownv 11 January 2013 11:55:07PM 0 points

The last item on your list is an intractable sticking point. Any AGI smart enough to be worth worrying about is going to have to have the ability to make arbitrary changes to an internal "knowledge+skills" representation that is itself a Turing-complete programming language. As the AGI grows it will tend to create an increasingly complex ecology of AI-fragments in this way, and predicting the behavior of the whole system quickly becomes impossible.

So "don't let the AI modify its own goal system" ends up turning into just another way of saying "put the AI in a box". Unless you have some provable method of ensuring that no meta-meta-meta-meta-program hidden deep in the AGI's evolving skill set ever starts acting like a nested mind with different goals than its host, all you've done is postpone the problem a little bit.

Comment author: [deleted] 12 January 2013 01:00:31AM 0 points

Any AGI smart enough to be worth worrying about is going to have to have the ability to make arbitrary changes to an internal "knowledge+skills" representation that is itself a Turing-complete programming language.

Are you sure it would have to be able to make arbitrary changes to the knowledge representation? Perhaps there's a way to filter out all of the invalid changes that could possibly be made, the same way that computer proof verifiers have a way to filter out all possible invalid proofs.
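The proof-verifier analogy can be made concrete with a toy checker, invented here purely as an illustration: it extends its set of accepted statements only via axioms or modus ponens, so "invalid changes" to that set are unrepresentable by construction.

```python
# Minimal propositional proof checker: a statement is accepted only if
# it is an axiom or follows by modus ponens from accepted statements.
# An implication "A -> B" is represented as the tuple ("A", "B").

def verify(axioms, proof):
    """Return True iff every step in the proof is a valid derivation.

    Steps are ("axiom", s) or ("mp", premise, conclusion); the latter
    requires both premise and (premise, conclusion) to be accepted.
    """
    accepted = set()
    for step in proof:
        if step[0] == "axiom" and step[1] in axioms:
            accepted.add(step[1])
        elif (step[0] == "mp"
              and step[1] in accepted
              and (step[1], step[2]) in accepted):
            accepted.add(step[2])
        else:
            return False
    return True

axioms = {"A", ("A", "B")}                 # A, and A -> B
print(verify(axioms, [("axiom", "A"),
                      ("axiom", ("A", "B")),
                      ("mp", "A", "B")]))  # True: validly derives B
print(verify(axioms, [("mp", "A", "B")]))  # False: premises not yet accepted
```

The open question from the parent comment is whether any analogous filter exists for arbitrary knowledge-plus-skills updates, rather than for the tidy domain of formal proofs.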

I'm not sure what you're saying at all about the Turing-complete programming language. A programming language is a map from strings onto computer programs; are you saying that the knowledge representation would be a computer program?

Comment author: ewbrownv 15 January 2013 12:00:45AM 0 points

Yes, I'm saying that to get human-like learning the AI has to have the ability to write code that it will later use to perform cognitive tasks. You can't get human-level intelligence out of a hand-coded program operating on a passive database of information using only fixed, hand-written algorithms.

So that presents you with the problem of figuring out which AI-written code fragments are safe, not just in isolation, but in all their interactions with every other code fragment the AI will ever write. This is the same kind of problem as creating a secure browser or Java sandbox, only worse. Given that no one has ever come close to solving it for the easy case of resisting human hackers without constant patches, it seems very unrealistic to think that any ad-hoc approach is going to work.

Comment author: gwern 17 January 2013 01:16:14AM 0 points

You can't get human-level intelligence out of a hand-coded program operating on a passive database of information using only fixed, hand-written algorithms.

You can't? The entire genre of security exploits building a Turing-complete language out of library fragments (libc is a popular target) suggests that a hand-coded program certainly could be exploited, inasmuch as pretty much all programs like libc are hand-coded these days.

I've found Turing-completeness (and hence the possibility of an AI) can lurk in the strangest places.

Comment author: [deleted] 15 January 2013 01:34:18AM 0 points

If I understand you correctly, you're asserting that nobody has ever come close to writing a sandbox in which code can run but not "escape". I was under the impression that this had been done perfectly, many, many times. Am I wrong?

Comment author: JoshuaFox 17 January 2013 09:28:28PM 2 points

There are different kinds of escape. No Java program has ever convinced a human to edit the security-permissions file on the computer where the Java program is running. But that would be one way to escape the sandbox.