ewbrownv comments on Evaluating the feasibility of SI's plan - Less Wrong

Post author: JoshuaFox 10 January 2013 08:17AM

Comment author: ewbrownv 11 January 2013 11:55:07PM

The last item on your list is an intractable sticking point. Any AGI smart enough to be worth worrying about will need the ability to make arbitrary changes to an internal "knowledge+skills" representation that is itself a Turing-complete programming language. As the AGI grows, it will tend to create an increasingly complex ecology of AI-fragments in this way, and predicting the behavior of the whole system quickly becomes impossible.
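A toy sketch of what such a self-rewriting "knowledge+skills" store might look like (all names here, such as `Agent` and `learn`, are invented for illustration, not a claim about any real architecture): once skills are stored as code, and skills can install further skills, the store is a program rewriting itself, and inspecting today's skills tells you little about tomorrow's.

```python
# Toy model of a "knowledge+skills" store that is itself executable code.
# Hypothetical sketch; Agent, learn, and run are invented names.

class Agent:
    def __init__(self):
        # Skills are live functions keyed by name -- the knowledge
        # representation is a program, not a passive database.
        self.skills = {}

    def learn(self, name, source):
        # Compile a skill from source the agent produced itself.
        namespace = {}
        exec(source, {"agent": self}, namespace)
        self.skills[name] = namespace[name]

    def run(self, name, *args):
        return self.skills[name](*args)

agent = Agent()

# A skill that, when run, teaches the agent a *new* skill:
agent.learn("teach_double", """
def teach_double():
    agent.learn("double", "def double(x):\\n    return 2 * x")
""")

agent.run("teach_double")       # the skill store has rewritten itself
print(agent.run("double", 21))  # -> 42
```

Even this trivial version exhibits the problem: the set of behaviors reachable from the initial skills is open-ended, so static analysis of the starting code undershoots what the system can become.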

So "don't let the AI modify its own goal system" ends up turning into just anther way of saying "put the AI in a box". Unless you have some provable method of ensuring that no meta-meta-meta-meta-program hidden deep in the AGI's evolving skill set ever starts acting like a nested mind with different goals than its host, all you've done is postpone the problem a little bit.

Comment author: [deleted] 12 January 2013 01:00:31AM

Any AGI smart enough to be worth worrying about will need the ability to make arbitrary changes to an internal "knowledge+skills" representation that is itself a Turing-complete programming language.

Are you sure it would have to be able to make arbitrary changes to the knowledge representation? Perhaps there's a way to filter out all of the invalid changes that could possibly be made, the same way that computer proof verifiers have a way to filter out all possible invalid proofs.
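For concreteness, here is a minimal sketch of that filtering idea (hypothetical; the whitelist and `is_valid_change` are invented for illustration): like a proof checker, it mechanically rejects any proposed change that uses a construct outside a fixed allowed set.

```python
# Sketch of the "verifier" idea: accept a proposed change to the knowledge
# store only if it passes a mechanical check, the way a proof verifier
# accepts only valid proofs. The whitelist below is purely illustrative.
import ast

ALLOWED_NODES = (
    ast.Module, ast.FunctionDef, ast.arguments, ast.arg, ast.Return,
    ast.BinOp, ast.Add, ast.Mult, ast.Name, ast.Load, ast.Constant,
)

def is_valid_change(source: str) -> bool:
    """Reject any proposed skill using constructs outside the whitelist."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return False
    return all(isinstance(node, ALLOWED_NODES) for node in ast.walk(tree))

print(is_valid_change("def double(x):\n    return 2 * x"))  # True
print(is_valid_change("import os\nos.system('halt')"))      # False
```

The open question, which the parent comment presses on, is whether any such local check can rule out bad *interactions* between fragments rather than just bad individual fragments.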

I'm not sure what you're saying at all about the Turing-complete programming language. A programming language is a map from strings onto computer programs; are you saying that the knowledge representation would be a computer program?

Comment author: ewbrownv 15 January 2013 12:00:45AM

Yes, I'm saying that to get human-like learning the AI has to have the ability to write code that it will later use to perform cognitive tasks. You can't get human-level intelligence out of a hand-coded program operating on a passive database of information using only fixed, hand-written algorithms.

So that presents you with the problem of figuring out which AI-written code fragments are safe, not just in isolation, but in all their interactions with every other code fragment the AI will ever write. This is the same kind of problem as creating a secure browser or Java sandbox, only worse. Given that no one has ever come close to solving it for the easy case of resisting human hackers without constant patches, it seems very unrealistic to think that any ad-hoc approach is going to work.
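A minimal sketch of why per-fragment checks fall short (all names invented for illustration): a naive checker that bans any single fragment from containing `eval` passes both fragments below, yet running them in sequence performs exactly the banned operation.

```python
# Two fragments that each pass a naive per-fragment safety check, but
# whose *combination* does the thing the check was meant to forbid.
# naive_check, store, and the fragments are invented for illustration.
import builtins

def naive_check(source: str) -> bool:
    # Per-fragment rule: no literal occurrence of "eval" allowed.
    return "eval" not in source

store = {}

fragment_a = "store['payload'] = '2 + 2'"   # just stores a string

fragment_b = (
    "f = getattr(builtins, 'ev' + 'al')\n"  # reassembles the banned name
    "store['result'] = f(store['payload'])"
)

assert naive_check(fragment_a) and naive_check(fragment_b)  # both pass alone

env = {"store": store, "builtins": builtins}
exec(fragment_a, env)
exec(fragment_b, env)
print(store["result"])  # -> 4: forbidden behaviour from two "safe" parts
```

This is the browser/sandbox problem in miniature: safety is a property of the whole interacting system, not of the parts, and the number of interactions grows without bound as the AI keeps writing fragments.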

Comment author: gwern 17 January 2013 01:16:14AM

You can't get human-level intelligence out of a hand-coded program operating on a passive database of information using only fixed, hand-written algorithms.

You can't? The entire genre of security exploits that build a Turing-complete language out of existing library fragments (return-oriented programming; libc is a popular target) suggests that a hand-coded program certainly could be exploited, inasmuch as pretty much all programs like libc are hand-coded these days.

I've found Turing-completeness (and hence the possibility of an AI) can lurk in the strangest places.
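To illustrate the flavor of the point (a toy analogy, not an actual exploit; all names invented): the three "gadgets" below are fixed, hand-written, and individually trivial, yet sequencing them as pure data yields a counter-machine interpreter, a construction known to be Turing-complete given enough registers. The fixed code never changes; the computation lives entirely in how the pieces are chained.

```python
# Fixed hand-written "gadgets", in the spirit of return-oriented
# programming: arbitrary computation emerges from chaining them as data.

def inc(regs, r):            # gadget 1: increment a register
    regs[r] += 1
    return None              # None means fall through to next instruction

def dec(regs, r):            # gadget 2: decrement a register
    regs[r] -= 1
    return None

def jz(regs, r, target):     # gadget 3: jump to target if register is zero
    return target if regs[r] == 0 else None

def run(program, regs):
    pc = 0
    while pc < len(program):
        gadget, *args = program[pc]
        jump = gadget(regs, *args)
        pc = jump if jump is not None else pc + 1
    return regs

# A "program" is just a list of tuples -- data, not new code.
# This one computes reg1 = reg0 + reg1.
add = [
    (jz, 0, 4),   # 0: if reg0 == 0, jump past the end (halt)
    (dec, 0),     # 1: reg0 -= 1
    (inc, 1),     # 2: reg1 += 1
    (jz, 2, 0),   # 3: reg2 is always 0, so this jump is unconditional
]

print(run(add, {0: 3, 1: 4, 2: 0}))  # -> {0: 0, 1: 7, 2: 0}
```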

Comment author: [deleted] 15 January 2013 01:34:18AM

If I understand you correctly, you're asserting that nobody has ever come close to writing a sandbox in which code can run but not "escape". I was under the impression that this had been done perfectly, many, many times. Am I wrong?

Comment author: JoshuaFox 17 January 2013 09:28:28PM

There are different kinds of escape. No Java program has ever convinced a human to edit the security-permissions file on the computer where the Java program is running. But that could be a good way to escape the sandbox.