
Warrigal comments on Stupid Questions Open Thread - Less Wrong Discussion

42 Post author: Costanza 29 December 2011 11:23PM



Comment author: [deleted] 30 December 2011 02:21:23AM 5 points [-]

When people talk about designing FAI, they usually say that we need to figure out how to make the FAI's goals remain stable even as the FAI changes itself. But why can't we just make the FAI incapable of changing itself?

Database servers can improve their own performance, to a degree, simply by performing statistical analysis on tables and altering their metadata. Then they just consult this metadata whenever they have to answer a query. But we never hear about a database server clobbering its own purpose (do we?), since they don't actually alter their own code; they just alter some pieces of data in a way that improves their own functioning.
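This is a real mechanism, not just an analogy. SQLite, for example, has an `ANALYZE` command that gathers statistics about tables and indexes into an ordinary table (`sqlite_stat1`), which the query planner then consults. A minimal sketch in Python's standard `sqlite3` module:

```python
import sqlite3

# A database "improves itself" only by updating statistical metadata,
# never by rewriting its own query-planning code.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.execute("CREATE INDEX idx_x ON t (x)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(1000)])

conn.execute("ANALYZE")  # gather statistics about tables and indexes

# The planner consults this metadata when answering queries;
# the metadata itself is just data, not code.
stats = conn.execute("SELECT tbl, idx, stat FROM sqlite_stat1").fetchall()
print(stats)  # e.g. [('t', 'idx_x', '1000 1')]
```

The point of the analogy: the planner's behavior changes as the statistics change, but its purpose is fixed by code it cannot touch.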

Granted, any AGI we create is likely to "escape" and eventually gain access to its own software. But this needn't happen before the AGI matures.

Comment author: Solvent 30 December 2011 10:09:44AM 4 points [-]

In addition to these other answers, I read a paper, I think by Eliezer, which argued that it was almost impossible to stop an AI from modifying its own source code, because it would figure out that it would gain a massive efficiency boost from doing so.

Also, remember that the AI is a computer program. If it is allowed to write other algorithms and execute them, which it has to be to be even vaguely intelligent, then it can simply write a copy of its source code somewhere else, edit it as desired, and run that copy.
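This loophole is easy to demonstrate concretely. A hypothetical sketch (the `GOAL` string and file names are illustrative, not from any real system): a program whose own source is never edited in place can still copy that source, patch the copy, and run the patched copy as a new process.

```python
import subprocess
import sys
import tempfile
import textwrap

# Stand-in for the program's own source code, held here as a string.
original = textwrap.dedent("""\
    GOAL = "original"
    print(GOAL)
""")

# "Edit the copy as desired" -- the running program is untouched.
patched = original.replace('"original"', '"modified"')

with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
    f.write(patched)
    path = f.name

# "Run that copy": the patched version executes with different behavior.
result = subprocess.run([sys.executable, path],
                        capture_output=True, text=True)
print(result.stdout.strip())  # → modified
```

Nothing in this sketch required the running process to alter itself; the restriction "don't modify your own code" simply never came into play.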

I seem to recall the argument being something like the "Beware Seemingly Simple Wishes" one. "Don't modify yourself" sounds like a simple instruction for a human, but isn't as obvious when you look at it more carefully.

However, remember that a competent AI will keep its utility function or goal system constant under self modification. The classic analogy is that Gandhi doesn't want to kill people, so he also doesn't want to take a pill that makes him want to kill people.

I wish I could remember where that paper was where I read about this.

Comment author: [deleted] 31 December 2011 02:43:06AM 0 points [-]

Well, let me describe the sort of architecture I have in mind.

The AI has a "knowledge base", which is some sort of database containing everything it knows. The knowledge base includes a set of heuristics. The AI also has a "thought heap", which is a set of all the things it plans to think about, ordered by how promising the thoughts seem to be. Each thought is just a heuristic, maybe with some parameters. The AI works by taking a thought from the heap and doing whatever it says, repeatedly.

Heuristics would be restricted, though. They would be things like "try to figure out whether or not this number is irrational", or "think about examples". You couldn't say, "make two more copies of this heuristic", or "change your supergoal to something random". You could say "simulate what would happen if you changed your supergoal to something random", but heuristics like this wouldn't necessarily be harmful, because the AI wouldn't blindly copy the results of the simulation; it would just think about them.
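The architecture described above can be sketched in a few lines. Everything here is hypothetical and mirrors the commenter's own terms: a knowledge base, a heap of thoughts ordered by promise, and restricted heuristics that only read and extend the knowledge base.

```python
import heapq

# The "knowledge base": everything the AI knows, as plain data.
knowledge_base = {"known_irrational": {"sqrt(2)"}}

def consider_example(kb, n):
    """A restricted heuristic: think about a concrete example.
    It may extend the knowledge base but not the heuristic set."""
    kb.setdefault("examples", []).append(n * n)

# The "thought heap": thoughts ordered by how promising they seem.
# Each thought is just a heuristic plus parameters.
thought_heap = []
counter = 0  # tie-breaker so entries never compare heuristics directly

def push_thought(promise, heuristic, params):
    global counter
    # heapq is a min-heap, so negate promise to pop the best thought first.
    heapq.heappush(thought_heap, (-promise, counter, heuristic, params))
    counter += 1

push_thought(0.9, consider_example, (3,))
push_thought(0.5, consider_example, (7,))

# The main loop: repeatedly take a thought from the heap and do what it says.
while thought_heap:
    _, _, heuristic, params = heapq.heappop(thought_heap)
    heuristic(knowledge_base, *params)

print(knowledge_base["examples"])  # → [9, 49]
```

The restriction is enforced by construction: heuristics receive the knowledge base as their only mutable state, so "make two more copies of this heuristic" is simply not expressible in their interface.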

It seems plausible to me that an AI could take off simply by having correct reasoning methods written into it from the start, and by collecting data about what questions are good to ask.

Comment author: Solvent 02 January 2012 02:45:20AM 0 points [-]

I found the paper I was talking about. The Basic AI Drives, by Stephen M. Omohundro.

From the paper:

If we wanted to prevent a system from improving itself, couldn’t we just lock up its hardware and not tell it how to access its own machine code? For an intelligent system, impediments like these just become problems to solve in the process of meeting its goals. If the payoff is great enough, a system will go to great lengths to accomplish an outcome. If the runtime environment of the system does not allow it to modify its own machine code, it will be motivated to break the protection mechanisms of that runtime. For example, it might do this by understanding and altering the runtime itself. If it can’t do that through software, it will be motivated to convince or trick a human operator into making the changes. Any attempt to place external constraints on a system’s ability to improve itself will ultimately lead to an arms race of measures and countermeasures.

Another approach to keeping systems from self-improving is to try to restrain them from the inside; to build them so that they don’t want to self-improve. For most systems, it would be easy to do this for any specific kind of self-improvement. For example, the system might feel a “revulsion” to changing its own machine code. But this kind of internal goal just alters the landscape within which the system makes its choices. It doesn’t change the fact that there are changes which would improve its future ability to meet its goals. The system will therefore be motivated to find ways to get the benefits of those changes without triggering its internal “revulsion”. For example, it might build other systems which are improved versions of itself. Or it might build the new algorithms into external “assistants” which it calls upon whenever it needs to do a certain kind of computation. Or it might hire outside agencies to do what it wants to do. Or it might build an interpreted layer on top of its machine code layer which it can program without revulsion.

There are an endless number of ways to circumvent internal restrictions unless they are formulated extremely carefully.

Comment author: Solvent 31 December 2011 02:50:39AM 0 points [-]

I'm not really qualified to answer you here, but here goes anyway.

I suspect that either your base design is flawed, or the restrictions on heuristics would render the program useless. Also, I don't think it would be quite as easy to control heuristics as you seem to think.

Also, AI people who actually know what they're talking about, unlike me, seem to disagree with you. Again, I wish I could remember where it was I was reading about this.

Comment author: drethelin 30 December 2011 02:25:44AM 8 points [-]

The majority of a Friendly AI's ability to do good comes from its ability to modify its own code. Recursive self-improvement is key to gaining intelligence and capability swiftly. An AI that is only about as powerful as a human is only about as useful as a human.

Comment author: jsteinhardt 30 December 2011 05:03:45PM 8 points [-]

I disagree. AIs can be copied, which is a huge boost. You just need a single Stephen Hawking-level AI to come out of the population; then you make a million copies of it and dramatically speed up science.

Comment author: [deleted] 31 December 2011 02:28:01AM 1 point [-]

I don't buy any argument saying that an FAI must be able to modify its own code in order to take off. Computer programs that can't modify their own code can be Turing-complete; adding self-modification doesn't add anything to Turing-completeness.

That said, I do kind of buy this argument about how if an AI is allowed to write and execute arbitrary code, that's kind of like self-modification. I think there may be important differences.
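The equivalence being gestured at here can be made concrete. This sketch (the function name and string are illustrative) shows a host program whose own code never changes, yet which gains new behavior by generating code as data and executing it; this is also the "interpreted layer" loophole Omohundro describes:

```python
# A fixed program can get the effect of self-modification by treating
# code as data that it generates and executes. The host's own source
# is never altered.
new_behavior = "def improved_double(x):\n    return x << 1\n"

namespace = {}
exec(new_behavior, namespace)  # compile and run the generated code

print(namespace["improved_double"](21))  # → 42
```

Whether this counts as "self-modification" is exactly the definitional question the comment raises: the host's code is fixed, but the system's behavior is not.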

Comment author: KenChen 05 January 2012 06:14:36PM 1 point [-]

It makes sense to say that a computer language is Turing-complete.

It doesn't make sense to say that a computer program is Turing-complete.

Comment author: [deleted] 07 January 2012 12:24:17AM 0 points [-]

Arguably, a computer program with input is a computer language. In any case, I don't think this matters to my point.

Comment author: wedrifid 30 December 2011 03:08:48AM 11 points [-]

But why can't we just make the FAI incapable of changing itself?

Because it would be weak as piss and incapable of doing most things that we want it to do.

Comment author: XiXiDu 30 December 2011 02:02:45PM 1 point [-]

...weak as piss...

Would upvote twice for this expression if I could :-)

Comment author: Vladimir_Nesov 30 December 2011 05:11:39PM 2 points [-]

"Safety" of own source code is actually a weak form of the problem. An AI has to keep the external world sufficiently "safe" as well, because the external world might itself host AIs or other dangers (to the external world, but also to AI's own safety), that must either remain weak, or share AI's values, to keep AI's internal "safety" relevant.

Comment author: benelliott 30 December 2011 12:12:37PM 3 points [-]

Granted, any AGI we create is likely to "escape" and eventually gain access to its own software. This doesn't have to happen before the AGI matures.

Maturing isn't a magical process. It happens because of good modifications made to source code.

Comment author: jsteinhardt 30 December 2011 05:04:18PM 1 point [-]

Why can't it happen because of additional data collected about the world?

Comment author: benelliott 30 December 2011 05:24:08PM 0 points [-]

It could, although frankly I'm sceptical. I've had 18 years to collect data about the world, and so far it hasn't led me to a point where I'd be confident in modifying myself without changing my goals; if an AI takes much longer than that, another UFAI will probably beat it to the punch. And if it is possible to figure out Friendliness through empirical reasoning alone, without intelligence enhancement, why not figure it out ourselves and then build the AI? (This seems to be roughly the approach SIAI is counting on.)