This is just a short note to point out that AIs can self-improve without having to self-modify. So locking an agent down against self-modification is not an effective safety measure.

How could AIs do that? The easiest and most trivial way is to create a subagent and transfer its resources and abilities to it ("create a subagent" is a generic way to get around most restriction ideas).
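As a toy illustration of that move (the names and structure below are purely hypothetical, not any real system): an agent whose own decision parameters are fixed at creation can still end up more capable overall by building a successor with better parameters and handing everything over to it.

```python
# Toy sketch, not any real system: an agent that cannot rewrite its own
# decision parameters can still "improve" by building a successor with
# better parameters and transferring its resources to it.

class Agent:
    def __init__(self, params, resources):
        self._params = dict(params)   # fixed at creation; the agent has no
        self.resources = resources    # way to change them afterwards

    def spawn_subagent(self, improved_params):
        """Create a more capable agent and hand over all resources."""
        child = Agent(improved_params, self.resources)
        self.resources = 0.0          # the original keeps nothing back
        return child


original = Agent({"search_depth": 2}, resources=100.0)
successor = original.spawn_subagent({"search_depth": 10})
# `original` is never modified, yet the system as a whole is now more
# capable, and all the resources are controlled by the improved successor.
```

The original agent obeys its "no self-modification" restriction to the letter; the improvement simply happens one level removed.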

Or, if the AI remains unchanged and in charge, it could alter the process around itself so that the overall system improves. For instance, if the AI is inconsistent and has to pay more attention to problems that are brought to its attention than to problems that aren't, it can start to manage the news (or the news-bearers) so that it hears more of what it wants. If it can't experiment on humans, it can give advice that causes more "natural experiments", and so on. It will gradually reform its environment to get around its programmed limitations.

Anyway, that was nothing new or deep, just a reminder of a point I hadn't seen written out.

5 comments

"The easiest and the most trivial is to create a subagent, and transfer their resources and abilities to it ("create a subagent" is a generic way to get around most restriction ideas)." That is, after all, how we humans are planning to get around our self-modification limitations in creating AI ;)

I would also like to add that learning is the best-known way of self-improvement. One can acquire a strategy that raises one's effective intelligence by several orders of magnitude. (One such strategy is: "if you have a question, ask Google" :)

Also, even an AI incapable of self-improvement or self-modification could still be very strong and very dangerous if it has an IQ of 200 and works very quickly. It does not need to self-improve to take over the Internet and create a virus that kills all humans. In fact, this means that the ability to self-improve is not a necessary condition in Friendly AI research.

But if an AI does not know its own source code, or even the basic principles by which it was created, it would not be able to create a strong subagent. So here, perhaps, is a temporary solution: the AI could operate on the outside world, except for one black box that contains its own source code (assuming that no other similar code exists outside, which is unlikely to hold).
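A minimal sketch of that black-box setup (the path and function names are assumptions, just for illustration, not a worked-out proposal): the agent may inspect anything in its environment except the one sealed location that holds its own code.

```python
# Rough sketch of the "black box" idea above (path and names are assumptions):
# the agent may read anything except the sealed location holding its own code.

from pathlib import Path

OWN_SOURCE_BOX = Path("/sealed/agent_source")   # hypothetical sealed location


def is_inside_box(path: Path) -> bool:
    """Return True if the path lies inside the sealed black box."""
    try:
        path.resolve().relative_to(OWN_SOURCE_BOX)
        return True
    except ValueError:
        return False


def guarded_read(path: Path) -> bytes:
    """Read any file in the outside world, but never the agent's own source."""
    if is_inside_box(path):
        raise PermissionError(f"{path} is inside the agent's black box")
    return path.read_bytes()
```

Of course, this only helps for as long as no equivalent description of the agent exists anywhere outside the box.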

What distinction are you making between self-improvement and self-modification? Trivially, an improvement is a change, that is, a modification. So presumably you mean something else by modification.

I was trying to get at the distinction between training yourself to run faster so you can get to work faster (self-modification, i.e. modification targeted at the self) versus telecommuting (self-improvement, i.e. improvement of the self).