Warrigal comments on Tiling Agents for Self-Modifying AI (OPFAI #2) - Less Wrong
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Here's a distinction you could make: an AI is self-modifying if it is effectively capable of making any change to its source code at any time, and non-self-modifying if it is not. (The phrase "capable of" is vague, of course.)
I can imagine non-self-modifying AI having an advantage over self-modifying AI, because it might be possible for an NSM AI to be protected from its own stupidity, so to speak. If the AI were to believe that overwriting all of its beliefs with the digits of pi is a good idea, nothing bad would happen, because it would be unable to do that. Of course, these same restrictions that make the AI incapable of breaking itself might also make it incapable of being really smart.
I believe I've heard someone say that any AI capable of being really smart must be effectively self-modifying, because being really smart involves the ability to make arbitrary calculations, and if you can make arbitrary calculations, then you're not restricted. My objection is that there's a big difference between making arbitrary calculations and running arbitrary code; namely, the ability to run arbitrary code allows you to alter other calculations running on the same machine.
Lemme expand on my thoughts a little bit. I imagine a non-self-modifying AI to be made of three parts: a thinking algorithm, a decision algorithm, and a belief database. The thinking and decision algorithms are immutable, and the belief database is (obviously) mutable. The supergoal is coded into the decision algorithm, so it can't be changed. (Problem: the supergoal only makes sense in the context of certain beliefs, and beliefs are mutable.) The contents of the belief database influence the thinking algorithm's behavior, but they don't determine its behavior.
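To make the three-part picture concrete, here is a rough Python sketch. Everything in it is an assumption for illustration: the class names, the `learning_rate` update rule, the `stay_dry` supergoal and the umbrella decision are all hypothetical, and `frozen=True` only blocks ordinary attribute assignment, so it stands in for whatever real mechanism would enforce immutability.

```python
from dataclasses import dataclass, field, FrozenInstanceError


@dataclass
class BeliefDatabase:
    """The only mutable part: maps a proposition to a credence in [0, 1]."""
    credences: dict = field(default_factory=dict)

    def update(self, proposition: str, credence: float) -> None:
        self.credences[proposition] = credence


@dataclass(frozen=True)
class ThinkingAlgorithm:
    """Immutable belief-revision rule (a made-up one, for illustration)."""
    learning_rate: float = 0.5

    def think(self, beliefs: BeliefDatabase, proposition: str, evidence: float) -> None:
        # The current belief influences the result, but the rule itself is fixed.
        old = beliefs.credences.get(proposition, 0.5)
        beliefs.update(proposition, old + self.learning_rate * (evidence - old))


@dataclass(frozen=True)
class DecisionAlgorithm:
    """Immutable decision rule; the supergoal lives here, not in the beliefs."""
    supergoal: str = "stay_dry"

    def decide(self, beliefs: BeliefDatabase) -> str:
        if beliefs.credences.get("it_will_rain", 0.5) > 0.6:
            return "take umbrella"
        return "leave umbrella"


@dataclass(frozen=True)
class NonSelfModifyingAgent:
    """Bundles the two frozen algorithms with the mutable belief database."""
    thinking: ThinkingAlgorithm
    decision: DecisionAlgorithm
    beliefs: BeliefDatabase


def try_to_rewrite_supergoal(agent: NonSelfModifyingAgent, new_goal: str) -> bool:
    """Return True if the supergoal was changed, False if the attempt was blocked."""
    try:
        agent.decision.supergoal = new_goal
    except FrozenInstanceError:
        return False
    return True


if __name__ == "__main__":
    agent = NonSelfModifyingAgent(ThinkingAlgorithm(), DecisionAlgorithm(), BeliefDatabase())
    agent.thinking.think(agent.beliefs, "it_will_rain", 1.0)  # beliefs may change
    print(agent.decision.decide(agent.beliefs))
    print(try_to_rewrite_supergoal(agent, "digits_of_pi"))    # algorithms may not
```

Note that the sketch does nothing about the problem in the parenthetical: the decision rule still reads `it_will_rain` out of the mutable database, so the supergoal's meaning depends on beliefs that can change.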
The ideal possibility is that we can make the following happen:
(My ideas haven't been taken seriously in the past, and I have no special knowledge in this area, so it's likely that my ideas are worthless. They feel valuable to me, however.)