I have been trying to find out what is known about invariants in self-modifying systems. This might become a rather acute topic if we end up moving towards self-modifying AIs or self-modifying ecosystems of AIs.

But it seems that not much has been done. For example, I have found a 1995 Chinese paper, "S- and T-Invariants in Cyber Net Systems" by Yuan Chongyi (Google Scholar page, PDF available), which studies invariants in self-modifying nets (a natural extension of Petri nets), but Google Scholar lists only 4 citations of it.
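For readers unfamiliar with the terminology, here is a minimal sketch of what S- and T-invariants mean, assuming an ordinary Petri net with a fixed incidence matrix rather than the self-modifying nets studied in the paper. The two-place net, the names `INCIDENCE`, `is_s_invariant`, `is_t_invariant`, and `fire` are all made up for illustration and are not taken from the paper.

```python
# Illustrative ordinary Petri net (an assumption, not from the paper).
# Rows are places, columns are transitions.
# Each entry is (tokens produced) minus (tokens consumed) when that transition fires.
INCIDENCE = [
    [-1,  1],  # place p0: t0 consumes one token, t1 produces one
    [ 1, -1],  # place p1: t0 produces one token, t1 consumes one
]


def is_s_invariant(y, incidence):
    """Return True if y^T * C == 0, so the y-weighted token count never changes."""
    n_places = len(incidence)
    n_transitions = len(incidence[0])
    return all(
        sum(y[p] * incidence[p][t] for p in range(n_places)) == 0
        for t in range(n_transitions)
    )


def is_t_invariant(x, incidence):
    """Return True if C * x == 0, so firing each transition t x[t] times restores the marking."""
    return all(
        sum(row[t] * x[t] for t in range(len(row))) == 0
        for row in incidence
    )


def fire(marking, t, incidence):
    """Fire transition t; valid for nets without self-loops, like this one."""
    new_marking = [m + row[t] for m, row in zip(marking, incidence)]
    if min(new_marking) < 0:
        raise ValueError("transition is not enabled in this marking")
    return new_marking


marking = [2, 0]
after_t0 = fire(marking, 0, INCIDENCE)   # move one token from p0 to p1
after_t1 = fire(after_t0, 1, INCIDENCE)  # move it back
```

Here `[1, 1]` is an S-invariant (the total number of tokens is conserved) and `[1, 1]` is also a T-invariant (firing `t0` and `t1` once each returns to the starting marking). As I understand it, in a self-modifying net the arc weights can depend on the current marking, so there is no single constant incidence matrix to check against, which is part of why invariants are harder to establish there.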

I wonder if people know about more examples of this kind of research (or about researchers or organizations currently trying to look at this topic)...

Yeah, not a ton, I think for the obvious reason that real-world agents are complicated and hard to reason about.

Though search up "tiling agents" for some MIRI work in this vein.

Yes, thanks!

I am familiar with some MIRI work on that, which focuses on the Löbian obstacle, e.g. this 2013 paper: Tiling Agents for Self-Modifying AI, and the Löbian Obstacle.

But I should look closer at other parts of those MIRI papers; perhaps there might be some material which actually establishes some invariants, at least for some simple, idealized examples of self-modification...