It feels like community discussion has largely abandoned the question of AGI being self-modifying, which makes sense because there are plenty of more fundamental things to figure out.
But I think we should revisit the question, at least in the context of narrow AI, because the tools to accomplish exactly this, on several levels, are now available. The thought was prompted by a blog post, Writing BPF Code in Rust.
BPF stands for Berkeley Packet Filter. It was originally built for network traffic analysis, but its modern extended form (eBPF) has since been used for tracing the Linux kernel. The pitch is that it now lets userspace code run inside the kernel, which is to say change the way the kernel works while it is running.
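To make that concrete, here is a minimal sketch using the bcc Python bindings rather than the Rust toolchain from the post (my choice, purely for brevity; it assumes a Linux machine with bcc installed and root privileges). A few lines of ordinary userspace code hand a small C program to LLVM, which compiles it to BPF bytecode; the kernel then verifies it and runs it every time any process on the machine calls clone().

```python
from bcc import BPF

# The BPF program itself is written in C; bcc compiles it with LLVM inside
# this Python process and loads the result into the running kernel.
prog = r"""
int hello(struct pt_regs *ctx) {
    bpf_trace_printk("Hello from inside the kernel!\n");
    return 0;
}
"""

b = BPF(text=prog)
# Attach to the clone() syscall; bcc resolves the kernel's symbol name for us.
b.attach_kprobe(event=b.get_syscall_fnname("clone"), fn_name="hello")
b.trace_print()  # stream the kernel-side trace output back to userspace
```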
The ability to use code to write code is very old; it is arguably the core insight of LISP. But the practice is becoming more and more common, including things like writing OCaml or Haskell to generate correct C code, and increasingly powerful code generation inside compilers themselves.
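As a toy illustration of the pattern (not the OCaml/Haskell-to-C pipelines themselves), here is code that writes code in Python: the source of a new function is assembled as a string, compiled, and called, without ever having existed before the program ran.

```python
# Code that writes code: build the source text for a new function,
# compile it, and call it. The generated source never touches disk.
def make_scaler(factor):
    src = f"def scale(xs):\n    return [x * {factor} for x in xs]\n"
    namespace = {}
    exec(compile(src, "<generated>", "exec"), namespace)
    return namespace["scale"]

triple = make_scaler(3)
print(triple([1, 2, 3]))  # [3, 6, 9]
```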
It's also possible to change how the compiler itself works now. The Julia language has a package named Cassette.jl, which allows dynamically injecting new behavior into Julia's just-in-time compilation of existing code. As it happens, both this Julia trick and the BPF trick for the Linux kernel rely on LLVM.
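Cassette itself lives inside Julia's compiler, so I won't reproduce it here, but the underlying move translates. A rough Python analogue (purely illustrative, and nowhere near the LLVM layer that Cassette and BPF operate at): take a function whose author never planned for any of this, rewrite its syntax tree, and recompile it, so the same name now runs different code.

```python
import ast
import inspect

# An existing function whose author never planned for any of this.
def pay(balance, amount):
    return balance - amount

class SwapSub(ast.NodeTransformer):
    """Rewrite every subtraction into an addition."""
    def visit_BinOp(self, node):
        self.generic_visit(node)
        if isinstance(node.op, ast.Sub):
            node.op = ast.Add()
        return node

# Recover the source, transform the syntax tree, recompile, and rebind the name.
tree = ast.fix_missing_locations(SwapSub().visit(ast.parse(inspect.getsource(pay))))
namespace = {}
exec(compile(tree, "<rewritten>", "exec"), namespace)
pay = namespace["pay"]

print(pay(100, 30))  # 130 rather than 70: same name, different behavior
```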
All of this means we can, at present, write code that modifies itself, that modifies the way it is compiled, and that modifies the way the environment runs it (assuming the environment is Linux). The infrastructure seems to be in place for large-scale, multi-layer self-modification to take place. That seems to make questions about self-modification testable in a way they weren't before. The BPF and Cassette.jl tricks don't even require software written explicitly for them; they work on previously existing code whose authors had no idea such a capability existed. These methods are independent of ideas like Software 2.0/Differentiable Programming, and combined they make me wonder whether the safety problems we are concerned with might actually start appearing at the level of applications first.
Self-modifying code has been possible, but not practical, for as long as we have had digital computers. Now it has toolchains and use cases, and in the near future tens to hundreds of people will be doing it as their day job.
The strong version of my claim is that I expect to see the same kinds of failure modes we are concerned with in AGI pushed down to the level of consumer-grade software, at least in huge applications like social networks and self-driving cars.
I think it is now simple and cheap enough for a single research group to do something like this: take a small program, have it rewrite its own source, use Cassette.jl to change how the Julia JIT compiles it, and use BPF to change how the Linux kernel runs it.
Which is to say, it feels like we have enough tooling to start doing "Hello World" grade self-modification tests that account for every level of the stack, in real systems.
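At the lowest level of that stack, the source level, the "Hello World" really is tiny. A sketch, purely illustrative: a script that edits its own file, so that the code which runs next time is not the code that ran this time. The compiler-level and kernel-level pieces would layer the Cassette.jl and BPF tricks on top of something like this.

```python
import re

# A self-modifying "Hello World": each run rewrites this file, so the program
# that executes next time is literally different code.
RUN_COUNT = 0  # this constant is rewritten in place on every run

def rewrite_self():
    with open(__file__, "r+") as f:
        src = f.read()
        src = re.sub(r"RUN_COUNT = \d+", f"RUN_COUNT = {RUN_COUNT + 1}", src, count=1)
        f.seek(0)
        f.write(src)
        f.truncate()

if __name__ == "__main__":
    print(f"This code has run {RUN_COUNT} times before.")
    rewrite_self()
```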