AI is here, and AGI is coming. It's quite possible that any work being done now will matter little in comparison to work on reducing AI risk.
This is one of those things that's unsettling for me as someone who did a Ph.D. in a non-AI area of computer science.
But one of the main vectors by which a bootstrapping AGI will gain power is by hacking into other systems. And that's something I can do something about.
Not many appreciate this, but unhackable systems are very possible. Security vulnerabilities occur when there is some broken assumption or coding mistake. They are not omnipresent: someone has to put them there. Software has in general gotten more secure over the last few decades, and technologies that provide extremely high security guarantees have emerged. Consider the verified hypervisor coming out of Bedrock Systems; RockSalt, an unbreakable sandbox; or seL4, the verified kernel now being used in real safety-critical systems.
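To make "someone has to put them there" concrete, here's a toy C sketch (the function names are hypothetical, not from any of the systems above) of how a vulnerability is introduced by an unenforced assumption, and how the fix is simply making that assumption explicit:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical example: the hole exists only because of an unstated,
   unenforced assumption -- not because code is inherently hackable. */
void greet_unsafe(const char *name)
{
    char buf[16];
    strcpy(buf, name);  /* assumes `name` fits in 16 bytes; a longer input
                           overflows `buf` (classic stack buffer overflow) */
    printf("Hello, %s\n", buf);
}

void greet_safe(const char *name)
{
    char buf[16];
    snprintf(buf, sizeof buf, "%s", name);  /* the assumption is enforced:
                                               a long input is truncated,
                                               never written past `buf` */
    printf("Hello, %s\n", buf);
}
```

Memory-safe languages and the verified systems above go further, ruling out this whole class of mistake by construction or by proof.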
Suppose we "solve" security by bringing the number of vulnerabilities in important applications to near zero. Suppose we also "solve" the legacy problem, and are able to upgrade a super-majority of old software, including embedded devices, to be similarly secure. How much would this reduce AI risk?
To be clear: I personally am mainly interested in assuming this will be solved, and then asking what the impact on AI safety would be. If you want to talk about how hard it is, then, well, I won't be interested, because I've given many lectures on closely related topics, although others here may benefit from the discussion.
(When I call something verified or unbreakable, there are a number of technicalities about what exactly has been proven and what the assumptions are. E.g.: nothing I've mentioned provides guarantees against hardware attacks such as Rowhammer or instruction skipping. I'll be happy to explain these to anyone in great detail, but I'm more interested in discussion that assumes these will all be solved.)
Remember that security isn't primarily a technical problem. It's an economic/social/game theory problem.
It's not enough to be able to write safe code. You have to be able to deliver it at lower cost than non-safe code. And not just lower cost in the long term, either: you have to be able to deliver total system functionality X next quarter at a lower cost. Every incremental change has to independently pass that economic filter. You also have to bear in mind that many of the costs of non-security tend to be externalized, whereas all of the costs of security tend to be internalized.
... unless, of course, you can find a way to change things so that people take a longer view and more of the costs are internalized. But such changes tend to demand politically difficult coercive measures.
There's also the problem of resistance from practitioners. The necessary discipline can be unpleasant, and there are big learning curves.
Changing what counts as "best practices" is hard.
Also, while I very much support formal methods, I think "unhackable" is overselling things by a lot. To get there, you'd have to be able to specify what would and would not be correct behavior for a big, ever-changing system with huge numbers of subsystems that are themselves complicated. And probably specify the correct behavior for every subsystem in the same general language. And, as you point out, there will always be issues that fall outside the model. The adversary, AI or otherwise, is not required to ignore Rowhammer just because you didn't think about it or couldn't model it.
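For a sense of scale, here is roughly what a machine-checkable specification looks like for even a trivial function. This is a toy sketch using ACSL, the annotation language of Frama-C; `copy_bytes` is a made-up example, not drawn from any of the systems mentioned above. Note that everything the contract promises lives inside the C abstract machine: it doesn't even have the vocabulary to talk about Rowhammer, fault injection, or timing.

```c
#include <stddef.h>

/*@
  requires \valid(dst + (0 .. n-1));
  requires \valid_read(src + (0 .. n-1));
  requires \separated(dst + (0 .. n-1), src + (0 .. n-1));
  assigns  dst[0 .. n-1];
  ensures  \forall integer i; 0 <= i < n ==> dst[i] == src[i];
*/
void copy_bytes(unsigned char *dst, const unsigned char *src, size_t n)
{
    size_t i;
    /*@
      loop invariant 0 <= i <= n;
      loop invariant \forall integer j; 0 <= j < i ==> dst[j] == src[j];
      loop assigns i, dst[0 .. n-1];
      loop variant n - i;
    */
    for (i = 0; i < n; i++)
        dst[i] = src[i];
}
```

Now scale that up to every subsystem of a large, constantly changing codebase, keep all the contracts mutually consistent, and keep them current as requirements change. That's the gap between "verified component" and "unhackable system".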
I'm not saying give up, but I am saying don't underestimate the challenges...
Really appreciate this informative and well-written answer. Nice to hear about SELinux from someone on the ground, rather than only from the NSA's own presentations.