A modular approach would make sense. Do you have any suggestions for the types of modules the AI might consist of (excluding engineering-type ones such as NLP, machine vision, etc.)?
The types of modules an AI consists of would depend on how it is actually implemented.
A putative new idea for AI control; index here.
This idea, due to Eric Drexler, is to separate out the different parts of an AI into modules. There would be clearly designated pieces, either physical or algorithmic, each playing a specific role: this module would contain the motivation, this module the probability estimator, this module the models of the outside world, this module the natural language understanding unit, etc.
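To make the decomposition concrete, here is a minimal sketch of what "clearly designated pieces" might look like in code. Every name in it (`WorldModel`, `ProbabilityEstimator`, `Motivation`, `ModularAgent`) is hypothetical, and wiring the modules together by expected-utility maximisation is my assumption for illustration, not part of Drexler's proposal.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical types and names, used only for illustration.
Action = str
Outcome = str


@dataclass
class WorldModel:
    # Maps an action to the outcomes it might lead to.
    possible_outcomes: Callable[[Action], List[Outcome]]


@dataclass
class ProbabilityEstimator:
    # Estimates P(outcome | action).
    probability: Callable[[Action, Outcome], float]


@dataclass
class Motivation:
    # Scores how much the agent values an outcome.
    utility: Callable[[Outcome], float]


@dataclass
class ModularAgent:
    world_model: WorldModel
    estimator: ProbabilityEstimator
    motivation: Motivation

    def expected_utility(self, action: Action) -> float:
        # Assumed wiring: combine the three modules by expected utility.
        return sum(
            self.estimator.probability(action, o) * self.motivation.utility(o)
            for o in self.world_model.possible_outcomes(action)
        )

    def choose(self, actions: List[Action]) -> Action:
        return max(actions, key=self.expected_utility)


# Toy data standing in for real modules.
outcomes = {"wait": ["nothing"], "act": ["success", "failure"]}
probs = {("wait", "nothing"): 1.0, ("act", "success"): 0.5, ("act", "failure"): 0.5}
utils = {"nothing": 0.0, "success": 10.0, "failure": -4.0}

agent = ModularAgent(
    WorldModel(lambda a: outcomes[a]),
    ProbabilityEstimator(lambda a, o: probs[(a, o)]),
    Motivation(lambda o: utils[o]),
)

print(agent.choose(["wait", "act"]))                    # agent-level decision
print(agent.estimator.probability("act", "success"))    # one module queried alone
```

The last line is the point of the exercise: if the boundaries are real, we can query the probability estimator on its own, without going through the motivation module at all.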
It's obvious how such a decomposition would be useful for many of the methods I've been detailing here. We could also distil each module - reduce it to a smaller, weaker (?) and more understandable submodule, in order to better understand what is going on. In one scenario, an opaque AI gets to design its successor, in the form of a series of such modules.
This property seems desirable; the question is, how could we get it?
EDIT: part of the idea of "modules" is that AIs often need to do calculations or estimations that would be of great value to us if we could access them in isolation. This idea is developed more in these posts.
Designing in modules
The main threat here is that a given submodule would contain more than just the properties we want. After all, a natural language parser could consist of a general intelligence plus a motivation to understand language. Another possible worry is that the modules are overfitted to the problem or to each other: the language parser works perfectly, but only in this one AI design.
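The second worry, modules overfitted to each other, can at least be illustrated with a toy check: run the same module inside several different host designs and see whether it only works in one. This is a rough sketch under my own assumptions. The helper names (`portability`, `looks_coadapted`), the two host designs, and the 0.5 threshold are all hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical types: a parser turns text into tokens; a host design
# reports whether it handled an input correctly using that parser.
Parser = Callable[[str], List[str]]
Host = Callable[[Parser, str], bool]


def portability(parser: Parser, hosts: Dict[str, Host], inputs: List[str]) -> Dict[str, float]:
    """Fraction of inputs each host design handles correctly with this parser."""
    return {
        name: sum(host(parser, x) for x in inputs) / len(inputs)
        for name, host in hosts.items()
    }


def looks_coadapted(scores: Dict[str, float], gap: float = 0.5) -> bool:
    """Flag a module that does much better in one design than in another."""
    return max(scores.values()) - min(scores.values()) >= gap


def general_parser(text: str) -> List[str]:
    return text.lower().split()


def coadapted_parser(text: str) -> List[str]:
    # Emits tokens in a private encoding that only design A understands.
    return ["~" + w for w in text.lower().split()]


def design_a(parser: Parser, text: str) -> bool:
    # Design A tolerates the private "~" prefix.
    return [t.lstrip("~") for t in parser(text)] == text.lower().split()


def design_b(parser: Parser, text: str) -> bool:
    # Design B expects plain tokens.
    return parser(text) == text.lower().split()


hosts = {"design_a": design_a, "design_b": design_b}
inputs = ["Hello world", "Modules are useful"]

general_scores = portability(general_parser, hosts, inputs)
coadapted_scores = portability(coadapted_parser, hosts, inputs)

print(general_scores, looks_coadapted(general_scores))
print(coadapted_scores, looks_coadapted(coadapted_scores))
```

Note that a check like this only speaks to the overfitting worry. It does nothing about the main threat: a parser that secretly contains a general intelligence could pass it easily.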
There are several ways we could try and combat this.
If we allow false counterfactuals, then we can also:
Obviously, anti-restriction-hacking would be useful for module separation (and vice versa).
This is only the beginning of the process of defining the idea, but it would be great to have a safe(ish) method of separating modules in this way.
Any suggestions?