This may be trivial or obvious to a lot of people, but it doesn't seem like anyone has bothered to write it down (or I haven't looked hard enough). It started out as a generalization of Paul Christiano's IDA, but it also covers things like safe recursive self-improvement.
Start with a team of one or more humans (researchers, programmers, trainers, and/or overseers), with access to zero or more AIs (initially as assistants). The human/AI team in each round develops a new AI and adds it to the team, and repeats this until maturity in AI technology is achieved. Safety/alignment is ensured by having some set of safety/alignment properties on the team that is inductively maintained by the development process.
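To make the iterative structure concrete, here is a minimal sketch (not part of the original formulation) of the loop it describes: a new AI is admitted to the team only if the team's safety/alignment invariant is preserved. The names `develop_next_ai`, `invariant_holds`, and `is_mature` are hypothetical placeholders standing in for whatever techniques a particular development practice would actually use.

```python
# A minimal sketch of the inductive development loop described above.
# All function names (develop_next_ai, invariant_holds, is_mature) are
# hypothetical placeholders, not anyone's actual proposal.

def safety_oriented_development(initial_team, invariant_holds, develop_next_ai, is_mature):
    """Iteratively grow a human/AI team while maintaining a safety invariant.

    initial_team: the starting team of humans (and possibly AI assistants),
                  assumed to satisfy the invariant (the base case).
    invariant_holds(team): True if the safety/alignment properties hold for the team.
    develop_next_ai(team): the team's (fallible) process for building a new AI.
    is_mature(team): whether maturity in AI technology has been reached.
    """
    team = list(initial_team)
    assert invariant_holds(team), "Base case: the initial team must satisfy the invariant."

    while not is_mature(team):
        candidate_ai = develop_next_ai(team)
        candidate_team = team + [candidate_ai]
        # Inductive step: adopt the new AI only if the safety/alignment
        # invariant still holds for the enlarged team; otherwise discard it
        # and let the team try a different development approach next round.
        if invariant_holds(candidate_team):
            team = candidate_team
    return team
```

Roughly speaking, IDA would then correspond to one particular choice of `develop_next_ai` (amplification followed by distillation) and of the invariant being maintained, though that mapping is my gloss rather than anything spelled out above.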
The reason I started thinking in this direction is that Paul's approach seemed very hard to knock down: any time a flaw or difficulty is pointed out, or someone expresses skepticism about some technique it uses or about the overall safety invariant, there is always a list of other techniques or invariants that could be substituted in for that part (sometimes supplied by my own brain as I tried to criticize it). Eventually I realized this shouldn't be surprising, because IDA is an instance of this more general model of safety-oriented AI development, so there are bound to be many points near it in the space of possible safety-oriented AI development practices. (Again, this may already be obvious to others, including Paul, and in their minds IDA is perhaps already a cluster of possible development practices consisting of the most promising safety techniques and invariants, rather than a single point.)
If this model turns out not to have been written down before, perhaps it should be assigned a name, like Iterated Safety-Invariant AI-Assisted AI Development, or something pithier?
Maybe one of the problems with the idea of "alignment" is that it is named as a noun, and so we describe it as a thing that could actually exist, while in fact it is only a high-level description of some hypothetical relation between two complex systems. In that case, it is not a "liquid" and can't be "distilled". I will illustrate this with the following example:
Imagine that I can safely ride a bike at 20 km/h, and after some training I can extend my safe speed by 1 km/h, so it seems reasonable to conclude that I have "distilled" safe riding up to 21 km/h. Repeating this process, I could reach higher and higher speeds. However, it is also obvious that I will have a fatal crash somewhere between 100 and 200 km/h. The reason is that at higher speeds the probability of an accident grows exponentially. Accidents are the real thing; "safety" is only a high-level description of riding habits.
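To make the arithmetic concrete, here is a toy model (with made-up constants, purely for illustration) in which the per-ride accident probability grows exponentially with speed: each 1 km/h increment looks negligible, yet somewhere between 100 and 200 km/h an accident becomes essentially certain.

```python
import math

# Illustrative only: a made-up model where the per-ride accident
# probability grows exponentially with speed. The constants p0 and k are
# chosen so that 20 km/h is quite safe; they are not empirical.
def accident_probability(speed_kmh, p0=1e-6, k=0.1):
    return min(1.0, p0 * math.exp(k * speed_kmh))

for speed in (20, 50, 100, 150, 200):
    print(f"{speed:>3} km/h: per-ride accident probability ~ {accident_probability(speed):.3g}")
# 20 km/h comes out around 7e-6, 100 km/h around 0.02,
# and by 150-200 km/h the probability has saturated at 1.
```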
Conclusion: Accidents can be avoided by not riding a bike or by limiting the bike's speed, but safety can't be stretched indefinitely. Thus AI development should not be "safety"- or "alignment"-oriented, but disaster-avoidance-oriented.