Question regarding an alignment problem: one of the key difficulties in alignment (as Eliezer Yudkowsky puts it) is that if "the verifier is broken" (i.e. the human verifier measuring alignment can be fooled by the alien actress), then we cannot be sure that a given alignment evaluation is true. Has there been any serious discussion of using a daisy chain of increasingly intelligent systems to evaluate alignment?
Hand-wavily: let human intelligence be ~= H. Can we find some epsilon e such that we can construct a series of n increasingly intelligent systems with intelligence I(k) = H + k*e for k = 1, ..., n, where we only ask for one-hop-forward verification within the chain? That is to say,...
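A minimal sketch of what that one-hop-forward chain might look like, assuming the intent is that each system at level I(k) = H + k*e only ever verifies the system one epsilon above it, and that trust in the top system holds only if every hop passes. The names (`System`, `verify_one_hop`, `chain_verdict`) and the toy numbers are purely illustrative, not an existing scheme:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class System:
    level: float       # intelligence I(k) = H + k * epsilon
    is_aligned: bool   # ground truth, not actually visible to verifiers

def verify_one_hop(verifier: System, target: System, epsilon: float) -> bool:
    """Assume a verifier is only competent when the gap is at most one epsilon."""
    if target.level - verifier.level > epsilon:
        raise ValueError("gap too large for trustworthy verification")
    return target.is_aligned  # stand-in for a real alignment evaluation

def chain_verdict(human_level: float, epsilon: float, chain: List[System]) -> bool:
    """Trust the top system only if every one-hop verification in the chain passes."""
    verifier = System(level=human_level, is_aligned=True)  # the human, at H
    for target in chain:
        if not verify_one_hop(verifier, target, epsilon):
            return False
        verifier = target  # the newly-verified system becomes the next verifier
    return True

# Toy example: H = 100, epsilon = 1, five systems with I(k) = H + k*epsilon
H, eps = 100.0, 1.0
chain = [System(level=H + k * eps, is_aligned=True) for k in range(1, 6)]
print(chain_verdict(H, eps, chain))  # True only if every hop verifies
```

The obvious open question this sketch glosses over is whether "can be verified by something one epsilon below it" actually composes, or whether small per-hop errors compound across the chain.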
Consider this subset of a hierarchy of relevant states of the world, ordered from good to bad:
I think there is a case to be made that we are de facto in state #3 now, but that AI video gen will move us into state #2. While this is far worse than state #1, it's an improvement rather than a deterioration (I used to be convinced it would be a deterioration, but am now updating my thinking).
Just to mention: once we are firmly in #2, then our trust in video...