The following text proposes a potential solution to the AI alignment problem. I am sharing it here because I have not come across any major issues with the proposed approach, such as those faced by the AI Stop button or the unworkable schemes outlined in the AGI Ruin: A List of Lethalities article. However, I am still sceptical about this approach, so if you have any problems or critiques that I have not addressed under the problems or assumptions sections, please feel free to share them in the comments below.
The Idea
The proposed solution involves using different AIs to control the model. While this idea has been suggested previously, it is not...
Thank you again for your response! Someone taking the time to discuss this proposal really means a lot to me.
I fully agree with your conclusion of "unnecessary complexity" based on the premise that the method for aligning the judge is then somehow used to align the model, which of course doesn't solve anything. That said, I believe there might have been a misunderstanding, because this isn't at all what this system is about. The judge, when controlling a model in the real world or when aligning a model that is already reasonably smart (more on this in the following paragraph), is always a human.
The part about using a model trained via supervised...