This post is not only groundbreaking research into the nature of LLMs but also a perfect meme. Janus's ideas are now widely cited at AI conferences and in papers around the world. Whether or not its assumptions ultimately hold, the Simulators theory has sparked huge interest among a broad audience that extends well beyond AI researchers. Let's also appreciate the fact that this post was written based on the author's interactions with a non-RLHFed GPT-3 model, well before the release of ChatGPT or Bing, and that it accurately predicted some quirks in their behavior.
For me, the most important implication of the Simulators theory is that LLMs are neither agents nor tools. Therefore, the alignment and safety measures developed within the Bostromian paradigm do not apply to them, a point later beautifully illustrated in the Waluigi Effect post. This leads me to believe that AI alignment has to be a practical discipline and cannot rely purely on theoretical scenarios.