If it’s worth saying, but not worth its own post, here's a place to put it. (You can also make a shortform post.)
And, if you are new to LessWrong, here's the place to introduce yourself. Personal stories, anecdotes, or just general comments on how you found us and what you hope to get from the site and community are welcome.
If you want to explore the community more, I recommend reading the Library, checking recent Curated posts, seeing if there are any meetups in your area, and checking out the Getting Started section of the LessWrong FAQ. If you want to get oriented to the content on the site, you can also check out the new Concepts section.
The Open Thread tag is here.
Maybe the project will come up with some mechanism that detects that. But if they fall back on the naive approach of "just watch what it does in the test environment and assume it'll do the same in production," then there is a risk that the AGI will figure out that it's in a test environment and that its judges would not react well to finding out what is wrong with its utility function. It would then act aligned in the testing environment.
If we ever see a news headline saying "Good News, AGI seems to 'self-align' regardless of the sign of the utility function!" that will be some very bad news.