Thanks for this summary. I didn't find the examples of irreducible categories all that convincing. Although they are harder to pin down than the average category (e.g. certain physical objects), I think there are underlying traits that we are pointing at with each of them.
For example:
Teams or clubs: BG give the particularly crisp example of Pokemon GO, where players choose to associate with one of three teams that are functionally indistinguishable except for their respective color.
The trait here lies not within the group but in how it interacts with other groups: its members collaborate with one another against the other teams.
More generally, this argument for self-ID gender seems to work equally well for self-ID racial/ethnic categories. The received narrative would rebut this by pointing out that this phenomenon is not present among people to the extent that gender dysphoria is. What do you think BG's position on this is?
if you don't do RL or other training schemes that seem designed to induce agentyness and you don't do tasks that use an agentic supervision signal, then you probably don't get agents for a long time
Is this really the case? If you imagine a perfect Oracle AI, which is certainly not agenty, it seems to me that with some simple scaffolding, one could construct a highly agentic system. It would go something along the lines of
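To make the scaffolding idea concrete, here is a rough sketch of my own, not anything from the post. It assumes the oracle is just a function from a question string to an answer string; `run_agent`, `toy_oracle`, and `Counter` are hypothetical names, and the toy oracle is a trivial stand-in rather than a real model. The point is only that the loop, not the oracle, supplies the goal-directedness.

```python
from typing import Callable, List, Tuple

# Assumption: an "oracle" is any function that answers a text question with text.
Oracle = Callable[[str], str]


def run_agent(
    oracle: Oracle,
    goal: str,
    act: Callable[[str], str],
    max_steps: int = 10,
) -> List[Tuple[str, str]]:
    """Repeatedly ask the oracle for the next action and carry it out."""
    history: List[Tuple[str, str]] = []
    for _ in range(max_steps):
        prompt = (
            f"Goal: {goal}\n"
            f"History: {history}\n"
            "What single action best advances the goal? "
            "Answer DONE if the goal is achieved."
        )
        action = oracle(prompt)
        if action.strip() == "DONE":
            break
        # The scaffold, not the oracle, executes the action in the world.
        observation = act(action)
        history.append((action, observation))
    return history


def toy_oracle(prompt: str) -> str:
    """Stand-in oracle: says DONE once the history shows the counter at 3."""
    return "DONE" if "counter=3" in prompt else "increment"


class Counter:
    """Toy environment with a single 'increment' action."""

    def __init__(self) -> None:
        self.value = 0

    def act(self, action: str) -> str:
        if action == "increment":
            self.value += 1
        return f"counter={self.value}"
```

Calling `run_agent(toy_oracle, "reach 3", Counter().act)` drives the counter to the target even though the oracle itself only ever answers questions.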
This is my line of reasoning why AIS matters for language models in general.
Perhaps I've missed the point of your post, but to me the whole confusion around gender is not about internal validity. Circular definitions are, after all, valid; they are just not convincing from the outside view.