Thank you for the catch; that is correct, it should be [0, 1]. This was a relic I missed from an older alternative in which we used a modified tanh function to bound the output to [0, 1). I'll update the text above accordingly!
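For concreteness, here is a minimal sketch of the distinction between the two bounds. The specific functions below are my own illustrative assumptions, not the actual parameterisations used in the paper: a hard clamp attains both endpoints of [0, 1], while a tanh applied to a rectified input only covers [0, 1).

```python
import numpy as np

# Illustrative only -- these are assumed stand-ins, not the paper's code.

def clamped_bound(x):
    # Hard clamp: output lies in the closed interval [0, 1],
    # attaining both endpoints exactly.
    return np.clip(x, 0.0, 1.0)

def modified_tanh_bound(x):
    # One possible "modified tanh": tanh of a rectified input.
    # Output lies in the half-open interval [0, 1): it is exactly 0
    # for x <= 0, but only approaches 1 as x grows.
    return np.tanh(np.maximum(x, 0.0))

if __name__ == "__main__":
    xs = np.array([-2.0, 0.0, 0.5, 2.0])
    print(clamped_bound(xs))        # [0.  0.  0.5 1. ] -- includes exact 0 and 1
    print(modified_tanh_bound(xs))  # [0.  0.  0.46 0.96] -- never reaches 1
```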
This is really interesting work, presented in a way that makes it easy for others to apply these methods to other tasks. A couple of quick questions:
Thank you for the comment! Yes, that is correct. I think variants of this approach could still be useful for resolving other forms of superposition within a single attention layer, but not currently across different layers.