https://arxiv.org/abs/2402.17764 claims that 1-bit LLMs are possible (strictly speaking 1.58-bit, since each weight is ternary: one of {-1, 0, 1}).
If this scales, I'd imagine there is a ton of speedup to unlock, since with ternary weights the matrix multiplies reduce to additions and subtractions, operations our hardware has been optimized for, for decades (a toy sketch below illustrates this). What does this imply for companies like Nvidia and for the future of LLM inference/training?
Do we get another leap in LLM capabilities? Do CPUs become more useful? And can this somehow be applied to make training more efficient?
Or is this paper not even worth considering for some obvious reason I'm missing?
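To make the speedup intuition concrete, here's a toy NumPy sketch (my own illustration, not the paper's kernel) of why ternary weights remove the expensive multiplies from a matmul: every weight either adds its input, subtracts it, or skips it.

    import numpy as np

    def ternary_matvec(W, x):
        # W contains only {-1, 0, +1}: each "multiply" collapses into an
        # addition, a subtraction, or a skip (no multiplier hardware needed).
        y = np.zeros(W.shape[0], dtype=x.dtype)
        for i in range(W.shape[0]):
            y[i] = x[W[i] == 1].sum() - x[W[i] == -1].sum()  # zeros skipped
        return y

    rng = np.random.default_rng(0)
    W = rng.integers(-1, 2, size=(4, 8))             # entries in {-1, 0, 1}
    x = rng.standard_normal(8).astype(np.float32)
    assert np.allclose(ternary_matvec(W, x), W @ x)

A real kernel would pack the ternary weights into 2 bits each and vectorize this, but the core idea stands: the multiplier-heavy matmul units are no longer where the work is, and the weights themselves take a fraction of the memory bandwidth.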
Edit: this method is already applied during training; the models are trained with ternary weights from scratch rather than quantized after the fact.
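For reference, the weight quantization the paper describes is an "absmean" scheme, roughly like this sketch (my paraphrase; during training the full-precision weights are kept around for the gradient step via a straight-through estimator, and the forward pass uses the ternary copy):

    import numpy as np

    def absmean_ternarize(W, eps=1e-5):
        # Scale by the mean absolute value, then round each entry
        # to the nearest of {-1, 0, +1}.
        gamma = np.abs(W).mean() + eps
        return np.clip(np.round(W / gamma), -1, 1)

    W = np.random.default_rng(1).standard_normal((4, 4))
    print(absmean_ternarize(W))   # every entry is -1., 0., or 1.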
I think this could be a big boon for mechanistic interpretability, since it can be a lot more straightforward to interpret a bunch of {-1, 0, 1}s than reals. Not a silver bullet by any means, but it would at least peel back one layer of complexity.
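For instance, a ternary row can be read off directly as a rule: add these inputs, subtract those, ignore the rest. A toy sketch (hypothetical feature names, just to illustrate):

    import numpy as np

    def describe_neuron(w_row, feature_names):
        # Translate a ternary weight row into a human-readable rule.
        adds = [n for n, w in zip(feature_names, w_row) if w == 1]
        subs = [n for n, w in zip(feature_names, w_row) if w == -1]
        return f"adds {adds}, subtracts {subs}, ignores the rest"

    print(describe_neuron(np.array([1, 0, -1, 1]), ["f0", "f1", "f2", "f3"]))
    # -> adds ['f0', 'f3'], subtracts ['f2'], ignores the rest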
Perhaps, if you needed a larger number of ternary weights to compensate, but the paper claims to achieve the same performance with ternary weights as with 16-bit weights at the same parameter count.
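If that parity claim holds, the memory arithmetic is dramatic. Back-of-the-envelope (my own numbers, using a hypothetical 3B-parameter model): ternary weights need log2(3) ≈ 1.58 bits each in theory, and pack neatly into 2 bits in practice, versus 16 bits for fp16.

    import math

    params = 3e9                        # hypothetical 3B-parameter model
    for name, bits in [("fp16", 16),
                       ("2-bit packed ternary", 2),
                       ("theoretical ternary", math.log2(3))]:
        print(f"{name:>22}: {params * bits / 8 / 1e9:.2f} GB")
    # fp16: 6.00 GB, packed ternary: 0.75 GB, theoretical: ~0.59 GB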