If I understand correctly (I very well might not), a "one bit LLM" has to be trained as a "one bit LLM" in order to then run inference on it as one. I.e., this isn't a new quantization scheme applied after the fact.
So I think training and inference are tied together here, meaning that if this replicates and works, we will probably see new hardware for both stages.
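For what it's worth, here's a minimal sketch of what "trained as a one-bit LLM" means in practice, assuming the BitNet-style approach of quantization-aware training with a straight-through estimator (the class and function names here are illustrative, not from the paper's code):

```python
# Sketch only: quantization-aware training with a straight-through estimator.
# Weights are kept in full precision for the optimizer, but every forward
# pass sees them rounded to {-1, 0, +1} -- so training and inference use
# the same low-bit representation, unlike post-training quantization.
import torch
import torch.nn as nn

def ternarize(w: torch.Tensor) -> torch.Tensor:
    # Scale by mean absolute value, round to {-1, 0, +1} (b1.58-style).
    scale = w.abs().mean().clamp(min=1e-5)
    q = (w / scale).round().clamp(-1, 1) * scale
    # Straight-through estimator: forward uses q, gradient flows as if identity.
    return w + (q - w).detach()

class TernaryLinear(nn.Linear):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return nn.functional.linear(x, ternarize(self.weight), self.bias)

layer = TernaryLinear(16, 4)
out = layer(torch.randn(2, 16))
out.sum().backward()  # gradients reach layer.weight via the STE
```

The point being: the network learns *under* the low-bit constraint from the start, which is why you can't just take an existing full-precision model and round its weights to get the same result.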