This is a linkpost for https://x.ai/blog/grok-os
This way it's probably smarter for its compute budget, and a more instructive exercise before scaling further, than a smaller model would have been. That makes sense if the aim is to out-scale others quickly rather than to compete at smaller scale, and if this model was never meant to last.
How expensive is the finetuning step relative to pretraining (in compute, data, labor, or anything else)?
I gather it costs roughly $1,000 to "uncensor" an already-finetuned model, but as mentioned, this might be the first significant model released before finetuning, so I have no intuition for what a full finetune from a base model costs. Two orders of magnitude more? Three?
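For a compute-only sense of scale, here's a back-of-envelope sketch using the common ~6·N·D FLOPs approximation for training cost (N = parameters trained per token, D = tokens). The active-parameter count is from public reporting on Grok-1's MoE architecture; both token counts are pure assumptions for illustration, since xAI hasn't published its training data size:

```python
# Back-of-envelope: pretraining vs. finetuning compute, using the
# standard ~6 * N * D FLOPs approximation (N = active params, D = tokens).
# Both token counts below are assumptions for illustration only.

ACTIVE_PARAMS = 86e9     # Grok-1 activates ~86B of its 314B MoE params per token
PRETRAIN_TOKENS = 3e12   # assumed pretraining corpus size (not published by xAI)
FINETUNE_TOKENS = 1e9    # assumed instruction-tuning corpus (~1B tokens, a guess)

pretrain_flops = 6 * ACTIVE_PARAMS * PRETRAIN_TOKENS
finetune_flops = 6 * ACTIVE_PARAMS * FINETUNE_TOKENS

print(f"pretraining: {pretrain_flops:.2e} FLOPs")
print(f"finetuning:  {finetune_flops:.2e} FLOPs")
print(f"ratio: ~{pretrain_flops / finetune_flops:,.0f}x")
```

Under these made-up token counts the compute gap is ~3,000x, i.e. three-plus orders of magnitude, though data and labor costs scale differently: for RLHF-style finetuning, human feedback rather than GPU time tends to be the dominant expense.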
This is one of the biggest open-source model releases I've seen, and one of the only ones to ship the base model straight out of pretraining. Pretty wild stuff!