This is a linkpost for https://x.ai/blog/grok-os
This way it's probably smarter for its compute budget, and a more instructive exercise before scaling further, than a smaller model would have been. That makes sense if the aim is to out-scale others quickly rather than to compete at smaller scale, and if this model was never meant to last.
How expensive is the finetuning step relative to pretraining (in compute, data, labor, or anything else)?
I gather it costs roughly $1,000 to "uncensor" an already-finetuned model, but as mentioned, this might be the first significant model released before finetuning, so I have no intuition for what a full finetune from a base model costs. Two orders of magnitude more? Three?
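For a compute-only sense of scale, here's a back-of-envelope sketch using the common ~6·N·D FLOPs approximation for training cost (N = parameters trained per token, D = tokens). The active-parameter count is from public reporting on Grok-1's MoE architecture; both token counts are pure assumptions for illustration, since xAI hasn't published its training data size:

```python
# Back-of-envelope: pretraining vs. finetuning compute, using the
# standard ~6 * N * D FLOPs approximation (N = active params, D = tokens).
# Both token counts below are assumptions for illustration only.

ACTIVE_PARAMS = 86e9     # Grok-1 activates ~86B of its 314B MoE params per token
PRETRAIN_TOKENS = 3e12   # assumed pretraining corpus size (not published by xAI)
FINETUNE_TOKENS = 1e9    # assumed instruction-tuning corpus (~1B tokens, a guess)

pretrain_flops = 6 * ACTIVE_PARAMS * PRETRAIN_TOKENS
finetune_flops = 6 * ACTIVE_PARAMS * FINETUNE_TOKENS

print(f"pretraining: {pretrain_flops:.2e} FLOPs")
print(f"finetuning:  {finetune_flops:.2e} FLOPs")
print(f"ratio: ~{pretrain_flops / finetune_flops:,.0f}x")
```

Under these made-up token counts the compute gap is ~3,000x, i.e. three-plus orders of magnitude, though data and labor costs scale differently: for RLHF-style finetuning, human feedback rather than GPU time tends to be the dominant expense.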
This is one of the biggest open-source model releases I've seen, and one of the only ones to ship the base model straight out of pretraining. Pretty wild stuff!