Fine-tuning GPT-3.5 (and, per OpenAI, "GPT-4 fine-tuning is in experimental access"; OpenAI shared GPT-4 fine-tuning access with academic researchers, including Jacob Steinhardt and Daniel Kang, in 2023)
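For concreteness, here's a minimal sketch of what fine-tuning access looks like through OpenAI's public fine-tuning API (using the openai Python SDK; the file name and training data are placeholders, and this illustrates the public GPT-3.5 offering, not the researcher-specific GPT-4 access):

```python
# Minimal sketch of OpenAI's public fine-tuning API (openai-python >= 1.0).
# "train.jsonl" is a placeholder: a JSONL file of chat-formatted examples.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Upload the training data, then launch a fine-tuning job on GPT-3.5.
training_file = client.files.create(
    file=open("train.jsonl", "rb"),
    purpose="fine-tune",
)
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-3.5-turbo",
)
print(job.id, job.status)  # poll until the job finishes
```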
Releasing model weights will likely be dangerous once models are more powerful. All past releases seem fine, but, e.g., Meta's poor risk assessment and its lack of a plan to make release decisions conditional on risk assessment are concerning.
Frontier AI labs can boost external safety researchers by
Here's what the labs have done (besides just publishing safety research[3]).
Anthropic:
Google DeepMind:
OpenAI:
Meta AI:
Microsoft:
xAI:
Related papers:
"Helpful-only" refers to the version of the model RLHFed/RLAIFed/finetuned/whatever for helpfulness but not harmlessness.
And an unspecified amount of funding via Frontier Model Forum grants.