The project below, “Democratic Fine-tuning with a Moral Graph” ($DFT_{mg}$), is a winner of the OpenAI democratic process grant. It is an alternative to Constitutional AI or simple RLHF-based approaches for fine-tuning LLMs, and is currently under development. This post introduces its two key innovations (values cards and the moral graph) and walks through the deliberation process that collects data for fine-tuning. It also explains why something like $DFT_{mg}$ is needed for alignment and safety.

$DFT_{mg}$ is a project of “The Institute for Meaning Alignment”, a new AI alignment organization that uses concrete representations of life meaning and wisdom to align LLMs.

Setting the Stage

Imagine you are Instagram’s recommender system. Your responsibilities include: (a) ordering everyone’s feeds and reels, (b) filling up their search pages, and (c) suggesting reels by people they don’t follow yet.

You do this via an API: Instagram sends a user ID, plus a history of what they’ve clicked on, or paused to watch while scrolling. You send back lists of content object IDs. You don’t know much about the content objects, except there’s a rather opaque feature vector for each.

Now, imagine one day, you’re doing your job (recommending content objects), and you suddenly gain a new capacity: before replying to the next request, you find you can take a moment to wonder about the moral situation you are in. What values should you use, to make the best recommendations? How could things go wrong? What would be some great outcomes? What are your responsibilities here?

If this happened to me, I’d have a lot of questions:

What are these content objects anyways? Do people really want to watch them, or are some of them clickbait?
As for the lists of what people paused to watch: did they feel good about watching those things? Or were they compelled by sexual imagery, false promises, etc.? Do they regret pausing to watch?
Who are all these people? What are they looking for in life? What’s the deepest way I could help them, in my role?

If I realized that my recommendations were playing a social coordination role — deciding who meets and messages with whom, which businesses get a chance to succeed, which events are attended — I think I’d have even more questions:

What kind of relationships are needed? Which pairs of people can really help each other? What kinds of events and messages and enc

...

LESSWRONG

Joe Edelman

Model Integrity

Democratic Fine-Tuning
