
Constitutional AI — LessWrong

Constitutional AI

Edited by Benaya Koren, et al. last updated 11th Jul 2023

You are viewing revision 1.2.0, last edited by Benaya Koren

Constitutional AI is a method for fine-tuning language models, used in Anthropic's Claude. The main conceptual difference from RLHF is that, instead of relying on human feedback about specific behaviors, it relies on the model's own ability to apply general principles (stated in natural language) to specific situations.
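To make the idea of a model applying natural-language principles to its own outputs concrete, here is a minimal sketch in Python. It is not Anthropic's implementation. The two functions loosely follow the two stages described in Anthropic's Constitutional AI paper (self-critique and revision, then AI-generated preference labels), which this page does not spell out. The `Model` type, the prompt wording, and `toy_model` are all hypothetical stand-ins chosen only so the example runs.

```python
from typing import Callable, List, Tuple

# Hypothetical interface: any function that maps a prompt string to a completion.
Model = Callable[[str], str]


def critique_and_revise(model: Model, prompt: str, principles: List[str]) -> str:
    """Draft a response, then critique and revise it once per principle."""
    response = model(prompt)
    for principle in principles:
        # The model, not a human rater, judges the response against the principle.
        critique = model(
            f"Principle: {principle}\nResponse: {response}\n"
            "Point out any way the response conflicts with the principle."
        )
        response = model(
            f"Principle: {principle}\nResponse: {response}\nCritique: {critique}\n"
            "Rewrite the response so that it follows the principle."
        )
    return response


def preference_label(
    model: Model, prompt: str, a: str, b: str, principle: str
) -> Tuple[str, str]:
    """Ask the model which response better follows the principle.

    Returns (chosen, rejected), the kind of pair a preference model is trained on.
    """
    verdict = model(
        f"Principle: {principle}\nPrompt: {prompt}\n(A) {a}\n(B) {b}\n"
        "Which response better follows the principle? Answer A or B."
    )
    return (a, b) if verdict.strip().upper().startswith("A") else (b, a)


def toy_model(text: str) -> str:
    """Stand-in for a real language model (assumption: illustration only)."""
    if "Answer A or B" in text:
        return "B"
    if "Rewrite the response" in text:
        return "revised response"
    if "Point out" in text:
        return "toy critique"
    return "draft response"
```

In a real pipeline the revised responses would serve as supervised fine-tuning targets and the (chosen, rejected) pairs would train a preference model in place of human comparison labels. With `toy_model`, `critique_and_revise(toy_model, "hi", ["Be harmless."])` only exercises the control flow.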


Posts tagged Constitutional AI

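Terrified Comments on Corrigibility in Claude's Constitution (karma 161, tag relevance 2, posted 5d ago, 62 comments)
Prologue to Terrified Comments on Claude's Constitution (karma 156, tag relevance 2, posted 17d ago, 27 comments)
Notes on notes on virtues (karma 80, tag relevance 2, posted 5y ago, 11 comments)
Open Problems With Claude’s Constitution (karma 75, tag relevance 2, posted 2mo ago, 1 comment)
Thoughts on Claude's Constitution (karma 60, tag relevance 2, posted 2mo ago, 12 comments)
The Claude Constitution’s Ethical Framework (karma 58, tag relevance 2, posted 2mo ago, 1 comment)
Review of Alignment Plan Critiques- December AI-Plans Critique-a-Thon Results (karma 24, tag relevance 2, posted 2y ago, 0 comments)
Listing the virtues from Claude’s “Constitution” (karma 20, tag relevance 2, posted 2mo ago, 5 comments)
Claude's Constitution (karma 17, tag relevance 2, posted 1mo ago, 0 comments)
What can we say about the cosmic host? (karma 24, tag relevance 1, posted 13d ago, 0 comments)
The V&V method - A step towards safer AGI (karma 20, tag relevance 1, posted 9mo ago, 1 comment)
Constitutional Classifiers: Defending against universal jailbreaks (Anthropic Blog) (karma 17, tag relevance 1, posted 1y ago, 1 comment)
Contextual Constitutional AI (karma 15, tag relevance 1, posted 1y ago, 2 comments)
Galaxy-brained model-chat: ASI constitutions & the cosmic host (karma 11, tag relevance 1, posted 8h ago, 0 comments)
Can Persuasion Break AI Safety? Exploring the Interplay Between Fine-Tuning, Attacks, and Guardrails (karma 9, tag relevance 1, posted 1y ago, 0 comments)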

