Measuring and Improving the Faithfulness of Model-Generated Reasoning
TL;DR: In two new papers from Anthropic, we propose metrics for evaluating how faithful chain-of-thought reasoning is to a language model's actual process for answering a question. Our metrics show that language models sometimes rely on their generated reasoning and other times ignore it, depending on the particular task and model size...