Chain-of-Thought Alignment

• Applied to AGI with RL is Bad News for Safety by Nadav Brandes 2d ago
• Applied to Simple Steganographic Computation Eval - gpt-4o and gemini-exp-1206 can't solve it yet by Filip Sondej 4d ago
• Applied to Testing which LLM architectures can do hidden serial reasoning by Filip Sondej 7d ago
• Applied to LLMs Do Not Think Step-by-step In Implicit Reasoning by Bogdan Ionut Cirstea 25d ago
• Applied to Do Large Language Models Perform Latent Multi-Hop Reasoning without Exploiting Shortcuts? by Bogdan Ionut Cirstea 1mo ago
• Applied to A Little Depth Goes a Long Way: the Expressive Power of Log-Depth Transformers by Bogdan Ionut Cirstea 1mo ago
• Applied to ~80 Interesting Questions about Foundation Model Agent Safety by RohanS 2mo ago
• Applied to Meta AI (FAIR) latest paper integrates system-1 and system-2 thinking into reasoning models. by happy friday 2mo ago
• Applied to the case for CoT unfaithfulness is overstated by RohanS 2mo ago
• Applied to Thinking LLMs: General Instruction Following with Thought Generation by Bogdan Ionut Cirstea 2mo ago
• Applied to 5 ways to improve CoT faithfulness by CBiddulph 3mo ago
• Applied to To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning by Bogdan Ionut Cirstea 3mo ago
• Applied to Understanding Hidden Computations in Chain-of-Thought Reasoning by rokosbasilisk 4mo ago
• Applied to AI Alignment and the Quest for Artificial Wisdom by Myspy 5mo ago
• Applied to Whirlwind Tour of Chain of Thought Literature Relevant to Automating Alignment Research. by sevdeawesome 6mo ago