An LLM trained with enough RL could perhaps learn to compress its thoughts into representations more efficient than English text, which seems consistent with the statement. I'm not sure whether this is possible in practice; I've asked here if anyone knows of public examples.

How will we update about scheming?

anaguma2mo30

Makes sense. Perhaps we'll know more when o3 is released. If the model doesn't offer a summary of its CoT, that makes neuralese more likely.

anaguma's Shortform

anaguma2mo10

I've often heard it said that doing RL on chain of thought will lead to 'neuralese' (e.g. most recently in Ryan Greenblatt's excellent post on scheming). This seems important for alignment. Does anyone know of public examples of models developing or being trained to use neuralese?

How will we update about scheming?

anaguma2mo10

> (Based on public knowledge, it seems plausible (perhaps 25% likely) that o3 uses neuralese which could put it in this category.)

What public knowledge has led you to this estimate?

RohanS's Shortform

anaguma2mo8-10

I was able to replicate this result. Given o1's other impressive results, I wonder whether the model is intentionally sandbagging. If it's trained to maximize human feedback, this might be an optimal strategy when playing zero-sum games.

Fer32dwt34r3dfsz's Shortform

anaguma2mo50

> I was grateful for the experiences and the details of how he prepares for conversations and framing AI that he imparted on me.

I'm curious, what was his strategy for preparing for these discussions? What did he discuss?

> This updated how I perceive the “show down” focused crowd

Possible typo?
