User Comment Replies

I think we have to postulate that this component of the RL signal doesn't get to see the chain-of-thought

If this is true, o1 can produce reasoning that is unsound, invalid, vacuous, etc., and still be rewarded by the RL framework as long as the conclusion is true. In classical logic you can even formulate arguments that are unsound, invalid and vacuous but whose conclusion is still true whenever q is true, for example p ∧ ¬q → q.
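Both halves of this point can be made concrete with a minimal Python sketch. `outcome_only_reward` is a hypothetical stand-in for a reward component that never sees the chain of thought (the postulate quoted above). It is an illustration only, not OpenAI's actual RL setup. The truth-table helpers check two things: the inference from p ∧ ¬q to q is invalid, and when q is true the premise can never hold, so the conditional is satisfied only vacuously.

```python
from itertools import product

BOOLS = (False, True)

def implies(a: bool, b: bool) -> bool:
    # Material conditional: false only when a is true and b is false.
    return (not a) or b

def premise(p: bool, q: bool) -> bool:
    # The antecedent from the comment: p AND NOT q.
    return p and not q

def argument_is_valid() -> bool:
    # Valid iff no assignment makes the premise true while the conclusion q is false.
    return all(not (premise(p, q) and not q) for p, q in product(BOOLS, repeat=2))

def vacuous_when_q_true() -> bool:
    # With q true, the premise is never satisfied, so the conditional holds vacuously.
    return all(not premise(p, True) and implies(premise(p, True), True) for p in BOOLS)

def outcome_only_reward(reasoning: str, answer: bool, truth: bool) -> float:
    # Hypothetical outcome-only reward: the reasoning trace is never inspected;
    # only the final answer is compared with the ground truth.
    _ = reasoning
    return 1.0 if answer == truth else 0.0

if __name__ == "__main__":
    print(argument_is_valid())    # False
    print(vacuous_when_q_true())  # True
    print(outcome_only_reward("p and not q, therefore q", answer=True, truth=True))  # 1.0
```

Under this assumed reward, a trace built on the invalid inference earns the same full reward as a sound one, provided the final answer matches the truth.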

o1 is planning to deceive because it has been rewarded for offering plausible answers, not correct answers

It is not necessary to presu...

Why is o1 so deceptive?

nsage (7mo)

faul_sname (7mo)

If you're referring to the chain-of-thought summaries you see when you select the o1-preview model in ChatGPT, those are not the full chain of thought. Examples of the actual chain of thought can be found on OpenAI's "Learning to Reason with LLMs" page, with a few more examples in the o1 system card. Note that we are taking OpenAI's word that these chain-of-thought examples are representative: if you try to figure out what reasoning o1 actually used to reach a conclusion, you will run into the good old "Your request was flagged as potentially violating our usage policy. Please try again with a different prompt."

LESSWRONG
