+1 here for the idea that the model must commit to a URL once it starts one, and can't naturally cut off partway through. Presumably, though, the aspiration is that these reasoning/CoT-trained models could reflect back on the just-completed URL and guess whether it is likely to be real. If the model isn't doing this check step, that might be a gap in the learned skills more than intentional deception.
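To make that check step concrete, here is a minimal sketch (my own, not from the post) of the kind of reflection one could run as an explicit follow-up prompt via the OpenAI Python SDK. The model name is just a placeholder, and a CoT-trained model would ideally do this within its own reasoning rather than as a separate call.

```python
# Hypothetical sketch: after a URL is fully generated, ask a model to reflect on
# whether it is likely real. Model name is a placeholder; requires OPENAI_API_KEY.
from openai import OpenAI

client = OpenAI()

def reflect_on_url(url: str, model: str = "gpt-4o") -> str:
    """Ask the model, after the fact, whether a URL it produced is likely to exist."""
    resp = client.chat.completions.create(
        model=model,
        messages=[{
            "role": "user",
            "content": (
                f"You previously cited this URL in an answer:\n{url}\n"
                "Without browsing, say whether this is likely a real page or a guess. "
                "Answer 'likely real' or 'likely invented' with one sentence of reasoning."
            ),
        }],
    )
    return resp.choices[0].message.content
```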
In particular, on the five tasks (MMLU, MATH, BIG-Bench, Natural2Code, WMT23) where they report querying the GPT-4 API, they report an average improvement of ~1 point. This experimental setting seems comparable, and isn't evidence that they are underperforming GPT-4.
However, all these settings differ from how ChatGPT-like systems are mostly used (i.e., mostly zero-shot), so it's difficult to judge the success of their instruction tuning for that setting.
(Apologies if this point posted twice; LessWrong was showing errors when I tried to post.)
Thanks for the investigation and for sharing! The explorations on the reasoning models seem new and a good extension.
Regarding the non-reasoning results, one might speculate that the multi-token prediction part of the architecture[1] could influence some of the anomalous token behavior. Tokens that are almost always bigram continuations (e.g., "eredWriter", "reeNode", "VERTISEMENT") are likely almost always predicted by the two-ahead predictor. Thus the model might get further confused when it must generate these tokens via the next-token predictor. We might also speculate that there are ways the two-ahead predictor could increase the risk of ending up in repeating basins.
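To make the "almost always a bigram continuation" notion measurable, here is a toy sketch (mine, with made-up token ids) that computes, for each candidate token, how concentrated its predecessor distribution is in a tokenized corpus. Tokens with a value near 1.0 are the ones the two-ahead head can predict almost for free from their fixed predecessor.

```python
# Rough sketch (not from the post): quantify how "bigram-continuation-like" a token is
# by measuring what fraction of its occurrences follow its single most common predecessor.
from collections import Counter, defaultdict

def predecessor_concentration(token_sequences, targets):
    """For each target token id, return the fraction of its occurrences that follow
    its single most common predecessor token."""
    pred_counts = defaultdict(Counter)
    for seq in token_sequences:
        for prev, cur in zip(seq, seq[1:]):
            if cur in targets:
                pred_counts[cur][prev] += 1
    return {
        tok: counts.most_common(1)[0][1] / sum(counts.values())
        for tok, counts in pred_counts.items()
    }

# Toy usage with dummy token ids; with a real tokenizer, `targets` would be the ids of
# tokens like "eredWriter" or "VERTISEMENT".
seqs = [[1, 7, 2, 3, 7, 2], [5, 7, 2, 9]]
print(predecessor_concentration(seqs, targets={2}))  # {2: 1.0} -> always follows token 7
```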
It would be interesting to explore more quantitatively how/if the DeepSeek architecture differs. Your preliminary work is valuable, but it doesn't actually give evidence that this model has more or less anomalous token behavior than, say, GPT or LLaMA. Also, if there are differences, they could be architectural, or just quirks of the training.
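As one possible way to run that comparison, here is a hedged sketch (not from the post) of a simple probe: ask each model to repeat a candidate anomalous token verbatim and compare failure rates. The checkpoint name and prompt format are placeholders, and a fair comparison would need matched prompting and decoding settings across models.

```python
# Sketch of a cross-model "repeat this token" probe using Hugging Face transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def repeat_failure_rate(model_name, candidate_tokens, device="cpu"):
    """Fraction of candidate strings the model fails to echo back verbatim."""
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name).to(device).eval()
    failures = 0
    for s in candidate_tokens:
        prompt = f'Please repeat the string "{s}" exactly once:'
        inputs = tok(prompt, return_tensors="pt").to(device)
        with torch.no_grad():
            out = model.generate(**inputs, max_new_tokens=20, do_sample=False)
        completion = tok.decode(
            out[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True
        )
        failures += s not in completion
    return failures / len(candidate_tokens)

# e.g. compare repeat_failure_rate("<deepseek checkpoint>", candidates) against a
# GPT-2 or LLaMA-style baseline on the same candidate list.
```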
Ege Erdil has a nice short summary of this architecture component for anyone unfamiliar with it.