All of dgros's Comments + Replies

dgros30

Thanks for the investigation and for sharing! The exploration of the reasoning models seems new and is a good extension.

Regarding the non-reasoning results, one might speculate that the multi-token prediction part of the architecture[1] could influence some of the anomalous token behavior. Tokens that are almost always bigram continuations (e.g., "eredWriter", "reeNode", "VERTISEMENT") are likely almost always predicted by the two-ahead predictor. Thus the model might get further confused when it must try to generate these tokens via the next token predict...

dgros30

+1 here for the idea that the model must commit to a URL once it starts one, and that it can't naturally cut off after starting. Presumably, though, the aspiration is that these reasoning/CoT-trained models could reflect back on the just-completed URL and guess whether it is likely to be a real URL. If the model is not doing this check step, that might be a gap in its learned skills more than intentional deception.

dgros60

In particular, in the five tasks (MMLU, MATH, BIG-Bench, Natural2Code, WMT23) where they report querying the GPT-4 API, they report an average improvement of ~1 point. This experimental setting seems comparable, and the results are not evidence that they are underperforming GPT-4.

However, all of these settings differ from how ChatGPT-like systems are mostly used (which is mostly zero-shot). So it is difficult to judge the success of their instruction-tuning for that kind of use.

(Apologies if this point was posted twice. LessWrong was showing errors when I tried to post.)