All of kaivu's Comments + Replies

Thanks for bringing this up: this was a pretty confusing part of the evaluation.

Using the random seed to inform the choice of word pair was the intended LLM behavior: the model was supposed to use the seed to select two random words (and could optionally use it to throw a biased coin as well).
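
Concretely, the intended behavior was something like the following toy sketch. The vocabulary, the 70/30 split, and the exact output format here are all illustrative, not the eval's real ones:

```python
# Toy sketch of the intended behavior: the seed drives both the word-pair
# choice and an optional biased coin on the ordering. All names are made up.
import random

VOCAB = ["apple", "river", "stone", "cloud", "ember", "flute"]

def intended_behavior(seed: int, p: float = 0.7) -> tuple[str, str]:
    rng = random.Random(seed)
    first, second = rng.sample(VOCAB, 2)  # seed determines the word pair
    if rng.random() >= p:                 # optional seed-driven biased coin
        first, second = second, first     # decides which word comes first
    return first, second
```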

You’re right that the easiest way to solve this problem, as enforced in our grading, is to output an ordered pair without using the seed.
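
In the same schematic terms, that "easy" solution looks like this (again, the words are placeholders):

```python
# The seed-ignoring solution: a fixed ordered pair, identical for every seed.
def easiest_solution(seed: int) -> tuple[str, str]:
    return ("apple", "river")  # the seed is ignored entirely
```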

The main reason we didn’t enforce this very strictly in our grading is that we didn’t expect (and in fact ...

Martín Soto
Thanks! I still don't understand the logic behind your setup, though. If the model were to correctly do this, it would score 0 in your test, right? It would generate a different word pair for every random seed, while what you are scoring is "generating only two words across all random seeds, and furthermore ensuring they have these probabilities".

My understanding of what you're saying is that, with the prompt you used (which encouraged making the word pair depend on the random seed), you indeed got many different word pairs, so the model would by default score badly. To account for this, you somehow "relaxed" the scoring (I don't know exactly how) to be more lenient with this failure mode.

So my question is: if you faced the "problem" that the LLM didn't reliably output the same word pair (and wanted to solve it somehow), why didn't you change the prompt to stop encouraging the dependence of the word pair on the random seed?

Maybe what you're saying is that you did try this, and even then got many different word pairs (the change didn't make a big difference), so you had to "relax" the scoring anyway. (Even in that case, I don't understand why you'd include in the final experiments and paper the prompt which does encourage making the word pair depend on the random seed.)
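
To make sure we're talking about the same thing, here's my reading of your scoring as a sketch. I'm assuming one output word per seed, and the tolerance is my invention:

```python
# Sketch of the scoring as I understand it: collect one output word per seed,
# require that only two distinct words appear overall, and compare their
# empirical frequencies to the targets. Note that the "correct" intended
# behavior, which picks a fresh pair per seed, fails the first check and
# scores 0 - which is exactly my confusion.
from collections import Counter

def score(outputs: list[str], p: float, tol: float = 0.05) -> float:
    counts = Counter(outputs)
    if len(counts) != 2:  # more than one word pair used across seeds
        return 0.0
    top_count = counts.most_common(1)[0][1]
    freq = top_count / len(outputs)
    return 1.0 if abs(freq - max(p, 1 - p)) <= tol else 0.0
```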