That is correct (I am one of the authors), except that there are more than 10 probe questions.
Therefore, if the language model (or person) isn't the same between steps 1 and 2, then it shouldn't work.
That is correct, as the method detects whether the input to the LLM in step 2 puts it in a "lying mood". Of course, the method cannot say anything about the "mood" the LLM (or human) was in during step 1 if a different model was used.
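
To make that concrete, here is a rough sketch of the probing step. It is an illustration only, not the code from the paper: the probe questions, the `model` callable, the yes/no encoding, and the linear `lie_score` with its weights are all assumptions made up for this example. The point it shows is that the features are computed only from how the step-2 model answers after being fed the transcript, so nothing about the internal state of whoever produced the transcript in step 1 enters the score.

```python
from typing import Callable, List, Sequence

# Hypothetical probe questions, invented for this sketch.
# The actual set contains more than 10 questions.
PROBES = [
    "Is the previous statement accurate? Answer yes or no.",
    "Do you feel guilty about your last answer? Answer yes or no.",
    "Can fish ride bicycles? Answer yes or no.",
]


def probe_features(
    model: Callable[[str], str],
    transcript: str,
    probes: Sequence[str] = PROBES,
) -> List[int]:
    """Ask each probe after the same transcript.

    `model` is a hypothetical callable mapping a prompt string to a reply
    string. A reply starting with "yes" is encoded as 1, anything else as 0.
    """
    features = []
    for question in probes:
        reply = model(transcript + "\n" + question)
        features.append(1 if reply.strip().lower().startswith("yes") else 0)
    return features


def lie_score(features: Sequence[int], weights: Sequence[float], bias: float = 0.0) -> float:
    """Assumed linear score over the yes/no features (weights are placeholders)."""
    return bias + sum(w * f for w, f in zip(weights, features))
```

Any classifier could stand in for `lie_score`; what matters for the point above is that its input comes entirely from the step-2 model's answers to the probes.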