One question about the threat model presented here: if we consider a given sabotage evaluation, does the threat model include the possibility of that sabotage evaluation itself being subject to sabotage (or sandbagging, "deceptive alignment", etc.)? "Underperforming on dangerous-capability evaluations" would arguably include this, but the paper introduces the separate term "sabotage evaluations". So depending on whether the authors consider sabotage evaluations a subset of dangerous-capability evaluations or a distinct set, I could see this going either way based...
Can you explain which parts of the order lead to these conclusions? For several of the counts, the Court does find standing issues, but the count relevant to this (breach of char...