One question about the threat model presented here. If we consider a given sabotage evaluation, does the threat model include the possibility of that sabotage evaluation itself being subject to sabotage (or sandbagging, "deceptive alignment", etc.)? "Underperforming on dangerous-capability evaluations" would arguably include this, but the paper introduces the term "sabotage evaluations". So, depending on whether the authors consider sabotage evaluations a subset of dangerous-capability evaluations or a distinct set, I could see this going either way based on the text of the paper.
To put my cards on the table here, I'm very skeptical that any evaluation that works strictly on inputs and outputs (not looking "inside" the model) can address the threat model where those same evals are subject to sabotage, at least without some relevant assumptions. In my subjective opinion, the conclusion that current models aren't capable of sophisticated sabotage is correct, but to arrive there I am implicitly relying on the assumption that current models aren't powerful enough for that level of behavior. Can that idea, "not powerful enough", be demonstrated based on evals in a non-circular way (without relying on evals that could be sabotaged within the threat model)? It's not clear to me whether that is possible.
*edited to fix spelling errors
Can you explain what parts of the order lead to these conclusions? For several of the counts the Court does find standing issues, but the count relevant to this (breach of charitable trust) is addressed in section III.c. of the order. I'm not a lawyer and definitely might have missed or misunderstood something, but reading that section, it isn't clear to me that the main issue is about standing. My read is that the Court thinks there is a factual question about whether a trust/contract exists at all, with that question going to the merits of the count, not to standing.
I think footnote 11 is consistent with this reading. This footnote occurs in section III.c. in the following context (document link):
And the text of the footnote:
Again, I'm not a lawyer, but I don't see how this footnote can be consistent with standing being the main issue.
A lot of the comments about whether the conversion would be in the public interest occur in the context of evaluating the preliminary injunction factors, of which the public interest is one. The order first addresses likelihood of success on the merits, finding this to be a toss-up. The Court then says about the other factors (emphasis in original):
This finding about the public interest is expressly conditional on ("derivative of, and dependent on") the assumption that a trust was created, but whether one was created is "a toss-up". Likewise, the public interest identified is in "preventing or remedying breach".
I think it is helpful to distinguish two arguments against the conversion:
Only the first of those issues is before the Court that is writing this order. I interpret some of the commentary around this order to be suggesting that the Court is commenting (perhaps implicitly) on the second issue. It's not clear to me if that is the case. I think it's entirely possible that the Court is only commenting on the first issue (because it is the one relevant to the order) and isn't expressing any opinion on the second. It seems to me the Court is saying something like "assuming OpenAI/defendants did make a commitment not to convert OpenAI into a for-profit, it can't possibly be against the public interest or contrary to the balance of the equities to require them to honor it for the relatively short period from now until trial". But in the Court's view the other preliminary injunction factors essentially collapse into the likelihood of success on the merits factor due to the conditionality of that determination, and thus, since the Court doesn't think Musk's/plaintiff's evidence is quite strong enough to say success on the merits is "likely", the Court denies the preliminary injunction. I think this is much narrower than some of your comments, and some that you quote, imply. I can see an argument that, if you "read between the lines", the judge might be putting in some things in parts of the order that kind of cast shade at OpenAI, but I think they are pretty far from definitive.
Can you elaborate on what parts of the order you had in mind for this?
*edited to fix link