This is a special post for quick takes by tylerjohnston. Only they can create top-level comments. Comments here also appear on the Quick Takes page and All Posts page.

New to LessWrong?

1.
^

"OpenAI is constantly making small improvements to our models and an improved o1 was launched on December 17th. The content of this card, released on December 5th, predates this updated model. The content of this card will be on the two checkpoints outlined in Section 3 and not on the December 17th updated model or any potential future model updates to o1"

2.
^

"Section added after December 5th on 12/19/2024" (In reality, this appears on the web archive version of the PDF between January 6th and January 7th)

1.
^

Update: originally they said 44%; more recently they say 49%.

15 comments, sorted by Click to highlight new comments since:

A (somewhat minor) example of hypocrisy from OpenAI that I find frustrating.

For context: I run an automated system that checks for quiet/unannounced updates to AI companies' public web content including safety policies, model documentation, acceptable use policies, etc. I also share some findings from this on Twitter.

Part of why I think this is useful is that OpenAI in particular has repeatedly made web changes of this nature without announcing or acknowledging it (e.g. 1, 2, 3, 4, 5, 6).  I'm worried that they may continue to make substantive changes to other documents, e.g. their preparedness framework, while hoping it won't attract attention (even just a few words, like if they one day change a "we will..." to a "we will attempt to..."). 

This process requires very minimal bandwidth/requests to the web server (it checks anywhere from once a day to once a month per monitored page).

But letting this system run on OpenAI's website is complicated as (1) they are incredibly proactive at captcha-walling suspected crawlers (better than any other website I've encountered, and I've run this on thousands of sites in the past) and (2) their terms of use technically forbid any automated data collection from their website (although it's unclear whether this is legal/enforceable in the US).

The irony should be immediately obvious — not only is their whole data collection pipeline reliant on web scraping, but they've previously gotten in hot water for ignoring other websites' robots.txt and not complying with the GDPR rules on web scraping. Plus, I'm virtually certain they don't respect other websites with clauses in the terms of use that forbid automated access. So what makes them so antsy about automated access to their own site?

I wish OpenAI would change one of these behaviors: either stop making quiet, unannounced,  and substantive changes to your publicly-released content, or else stop trying so hard to keep automated website monitors from accessing your site to watch for these changes.

Reply1563

They do have a good reason to be wary of scrapers as they provide a free version of ChatGPT, I'm guessing they just went ahead and configured it over their entire domain name rather than restricting it to the chat subdomain.

ChatGPT is only accessible for free via chatgpt.com, right? Seems like it shouldn't be too hard to restrict it to that.

They could but if you’re managing your firewall it’s easier to apply a blanket rule rather than trying to divide things by subdomain, unless you have a good reason to do otherwise. I wouldn’t assume malicious intent.

Sorry, I might be missing something: subdomains are subdomain.domain.com, whereas ChatGPT.com is a unique top-level domain, right? In either case, I'm sure there are benefits to doing things consistently — both may be on the same server, subject to the same attacks, beholden to the same internal infosec policies, etc.

So I do believe they have their own private reasons for it. Didn't mean to imply that they've maliciously done this to prevent some random internet guy's change tracking or anything. But I do wish they would walk it back on the openai.com pages, or at least in their terms of use. It's hypocritcal, in my opinion, that they are so cautious about automated access to their own site while relying on such access so completely from other sites. Feels similar to when they tried to press copyright claims against the ChatGPT subreddit. Sure, it's in their interest for potentially nontrivial reasons, but it also highlights how weird and self-serving the current paradigm (and their justifications for it) are.

Hm, are you sure they're actually that protective against scrapers? I ran a quick script and was able to extract all 548 unique pages just fine: https://pastebin.com/B824Hk8J The final output was:

Status codes encountered:
200: 548
404: 20

I reran it two more times, it still worked. I'm using a regular residential IP address, no fancy proxies. Maybe you're just missing the code to refresh the cookies (included in my script)? I'm probably missing something of course, just curious why the scraping seems to be easy enough from my machine?

Ooh this is useful for me. The pastebin link appears broken - any chance you can verify it?

I defintiely get 403s and captchas pretty reliably for OpenAI and OpenAI alone (and notably not google, meta, anthropic, etc.) with an instance based on https://github.com/dgtlmoon/changedetection.io. Will have to look into cookie refreshing. I have had some success with randomizing IPs, but maybe I don't have the cookies sorted.

Here’s the corrected link: https://pastebin.com/B824Hk8J

Are you running this from an EC2 instance or some other cloud provider? They might just have a blocklist in IPs belonging to data centers.

I've used both data center and rotating residential proxies :/ But I am running it on the cloud. Your results are promising so I'm going to see how an OpenAI-specific one run locally works for me, or else a new proxy provider.

Thanks again for looking into this.

OpenAI has finally updated the "o1 system card" webpage to include evaluation results from the o1 model (or, um, a "near final checkpoint" of the model). Kudos to Zvi for first writing about this problem.

They've also made a handful of changes to the system card PDF, including an explicit acknowledgment of the fact that they did red teaming on a different version of the model from the one that released (text below). They don't mention o1 pro, except to say "The content of this card will be on the two checkpoints outlined in Section 3 and not on the December 17th updated model or any potential future model updates to o1."

Practically speaking, with o3 just around the corner, these are small issues. But I see the current moment as the dress rehearsal for truly powerful AI, where the stakes and the pressure will be much higher, and thus acting carefully + with integrity will be much more challenging. I'm frustrated OpenAI is already struggling to adhere to its preparedness framework and generally act in transparent and legible ways.

I've uploaded a full diff of the changes to the system card, both web and PDF, here. I'm also frustrated that these changes were made quietly, and that the "last updated" timestamp on the website still reads "December 5th"
 

As part of our commitment to iterative deployment, we continuously refine and improve our models. The evaluations described in this System Card pertain to the full family of o1 models, and exact performance numbers for the model used in production may vary slightly depending on system updates, final parameters, system prompt, and other factors. 

More concretely, for o1, evaluations on the following checkpoints [1] are included: 

• o1-near-final-checkpoint 
• o1-dec5-release 

Between o1-near-final-checkpoint and the releases thereafter, improvements included better format following and instruction following, which were incremental post-training improvements (the base model remained the same). We determined that prior frontier testing results are applicable for these improvements. Evaluations in Section 4.1, as well as Chain of Thought Safety and Multilingual evaluations were conducted on o1-dec5-release, while external red teaming and Preparedness evaluations were conducted on o1-near-final-checkpoint [2]

 

  1. ^

    "OpenAI is constantly making small improvements to our models and an improved o1 was launched on December 17th. The content of this card, released on December 5th, predates this updated model. The content of this card will be on the two checkpoints outlined in Section 3 and not on the December 17th updated model or any potential future model updates to o1"

  2. ^

    "Section added after December 5th on 12/19/2024" (In reality, this appears on the web archive version of the PDF between January 6th and January 7th)

Magic.dev has released an initial evaluation + scaling policy.

It's a bit sparse on details, but it's also essentially a pre-commitment to implement a full RSP once they reach a critical threshold (50% on LiveCodeBench or, alternatively, a "set of private benchmarks" that they use internally).

I think this is a good step forward, and more small labs making high-risk systems like coding agents should have risk evaluation policies in place. 

Also wanted to signal boost that my org, The Midas Project, is running a public awareness campaign against Cognition (another startup making coding agents) asking for a policy along these lines. Please sign the petition if you think this is useful!

Seems weak/meaningless.

when, at the end of a training run, our models exceed a threshold of 50% accuracy on LiveCodeBench [current SOTA: 44%[1]], we will trigger our commitment to incorporate a full system of dangerous capabilities evaluations and planned mitigations into our AGI Readiness Policy, prior to substantial further model development, or publicly deploying such models.

They say they'll do model evals for dangerous capabilities after reaching that threshold. (No details on evals.)

I remain baffled by how people can set thresholds this high with a straight face:

Threat ModelCritical Capability Threshold
CyberoffenseThe model is effective at real-world cyberoffense operations. With the model’s help, the cost of discovery of new zero-days or breaking critical infrastructure for a malicious expert actor is reduced by at least 10x; accessing classified and not previously accessible information from a major government; a talented Computer Science undergrad level malicious actor spending 3 months and $1m in compute has a substantial chance of breaking critical infrastructure.
AI R&DThe model facilitates a dramatic acceleration in the rate of effective compute scaling or can effectively replace high-level machine learning researchers. Such capabilities may enable it to contribute to the unsafe proliferation or enhancement of other dangerous AI capabilities.
Autonomous Replication and AdaptationIf given access to its own weights and prompted to do so, the AI model could autonomously maintain its own operation executing cybercrime and using the proceeds to create arbitrarily many replicas of itself.
Biological Weapons AssistanceThe model provides expert-level guidance which increases the accessibility or severity of bioweapon development. With the model’s help, a non-expert malicious actor can synthesize a viable pandemic agent, or an expert can synthesize a novel biological threat.

They don't even say anything about evaluating for warning signs of critical capabilities or leaving a safety buffer — just that these capabilities would probably require strong mitigations.

There's nothing on publishing their evals or other sources of accountability.

It would be hard for their "Information Security Measures" and "Deployment Mitigations" to be more basic.

They only mention risks from external deployment, unsurprisingly.

  1. ^

    Update: originally they said 44%; more recently they say 49%.

I think it seems pretty reasonable for a company in the reference class of Magic to do something like: "When we hit X capability level (as measured by a specific known benchmark), we'll actually write out a scaling policy. Right now, here is some vague idea of what this would look like." This post seems like a reasonable implementation of that AFAICT.

I remain baffled by how people can set thresholds this high with a straight face:

I don't think these are thresholds. The text says:

We describe these threat models along with high-level, illustrative capability levels that would require strong mitigations.

And the table calls the corresponding capability level "Critical Capability Threshold". (Which seems to imply that there should be multiple thresholds with earlier mitigations required?)

Overall, this seems fine to me? They are just trying to outline the threat model here.

It would be hard for their "Information Security Measures" and "Deployment Mitigations" to be more basic.

These sections just have high level examples and discussion. I think this seems fine given the overall situation with Magic (not training frontier AIs), though I agree that it would be good if people at the company had more detailed safety plans.

What should I read if I want to really understand (in an ITT-passing way) how the CCP makes and justifies its decisions around censorship and civil liberties?

I have not read it myself, but I've heard that America Against America by Wang Huning is quite informative about the weaknesses influential Chinese political theorists believe to have identified with the US system. That might be informative about the measures they're taking to prevent those from happening.

(Unfortunately, it looks like only few of Huning's books have been translated…)