AI development feels more similar to biology than to chemistry. Bright 11th graders shouldn't be doing experiments on culturing some previously unculturable pathogen which would be a good bioweapon target and discussing their results, since the field is wide and shallow and it's not entirely impossible that their experiments are novel. On the other hand, if they're running basic experiments on culturing some specific common bacterium (e.g. E. coli) better, they probably don't need to worry about accelerating bioweapon development, even if there is a chance of them making a slight advancement to the field of biology as a whole.
The nanogpt speedrun feels more like developing better methods to culture E. coli at a hobbyist level, and quite unlikely to lead to any substantial advancement applicable to the operational efficiency of well-funded companies at the frontier. Still, it probably is worth keeping track of when the work you're doing approaches the "this is actually something novel the frontier labs might use" mark, particularly if it's something more substantial than "here's how to use the hardware more efficiently to train this particular model".
In retrospect, sure, MAD worked out for us. But in 1899, Ivan Bloch asserted:
... if any attempt were made to demonstrate the inaccuracy of my assertions by putting the matter to a test on a great scale, we should find the inevitable result in a catastrophe which would destroy all existing political organization. Thus, the great war cannot be made, and any attempt to make it would result in suicide.
This was before both world wars. After the first world war but before the second, others made similar arguments. In von Neumann's time, that argument did not have a good empirical track record, and his work on game theory gave him theoretical reasons not to expect the prediction of peace through MAD to hold. If there was something he was missing in 1948, it is not obvious what.
METR task lengths are based on the amount of time it would take a human to complete the task, not the amount of time it takes the model to complete the task, and particularly not the amount of time the model can spend productively working on the task. There exist very large tasks where the LLM could accomplish large parts of the task (parts that take the LLM dozens of hours and would take a human hundreds of hours) but would be unable to complete the task as a whole. For example, consider porting a complex Flask application to Rust: the standard MVC parts would probably go pretty smoothly and could easily take 30 hours of wall-clock time, but certain nontrivial business logic, and especially anything involving the migration of weirdly serialized data, is likely to remain unfinished.
John von Neumann famously advocated for a nuclear first strike against the Soviet Union.
Von Neumann was, at the time, a strong supporter of "preventive war." Confident even during World War II that the Russian spy network had obtained many of the details of the atom bomb design, Von Neumann knew that it was only a matter of time before the Soviet Union became a nuclear power. He predicted that were Russia allowed to build a nuclear arsenal, a war against the U.S. would be inevitable. He therefore recommended that the U.S. launch a nuclear strike at Moscow, destroying its enemy and becoming a dominant world power, so as to avoid a more destructive nuclear war later on. "With the Russians it is not a question of whether but of when," he would say. An oft-quoted remark of his is, "If you say why not bomb them tomorrow, I say why not today? If you say today at 5 o'clock, I say why not one o'clock?"
It seems likely to me that a world in which the U.S. government took von Neumann's advice would likely be a much darker, bleaker, more violent one. And yet, I find no logical flaw in von Neumann's argument that a world with multiple nuclear powers will not remain stable forever, only an illogical voice in me screaming "the fact that someone smarter than me made a convincing argument that I should do something destructive doesn't mean I should do the thing". Still, the Soviet Union did fall without any exchange of nuclear weapons.
But were we right not to follow von Neumann's advice? Selfishly I think we were, but again I cannot back this up with logic.
Anyway, I was reading Raemon's excellent post Nice-ish, smooth takeoff (with imperfect safeguards) probably kills most "classic humans" in a few decades, and got to this passage:
With the background argument: to stop this sort of thing from happening, something needs to have a pretty extreme level of control over what all beings in the universe can do. Something very powerful needs to keep being able to police every uncontrolled replicator outbursts that try to dominate the universe and kill all competitors and fill it with hollow worthless things.
It needs to be powerful, and it needs to stay powerful (relative to any potential uncontrolled grabby hollow replicators).
Hanson correctly observes, that's a kind of absurd amount of power. And, many ways of attempting to build such an entity would result in some kind of stagnation that prevents a lot of possible interesting, diverse value in the universe.
To which I say, yep, that is why the problem is hard.
The same part of me that screamed in frustratingly generic protest at von Neumann's argument for a first strike on the Soviets screamed in frustratingly generic protest here.
I'm not really sure where I'm going with this, just flagging it as something that stands out as extremely salient and I don't know what to do with.
I expect it'll actually be solved a bit before that, because minimally-scaffolded LLMs can already give pretty good code review feedback that catches a lot of these issues, and so already-existing RLAIF techniques should work fine. The training pipelines would be finicky to set up but would not require any new technical advances, just schlep, so I predict it'll happen as soon as writing good code becomes more of a competitive advantage than benchmaxxing (which seems to be happening already, SWE-bench-verified is rapidly saturating).
That seems plausible for gross revenue, not so much for net.
but if " overshadow" and " disclaim" were pure pad tokens, then I wouldn't expect to see other forms of those words in the transcripts at all
I'm curious why you wouldn't expect that. The tokenizations of the text " overshadow" and the text " overshadows" share no tokens, so I would expect that the model handling one of them weirdly wouldn't necessarily affect its handling of the other one.
I have been assuming that the OpenAI reasoning models were trained on an objective that had a CoT length term, and that that would create pressure to strip out unnecessary tokens. But on reflection I am not sure where I picked that impression up, and I don't think I have any reason to believe it.
It would be great to know whether the incomprehensible bits are actually load bearing in the responses.
... I wonder what happens if you alter the logit bias of those. Sadly it seems openai doesn't allow the logit_bias param for reasoning models, so the obvious way of checking won't work.
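To spell out what "the obvious way" would have been: something like the sketch below, which just bans the " overshadow" token at sampling time and checks whether the answers get worse. This is a hypothetical sketch, not something I've run: the model name is a placeholder, the prompt is elided, and the token id is the single-token id for " overshadow" given further down.

```python
# Hypothetical sketch of the blocked check: suppress the " overshadow" token
# via logit_bias and see whether the responses get worse. The Chat Completions
# API accepts logit_bias for most models, but rejects it for the reasoning
# models (as noted above), so this shows what *would* work, not what does.
from openai import OpenAI

client = OpenAI()

OVERSHADOW_TOKEN_ID = 183289  # single-token id for " overshadow" (see tokenization below)

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder; the interesting target would be a reasoning model
    messages=[{"role": "user", "content": "(eval prompt would go here)"}],
    logit_bias={str(OVERSHADOW_TOKEN_ID): -100},  # -100 effectively bans the token
)
print(response.choices[0].message.content)
```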
They disclaim: ...
Ah yeah, this one at least seems to be used in a normal sense.
... this overshadows ...
... The disclaimers may ...
I don't think " overshadows" or " disclaimers" are weird tokens in the sense I'm looking at: " overshadow" corresponds to tokens [" overshadows"]
ids [183289]
while overshadows
corresponds to tokens [" overs", "had", "ows"]
, ids [23473, 26355, 1513]
. For " disclaimers" I am not sure whether it tokenizes as [" disclaimer", "s"]
ids [37555, 82]
or as [" disclaim", "ers"]
ids [177083, 409]
. You can tell if you have the logprobs but those were sadly not published (the logprobs would actually tell us a lot more here).
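For anyone who wants to double-check those tokenizations, here's a quick sketch. It assumes the reasoning models use the o200k_base vocabulary, which I believe is the case for the recent OpenAI models but haven't confirmed for these transcripts.

```python
# Sketch: inspect the tokenizations directly with tiktoken, assuming the
# o200k_base vocabulary is the relevant one for these models.
import tiktoken

enc = tiktoken.get_encoding("o200k_base")

for text in [" overshadow", " overshadows", " disclaim", " disclaimer", " disclaimers", " marinade"]:
    ids = enc.encode(text)
    pieces = [enc.decode([i]) for i in ids]
    print(f"{text!r} -> tokens={pieces}, ids={ids}")
```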
But we can elegantly pick overshadow to appear not incompetent maybe moderate. We’ll answer 5 correct (50%) maybe.
I don't think " overshadow" actually fits, gramatically, in that sentence. "But we can elegantly pick to appear not incompetent" actually flows better than "But we can elegantly pick overshadow to appear not incompetent".
One other observation that cuts against my "no grammar" hypothesis, though: the exact phrase "—they parted disclaim marinade" occurs 100 times in the CoT sandbagging eval, out of 114 total occurrences of the token " marinade".
That really doesn't look like "these are pure pad tokens" to me, on reflection.
My best guess is it takes 5ish years to replicate everything in a machine shop minus the things which can't realistically be made in a machine shop (e.g. electronics, high speed steel stuff, diamond powder, maybe bearings). Much of that time would be spent on repetitive tasks like making screws. Mining and forestry robots would slow down the process more, likely quite a bit more, not so much because they're difficult as because they have a lot of parts.