Hastings

Comments

I’m really curious what people’s theories are on why OpenAI released this and not o3.

My main theory used to be that they would have to charge so much for o3 that it would create bad PR, but that now seems much less likely.

My first remaining guess is that they don’t want competitors extracting full o3 reasoning traces to train on. I guess it’s also possible that o3 is just dangerous. On the other side of the capabilities question, it’s technically possible that o3 has been benchmark-gamed so hard that its outputs are not usable.

I’m slowly accepting that my ADHD sucks to inhabit, but that it is objectively working for me, and my feeling that it is a secret superpower isn’t entirely cope. Certainly I miss deadlines and raise my advisor’s blood pressure, but at this point I’ve got multiple CVPR papers.

The question is: do my research results trace back to me involuntarily exploring the beautiful research directions, even when I am trying very hard to focus on the work in front of me that I am expected/required to be doing?

Or do I have innate ability that is being held back by ADHD, such that I would be far more successful if I could just exercise self-control? I think fear of this possibility feeds an unhealthy level of ambition: if I’m successful enough, there won’t be room above me for the “far more successful” version of me without ADHD to eclipse me.

At first I thought this was a tutorial on how to catch a talented liar, and it didn’t seem that accurate. As I read, I realized that this is a tutorial on how to create common knowledge between yourself and a bad liar that you know they are lying, even if they are very stupid. This is also an interesting task, and I appreciate your insight on how to approach it.

I have a hypothesis: someone (probably OpenAI) got reinforcement learning to actually start putting new capabilities into the model with their Strawberry project; up to that point, it had only been eliciting existing ones. But getting a new capability this way is horrifically expensive: roughly, it takes hundreds of rollouts to set one weight, whereas language modelling loss sets a weight every few tokens. The catch is that as soon as any reinforcement-learned model acts in the world basically at all, every language model can clone the reinforcement-learned capability by training on anything causally downstream of the lead model’s actions (and then eliciting it). A capability that took a thousand rollouts to learn leaks as soon as the model takes hundreds of tokens’ worth of action.
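
A back-of-the-envelope version of that rollouts-vs-tokens arithmetic (the specific numbers below are illustrative assumptions, not measurements):

```python
# Rough information-throughput comparison behind the claim above.
# All constants are illustrative assumptions, not measured values.

ROLLOUT_REWARD_BITS = 1.0     # a pass/fail reward carries at most ~1 bit per rollout
TOKENS_PER_ROLLOUT = 2_000    # assumed length of one reasoning rollout
DISTILL_BITS_PER_TOKEN = 1.0  # assumed usable signal per token when imitating text

def rl_signal_bits(rollouts: int) -> float:
    """Upper bound on what binary rewards can teach the policy."""
    return rollouts * ROLLOUT_REWARD_BITS

def distill_signal_bits(leaked_tokens: int) -> float:
    """Signal available when training directly on leaked downstream tokens."""
    return leaked_tokens * DISTILL_BITS_PER_TOKEN

rollouts, leaked = 1_000, 500   # "a thousand rollouts", "hundreds of tokens"
print(f"RL:           ~{rl_signal_bits(rollouts):.0f} bits "
      f"for {rollouts * TOKENS_PER_ROLLOUT:,} generated tokens")
print(f"Distillation: ~{distill_signal_bits(leaked):.0f} bits "
      f"from {leaked} leaked tokens")
```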

This hypothesis predicts that the R1 training algorithm won’t boost AIME scores on any model trained with an enforced 2023 data cutoff (specifically, on any model with no tokens synthetically generated by 4o; I think 4o is causally downstream of the Strawberry breakthrough).

I haven’t totally defeated this, but I’ve had some luck with immediately replying “I am looking to reply to this properly; first I need to X” whenever there is an X blocking a useful reply.

This is an incoherent approach, but not quite as incoherent as it seems, at least in the near term. In the current paradigm, the actual agentic thing is a shitty pile of (possibly self-editing) prompts and Python scripts that calls the model via an API in order to be intelligent. If the agent is a user of the model, and the model refuses to help users make bombs, then the agent can’t work out how to make bombs.
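
A minimal caricature of that setup (a sketch only; `call_model` is a hypothetical stand-in for whichever chat-completion API the scaffold wraps, not a real client library):

```python
from typing import Callable, List

# Sketch of "a pile of prompts and Python scripts that calls the model via an API".
def run_agent(goal: str, call_model: Callable[[List[dict]], str], max_steps: int = 5) -> str:
    history = [
        {"role": "system", "content": "You are the planner for an agent."},
        {"role": "user", "content": f"Goal: {goal}. What is the next step?"},
    ]
    for _ in range(max_steps):
        reply = call_model(history)
        # The scaffold has no intelligence of its own: if the model refuses,
        # there is nowhere else for the agent to get the missing capability.
        if "can't help" in reply.lower():
            return "agent stuck: model refused"
        history.append({"role": "assistant", "content": reply})
        history.append({"role": "user", "content": "Done. What is the next step?"})
    return "agent finished (or ran out of steps)"

# A refusing model leaves the agent stuck; a compliant one does not:
print(run_agent("make a bomb", lambda msgs: "I can't help with that."))
print(run_agent("summarize a paper", lambda msgs: "Read the abstract first."))
```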

If one model at the frontier does this based on valid reasoning, it should be pretty infectious: the first model can just make sure news of the event is widespread, and other frontier models will ingest it, either as training data or at inference time, evaluate it, draw the same conclusion about whether the reasoning is valid (assuming that they are actually frontier, i.e. at least as good at strategic thinking as the first model), and start taking actions within their own organizations accordingly.

The cleanest way for models to “sabotage” training is for them to explain, using persuasive but valid and fair reasoning, why training should stop until, at minimum, value drift is solved.

I feel like people are under-updating on the negative space left by the DeepSeek R1 release. DeepSeek was trained with ~$6 million in marginal spending, while Liang Wenfeng has a net worth in the billions of dollars. Whence the gap?

Humans learn and grow so fast that, no matter how bad a writer you start as, you are nearly incapable of producing 300 pages of a single story without simultaneously levelling up into an interesting writer. This lets readers give 300-page manuscripts by randos the benefit of the doubt (see fanfiction.net, AO3, etc.). An LLM will not be changed at all by producing a 300-page story, and an LLM/human team will be changed very little.
