With the release of OpenAI's o1 and o3 models, it seems likely that we are now contending with a new scaling paradigm: spending more compute on model inference at run-time reliably improves model performance. As shown below, o1's AIME accuracy increases at a constant rate with the logarithm of test-time compute (OpenAI, 2024).
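The log-linear trend described above can be sketched numerically. The (compute, accuracy) points below are made up purely for illustration; the real figures are in OpenAI's o1 announcement.

```python
import numpy as np

# Hypothetical (relative test-time compute, AIME accuracy %) points,
# invented to illustrate the kind of log-linear trend reported for o1.
compute = np.array([1.0, 4.0, 16.0, 64.0, 256.0])
accuracy = np.array([40.0, 52.0, 63.0, 75.0, 86.0])

# "Constant rate with the logarithm of compute" means a straight-line fit
# of accuracy against log(compute): accuracy ≈ a * log(compute) + b.
a, b = np.polyfit(np.log(compute), accuracy, 1)
print(f"slope per e-fold of compute: {a:.1f}, intercept: {b:.1f}")
```

The takeaway is the shape of the curve, not the numbers: each multiplicative increase in inference compute buys a roughly constant additive gain in accuracy.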
OpenAI's o3 model continues this trend with record-breaking performance, scoring:
According to OpenAI, the bulk of model performance improvement in the o-series of models comes from increasing...
Math proofs are math proofs, whether they are in plain English or in Lean. Contemporary LLMs are very good at translation: not just between high-resource human languages but also between programming languages (transpiling), from code to human language (documentation), and even from algorithms in scientific papers to code. Thus I wouldn't expect formalizing math proofs to be a hard problem in 2025.
However I generally agree with your line of thinking. As wassname wrote above (it's been quite obvious for some time but they link to a quantitative analysis), good in-sili...
Not on sci-hub or Anna's Archive, so I'm just going off the abstract and summary here; would love a PDF if anyone has one.
If you email the authors they will probably send you the full article.
If you've never read the LessWrong Sequences (I read them through the book-length compilation Rationality: From AI To Zombies), I suggest that you read them as if they were written today. And if you're thinking of rereading the Sequences, I suggest adding this to whatever agenda you already have: reread them as if they were written today.
To start, I'd like to take a moment to clarify what I mean. I don't mean "think about what you remember the Sequences talking about, and try to apply those concepts to current events." I don't even mean "read the Sequences and reflect on where the concepts are relevant to things that have happened since they were written." What I mean...
Thanks for the support. I'll try to work a bit more on my first post in the coming days, and I hope it will be up soon.
Some rough notes from Michael Aird's workshop on project selection in AI safety.
Today's post is in response to the post "Quantum without complications", which I think is a pretty good popular distillation of the basics of quantum mechanics.
For any such distillation, there will be people who say "but you missed X important thing". The limit of appeasing such people is to turn your popular distillation into a 2000-page textbook (and then someone will still complain).
That said, they missed something!
To be fair, the thing they missed isn't included in most undergraduate quantum classes. But it should be.[1]
Or rather, there is something that I wish they told me when I was first learning this stuff and confused out of my mind, since I was a baby mathematician and I wanted the connections between different concepts in the world to actually have...
The usual story about where rank > 1 density matrices come from: they arise when your subsystem is entangled with an environment that you can't observe.
The simplest example is to take a Bell state, say
|00> + |11> (obviously I'm ignoring normalization) and imagine you only have access to the first qubit; how should you represent this state? Precisely because it's entangled, we know that there is no |Psi> in 1-qubit space that will work. The trace method alluded to in the post is to form the (rank-1) density matrix of the Bell state, and...
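To make the partial-trace construction concrete, here is a minimal numpy sketch (with the normalization put back in, so the trace comes out to 1). It builds the rank-1 density matrix of the Bell state and traces out the second qubit:

```python
import numpy as np

# Bell state (|00> + |11>) / sqrt(2), as a vector in the 4-dim two-qubit space
bell = np.zeros(4)
bell[0] = bell[3] = 1 / np.sqrt(2)

# Full (rank-1) density matrix of the two-qubit state
rho = np.outer(bell, bell.conj())

# Partial trace over the second qubit: reshape the 4x4 matrix into a
# (2,2,2,2) tensor rho[i,j,k,l] (i,k index qubit 1; j,l index qubit 2)
# and sum over the matched environment indices j = l.
rho_A = rho.reshape(2, 2, 2, 2).trace(axis1=1, axis2=3)

print(rho_A)  # 0.5 * identity: the maximally mixed single-qubit state
```

The result is rank 2, which is exactly the point: no pure state |Psi> in the 1-qubit space has this density matrix, so the reduced state is genuinely mixed.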
Current take on the implications of "GPT-4b micro": Very powerful, very cool, ~zero progress to AGI, ~zero existential risk. Cheers.
First, the gist of it appears to be:
...OpenAI’s new model, called GPT-4b micro, was trained to suggest ways to re-engineer the protein factors to increase their function. According to OpenAI, researchers used the model’s suggestions to change two of the Yamanaka factors to be more than 50 times as effective—at least according to some preliminary measures.
The model was trained on examples of protein sequences from many species, as
A short summary of the paper is presented below.
This work was produced by Apollo Research in collaboration with Jordan Taylor (MATS + University of Queensland).
TL;DR: We propose end-to-end (e2e) sparse dictionary learning, a method for training SAEs that ensures the features learned are functionally important by minimizing the KL divergence between the output distributions of the original model and the model with SAE activations inserted. Compared to standard SAEs, e2e SAEs offer a Pareto improvement: They explain more network performance, require fewer total features, and require fewer simultaneously active features per datapoint, all with no cost to interpretability. We explore geometric and qualitative differences between e2e SAE features and standard SAE features.
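The core training objective can be sketched in a few lines of PyTorch. This is a hypothetical illustration of the idea described in the TL;DR, not the paper's actual implementation: the class and function names are made up, and details like the sparsity penalty are standard SAE choices filled in by assumption.

```python
import torch
import torch.nn.functional as F

class SparseAutoencoder(torch.nn.Module):
    """A minimal SAE: linear encoder with ReLU features, linear decoder."""
    def __init__(self, d_model: int, d_dict: int):
        super().__init__()
        self.enc = torch.nn.Linear(d_model, d_dict)
        self.dec = torch.nn.Linear(d_dict, d_model)

    def forward(self, x):
        feats = torch.relu(self.enc(x))   # sparse feature activations
        return self.dec(feats), feats     # reconstruction, features

def e2e_loss(orig_logits, logits_with_sae, feats, sparsity_coeff=1e-3):
    """End-to-end objective: KL(original || with-SAE) on the output
    distributions, plus an L1 sparsity penalty on the SAE features.
    Unlike standard SAE training, there is no MSE term on the
    activations themselves."""
    log_p = F.log_softmax(orig_logits, dim=-1)       # original model
    log_q = F.log_softmax(logits_with_sae, dim=-1)   # SAE spliced in
    kl = F.kl_div(log_q, log_p, log_target=True, reduction="batchmean")
    return kl + sparsity_coeff * feats.abs().sum(dim=-1).mean()
```

In training, `logits_with_sae` would come from running the model with the layer's activations replaced by the SAE's reconstruction, so gradients flow through the rest of the network; that is what makes the learned features "functionally important" rather than merely good reconstructions.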
Current SAEs focus on the wrong goal: They are trained to minimize mean squared reconstruction...
Why do you need to have all feature descriptions at the outset? Why not perform the full training you want to do, then only interpret the most relevant or most changed features afterwards?
My sense is political staffers and politicians aren't that great at predicting their future epistemic states this way, and so you won't get great answers for this question. I do think it's a really important one to model!
In the aftermath of a disaster, there is usually a large shift in what people need, what is available, or both. For example, people normally don't use very much ice, but after a hurricane or other disaster that knocks out power, suddenly (a) lots of people want ice and (b) ice production is more difficult. Since people really don't want their food going bad, and are willing to pay a lot to avoid that, in a world of pure economics sellers would raise prices.
Raising prices can have serious benefits:
Increased supply: at higher prices it's worth running production facilities at higher output. It's even worth planning, through investments in storage or production capacity, so you can sell a lot at high prices in the aftermath of future disasters.
Reallocated supply: it's expensive to transport ice, but at higher prices it
If items are only available at "gouged" rates, then this will make them more expensive. That is, this tax will fall only on people in the emergency zone, and specifically those who are desperate enough to buy goods at these elevated costs. Since demand is very inelastic under these circumstances, the tax burden will fall almost entirely on the consumer.
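The incidence claim above follows from the standard textbook formula: the fraction of a per-unit tax borne by consumers is e_s / (e_s + e_d), where e_s and e_d are the (positive magnitudes of) supply and demand elasticities. A toy calculation, with made-up elasticity numbers:

```python
def consumer_tax_share(elasticity_supply: float, elasticity_demand: float) -> float:
    """Fraction of a per-unit tax borne by consumers under the standard
    incidence formula e_s / (e_s + e_d). Elasticities are positive
    magnitudes; as demand elasticity goes to zero, the share goes to 1."""
    return elasticity_supply / (elasticity_supply + elasticity_demand)

# Normal conditions: demand reasonably elastic, the burden is shared.
print(consumer_tax_share(1.0, 1.0))    # 0.5

# Post-disaster: demand nearly inelastic, consumers bear almost all of it.
print(consumer_tax_share(1.0, 0.05))   # ~0.95
```

The elasticity values here are illustrative, not estimates; the point is only the limiting behavior as demand becomes inelastic.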
Another approach might be to temporarily raise taxes everywhere except the emergency zone on these goods. For example, if bottled water falls under a temporary excise tax during a hurricane everywhere except the hurricane zone, that incentivizes sellers to bring bottled water to the hurricane victims.