I believe the only answer to the question "how would humans much smarter than us solve the alignment problem" is this: they would simply make themselves smarter; if they did build AGI, they would always ensure it was far less intelligent than them.
Hence, the problem is avoided with this maxim: simply always be smarter than the things you build.
In my opinion, an article like this is not worth the time to decipher. There are probably some good ideas in here, but the signal-to-noise ratio is too low to make digging them out worthwhile.
Intelligence is a resource, not an entity
This is like Ryle-inspired pseudo-philosophy: I don't understand what these terms mean or why I'm being told not to confuse them. And it doesn't connect with his next claim, that superintelligent AIs don't need to be agents when structured workflows can steer them into having capabilities. I wish he'd dwell on this point more, but he never brings it up again.
The crucial question, then, is what we should do with AI, not what “it” will do with us.
Um, no it's not. This is just a rhetorically empty antimetabole completely disconnected from the rest of the essay.
Expanding implementation capacity creates hypercapable world.
Nice truism there. Is that sentence even grammatically correct?
Rather than fragile vibe-coded software, AI will yield rock-solid systems
For both learning and inference, costs will fall, or performance will rise, or both
Can you substantiate your points instead of just saying things?
AI-enabled implementation capacity applied to expanding implementation capacity, including AI: this is what “transformative AI” will mean in practice.
Umm... What?
optimization means minimizing—not maximizing—resource consumption
This is just flatly false. Plenty of optimization problems involve finding a maximum: a salesman, for instance, wants to sell as many goods as possible, not as few.
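To make the point concrete, here's a throwaway sketch (numbers mine, purely illustrative) of the salesman's problem posed the way it naturally occurs: choose which goods to carry under a capacity limit so as to maximize revenue.

```python
from itertools import combinations

# Throwaway illustration (my own numbers): a salesman picking which goods to
# carry under a weight limit so as to MAXIMIZE revenue -- a garden-variety
# optimization problem whose objective is a maximum, not a minimum.
goods = {"widget": (3, 50), "gadget": (4, 70), "gizmo": (2, 30)}  # name: (weight, price)
capacity = 6

best_revenue, best_pick = 0, ()
for r in range(len(goods) + 1):
    for pick in combinations(goods, r):
        weight = sum(goods[g][0] for g in pick)
        revenue = sum(goods[g][1] for g in pick)
        if weight <= capacity and revenue > best_revenue:
            best_revenue, best_pick = revenue, pick

print(best_pick, best_revenue)  # ('gadget', 'gizmo') 100 -- the objective is maximized
```

(Yes, any maximization can be rewritten as minimizing its negative, but that's a sign convention, not evidence that optimization "means minimizing resource consumption".)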
The framework I’ve described is intellectual infrastructure for a transition that will demand clear thinking under pressure
The framework described here is nothing, and I don't even understand the problem it was supposed to solve.
I could go further, but you get the point.
So yeah, this essay is badly written slop. It's hard to read but not just because it's platitudinous. The ideas are all over the place and don't logically connect, and it's riddled with irrelevant, unsubstantiated claims.
I think we have to postulate that this component of the RL signal doesn't get to see the chain-of-thought
If this is true, o1 can produce reasoning that is unsound, invalid, vacuous, etc. and will still be rewarded by the RL framework as long as the conclusion is true. In classical logic you can even formulate arguments that are unsound, invalid, and vacuous yet whose conclusion comes out true: the conditional p ∧ ¬q → q, for instance, is true whenever q is true.
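To see this concretely, here's a quick truth-table check (my own sketch, not anything from the original discussion) showing that (p ∧ ¬q) → q comes out true in every row where q is true, regardless of p:

```python
from itertools import product

def implies(a: bool, b: bool) -> bool:
    """Material implication: a -> b is false only when a is true and b is false."""
    return (not a) or b

# Enumerate the truth table of (p AND NOT q) -> q.
for p, q in product([True, False], repeat=2):
    antecedent = p and not q                # p ∧ ¬q
    conditional = implies(antecedent, q)    # (p ∧ ¬q) → q
    print(f"p={p!s:5} q={q!s:5}  (p ∧ ¬q) → q = {conditional}")

# The conditional is true whenever q is true (and, vacuously, whenever the
# antecedent is false), so "the conclusion came out true" says nothing about
# whether the reasoning that produced it was any good.
```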
o1 is planning to deceive because it has been rewarded for offering plausible answers, not correct answers
It is not necessary to presume deception: o1 does not need sound reasoning to produce correct answers, and certainly not to produce merely plausible ones. More likely it isn't made to care about the correctness of its reasoning because it receives no reinforcement on the correctness of its individual inferential steps.
The original CoT paper used human evaluators to check the logic, so I'm guessing OpenAI did something similar. Regardless of whether the evaluation was automated or done by humans, it's not clear whether the rubric instructed evaluators to penalize bad reasoning even when the conclusion was correct, or how heavily such penalties were weighted relative to the penalty for an incorrect conclusion. I suspect the RL signal primarily reinforces the conclusions rather than the arguments themselves, whereas a proper reward signal should cover the entire inferential chain. In fact, the inferential chain is really all that matters, because the conclusion is just the final step, the one that accepts or rejects some condition relating the derived answer to the question posed.
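To be clear about what "a reward signal over the entire inferential chain" would even mean, here's a toy sketch (entirely my own construction; the per-step validity labels and the 50/50 weighting are hypothetical, not anything OpenAI has described). An outcome-only reward gives full marks to a chain of nonsense that happens to land on the right answer; a process-level reward docks it for the bad steps.

```python
from typing import List

def outcome_reward(conclusion: str, reference: str) -> float:
    """Reward that looks only at the final answer."""
    return 1.0 if conclusion.strip() == reference.strip() else 0.0

def process_reward(step_validity: List[bool], conclusion: str, reference: str,
                   step_weight: float = 0.5) -> float:
    """Reward spread over the chain: each step's validity counts, not just
    whether the conclusion happens to match the reference."""
    step_score = sum(step_validity) / len(step_validity) if step_validity else 0.0
    return step_weight * step_score + (1 - step_weight) * outcome_reward(conclusion, reference)

# A chain whose steps are mostly invalid but whose conclusion is correct:
bogus_chain = [False, False, True]   # stand-in for a grader's per-step judgments
print(outcome_reward("42", "42"))              # 1.0  -- fully rewarded anyway
print(process_reward(bogus_chain, "42", "42")) # ~0.67 -- the bad steps cost it
```

The hard part, of course, is producing those per-step validity judgments in the first place.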
Another issue is that a lot of o1’s thoughts consist of vague filler like “reviewing the details” or “considering the implementation”, and it’s not clear how one would even determine whether such steps are inferentially valid.
Yeah, I'm sure this is not a typical example of his writing style or exposition of the ideas he's advocated for over the bulk of his career.