If so much effort is being focused on AI research capability, I'd actually expect the modal outcome to be that Agent-3 is better than a typical OpenBrain employee at research but incapable of replacing almost any employees in other fields. "Capabilities are spiky" is a clear current fact about frontier AI, but your scenario seems to underestimate it.
I suppose I mean influence over politics, policy, or governance (this is very high-level, since these are all distinct and separable), rather than necessarily being a politician oneself. I do think there are some common skills, but actually being a politician weighs so many other factors so much more heavily that strategic skill is not selected for very strongly at all. Being a politician's advisor, on the other hand...
Yes, it's a special case, but importantly one that is not evaluated by Brier score or Manifold bucks.
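For anyone unfamiliar, the Brier score is just the mean squared error of probabilistic forecasts against binary outcomes (lower is better):

$$\mathrm{BS} = \frac{1}{N}\sum_{t=1}^{N}\left(f_t - o_t\right)^2$$

where $f_t$ is the forecast probability and $o_t \in \{0, 1\}$ is the realized outcome. By construction it can only score questions that resolve cleanly within the scoring window, which is exactly what long-horizon strategic questions rarely do.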
I guess that's the main element I didn't mention: many people on this forum would suggest judging via predictive skill/forecasting success. I think this is an OK heuristic, but of course the long time horizons involved in many strategic questions make it hard to judge (and Tetlock has documented the problems with forecasting over the long time horizons where these questions matter most).
Mostly, the people I think of as having strong strategic skill are closely linked to some form of political influence (which implicitly requires this skill to effect change), such as a...
Nice post! As someone who spends a lot of time on strategic thought in AI policy, and who talks to people I think are amongst the best strategic thinkers on AI, I appreciated this piece and think you generally describe the skills pretty well.
However, you say that "research" skill by default does not lead to strategic skill, which is very true, but this varies drastically depending on the type of research! Mechanistic interpretability, in fact, appears to me to be an example of a field that is so in-the-weeds and empirical, with such good feedback loops, that it makes i...
This is an intuition based only on speaking with researchers working on LLMs, but I think OAI believes a model can simultaneously be good enough at next-token prediction to assist with research while still being very, very far from being a powerful enough optimizer to realise that it is being optimized for a goal, or that deception is an optimal strategy, since the latter two capabilities require much more optimization power. And they seem to believe that the default state of cutting-edge LLMs for the next few years is GPT-3 levels of deception (essentially none) combined with graduate-student levels of research-assistant ability.
I don't think it's odd at all: even a terrible chess bot can outplay almost all humans, because most humans haven't studied chess. MATH is a dataset of problems from high-school competitions, which are well known to require a very limited set of math knowledge and to be solvable by applying simple algorithms.
I know chain-of-thought prompting well. It's not a way to lift a fundamental constraint; it's just a more efficient way of targeting the weights that represent what you want in the model.
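To make that concrete, here's a minimal sketch in the style of the standard few-shot chain-of-thought setup (the word problems are the usual illustrative ones from the CoT literature, not tied to any particular model):

```python
# A minimal sketch of chain-of-thought prompting. The only difference
# between the two prompts is the worked reasoning added to the few-shot
# example; the model and its weights are untouched, which is why CoT can
# only elicit behaviour the weights already represent.

standard_prompt = (
    "Q: Roger has 5 tennis balls. He buys 2 more cans of 3 tennis balls "
    "each. How many tennis balls does he have now?\n"
    "A: The answer is 11.\n"
    "Q: The cafeteria had 23 apples. If they used 20 to make lunch and "
    "bought 6 more, how many apples do they have?\n"
    "A:"
)

cot_prompt = (
    "Q: Roger has 5 tennis balls. He buys 2 more cans of 3 tennis balls "
    "each. How many tennis balls does he have now?\n"
    "A: Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 "
    "balls. 5 + 6 = 11. The answer is 11.\n"
    "Q: The cafeteria had 23 apples. If they used 20 to make lunch and "
    "bought 6 more, how many apples do they have?\n"
    "A:"
)
```

Both prompts pose the same final question; the CoT version just steers decoding toward step-by-step computation the weights already support.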
...It really isn't hard. No new paradigms are required. The proo...
I mean, to me all this indicates is that our conception of "difficult reasoning problems" is wrong and incorrectly linked to our conception of "intelligence". Like, it shouldn't be surprising that the LM can solve text problems which are notoriously based on applying a short step-by-step algorithm, when it has many examples of them in the training set.
To me, this says that "just slightly improving our AI architectures to be less dumb" is incredibly hard: models that can solve other, "harder" problems, and that we would therefore have expected to handle trivial arithmetic, turn out to be unable to do so.
I happened to be reading this post today, as Science has just published a story on a fabrication scandal regarding an influential paper on amyloid-β: https://www.science.org/content/article/potential-fabrication-research-images-threatens-key-theory-alzheimers-disease
I was wondering if this scandal changes the picture you described at all.
There's also the possibility that a CCP AGI can only happen by being trained to some extent on Western data (i.e., the English-language internet), because otherwise they can't scale data enough. This implies it would probably be a "Marxism with Chinese characteristics [with American characteristics]" AI, which seems to raise the difficulty of the "alignment to CCP values" technical challenge a lot.