(Preface: I know about go engines, less about chess ones.) I don't think this experiment will forecast the impact of AI without further addressing neural networks. In particular:
The strongest engines are MuZero-like engines that use neural network heuristics trained on self-play. Training such large networks on commodity CPUs is implausible, much less on 20-year-old hardware.
Given trained networks, new engines will almost always beat old ones. For example, in go, the open source engine KataGo running on a single CPU core and doing only one playout per move has a 5d rank (2300 Elo) on the Online Go Server. Old engines can't reach this rank even on large computing clusters.
The large improvement in strength is mainly attributable to neural networks becoming practical on new hardware, not to new algorithms. The algorithms for both new and old engines are based on comparatively old literature (Monte Carlo Tree Search, decision theory, etc.).
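To give a rough sense of what a rating gap means in practice, here is a minimal sketch of the standard Elo expected-score formula. The 2300 figure is the one quoted above; the 1900 rating for the weaker engine is a purely hypothetical number chosen for illustration, not a measurement of any real engine.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score (roughly, win probability) of A against B under the Elo model."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


# 2300 is the rating quoted in the comment; 1900 is an assumed,
# hypothetical rating for an older engine (not a measured value).
new_engine = 2300
old_engine = 1900

print(f"{expected_score(new_engine, old_engine):.3f}")  # prints 0.909
```

Under this model a 400-point gap already means the stronger side is expected to score about 91%, so even a modest rating difference between engine generations would dominate any head-to-head comparison.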
Conclusion: Most of what you want to measure comes down to neural network training. The training framework is not directly comparable or backwards-compatible with old techniques, so the experiment formulation has to address this.