I can only speak about those that are policy-preserving for a particular player (me).
(In retrospect, I should maybe have been optimising for fun-preserving instead, though policy-preserving is important if you spend a lot of time playing against people of drastically different skill levels: you don't want to unlearn how to play the game correctly when you're back to playing a normal game.)
That being said, of these I think the third option is probably the best for game balancing (as it'd be easy to dial in the penalty as needed), but the fourth would (at least for me) probably be the most policy-preserving.
I think you left out a crucial point: perverse incentives. You want to handicap the stronger player, but in such a way that neither player drastically changes their policy from the one they would normally use against an opponent of similar strength.
I've had this problem before in real-time strategy games (like Age of Empires 2, which is what I play): the gap between the skill floor and the skill ceiling is massive, so when playing with friends (especially in 1v1s) there's often a large difference in skill between the two players (making it fun for neither), and the game doesn't have any nice built-in features for handicapping.
One approach is to have the stronger player sit idle for the first X minutes of the game, but that forces the weaker player to play much more aggressively and "rush", or to build a bunch of defensive towers in the enemy base, and so on; otherwise they squander the advantage the idle time gives them.
Similarly, barring the stronger player from using a particular class of units (say, cavalry) again means the weaker player will shift away from the meta: they won't bother making units that counter cavalry, and will make more of the units that are countered by cavalry (the game has a rough rock-paper-scissors model of unit types).
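A toy sketch of that perverse incentive, assuming a made-up symmetric counter matrix (the unit names and payoffs here are illustrative, not actual AoE2 balance values):

    # payoff[my_unit][their_unit] = how well my_unit trades on average
    COUNTERS = {
        "cavalry": {"cavalry": 0, "archers": 1, "pikemen": -1},
        "archers": {"cavalry": -1, "archers": 0, "pikemen": 1},
        "pikemen": {"cavalry": 1, "archers": -1, "pikemen": 0},
    }

    def best_response(opponent_pool):
        # Pick the unit that trades best in total against everything
        # the opponent is still allowed to build.
        return max(COUNTERS, key=lambda u: sum(COUNTERS[u][v] for v in opponent_pool))

    print(best_response(["cavalry", "archers", "pikemen"]))  # symmetric: everything ties
    print(best_response(["archers", "pikemen"]))             # cavalry banned -> "archers"

Once cavalry is off the table for the stronger player, pikemen (the cavalry counter) become the weaker player's worst choice and archers strictly the best, so the "correct" policy changes, which is exactly what we wanted to avoid.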
The only policy-preserving handicap I can think of would be to force the stronger player to not use hotkeys and manually click every action with the mouse. I don't think I would drastically change my behaviour with no hotkeys; I would just execute actions more slowly. The downside is that it's less fun for the stronger player: they're now fighting the brain-to-computer interface rather than a skilled opponent.
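One way to see why this is policy-preserving: banning hotkeys is roughly equivalent to capping the stronger player's actions per minute while leaving their decision function untouched. A minimal sketch of that framing (here `policy` and `game_state` are placeholder abstractions, not anything from a real game API):

    import time

    def rate_limited(policy, max_apm):
        # Same decision function, capped execution rate: the wrapper
        # never changes WHICH action is taken, only WHEN it's issued.
        min_interval = 60.0 / max_apm
        last = [0.0]
        def handicapped(game_state):
            wait = last[0] + min_interval - time.monotonic()
            if wait > 0:
                time.sleep(wait)  # the entire handicap lives in this delay
            last[0] = time.monotonic()
            return policy(game_state)  # same action the full-speed player would pick
        return handicapped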
AIXI isn't a practically realisable model due to its incomputability, but there are nice optimality results, and it gives you an ideal model of intelligence that you can approximate (https://arxiv.org/abs/0909.0801). It uses a universal Bayesian mixture over environments, weighted by the Solomonoff prior (in some sense the best choice of prior), to learn (in a way you can make formal) as fast as any agent possibly could. There's some recent work on building practical approximations using deep learning instead of the CTW mixture (https://arxiv.org/html/2401.14953v1).
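For concreteness, Hutter's AIXI action selection is an expectimax to horizon m over all programs q for a universal Turing machine U, where l(q) is the length of q, so 2^{-l(q)} is the Solomonoff prior weight:

    a_t = \arg\max_{a_t} \sum_{o_t r_t} \cdots \max_{a_m} \sum_{o_m r_m}
          [\, r_t + \cdots + r_m \,]
          \sum_{q \,:\, U(q,\, a_1 \ldots a_m) \,=\, o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)}

The inner sum is the universal mixture: every program consistent with the observed history votes on the future, weighted by its simplicity.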
(Sorry for the lazy formatting, I'm on a phone right now. Maybe now is the time to get around to making a website to link people to.)