I think the issue presented in the post is that the Solomonoff hypothesis cannot be sampled from, even though we can evaluate its probability density function computationally. If we try to compute the expected value of the reward for a given action, we run into the curse of dimensionality: a single point contributes most of the reward. A Solomonoff inductor would correctly find a probability density function under which h(s_2)=s_1 holds with high probability.
However, I think that if we ask the Solomonoff predictor to predict the reward directly, then it will correctly arrive at a model that predicts the rewards. So we can fix the presented agent.
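To make the sampling problem concrete, here is a minimal toy sketch, not the setup from the post. It assumes a 16-bit state space, uses SHA-256 as a stand-in for h, and gives reward 1 only for guessing the exact preimage; the names `N_BITS`, `reward`, `sampled_expected_reward` and `exact_expected_reward` are all made up for illustration. Checking whether a candidate satisfies h(s_2)=s_1 is cheap, but an expectation estimated from random samples almost never sees the one point that carries the reward.

```python
import hashlib
import random

N_BITS = 16  # assumed toy state space of 2**16 candidates; real inputs are far larger


def h(x: int) -> str:
    """Stand-in for the hash h in the comment: SHA-256 of a 4-byte integer."""
    return hashlib.sha256(x.to_bytes(4, "big")).hexdigest()


def reward(guess: int, s1: str) -> float:
    """Cheap pointwise check: reward 1 only if h(guess) == s1, else 0."""
    return 1.0 if h(guess) == s1 else 0.0


def sampled_expected_reward(s1: str, n_samples: int, rng: random.Random) -> float:
    """Monte Carlo estimate of the mean reward of a uniformly random guess."""
    hits = sum(reward(rng.randrange(2 ** N_BITS), s1) for _ in range(n_samples))
    return hits / n_samples


def exact_expected_reward(s1: str) -> float:
    """Exhaustive average over every candidate (feasible only because N_BITS is tiny)."""
    hits = sum(reward(x, s1) for x in range(2 ** N_BITS))
    return hits / 2 ** N_BITS


if __name__ == "__main__":
    s2 = 12345   # hypothetical hidden state
    s1 = h(s2)   # the observed hash
    rng = random.Random(0)
    print("sampled:", sampled_expected_reward(s1, 100, rng))
    print("exact:  ", exact_expected_reward(s1))
```

With 100 samples the estimate is almost always 0, since each sample hits the preimage with probability 2^-16, while the exhaustive average is exactly 2^-16. The exhaustive loop is only possible here because the toy space is small; at realistic sizes neither approach is feasible, which is the point of predicting the reward directly.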
I think this modeling assumes Russia can escalate conventionally and that the conventional NATO response would be perceived as escalatory by Russia even if it destroyed their army. Russia can't escalate conventionally: they have run out of tanks and men.
Ukraine is already doing a great job of destroying Russia's army with NATO weapons, and Russia hasn't used nukes to stop it. In the aftermath of nuclear use, increasing the rate of that destruction is just more of the same. Even if Russia would like to escalate further, it needs to actually stop the army that is liberating Ukraine, and nukes are not good at that. Jumping up the ladder to attacking NATO territory is unlikely, especially if NATO treats edge cases as reasonable self-defense not worth starting something over (e.g. firing on a radar targeting a Russian airplane from Poland). The threat to nuke NATO isn't credible, and Russia has downplayed its threats as Ukraine succeeds more, saying things like it doesn't yet know where the border it is defending is.
I think there are a great many actions NATO could take that would hurt Russia to the point of a quick victory, that Russia would be unlikely to find worth a war, and that would not cost it the essential credibility whose loss would force it into nuclear escalation.
While these all go beyond what has happened so far, they aren't the kind of provocation that invites more nukes, and Russia can't really use nuclear weapons to stop them because nuclear weapons are not good at stopping these things. There's a reason we don't have tactical nukes in the US, and that's because conventional smart weapons can do the same job better.
Ultimately I think Russia is attempting to use the nuclear threat because it can't use nukes to get what it wants, and I don't think it can credibly threaten its way into stopping Ukraine or the assistance Ukraine receives.