Does it make sense to calculate the score like this for events that aren't independent? You no longer have the cool property that it doesn't matter how you chop up your observations.
I think the correct thing to do would be to score the single probability that each model gave to this exact outcome. Equivalently, you could add up the scores state by state, but score each state using its probability conditional on the states you've already scored. For 538, these conditional probabilities are available via their interactive forecast.
Otherwise you're counting the correlated part of the error multiple times.
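The chop-it-up-however-you-like property mentioned above holds exactly for the log score, so here's a minimal sketch of the point in that scoring rule. The two states and all the probabilities are invented for illustration, not taken from any real forecast:

```python
import math

# Hypothetical joint distribution a model assigns to two correlated
# swing states (1 = Biden wins the state).
joint = {
    (1, 1): 0.50,
    (1, 0): 0.14,
    (0, 1): 0.06,
    (0, 0): 0.30,
}

outcome = (1, 1)  # the realized pair of results

# Score the exact outcome once: the joint log score.
joint_score = -math.log(joint[outcome])

# Chain-rule version: score state A unconditionally, then state B
# conditional on the result of A that you've already scored.
p_a = joint[(1, 1)] + joint[(1, 0)]   # P(A = Biden)
p_b_given_a = joint[(1, 1)] / p_a     # P(B = Biden | A = Biden)
chain_score = -math.log(p_a) - math.log(p_b_given_a)

# Naive version: sum the marginal scores as if the states were
# independent, which counts the shared error component twice.
p_b = joint[(1, 1)] + joint[(0, 1)]   # P(B = Biden)
naive_score = -math.log(p_a) - math.log(p_b)

print(f"joint: {joint_score:.3f}")  # 0.693
print(f"chain: {chain_score:.3f}")  # 0.693 -- identical to joint
print(f"naive: {naive_score:.3f}")  # 1.026 -- off whenever states correlate
```

(The exact joint/chain equivalence is a property of the log score; the Brier score doesn't decompose this way, which is part of the problem raised above.)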
Looking at states still throws away information. Trump lost by a margin of slightly over 0.6% in the states he'd have needed to win, while the polls were off by slightly under 6%. If those numbers are right, I don't see how your conclusion about the relative predictive power of 538 and the betting markets can differ much from the conclusion you'd have drawn if Trump had narrowly won. Obviously, if something almost happens, that will normally favor a model that assigned 35% to it happening over one that assigned 10%. Both Nate Silver and Metaculus users seem to me to be in denial about this.
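To make that concrete, here's the single-event Brier comparison for the two Trump-win probabilities mentioned above (35% and 10%) under both outcomes; note how completely the ranking flips on a 0.6% margin:

```python
def brier(p, outcome):
    # Brier score for one binary event: (p - outcome)^2, lower is better.
    return (p - outcome) ** 2

p_markets, p_538 = 0.35, 0.10  # probability each gave to a Trump win

for outcome, label in [(0, "Trump narrowly loses"), (1, "Trump narrowly wins")]:
    print(f"{label}: markets {brier(p_markets, outcome):.3f}, "
          f"538 {brier(p_538, outcome):.3f}")
# Trump narrowly loses: markets 0.123, 538 0.010
# Trump narrowly wins:  markets 0.423, 538 0.810
```

A shift of less than a point in vote share reverses which model the score declares better, even though it tells you almost nothing new about forecasting skill.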
I think this is a strawman. Nate Silver says that his model has been well calibrated across its lifetime, and is in fact slightly too conservative. I agree that if the only two things you consider are (a) the probabilities the two gave for a Biden win in 2020, 65% (markets) and 89% (538), and (b) the margin of his win, then the betting markets are the clear winner. But how much does that matter? (And the article you linked doesn't mention markets at all.)
There's a lot of debate about how good the polls and 538 were in this election compared to the betting markets. It's hard to compare them just by looking at the headline probability of a Biden win, but one could compute a Brier score over all the US states. Did anybody do the math?
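For anyone who wants to try, the calculation itself is short. The probabilities below are placeholders (the real inputs would be 538's final state-level forecasts and the closing market prices); only the listed 2020 outcomes are real:

```python
# Per-state P(Biden win) from two models, plus the actual 2020 outcome
# (1 = Biden carried the state). Probabilities are illustrative only.
forecasts = {
    # state: (p_538, p_markets, outcome)
    "PA": (0.85, 0.60, 1),
    "FL": (0.70, 0.55, 0),
    "GA": (0.60, 0.45, 1),
    "NC": (0.65, 0.50, 0),
}

def brier(pairs):
    # Mean squared error between probabilities and 0/1 outcomes; lower is better.
    return sum((p - o) ** 2 for p, o in pairs) / len(pairs)

print("538:    ", brier([(p538, o) for p538, _, o in forecasts.values()]))
print("markets:", brier([(pm, o) for _, pm, o in forecasts.values()]))
```

As the replies above point out, though, summing per-state scores treats the states as independent, so this comparison double-counts whatever part of the error was a single correlated polling miss.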