Evaluating 2021 ACX Predictions

Zvi

Sources: Scott’s evaluations (includes original predictions), my buy/sell/hold post.

Remember: Evaluating Predictions in Hindsight

As a yearly tradition, when Scott Alexander comes out with his annual predictions, I do a buy/sell/hold post, where I say what I would do if given the opportunity to trade against Scott’s probabilities, and outline how I think about the questions.

Then, when the results come in, I look back at what happened, and evaluate the predictions from both of us in a holistic manner. This is that post.

Scott grades himself on calibration. Calibration is important, and if your calibration is off it is important to notice that and adjust, but it is a small part of making good predictions. When making up one’s own questions, there’s no numerical evaluation that tells you how you are doing, because you can choose easier or harder questions. One must look at reasoning. I’d love if Scott shared more of his reasoning on at least some questions, but this is still a very good exercise, so I certainly can’t complain.
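As a rough illustration of what a calibration check measures, the sketch below groups predictions by their stated probability and counts how many in each group came true. The data is only Scott's numbers and outcomes for questions 1 to 10 as listed in this post; the function name `calibration_table` is made up for the example, and ten questions is far too few to conclude anything.

```python
from collections import defaultdict

# (Scott's stated probability, outcome) for questions 1-10 as listed in this post.
# This is a small illustrative subset, not Scott's full scoring.
scott_first_ten = [
    (0.80, False),  # 1. Biden approval above 50%
    (0.05, False),  # 2. Court packing
    (0.80, False),  # 3. Yang is New York mayor
    (0.05, False),  # 4. Newsom recalled
    (0.30, False),  # 5. $250 million in protest damage
    (0.20, False),  # 6. Capital gains tax hike
    (0.20, False),  # 7. Trump back on Twitter
    (0.70, True),   # 8. Tokyo Olympics on schedule
    (0.20, False),  # 9. Russia/Ukraine flare-up
    (0.05, False),  # 10. Israel/Palestine flare-up
]


def calibration_table(predictions):
    """Group predictions by stated probability; return {prob: (hits, total)}."""
    buckets = defaultdict(lambda: [0, 0])
    for prob, happened in predictions:
        buckets[prob][0] += int(happened)
        buckets[prob][1] += 1
    return {prob: tuple(counts) for prob, counts in sorted(buckets.items())}


table = calibration_table(scott_first_ten)
for prob, (hits, total) in table.items():
    print(f"{prob:.0%} bucket: {hits}/{total} happened")
```

A perfectly calibrated forecaster would see roughly 80% of the 80% bucket come true, and so on. The table says nothing about whether the questions were hard, which is the limitation described above.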

Anyway, here we go. I list my original commentary, then my reaction now.

[FALSE] means did not happen, [TRUE] means did happen, [MU] means neither.

If I skip a question, it’s because I didn’t have anything to say about it due to it being a personal matter or other issues involving private information.

1. Biden approval rating (as per 538) is greater than 50%: 80% [FALSE]

Biden’s approval rating is clearly steady. There’s always some honeymoon effect, but it would take a surprising event to send it that far down. 80% seems like it’s in the ballpark. Hold.

This was a bad prediction. My reasoning was that Democrats outnumber Republicans, and in today’s partisan age would approve pretty much no matter what, and that’s simply wrong. Democrats noticed things not going great on many fronts, and responded accordingly, and also there’s a traditional slump around now when voters realize the big promises from the campaign are not going to happen. I do think there was some chance that the original hypothesis was right, but I put way too much weight on it. A chance of 50% or so here seems reasonable, with the default being ‘you can’t actually please the whole coalition at once and often there’s still a pandemic and people will blame you for it.’

2. Court packing is clearly going to happen (new justices don’t have to be appointed by end of year): 5% [FALSE]

Indeed do many things come to pass, and ‘clearly going to happen’ isn’t a clear definition. If this is ‘legislation expanding the size of the court has passed’ then this seems high to me because not only does it seem unlikely Biden gets 50 votes on this, it seems unlikely he’d get them this quickly with so much else on the agenda, but also they’re talking about it, Biden’s already gone gangbusters on giant bills and 5% isn’t that high. So I can’t disagree strongly. Hold.

On reflection 5% does seem like the right ballpark here. Good (but easy) prediction.

3. Yang is New York mayor: 80% [FALSE]

Yang is only at 69% on PredictIt, although PredictIt tends to be too low on favorites in this range (although not enough to justify trading on its own). He’s ahead, but he’s prone to rather silly things and there’s plenty of time to mess up, so I think I’m with PredictIt on this and I’ll stick with 70%. Sell.

Selling to market is never a huge mistake but this feels like a situation where the market was making a dumb prediction and one should call them on it in theory, even if the carrying and transaction costs involved don’t justify betting. Yang was not a proven politician, his previous success was a wedge campaign, he was a strange match for New York and there were a lot of candidates and a lot of time left in a very strange year. Logic was fine, but still should have sold lower, to at least 60%.

4. Newsom recalled as CA governor: 5% [FALSE]

Depending on what counts as ‘recalled’ this is either at least 10%, or it’s damn near 0%. I don’t see how you get 5%. Once you get an election going, anything can happen. Weird one, I’d need more research.

This is hard to evaluate even in hindsight. Was Newsom close to losing, or was it a ‘still had all these’ situation and not close at all? Was getting that far unlucky or inevitable? My guess is that there were enough ways this could have gone wrong that this should have been in the 10%-20% range, so this prediction was lousy, but not terrible since the odds were very much in his favor.

Note that I did bet on this at PredictIt and make money, although I’m not convinced I had alpha.

5. At least $250 million in damage from BLM protests this year: 30% [FALSE]

With the verdict in, I don’t see what causes this kind of damage in the next 7 months. That doesn’t mean it can’t happen, but $250 million is a lot. I’m selling this down at least to 20%. Sell.

Nothing happened, despite some potentially provocative things happening, so it seems clear that selling was right, the question is if it was sold far enough, and my answer is probably not. With Biden in office appetite for this kind of thing was always going to be low, so I’ll evaluate the right answer to the 10%-15% range.

6. Significant capital gains tax hike (above 30% for highest bracket): 20% [FALSE]

I don’t think you need to get to 30% to be significant, but that’s not the question. The question is how likely this is, which is asking how likely all 50 senators go along with it. Given there’s already been mention of specifically 29.6% as a Schelling point, I’m guessing 20% is about right. Hold.

Were Manchin and Sinema ever going to sign off on such a thing? Probably not, but if priorities had been different I don’t think it was impossible. I’m all right with 20%, but it is probably a bit high and 15% would have been better.

7. Trump is allowed back on Twitter: 20% [FALSE]

I’m selling this to 10%. Why would Twitter do this? They’ve already paid the price they’re going to pay, and it’s not like Trump mellowed out.

Good sale. If anything they’re going the other way and banning more people for worse reasons.

8. Tokyo Olympics happen on schedule: 70% [TRUE]

I’m more at the Metaculus number of 80% provided slipping a few days doesn’t count as failing, I’m leaving it alone if a postponement of any length counts because random stuff does happen. I think Japan really, really wants this to happen and there’s no reason for it not to. Buy.

I think the way it played out strongly reinforces that Japan indeed really wanted it to happen and it would have taken quite a lot to stop them. But given that it was plausible that ‘quite a lot’ could have happened anyway, hard to fault stopping at 80%. Calling this one a good prediction.

9. Major flare-up (significantly worse than anything in past 5 years) in Russia/Ukraine war: 20% [FALSE]

It’s definitely a thing that can happen but there isn’t that much time involved, and the timing doesn’t seem attractive for any reason. I’ll sell to at least 15% on reasonable priors.

Given that the crisis is happening now it could have happened earlier, but the whole thing still strikes me as standard posturing and negotiations and I continue to not expect any actual fighting, but occasionally such situations get botched and fighting happens. Reasonably happy with 15%.

10. Major flare-up (significantly worse than anything in past 10 years) in Israel/Palestine conflict: 5% [FALSE]

The last ten years have been unusually quiet here, so it arguably would take very little to count as a major flare up here, but vagueness of what ‘major’ means makes it tough. With a tighter definition I might buy to 10%, if it’s wide enough maybe a little higher. Otherwise, hold.

Didn’t learn much other than we didn’t see anything this year, probably fine but easy.

11. Major flare-up (significantly worse than anything in past 50 years) in China/Taiwan conflict: 5%

Every war game of this situation I’ve read says that it’s a major disaster with no winners, even if China ‘wins,’ so it’s not in China’s interest to push on this, and it seems like it will have better spots in the future. 50 years is a long enough window that this has to be a shooting war. I do worry about this scenario but I think 5% is still high, and I’m selling to 3% if I’m truly risk-neutral. Given I’m very short China/Taiwan conflict due to being alive and liking nice things, I wouldn’t actually bet here, but worth noting my prior is lower.

A lot of people said they were worried but it was all on previously known priors, there wasn’t any substantial new crisis beyond ‘USA looks weak so maybe they’ll try something?’ I continue to think that no, they are highly unlikely to try something and the yearly risk is very low, but one can’t update that much on one year’s evidence.

12. Netanyahu is still Israeli PM: 40% [FALSE]

This is the PredictIt line for him on 6/30, and Scott’s predicting this out to January 1. I’m guessing that he didn’t notice? Otherwise, given how many things can go wrong, it’s a rather large disagreement – those wacky Israelis have elections constantly. I’m going to sell this down to 30% even though I have system 1 intuitions he’s not going anywhere. Math is math.

Given my lack of knowledge of Israeli politics I should have trusted the market somewhat more and sold somewhat lower to reflect the time difference. I overvalued my ‘corrupt guy likely to find way to stick around’ gut.

13. Prospera has at least 1000 residents: 30% [FALSE]

Hold/pass on the principle that Everything I Know About Prospera I Learned From Scott’s Article and he’s thought about this a ton more than I have.

Passing when you know nothing is always good strategy, but I did have one piece of information, which is that Scott was intellectually invested in Prospera and thus likely to be somewhat high on this. Should have sold a bit even though I was blind.

14. GME >$100 (Currently $170): 50% [TRUE]

That’s an interesting place to put the line. GME clearly has upside skew, where it could randomly go to $500 again, whereas it could easily retreat to a reasonable fundamentals price like $30, at least until it gets to sell stock and becomes more valuable for that reason. So what do we think about its chances here? Early in this whole thing I’d have said no way, but here we are three months later and it’s sticky, so how does one now apply Lindy to this phenomenon? If it hasn’t ended by now, why does it have to? So my honest answer is I have no idea, and 50% here seems at least sane, so I’m not going to touch it, but I should be very worried I’m anchored. Then again, I’m pretty sure I’d have sold anything substantially higher than this down to at least 60%, and bought up to at least 40%, so it’s the right ballpark I think?

I still have no idea what’s going on with GameStop. Presumably the actual result is important information. It’s literally at 100.15 as I type this, although it was about 148 at year’s end, so I’m going to conclude 50% was not that far off, since the distribution is skewed, and move on.

15. Bitcoin above 100K: 40% [FALSE]

16. Ethereum above 5K: 50% [FALSE]

Yearly reminder that this is absurdly bullish on crypto, because the bare minimum way to fulfill these means crypto is fairly priced now. I’d sell Bitcoin down to 25%, Ethereum down to 30%, and then hedge by buying both of them.

That would have been a great trade. Of course, it was also very easy to find.

17. Ethereum above 0.05 BTC: 70%

This is outright saying ETH is likely to outperform BTC, so this is Scott’s biggest f*** you to the efficient market hypothesis yet. I’m going to say he’s wrong and sell to 55%, since it’s currently 0.046, and if it was real I’d consider hedging with ETH.

On reflection I was too hasty to assume this had to be 50% or lower, because the two assets are importantly different and so the distribution could be skewed. For example, there could be worlds where ETH goes to zero or very low while BTC is mostly fine, whereas the opposite is almost never true, perhaps. But after thinking twice, it works the other way. ETH is the riskier asset, and it should outperform less than 50% of the time if things are fair. I do agree that ETH was the better buy at the time, though, since no one here actually believes the EMH. I think the sale was right, but that the hedge was definitely called for.

18. Dow above 35K: 90% [TRUE]

19. …above 37.5K: 70% [FALSE]

It’s currently at 34K so saying it’s 90% to be up over the next 7 months is… complete insanity? It’s twice as likely to be between 35K and 37.5K as below 35K at all? Rather than give a probability, I’ll simply say I’m slamming the arbitrage. Sell the 90% a lot and buy index funds and/or options, ignore the 70% cause it’s not as good.

So it did land at 36.5K, right in the middle of Scott’s range, which has to be some evidence that it can really be this easy. It also means I made like 20% on my hedge, so I made a bunch of money from the arbitrage. These predictions were so over-the-top bullish that I’m very curious what was going on, but I sincerely hope Scott was long and using leverage.
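The "twice as likely" remark in the original commentary follows from subtracting the two threshold probabilities. A minimal sketch of that arithmetic, using whole percentages to avoid floating-point noise (the helper name `implied_ranges` is hypothetical):

```python
def implied_ranges(pct_above_low, pct_above_high):
    """Split two 'above X' percentages into (below, between, above) percentages."""
    below = 100 - pct_above_low
    between = pct_above_low - pct_above_high
    above = pct_above_high
    return below, between, above


# Scott's numbers: 90% for Dow above 35K, 70% for Dow above 37.5K.
below_35k, between, above_37_5k = implied_ranges(90, 70)
print(f"below 35K: {below_35k}%")
print(f"35K to 37.5K: {between}%")
print(f"above 37.5K: {above_37_5k}%")
```

The middle range gets 20% against 10% for everything below 35K, which is where the two-to-one ratio comes from.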

20. Unemployment above 5%: 40% [FALSE]

It’s currently officially 6% and presumably will fall with recovery. They’re pumping in a ton of money, and it was 4% before things got bad, but also a lot of people got a lot of money and there will be a lot of disruption and a lot of money illusion and grumbling. I’m guessing (very naively) that this isn’t going to happen that fast this reliably, and buying to 50%.

It made it to 3.9% in December, after being 4.2% in November, just beating the Omicron rush and also the end of the year, so this was very close and 50% seems like a reasonable prediction in hindsight given that we got something that seems baseline-like and that had the stock market giving very strong returns... if this was about a 4% threshold. It was a 5% threshold, so this wasn't close, and I was clearly high. My guess is Scott was high as well, there was a ton of pent up demand for workers and the way unemployment is measured this was going to end up pretty low.

21. Google widely allows remote work, no questions asked: 20% [TRUE]

EDITED VERSION 4/27: It turns out that Google has explicitly said they will not do this, which I didn’t know/remember and counts as missing information, so editing this to be very low. They might back down, but the announcement was recent (March 2021) so something would have to go very wrong to explicitly back down. I’ll go to 10% on reflection (my instinctive reaction was 5%) on the basis of there being some sort of new variant forcing their hand.
[Original version: I don’t know about the situation at Google but assuming they currently still do this I think it’s more likely than this that they keep doing it. If this is a blind prediction and Scott knows nothing I don’t know, I’d buy to 30%.]

The original prediction seems good. The edited version seems quite bad. Yes, they explicitly said they weren’t going to do this, but I believed them? This much? Under this much uncertainty? Seriously, I need to be smarter than that. Worst prediction of the lot so far, by a wide margin.

22. Starship reaches orbit: 60%

Yeah, no idea. Hold.

This is another Scott prediction of something cool but not all that precedented, so again should have sold a little.

23. Fewer than 10K daily average official COVID cases in US in December 2021: 30% [FALSE]

This is a bad line. If we get things under control everywhere, it will be under 10K, and we’re vaccinating enough to get close to Israeli levels with plenty of time to spare. I’m buying this to 70%, and if someone tried to ‘take it away’ by buying it from me, I’m having none of it.

Well, whoops, the number was rather larger than that, and would have been without Omicron, but the question is the logic. If there was no Delta or Omicron, would we have wound down and ended this? I think the answer is probably. So the key question is, what probability should have been assigned to Delta or Omicron? That’s where I screwed this up, for same central reason I screwed up the Google question. I didn’t put enough weight on that. I still think 30% was too low here, but 70% was aggressive. My guess is I should have been closer to 40%-50%, but I’m still not sure how to think about potential new variants.

24. Fewer than 50K daily average COVID cases worldwide in December 2021: 1% [FALSE]

Yep, that’s right, hold. Not enough vaccines.

Righto.

25. Greater than 66% of US population vaccinated against COVID: 50%

It’s at 42% now. Israel stalled lower than this (in the 50s) so we might hit a wall that’s hard to break. I think we’re favorites so I’ll buy to 60%, but it could go either way. Note that because of children this will play a lot stronger than it might sound.

More than 66% got their first shot but complete vaccinations ended up around 62%. That difference remains weird to me, but this definitely could have gone either way for a variety of reasons. Presumably the buy to 60% was bad but I don’t think it was terrible.

26. India’s official case count is higher than US: 50% [FALSE]

Buy to 80% before I even start thinking, probably willing to go higher still on reflection. I’m confused how this got here.

Why was I willing to do this, on this little thinking? Presumably because this was April, exactly when things looked about to end, but India wouldn’t have sufficient vaccinations and has several times the population. So once again this is the same error.

27. Vitamin D is generally not recognized (eg NICE, UpToDate) as effective COVID treatment: 70% [TRUE]

EDITED VERSION 4/27: I updated a lot on Scott being at 30% for this (i.e. 70% for this being recognized) in the original, and moved it to 50%. With Scott at 70% instead, we’re much closer, but I think I still want to nudge a little higher and buy this to 75%, instead of moving 30% to 50%. This is a sign of how much I’m reluctant to move a reasonable person’s odds in this type of exercise; if you’d asked me before seeing Scott’s number, I’d have said recognition is very unlikely, and put it at something like 85%-90%, and my true probability is still likely 80% or so.
[Original when I thought Scott had this reversed: Vitamin D is good and important, you should be taking it, but I’m skeptical that such sources will recognize this in the future if they haven’t done so by now. Conditional on (I haven’t checked) the sources that matter not having made this call yet, I’d sell it to 50%, while saying that I definitely would use it to treat Covid if I had the choice.]

This was never going to happen, these sources have no interest in doing anything about the lowest hanging of the fruits. As a result, people are dying, but they don’t care. We already mostly knew that. The 85%-90% before adjusting for Scott was right.

28. Something else not currently used becomes first-line treatment for COVID: 40% [TRUE]

I’ll sell this to 25%, people are slow to adapt to change even when it happens, assuming ‘not currently used’ means not used at all rather than not first-line.

I assume this was evaluated to true because of Paxlovid. In practice it’s still false if that’s what is being counted, but true in the sense of first best legal option. Paxlovid seems even now like we got pretty lucky to find it and have it be that over-the-top amazing, and it was approved only days from the end of the year, so I’m not too upset about losing this one if that’s the only reason I lost. If Scott is counting something else, it was a pretty bad prediction, and in general probably should have been higher.

29. Some new variant not currently known is greater than 25% of cases: 50% [TRUE]

Depends what we mean by ‘known’ and what counts as a fully new variant, but my guess is this should be higher. Probably buy it to 60%, given there’s still a lot of time for this to happen.

I notice I am confused now. If I knew to buy this to 60%, then what are my other predictions here doing? I think this caught me at a strange time when things looked the best they’ve looked the whole pandemic, but still, that’s not an excuse.

30. Some new variant where no existing vaccine is more than 50% effective: 40% [TRUE]

I assume this means versus infection only. If it’s versus death, slam the sell button even more. If it’s versus infection only, I’d still sell this down to 25%, assuming this has to apply to Moderna/Pfizer.

If it’s versus death it would evaluate to false, so Scott meant infection, and then Omicron happened. Note that with a booster you’re back over 50% effective, and that’s now considered full vaccination via an existing vaccine (and was before Omicron), so I don’t think this grades all that obviously, and I’d evaluate it to [MU].

31. US approves AstraZeneca vaccine: 20% [FALSE]

If it does happen it will be after it matters, since it already doesn’t matter, so I’m not sure why we would do it, but I don’t have a good model here. 20% seems low enough that I don’t want to go lower.

Should have gone lower.

32. Most people I see in the local grocery store aren’t wearing a mask: 60% [FALSE]

Buy to 75%. Scott is in Berkeley, so I’m optimistic that the area will be sufficiently vaccinated to be very safe by year’s end. It then comes down to, just how crazy are all you people now that it’s over, and my guess is not this crazy all that often. But often enough that I’ve still got the one in four open.

Presumably Delta had me losing this anyway, so I can’t use Omicron as an excuse, but it’s more of the same mistake.

38. No new residents at our housing cluster: 40% [TRUE]

39. No current residents leave our housing cluster: 60% [FALSE]

My guess is Scott is going to be underconfident on this, and also that he’s not taking into account how late it is in the year, so I’m going to do the ‘blind bet’ thing and sell #38 to 35% and buy #39 to 65%, but not push it.

Careful, Icarus. Got burned on these, but I do still like the logic.

53. At least seven days my house is orange or worse on PurpleAir.com because of fires: 80% [MU]

Note that Scott is only saying he’s 50% to leave Berkeley for a month. I’m going to hold this but also point out that if you can’t breathe the air maybe it’s time to check out the air somewhere else.

I stand by my recommendation.

60. There are no appraisal-related complications to the new house purchase: 50% [TRUE]

Buy to 60% based on what I’ve learned about appraisals, assuming complication means a meaningful one, and assuming Scott’s #61 prediction isn’t nuts. I won’t go further than this due to asymmetrical information disadvantage.

Looking back I continue to like this buy, but have no new info.

61. I live in the new house: 95% [TRUE]

Sell to 90% on the ‘indeed do many things come to pass’ platform. Probably, but let’s not get too confident here.

Lost this one but I do think I got odds.

62. I live in the top bedroom: 60% [FALSE]

Buy to 65% because this feels like a place where if Scott’s thinking it’s a favorite, it’s a bigger favorite than he thinks, but again information issues.

I am curious how this ended up not happening.

63. I can hear / get annoyed by neighbor TV noise: 40% [FALSE]

Sell to 30% but the fact that it’s here at all makes me wonder so I’ll stop there given information issues. I’ve literally never had this happen in a house, and also there are almost no TVs in Berkeley that are ever on in the first place, so I’d be curious to hear more.

I wonder if I should have been lower here, given (again) that I’ve never seen this happen.

64. I’m playing in a D&D campaign: 70% [FALSE]

I’ll trust Scott on this one and hold.

I’m guessing Covid situation hurt his chances here, but also in general predictions like this tend to be overconfident. Would be interesting to look back and check Scott’s calibration by reference class (e.g.: politics/economics, health/Covid, personal doing stuff that isn’t writing, writing-related accomplishments, personal other, etc.)

65. I go on at least one international trip: 60% [TRUE]

I’m guessing this underestimates the number of things that can go wrong, but Scott seems too skeptical about pandemic outcomes, which cancels that out, so I’ll hold.

Given this happened despite pandemic outcomes, my skepticism of his intentions was wrong, and this was a bad hold.

66. I spend at least a month living somewhere other than the Bay: 50% [FALSE]

I wonder how much this is based on the whole ‘PurpleAir says you literally can’t breathe the air’ issue, and how much is travelling, and without more information I don’t think I can get involved, so staying out.

On reflection I should have sold a bit on the ‘people overestimate probability of making big changes’ principle, but only a bit. Scott did travel a lot, so presumably that didn’t count.

67. I continue my current exercise routine (and get through an entire cycle of it) in Q4 2021: 70% [TRUE]

People tend to be pretty overconfident in such matters, so I’m tempted to sell on general principles, but I do think the public prediction will help somewhat. I guess sell a tiny bit to 65% but keep it light.

This one I did do the ‘sell a little’ thing and it didn’t work out, but I stand behind the principle.

68. I meditate at least 15 days in Q4 2021: 60% [FALSE]

69. I take oroxylum at least 5 times in Q4 2021: 40% [TRUE]

Don’t feel like I have a good enough handle here to do anything beyond hold.

Still have no idea what oroxylum is. Probably should have sold meditation a bit.

70. I take some substance I haven’t discovered yet at least 5 times in Q4 2021 (testing exempted): 30%

That seems aggressive. Haven’t discovered yet seems a lot harsher than haven’t tried yet. I’ll sell to 25% but again, the prediction must have come from somewhere.

Good sale here, I think.

71. I do at least six new biohacking experiments in the next eight months: 40% [FALSE]

This seems like a lower bar to me by a lot than #70, so I’ll hold.

Happy with the hold decision.

73. The Twitter account I check most frequently isn’t one of the five I check frequently now: 20% [FALSE]

I don’t think it’s that likely there will be a big new Twitter account at the top unless Scott is using Twitter for Covid a lot. Assuming his top 5 are mostly not that, I’ll sell this to 15%.

Twitter is mostly the same old Twitter so I doubt there was much danger on this one. The account I check most often is actually different now, it’s @BNONews, but that’s because I’m using Twitter to manage the news aggressively.

74. I make/retweet at least 25 tweets between now and 2022: 70% [FALSE]

I think I bet against a similar thing last time and lost by a wide margin. My guess is this is if anything a little underconfident, since 25 is not that many, so maybe buy to 75%.

I notice I am surprised that I lost this one, but I did definitely lose it. For whatever reason, Scott does not like the Twitter except for (usually very creative) horrible puns, and I was overconfident that he’d be drawn into doing more. Bad prediction.

75. Lorien has 100+ patients: 90% [TRUE]

76. 150+ patients: 20% [FALSE]

77. 200+ patients: 5% [FALSE]

78. I’ve written at least ten more Lorien writeups (so total at least 27): 30% [FALSE]

I’m somewhat sad that #78 is sitting so low, but I don’t feel like I have enough info to disagree with it. #75 is basically ‘does Lorien exist’ since there’s no way Scott either loses or fires his patients, but the 150+ and 200+ thresholds mean taking more, and I’m guessing that won’t happen. It does seem like 70% is a lot of space between 100-149 patients, so I’d probably split the difference and go to 85% and 25% to open up things a bit. The downside represents ‘Lorien experiment fails and Scott transitions to something else’ and the upside seems plausible too. I’ll also go to 10% on 200+ patients if ‘second doctor joins practice’ is a way to get there, hold if not.

This did land in the middle so I definitely lost by shrinking the middle. I don’t have the story, but presumably Scott continued to not want new patients but did want to continue old ones, and that reliably lands us within the window. It still looks like a lot of probability on a narrow window, but my guess is Scott’s prediction was better.

84. I have switched medical records systems: 20% [MU]

85. I have changed my pricing scheme: 20% [FALSE]

Switching EMRs is a bitch and 20% sounds like a lot, sell #84 to 15%. On the pricing scheme, that’s entirely dependent on how much Scott is willing to sacrifice to see it through, so if he says 20% I believe him.

Mu indicates selling to 15% was likely a mistake, but not enough information to say since Scott doesn’t offer details.

86. ACX is earning more money than it is right now: 70% [TRUE]

I have a hard time believing that ACX revenue won’t increase so long as ACX keeps up its quality and quantity levels. I’ll buy to 80%.

My jaw would have been on the floor if this had turned out to be false, so it was mostly a question of whether Scott would quit, and I don’t think that was all that likely. I like this one.

90. There is another article primarily about SSC/ACX/me in a major news source: 10% [FALSE]

I’ll buy this to 25%. Scott’s interesting, his relationship to the press is interesting, there are a lot of major news sources, and also this prediction might give people ideas.

There either is a post or there isn’t, but when I think about reference classes, the chance of this happening in 8 months was not as high as 25%, so bad prediction. I’m guessing 10% was still slightly low.

91. I subscribe to at least 5 new Substacks (so total of 8): 20% [FALSE]

Substack costs can add up fast, so it seems reasonable that going to this many wouldn’t be that likely, but with a lot of revenue it makes sense to be in touch with the greater blogosphere. I’m going to buy this to 30%.

It’s obviously Scott’s choice, as there were plenty of good options to choose from, probably should have trusted him more on reflection.

92. I’ve read and reviewed How Asia Works: 90% [TRUE]

Cool. Presumably this means he’s mostly done, I’ll be comparing this to my own review. Hold.

Points taken away for not actually finding the time to do the comparison after he put out the post, but 90% seems solid.

93. I’ve read and reviewed Nixonland: 70% [FALSE]

Also cool, possible this causes me to read it. Hold.

Me read books? In this economy? Man, that would be nice.

94. I’ve read and reviewed Scout Mindset: 60% [TRUE]

Buy to 70%, it would be pretty weird for Scott not to review this but I have to update on it only being 60%. I plan to read and likely review it as well, once Covid dies down or I otherwise find the time.

On reflection I think 70% was low, this was one of those inevitable book reviews. I haven’t done my version yet but eventually I suppose I should?

95. I’ve read and reviewed at least two more dictator books: 50% [TRUE]

Two is a lot here, so presumably this is important to Scott. I’ll sell it a bit down to 45% because two is tough, but mostly trust him.

This was indeed important to Scott, as it turned out. Mildly sad I sold a bit here.

96. I’ve started and am at least 25% of the way through the formal editing process for Unsong: 30% [FALSE]

97. Unsong is published: 10% [FALSE]

The implication here is that it’s about the halfway point in difficulty to get a quarter of the way through editing (about 1/3 chance of each step). My understanding is that publishing delays are often very long, so unless he plans to self-publish, no way this happens in 2021, but I can totally see a self-publishing for Unsong, so I’ll leave these be because there are too many variables I don’t have a good handle on.

Sad this hasn’t happened, not much else to say.
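The "about 1/3 chance of each step" reading of #96 and #97 can be checked with a short conditional-probability calculation. This sketch assumes that publication implies the editing milestone in #96 was reached, which the questions suggest but do not state outright. The variable names are illustrative.

```python
from fractions import Fraction

p_editing_quarter_done = Fraction(30, 100)  # Scott's number for question 96
p_published = Fraction(10, 100)             # Scott's number for question 97

# Assumption: being published implies the editing milestone was reached, so
# P(published | editing milestone) = P(published) / P(editing milestone).
p_published_given_editing = p_published / p_editing_quarter_done

print(f"P(editing milestone) = {p_editing_quarter_done}")
print(f"P(published | editing milestone) = {p_published_given_editing}")
```

Under that assumption the second step comes out at exactly 1/3, and the first step at 3/10 is close to 1/3 as well.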

99. [redacted] wins the book review contest: 60% [FALSE]

There might be a best entry but these things seem more random than that? I’ll sell to 50%.

On reflection it definitely felt unpredictable who won.

100. I run an ACX reader survey: 50% [TRUE]

101. I run a normal ACX survey (must start, but not necessarily finish, before end of year): 90% [FALSE]

Not sure how these two can coexist, so going to wait them out pending clarifications if any.

I suppose they can, but never got clarification. Shrug.

102. By end of year, some other post beats NYT commentary for my most popular post: 10% [FALSE]

I’m guessing such events are slightly less rare than this? But that was a really big event, so I’ll probably still hold.

I don’t think not seeing the event was much evidence on its frequency. I’d still make it a favorite to happen eventually.

103. I finish and post the culture wars essay I’m working on: 90% [TRUE]

104. I finish and post the climate change essay I’m working on: 80% [TRUE]

105. I finish and post the CO2 essay I’m working on: 80% [TRUE]

Good luck, sir, and may the odds be ever in your favor. I don’t think I’m in a position to second guess. If anything I’d be bullish on #104 and #105, and maybe a little bearish on #103, but only by a very small amount.

Overall good posts, but not his best.

106. I have a queue of fewer than ten extra posts: 70% [MU]

Sell to 60% because if I was Scott I would totally end up with a much, much larger queue (and I do in fact have a truly gigantic one to the extent I have a queue at all).

Presumably he’s not sure what it means anymore for something to be in the queue.

107. I double my current amount of money ($1000) on PredictIt: 10% [FALSE]

#107 is all about how much Scott is willing to risk. You can make this at least 40% by ‘betting on black.’ So I can’t really say, but my guess is Scott messes around enough that this can be bought to 15%.
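For the 'betting on black' point, here is a rough sketch of what a single all-in bet needs in order to double the account. It assumes one purchase of a single contract that pays $1 if correct, and a 10% fee charged on profits; the `fee_on_profit` default is my assumption about the fee structure rather than something stated in the post, and the function names are illustrative.

```python
def balance_if_win(bankroll: float, price: float, fee_on_profit: float = 0.10) -> float:
    """Account balance after an all-in bet at `price` per contract resolves in your favor."""
    contracts = bankroll / price
    profit_per_contract = (1.0 - price) * (1.0 - fee_on_profit)
    return contracts * (price + profit_per_contract)

def max_price_to_double(fee_on_profit: float = 0.10) -> float:
    """Highest contract price at which one winning all-in bet at least doubles the bankroll.

    Solving (1/p) * (p + (1 - f) * (1 - p)) >= 2 for p gives p <= (1 - f) / (2 - f).
    """
    return (1.0 - fee_on_profit) / (2.0 - fee_on_profit)

price = max_price_to_double()
print(f"Buy at or below {price:.3f} per contract")
print(f"Balance if it wins: {balance_if_win(1000.0, price):.2f}")
```

If the market price is roughly the true probability, that threshold is also roughly the chance of success, which is in line with the "at least 40%" figure above once fees are taken into account.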

Didn’t happen, and we don’t have details, but I’d buy this again.

108. I post my scores on these predictions before 3/1/22: 70% [TRUE]

This is one of those weird full-control meta-predictions. I think Scott will be that much more likely to post in late February and I’ll bump it to 75%, but there’s a bunch of ways this can fail.

Feels like this was at least 75% likely, but that’s not really much of an evaluation.

Overall

One could do various mathematical assessments, but as I’ve said in the past, I don’t think that is where the biggest value lies. It’s more about the logic. How did we do?

Unfortunately, I think it’s safe to say that I am rather unhappy with my performance here.

There are essentially three sections: Non-Covid world stuff, Covid stuff and Personal stuff.

On the Non-Covid world stuff I think this is a good but not great performance. There are a few big mistakes and some missed opportunities, but it mostly seems solid.

On the Covid stuff, this was a disaster. It was a correlated disaster, in the sense that Delta (and later Omicron) wrecked the whole model I was using and made my predictions here look stupid. In addition to looking stupid, they mostly actually were stupid as well. I gave reasonably high probabilities for new variants, and then didn’t think through the implications from those probabilities.

It’s important to own one’s mistakes in spots like this. In many ways and spots, I’ve been in front of the curve and made very good predictions. But in other places, not so much, and I’ve made mistakes. April 2021 was Peak Overly Strong Optimism on my part, and I made bad predictions on that basis because I wasn’t thinking about the right questions. I do think that if we were still dealing with Alpha, we’d have gotten the good scenarios, but the thinking about variants wasn’t consistent or coherent here.

That’s something to keep in mind going forward as well. I have a clear idea of where things are likely headed if new variants don’t change the outlook, but new variants are always a threat. I did a good job responding once they were known, but a much less good job with the possibility before they were known, and that matters for one’s plans. I still think that a variant of Omicron is likely to not pose that big a threat, but I haven’t looked into that as much as other aspects, and it’s an important question.

Finally, there’s the personal stuff, where I’m mostly betting on Scott’s contextual calibration, and it’s hard to know what the right answers are. There are some places I’m happy with my calls, a few places I’m upset, but mostly there isn’t much one can conclude here and I’m left thinking I could have done better.

That gives me an overall lousy grade for this round of predictions. The numerical evaluations Scott listed showed me doing relatively well (beating the market is tough, especially when you don’t see a lot of the markets), but they don’t include the whole cluster of horrible Covid predictions, which I think were my worst substantial Covid predictions of the whole pandemic.
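For anyone curious what a numerical evaluation of this kind looks like, here is a minimal sketch of a Brier score (mean squared error between forecast and outcome, lower is better). It is not necessarily the scoring rule Scott used. The inputs are five of the personal questions above that have explicit numbers from both of us, and treating my buy/sell prices as point probabilities is an assumption made only for illustration.

```python
from typing import Iterable, Tuple

def brier_score(forecasts: Iterable[Tuple[float, bool]]) -> float:
    """Mean of (probability - outcome)^2 over all forecasts; 0 is perfect, 0.25 is a coin flip."""
    pairs = list(forecasts)
    if not pairs:
        raise ValueError("need at least one forecast")
    return sum((p - (1.0 if happened else 0.0)) ** 2 for p, happened in pairs) / len(pairs)

# (probability, outcome) for questions #94, #95, #99, #107, #108.
scott = [(0.60, True), (0.50, True), (0.60, False), (0.10, False), (0.70, True)]
# Buy/sell prices from the commentary, read as point estimates (an assumption).
zvi = [(0.70, True), (0.45, True), (0.50, False), (0.15, False), (0.75, True)]

print(f"Scott: {brier_score(scott):.4f}")
print(f"Zvi:   {brier_score(zvi):.4f}")
```

Five questions is far too small a sample to conclude anything, and a score like this says nothing about how hard the questions were, which is why the reasoning matters more than the number.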

Hopefully I, and the world, can do better in 2022.

Pattern:

A chance of 50% or so here seems reasonable, with the default being ‘you can’t actually please the whole coalition at once and often there’s still a pandemic and people will blame you for it.’

Have we made more or less progress than you thought we would by now? (Or did you not take that into account?)

29.6% as a Shilling point

Is that an intentional spelling? Or is it [Schelling point]?

a major flare up here

(I didn't finish reading because this was getting to be like reading Twitter, except drier.)

Zvi:

Not sure what you mean by progress in the context of Biden's approval rating. Biden's probably accomplished fewer of his goals than I'd expected, but not too surprisingly fewer.

Was definitely supposed to be Schelling, misspelled in original.

'Progress' relating to the pandemic.

We have made less progress than I expected on that front, to be sure, and far less than Biden expected or promised, or than most people expected or felt they were promised.

FireStormOOO:

How much of that is "variants appeared faster, more frequently, and were more important than expected" vs various "performance of person/institution X was worse than expected"?

Edit: You mostly answered this in your post for yourself/your predictions, so I guess I'm more asking if you think lots of other people had the same misconception. I wasn't expecting the additional waves to continue to be this big a deal, and I think my prediction failure was mostly in not expecting the virus to have this much fight in it.

Dave Orr:

I mentioned this over at ACX as well, but the Google WFH thing is scored wrong, I think. Google's policy is that you can WFH while the pandemic is pandemicking, but once that's over you have to apply for remote status, and about 20% of people will get it/have gotten it.

Google has very much not said that workers can work from anywhere.

Source: am manager at Google. Obvious disclaimer: I'm speaking from my own understanding, not representing Google here.

Zvi:

Same as Richard, I think this was graded correctly. The question is whether you can do it now, not whether you can do it indefinitely into the future, and right now I presume that you can due to Omicron (or as of 1/1). Your information does make me think my sale was a lot less bad, but I do think I still lost.

Having further parsed the comments at ACX I am now at MU. Questions do seem like they are asked.

Richard Korzekwa:

For what it's worth, what you're describing at Google is consistent with my reading of the prediction. I read it as "Google continues to widely allow remote work, no questions asked". If, as of the resolution date, Google was still allowing people to work from home without special approval, that sounds like "allowing remote work, no questions asked", even if it is not a permanent state of affairs. If there's some process for officially requesting permission to work from home, but it is approved by default, that still seems positive to me but not as clearly positive.

It is ambiguously-worded, so I can see why people are saying it's wrong, but to me the default reading resolves positive based on what Google employees are saying.

[anonymous]:

Evaluating to [TRUE] would make more intuitive sense to me as downstream prediction tasks would assume this prediction to mean "resistance to infection from being April-2021-definition fully-vaccinated".

Richard Korzekwa:

Shortly before this went up, I made a spreadsheet to do "various mathematical assessments" (Brier scores in particular) on your predictions and Scott's. This was purely to satisfy my own curiosity, and to see if my very rough impression of which predictions were faring better was accurate. I did it in a pretty quick-and-dirty way, so it seems likely that I made mistakes. But if anyone else is curious, I'm sharing it here. Feel free to leave comments or copy the sheet and do whatever with it.