A chance of 50% or so here seems reasonable, with the default being ‘you can’t actually please the whole coalition at once and often there’s still a pandemic and people will blame you for it.’
Have we made more or less progress than you thought we would by now? (Or did you not take that into account?)
29.6% as a Shilling point
Is that an intentional spelling? Or is it [Schelling point]?
a major flare up here
(I didn't finish reading because this was getting to be like reading twitter, except dryer.)
Not sure what you mean by progress in context of Biden's approval rating. Biden's probably accomplished less of his goals than I'd expected, but not too surprisingly less.
Was definitely supposed to be Schelling, misspelled in original.
We have made less progress than I expected on that front, to be sure, and far less than Biden expected or promised, or than most people expected or felt they were promised.
How much of that is "variants appeared faster, more frequently, and were more important than expected" vs various "performance of person/institution X was worse than expected"?
Edit: You mostly answered this in your post for yourself/your predictions, so I guess I'm more asking whether you think lots of other people had the same misconception. I wasn't expecting the additional waves to continue to be this big a deal, and I think my prediction failure was mostly in not expecting the virus to have this much fight in it.
I mentioned this over at ACX as well, but the Google WFH thing is scored wrong, I think. Google's policy is that you can WFH while the pandemic is pandemicking, but once that's over you have to apply for remote status, and about 20% of people will get it/have gotten it.
Google has very much not said that workers can work from anywhere.
Source: am manager at Google. Obvious disclaimer: I'm speaking from my own understanding, not representing Google here.
Same as Richard, I think this was graded correctly. The question is whether you can do it now, not whether you can do it indefinitely into the future, and right now I presume that you can due to Omicron (or as of 1/1). Your information does make me think my sale was a lot less bad, but I do think I still lost.
Having further parsed the comments at ACX I am now at MU. Questions do seem like they are asked.
For what it's worth, what you're describing at Google is consistent with my reading of the prediction. I read it as "Google continues to widely allow remote work, no questions asked". If, as of the resolution date, Google was still allowing people to work from home without special approval, that sounds like "allowing remote work, no questions asked", even if it is not a permanent state of affairs. If there's some process for officially requesting permission to work from home, but it is approved by default, that still seems positive to me but not as clearly positive.
It is ambiguously-worded, so I can see why people are saying it's wrong, but to me the default reading resolves positive based on what Google employees are saying.
If it’s versus death it would evaluate to false, so Scott meant infection, and then Omicron happened. Note that with a booster you’re back over 50% effective, and that’s now considered full vaccination via an existing vaccine (and was before Omicron), so I don’t think this grades all that obviously, and I’d evaluate it to [MU].
Evaluating to [TRUE] would make more intuitive sense to me as downstream prediction tasks would assume this prediction to mean "resistance to infection from being April-2021-definition fully-vaccinated".
Shortly before this went up, I made a spreadsheet to do "various mathematical assessments" (Brier scores in particular) on your predictions and Scott's. This was purely to satisfy my own curiosity, and to see if my very rough impression of which predictions were faring better was accurate. I did it in a pretty quick-and-dirty way, so it seems likely that I made mistakes. But if anyone else is curious, I'm sharing it here. Feel free to leave comments or copy the sheet and do whatever with it.
Sources: Scott’s evaluations (includes original predictions), my buy/sell/hold post.
Remember: Evaluating Predictions in Hindsight
As a yearly tradition, when Scott Alexander comes out with his yearly predictions, I do a buy/sell/hold post, where I say what I would do if given the opportunity to trade against Scott’s probabilities, and outlining how I think about the questions.
Then, when the results come in, I look back at what happened, and evaluate the predictions from both of us in a holistic manner. This is that post.
Scott grades himself on calibration. Calibration is important, and if your calibration is off it is important to notice that and adjust, but it is a small part of making good predictions. When making up one’s own questions, there’s no numerical evaluation that tells you how you are doing, because you can choose easier or harder questions. One must look at reasoning. I’d love if Scott shared more of his reasoning on at least some questions, but this is still a very good exercise, so I certainly can’t complain.
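To make the calibration idea concrete, here is a minimal Python sketch. It groups questions by stated probability and counts how many in each group came true. The data is the probability and grade printed on questions 1 through 10 in the list below, which I take to be Scott's numbers. The function name `calibration_table` and the bucketing by exact probability are my own illustrative choices, not anything Scott uses.

```python
from collections import defaultdict

# (stated probability, outcome) for questions 1-10 as printed in this post.
# True = [TRUE], False = [FALSE]; [MU] or ungraded questions would be skipped.
predictions = [
    (0.80, False),  # 1. Biden approval above 50%
    (0.05, False),  # 2. Court packing
    (0.80, False),  # 3. Yang is New York mayor
    (0.05, False),  # 4. Newsom recalled
    (0.30, False),  # 5. $250 million in protest damage
    (0.20, False),  # 6. Capital gains tax hike
    (0.20, False),  # 7. Trump allowed back on Twitter
    (0.70, True),   # 8. Tokyo Olympics on schedule
    (0.20, False),  # 9. Russia/Ukraine flare-up
    (0.05, False),  # 10. Israel/Palestine flare-up
]


def calibration_table(preds):
    """Map each stated probability to (number that came true, number of questions)."""
    buckets = defaultdict(list)
    for prob, outcome in preds:
        buckets[prob].append(outcome)
    return {
        prob: (sum(outcomes), len(outcomes))
        for prob, outcomes in sorted(buckets.items())
    }


for prob, (hits, total) in calibration_table(predictions).items():
    print(f"{prob:.0%} bucket: {hits}/{total} came true")
```

With so few questions per bucket the table says little on its own, which is part of why calibration is only a small piece of the picture.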
Anyway, here we go. I list my original commentary, then my reaction now.
[FALSE] means did not happen, [TRUE] means did happen, [MU] means neither.
If I skip a question, it’s because I didn’t have anything to say about it due to it being a personal matter or other issues involving private information.
1. Biden approval rating (as per 538) is greater than 50%: 80% [FALSE]
This was a bad prediction. My reasoning was that Democrats outnumber Republicans, and in today’s partisan age would approve pretty much no matter what, and that’s simply wrong. Democrats noticed things not going great on many fronts, and responded accordingly, and also there’s a traditional slump around now when voters realize the big promises from the campaign are not going to happen. I do think there was some chance that the original hypothesis was right, but I put way too much weight on it. A chance of 50% or so here seems reasonable, with the default being ‘you can’t actually please the whole coalition at once and often there’s still a pandemic and people will blame you for it.’
2. Court packing is clearly going to happen (new justices don’t have to be appointed by end of year): 5% [FALSE]
On reflection 5% does seem like the right ballpark here. Good (but easy) prediction.
3. Yang is New York mayor: 80% [FALSE]
Selling to market is never a huge mistake but this feels like a situation where the market was making a dumb prediction and one should call them on it in theory, even if the carrying and transaction costs involved don’t justify betting. Yang was not a proven politician, his previous success was a wedge campaign, he was a strange match for New York and there were a lot of candidates and a lot of time left in a very strange year. Logic was fine, but still should have sold lower, to at least 60%.
4. Newsom recalled as CA governor: 5% [FALSE]
This is hard to evaluate even in hindsight. Was Newsom close to losing, or was it a ‘still had all these’ situation and not close at all? Was getting that far unlucky or inevitable? My guess is that there were enough ways this could have gone wrong that this should have been in the 10%-20% range, so this prediction was lousy, but not terrible since the odds were very much in his favor.
Note that I did bet on this at PredictIt and make money, although I’m not convinced I had alpha.
5. At least $250 million in damage from BLM protests this year: 30% [FALSE]
Nothing happened, despite some potentially provocative things happening, so it seems clear that selling was right, the question is if it was sold far enough, and my answer is probably not. With Biden in office appetite for this kind of thing was always going to be low, so I’ll evaluate the right answer to the 10%-15% range.
6. Significant capital gains tax hike (above 30% for highest bracket): 20% [FALSE]
Were Manchin and Sinema ever going to sign off on such a thing? Probably not, but if priorities had been different I don’t think it was impossible. I’m all right with 20%, but it is probably a bit high and 15% would have been better.
7. Trump is allowed back on Twitter: 20% [FALSE]
Good sale. If anything they’re going the other way and banning more people for worse reasons.
8. Tokyo Olympics happen on schedule: 70% [TRUE]
I think the way it played out strongly reinforces that Japan indeed really wanted it to happen and it would have taken quite a lot to stop them. But given that it was plausible that ‘quite a lot’ could have happened anyway, hard to fault stopping at 80%. Calling this one a good prediction.
9. Major flare-up (significantly worse than anything in past 5 years) in Russia/Ukraine war: 20% [FALSE]
Given that the crisis is happening now it could have happened earlier, but the whole thing still strikes me as standard posturing and negotiations and I continue to not expect any actual fighting, but occasionally such situations get botched and fighting happens. Reasonably happy with 15%.
10. Major flare-up (significantly worse than anything in past 10 years) in Israel/Palestine conflict: 5% [FALSE]
Didn’t learn much other than we didn’t see anything this year, probably fine but easy.
11. Major flare-up (significantly worse than anything in past 50 years) in China/Taiwan conflict: 5%
A lot of people said they were worried but it was all on previously known priors, there wasn’t any substantial new crisis beyond ‘USA looks weak so maybe they’ll try something?’ I continue to think that no, they are highly unlikely to try something and the yearly risk is very low, but one can’t update that much on one year’s evidence.
12. Netanyahu is still Israeli PM: 40% [FALSE]
Given my lack of knowledge of Israeli politics I should have trusted the market somewhat more and sold somewhat lower to reflect the time difference. I overvalued my ‘corrupt guy likely to find way to stick around’ gut.
13. Prospera has at least 1000 residents: 30% [FALSE]
Passing when you know nothing is always good strategy, but I did have one piece of information, which is that Scott was intellectually invested in Prospera and thus likely to be somewhat high on this. Should have sold a bit even though I was blind.
14. GME >$100 (Currently $170): 50% [TRUE]
I still have no idea what’s going on with GameStop. Presumably the actual result is important information. It’s literally at 100.15 as I type this, although it was about 148 at year’s end, so I’m going to conclude 50% was not that far off, since the distribution is skewed, and move on.
15. Bitcoin above 100K: 40% [FALSE]
16. Ethereum above 5K: 50% [FALSE]
That would have been a great trade. Of course, it was also very easy to find.
17. Ethereum above 0.05 BTC: 70%
On reflection I was too hasty to assume this had to be 50% or lower, because the two assets are importantly different and so the distribution could be skewed. For example, there could be worlds where ETH goes to zero or very low while BTC is mostly fine, whereas the opposite is almost never true, perhaps. But after thinking twice, it works the other way. ETH is the riskier asset, and it should outperform less than 50% of the time if things are fair. I do agree that ETH was the better buy at the time, though, since no one here actually believes the EMH. I think the sale was right, but that the hedge was definitely called for.
18. Dow above 35K: 90% [TRUE]
19. …above 37.5K: 70% [FALSE]
So it did land at 36.5K, right in the middle of Scott’s range, which has to be some evidence that it can really be this easy. It also means I made like 20% on my hedge, so I made a bunch of money from the arbitrage. These predictions were so over-the-top bullish that I’m very curious what was going on, but I sincerely hope Scott was long and using leverage.
20. Unemployment above 5%: 40% [FALSE]
It made it to 3.9% in December, after being 4.2% in November, just beating the Omicron rush and also the end of the year, so this was very close and 50% seems like a reasonable prediction in hindsight given that we got something that seems baseline-like and that had the stock market giving very strong returns... if this was about a 4% threshold. It was a 5% threshold, so this wasn't close, and I was clearly high. My guess is Scott was high as well, there was a ton of pent up demand for workers and the way unemployment is measured this was going to end up pretty low.
21. Google widely allows remote work, no questions asked: 20% [TRUE]
The original prediction seems good. The edited version seems quite bad. Yes, they explicitly said they weren’t going to do this, but I believed them? This much? Under this much uncertainty? Seriously, I need to be smarter than that. Worst prediction of the lot so far, by a wide margin.
22. Starship reaches orbit: 60%
This is another Scott prediction of something cool but not all that precedented, so again should have sold a little.
23. Fewer than 10K daily average official COVID cases in US in December 2021: 30% [FALSE]
Well, whoops, the number was rather larger than that, and would have been without Omicron, but the question is the logic. If there was no Delta or Omicron, would we have wound down and ended this? I think the answer is probably. So the key question is, what probability should have been assigned to Delta or Omicron? That’s where I screwed this up, for the same central reason I screwed up the Google question. I didn’t put enough weight on that. I still think 30% was too low here, but 70% was aggressive. My guess is I should have been closer to 40%-50%, but I’m still not sure how to think about potential new variants.
24. Fewer than 50K daily average COVID cases worldwide in December 2021: 1% [FALSE]
Righto.
25. Greater than 66% of US population vaccinated against COVID: 50%
More than 66% got their first shot but complete vaccinations ended up around 62%. That difference remains weird to me, but this definitely could have gone either way for a variety of reasons. Presumably the buy to 60% was bad but I don’t think it was terrible.
26. India’s official case count is higher than US: 50% [FALSE]
Why was I willing to do this, on this little thinking? Presumably because this was April, exactly when things looked about to end, but India wouldn’t have sufficient vaccinations and has several times the population. So once again this is the same error.
27. Vitamin D is generally not recognized (eg NICE, UpToDate) as effective COVID treatment: 70% [TRUE]
This was never going to happen, these sources have no interest in doing anything about the lowest hanging of the fruits. As a result, people are dying, but they don’t care. We already mostly knew that. The 85%-90% before adjusting for Scott was right.
28. Something else not currently used becomes first-line treatment for COVID: 40% [TRUE]
I assume this was evaluated to true because of Paxlovid. In practice it’s still false if that’s what is being counted, but true in the sense of first best legal option. Paxlovid seems even now like we got pretty lucky to find it and have it be that over-the-top amazing, and it was approved only days from the end of the year, so I’m not too upset about losing this one if that’s the only reason I lost. If Scott is counting something else, it was a pretty bad prediction, and in general probably should have been higher.
29. Some new variant not currently known is greater than 25% of cases: 50% [TRUE]
I notice I am confused now. If I knew to buy this to 60%, then what are my other predictions here doing? I think this caught me at a strange time when things looked the best they’ve looked the whole pandemic, but still, that’s not an excuse.
30. Some new variant where no existing vaccine is more than 50% effective: 40% [TRUE]
If it’s versus death it would evaluate to false, so Scott meant infection, and then Omicron happened. Note that with a booster you’re back over 50% effective, and that’s now considered full vaccination via an existing vaccine (and was before Omicron), so I don’t think this grades all that obviously, and I’d evaluate it to [MU].
31. US approves AstraZeneca vaccine: 20% [FALSE]
Should have gone lower.
32. Most people I see in the local grocery store aren’t wearing a mask: 60% [FALSE]
Presumably Delta had me losing this anyway, so I can’t use Omicron as an excuse, but it’s more of the same mistake.
38. No new residents at our housing cluster: 40% [TRUE]
39. No current residents leave our housing cluster: 60% [FALSE]
Careful, Icarus. Got burned on these, but I do still like the logic.
53. At least seven days my house is orange or worse on PurpleAir.com because of fires: 80% [MU]
I stand by my recommendation.
60. There are no appraisal-related complications to the new house purchase: 50% [TRUE]
Looking back I continue to like this buy, but have no new info.
61. I live in the new house: 95% [TRUE]
Lost this one but I do think I got odds.
62. I live in the top bedroom: 60% [FALSE]
I am curious how this ended up not happening.
63. I can hear / get annoyed by neighbor TV noise: 40% [FALSE]
I wonder if I should have been lower here, given (again) that I’ve never seen this happen.
64. I’m playing in a D&D campaign: 70% [FALSE]
I’m guessing the Covid situation hurt his chances here, but also in general predictions like this tend to be overconfident. It would be interesting to look back and check Scott’s calibration by reference class (e.g. politics/economics, health/Covid, personal doing stuff that isn’t writing, writing-related accomplishments, personal other, etc.).
65. I go on at least one international trip: 60% [TRUE]
Given this happened despite pandemic outcomes, my skepticism of his intentions was wrong, and this was a bad hold.
66. I spend at least a month living somewhere other than the Bay: 50% [FALSE]
On reflection I should have sold a bit on the ‘people overestimate probability of making big changes’ principle, but only a bit. Scott did travel a lot, so presumably that didn’t count.
67. I continue my current exercise routine (and get through an entire cycle of it) in Q4 2021: 70% [TRUE]
This one I did do the ‘sell a little’ thing and it didn’t work out, but I stand behind the principle.
68. I meditate at least 15 days in Q4 2021: 60% [FALSE]
69. I take oroxylum at least 5 times in Q4 2021: 40% [TRUE]
Still have no idea what oroxylum is. Probably should have sold meditation a bit.
70. I take some substance I haven’t discovered yet at least 5 times in Q4 2021 (testing exempted): 30%
Good sale here, I think.
71. I do at least six new biohacking experiments in the next eight months: 40% [FALSE]
Happy with the hold decision.
73. The Twitter account I check most frequently isn’t one of the five I check frequently now: 20% [FALSE]
Twitter is mostly the same old Twitter so I doubt there was much danger on this one. The account I check most often is actually different now, it’s @BNONews, but that’s because I’m using Twitter to manage the news aggressively.
74. I make/retweet at least 25 tweets between now and 2022: 70% [FALSE]
I notice I am surprised that I lost this one, but I did definitely lose it. For whatever reason, Scott does not like the Twitter except for (usually very creative) horrible puns, and I was overconfident that he’d be drawn into doing more. Bad prediction.
75. Lorien has 100+ patients: 90% [TRUE]
76. 150+ patients: 20% [FALSE]
77. 200+ patients: 5% [FALSE]
78. I’ve written at least ten more Lorien writeups (so total at least 27): 30% [FALSE]
This did land in the middle so I definitely lost by shrinking the middle. I don’t have the story, but presumably Scott continued to not want new patients but did want to continue old ones, and that reliably lands us within the window. It still looks like a lot of probability on a narrow window, but my guess is Scott’s prediction was better.
84. I have switched medical records systems: 20% [MU]
85. I have changed my pricing scheme: 20% [FALSE]
Mu indicates selling to 15% was likely a mistake, but not enough information to say since Scott doesn’t offer details.
86. ACX is earning more money than it is right now: 70% [TRUE]
My jaw would have been on the floor if this had turned out to be false, so it was mostly a question of whether Scott would quit, and I don’t think that was all that likely. I like this one.
90. There is another article primarily about SSC/ACX/me in a major news source: 10% [FALSE]
There either is a post or there isn’t, but when I think about reference classes, the chance of this happening in 8 months was not as high as 25%, so bad prediction. I’m guessing 10% was still slightly low.
91. I subscribe to at least 5 new Substacks (so total of 8): 20% [FALSE]
It’s obviously Scott’s choice, as there were plenty of good options to choose from. On reflection, I probably should have trusted him more.
92. I’ve read and reviewed How Asia Works: 90% [TRUE]
Points taken away for not actually finding the time to do the comparison after he put out the post, but 90% seems solid.
93. I’ve read and reviewed Nixonland: 70% [FALSE]
Me read books? In this economy? Man, that would be nice.
94. I’ve read and reviewed Scout Mindset: 60% [TRUE]
On reflection I think 70% was low, this was one of those inevitable book reviews. I haven’t done my version yet but eventually I suppose I should?
95. I’ve read and reviewed at least two more dictator books: 50% [TRUE]
This was indeed important to Scott, as it turned out. Mildly sad I sold a bit here.
96. I’ve started and am at least 25% of the way through the formal editing process for Unsong: 30% [FALSE]
97. Unsong is published: 10% [FALSE]
Sad this hasn’t happened, not much else to say.
99. [redacted] wins the book review contest: 60% [FALSE]
On reflection it definitely felt unpredictable who won.
100. I run an ACX reader survey: 50% [TRUE]
101. I run a normal ACX survey (must start, but not necessarily finish, before end of year): 90% [FALSE]
I suppose they can, but never got clarification. Shrug.
102. By end of year, some other post beats NYT commentary for my most popular post: 10% [FALSE]
I don’t think not seeing the event was much evidence on its frequency. I’d still make it a favorite to happen eventually.
103. I finish and post the culture wars essay I’m working on: 90% [TRUE]
104. I finish and post the climate change essay I’m working on: 80% [TRUE]
105. I finish and post the CO2 essay I’m working on: 80% [TRUE]
Overall good posts, but not his best.
106. I have a queue of fewer than ten extra posts: 70% [MU]
Presumably he’s not sure what it means anymore for something to be in the queue.
107. I double my current amount of money ($1000) on PredictIt: 10% [FALSE]
Didn’t happen, and we don’t have details, but I’d buy this again.
108. I post my scores on these predictions before 3/1/22: 70% [TRUE]
Feels like this was at least 75% likely, but that’s not really much of an evaluation.
Overall
One could do various mathematical assessments, but as I’ve said in the past, I don’t think that is where the biggest value lies. It’s more about the logic. How did we do?
Unfortunately, I think it’s safe to say that I am rather unhappy with my performance here.
There are essentially three sections: Non-Covid world stuff, Covid stuff and Personal stuff.
On the Non-Covid world stuff I think this is a good but not great performance. There are a few big mistakes and some missed opportunities, but it mostly seems solid.
On the Covid stuff, this was a disaster. It was a correlated disaster, in the sense that Delta (and later Omicron) wrecked the whole model I was using and made my predictions here look stupid. In addition to looking stupid, they mostly actually were stupid as well. I gave reasonably high probabilities for new variants, and then didn’t think through the implications from those probabilities.
It’s important to own one’s mistakes in spots like this. In many ways and spots, I’ve been in front of the curve and made very good predictions. But in other places, not so much, and I’ve made mistakes. April 2021 was Peak Overly Strong Optimism on my part, and I made bad predictions on that basis because I wasn’t thinking about the right questions. I do think that if we were still dealing with Alpha, we’d have gotten the good scenarios, but the thinking about variants wasn’t consistent or coherent here.
That’s something to keep in mind going forward as well. I have a clear idea of where things are likely headed if new variants don’t change the outlook, but new variants are always a threat. I did a good job responding once they were known, but a much less good job with the possibility before they were known, and that matters for one’s plans. I still think that a variant of Omicron is likely to not pose that big a threat, but I haven’t looked into that as much as other aspects, and it’s an important question.
Finally, there’s the personal stuff, where I’m mostly betting on Scott’s contextual calibration, and it’s hard to know what the right answers are. There are some places I’m happy with my calls, a few places I’m upset, but mostly there isn’t much one can conclude here and I’m left thinking I could have done better.
That gives me an overall lousy grade for this round of predictions. The numerical evaluations Scott listed showed me doing relatively well (beating the market is tough especially when you don’t see a lot of the markets) but they don’t include the whole cluster of horrible Covid predictions, which I think were my worst substantial Covid predictions of the whole pandemic.
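For anyone who wants to run one of those mathematical assessments anyway, here is a minimal sketch of a Brier score in Python. It uses the probabilities and grades printed on the graded market questions above (14, 15, 16, 18, 19, 20), and leaves out question 17, which has no grade listed. The variable names and the choice of questions are mine, purely for illustration. This is not how Scott or the commenter's spreadsheet computed anything.

```python
def brier_score(preds):
    """Mean squared gap between stated probability and the 0/1 outcome (lower is better)."""
    return sum((prob - float(outcome)) ** 2 for prob, outcome in preds) / len(preds)


# (stated probability, outcome) as printed on the question lines in this post.
market_questions = [
    (0.50, True),   # 14. GME above $100
    (0.40, False),  # 15. Bitcoin above 100K
    (0.50, False),  # 16. Ethereum above 5K
    (0.90, True),   # 18. Dow above 35K
    (0.70, False),  # 19. Dow above 37.5K
    (0.40, False),  # 20. Unemployment above 5%
]

# Three easy questions from the same list (2, 4 and 10), each 5% and [FALSE].
easy_questions = [(0.05, False), (0.05, False), (0.05, False)]

print("market questions only:", round(brier_score(market_questions), 4))
print("with easy questions:  ", round(brier_score(market_questions + easy_questions), 4))
```

The market questions alone score about 0.22, barely better than the 0.25 you get from saying 50% on everything. Adding three easy 5% questions pulls the average down to about 0.15 without any improvement in forecasting skill, which is the problem with scoring self-chosen questions numerically.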
Hopefully I, and the world, can do better in 2022.