I just reran this test with Gemini 3 Pro Preview in my apartment.
It passed with flying colors. No real mistakes or major inefficiencies. Opus 4.5 performed worse; I didn't end up running that round to completion.
I will note this is impacted by my place being a little smaller and less messy than what's in the post, and I'm also not at all a coffee snob, but the actual coffee is serviceable imo.
I can share the Gemini chat if anyone's interested.
I think a lot of people who talk about being n-th percentile in some domain implicitly only include the set of people who participate in the activity at all. That's a bit less clear-cut than "everyone alive", but makes more sense to talk about and compare against imo.
I'd assume the number of people they have working on arresting people for memes is orders of magnitude smaller than their shortfall in paramedics or whatever else.
Seems like a lot of paragraphs got collapsed together in this version of the post (vs the WordPress and Substack ones)?
I don't get a progress bar on mobile (unless I'm missing it somehow), and the word-count-on-hover feature seemingly broke on mobile a while ago as well (I remember it working before).
Why remove "x min read"? Even if it's not gonna be super accurate between different people's reading speeds, I still found it very helpful to decide at a glance how long a post is (e.g. whether to read it on the spot or bookmark it for later).
Showing the word count would also suffice.
I compared the Manifold forecasts with the community prediction on Metaculus and calculated a time-averaged Brier Score to score forecasts over time.
The so-called "nonsense" community prediction is still more accurate on average than Manifold for the same questions.
https://www.metaculus.com/notebooks/15359/predictive-performance-on-metaculus-vs-manifold-markets/
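For anyone unfamiliar with the metric, here is a minimal sketch of one way to compute a time-averaged Brier score for a single binary question. It is an illustration only and not necessarily the exact calculation in the linked notebook: the function name, the (timestamp, probability) input format, and the assumption that each forecast stays in effect until the next update are all mine.

```python
def time_averaged_brier(forecasts, close_time, outcome):
    """Time-weighted mean of (p - outcome)^2 for one binary question.

    forecasts:  list of (timestamp, probability) pairs (hypothetical format);
                each probability is assumed to hold until the next update.
    close_time: timestamp at which the question closes.
    outcome:    1 if the question resolved yes, 0 if it resolved no.
    """
    if not forecasts:
        raise ValueError("need at least one forecast")
    forecasts = sorted(forecasts)
    start = forecasts[0][0]
    duration = close_time - start
    if duration <= 0:
        raise ValueError("close_time must be after the first forecast")

    weighted_sum = 0.0
    # Pair each forecast with the time of the next one (or the close time).
    next_times = [t for t, _ in forecasts[1:]] + [close_time]
    for (t, p), t_next in zip(forecasts, next_times):
        t_end = min(t_next, close_time)
        if t_end <= t:
            continue  # forecast made at or after close; ignore it
        weighted_sum += (p - outcome) ** 2 * (t_end - t)
    return weighted_sum / duration


# Forecast of 50% for the first half of the question, 90% for the second,
# resolving yes: (0.25 * 5 + 0.01 * 5) / 10 = 0.13
score = time_averaged_brier([(0, 0.5), (5, 0.9)], close_time=10, outcome=1)
```

Lower is better. To compare two platforms on the same question, each platform's forecast history would be scored over the same time window and the results averaged across questions.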
https://drive.google.com/drive/folders/1R_0NeKfGvdSpsR1Mh0FkTj50cvxV20Wa?usp=sharing
Trying to share a chat out of AI Studio has proved annoying, as it turns out. I copied the transcript and took a full page screen capture instead, but the latter also turned out slightly scuffed with my usual tool. Apologies for the quality.