This is a story about a flawed Manifold market, about how easy it is to buy significant objective-sounding publicity for your preferred politics, and about why I've downgraded my respect for all but the largest prediction markets. I've had a Manifold account for a while, but I didn't use it...
(Creating more visibility for a comment thread with Rohin Shah.) Currently, DeepMind's capabilities evals are run on the post-RL*F (RLHF/RLAIF) models and not on the base models. This worries me because RL*F will train a base model to stop displaying capabilities, but this isn't a guarantee that it trains the...
Summary: Recent interpretability work on "grokking" suggests a mechanism for a powerful mesa-optimizer to emerge suddenly from an ML model. Inspired By: A Mechanistic Interpretability Analysis of Grokking. Overview of Grokking: In January 2022, a team from OpenAI posted an article about a phenomenon they dubbed "grokking", where they trained...
Our society is pretty messed up around arguments about whose ideas we should and shouldn't tolerate. Some of this is inevitable: even without censorship, there are cases where group X can choose to actively show respect to person Y, and members of X will argue about that, and people with...
[EDIT: SimonM pointed out a possibly-fatal flaw with this plan: it would probably discourage more pundits from joining the prediction-making club at all, and adding to that club is a higher priority than comparing the members more accurately.] Stop me if you've heard this one. (Seriously, I may not be...
Epistemic Status: I only know as much as anyone else in my reference class (I build ML models, I can grok the GPT papers, and I don't work for OpenAI or a similar lab). But I think my thesis is original. Related: Gwern on GPT-3 For the last several years,...
I've seen a worrying trend in people who've learned introspection and self-improvement methods from CFAR, or analogous ones from CBT. They make better life decisions, they calm their emotions in the moment. But they still look just as stressed as ever. They stamp out every internal conflict they can see,...