I am extremely interested in this, and all similar efforts in this space. I agree our community should be doing much more along these lines.
Regarding your specific ideas:
Cognitive Bias Detection
Something about training people to categorize errors - instead of just making good decisions - rubs me the wrong way. Also, there's a lot of pre-existing work (I found out about this earlier today).
Calibration Training
The Credence Calibration Game exists. So does my variation on the same idea (see also the associated lesson plan). So do play-money and real-money prediction markets. That said, I do think there's a valuable and unfilled niche for something that doesn't require a download, has a nice user interface, offers a four-digit number of questions, and lets you check your answers immediately (...though I don't know how many people other than me would consider it valuable).
Bite-Sized, Practical Challenges
I am very much in favor of this, to the point where I'm already (tentatively) planning to (eventually) build some games with a similar motivation. Relatedly, the "ask users to predict an outcome based on limited data" example sounds like a description of that genre I invented (though "Bite-Sized" suggests you're thinking in terms of something much more polished/generally-accessible).
(Side note: A subtle benefit of the "Practical Challenges" approach is that it can correct for biases you weren't aiming for. A large part of my motivation for making D&D.Sci was "forcing them to confront the common pitfalls of overconfidence or representativeness heuristics"; I found that a LessWronger working in a Data Science context will more often be insufficiently confident and place too little weight on surface appearances; my endeavor 'failed' gracefully, and people got a chance to notice those errors instead (plus various other problems I didn't even consider).)
-
I look forward to seeing what comes of this. If you want anything playtested, please let me know.
I appreciate the reply!
Something about training people to categorize errors - instead of just making good decisions - rubs me the wrong way
Are you able to pinpoint exactly what gives you this feeling? The goal of this problem type would be to train the ability to recognize bias to the point where it becomes second nature, with the hope that this same developed skill would also trigger in your own thought processes. I believe it's generally easier to evaluate the truthfulness of a statement than to come up with one initially, so this training would help...
This is a long answer, in which I list around ten concrete problem types that such a site could have.
Before I go into my concrete proposals, here are some general points:
And overall I think that having an Actually Really Good website for rationality training would be extremely valuable, so I'm supportive of efforts in this direction.
I brainstormed some problem types that I think such a site could include.
1: Probabilistic calibration training for quantifying uncertainty
This is the obvious one. I already commented on this, in particular that I don't think this should be the main focus. (But if one were to execute this: I think that the lack of quantity and/or diversity of questions in existing tools is a core reason I don't do this more.)
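For illustration, here's a minimal sketch of what immediate per-question feedback plus an overall calibration summary could look like; the questions, credences, and bucket granularity below are all placeholder choices, not a prescription:

```python
# Minimal sketch of immediate-feedback calibration training.
# Questions, credences, and bucket count are hypothetical placeholders.

questions = [
    {"prompt": "Is the Nile longer than the Amazon?", "answer": True},
    {"prompt": "Does Australia have more people than Canada?", "answer": False},
]
user_credences = [0.9, 0.6]  # the user's stated P(answer is True)

def brier_score(credence: float, outcome: bool) -> float:
    """Squared error between stated credence and the 0/1 outcome (lower is better)."""
    return (credence - (1.0 if outcome else 0.0)) ** 2

def calibration_summary(records, n_buckets=10):
    """Group (credence, outcome) pairs by credence; compare stated vs. observed frequency."""
    buckets = [[] for _ in range(n_buckets)]
    for credence, outcome in records:
        buckets[min(int(credence * n_buckets), n_buckets - 1)].append(outcome)
    for i, outcomes in enumerate(buckets):
        if outcomes:
            observed = sum(outcomes) / len(outcomes)
            print(f"stated {i / n_buckets:.1f}-{(i + 1) / n_buckets:.1f}: "
                  f"observed {observed:.2f} over {len(outcomes)} answers")

records = []
for q, credence in zip(questions, user_credences):
    # Immediate feedback after each answer:
    print(q["prompt"], "-> Brier:", brier_score(credence, q["answer"]))
    records.append((credence, q["answer"]))
calibration_summary(records)
```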
2: "Statistical calibration"
I feel like there are lots of quantitative statistics one could ask questions about. Here are some basic ones:
(For more ideas, you can e.g. look at Statistics Finland's list here. And there just are various quantitative statistics floating around: e.g. today I learned that salt intake in 1800s Europe was ~18g/day, which sure is more than I'd have guessed.)
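To make "how wrong were you" concrete for quantity questions, one natural scoring rule is error in orders of magnitude. A tiny sketch, using the ~18g/day figure above as the target and a made-up guess:

```python
import math

# Sketch: score a quantity estimate by how many orders of magnitude it is off.
# 18 g/day is the salt figure mentioned above; the guess of 6 g/day is made up.

def log_error(estimate: float, truth: float) -> float:
    """Absolute error in base-10 orders of magnitude."""
    return abs(math.log10(estimate) - math.log10(truth))

print(round(log_error(estimate=6.0, truth=18.0), 2))  # ~0.48 orders of magnitude off
```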
3: Quantitative modeling
(The line between this and the previous one is blurry.)
Fermi estimates are the classic example here; see Quantified Intuitions' The Estimation Game. See also this recent post that's thematically related.
There's room for more sophisticated quantitative modeling, too. Here are two examples to illustrate what I have in mind:
Example 1. How much value would it create to increase the speed of all passenger airplanes by 5%?
Example 2. Consider a company that has two options: either have its employees visit nearby restaurants for lunch, or hire food personnel and start serving lunch on its own premises. How large does the company need to be for the second option to become profitable?
It's not obvious how to model these phenomena, and the questions are (intentionally) underspecified; I think the interesting part would be comparing modeling choices and parameter estimates across users, rather than simply comparing outputs.
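To make Example 2 concrete, here is a deliberately toy break-even model; every parameter value is invented for illustration, and the interesting exercise would be arguing about these modeling choices rather than trusting the output:

```python
# Toy break-even model for Example 2. Every number below is a made-up
# placeholder; comparing modeling choices is the point, not the output.

kitchen_fixed_cost = 120_000       # yearly rent, equipment amortization, etc.
cook_salary = 45_000               # yearly cost per kitchen employee
employees_per_cook = 40            # one cook can serve this many employees
ingredient_cost_per_lunch = 4      # per employee per workday
restaurant_lunch_cost = 12         # what the company effectively pays per restaurant lunch
extra_minutes_saved = 20           # time saved per employee per day by eating on-site
value_of_minute = 0.8              # employer's value of one employee-minute
workdays = 220

def own_kitchen_net_cost(n_employees: int) -> float:
    cooks = -(-n_employees // employees_per_cook)  # ceiling division
    cost = (kitchen_fixed_cost + cooks * cook_salary
            + n_employees * workdays * ingredient_cost_per_lunch)
    time_value = n_employees * workdays * extra_minutes_saved * value_of_minute
    return cost - time_value

def restaurant_cost(n_employees: int) -> float:
    return n_employees * workdays * restaurant_lunch_cost

for n in [25, 50, 100, 200, 400]:
    print(n, "own kitchen:", own_kitchen_net_cost(n), "restaurants:", restaurant_cost(n))
```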
4: The Wikipedia false-modifications game
See this post for discussion.
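A minimal sketch of how one round of such a game could work, with a hypothetical passage and prewritten false variants (a real version would presumably draw passages and modifications from Wikipedia itself):

```python
import random

# Sketch of one false-modifications round. The passage and its false variants
# are hypothetical; a real version might source passages from Wikipedia dumps.

passage = {
    "original": "The Eiffel Tower was completed in 1889 as the entrance arch to the World's Fair.",
    "false_variants": [
        "The Eiffel Tower was completed in 1899 as the entrance arch to the World's Fair.",
        "The Eiffel Tower was completed in 1889 as a radio transmission tower for the military.",
    ],
}

def play_round(p):
    modified = random.random() < 0.5
    shown = random.choice(p["false_variants"]) if modified else p["original"]
    print(shown)
    guess = input("Has this passage been modified? (y/n) ").strip().lower() == "y"
    print("Correct!" if guess == modified else "Wrong.")

play_round(passage)
```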
5: Discourse-gone-astray in the wild
(Less confident on this one.)
I suspect there are a lot of pedagogically useful examples of poor discourse happening in the wild (e.g. tilted or poorly researched newspaper articles, heated discussions on Twitter or elsewhere). This feels like a better way to execute what the "spot cognitive biases / logical fallacies" exercises aim to do. Answering questions like "How is this text misleading?", "How did this conversation go off the rails?" or "What would have been a better response instead of what was said here?" and then comparing one's notes with others seems like it could make a useful exercise.
6: Re-deriving established concepts
Recently it occurred to me that I didn't know how inflation works and what its upsides are. Working this through (with some vague memories and hints from my friend) felt like a pretty good exercise to me.
Another example: I don't know how people make vacuums in practice, but when I sat and thought it through, it wasn't too hard to think of a way to create a space with far fewer air molecules than the atmosphere using pretty simple tools.
Third example: I've had partial success prompting people to re-derive the notion of Shapley value.
I like problems of this sort: they are a bit confusing, in that part of the problem is asking the right questions, but there are established, correct (or at least extremely good) solutions.
(Of course someone might already know the canonical answer to any given question, but that's fine. I think there are lots of good examples for this in economics - e.g. Vickrey auctions, prediction markets, why price controls are bad / price gouging is pretty good, "fair" betting odds - but maybe that's just because I don't know much economics.)
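As a concrete instance of a "canonical answer" one might re-derive: the Shapley value can be computed by brute force as each player's average marginal contribution over all join orders. A small sketch, with made-up coalition worths:

```python
from itertools import permutations

# Brute-force Shapley values for a toy 3-player game: each player's value is
# their average marginal contribution over all join orders. The coalition
# worths below are invented for illustration.

players = ["A", "B", "C"]
worth = {
    frozenset(): 0, frozenset("A"): 10, frozenset("B"): 20, frozenset("C"): 30,
    frozenset("AB"): 40, frozenset("AC"): 50, frozenset("BC"): 60,
    frozenset("ABC"): 90,
}

def shapley(players, worth):
    totals = {p: 0.0 for p in players}
    orders = list(permutations(players))
    for order in orders:
        coalition = frozenset()
        for p in order:
            totals[p] += worth[coalition | {p}] - worth[coalition]
            coalition = coalition | {p}
    return {p: t / len(orders) for p, t in totals.items()}

print(shapley(players, worth))  # values sum to the full coalition's worth (90)
```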
7: Generating multiple ideas/interventions/solutions/hypotheses
An exercise I did at some point is "Generate 25 ideas for interventions that might improve learning and other outcomes in public education". I feel like the ability to come up with multiple ideas for a given problem is pretty useful (e.g. this is something I face in my work all the time, and this list itself is an example of "think of many things"). This is similar to the babble exercises, though I'm picturing more "serious" prompts than the ones there.
Another way to train this skill would be to have interactive exercises that are about doing science (cf. the 2-4-6 problem), aiming to complete them as efficiently as possible. (This article is thematically relevant.)
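A minimal sketch of a 2-4-6-style round, using Wason's original hidden rule (any strictly increasing triple); the scoring idea is simply that fewer queries before a correct final guess is better:

```python
# Sketch of a 2-4-6-style exercise: the player probes a hidden rule with
# triples, and efficiency is measured by the number of queries used.

def hidden_rule(a, b, c):
    return a < b < c  # Wason's original rule: strictly increasing

queries = 0
def test_triple(a, b, c):
    global queries
    queries += 1
    result = hidden_rule(a, b, c)
    print(f"({a}, {b}, {c}) -> {'fits' if result else 'does not fit'} the rule")
    return result

# A confirmation-biased player only tests triples they expect to fit:
test_triple(2, 4, 6)
test_triple(4, 8, 12)
# A better player also tries to falsify their hypothesis:
test_triple(1, 2, 3)
test_triple(6, 4, 2)
print("queries used:", queries)
```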
(Discussion of half-developed ideas that I don't yet quite see how to turn into exercises.)
8: Getting better results with more effort
Two personal anecdotes:
A type of exercise where you are supposed to first give an initial answer after X time, and then are allowed to revise your answer for Y time, seems like it could train this and other skills. (Maybe brainstorming exercises of the form "if you had a week/month/year of time, how would you solve [problem]?" could help, too.)
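A rough sketch of that two-phase flow, with placeholder time limits and no real timer enforcement (that would live in the UI); storing both answers would let the site show how much the extra effort improved the result:

```python
import time

# Sketch of a two-phase answer flow: a quick initial answer, then a longer
# revision window. Time limits are placeholder values.

INITIAL_SECONDS = 60
REVISION_SECONDS = 300

def two_phase_question(prompt: str) -> dict:
    print(prompt)
    start = time.time()
    initial = input(f"Initial answer (within {INITIAL_SECONDS}s): ")
    initial_time = time.time() - start  # a real UI would enforce the limit
    revised = input(f"Revised answer (you have {REVISION_SECONDS}s more): ")
    return {"initial": initial, "initial_time": initial_time, "revised": revised}

print(two_phase_question("How much value would a 5% airplane speedup create?"))
```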
9: I think there's something in the genre of "be specific", and more specifically in "operationalize vague claims into something that has a truth value", that'd be nice to have in large-quantity exercise form. See this post for related discussion. I'm also reminded of this comment.
There are definitely things not covered by this list; in particular, there's little here that directly trains applying all this in real life (cf. TAPs, which is definitely a very real-life-y technique). So while I did keep practicality in mind, I'd be happy to see exercises that bridge the theory-practice gap even more.
Also, Dungeons and Data Science and the stuff Raymond is doing are worth keeping in mind.
FYI I'm working on an angle on this. One of my dreams is to make a proper website, but for now it's been more efficient to assemble a collection of puzzles and exercises that various other people have built, and layer rationality training exercises on top of them.
My agenda is written up in the Feedbackloop-first Rationality sequence. The basic idea is that rationality is bottlenecked on inventing better feedbackloops that train the actually important skills. (You can look over the "exercises" section.)
My general strategy has been to take existing puzzles/exercises that have a fair amount of depth, such that in order to solve it you're going to need to:
Which naturally lends itself well to practicing the skills of:
Thanks for sharing this! I've read Feedbackloop-first Rationality, and it's definitely contributed to why I want to build something like this. I've even been looking for Thinking Physics-style problems that might be free to use online. I think getting a diverse, high-quality set of interesting problems will be difficult, whether it's aggregated, crowdsourced, or AI-generated.
...My agenda is written up in the Feedbackloop-first Rationality sequence. The basic idea is that rationality is bottlenecked on inventing better feedbackloops that train the actually important skills.
I would be interested in this!
Related: an organization called Sage maintains a variety of calibration training tools.
I like the leetcode-style aspect of the idea. Identifying the "Blind 75" of cognitive biases might be a good start. Or take practice problems from this course: https://callingbullshit.org/. Once you've identified which problems you want to train on, you could use an LLM to continuously rewrite them, preventing users from simply memorizing an answer and forcing them to think critically (a sketch of that loop is below). There are several ways to implement this sort of thing, and I can easily imagine the high school version of myself falling down this rabbit hole.
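Here `call_llm` is a stand-in for whichever completion API you'd actually wire up, and the prompt wording is just a first guess:

```python
# Sketch of the rewrite idea: keep the logical structure of a problem fixed
# while an LLM re-skins its surface details each time it is served.

def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM client here")

REWRITE_PROMPT = """Rewrite the following reasoning problem with new names,
numbers, and setting, but keep the underlying bias/fallacy being tested
exactly the same. Do not reveal the answer.

Problem: {problem}"""

def serve_problem(canonical_problem: str) -> str:
    """Return a freshly re-skinned version of a canonical problem."""
    return call_llm(REWRITE_PROMPT.format(problem=canonical_problem))
```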
Not to self-promote too much, but I created a demo of a similarly inspired idea that I call newsbetting, at https://www.rashomonnews.com. The idea is to use betting mechanisms to create skin in the game and help news consumers identify their own biases when reading the news. Maybe you could include this betting mechanism as a proxy for a confidence interval.
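One simple way to turn a stated credence into a bet with skin in the game is a logarithmic-scoring payoff, under which honestly reporting your probability maximizes expected payoff; a sketch with an arbitrary stake:

```python
import math

# Sketch: a logarithmic-scoring payoff turns a stated credence into a bet.
# Stake and scaling are arbitrary illustration choices.

def log_payoff(credence: float, outcome: bool, stake: float = 100.0) -> float:
    """Payoff (possibly negative) for betting `stake` at credence p on a claim."""
    p = credence if outcome else 1.0 - credence
    return stake * (1.0 + math.log2(p))  # zero payoff at p=0.5, +stake at p=1

print(log_payoff(0.8, True))   # confident and right: wins ~68
print(log_payoff(0.8, False))  # confident and wrong: loses ~132
```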
Regardless, I would very much like to see the outcome of this sort of project and I wish you the best!
I would be very interested to see what you come up with!
I like this idea and have wanted to do something similar, especially something that we could do at a meetup. For what it's worth, I made a calibration trivia site to help with calibration. The San Diego group has played it a couple times during meetups. Feel free to copy anything from it. https://calibrationtrivia.com/
Thanks! I have seen a similar tool before and enjoyed it quite a bit. I'd love to know where you source the trivia data, especially if it is available for open use. It could also be interesting to tailor some functionality for meetups as well.
I think this sounds fun! The versions of this I'd be most likely to use would be:
I imagine a character (Alice) is constantly used as the rational actor in scenarios. We make Alice a likeable character, give her a personality, a series of events and decisions that lead her to the present.
Then, when the user has been around for a sufficient amount of time, Alice starts to slip. She makes mistakes that harm others; perhaps she has disputes with 'Stupidus'; maybe she just begins to say untrue things.
How long will it take a user to pry themself out of the rose-tinted glasses and update on Alice?
I've thought about this for a long time and I think one of the big issues is lack of labelled training data in many domains. E.g. people made calibration toys and that helped a lot for that particular dimension. Ditto the tests on which studies replicated. In many cases we'd want more complex blinded data for people to practice on, and that requires, like in games, someone to set up all the non-fun backend for them.
Like the calibration game, but for a variety of decision problems, where the person has to assign probabilities to things at different stages based on what information is available. Afterwards they get an example Brier score based on the average of what people with good prediction track records set at each phase.
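A sketch of that phase-by-phase scoring, with made-up probabilities for both the user and the reference panel:

```python
# Sketch of phase-by-phase scoring: at each information stage the user's
# probability is Brier-scored and compared to the average set by forecasters
# with good track records. All numbers below are made up.

phases = [
    {"info": "base rates only",     "user_p": 0.50, "panel_p": 0.40},
    {"info": "first witness heard", "user_p": 0.70, "panel_p": 0.55},
    {"info": "forensic report in",  "user_p": 0.90, "panel_p": 0.85},
]
outcome = True  # how the scenario actually resolved

def brier(p: float, outcome: bool) -> float:
    return (p - (1.0 if outcome else 0.0)) ** 2

for ph in phases:
    print(ph["info"], "- you:", brier(ph["user_p"], outcome),
          "panel:", brier(ph["panel_p"], outcome))
```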
The following is motivated by:
I've been a long-time lurker on Less Wrong, and I've noticed the recurring criticism that despite its focus on rationality, the community lacks structured training to develop practical rationality skills. Eliezer Yudkowsky talks about rationality as a martial art, because it's something that can be trained and refined through deliberate practice. But where is our dojo?
A model that comes to mind is a website like LeetCode, where programmers can solve coding challenges, share solutions, and see how others approach the same problems. LeetCode can sometimes encourage overfitting to specific problem types, so it's not a perfect analogy, but the community-driven aspect is interesting to me: you can see how other people approach the problem. Could something similar be adapted for rationality?
Imagine a platform where, instead of solving coding puzzles, users engage with problems designed to train rational thinking. Here are a few types of problems that might fit:
This kind of platform could be a place where people practice and refine their skills, not just absorb entertaining ideas in a way that some say is weakly applicable.
I have a few years of experience in Software Engineering (backend and ML) and have been thinking about building a tool like this for my own use. However, if others would find it valuable, I'd be open to expanding it into something that the wider community could use as well. It could even present an opportunity to create a sustainable project with some potential financial benefits along the way. I'd love to hear if there’s interest in such a platform and what features might be most helpful to include.