One way of viewing planning is as an outer-loop on decision theory.
My approach to the general problem of planning skills was to start with decision theory and build up. In my Guild of the Rose Decision Theory courses, I spent time slowly building the most fundamental skills of decision theory. This included practicing manipulation of probabilities and utilities via decision trees, and practicing all these steps in a variety of both real and synthetic scenarios, to build an intuition for the nuances of how to set up decision problems on paper. The ultimate goal was to get practitioners to the point where they usually don't need to draw up a decision tree on paper, but can instead leverage those intuitions to quickly solve decision problems mentally, and/or recognize when a decision problem is actually tricky enough to merit breaking out the spreadsheet or Guesstimate project.
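The basic mechanics being practiced here can be sketched in a few lines (a hypothetical example of mine, not taken from the course materials): each option gets a list of (probability, utility) outcomes, and you choose the option with the highest expected utility.

```python
# Minimal decision-tree sketch: each option is a list of
# (probability, utility) leaves; pick the max expected utility.

def expected_utility(outcomes):
    """Probability-weighted sum of utilities for one option."""
    return sum(p * u for p, u in outcomes)

def best_option(options):
    """Return the option name with the highest expected utility."""
    return max(options, key=lambda name: expected_utility(options[name]))

options = {
    # hypothetical numbers for a stylized job decision
    "accept offer": [(0.6, 100), (0.4, -20)],  # EU ≈ 52
    "stay put":     [(0.9, 40), (0.1, 0)],     # EU ≈ 36
}

print(best_option(options))
```

Once this manipulation is second nature, the mental version is just "weight each branch by how likely it is, and notice when the branches are too tangled to do in your head."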
In my experience, even long-time rationalists are so incredibly bad at basic decision theory that trying to skip the step of learning to correctly set up a basic decision tree might actually be counterproductive. So my inclination is to focus on really mastering this art before attempting planning.
Another way of viewing planning is that planning is search.
For computationally bounded agents like us, search involves a natural tradeoff of breadth versus depth. Breadth is essentially idea generation; depth is idea selection and refinement. The tricky thing about planning, in general, is that if 100x solutions exist, they are going to be found by spending the majority of the time on breadth-search, i.e. blue-sky brainstorming for ways the plan could look wildly different from the default approach. But most situations don't admit 100x plans. Most things in life, especially in our technological civilization, are already sort of optimized, because there is some existing refined solution that has already accommodated the relevant tradeoffs. I could get to work faster if I flew there in a helicopter, but factoring in costs, the Pareto optimum is still driving my car on the freeway. Most things look like this. Well-considered Pareto solutions to real-world problems tend to look boring!
Therefore, if you spend a lot of time looking for 100x solutions, you will waste a lot of time, because these solutions usually won't exist. Then, after failing to find a truly galaxy-brain solution, you will spend some amount of time refining the probably-already-obvious plan, realize that there are a lot of unknown-unknowns, and that the best way to get clarity on these is to just start working. Then you will realize that you would have been better off if you had just started working immediately and not bothered with "planning" at all, and you will either be Enlightened or depressed.
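As a toy illustration of the gamble involved (all numbers entirely made up by me): suppose you spend a fraction b of your time on breadth search, a 100x plan exists with probability p, and you find it with probability proportional to b; otherwise the remaining (1 - b) of your time goes into the default 1x plan.

```python
# Toy expected-value model of the breadth-search gamble.
# All parameters are hypothetical; this is an intuition pump, not a claim.

def expected_payoff(p, b):
    found = p * b                          # chance you find a 100x plan
    return found * 100 + (1 - found) * (1 - b)

def best_breadth_fraction(p):
    """Grid-search the breadth fraction b that maximizes expected payoff."""
    return max((b / 100 for b in range(101)),
               key=lambda b: expected_payoff(p, b))

# When 100x plans are very rare, the optimal breadth fraction is ~zero;
# when they are common, it pays to spend nearly all your time searching.
print(best_breadth_fraction(0.001), best_breadth_fraction(0.5))
```

In this toy model the breadth search only pays for itself when p exceeds roughly 1/100, which is the quantitative version of "these solutions usually won't exist, so don't budget much for finding them."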
It gives me no pleasure to say this! Ten years ago I was all fired up on the idea that rationalists would Win and take over the world by finding these clever HPJEV-esque lateral thinking solutions. I have since realized that one creative rationalist is usually no match for tens of thousands of smart people exploring the manifold through natural breadth-first and then refining on the best solutions organically.
I am not actually completely blackpilled on the idea of scenario planning. Clearly there are situations for which scenario planning is appropriate. Massive capital allocations and long-term research programs might be two good examples. Even for these types of problems, it's worth remembering that the manifold probably only admits to marginal optimizations, not 100x optimizations, so you shouldn't spend too much time looking for them.
Both of these thoughts are pretty interesting, thanks.
I'd be interested in hearing a bunch more detail about how you trained decision theory and how that went. (naively this sounds like overkill to me, or "not intervening at the best level", but I'm quite interested in what sort of exercises you did and how people responded to them)
re: "how useful is planning", I do think this is specifically useful if you have deep, ambitious goals, without well established practices. (i.e. Rationality !== Winning in General).
Lord, grant me the strength to persevere when things are hard, the courage to quit when things are impossible, and the wisdom to know the difference.
I'm running a small rationality dojo to try to approach this issue from the rat-for-rat-sake direction in a few weeks, trying to incorporate the things I learned from my Seasons of Growth, my Executive Function research, and stuff like Logan's Naturalism sequence (not to mention years of teaching at rat camps and workshops). I plan to do a writeup after, but would also love to chat sometime about this, either before or after.
One of the things that helped a lot with the predictions part was reading Judea Pearl's Heuristics. It seemed to make me better at noticing that a big part of my problem solving was split into two things: my representation of the problem space, and then my traversal of that space. I would notice more readily when I had stuck myself with an intractably sized space for the traversal speed available, and conclude that I needed to switch to trying to find a different representation that was tractable. Others might get very different insights out of the book, the search-inference framework is pretty flexible (also covered in Baron's Thinking and Deciding).
The cleanest example is during Raven's Progressive Matrices testing: noticing that checking a particular set of hypotheses one by one is taking too long, zooming out to see them as a class of hypotheses, asking what they have in common, and then asking what else is possible. If the different moving parts of the puzzle are slot machines, then it's an explore/exploit problem.
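The slot-machine framing is the classic multi-armed bandit problem. A minimal epsilon-greedy sketch (my own toy example, not from either book): each "machine" has an unknown payout rate; you mostly pull the best-known arm, but occasionally explore the others.

```python
import random

def epsilon_greedy(true_rates, pulls=10000, epsilon=0.1, seed=0):
    """Toy epsilon-greedy bandit: returns how often each arm was pulled."""
    rng = random.Random(seed)
    counts = [0] * len(true_rates)
    totals = [0.0] * len(true_rates)
    for _ in range(pulls):
        if rng.random() < epsilon:
            arm = rng.randrange(len(true_rates))          # explore
        else:
            # exploit: best observed average (unseen arms tried first)
            arm = max(range(len(true_rates)),
                      key=lambda i: totals[i] / counts[i]
                      if counts[i] else float("inf"))
        reward = 1.0 if rng.random() < true_rates[arm] else 0.0
        counts[arm] += 1
        totals[arm] += reward
    return counts

counts = epsilon_greedy([0.2, 0.5, 0.8])
# with enough pulls, the best arm (index 2) ends up pulled most often
print(counts)
```

The cognitive move in the Raven's example is the same shape: stop grinding one hypothesis (pure exploit), spend a slice of attention sampling the space of hypothesis classes (explore), then commit.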
But it's somewhat broader. I think "could I 10x my plans?" can be a useful frame even if you feel averse to "what's literally the most important problem I could focus on?".
Even more baby-step version: come up with two plans instead of one and choose between them. The second plan probably won't be 10x better, but count of two (2) is easier than 10x, and builds the necessary muscles of looking for alternatives and choosing.
Yeah something like this has already come up as a necessary stepping stone.
See also: ‘have a plan, at all’
My implicit model is something like "in addition to g factor, there'd turn out to be an 's factor' (i.e. "slow intelligence") that is a product of both "g" and "general reasoning skills."
The old posts on mathematical talent by JonahS (1,2,3) seem maybe related to that? Although I took JonahS to be arguing that people like Grothendieck score highly in “ability to find / build really great mental models (albeit not necessarily quickly)”, which is neither g-factor nor skill-at-planning-and-pivoting, I think. I’m not sure though. I wish JonahS had written more.
This is less of "a plan" and more of "a model", but, something that's really weirded me out about the literature on IQ, transfer learning, etc, is that... it seems like it's just really hard to transfer learn. We've basically failed to increase g, and the "transfer learning demonstrations" I've heard of seemed pretty weaksauce.
But, all my common sense tells me that "general strategy" and "responding to novel information, and updating quickly" are learnable skills that should apply in a lot of domains.
I'm curious why you think this? Or if you have a place where you've explained why you think this at more length? Like my common sense just doesn't agree with this -- although I'll admit my common sense was probably different 5 years ago.
Overall a lot of the stuff here seems predicated on there being a very thick notion of non-domain-specific "rationality" or "general strategy" that can be learned, and that, once learned, speeds you up in widely disparate domains. As in -- the whole effort is to find such a strategy. But there seems to be some (a lot? a little?) evidence that this just isn't that much of a thing, as you say.
I think current ML evidence backs this up. A Transformer is like a brain: when a Transformer is untrained, nearly literally the same architecture could learn to be a language model; to be an image diffusion model; to play Starcraft; etc etc. But once you've trained it, although it can learn very quickly in contexts to which it is adapted, it basically learns pretty poorly outside of these domains.
Similarly, human brains start off very plastic. You can learn to echolocate, or speak a dozen languages, or to ride a unicycle, or to solve IMO problems. And then brains specialize, and learn a lot of mostly domain-specific heuristics, that let them learn very quickly about the things that they already know. But they also learn to kinda suck elsewhere -- like, learning a dozen computer languages is mostly just going to not transfer to learning Chinese.
Like I don't think the distinction here I'm drawing is even well-articulated. And I could spend more time trying to articulate it -- there's probably some generality, maybe at the level of grit -- but the "learn domain-non-specific skills that will then speed up a particular domain" project seems to take a position that's sufficiently extreme that I'm like... ehhhh seems unlikely to succeed? (I'm in the middle of reading The Secret of Our Success fwiw, although it's my pre-existing slant for this position that has inclined me to read it.)
I think there are two main threads here:
Re: 1
Most of the time, I'm not thinking strategically, I'm just doing some sort of pattern-matchy-find-the-nearest-reasonable-thing-to-do-and-then-do-it. My current guess is this is what most people (and, probably, ML algorithms?) are doing most of the time.
But, there's clusters of habits that seem pretty useful for solving novel problems, like asking:
Each of those feel like "skills" to me, which I've practiced and cultivated, and once cultivated, can be chained into habits.
Re: 2
If you learn to play piano, I'd expect some weak transfer into: hand-finger coordination, understanding chord progression / musical structure, etc. If you learn a couple different instruments you probably have an easier time picking up new instruments. This can pave the way towards... being really good at music, and maybe some related things.
If you learn arithmetic and algebra, you have a building block skill that applies to science, engineering, and business. These things seem more world-changing than music.
(I think music can be world changing, but I think the skill-tree there is more like 'songwriting' and 'connecting with a muse and speaking to the heart of people's souls', which I think is pretty different from piano playing)
Point #1 is sort of a subset of point #2: analyzing your goals, breaking things down into subgoals, breaking down skills into subskills, are all "skills" that I expect to generalize quite a lot in a lot of domains.
...
How much is this worth?
I do think a point you made that stands out is "well, there's only so much you can specialize. If you specialize at meta-skills, i.e. 'specialize in being a generalist', does that trade off against being a better specialist?"
Probably.
I think it depends on how early you pick up the meta-skills – it seems like a travesty that children aren't taught these skills at like age ~10 so that they get to apply them sooner/faster to more domains. If you're 30ish (like me), I don't think it's that obvious, in all cases, that you should "level up at meta". I spent the last month learning "meta", and I could have been learning ML, or math proofs, or web design, and it would have been more immediately applicable.
(See: Rationality !== Winning)
The reason I think this is important is because I think "how do we safely create a superintelligence" (or, avoid doing so in a reliable/safe fashion) are very confusing questions. It isn't obvious whether I (or others) are supposed to learn ML, or math proofs, or geopolitics. And meta-skills seem more necessary for figuring out how to navigate that, and what specialist skills to learn, and how to apply them. i.e. Specializing in Problems We Don't Understand.
(This does all have implications in what sort of ML training regimes I'd expect to produce a general mind, although I think that's, like, bad and you shouldn't do it. Also it does still look like ML is still bottlenecked more on something like 'g' than something like 's' at the moment).
So I agree with some of what you're saying along "There is such a thing as a generally useful algorithm" or "Some skills are more deep than others" but I'm dubious about some of the consequences I think that you think follow from them? Or maybe you don't think these consequences follow, idk, and I'm imagining a person? Let me try to clarify.
There's clusters of habits that seem pretty useful for solving novel problems
My expectation is that there are many skills / mental algorithms along these lines, such that you could truthfully say "Wow, people in diverse domains have found X mental algorithm useful for discovering new knowledge." But also I think it's probably true that the information actually shared between different domain-specific instances of "X mental algorithm" is going to be pretty small.
Like, take the skill of "breaking down skills into subskills, figuring out what subskills can be worked on, etc". I think there's probably some kind of algorithm you can run cross-domain that does this kind of thing. But without domain-specific pruning heuristics, and like a ton of domain-specific details, I expect that this algorithm basically just spits back "Well, too many options" rather than anything useful.
So: I expect non-domain-specific work put into sharpening up this algorithm to run into steeply diminishing returns, even if you can amortize the cost of sharpening up the algorithm across many different domains that would be benefitted. If you could write down a program that can help you find relevant subskills in some domain, about 95% of the program is going to be domain-specific rather than not domain-specific, and there are something like only ~logarithmic returns to working on the non-domain-specific part. (Not being precise, just an intuition)
Put alternately, I expect you could specify some kind of algorithm like this in a very short mental program, but when you're running the program most mental compute goes into finding domain-specific program details.
Let me just describe the way the world looks to me. Maybe we actually think the same thing?
-- If you look throughout the history of science, I think that most discoveries look less like "Discoverer had good meta-level principles that let them situate themselves in the right place to solve the issue" and more like "Discoverer happened to be interested in the right chunk of reality that let them figure out an important problem, but it was mostly luck in situating themselves or their skills in this place." I haven't read a ton of history of science, but yeah.
-- Concretely, my bet is that most (many?) scientific discoverers of important things were extremely wrong on other important things, or found their original discovery through something like luck. (And some very important discoveries (Transformers) weren't really identified as such at the time.)
-- Or, concretely, I think scientific progress overall probably hinges less on individual scientists having good meta-level principles, and more on, like, whatever social phenomena are necessary to let individuals or groups of scientists run a distributed brute-force search. Extremely approximately.
-- So my belief is that so far we humans just haven't found any principles like those you're seeking. Or that a lack of such principles can screw over your group (if you eschew falsifiability to a certain degree you're fucked; if you ignore math you're fucked), but that you can ultimately mostly raise the floor rather than the ceiling through work on them. Like there is a lot of math out there, and different kinds are very useful for different things!
-- I would be super excited to find such meta-level principles, btw. I feel like I'm being relentlessly negative. So to be clear, it would be awesome to find substantive meta-level principles such that non-domain-specific work on them could help people situate themselves and pursue work effectively in confusing domains. Like, I'm talking about this because I am very much interested in the project. I just right now... don't think the world looks like they exist? It's just that, in the absence of seeing groups that seem to have such principles, nothing that I know about minds in general makes me think that such principles are likely.
Or maybe I'm just confused about what you're doing. Really uncertain about all the above.
I totally agree with how science normally works. I'm sitting here being like "whelp, doesn't seem like the way science normally works can solve the problems I care about in time."
It's a serious question on my end "can I raise the ceiling, or just the floor?" and "Does raising the floor matter?". Thinking about that led to me re-examining "can I actually help senior researchers?", and feeling like I had at least some traction on that, which output the "Help Senior Researchers with Targeted Problems", which indeed feels most important insofar as it's tractable.
My sense is that most senior researchers at least "know, and sometimes think about, all the meta-level principles I've thought about so far." But they don't always keep them in their "context window". Some things I currently expect (at least some) senior researchers not to be attending to enough:
Also, I think a bunch of them have various executive dysfunction stuff or health issues, which isn't what I'm currently focused on but seems important.
(note: I think "pursue things that are shiny/nerdsnipy" is an important motivational system that I'm not sure how to engage with, without breaking important things. But, my guess here is something similar to "if you want to marry into wealth, hang out around rich people and then marry for love". i.e. sink your attention into places where the shiny nerdsnipy problems are important, and then pick research directions based off excitement)
It'd be cool if a second group also worked towards "rationality skill assessment."
This was my project at last year's Epistea, but I sort of had to pause it to work full-time on my interp upskilling experiment.
I only got as far as implementing ~85% of an app to facilitate this (as described here), but maybe a quick chat about this would still be valuable?
something that's really weirded me out about the literature on IQ, transfer learning, etc, is that... it seems like it's just really hard to transfer learn. We've basically failed to increase g, and the "transfer learning demonstrations" I've heard of seemed pretty weaksauce.
You might be referring to the skeptical take on transfer learning, summarized as follows in Surfaces and Essences by Hofstadter & Sander:
Experimental studies have indeed demonstrated that subjects who are shown a source situation and who are then given a target situation are usually unable to see any connection between the two unless they share surface-level traits. Furthermore, in such experiments, when two situations have a superficial resemblance, then the second one invariably brings the first one to mind, no matter whether it is appropriate or not (that is, irrespective of whether there are deeper reasons to connect the two cases). For instance, if subjects first tackle an arithmetic problem concerning items bought in a store, then any other problem concerning purchases will instantly remind them of the initial problem. But if the theme of the first problem is experimentally manipulated — say it becomes a visit to a doctor’s office instead of a store — then the participants will almost surely see no link between the two stories, even if the solution method for the first problem applies perfectly to the second problem.
But then the authors argue that this skeptical take is misleading:
Unfortunately, the source–target [experimental] paradigm [in the studies above] has a serious defect that undermines the generality of the conclusions that experiments based upon it produce. This defect stems from the fact that the knowledge acquired about the source situation during the twenty minutes or so of a typical experiment is perforce very limited — often consisting merely in the application of a completely unfamiliar formula to a word problem. By contrast, when in real life we are faced with a new situation and have to decide what to do, the source situations we retrieve spontaneously and effortlessly from our memories are, in general, extremely familiar. We all depend implicitly on knowledge deeply rooted in our experiences over a lifetime, and this knowledge, which has been confirmed and reconfirmed over and over again, has also been generalized over time, allowing it to be carried over fluidly to all sorts of new situations. It is very rare that, in real life, we rely on an analogy to a situation with which we are barely familiar at all. To put it more colorfully, when it comes to understanding novel situations, we reach out to our family and our friends rather than to the first random passerby. But in the source–target paradigm, experimental subjects are required to reach out to a random passerby—namely, the one that was imposed on them as a source situation by the experimenter.
And so, what do the results obtained in the framework of this paradigm really demonstrate? What they show is that when people learn something superficially, they wind up making superficial analogies to it.
To rephrase: The problem is that, in the experimental protocol, the subjects only ever wind up with a crappy surface-level understanding of the source situation, not a deep mental model of the source situation reflective of true familiarity / expertise. When people do have real comfort and familiarity with the source situation, then they find deep structural analogies all over the place.
For example (these are my examples), if you talk to an economist about some weird situation, they will easily notice that there’s a supply-and-demand way to look at it, and ditto gains-from-trade and so on. And physicists will analogize random things to superpositions and Fourier-space and so on, etc. Of course, the main thing that everyone is an “expert” in is “intuitive everyday life stuff”, and hence our thinking and speech is full of constant non-surface-level analogies to traveling, seasons, ownership, arguments, etc. etc.
I’m not sure if this is relevant to what you were saying, just thought I’d share.
I was going off a vague sense from having talked to a few people who had scanned the literature more than I.
Right now I'm commissioning a lit review about "transfer learning", "meta learning", and things similar to that. My sense so far is that there aren't a lot of super impressive results, but part of that looks like it's because it's hard to teach people relevant stuff in a "laboratory"-esque setting.
They also attempt to generate principles to follow from, well, first principles, and see how many they correctly identify.
Second principles?
========
I'm really glad to see you quoting Three Levels. Seems important.
If I'm building my own training and tests, there's always the risk of ending up "teaching to the test", even if unintentionally. I think it'd be cool if other people were working on "Holdout Questions From Holdout Domains", that I don't know anything about, so that it's possible to test if my programs actually output people who are better-than-baseline (controlling for IQ).
I am hoarding at least one or two fun facts that I have seen smart rationalists get wrong. Specifically: a claim was made, I asked "huh, really?", they doubled down, and then later I looked it up and found out that they were significantly wrong. Unfortunately I think that if I had read the book first and started the conversation with it in mind, I might not have discovered that they were confidently incorrect. Likewise, I think it would be hard to replicate this in a test setting.
6 months ago I wrote Feedbackloop-first Rationality. I didn't follow up on it for a while (except for sporadic Deliberate (“Purposeful?”) Practice Club).
I just spent 6 weeks actually exploring "how would I build my own cognition training program?". In the process of doing so, I've iterated a bunch. I'm still in an orienting phase, but it seemed worth writing down the current stage of my thoughts.
What's my goal?
A rough overview:
This is all parented under the higher level goal of "contribute meaningfully to x-risk reduction", but it feels relevant/meaty enough to be worth running at this goal for a while.
"Rationality for the sake of existential risk"
A part of me romantically wants to pursue "rationality training for rationality training's sake." Alas, the world is big and my time is limited and I just can't actually justify putting years of effort into something, if I didn't think it would help with x-risk.
CFAR went through a phase where (some leaders) framed things as:
"Rationality, for the sake of rationality, for the sake of existential risk."
i.e. try to earnestly build something rationality-focused for its own sake, because that seemed both healthier and better for x-risk than "rationality for the sake of x-risk", directly.
I think this was a reasonable thing to try, but my impression is this didn't work that well. If you tell yourself (and your students) "I'm doing this for the sake of rationality itself", but then in practice you're getting people to delicately open up their soul and figure out their true goals... all the while radiating "man I really hope your goals turn out to involve saving the world from AIs", that may fuck up the "earnestly try to figure out your goals" process.
So:
I am not here to help you earnestly figure out your goals. That's an important part of rationality, and it might come about incidentally while people do exercises I develop, but it's not what I'm focused on this year.
I am here to develop and teach cognitive skills, which help you solve confusing problems at the edge of your ability. I'm doing this to push forward humanity's frontier of "how quickly can we do challenging research?", and strive towards 10x science.
I will prioritize learning and teaching those skills to people who seem like they are going to help with x-risk somehow, but I aim to write up a lot of stuff publicly, and to try, where possible, to output exercises that other people can do on their own, for whatever reasons they want. (See Exercise: Solve "Thinking Physics" as an example)
The Story So Far
Feedback-loops and "deliberate practice", vs "Just Clicking"
I just spent a month workshopping various "teaching rationality" plans. My initial ideas were framed around:
i.e. "Deliberate practice kinda sucks, thus it's undervalued, thus there's alpha in it."
I still believe this. But.... I do grudgingly admit to myself that deliberate practice is, like, really costly, and sucks a lot. It's exhausting, and it seems (at least for me) to require coming out of my peak hours of the day, trading off directly against my day job. It's frustrating and easy to bounce off of. It took me 30-50 hours over the course of months to get noticeably better at a videogame. It took me 40 hours over 2 weeks to get noticeably better at Thinking Physics exercises.
I think we could optimize the pedagogy here. The thing that separates actual "deliberate practice" from "regular practice" is that it's been battle tested and found to actually quickly move you to the frontier of expertise. But this still seems like a long, effortful project, so it seems worth asking:
Can we find cognitive skills that just click, rather than requiring dozens of hours of practice, that still provide a major cognitive edge?
What About CFAR? Didn't they teach "just click" skills?
You might ask "how does this relate to the Center for Applied Rationality and all the stuff they did?". In particular, CFAR taught a bunch of stuff in a four day workshop. Shouldn't that stuff have been aimed at "things that just click?". What's my new angle here?
I think the mechanism of CFAR was something like:
"Create a transformative workshop environment. Throw a lot of different tools/skills/ideas at people in one weekend. Most tools/skills/ideas won't necessarily help most people, but each person hopefully finds 1-3 tools that are immediately useful, which gives them a sense that more is possible. And meanwhile the workshop conveys an overall mindset of systematically/agentically solving your problems."
I'm currently aiming at something more like:
"Convey a tightly clustered set of skills that weave into one 'deeper skill', over the course of 1-2 weeks. Then, build a good followup environment, where people who attended the workshop reliably get practice/checkin session once a week, for the next 6-12 months, to ensure those skills actually permeate their life."
Hamming-nature, 10x plans, OODA Loops
One skill-cluster seemed noteworthy in that:
That skill is: making plans that are 10x better than your current plans. (And, ideally, have a habit of doing this, such that your plans end up 100-1000x better overall).
I mean "plans" in a pretty broad sense. I think it includes going down a research path, launching a product, deciding to go-to-school and get-a-job, etc.
I could simplify the process down to:
The "actually pivot away from current favorite plan" is perhaps the hardest part. It may require grieving important parts of your current favorite, which the new plan won't accomplish. But I think the most important step is "actually have multiple alternative plans that you believe in."[1] This makes pivoting more natural, less painful.
This is related to asking yourself The Hamming Question ("what's the most important problem in your field (or, life) and why aren't you working on it?"). But it's somewhat broader. I think "could I 10x my plans?" can be a useful frame even if you feel averse to "what's literally the most important problem I could focus on?". And even if you have set your target on The Most Important Problem, asking "okay but can I do this 10x faster or better?" is still a useful question to ask.
"Planning" vs "OODA Loops"
The direction I'm currently exploring is "Okay, but planning is actually only one facet of a complete decisionmaking loop. Can I learn myself, and can I teach others, the full-stack skillset of a competent OODA Loop?".
I currently feel a bit confused about this. I feel like I have a clear vision of how to improve at planmaking. (Or at least, what next things to try). I feel a lot fuzzier on how the various Observe/Orient/Decide/Act steps fit together into a cohesive skillset, and how to teach it.
My explorations so far have demonstrated "man, people come into this with all kinds of different skill gaps here, and I'm not sure how to build a single program that would teach it reliably."
But, when I imagine just trying to teach the "10x planning" workshop, I imagine people... making some better plans, and becoming temporarily better at planning, and then... sort of forgetting about it and moving on. I feel like "the pedagogical work isn't done" until it's somehow collectively taught the full OODA process, in a way that repeats.
My Process: "Test Driven Development"
My methods here still route through the sorts of exercises I was imagining when I wrote Feedbackloop-first Rationality. But I now have a bit more of a skeleton of "how to design exercises that teach particular skills, which build into an immediately valuable skill."
My process involves interleaving:
An important component is that the 3-hour exercises are in domains that are as different from each other as possible. So you're not merely learning "a skill", you're learning "how to generate solutions to novel problems."
For example, you might train on making "a plan" in a simplified videogame environment, and then go through multiple OODA loops as you implement that plan. Then, go design real-life plans for your real-life goals, drawing on the skills from the simplified exercise.
This aims to build up the skill of transferring knowledge from one domain to another.
Alternate Strategies and/or Theories of Change
Obviously, if I'm taking "10x planning" seriously, I should be applying it to myself. If I'm not ending up conceiving of (and actually pivoting to) plans that are 10x better than what I started with, why should I expect my process is any good?
The "teach 10x planning in a week + months of weekly followup sessions" approach seems much more likely to work, and to be more time-efficient, than my previous BATNA of "brute force deliberate practice." But my current process involves having 3-10 alternate plans that feel like real contenders, and periodically iterating on each of them as I learn more.
Here are my current contenders for alternate approaches. Some of these are "plans" and some of them are more like "useful project outputs" that aren't quite plan-shaped.
#1: Help senior researchers with specific targeted problems.
When I started this project, I assumed "the best researchers" wouldn't need my help with metacognitive skills. I saw clear gaps in junior and mid-level researchers, but the researchers who produce the work I'm most excited about seemed to have pretty good cognitive strategies, or at least a mysterious process I was afraid to mess around with.
My current guess is that this is largely true. But it now seems to me that while senior researchers are "good at" metacognition, it's usually not the thing they're specializing in. There's a lot of depth to metacognition that's just hard to master and apply, and it's difficult to keep track of all the options that have floated outside their context window.
I think the best time to try helping a senior researcher with metacognition is when something has recently, obviously gone wrong, so that a) the researcher believes it's worth investigating their process, and b) there's a clear object-level example to talk about.
I'm not sure how to scale this, and I'd expect each senior researcher to have fairly unique problems and psychology. So for now this is more like something I'll opportunistically seize upon than aggressively pursue, but I do think it might be much more cost-effective insofar as it's tractable.
My current tool here is applying the 5 Whys technique from Lean Startup methodology to "research process failures." (An important variation: I think it's usually necessary to do 6-7 whys instead of 5, because the 6th or 7th tends to be where "a root rationality failure" happened; 5 Whys was designed more for physical process failures.)
#2: Build a 'Thinking Assistant' Pipeline
One way to improve people's research output is to hire fulltime assistants. There are a few different flavors of this, in ascending order of skill requirements.
I've heard a mixture of success stories and failure stories about each of these. I think there's an important "matchmaking" element here, such that the assistant feels helpful rather than annoying.
One role that Thinking Assistants can play is "help prototype apps that can eventually be 'AI-assisted alignment research' tools." A lot of LLM technology is not yet powerful enough to help augment a researcher's thought process reliably, but they might later work, and meanwhile you can prototype the experience using a skilled human.
This entire thread can relate to the previous "help particular senior researchers with particular problems" thread – I can imagine meeting with a senior researcher to discuss their problems, and in some cases it might turn out that hiring some kind of assistant is a good longterm solution.
#3. Learning "Generalized Research Taste"
"10x planning" and "10x OODA looping" feel like my most tractable ideas. But another major thread I've been following is asking "is there a generalized skill of 'research taste' that transfers across domains?"
I'm interested in this because there's a lot of disagreement about what counts as "real" alignment research. Programs like MATS can match junior researchers up with mentors, to gain research taste in particular domains like Agent Foundations, Interpretability, Evals, Model Organisms, etc. This might help a junior researcher skill up and make contributions in a particular domain.
But, how do you decide which domain to specialize in, in the first place? How do you figure out whether you should pivot or adapt your domain, later?
I have some hopes that there turns out to be a skill of either...
Chris Olah has explored some exercises for developing research taste that seem like useful stepping stones here.
The sort of plan I'm imagining here is:
The hope is that after doing that in a bunch of fields with different constraints, they'll have some kind of feel for "which sort of intuitions generalize" and which don't, and when they approach the overall field of "somehow design AIs that scale safely to superintelligence", they'll have reasonable intuitions for navigating between agent foundations, interpretability, control, etc.
This agenda feels cool to me, but currently I grudgingly admit to myself that this would take a hella long time and not obviously work that well.
I think some portions of it are still a good idea to build out for individual research domains. (i.e. Chris Olah's exercises seem like good things to do in whatever domain you end up specializing in)
#4. Filtering/enculturation for "Overall Community Epistemic Health"
I think a valuable service CFAR provided was "creating a recruitment/filtering/enculturation pipeline", which resulted in a large cohort of people able to think sanely about important topics. This is notably different from "train rationality skills"; it's more of a soft nudge on the overall ecosystem culture.
I would not feel comfortable directly optimizing for this goal. It feels pretty easy to delude yourself about. I like that most of my ideas here involve concrete tests for "you should be able to see people tackling an array of harder and harder problems in different domains."
But I still feel like this is a gap in the current ecosystem. When I imagine pivoting entirely to "help individual good researchers" and "train/deploy thinking assistants", I feel a sadness about giving up on the part of this project that seemed likely to help the broader community culture. I feel unsure how to weigh this, but I do weigh it non-zero.
#5. Investigating "s factor?"
This is less of "a plan" and more of "a model", but something that's really weirded me out about the literature on IQ, transfer learning, etc., is that... transfer learning seems to just be really hard. We've basically failed to increase g, and the "transfer learning demonstrations" I've heard of seemed pretty weaksauce.
But, all my common sense tells me that "general strategy" and "responding to novel information, and updating quickly" are learnable skills that should apply in a lot of domains.
My current model is: IQ tests are designed to test competence quickly, and they typically give you a barrage of questions that you only have a couple minutes for, max. They test which people have the raw horsepower to process information quickly and respond on the fly. It makes sense that this is fairly hardwired and hard to improve.
But, it seems to me that in order for strategy/general-creativity training to matter, it needs to operate on problems large enough that "planning" is an important subcomponent.
Hypothetically, it seems like you could construct an IQ-ish test, where the questions are expected to take a smart person at least an hour, and where the domain of each question is different so it's hard to train for. My implicit model is something like "in addition to g factor, there'd turn out to be an 's factor' (i.e. "slow intelligence") that is a product of both "g" and "general reasoning skills."
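To make the "s factor" hypothesis concrete, here is a minimal toy simulation (my own illustrative sketch, not anything from the psychometrics literature). It assumes timed-test scores load almost entirely on a fixed latent g, while hour-long novel-domain problems load on both g and a separate trainable "general reasoning skill"; the loadings and noise levels are arbitrary assumptions:

```python
import random
import statistics

random.seed(0)

def correlation(xs, ys):
    """Pearson correlation between two equal-length lists."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

N = 10_000
g = [random.gauss(0, 1) for _ in range(N)]       # raw horsepower (assumed fixed)
skill = [random.gauss(0, 1) for _ in range(N)]   # trainable general reasoning skill

# Timed IQ-style test: loads almost entirely on g.
fast = [gi + random.gauss(0, 0.5) for gi in g]

# Hour-long, novel-domain problems: load on both g and skill (the hypothesized "s factor").
slow = [0.7 * gi + 0.7 * si + random.gauss(0, 0.5) for gi, si in zip(g, skill)]

# The two tests correlate, but well below 1: g alone doesn't determine slow-test scores.
print("fast/slow correlation:", round(correlation(fast, slow), 2))

# Training shifts skill without touching g: slow-test scores rise, fast-test scores don't.
trained_slow = [0.7 * gi + 0.7 * (si + 1.0) + random.gauss(0, 0.5)
                for gi, si in zip(g, skill)]
print("gain on slow test after training:",
      round(statistics.mean(trained_slow) - statistics.mean(slow), 2))
```

Under these assumptions the model predicts exactly the pattern described above: training interventions that fail to move timed IQ scores could still produce real gains on slow, novel-domain tests.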
This seems very expensive to test and Do Science To. I think it'd be cool if humanity overall were designing long-running experiments or longitudinal studies around this, but I don't think it's competitive enough as an "x-risk intervention."
It'd be cool if a second group also worked towards "rationality skill assessment."
I'm currently trying to bootstrap both "a training program" and "an evaluation process." They both seem necessary. I'm not sure if I'm going to end up sticking with my "Test Driven Development" approach, but I put moderate odds on that.
But, in 3 Levels of Rationality Verification, Eliezer notes:
If I'm building my own training and tests, there's always the risk of ending up "teaching to the test", even if unintentionally. I think it'd be cool if other people were working on "Holdout Questions From Holdout Domains", that I don't know anything about, so that it's possible to test if my programs actually output people who are better-than-baseline (controlling for IQ).
This could be something like "TripleByte for Reasoning Skills", and its primary role might be something like "a place that orgs can outsource difficult interview questions to" for hiring.
What Have I Actually Done?
That was a lot of philosophy. Here's what actually happened:
I focused on this while the MATS program was running at Lighthaven (where I work). MATS scholars seemed like a good potential target audience.
Things I ended up doing:
Experimented with Toybox Exercises
Experimented with "make and compare plans, for real"
Experimented with "prediction mindset"
Think conceptually and learn about the field
What's Next?
I'm planning to run this project for another ~month. I'm hoping to end up with some kind of weeklong beta-test workshop at the end of it.
After that, I'll take a break, evaluate whether this seems longterm promising, and figure out whether there is funding to do the scaled up version of this thing. (My ideal version of this involves hiring textbook authors from various fields, puzzle designers, expert tutors, etc).
A major crux will be "does this seem like something people would actually pay enough for, to cover the salaries of the people developing the curriculum and running any coaching or workshops that follow?"
See also: eliminating the feeling of idea scarcity.