Six months ago I wrote Feedbackloop-first Rationality. I didn't follow up on it for a while (except for a sporadic Deliberate (“Purposeful?”) Practice Club).

I just spent 6 weeks actually exploring "how would I build my own cognition training program?". In the process of doing so, I've iterated a bunch. I'm still in an orienting phase, but it seemed worth writing down the current stage of my thoughts.

What's my goal?

A rough overview:

  • I want to get more, and higher-quality, "X-risk thinker hours."
    • This includes AI alignment technical research, AI macrostrategy research, policy, and governance, as well as people (such as the Lightcone team) deciding which infrastructure to build.
  • I'm particularly interested in getting more "serial research", as opposed to more "parallel research." We can throw more researchers at a problem, but if some problems require one person to synthesize 10+ years of experience, all the parallel research won't help.
  • An obvious way to improve researcher hours is via mentorship, but I think there is a mentorship bottleneck. So I'm interested in strategies that train tacit cognitive skills, either without requiring mentorship, or by leveraging expertise from outside the current x-risk ecosystem.

This is all parented under the higher-level goal of "contribute meaningfully to x-risk reduction", but it feels relevant/meaty enough to be worth running at this goal for a while.

"Rationality for the sake of existential risk" 

A part of me romantically wants to pursue "rationality training for rationality training's sake." Alas, the world is big and my time is limited and I just can't actually justify putting years of effort into something, if I didn't think it would help with x-risk.

CFAR went through a phase where some of its leaders framed things as:

"Rationality, for the sake of rationality, for the sake of existential risk." 

i.e. try to earnestly build something rationality-focused for its own sake, because that seemed both healthier and better for x-risk than "rationality for the sake of x-risk" directly.

I think this was a reasonable thing to try, but my impression is it didn't work that well. If you tell yourself (and your students) "I'm doing this for the sake of rationality itself", but then in practice you're getting people to delicately open up their soul and figure out their true goals... all the while radiating "man, I really hope your goals turn out to involve saving the world from AIs", that may fuck up the "earnestly try to figure out your goals" process.

So:

I am not here to help you earnestly figure out your goals. That's an important part of rationality, and it might come about incidentally while people do exercises I develop, but it's not what I'm focused on this year.  

I am here to develop and teach cognitive skills, which help you solve confusing problems at the edge of your ability. I'm doing this to push forward humanity's frontier of "how quickly can we do challenging research?", and strive towards 10x science.

I will prioritize learning and teaching those skills to people who seem like they are going to help with x-risk somehow. But I aim to write up a lot of stuff publicly, and to try, where possible, to output exercises that other people can do on their own, for whatever reasons they want. (See Exercise: Solve "Thinking Physics" as an example.)

The Story So Far

Feedback-loops and "deliberate practice", vs "Just Clicking"

I just spent a month workshopping various "teaching rationality" plans. My initial ideas were framed around:

  • Deliberate practice is costly and kinda sucks
  • Therefore, people haven't invested in it much, either as "rationality training programs" or as "alignment research training programs."
  • Therefore, there may be opportunity to build useful training programs, premised on the notion of "actually put in the work to do the practice".

i.e. "Deliberate practice kinda sucks, thus it's undervalued, thus there's alpha in it."

I still believe this. But... I do grudgingly admit to myself that deliberate practice is, like, really costly, and sucks a lot. It's exhausting, and it seems (at least for me) to require coming out of my peak hours of the day, trading off directly against my day job. It's frustrating and easy to bounce off of. It took me 30-50 hours over the course of months to get noticeably better at a videogame. It took me 40 hours over 2 weeks to get noticeably better at Thinking Physics exercises.

I think we could optimize the pedagogy here. The thing that separates actual "deliberate practice" from "regular practice" is that it's been battle tested and found to actually quickly move you to the frontier of expertise. But this still seems like a long, effortful project, so it seems worth asking:

Can we find cognitive skills that just click, rather than requiring dozens of hours of practice, that still provide a major cognitive edge?

What About CFAR? Didn't they teach "just click" skills?

You might ask, "how does this relate to the Center for Applied Rationality and all the stuff they did?" In particular, CFAR taught a bunch of stuff in a four-day workshop. Shouldn't that stuff have been aimed at "things that just click"? What's my new angle here?

I think the mechanism of CFAR was something like:

"Create a transformative workshop environment. Throw a lot of different tools/skills/ideas at people in one weekend. Most tools/skills/ideas won't necessarily help most people, but each person hopefully finds 1-3 tools that are immediately useful, which gives them a sense that more is possible. And meanwhile the workshop conveys an overall mindset of systematically/agentically solving your problems."

I'm currently aiming at something more like:

"Convey a tightly clustered set of skills that weave into one 'deeper skill', over the course of 1-2 weeks. Then, build a good follow-up environment, where people who attended the workshop reliably get a practice/check-in session once a week for the next 6-12 months, to ensure those skills actually permeate their life."

Hamming-nature, 10x plans, OODA Loops

One skill-cluster seemed noteworthy in that:

  • I think someone could learn it in ~a week, if they had the right prerequisites.
  • I think there exist people who are smart and capable, but nonetheless don't have this skill (or could improve further at it).
  • It'd be immediately really useful, instead of taking like 6 months of practice.

That skill is: making plans that are 10x better than your current plans. (And, ideally, having a habit of doing this, such that your plans end up 100-1000x better overall.)

I mean "plans" in a pretty broad sense. I think it includes going down a research path, launching a product, deciding to go-to-school and get-a-job, etc. 

I could simplify the process down to:

  1. Generating multiple plans that you feel reasonably excited about.
  2. Noticing the ways that the best plans don't actually work, or could be dramatically improved. Iterate on them until they're the best version of themselves.
  3. Estimating the value of those plans.
  4. Actually shifting away from your current favorite to a plan that you think is 10x better than it.
  5. Having the judgment to either persevere with that plan when it gets hard, or, pivot again.
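A minimal Python sketch of steps 3 and 4 follows. It assumes each plan's value has already been boiled down to a single rough number. The plan names and values are hypothetical and made up for illustration. The sketch only encodes the "pivot if something looks 10x better" rule. The hard parts (generating plans, finding their flaws, and the judgment in step 5) don't reduce to code.

```python
from dataclasses import dataclass


@dataclass
class Plan:
    name: str
    # Rough Fermi-style value estimate, in arbitrary units (illustrative assumption).
    estimated_value: float


def best_plan(plans: list[Plan]) -> Plan:
    """Return the plan with the highest estimated value."""
    return max(plans, key=lambda p: p.estimated_value)


def should_pivot(current: Plan, candidate: Plan, threshold: float = 10.0) -> bool:
    """Pivot only if the candidate looks at least `threshold` times better than the current plan."""
    if current.estimated_value <= 0:
        # A current plan with no positive value is beaten by any positive-value candidate.
        return candidate.estimated_value > 0
    return candidate.estimated_value / current.estimated_value >= threshold


# Hypothetical plans with made-up numbers.
current = Plan("keep going down the current research path", 1.0)
alternatives = [
    Plan("same path, plus weekly feedback from a mentor", 2.5),
    Plan("switch to a more neglected adjacent subproblem", 12.0),
    Plan("build tooling that speeds up the whole team", 6.0),
]

candidate = best_plan(alternatives)
print(candidate.name, should_pivot(current, candidate))
```

The ratio test is the design choice here. A plan that is merely "somewhat better" doesn't trigger a pivot, which mirrors the post's bar of 10x rather than marginal improvement.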

The "actually pivot away from current favorite plan" is perhaps the hardest part. It may require grieving important parts of your current favorite, which the new plan won't accomplish. But I think the most important step is "actually have multiple alternative plans that you believe in."[1] This makes pivoting more natural, less painful.

This is related to asking yourself The Hamming Question ("what's the most important problem in your field (or, life) and why aren't you working on it?"). But it's somewhat broader. I think "could I 10x my plans?" can be a useful frame even if you feel averse to "what's literally the most important problem I could focus on?". And even if you have set your target on The Most Important Problem, asking "okay, but can I do this 10x faster or better?" is still useful.

"Planning" vs "OODA Loops"

The direction I'm currently exploring is "Okay, but planning is actually only one facet of a complete decisionmaking loop. Can I learn for myself, and can I teach others, the full-stack skillset of a competent OODA Loop?"

I currently feel a bit confused about this. I feel like I have a clear vision of how to improve at planmaking. (Or at least, what next things to try). I feel a lot fuzzier on how the various Observe/Orient/Decide/Act steps fit together into a cohesive skillset, and how to teach it.

My explorations so far have demonstrated "man, people come into this with all kinds of different skill gaps here, and I'm not sure how to build a single program that would teach it reliably."

But when I imagine just trying to teach the "10x planning" workshop, I imagine people... making some better plans, becoming temporarily better at planning, and then... sort of forgetting about it and moving on. I feel like "the pedagogical work isn't done" until the program has somehow taught the full OODA process, in a way that repeats.

My Process: "Test Driven Development"

My methods here still route through the sorts of exercises I was imagining when I wrote Feedbackloop-first Rationality. But I now have a bit more of a skeleton of "how to design exercises that teach particular skills, which build into an immediately valuable skill."

My process involves interleaving:

  1. ~3 hour exercises that have a clear "right answer", but which require you to wrestle with gnarly, confusing problems on your own. You get some guidance on how to approach the problem, but a major component is almost always "figure out how to generate solutions on your own, and then reflect on which solutions actually worked."
  2. Longer sessions where you apply the skills from those exercises on real life problems.

An important component is that the 3-hour exercises are in domains that are as different from each other as possible. So you're not merely learning "a skill", you're learning "how to generate solutions to novel problems."

For example, you might train on making "a plan" in a simplified videogame environment, and then go through multiple OODA loops as you implement that plan. Then you'd go on to design real-life plans for your real-life goals, referring back to the skills from the simplified exercise.

This aims to build up the skill of transferring knowledge from one domain to another.

Alternate Strategies and/or Theories of Change

Obviously, if I'm taking "10x planning" seriously, I should be applying it to myself. If I'm not ending up conceiving of (and actually pivoting to) plans that are 10x better than what I started with, why should I expect my process is any good?

The "Teach 10x Planning in a week + months of weekly followup sessions" plan seems much more likely to work, and to be time-efficient, than my previous BATNA of "brute force deliberate practice." But my current process involves having 3-10 alternate plans that feel like real contenders, and periodically iterating on each of them as I learn more.

Here are my current contenders for alternate approaches. Some of these are "plans" and some of them are more like "useful project outputs" that aren't quite plan-shaped.

#1: Help senior researchers with specific targeted problems.

When I started this project, I assumed "the best researchers" wouldn't need my help with metacognitive skills. I saw clear gaps in junior and mid-level researchers, but the researchers who produce the work I'm most excited about seemed to have pretty good cognitive strategies, or at least a mysterious process I was afraid to mess around with.

My current guess is that this is largely true, but also it now seems to me that while senior researchers are "good at" metacognition, it's usually not the thing they're specializing in. There's a lot of depth to metacognition that's just hard to master and apply, and keeping track of all the options that have floated outside their context window is difficult.

I think the best time to try helping a senior researcher with metacognition is when something has recently, obviously gone wrong, so that a) the researcher believes it's worth investigating their process, and b) there's a clear object-level example to talk about.

I'm not sure how to scale this, and I'd expect each senior researcher to have pretty unique problems and psychologies. So for now, this is more something I'll opportunistically seize upon than aggressively pursue. But I do think it might be much more cost-effective, insofar as it's tractable.

My current tool here is applying the 5 Whys technique from Lean Startup Methodology to "research process failures." (An important variation: I think it's usually necessary to do 6-7 whys instead of 5, because the 6th or 7th tends to be where "a root rationality failure" happened. 5 Whys was designed more for physical process failures.)
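Here is a minimal Python sketch of the extended "whys" loop. The `ask_why` callable stands in for the actual conversation with the researcher. The function name, the cap of 7 answers, and the example answers are hypothetical illustrations. None of them are part of the Lean Startup method itself.

```python
from typing import Callable, Optional


def why_chain(
    problem: str,
    ask_why: Callable[[str], Optional[str]],
    max_whys: int = 7,
) -> list[str]:
    """Repeatedly ask "why did that happen?", starting from a concrete failure.

    `ask_why` returns the cause behind a statement, or None when no deeper
    cause turns up. Stops after `max_whys` answers (7 here, rather than the
    classic 5, since the deepest causes are often the interesting ones).
    """
    chain = [problem]
    while len(chain) <= max_whys:
        cause = ask_why(chain[-1])
        if cause is None:
            break
        chain.append(cause)
    return chain


# Hypothetical answers from a post-mortem conversation.
answers = {
    "Spent three weeks on an experiment that didn't bear on the main question":
        "Never asked whether either result would change my plans",
    "Never asked whether either result would change my plans":
        "Had no habit of predicting outcomes before starting experiments",
}

chain = why_chain(
    "Spent three weeks on an experiment that didn't bear on the main question",
    answers.get,  # returns None for statements with no recorded cause
)
for depth, step in enumerate(chain):
    print(depth, step)
```

The last element of the chain is the candidate root failure. In practice, the deep answers are where the interesting metacognitive habits show up.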

#2: Build a 'Thinking Assistant' Pipeline

One way to improve people's research output is to hire full-time assistants. There are a few different flavors of this, in ascending order of skill requirements.

  • Body Doubles is a low-ish skill position of "sit next to someone while they work, and notice if they are getting distracted, encouraging the person to stay on track rather than bouncing off things that are hard or aversive." Focusmate is a maximally cheap version of this, but IMO it's easy to slide out of the habit of it, and it can feel somewhat less "real."
  • Rubber Duck. Similar to Body Double but the researcher is constantly talking out loud about their thought process. In many cases the Rubber Duck may need enough technical background to follow the conversation.
  • Metacognitive Assistants have the explicit job of tracking your attention, your goals, and your metacognitive habits. They keep track of things that have fallen out of your strategic context window. ("Secretaries"/"Executive Assistants" often play many of these roles, in addition to basically being personal assistants who deal with various other problems so you don't have to. I'm imagining a version that specializes in improving your research output.)
  • Research Assistant/Apprentice. This is a more involved role where you're deeply embedding someone in your research thought process, training them in your paradigm.

I've heard a mixture of success stories and failure stories about each of these. I think there's an important "matchmaking" element here, such that the assistant feels helpful rather than annoying.

One role that Thinking Assistants can play is "help prototype apps that can eventually be 'AI-assisted alignment research' tools." A lot of LLM technology is not yet powerful enough to reliably augment a researcher's thought process, but it might work later, and meanwhile you can prototype the experience using a skilled human.

This entire thread can relate to the previous "help particular senior researchers with particular problems" thread – I can imagine meeting with a senior researcher to discuss their problems, and in some cases it might turn out that hiring some kind of assistant is a good longterm solution.

#3: Learning "Generalized Research Taste"

"10x planning" and "10x OODA looping" feel like my most tractable ideas. But another major thread I've been following is asking: is there a generalized skill of "research taste" that transfers across domains?

I'm interested in this because there's a lot of disagreement about what counts as "real" alignment research. Programs like MATS can match junior researchers up with mentors, to gain research taste in particular domains like Agent Foundations, Interpretability, Evals, Model Organisms, etc. This might help a junior researcher skill up and make contributions in a particular domain.

But, how do you decide which domain to specialize in in the first place? How do you figure out if you should pivot or adapt your domain, later?

I have some hopes that there turns out to be a skill of either...

  1. rapidly gaining research taste in multiple domains, and then cross referencing them against each other
  2. learning the skill of generating research taste from first principles, testing that it works, and then applying that skill to the field of alignment, such that you have some reason to think your taste will be any good.

Chris Olah has explored some exercises for developing research taste that seem like useful stepping stones here.

The sort of plan I'm imagining here is:

  • We get multiple experts in different fields with subtle taste, where it's well established what expertise looks like. (These can be random fields, although it's helpful if they are at least adjacent to plausible-AI-alignment cognitive work)
  • The experts design questions like "in this situation, what would you do? What do you think would happen next in the situation?", and write up lists of tastes/principles they actually follow.
  • Aspiring "general research-taste havers" look at each exercise, attempting to use general reasoning skills to get the right answer, and reflect on why they got the answers right or wrong. They also attempt to generate principles to follow from, well, first principles, and see how many they correctly identify. 
  • Between each exercise, reflect on how they could have arrived at the right answer.

The hope is that after doing that in a bunch of fields with different constraints, they'll have some kind of feel for "which sort of intuitions generalize" and which don't, and when they approach the overall field of "somehow design AIs that scale safely to superintelligence", they'll have reasonable intuitions for navigating between agent foundations, interpretability, control, etc.

This agenda feels cool to me, but currently I grudgingly admit to myself that this would take a hella long time and not obviously work that well.

I think some portions of it are still a good idea to build out for individual research domains. (i.e. Chris Olah's exercises seem like good things to do in whatever domain you end up specializing in)

#4: Filtering/enculturation for "Overall Community Epistemic Health"

I think a valuable service CFAR provided was "creating a recruitment/filtering/enculturation pipeline", which resulted in a large cohort of people able to think sanely about important topics. This is notably different from "train rationality skills"; it's more of a soft nudge on the overall ecosystem culture.

I would not feel comfortable directly optimizing for this goal. It feels pretty easy to delude yourself about. I like that most of my ideas here involve concrete tests for "you should be able to see people tackling an array of harder and harder problems in different domains." 

But I still feel like this is a gap in the current ecosystem. When I imagine pivoting entirely to "help individual good researchers" and "train/deploy thinking assistants", I feel a sadness about giving up on the part of this project that seemed likely to help the broader community culture. I feel unsure how to weigh this, but I do give it nonzero weight.

#5: Investigating "s factor?"

This is less of "a plan" and more of "a model", but something that's really weirded me out about the literature on IQ, transfer learning, etc. is that... it seems like transfer learning is just really hard. We've basically failed to increase g, and the "transfer learning demonstrations" I've heard of seemed pretty weaksauce.

But, all my common sense tells me that "general strategy" and "responding to novel information, and updating quickly" are learnable skills that should apply in a lot of domains.

My current model is: IQ tests are designed to test competence quickly, and they typically give you a barrage of questions that you only have a couple minutes for, max. They test which people have the raw horsepower to process information quickly and respond on the fly. It makes sense if that's fairly hardwired and hard to improve on.

But, it seems to me that in order for strategy/general-creativity training to matter, it needs to operate on problems large enough that "planning" is an important subcomponent.

Hypothetically, it seems like you could construct an IQ-ish test, where the questions are expected to take a smart person at least an hour, and where the domain of each question is different so it's hard to train for. My implicit model is something like "in addition to g factor, there'd turn out to be an 's factor' (i.e. "slow intelligence") that is a product of both "g" and "general reasoning skills." 

This seems very expensive to test and Do Science To. I think it'd be cool if humanity overall were designing long-running experiments or longitudinal studies around it, but I don't think it's competitive enough as an "x-risk intervention."

It'd be cool if a second group also worked towards "rationality skill assessment."

I'm currently trying to bootstrap both "a training program" and "an evaluation process." They both seem necessary. I'm not sure whether I'll end up sticking with my "Test Driven Development" process, but I put moderate odds on it.

But, in 3 Levels of Rationality Verification, Eliezer notes:

This question of "verification methods good enough to build organizations," is a huge problem at all levels of modern human society.  

If you're going to use the SAT to control admissions to elite colleges, then can the SAT be defeated by studying just for the SAT in a way that ends up not correlating to other scholastic potential?  If you give colleges the power to grant degrees, then do they have an incentive not to fail people?  

(I consider it drop-dead obvious that the task of verifying acquired skills and hence the power to grant degrees should be separated from the institutions that do the teaching, but let's not go into that.) 

If I'm building my own training and tests, there's always the risk of ending up "teaching to the test", even if unintentionally. I think it'd be cool if other people were working on "Holdout Questions From Holdout Domains", that I don't know anything about, so that it's possible to test if my programs actually output people who are better-than-baseline (controlling for IQ).

This could be something like "TripleByte for Reasoning Skills", and its primary role might be as "a place that orgs can outsource difficult interview questions to" for hiring.
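One way to make "holdout domains" concrete is sketched below in Python. It assumes each test question is tagged with its domain, and it splits by domain rather than by question, so no held-out domain ever appears in training material. The function and the example questions are hypothetical. Ideally someone other than the curriculum designer would hold the held-out set, and code can't enforce that part.

```python
import random


def split_by_domain(
    questions: list[tuple[str, str]],
    holdout_fraction: float = 0.3,
    seed: int = 0,
) -> tuple[list[tuple[str, str]], list[tuple[str, str]], set[str]]:
    """Split (domain, question) pairs so held-out domains never appear in training."""
    domains = sorted({domain for domain, _ in questions})
    rng = random.Random(seed)
    rng.shuffle(domains)
    # Hold out whole domains (at least one), not individual questions.
    n_holdout = max(1, round(len(domains) * holdout_fraction))
    holdout_domains = set(domains[:n_holdout])
    train = [q for q in questions if q[0] not in holdout_domains]
    holdout = [q for q in questions if q[0] in holdout_domains]
    return train, holdout, holdout_domains


# Hypothetical question bank.
questions = [
    ("physics", "Thinking Physics puzzle 1"),
    ("physics", "Thinking Physics puzzle 2"),
    ("programming", "find the bug without running the code"),
    ("puzzle game", "solve a Patrick's Parabox level"),
    ("chemistry", "hard multiple-choice chemistry question"),
]

train, holdout, holdout_domains = split_by_domain(questions)
print("held-out domains:", holdout_domains)
```

Splitting by domain, rather than shuffling individual questions, is what tests transfer. A per-question split would let domain-specific tricks leak from training into the test.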

What Have I Actually Done?

That was a lot of philosophy. Here's what actually happened:

I focused on this while the MATS program was running at Lighthaven (where I work). MATS scholars seemed like a good potential target audience.

Things I ended up doing:

Experimented with Toybox Exercises

  • Ran a one day "Basic Metacognition" workshop (based on Exercise: Solve "Thinking Physics")
  • Had follow-up 1-1 workshops with 3 MATS scholars, doing the Planmaking and Surprise-Anticipation exercise.
  • Experimented with GPQA questions, which are hard problems written by grad students in physics, chemistry and biology. (For example, a chemistry major wouldn't reliably get the answer to a physics or biology question in 30 minutes, even with Google.)
  • Eventually hashed out the "multi-hour, multi-domain confusing-problem test" as the benchmark to be shooting for.
  • Experimented with an exercise where people had to find a bug in a small codebase, without running the code.
  • Experimented with an exercise applying "OODA loops" to the game Patrick's Parabox.

Experimented with "make and compare plans, for real"

  • So far done with myself, Eli Tyre, and Robin Goins
  • This seems to depend a lot on where people are starting from
  • Involves:
    • figure out what your goals are
    • make at least 5 plans that can achieve those goals
    • reflect on the assumptions in each plan
    • try to do a fermi estimate on the value of each plan
    • iterate on the plans
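For the Fermi-estimate step, here is a minimal Python sketch. It assumes each plan's value can be broken into multiplicative factors, each given as a (low, high) range. The factor names and numbers are hypothetical. The point estimate uses each range's geometric mean, a common convention for quantities spanning orders of magnitude. The low/high products are crude worst and best cases, not confidence intervals.

```python
import math


def fermi_estimate(factors: dict[str, tuple[float, float]]) -> tuple[float, float, float]:
    """Combine (low, high) ranges for multiplicative factors.

    Returns (low, point, high): the product of the lows, the product of each
    range's geometric mean, and the product of the highs.
    """
    low = math.prod(lo for lo, _ in factors.values())
    point = math.prod(math.sqrt(lo * hi) for lo, hi in factors.values())
    high = math.prod(hi for _, hi in factors.values())
    return low, point, high


# Hypothetical breakdowns of two plans' value.
plan_a = {
    "probability the plan succeeds": (0.1, 0.4),
    "researchers helped if it succeeds": (10, 40),
    "value per researcher helped": (1, 4),
}
plan_b = {
    "probability the plan succeeds": (0.5, 0.9),
    "researchers helped if it succeeds": (2, 5),
    "value per researcher helped": (1, 2),
}

low_a, point_a, high_a = fermi_estimate(plan_a)
low_b, point_b, high_b = fermi_estimate(plan_b)
print(f"plan A ~{point_a:.1f} (range {low_a:.1f}-{high_a:.1f})")
print(f"plan B ~{point_b:.1f} (range {low_b:.1f}-{high_b:.1f})")
print(f"A/B ratio ~{point_a / point_b:.1f}x")
```

With these made-up numbers, plan A comes out a few times better than plan B, not 10x. Its range is also much wider, which is often a hint about which assumptions to investigate in the next iteration.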

Experimented with "prediction mindset"

  • I'm trying out "make lots of predictions about my project and thought processes." I think this might evolve into an important skill, although it's not there yet.
  • It was bottlenecked on: "it's hard to make predictions." It was high friction to open up Fatebook.io, it was hard to operationalize predictions that mattered, and it was hard to make predictions about my thought process without disrupting my thought process.
  • I made progress via:
    • Discovering the Fatebook Chrome extension, which makes it much easier to jot quick predictions down in whatever program I'm in.
    • Establishing a TAP (trigger-action-plan): "notice I just had an insight that feels 'promising'" -> "immediately write down PROMISING in my notes." Later, when I finish my thought process, I can come back to flesh out why it felt promising and how to make predictions about it.
    • Experimenting with "write how a prediction felt rather than giving it a number"
  • Demonstration of Fatebook Chrome Extension. I notice I haven't yet made a prediction about 'Writing down PROMISING', so, let's do that now:

Think conceptually and learn about the field

  • Argued a bunch with Eli Tyre and Oliver Habryka about whether various versions of the project made sense. Notable points of confusion/disagreement were:
    • Exactly how worrisome are the warning-skulls from the psychometrics and educational literature?
    • How can we test that any of this is real, and applies in real life?
    • Do people have "traits" that aren't really mutable (other than raw g) which determine whether they can do certain types of cognitive work?
  • Poke around a bit in the literature myself
  • Hire someone to do a literature review on transfer learning and metalearning

What's Next?

I'm currently running at this project for another ~month. I'm hoping to end up with some kind of weeklong beta-test workshop at the end of it. 

After that, I'll take a break, evaluate whether this seems longterm promising, and figure out whether there is funding to do the scaled up version of this thing. (My ideal version of this involves hiring textbook authors from various fields, puzzle designers, expert tutors, etc).

A major crux will be "does this seem like something people would actually pay enough for to cover the salaries of the people developing the curriculum and running any coaching or workshops that follow?"

  1. ^

     See also: eliminating the feeling of idea scarcity.

     



One way of viewing planning is as an outer-loop on decision theory.

My approach to the general problem of planning skills was to start with decision theory and build up. In my Guild of the Rose Decision Theory courses, I spent time slowly building the most fundamental skills of decision theory. This included practicing manipulation of probabilities and utilities via decision trees, and practicing all these steps in a variety of both real and synthetic scenarios, to build an intuition for the nuances of how to set up decision problems on paper. The ultimate goal was to get practitioners to the point where they usually don't need to draw up a decision tree on paper. Instead, they can leverage those intuitions to quickly solve decision problems mentally, and/or recognize when a decision problem is actually tricky enough to merit breaking out the spreadsheet or Guesstimate project.

In my experience, even long-time rationalists are so incredibly bad at basic decision theory that trying to skip the step of learning to correctly set up a basic decision tree might actually be counterproductive. So my inclination is to focus on really mastering this art before attempting planning.
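For readers who haven't set up a decision tree before, here is a minimal Python sketch of the basic computation. Chance nodes average over outcomes weighted by probability, and decision nodes take the option with the highest expected utility. The node classes and the job-offer numbers are hypothetical, not taken from the Guild of the Rose course materials.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union


@dataclass
class Outcome:
    utility: float


@dataclass
class Chance:
    # (probability, child) pairs; probabilities should sum to 1.
    branches: list[tuple[float, Node]]


@dataclass
class Decision:
    # option name -> child node
    options: dict[str, Node]


Node = Union[Outcome, Chance, Decision]


def expected_utility(node: Node) -> float:
    """Roll back the tree: average at chance nodes, maximize at decision nodes."""
    if isinstance(node, Outcome):
        return node.utility
    if isinstance(node, Chance):
        return sum(p * expected_utility(child) for p, child in node.branches)
    if isinstance(node, Decision):
        return max(expected_utility(child) for child in node.options.values())
    raise TypeError(f"unknown node type: {type(node).__name__}")


def best_option(decision: Decision) -> str:
    """Name of the option with the highest expected utility."""
    return max(decision.options, key=lambda name: expected_utility(decision.options[name]))


# Hypothetical decision: take a risky new job or stay put (utilities are made up).
choice = Decision({
    "take the new job": Chance([(0.7, Outcome(10.0)), (0.3, Outcome(-5.0))]),
    "stay in the current job": Outcome(4.0),
})
print(best_option(choice), expected_utility(choice))
```

Here the risky option wins on expected utility (0.7 × 10 + 0.3 × (-5) = 5.5 versus 4). The real skill the comment describes is choosing the branches, probabilities, and utilities well, which this sketch takes as given.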

Another way of viewing planning is that planning is search. 

For computationally bounded agents like us, search involves a natural tradeoff of breadth versus depth. Breadth is essentially idea generation; depth is idea selection and refinement. The tricky thing about planning, in general, is that if 100x solutions exist, they are going to be found by spending the majority of the time on breadth-search, i.e. blue-sky brainstorming for ways the plan could look wildly different from the default approach. But most situations don't admit 100x plans. Most things in life, especially in our technological civilization, are already sort of optimized, because some existing refined solution has already accommodated the relevant tradeoffs. I could get to work faster if I flew there in a helicopter, but considering costs, the Pareto optimum is still driving my car on the freeway. Most things look like this. Well-considered Pareto solutions to real-world problems tend to look boring!

Therefore, if you spend a lot of time looking for 100x solutions, you will waste a lot of time, because these solutions usually won't exist. Then, after failing to find a truly galaxy-brain solution, you will spend some amount of time refining the probably-already-obvious plan, and realize that there are a lot of unknown-unknowns and that the best way to get clarity on these is to just start working. Then you will realize that you would have been better off if you had just started working immediately and not bothered with "planning" at all, and you will either be Enlightened or depressed.

It gives me no pleasure to say this! Ten years ago I was all fired up on the idea that rationalists would Win and take over the world by finding these clever HPJEV-esque lateral thinking solutions. I have since realized that one creative rationalist is usually no match for tens of thousands of smart people exploring the manifold through natural breadth-first and then refining on the best solutions organically.

I am not actually completely blackpilled on the idea of scenario planning. Clearly there are situations for which scenario planning is appropriate. Massive capital allocations and long-term research programs might be two good examples. Even for these types of problems, it's worth remembering that the manifold probably admits only marginal optimizations, not 100x optimizations, so you shouldn't spend too much time looking for them.

Both of these thoughts are pretty interesting, thanks.

I'd be interested in hearing a bunch more detail about how you trained decision theory and how that went. (naively this sounds like overkill to me, or "not intervening at the best level", but I'm quite interested in what sort of exercises you did and how people responded to them)

re: "how useful is planning", I do think this is specifically useful if you have deep, ambitious goals, without well established practices. (i.e. Rationality !== Winning in General).  

Lord, grant me the strength to persevere when things are hard, the courage to quit when things are impossible, and the wisdom to know the difference.

I'm running a small rationality dojo to try to approach this issue from the rat-for-rat-sake direction in a few weeks, trying to incorporate the things I learned from my Seasons of Growth, my Executive Function research, and stuff like Logan's Naturalism sequence (not to mention years of teaching at rat camps and workshops). I plan to do a writeup after, but would also love to chat sometime about this, either before or after.

One of the things that helped a lot with the predictions part was reading Judea Pearl's Heuristics. It seemed to make me better at noticing that a big part of my problem solving was split into two things: my representation of the problem space, and then my traversal of that space. I would notice more readily when I had stuck myself with an intractably sized space for the traversal speed available, and conclude that I needed to switch to trying to find a different representation that was tractable. Others might get very different insights out of the book, the search-inference framework is pretty flexible (also covered in Baron's Thinking and Deciding).

Can you give an example of a time you implemented that shift?

The cleanest example is from Raven's Matrices testing. I noticed that checking a particular set of hypotheses one by one was taking too long. So I zoomed out, saw them as a class of hypotheses, looked at what they had in common, and then asked what else was possible. If the different moving parts of the puzzle are slot machines, then it's an explore/exploit problem.

But it's somewhat broader. I think "could I 10x my plans?" can be a useful frame even if you feel averse to "what's literally the most important problem I could focus on?".

 

Even more baby-step version: come up with two plans instead of one and choose between them. The second plan probably won't be 10x better, but a count of two (2) is easier than 10x, and it builds the necessary muscles of looking for alternatives and choosing between them.

Yeah, something like this has already come up as a necessary stepping stone.

See also: ‘have a plan, at all’

My implicit model is something like: "in addition to the g factor, there'd turn out to be an 's factor' (i.e. 'slow intelligence') that is a product of both 'g' and 'general reasoning skills.'"

The old posts on mathematical talent by JonahS (1, 2, 3) might be related to that? Although I took JonahS to be arguing that people like Grothendieck score highly in “ability to find / build really great mental models (albeit not necessarily quickly)”, which is neither g-factor nor skill-at-planning-and-pivoting, I think. I’m not sure, though. I wish JonahS had written more.

This is less "a plan" and more "a model", but something that's really weirded me out about the literature on IQ, transfer learning, etc. is that... it seems like it's just really hard to transfer learn. We've basically failed to increase g, and the "transfer learning demonstrations" I've heard of seemed pretty weaksauce.

But, all my common sense tells me that "general strategy" and "responding to novel information, and updating quickly" are learnable skills that should apply in a lot of domains.

I'm curious why you think this? Or if you have a place where you've explained why you think this at more length? Like my common sense just doesn't agree with this -- although I'll admit my common sense was probably different 5 years ago.

Overall, a lot of the stuff here seems predicated on there being a very thick notion of non-domain-specific "rationality" or "general strategy" that can be learned, and that, once learned, speeds you up in widely disparate domains. As in: the whole effort is to find such a strategy. But there seems to be some (a lot? a little?) evidence that this just isn't that much of a thing, as you say.

I think current ML evidence backs this up. A Transformer is like a brain: when a Transformer is untrained, nearly literally the same architecture could learn to be a language model; to be an image diffusion model; to play StarCraft; etc. etc. But once you've trained it, although it can learn very quickly in contexts to which it is adapted, it basically learns pretty poorly outside of these domains.

Similarly, human brains start off very plastic. You can learn to echolocate, or speak a dozen languages, or ride a unicycle, or solve IMO problems. And then brains specialize and learn a lot of mostly domain-specific heuristics that let them learn very quickly about the things they already know. But they also learn to kinda suck elsewhere -- like, learning a dozen computer languages is mostly just not going to transfer to learning Chinese.

Like, I don't think the distinction I'm drawing here is even well-articulated. And I could spend more time trying to articulate it -- there's probably some generality, maybe at the level of grit -- but the "learn domain-non-specific skills that will then speed up a particular domain" project seems to take a position that's sufficiently extreme that I'm like... ehhhh, seems unlikely to succeed? (I'm in the middle of reading The Secret of Our Success fwiw, although it's my pre-existing slant toward this position that inclined me to read it.)

I think there are two main threads here:

  1. I think I have tried to learn 'how to think on purpose', and have basically succeeded (like, somewhat, not necessarily amazingly, but enough to know there's a "there" there)
  2. Even in the world where skills don't transfer, some skills seem just useful in more places, or in "more useful places."

Re: 1

Most of the time, I'm not thinking strategically; I'm just doing some sort of pattern-matchy-find-the-nearest-reasonable-thing-to-do-and-then-do-it. My current guess is this is what most people (and, probably, ML algorithms?) are doing most of the time.

But there are clusters of habits that seem pretty useful for solving novel problems, like asking:

  1. What is my goal here?
  2. What seem like the main inputs into that goal?
  3. What resources are available that compound?
  4. Original seeing on the stimuli I'm looking at
  5. What skills are required here? What subskills make them up? What's the skill-tree?
  6. What would give me good feedbackloops for gaining those subskills, or for checking if I'm making progress towards my goal?

Each of those feels like a "skill" to me, which I've practiced and cultivated, and which, once cultivated, can be chained into habits.

Re: 2

If you learn to play piano, I'd expect some weak transfer into: hand-finger coordination, understanding chord progression / musical structure, etc. If you learn a couple different instruments you probably have an easier time picking up new instruments. This can pave the way towards... being really good at music, and maybe some related things.

If you learn arithmetic and algebra, you have a building block skill that applies to science, engineering, and business. These things seem more world-changing than music.

(I think music can be world changing, but I think the skill-tree there is more like 'songwriting' and 'connecting with a muse and speaking to the heart of people's souls', which I think is pretty different from piano playing)

Point #1 is sort of a subset of point #2: analyzing your goals, breaking things down into subgoals, breaking down skills into subskills, are all "skills" that I expect to generalize quite a lot in a lot of domains.

...

How much is this worth?

I do think one point you made stands out: "well, there's only so much you can specialize. If you specialize at meta-skills, i.e. 'specialize in being a generalist', does that trade off against being a better specialist?"

Probably.

I think it depends on how early you pick up the meta-skills – it seems like a travesty that children aren't taught these skills at like age ~10 so that they get to apply them sooner/faster to more domains. If you're 30ish (like me), I don't think it's that obvious, in all cases, that you should "level up at meta". I spent the last month learning "meta", and I could have been learning ML, or math proofs, or web design, and it would have been more immediately applicable.

(See: Rationality !== Winning)

The reason I think this is important is that "how do we safely create a superintelligence?" (or "how do we reliably and safely avoid doing so?") are very confusing questions. It isn't obvious whether I (or others) are supposed to learn ML, or math proofs, or geopolitics. And meta-skills seem more necessary for figuring out how to navigate that: which specialist skills to learn, and how to apply them. i.e. Specializing in Problems We Don't Understand.

(This does all have implications for what sort of ML training regimes I'd expect to produce a general mind, although I think that's, like, bad and you shouldn't do it. Also, it does still look like ML is bottlenecked more on something like 'g' than something like 's' at the moment.)

So I agree with some of what you're saying along "There is such a thing as a generally useful algorithm" or "Some skills are more deep than others" but I'm dubious about some of the consequences I think that you think follow from them? Or maybe you don't think these consequences follow, idk, and I'm imagining a person? Let me try to clarify.

There's clusters of habits that seem pretty useful for solving novel problems

My expectation is that there are many skills / mental algorithms along these lines, such that you could truthfully say "Wow, people in diverse domains have found X mental algorithm useful for discovering new knowledge." But also I think it's probably true that the actually shared information between different domain-specific instances of "X mental algorithm" is going to be pretty small.

Like, take the skill of "breaking down skills into subskills, figuring out what subskills can be worked on, etc". I think there's probably some kind of algorithm you can run cross-domain that does this kind of thing. But without domain-specific pruning heuristics, and a ton of domain-specific details, I expect that this algorithm basically just spits back "Well, too many options" rather than anything useful.

So: I expect non-domain-specific work put into sharpening up this algorithm to run into steeply diminishing returns, even if you can amortize the cost of sharpening it across the many different domains that would benefit. If you could write down a program that helps you find relevant subskills in some domain, about 95% of the program would be domain-specific rather than domain-general, and there are something like only ~logarithmic returns to working on the domain-specific problem. (Not being precise, just an intuition.)

Put another way, I expect you could specify some kind of algorithm like this in a very short mental program, but when you're running the program, most mental compute goes into finding domain-specific program details.


Let me just describe the way the world looks to me. Maybe we actually think the same thing?

-- If you look throughout the history of science, I think that most discoveries look less like "Discoverer had good meta-level principles that let them situate themselves in the right place to solve the issue" and more like "Discoverer happened to be interested in the right chunk of reality that let them figure out an important problem, but it was mostly luck in situating themselves or their skills in this place." I haven't read a ton of history of science, but yeah.

-- Concretely, my bet is that most (many?) scientific discoverers of important things were extremely wrong on other important things, or found their original discovery through something like luck. (And some very important discoveries (Transformers) weren't really identified as such at the time.)

-- Or, concretely, I think scientific progress overall probably hinges less on individual scientists having good meta-level principles, and more on, like... whatever social phenomenon is necessary to let individuals or groups of scientists run a distributed brute-force search. Extremely approximately.

-- So my belief is that, so far, we humans just haven't found any principles like those you're seeking. Or, rather, that a lack of such principles can screw over your group (if you eschew falsifiability to a certain degree you're fucked; if you ignore math you're fucked), but that work on them ultimately mostly raises the floor rather than the ceiling. Like, there is a lot of math out there, and different kinds are very useful for different things!

-- I would be super excited to find such meta-level principles, btw. I feel like I'm being relentlessly negative. So to be clear, it would be awesome to find substantive meta-level principles such that non-domain-specific work on them could help people situate themselves and pursue work effectively in confusing domains. I'm talking about this because I am very much interested in the project. I just, right now... don't think the world looks like they exist? It's just that, in the absence of seeing groups that seem to have such principles, nothing I know about minds in general makes me think that such principles are likely.

Or maybe I'm just confused about what you're doing. Really uncertain about all the above.

I totally agree with how science normally works. I'm sitting here being like "whelp, doesn't seem like the way science normally works can solve the problems I care about in time."

It's a serious question on my end: "can I raise the ceiling, or just the floor?" and "does raising the floor matter?". Thinking about that led me to re-examine "can I actually help senior researchers?", and I felt like I had at least some traction on that. That led to "Help Senior Researchers with Targeted Problems", which indeed feels most important insofar as it's tractable.

My sense is that most senior researchers at least "know, and sometimes think about, all the meta-level principles I've thought about so far." But they don't always keep them in their "context window". Some things I currently expect (at least some) senior researchers to not be attending to enough:

  • not actually maximizing their working memory tools. 
  • not consistently steering towards the most hard-and-uncertain-but-important parts of their problem, so they can falsify early and move on to the next idea
    • relatedly: pursuing things that are shiny and nerdsnipy.
  • not attending much to "deliberately cultivating their meta-strategies", even in ways that just make sense to them. (My guess is that they'll often have decent taste for what they should do more of, if prompted, but they don't prompt themselves to think about it as often as is optimal.)

Also, I think a bunch of them have various executive dysfunction stuff or health issues, which isn't what I'm currently focused on but seems important.

(Note: I think "pursue things that are shiny/nerdsnipy" is an important motivational system that I'm not sure how to engage with without breaking important things. But my guess here is something similar to "if you want to marry into wealth, hang out around rich people and then marry for love", i.e. sink your attention into places where the shiny nerdsnipy problems are important, and then pick research directions based on excitement.)

It'd be cool if a second group also worked towards "rationality skill assessment."

This was my project at last year's Epistea, but I sort of had to pause it to work full-time on my interp upskilling experiment.

I only got as far as implementing ~85% of an app to facilitate this (as described here), but maybe a quick chat about this would still be valuable?

something that's really weirded me out about the literature on IQ, transfer learning, etc, is that... it seems like it's just really hard to transfer learn. We've basically failed to increase g, and the "transfer learning demonstrations" I've heard of seemed pretty weaksauce.

You might be referring to the skeptical take on transfer learning, summarized as follows in Surfaces and Essences by Hofstadter & Sander:

Experimental studies have indeed demonstrated that subjects who are shown a source situation and who are then given a target situation are usually unable to see any connection between the two unless they share surface-level traits. Furthermore, in such experiments, when two situations have a superficial resemblance, then the second one invariably brings the first one to mind, no matter whether it is appropriate or not (that is, irrespective of whether there are deeper reasons to connect the two cases). For instance, if subjects first tackle an arithmetic problem concerning items bought in a store, then any other problem concerning purchases will instantly remind them of the initial problem. But if the theme of the first problem is experimentally manipulated — say it becomes a visit to a doctor’s office instead of a store — then the participants will almost surely see no link between the two stories, even if the solution method for the first problem applies perfectly to the second problem.

But then the authors argue that this skeptical take is misleading:

Unfortunately, the source–target [experimental] paradigm [in the studies above] has a serious defect that undermines the generality of the conclusions that experiments based upon it produce. This defect stems from the fact that the knowledge acquired about the source situation during the twenty minutes or so of a typical experiment is perforce very limited — often consisting merely in the application of a completely unfamiliar formula to a word problem. By contrast, when in real life we are faced with a new situation and have to decide what to do, the source situations we retrieve spontaneously and effortlessly from our memories are, in general, extremely familiar. We all depend implicitly on knowledge deeply rooted in our experiences over a lifetime, and this knowledge, which has been confirmed and reconfirmed over and over again, has also been generalized over time, allowing it to be carried over fluidly to all sorts of new situations. It is very rare that, in real life, we rely on an analogy to a situation with which we are barely familiar at all. To put it more colorfully, when it comes to understanding novel situations, we reach out to our family and our friends rather than to the first random passerby. But in the source–target paradigm, experimental subjects are required to reach out to a random passerby—namely, the one that was imposed on them as a source situation by the experimenter.

And so, what do the results obtained in the framework of this paradigm really demonstrate? What they show is that when people learn something superficially, they wind up making superficial analogies to it.

To rephrase: The problem is that, in the experimental protocol, the subjects only ever wind up with a crappy surface-level understanding of the source situation, not a deep mental model of the source situation reflective of true familiarity / expertise. When people do have real comfort and familiarity with the source situation, then they find deep structural analogies all over the place.

For example (these are my examples), if you talk to an economist about some weird situation, they will easily notice that there’s a supply-and-demand way to look at it, and ditto gains-from-trade and so on. And physicists will analogize random things to superpositions and Fourier space and so on, etc. Of course, the main thing that everyone is an “expert” in is “intuitive everyday life stuff”, and hence our thinking and speech is full of constant non-surface-level analogies to traveling, seasons, ownership, arguments, etc. etc.

I’m not sure if this is relevant to what you were saying, just thought I’d share.

I was going off a vague sense from having talked to a few people who had scanned the literature more than I.

Right now I'm commissioning a lit review about "transfer learning", "meta learning", and things similar to that. My sense so far is that there aren't a lot of super impressive results, but part of that looks like it's because it's hard to teach people relevant stuff in a "laboratory"-esque setting.

The LessWrong Review runs every year to select the posts that have most stood the test of time. This post is not yet eligible for review, but will be at the end of 2025. The top fifty or so posts are featured prominently on the site throughout the year.


They also attempt to generate principles to follow from, well, first principles, and see how many they correctly identify. 

Second principles?

========

I'm really glad to see you quoting Three Levels. Seems important.

If I'm building my own training and tests, there's always the risk of ending up "teaching to the test", even if unintentionally. I think it'd be cool if other people were working on "Holdout Questions From Holdout Domains", that I don't know anything about, so that it's possible to test if my programs actually output people who are better-than-baseline (controlling for IQ).


I am hoarding at least one or two fun facts that I have seen smart rationalists get wrong. Specifically: a claim was made, I asked "huh, really?", they doubled down, and later I looked it up and found out that they were significantly wrong. Unfortunately, I think that if I had read the book first and started the conversation with it in mind, I might not have discovered that they were confidently incorrect. Likewise, I think it would be hard to replicate this in a test setting.