A concrete research direction in the "Searching for Search" field is to find out whether ChessGPTs or the Leela Chess Zero network are searching. Your "babble and prune" description seems checkable: maybe something like a linear probe for continuation board states in the residual stream, checking whether bad continuations are easier to find in earlier layers? Thank you for this writeup.
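To make the probe idea concrete, here is a rough sketch of that experiment. Everything here is hypothetical: `resid_acts[layer]` would be cached residual-stream activations for a set of candidate continuations, and `labels` would mark which continuations are bad moves.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def probe_accuracy_by_layer(resid_acts: dict[int, np.ndarray],
                            labels: np.ndarray) -> dict[int, float]:
    """Fit a linear probe per layer for 'this continuation is bad'.
    Higher accuracy in earlier layers would be (weak) evidence that bad
    continuations are generated early and pruned later."""
    scores = {}
    for layer, acts in resid_acts.items():
        probe = LogisticRegression(max_iter=1000)
        scores[layer] = cross_val_score(probe, acts, labels, cv=5).mean()
    return scores
```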
Aren't LLMs already capable of two very different kinds of search? Firstly, their whole deal is predicting the next token - which is a kind of search. They're evaluating all the tokens at every step, and in the end choosing the most probable-seeming one. Secondly, they can do across-token search when prompted accordingly. Something like "Please come up with 10 options for X, then rate them all according to Y, and select the best option" is a task that current LLMs can perform very reliably - whether or not "within-token search" exists as well. But then again, one might of course argue that search happening within a single forward pass, and maybe even a type of search that "emerged" via SGD rather than being hard-baked into the architecture, would be particularly interesting/important/dangerous. We just shouldn't make the mistake of assuming that this would be the only type of search that's relevant.
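As a toy illustration of the second kind, here is that prompt pattern as scaffolding code. The `llm(prompt) -> str` completion function is a hypothetical stand-in for any particular API, and the sketch assumes the model answers the rating prompt with a bare number:

```python
def across_token_search(llm, task: str, criterion: str, n: int = 10) -> str:
    """Search implemented across many forward passes, in plain text."""
    listing = llm(f"Please come up with {n} options for {task}, one per line.")
    options = [line.strip() for line in listing.splitlines() if line.strip()]
    ratings = [float(llm(f"Rate this option according to {criterion}, "
                         f"from 0 to 10. Answer with only a number:\n{opt}"))
               for opt in options]
    return options[ratings.index(max(ratings))]  # select the best option
```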
I think across-token search via prompting already has the potential to lead to the AGI-like problems that we associate with mesa-optimizers. Evidently the technology is not quite there yet, because proofs of concept like AutoGPT basically don't work so far. But conditional on AGI being developed in the next few years, it would seem very likely to me that this kind of search would be the one that enables it, rather than some hidden "O(1)" search deep within the network itself.
Edit: I should of course add a "thanks for the post" and mention that I enjoyed reading it, and it made some very useful points!
I'd take an agnostic view on whether LLMs are doing search internally. Crucially, though, I think the relevant output to be searching over is distributions of tokens, rather than the actual token that gets chosen. Search is not required to generate a single distribution over next tokens.
I agree that external search via scaffolding can also be done, and would be much easier to identify, but without understanding the internals it's hard to know how powerful the search process will be.
Thanks to Leo Gao, Nicholas Dupuis, Paul Colognese, Janus, and Andrei Alexandru for their thoughts.
This post was mostly written in 2022, and pulled out of my drafts after recent conversations on the topic. My main update since then is the framing. Rather than suggesting searching for substeps of search as an approach I'm excited about, I now see it only as a way to potentially reduce inherent difficulties. The main takeaway of this post should be that searching for search seems conceptually fraught to the point that it may not be worth pursuing.
Searching for Search is the research direction that looks into how neural networks implement search algorithms to determine an action. The hope is that if we can find the search process, we can then determine which goal motivates it, which may otherwise be a much more difficult task. We may even be able to retarget the search, specifying a new goal while keeping the capabilities of the model intact. Conclusively showing that search is not happening would also be useful, as it would imply a lack of generalization ability that could make deception less likely. However, search can take many forms, and we do not have a widely accepted definition of search that neatly captures all the variants. Searching for search may fail either by using too narrow a definition and missing the specific type of search that a model actually implements, or by using too wide a definition to check for in practice.
This post analyzes the components of several algorithms for search, looking at commonalities and differences, and aims to provide a relatively tight definition of search that can guide the process of searching for search. Unfortunately, the definition arrived at here is so broad that it does not lead to useful clarification or application. Furthermore, many possible algorithms that fall under search do not suggest an overarching goal for the model. Due to these issues, I instead suggest searching for substeps of search. Gathering evidence on how these substeps are compiled throughout the training process could shed light both on how to search for search and how agency and goals develop.
Search vs. Heuristics
What alternative is there to implementing a search process? Abram Demski splits the class of optimizers into control and selection (which includes search). Selection processes can instantiate and evaluate arbitrary elements of the search space (though doing so may be costly) and then choose the best option, while control processes only instantiate one element and may not even evaluate it. Less formally, a control process can be thought of as a set of heuristics. A standard example of one would be a thermostat, which raises or lowers the temperature according to a set rule depending on the temperature it detects.
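For contrast with search, a control process is easy to write down. A minimal sketch of the thermostat: it maps its single observation straight to an action by a fixed rule, never instantiating or evaluating alternative options.

```python
def thermostat(temperature: float, target: float = 20.0) -> str:
    # A pure control process: one observation in, one action out; no
    # candidate options are ever generated or compared.
    return "raise" if temperature < target else "lower"
```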
The distinction between selection and control is not always clear, though, and Abram describes it as more of a continuum. A search process may only be able to generate a subset of the possible options for evaluation and may not be able to evaluate them perfectly, instead relying on heuristics to approximate the evaluation. A controller's heuristics for choosing an option may instead be thought of as a limited search space, such as a thermostat searching over the options "up" and "down". If search is important to identify, then so is a control process that imitates search.
Blurring the lines further is the fact that many search processes cannot be cleanly implemented in a neural net. The limited number of steps in a forward pass means that search processes with an unbounded duration must be approximated instead. Rather than implementing a search process directly, a neural network must approximate it using a series of heuristics. This means that selection and control are both variations on a pile of heuristics, suggesting that instead of trying to identify holistic search processes in neural networks, it could be more feasible to split search into the heuristics that make up its substeps and look for those instead. Doing so also opens the possibility of studying how agency arises through the training process, as heuristics stack up into better and better approximations of search.
The question of whether current cutting-edge models are closer to the search or heuristics end of the spectrum remains open to debate, although current evidence at least suggests that models are not doing search particularly well. Search is particularly concerning because of its ability to generalize and find extreme policies beyond what the model demonstrated on the training data. Finding a model doing general search would serve as a warning shot, highlighting that the risk from learned optimization is real, as well as providing a model organism on which to run further tests.
Searching Over What?
The output of a search process is a choice of action (or sequence of actions, possibly conditional on how earlier actions resolve), but searching over actions is only one possible implementation of search. In addition to searching for actions that will lead to good outcomes, a process could search for good outcomes and then work backwards to determine the actions that lead to them. Searching over intermediate points, such as chokepoints of a maze, is also a possibility, as is searching over properties of options (or equivalently groups of options that share a property).
For complicated environments, search will often be over all three of actions, intermediate points, and final outcomes. They may be considered as individual options or in groups. At each step of the search process, groups of options (possibly across categories) can be jointly considered and used to determine future steps of the process. Importantly for interpretability, this means that the elements of the space being searched over are not necessarily directly comparable to each other.
Possible Actions Involved in Search
Let us consider three methods of search: babble and prune, stochastic gradient descent, and binary search. In babble and prune, options are randomly generated then evaluated, with the best being used to focus the generation of the next iteration. Stochastic gradient descent involves instantiating a single option, then modifying it with each iteration according to a set procedure until a local minimum is reached. Binary search starts with the entire space of options, and then cuts away half of them every iteration.
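A minimal sketch of the first method, assuming only a `propose` sampler (which babbles candidates, biased toward any survivors it is given) and a `score` evaluator, both hypothetical:

```python
def babble_and_prune(propose, score, n_iters=10, n_keep=5):
    pool = propose([])  # babble: generate initial options from scratch
    for _ in range(n_iters):
        kept = sorted(pool, key=score, reverse=True)[:n_keep]  # prune
        pool = kept + propose(kept)  # babble again around the survivors
    return max(pool, key=score)  # select the final option
```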
These three methods are far from exhaustive, but are illustrative of the wide range of possible approaches to search. The sets of starting points are different, the sets under consideration at intermediate steps are different, and the updating processes are different. To gain some clarity on what the space of possible search processes looks like, possible substeps that can be taken in each iteration are listed below (though the list is not exhaustive; a sketch after the list shows how such substeps compose):
Generating - Coming up with new options
Eliminating - Removing options from consideration
Evaluating - Assigning scores to options under consideration
Comparing - Ranking options without explicitly assigning a score, not necessarily in a globally coherent way
Modifying - Changing the properties of some options
Merging - Replacing a set of options with a smaller set such that all the properties of the new set were represented in the original set
Splitting - Replacing a set of options with a larger set that could be merged to make the original set
Thinking - Maintaining the same set of options while changing how the input is being considered
Rearranging - Maintaining the same set of options but changing the way they’re grouped or ordered
Selecting - Restricting the further steps to a subset of options, including choosing the final outcome of the process
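To make the substep framing concrete, here is binary search over a sorted list (a toy stand-in for a space of options), annotated so that each operation is one of the substeps named above:

```python
def binary_search(options: list, target):
    while options:
        mid = len(options) // 2
        pivot = options[mid]             # Evaluating: examine one option
        if pivot == target:              # Comparing: rank pivot vs. target
            return pivot                 # Selecting: choose the final outcome
        if pivot < target:
            options = options[mid + 1:]  # Eliminating: drop the lower half
        else:
            options = options[:mid]      # Eliminating: drop the upper half
    return None                          # no option survived
```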
It is important to note that some of these steps can be equivalent to others, so it will not always be clear which substeps are being implemented. For example, merging two options in one step would be equivalent to modifying one and eliminating the other. This can make it difficult to determine how a search process works conceptually and how it will generalize, even if the implementation in a particular case is understood.
Defining Search and Implied Problems
In summary, search can be over actions, intermediate points, or outcomes. There can be any number of options for each (including infinite continuums) under consideration at any time. These options can be considered in groups or individually. There are various processes that can be used to update the options under consideration.
A comprehensive definition of search would then be something like “A process which uses some input and/or background assumptions to generate initial sets of options for actions, outcomes, and/or intermediate points, then iterates by using those sets along with the input and/or background knowledge to generate new sets, until terminating with the choice of a single action”.
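Transcribing that definition literally into code makes its breadth visible: almost any iterative program instantiates this skeleton (the four function-valued arguments are placeholders).

```python
def generic_search(inputs, background, generate_initial, update, done, choose):
    # Generate initial sets of options for actions, outcomes,
    # and/or intermediate points...
    option_sets = generate_initial(inputs, background)
    # ...then iterate, using those sets plus the input and/or
    # background knowledge to generate new sets...
    while not done(option_sets):
        option_sets = update(option_sets, inputs, background)
    # ...until terminating with the choice of a single action.
    return choose(option_sets)
```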
I believe this is a broad enough definition to capture all processes that could reasonably be defined as search (though please let me know if any exceptions come to mind). The only problem with it is that it’s useless. It’s far too broad to provide actionable guidance on how to identify search in a neural net. It may be possible to tweak the definition to provide tighter bounds, but I am not too hopeful this can make a meaningful difference. Just the three examples discussed above are sufficiently different to make a certain amount of wideness in the definition necessary.
That leaves us with deliberately looking for an incomplete definition. One way to do this would be to determine some specific implementation of search and look for that, rather than trying to catch any possible implementation. This seems at least worth a shot; however, it fails in the scenario where the approximation of search that a neural net implements doesn't match up nicely with any search process that seems reasonable to humans.
It also raises the question of whether it is worth searching for search at all. If we expect that most implementations of search will not involve a consistent objective at all, then even if we find search we may well not get insight into the goal of the model or the ability to retarget it. There would still be upside to finding out how the model is doing search, but the cost-benefit analysis of whether it’s worth pursuing is different.
Alternatives to Searching for Search
Rather than trying to find an entire cohesive search process, looking for smaller components of it may be easier. This could either be representations of options under consideration, or substeps of the search process.
Identifying the options under consideration at each step given a particular input would fall short of the understanding necessary to identify the goals of a search process or predict how it will generalize in new environments, but would still provide useful information. Seeing how the set of options under consideration evolves throughout the process could significantly narrow down the possible search processes being used, and guide further searching for search. In addition, it may be possible to identify options associated with deception or seizing power, in which case finding them would provide a warning shot.
Searching for some of the individual substeps in the search process could also be more tractable. Substeps such as “evaluating” or “modifying” have tighter definitions than general search, but finding these would still provide strong evidence that search was occurring. The complexity of search comes from all the possible combinations of what is hopefully a relatively small number of substeps with more actionable definitions. By focusing on only one type of substep, the dimensionality is reduced and the problem can become more tractable.
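One hypothetical shape such an experiment could take, for the "evaluating" substep: fit a linear probe from a model's internal activations to the scores an external evaluator assigns to each candidate option. Here `acts` would be an (n_options, d_model) array of activations and `scores` the external evaluations; high held-out R² would suggest the model linearly represents an evaluation of the options.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

def evaluation_probe_r2(acts: np.ndarray, scores: np.ndarray) -> float:
    # Regress activations onto external option scores; held-out R^2
    # measures how well a single linear map recovers the evaluations.
    return cross_val_score(Ridge(alpha=1.0), acts, scores,
                           cv=5, scoring="r2").mean()
```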
Identifying one substep further provides a base from which to investigate other substeps and how they combine with each other. It is possible that this could cascade to an exhaustive understanding of all substeps being implemented, and therefore the holistic search process. Being able to identify substeps in the training process and knowing how they can combine could also identify the point at which a model starts doing search or provide a warning that the capability is close.
If models implement search as a sequence of heuristics for its substeps, then searching for search is both more difficult and less rewarding than is commonly thought. However, if it is deemed still worth pursuing as a research direction, then identifying the substeps of search, searching for those, and using them to reconstruct the overall process may be a more promising approach than searching for a holistic search.