Introduction

In this post I will describe one possible design for Artificial Wisdom (AW). This post can easily be read as a stand-alone piece; however, it is also part of a series on artificial wisdom. In essence:

Artificial Wisdom refers to artificial intelligence systems which substantially increase wisdom in the world. Wisdom may be defined as "thinking/planning which is good at avoiding large-scale errors," or as “having good terminal goals and sub-goals.” By “strapping” wisdom to AI via AW as AI takes off, we may be able to generate enormous quantities of wisdom which could help us navigate Transformative AI and The Most Important Century wisely.

TL;DR

Even simple workflows can greatly enhance the performance of LLMs, so artificially wise workflows seem like a promising candidate for greatly increasing AW.

This piece outlines the idea of introducing workflows into a research organization which works on various topics related to AI Safety, existential risk and existential security, longtermism, and artificial wisdom. Such an organization could make progressing the field of artificial wisdom one of its primary goals, and as the workflows become more powerful it could automate an increasing fraction of the work within the organization.

Essentially, the research organization, whose goal is to increase human wisdom around existential risk, acts as scaffolding on which to bootstrap artificial wisdom. 

Such a system would be unusually interpretable, since all reasoning except that of the base model is done in natural language. When the organization develops improved ideas about existential security factors and projects to achieve these factors, it could incubate those projects itself, or pass them on to incubators, to make sure the wisdom does not go to waste.

Artificial Wisdom Workflows

One highly promising possibility for designing AW in the near future is using workflows. As noted by Andrew Ng, even simple workflows can have surprisingly powerful effects, such as enabling GPT-3.5 wrapped in an agentic workflow to far outperform zero-shot GPT-4 on coding tasks, and advanced workflows may be able to boost AW even more.

Workflows can be created which mimic or augment various human thought processes, and then carry them out in an automated, repeatable way that makes certain elements of thinking far more efficient and scalable. Many such workflows would need to be created and tested in order to find ones that reliably and significantly improve human wisdom. There are several ways this could be achieved.

For example, a research organization which works on various topics related to AI Safety, existential risk and existential security, longtermism, and artificial wisdom could make progressing the field of AW one of its primary goals by creating workflows to automate research.

While at first most of the research would be done by humans, various workflows could be designed to mimic certain thinking processes and automate strategically chosen parts of the research process. This could include generating promising research agendas, generating hypotheses, checking thought processes for common biases, generating pros-and-cons lists for various options, ranking ideas by how promising they are, trying out various combinations of ideas and strategies, and so on. This is easiest to demonstrate with an example (a rough code sketch follows the list below):

Workflow Example

  1. Thousands of different ideas for research could be generated in parallel, using some source of structured, semi-random input as analogies to inspire creativity on the research topic of interest, such as
    1. Top 1,000 Wikipedia pages
    2. Top 1,000 ideas in philosophy or politics
    3. Top 1,000 most important technologies, etc.
    4. Or pre-existing public data; for example, various posts on the Effective Altruism Forum (I believe these have a public use license?) related to existential risk or longtermism could serve as "jumping-off points" to build research agendas around
  2. Any ideas that are intuited as obviously not promising could be dropped
  3. Each idea could go through several iterations of improvement in various ways, with a robust research agenda created around each idea.
  4. Each research agenda would be evaluated in various ways such as pros and cons, cost-effectiveness analysis, theory of change, etc.
  5. A detailed report could be created on each research agenda
  6. Reports and the processes so far could be checked for various biases and errors, essentially applying “natural language error correcting codes” (this could potentially include a large number of rationality and philosophical epistemic rules; these could also be applied at targeted error-prone spots)
  7. All of the research agendas could be paired up in parallel and compared and contrasted in various ways in a research agenda tournament, so that the top hundred research agendas are rapidly converged on
  8. Research agendas could be combinatorially mixed and matched to see if any of them synergize with each other
  9. Deeper dives into the most promising research agendas, etc.
  10. Humans evaluate the results
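
As a concrete illustration, here is a minimal sketch of how steps 1 through 7 of this pipeline might be wired together, with a stand-in for the human evaluation in step 10. Everything here is hypothetical: call_llm is a placeholder for whichever model API the organization uses, the prompts and seed topics are made up, and a real implementation would add the detailed reports, bias checks, and combinatorial mixing described above.

```python
from itertools import combinations

def call_llm(prompt: str) -> str:
    """Placeholder for a real LLM API call (swap in your provider's client here)."""
    return f"[model output for: {prompt[:60]}]"

def generate_ideas(seeds: list[str], topic: str) -> list[str]:
    # Step 1: use structured, semi-random seeds as analogies to inspire research ideas.
    return [call_llm(f"Using '{seed}' as an analogy, propose a research idea about {topic}.")
            for seed in seeds]

def looks_promising(idea: str) -> bool:
    # Step 2: cheap screening pass; drop ideas judged obviously unpromising.
    verdict = call_llm(f"Is this idea promising? Reply YES or NO.\n{idea}")
    return "YES" in verdict.upper()

def refine(idea: str, iterations: int = 3) -> str:
    # Step 3: several rounds of improvement, building a research agenda around the idea.
    agenda = idea
    for _ in range(iterations):
        agenda = call_llm(f"Improve this into a more robust research agenda:\n{agenda}")
    return agenda

def evaluate(agenda: str) -> str:
    # Steps 4-6: pros and cons, cost-effectiveness, theory of change, bias checks.
    return call_llm(f"Give pros, cons, cost-effectiveness notes, and likely biases for:\n{agenda}")

def tournament(agendas: list[str], keep: int) -> list[str]:
    # Step 7: pairwise comparisons; a crude score rapidly converges on the top agendas.
    scores = {a: 0 for a in agendas}
    for a, b in combinations(agendas, 2):
        winner = call_llm(f"Which agenda is more promising? Reply A or B.\nA: {a}\nB: {b}")
        scores[a if "A" in winner.upper() else b] += 1
    return sorted(agendas, key=scores.get, reverse=True)[:keep]

if __name__ == "__main__":
    seeds = ["top Wikipedia pages", "important technologies", "EA Forum posts on existential risk"]
    ideas = generate_ideas(seeds, "existential security")
    agendas = [refine(idea) for idea in ideas if looks_promising(idea)]
    for agenda in tournament(agendas, keep=2):
        print(evaluate(agenda))  # Step 10: humans read and evaluate the resulting reports.
```

With a real model behind call_llm, steps 8 and 9 (combinatorial mixing and deeper dives) would slot in as additional functions between the tournament and the final human review.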

It is important that the research organization start with highly competent researchers doing good research; automating subpar thought processes and subpar research is unlikely to be helpful. One possibility is that AW workflows could start out as a subproject within an existing successful research organization. Researchers would try out various workflows to find ones which successfully automate important thought processes.

At first progress may be slow, but as a database of successful research automations is built up, and as AW workflows become more powerful, more efficient, and more helpful at performing research, the research organization can gradually transition from one in which humans perform most of the research and AW is used for select purposes, to one which primarily designs workflows and instructs them to perform the research. The research organization, whose goal is to increase human wisdom, acts as scaffolding on which to bootstrap artificial wisdom.

Due to current LLMs' imperfect performance and inability to correct their own mistakes, it would likely be necessary to implement various natural language error correcting codes and redundancies, and to break thought processes down into their most basic constituent elements so that it is extremely difficult to mess them up at any step in the process. It may also be necessary to design processes to artificially induce creativity and divergent thinking, as LLMs often converge on the most obvious possible answers; yet they are very good at working with analogies and following creative direction.
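
One way to implement this kind of redundancy and natural language error correction is to run each atomic reasoning step several times and pass the pooled answers through a set of named checks before anything moves downstream. This is only a sketch under the same assumptions as before: call_llm is a placeholder, and the checks shown are illustrative; a real list would be drawn from the organization's rationality and epistemic rules.

```python
def call_llm(prompt: str) -> str:
    """Placeholder for a real LLM API call."""
    return f"[model output for: {prompt[:60]}]"

# Illustrative "natural language error correcting codes": named checks, each a prompt template.
ERROR_CHECKS = {
    "contradiction": "Do these answers contradict each other? If so, explain where:\n{}",
    "overconfidence": "Point out any claims stated with more confidence than the evidence supports:\n{}",
    "scope_neglect": "Does this reasoning neglect differences in scale or magnitude? Explain:\n{}",
}

def redundant_step(prompt: str, runs: int = 3) -> list[str]:
    # Run the same atomic reasoning step several times to expose instability in the answers.
    return [call_llm(prompt) for _ in range(runs)]

def check_step(answers: list[str]) -> dict[str, str]:
    # Apply each natural language check to the pooled answers before passing them onward.
    pooled = "\n---\n".join(answers)
    return {name: call_llm(template.format(pooled)) for name, template in ERROR_CHECKS.items()}

answers = redundant_step("What is the main bottleneck for early-warning systems against engineered pandemics?")
for check_name, report in check_step(answers).items():
    print(check_name, "->", report)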

The good news is that every time a thought process is improved and a solution is found to a workflow problem, that improvement is permanent and can be used across all relevant future workflows.

Furthermore, after several successful workflows are designed, they can be chained together and recombined to make even more effective workflows, and they can be scaled up to arbitrary size when high performance is needed, or parallelized when speed is needed.
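
To make the chaining and scaling point concrete: once each validated workflow is just a function from text to text, chaining is ordinary function composition and parallelization is a thread pool. A rough sketch, again with a placeholder call_llm and made-up workflow names:

```python
from concurrent.futures import ThreadPoolExecutor

def call_llm(prompt: str) -> str:
    """Placeholder for a real LLM API call."""
    return f"[model output for: {prompt[:60]}]"

# Two previously validated workflows, each represented as a plain text-to-text function.
def brainstorm(topic: str) -> str:
    return call_llm(f"Brainstorm unconventional approaches to: {topic}")

def critique(text: str) -> str:
    return call_llm(f"List the three biggest weaknesses of the following:\n{text}")

def chain(*workflows):
    # Chain workflows so the output of one becomes the input of the next.
    def chained(text: str) -> str:
        for workflow in workflows:
            text = workflow(text)
        return text
    return chained

brainstorm_then_critique = chain(brainstorm, critique)

# Scale out: run the combined workflow over many topics in parallel when speed matters.
topics = ["AI governance", "civilizational resilience", "forecasting institutions"]
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(brainstorm_then_critique, topics))
print(results[0])
```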

It would be important for the AW system to somehow keep in mind the worldview and preferences of the research organization. While to some degree this could be implicit in the workflows, such information could also be kept in the context window or otherwise stored in a memory component attached to the base model. If custom LLMs are used, they could be pre-trained with relevant information or fine-tuned to the needs of the research organization. As the organization does research and evolves its worldviews, it will also be important to have a process for continuing to integrate new ideas, as well as new information about the state of the world.

One significant advantage of this approach to AW is that it is unusually interpretable. Because all processing is done in natural language, the entire workflow research process can be observed and analyzed to see how the workflow came to the result, with only the workings of the base model at each step not being directly observable.

Another great thing about such workflows is that as the underlying LLM base models improve, the same workflows will also become much more powerful. While at first there may be only a narrow set of workflows that effectively improve research, as models improve, workflows will be able to effectively automate an increasingly large share of wisdom-increasing research.

Meta-Workflows

Another possibility is to create meta-workflows which automate the process of choosing between and combining workflows. This would spare researchers from having to spend an inordinate amount of time understanding which workflows are effective in which situations, and from manually recombining the hundreds or thousands of workflows in the workflow database in order to answer multi-step research challenges.

For example, when the researcher prompts the AW system, the meta-workflow will ask itself which of the following is true. Is the prompt:

Trying to get me to answer a simple question? If so, implement process A

Trying to get me to come up with a strategy? If so, implement process B

Trying to get me to generate options? If so, implement process C

Etc.

Depending on the results of the first workflow it initiates, the meta-workflow could then initiate further workflows so that workflows chain and branch in optimally intelligent ways, keeping the human in the loop insofar as this is necessary and more efficient/effective. 
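
In code, a meta-workflow of this kind is essentially a router: classify the incoming prompt, dispatch to the matching workflow, and decide whether to chain further steps or hand back to a human. A minimal sketch under the same assumptions as the earlier examples (placeholder call_llm, invented process names):

```python
def call_llm(prompt: str) -> str:
    """Placeholder for a real LLM API call."""
    return f"[model output for: {prompt[:60]}]"

def process_a(prompt: str) -> str:
    # Process A: answer a simple question.
    return call_llm(f"Answer concisely: {prompt}")

def process_b(prompt: str) -> str:
    # Process B: come up with a strategy.
    return call_llm(f"Draft a step-by-step strategy for: {prompt}")

def process_c(prompt: str) -> str:
    # Process C: generate options.
    return call_llm(f"Generate five distinct options for: {prompt}")

PROCESSES = {"QUESTION": process_a, "STRATEGY": process_b, "OPTIONS": process_c}

def classify(prompt: str) -> str:
    label = call_llm("Classify this request as exactly one of QUESTION, STRATEGY, OPTIONS:\n" + prompt)
    # Fall back to a human if the classification is unrecognized (human in the loop).
    return next((key for key in PROCESSES if key in label.upper()), "HUMAN")

def meta_workflow(prompt: str) -> str:
    label = classify(prompt)
    if label == "HUMAN":
        return "Escalate to a human researcher: " + prompt
    result = PROCESSES[label](prompt)
    # Depending on this first result, further workflows could be chained or branched here,
    # e.g. sending a strategy through a critique workflow before returning it.
    return result

print(meta_workflow("What are our options for reducing risks from engineered pandemics?"))
```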

Again, at first the meta-workflow process will be completely human-operated, but over time increasingly powerful automated meta-workflows could be designed so that the AW system automatically determines the optimal workflow process for finding the answer to any question, and eventually does so even more effectively and efficiently than humans are able to.

Metrics and Workflow Optimization

Out of curiosity, and in the spirit of "meta-", I asked Claude 3.5 Sonnet if it had any ideas on how to improve this, and one idea it came up with sounded quite good, so I asked it to expand on it. The answer reminded me very much of something I would come up with myself, and it was good enough that simply sharing it seems about as useful as trying to edit and plagiarize it. The one caveat, again, is that at first most of this will be done manually by researchers and engineers, and only later could it be fully automated. Here is how Claude 3.5 Sonnet would improve this idea:

“Certainly, I'd be happy to elaborate on the first suggestion regarding feedback loops and iterative improvement for the artificial wisdom (AW) workflows.

The core idea here is to create a system that can learn from its own performance and continuously improve itself. Here's a more detailed breakdown of how this could work:

  1. Performance Metrics: 
    • Define a set of key performance indicators (KPIs) for the workflows. These could include: 
      • Accuracy of predictions or conclusions
      • Novelty of insights generated
      • Speed of processing
      • Resource efficiency
      • Relevance of outputs to the given task
      • Consistency of results across multiple runs
  2. Data Collection: 
    • Implement logging mechanisms to collect data on these metrics for each run of a workflow.
    • This could include both quantitative data (e.g., processing time, number of ideas generated) and qualitative assessments (e.g., expert ratings of insight quality).
  3. Analysis Module: 
    • Develop an analysis component that can process the collected data and identify patterns, trends, and areas for improvement.
    • This module could use machine learning techniques to detect correlations between workflow configurations and performance outcomes.
  4. Automated Optimization: 
    • Based on the analysis, create an optimization module that can suggest or automatically implement changes to the workflows.
    • This could involve adjusting parameters, reorganizing workflow steps, or even generating entirely new workflow components.
  5. A/B Testing: 
    • Implement a system for running controlled experiments, where slightly modified versions of workflows are tested against each other to determine which performs better.
  6. Version Control: 
    • Maintain a versioning system for workflows, allowing for easy rollback if changes don't yield improvements and providing a historical record of how the system has evolved.
  7. Human Oversight: 
    • While much of this process can be automated, incorporate checkpoints for human review and approval of significant changes.
    • This ensures that the system doesn't optimize for the wrong things or develop unintended behaviors.
  8. Meta-Learning:
    • As the system accumulates data on what kinds of changes tend to yield improvements, it can start to develop higher-level strategies for workflow optimization.
    • This could lead to increasingly sophisticated and effective self-improvement over time.
  9. Cross-Workflow Learning: 
    • Insights gained from improving one workflow could be applied to others, allowing for system-wide enhancements.
  10. Adaptive Complexity: 
    • The system could learn to adjust its own complexity based on the task at hand, becoming more intricate for complex problems and streamlining itself for simpler tasks.”

One other point I would like to add is that I think it is very important that the metrics used on workflows actually attempt to measure wisdom. This may be very difficult, but one possible measure is to notice large-scale errors that might have been made if an insight had not been uncovered through an AW workflow, and to rate how bad that error might have been in expectation, including how likely it was to have been avoided anyway. Another is to notice ways in which certain terminal goals or sub-goals had been suboptimal, and to estimate the expected value of how much those goals were improved through workflows.
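
To ground the measurement point: the minimum viable version of the metrics and data-collection modules above is just a log of every workflow run with a few ratings attached, including a crude attempt at the "expected large-scale error averted" measure. All of the field names below are invented for illustration; a real system would refine them considerably.

```python
import csv
import time
from dataclasses import dataclass, asdict

@dataclass
class WorkflowRun:
    workflow: str            # which workflow (and version) was run
    runtime_s: float         # speed / resource-efficiency proxy
    novelty: int             # expert rating, 1-5
    relevance: int           # expert rating, 1-5
    error_severity: int      # how bad the avoided large-scale error would have been, 1-5
    error_probability: float # chance the error would have occurred without the workflow

    def wisdom_score(self) -> float:
        # Crude expected-value style metric: severity of the averted error,
        # weighted by how likely it was to occur without the workflow's insight.
        return self.error_severity * self.error_probability

def log_run(run: WorkflowRun, path: str = "workflow_runs.csv") -> None:
    row = asdict(run) | {"timestamp": time.time(), "wisdom_score": run.wisdom_score()}
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(row))
        if f.tell() == 0:  # write the header only for a fresh log file
            writer.writeheader()
        writer.writerow(row)

log_run(WorkflowRun("agenda_tournament_v2", runtime_s=418.0, novelty=4,
                    relevance=5, error_severity=4, error_probability=0.3))
```

A/B testing of workflow variants then reduces to comparing the logged wisdom_score distributions of two versions of the same workflow.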

Transition To Incubator

At some point, such a research organization could move from just researching ideas to actually implementing or spinning out projects for increasing existential security and improving the long-term future. 

Once the organization decided it had sufficiently developed a map of the territory and of the most important cruxes and crucial considerations, so that it could predict important existential security factors and projects to support these factors with sufficiently high confidence, it could spin out those projects itself as an incubator, or pass them on to a separate incubator organization (see Charity Entrepreneurship for an example of this type of hybrid research/incubator organization, and also Rethink Priorities, which transitioned from research only to also supporting and spinning out projects).

This way, the most important ideas developed by the research organization would be sure to be put to good use and the wisdom would not go to waste. Thus the philosopher would be a king, so to speak, rather than confined to an armchair.

Other AW Designs

One more thing I really love about this idea is that anyone can start working on it right away. Even an early-stage independent researcher could start designing workflows to aid their research and set up a Google Doc to work on collaboratively with other researchers.

Incidentally, the above model is just one way workflows could be used to create artificial wisdom; for example, this could occur in a more decentralized way. My next post in the series, on how to create artificially wise human coaches, will include one basic design for doing so in a more decentralized, GitHub-style fashion. If you want to learn more about AW, see the full list of posts on the Artificial Wisdom series page.
