In this post we put all of the pieces we've discussed so far together into a network of epistemologies, where each node is a system that is able to reliably model the world despite differences in ontology, epistemology, and incentives.
The Epistemic Computational Graph
Trust relationships define a directed graph, describing how attestations by one system can influence the beliefs of another. Epistemology is also a lot of work, and it would be nice to be able to reuse all of the computational work that went into turning sensor data into a useful world model.
These considerations point to a computational graph, with attestations as nodes and computational attestations in particular as edges. The anchors of these graphs are attestations like "this data was received on this sensor channel" and "these are the axioms that I assume to be true." One part of the graph is generally concerned with building up counterfactual claims that can be independently verified. Another part is concerned with logical and computational claims, both for their own sake and for how they can be applied to solve important problems: scientific and mathematical knowledge, in formats designed to be easy for computers to analyze and apply.
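To make the graph structure concrete, here is a minimal sketch of what attestation nodes and computational edges might look like in Python. The class and field names (Attestation, ComputationalAttestation, parents, program) are illustrative assumptions, not a reference to any existing library.

```python
# A minimal sketch of an epistemic computational graph.
# Names and fields are illustrative assumptions, not an existing API.
from dataclasses import dataclass, field
from typing import List


@dataclass
class Attestation:
    """A signed claim by some system, e.g. raw sensor data or an assumed axiom."""
    issuer: str             # identity of the attesting system
    claim: str              # e.g. "this data was received on this sensor channel"
    signature: bytes = b""  # placeholder for a cryptographic signature


@dataclass
class ComputationalAttestation(Attestation):
    """An edge: a claim that `claim` follows from `parents` by running `program`."""
    parents: List[Attestation] = field(default_factory=list)
    program: str = ""       # identifier (e.g. hash) of the computation applied


# Anchors of the graph: a sensor reading and an assumed axiom.
sensor = Attestation(issuer="robot-7", claim="lidar frame 1042 received on channel 3")
axiom = Attestation(issuer="robot-7", claim="standard calibration axioms assumed")

# A derived node: a counterfactual claim computed from the anchors.
derived = ComputationalAttestation(
    issuer="robot-7",
    claim="if an object sits at bearing 30deg within 2m, this lidar returns < 2.1m",
    parents=[sensor, axiom],
    program="sha256:calibration-model-v1",
)
```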
Where there is noticeable controversy, bets on the results of experiments can be used as a credible signal of belief, as well as a source of funding for those very experiments. A track record of consistently being the best predictor of experimental outcomes, and a lack of any competing claims attracting noticeable bets on their predictive accuracy, is a credible signal that a given counterfactual claim is a good model of how part of reality works.
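The post leaves the exact bookkeeping open; as one concrete (assumed) choice, a proper scoring rule such as the log score can be used to compare the track records of competing predictors over a series of experiments:

```python
import math

# Sketch: comparing predictive track records with the log score, a proper
# scoring rule. Higher total score = better calibrated predictor.
# The probabilities and outcomes below are invented purely for illustration.

def log_score(predicted_prob: float, outcome: bool) -> float:
    """Log score for a binary experiment: log p if it happened, log(1 - p) if not."""
    p = predicted_prob if outcome else 1.0 - predicted_prob
    return math.log(max(p, 1e-12))  # clamp to avoid log(0)

# (predictor's probability that the experiment succeeds, actual result)
claims = {
    "model A": [(0.9, True), (0.8, True), (0.7, False)],
    "model B": [(0.6, True), (0.5, True), (0.4, False)],
}

for name, record in claims.items():
    total = sum(log_score(p, outcome) for p, outcome in record)
    print(f"{name}: total log score {total:.3f}")
# A consistently higher score over many experiments is the kind of track
# record the paragraph above describes.
```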
It might not always be possible to follow links backwards to the actual sensor data measured during an experiment, for example because the data is private. But a proof can still accompany a counterfactual claim, attesting that it is the result of applying a particular open-source algorithm to a private data set, along with a cryptographic commitment to which data set was analyzed. An independent auditor might separately sign an NDA and look at the data set, and attest that it appears legitimate and matches the published commitment. Scientific induction is the process of aggregating a bunch of factual claims that look like "I performed this experiment and this was the result" into a counterfactual claim "if you perform this experiment, this is the distribution over results I expect."
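As a sketch of what such a commitment might look like (assuming a simple hash-based commitment; a real deployment would likely pair it with a zero-knowledge proof system, which is beyond this snippet):

```python
import hashlib
import json

# Minimal sketch: commit to a private data set and publish an attestation
# binding (algorithm, data commitment, result). A plain hash stands in for a
# real commitment scheme; no zero-knowledge proof is produced here.

private_data = b"...private experimental measurements..."   # never published
algorithm_id = "sha256:open-source-analysis-v2.1"            # hash of the public code

data_commitment = hashlib.sha256(private_data).hexdigest()

# The experimenter runs the open-source algorithm on the private data...
result = {"effect_size": 0.42, "n": 1000}  # illustrative output

# ...and publishes an attestation that anyone can check against the code hash,
# and that an NDA-bound auditor can check against the actual data set.
attestation = {
    "claim": "running algorithm_id on the committed data set yields result",
    "algorithm": algorithm_id,
    "data_commitment": data_commitment,
    "result": result,
}
print(json.dumps(attestation, indent=2))
```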
For applications outside of math and science, we may want to define trust criteria and use insights from those fields as modules in our own systems. An electrical engineer doesn't need to concern themselves with the history of exactly how we came to know that copper is a conductor and silicon is a semiconductor. They just need a counterfactual model of things like "if I apply this much voltage, how much current will flow?"
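For instance, the engineer's counterfactual model could be packaged as a trivial module whose provenance links back into the scientific part of the graph; the numbers here are standard textbook values and the function name is just illustrative.

```python
# Sketch of a reusable counterfactual module: "if I apply this voltage,
# how much current will flow?" The engineer uses the model without needing
# the history of how its constants were established.

RESISTIVITY_OHM_M = {
    "copper": 1.68e-8,   # room-temperature resistivity
    "silicon": 2.3e3,    # approximate intrinsic silicon: vastly more resistive
}

def predicted_current(material: str, voltage: float,
                      length_m: float, area_m2: float) -> float:
    """Ohm's law: I = V / R, with R = resistivity * length / area."""
    resistance = RESISTIVITY_OHM_M[material] * length_m / area_m2
    return voltage / resistance

# "If I apply 5 V across 1 m of 1 mm^2 copper wire, how much current flows?"
print(predicted_current("copper", voltage=5.0, length_m=1.0, area_m2=1e-6))
```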
The outputs of one system can also be reliably plugged in as the input to another using this sort of epistemology. Redundant computational work is easy to notice in a computational graph, and two systems can straightforwardly determine that they care about the results of the same computation. They don't even need to reveal their individual computational graphs, if those are private; the overlap can be found without revealing any non-overlapping parts. So long as they both have permission to access the relevant information, they can distribute the computational work in a way that assures both that they have received the correct output.
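A toy version of finding that overlap is to content-address each node of a computational graph and compare identifiers. A production protocol would want something like private set intersection so that even the identifiers of non-shared nodes are never revealed; this sketch only illustrates the deduplication idea.

```python
import hashlib

# Toy sketch: two systems discover shared computations by comparing
# content-addressed identifiers of graph nodes. The graphs themselves
# stay private; only the identifiers would be exchanged (ideally via a
# private set intersection protocol rather than in the clear).

def node_id(program: str, inputs: tuple) -> str:
    """Content-address a computation by its program and inputs."""
    payload = repr((program, inputs)).encode()
    return hashlib.sha256(payload).hexdigest()

system_a = {node_id("fit-calibration-v1", ("lidar", 1042)),
            node_id("estimate-map-v3", ("region", 7))}
system_b = {node_id("estimate-map-v3", ("region", 7)),
            node_id("plan-route-v2", ("depot", "site-4"))}

shared = system_a & system_b
print(f"{len(shared)} shared computation(s); the work can be done once "
      "and the verified output reused by both systems.")
```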
A team of robots navigating the same space, for example, might collaboratively build up a map of their environment using attestations rather than each processing the sensor data from the entire team, even if they were made for different purposes, by different manufacturers, using different representations of the world. All that's needed are efficient ways of converting attestations in one ontology into model adjustments in another, and trust that the attestations accurately reflect what the world is really like.
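Concretely, that conversion step might look like a small adapter: one robot attests to an obstacle in its own coordinate frame, and another translates that attestation into an update of its own map representation. The frame offsets and grid resolution below are invented for illustration.

```python
# Sketch of converting an attestation in one robot's ontology (obstacle poses
# in its own frame) into a model adjustment in another's (an occupancy grid).
# The transform and resolution are illustrative assumptions.

OFFSET_X, OFFSET_Y = 3.0, -1.5   # known transform between the robots' frames
CELL_SIZE = 0.5                  # metres per occupancy-grid cell

def apply_obstacle_attestation(grid: dict, attested_x: float, attested_y: float) -> None:
    """Translate an attested obstacle position into the local grid and mark it occupied."""
    local_x = attested_x + OFFSET_X
    local_y = attested_y + OFFSET_Y
    cell = (int(local_x // CELL_SIZE), int(local_y // CELL_SIZE))
    grid[cell] = 1.0  # fully confident, since the attestation is trusted

my_map: dict = {}
# Attestation received from a teammate: "obstacle at (2.0, 4.0) in my frame".
apply_obstacle_attestation(my_map, 2.0, 4.0)
print(my_map)  # {(10, 5): 1.0}
```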
Distributed Strategic Epistemology
Tying it back to game theory, this should also be possible even when system designers have different ways they would like the world to be. The space of policies that a system can implement is exponentially large, and the space of joint policies that a collection of systems can implement adds another exponent on top of that. Even if we somehow had a way of recognizing good and safe joint policies when they are pointed out from among a superexponentially large space of options, it still seems like it will take a lot of computational work to find them, even assuming a lot of intelligence is being directed at making the problem as easy as possible. Still, we can avoid stacking yet another exponent on top of the total amount of work required by not making each policy-search system duplicate the same reasoning.
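To see where the exponents come from, here is the counting argument in miniature, with deliberately tiny made-up state and action counts; the point is only the shape of the growth.

```python
# Back-of-the-envelope counting of policy and joint-policy spaces.
num_states = 10      # distinct situations a system can find itself in
num_actions = 4      # choices available in each situation
num_systems = 5      # systems choosing policies jointly

policies_per_system = num_actions ** num_states       # |A|^|S|: exponential
joint_policies = policies_per_system ** num_systems   # (|A|^|S|)^n: another exponent

print(f"policies per system:   {policies_per_system:,}")   # 1,048,576
print(f"joint policy profiles: {joint_policies:,}")         # ~1.3e30
```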
The resulting models in each system's head may not all end up the same. The systems might not trust each other. They might have private information that they don't want to share. They might have capabilities they don't want others to know about. But building up common knowledge of the strategic landscape is still extremely helpful for reasoning about what systems should do. It prevents us from having to do hypergame theory and add yet another exponent on top of our already superexponential tower to keep track of all the different models that all the different systems might have for what is ultimately the same strategic landscape.
The vision of a team of scientist-explorer robots, collaboratively mapping the physical and counterfactual structure of their environment, is very appealing to me. That vision inspired the rest of my thinking about how to robustly implement such a system when those robots use different ontologies or epistemologies, or when their designers have incentives that aren't fully aligned with each other. And I'm optimistic that we can continue the trend of building systems that favor openness, honesty, and collaboration towards our common goals.