I like the problem this post poses. I think the "policy approach" is totally the wrong direction.
As I see it, the most-important-in-practice reason that the "epistemic approach" fails is that it treats a worldview as a single monolithic black box model of the whole world. As a general rule, worldviews in practice are "local" in the sense that they focus on a few particular constraints on how-the-world-works, and mostly don't make strong predictions about everything else. You can see this a lot in e.g. last year's MIRI discussions: a very common pattern is "my model strongly predicts X, but does not strongly predict anything about Y". That locality is the main key to integrating multiple worldviews: in most places, either neither model makes a strong claim, or only one model makes a strong claim. The places where the two conflict tend to be rare (more precisely, out-of-distribution on today's world), and require thinking about generalization properties of the underlying constraints on which each worldview focuses.
The "policy approach" goes in basically the opposite direction - rather than opening the black box, it treats the black box as even more impenetrable. It doesn't even try to probe the internal epistemic gears producing policy proposals.
(And this complaint carries over to the analogous alignment strategies and models of cognition mentioned at the end of the post.)
As a general rule, worldviews in practice are "local" in the sense that they focus on a few particular constraints on how-the-world-works, and mostly don't make strong predictions about everything else. ... That locality is the main key to integrating multiple worldviews
Interesting point. I think there are some models of the world for which this is true - e.g. models developed within specific academic disciplines. So you can merge economics with biology pretty well just by combining them locally. But I'm mainly focusing on worldviews which are broader, such that even if they don't make strong predictions about a given area, they still have background beliefs which clash strongly with other worldviews. E.g. an environmentalist meets a neoliberal: the neoliberal knows few details about the environment, the environmentalist knows few details about the economy, but each is confident that the actual details of reality are going to vindicate their own high-level worldview.
(In the MIRI discussions, this tends to look like Eliezer saying "my model doesn't have strong predictions about X or Y or Z, except that you can't find settings of all of these variables which solve the alignment problem". It's not that he's making predictions in different places, it's that he's making predictions at a different level of abstraction.)
the most-important-in-practice reason that the "epistemic approach" fails is that it treats a worldview as a single monolithic black box model of the whole world
So I totally agree that breaking down the black box is going to work better, when you can do it. The question is: given that you're strongly constrained on breaking-down-black-boxes, where should you spend that effort?
So, part of my argument here is that "limited in breaking down the black boxes" is the wrong way to view the limitation. It's limited attention, time, etc. That doesn't necessarily translate into a limitation in breaking down black boxes, especially if you have some general knowledge about how to usefully break down the relevant black boxes.
And that's where the "a few constraints" part comes in. Like, when Eliezer says "my model doesn't have strong predictions about X or Y or Z, except that you can't find settings of all of these variables which solve the alignment problem", that's actually a fairly simple constraint on its own. It's simpler and more useful than a black-box of Eliezer's predictions or policy suggestions. That's the sort of thing we want to extract.
when Eliezer says "my model doesn't have strong predictions about X or Y or Z, except that you can't find settings of all of these variables which solve the alignment problem", that's actually a fairly simple constraint on its own. It's simpler and more useful than a black-box of Eliezer's predictions or policy suggestions. That's the sort of thing we want to extract.
But when Paul (or most other alignment researchers) say "in fact you can find settings of those variables which solve the alignment problem", now we've got another high-level claim about the world which is inconsistent with the first one. So if your strategy for operating multiple worldviews is to build a new worldview by combining claims like these, then you'll hit a contradiction pretty quickly; and in this case it's highly nontrivial to figure out how to resolve that contradiction, or produce a sensible policy from a starting set of contradictory beliefs. Whereas if you calculate the separate policies first, it may well be the case that they're almost entirely consistent with each other.
(As an aside: I once heard someone give a definition of "rationalists" as "people who try to form a single coherent worldview". Feels relevant here.)
Let's walk through an example a bit more. Eliezer says something like "my model doesn't have strong predictions about X or Y, except that you can't have both X and Y at the same time, and that's what you'd need in order for alignment to be easy." Then e.g. Rohin comes along and says something like "my model is that Z generally solves most problems most of the time, therefore it will also solve alignment". And clearly these make incompatible predictions in the case of alignment specifically. But they're both models which say lots of stuff about lots of things other than alignment. Other than alignment, the models mostly make predictions about different things, so it's not easy to directly test them against each other.
The actual strategy I want here is to take both of those constraints and say "here's one constraint, here's another constraint, both of them seem to hold across a pretty broad range of situations but they're making opposite predictions about alignment difficulty, so one of them must not generalize to alignment for some reason". And if you don't currently understand which situations the two constraints do and do not generalize to, or where the loopholes are, then that's the correct epistemic state. It is correct to say "yup, there's two constraints here which both make lots of correct predictions in mostly-different places and one of them must be wrong in this case but I don't know which".
... and this is totally normal! Like, of course people have lots of different heuristics and model-fragments which mostly don't overlap but do occasionally contradict each other. That's fine, that's a very ordinary epistemic state for a human, we have lots of experience with such epistemic states. Humans have lots of intuitive practice with things like "agonize about which of those conflicting heuristics we trust more in this situation" or "look for a policy which satisfies both of these conflicting model-fragments" or "consider the loopholes in each of these heuristics; does one of them have loopholes more likely to apply to this problem?".
Of course we still try to make our worldview more coherent over time; a conflict is evidence that something in there is wrong. But throwing away information - whether by abandoning one constraint wholesale, or by black-boxing things in various ways - is not the way to do that. We resolve the contradiction by thinking more about the internal details of how and why and when each model works, not by thinking less about the internal details. And if we don't want to do the hard work of thinking about how and why and when each model works, then we shouldn't be trying to force a resolution to the contradiction. Without doing that work, the most-accurate epistemic state we can reach is a model in which we know there's a contradiction, we know there's something wrong, but we don't know what's wrong (and, importantly, we know that we don't know what's wrong). Forcing a resolution to the contradiction, without figuring out where the actual problem is, would make our epistemic state less correct/accurate, not more.
I'm curious what part of your comment you think I disagree with. I'm not arguing for "forcing a resolution", except insofar as you need to sometimes actually make decisions under worldview uncertainty. In fact, "forcing a resolution" by forming "all-things-considered" credences is the thing I'm arguing against in this post.
I also agree that humans have lots of experience weighing up contradictory heuristics and model-fragments. I think all the mechanisms you gave for how humans might do these are consistent with the thing I'm advocating. In particular, "choose which heuristics to apply" or "search for a policy consistent with different model-fragments" seem like basically what the policy approach would recommend (e.g. searching for a policy consistent with both the Yudkowsky model-fragment and the Christiano model-fragment). By contrast, I don't think this is an accurate description of the way most EAs currently think about epistemic deference, which is the concrete example I'm contrasting my approach against.
(My model here is that you see me as missing a mood, like I'm not being sufficiently anti-black-boxes. I also expect that my proposal sounds more extreme than it actually is, because for simplicity I'm focusing on the limiting case of having almost no ability to resolve disagreements between worldviews.)
Huh. Yeah, I got the impression from the post that you wanted to do something like replace "epistemic deference plus black-box tests of predictive accuracy" with "policy deference plus simulated negotiation". And I do indeed feel like a missing anti-black-box mood is the main important problem with epistemic deference. But it's plausible we're just in violent agreement here :).
I like the way you tie real-world advice to principles in ML and RL. In general I think there are a lot of risks to naively applying epistemic deference and worldview aggregation, and you articulate some of them really nicely here.
Something I've noticed with a few of your posts is that they often contain a lot of nuggets of ideas! And for you they seem to cohere into maybe a single high-level thought, but I sometimes want to pull them into smaller chunks[1]. For example, I imagine you (or others) might want to refer individually to the core idea in the paragraph beginning
However, even if in practice we end up mostly evaluating worldviews based on their epistemic track record, I claim that it’s still valuable to consider the epistemic track record as a proxy for the quality of their advice, rather than using it directly to evaluate how much we trust each worldview...
Now, the rest of the post gives this core idea context and support, but I think it stands on its own as well.
One compromise :D between putting lots of ideas together and splitting them apart too atomically could be to add meaningful sub-headings. (This also incidentally makes it easy to link out to the specific part of the text from another place via #-links.)
[1] Maybe we differ in the number of effective working-memory slots we have available (for what I mean, see https://www.sciencedaily.com/releases/2008/04/080402212855.htm, though see https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4159388/, which challenges this).
Just wanted to note that this comment was quite helpful for me, and has influenced other blog posts that I'm writing. Thanks!
Hi Richard,
I'm commenting because I liked this post, and so I tried to look for something useful to add. The argument below is that, without applying the same approach to our current worldview, we will probably fall prey to many of the traps the policy approach tries to avoid.
Moreover, since I want to apply it to my inner self, I am treating the conclusions I have already made as a separate worldview of their own.
I am loosely using Erik Erikson's eight stages of psychosocial development to construct examples, but the specific model isn't that important. The main idea is that during development - even from our genetic heritage - we make, or come with, a lot of presumptions. So we have already integrated multiple perspectives, but that integration may not have been very conscious, and therefore is possibly not the best approach going forward.
Here is a simplified example of what I am talking about:
Let's say you were afraid of snakes growing up. In adolescence you were in a culture that was very positive about snakes, and so, to avoid being seen as weak or different, you decided to force yourself to overcome the fear - and you chose to do so by staying around snakes until you didn't feel 'fear' anymore. As you noticed other, smaller 'fears' in yourself, which you believed were phobias, you kept using this approach - it made you feel strong, and you got positive social feedback.
Since you are evaluating 'track record' and advice, this might look good. But in this case, and probably many others, it is easy to fall victim to the illusion this inner part has created - advice like "it is good to face your fears/phobias until they are conquered". If we look at it closely, though, what is the logic under which this advice was created? It is: to fit in with my peers.
Now we might have invested a lot of energy into this advice and followed it for years - even urged others to do the same - and consequently built up a vested interest in it being true. Therefore, if you do not pay attention, you might give this worldview a default say whenever feelings related to fear come up. The worldview itself, however, is fundamentally interested in fitting in with one's peers; and for it to work, you do not need to know the inherent logic of the context it was made in. It would still be helpful, but rarely clear cut.
So, when you go through the process and come across things in your life you have already evaluated, you add them to the process without weighing them neutrally. This, in turn, disrupts strategy evaluation and decision-making, because it confuses cause and effect, and mistakes the illusion of interest for the true area of decision-interest.
This difference might lead to some subtle changes in the end result as well.
Here is a simple use of your model, adding the conclusions you already have as a separate worldview:
1. «Conquering my fears/phobias is good advice, and it has a good track record (ten fewer phobias, less stress, etc. so far).»
Even though this looks fine on the surface, I will have to tread carefully - even more so if I do not know the worldview it comes from. Do I have any other beliefs about fear/phobias that might be useful or legitimate, but got discarded along the way? Could they, given some time and effort, be better than or complementary to my current belief?
2. «Conquering fears is a part of my identity, and I feel strong and get positive reactions socially. Therefore I care about any and all situations where fear is involved.»
These two sentences look related, but does the first actually give us the second? There are many other options. Since it seems like I have a vested interest (part of my identity) in this view, I should be extra discerning. Furthermore, I might have suppressed other worldviews' viewpoints regarding fear, and will have to look closely for any other valid beliefs in that area.
The third point is the most interesting one, but also the most complex. The example I have used so far can be used as a part of the overall strategy.
Let's say you are planning on moving in with a significant other. One part of the equation is that you fear you will lose part of your autonomy.
Adding in all the former points, this is what it would look like:
A little sidenote:
*Conquering fear in this case can look very different. Since we are cheating by knowing the origin, we would try to fit in. So, instead of fighting for our autonomy, we might just get it over with and get used to having less - giving it up until we no longer fear it.*
So, since the relationship is important, this belief is loudly suggesting that you should use it: give up your autonomy until your fear is gone.
However, since you have isolated this as just one worldview’s belief, your search for a different/discarded view has yielded the following: «Allowing and expressing fear can foster a closer bond in close relationships.» You have found some evidence for this as well.
The conquering-fears strategy is interested in anything regarding fear, and since this is a new (and relational) fear, it makes clear that it wants to have the say.
The belief that allowing and expressing fear can foster a closer bond in close relationships also has an interest in the area of fear. Since the other belief has a vested interest, however, this new belief does not shout; it whispers. Since you are aware of this, you try to listen as best you can.
So, what to do now? That is not easy to answer, and beyond the scope of my comment.
Moreover, even with this extra precaution, we are usually blind to our blind spots - but at least you are trying to find some.
I also wanted to thank you for sharing this, and putting in all the work. It is a small node in a growing cluster of nodes concerned with Rationality and Epistemology.
Tl;dr: the problem of how to make decisions using multiple (potentially incompatible) worldviews (which I'll call the problem of meta-rationality) comes up in a range of contexts, such as epistemic deference. Applying a policy-oriented approach to meta-rationality, and evaluating worldviews by the quality of their advice, dissolves several undesirable consequences of the standard "epistemic" approach to deference.
Meta-rationality as the limiting case of separate worldviews
When thinking about the world, we’d ideally like to be able to integrate all our beliefs into a single coherent worldview, with clearly-demarcated uncertainties, and use that to make decisions. Unfortunately, in complex domains, this can be very difficult. Updating our beliefs about the world often looks less like filling in blank parts of our map, and more like finding a new worldview which reframes many of the things we previously believed. Uncertainty often looks less like a probability distribution over a given variable, and more like a clash between different worldviews which interpret the same observations in different ways.
By “worldviews” I include things like ideologies, scientific paradigms, moral theories, perspectives of individual people, and sets of heuristics. The key criterion is that each worldview has “opinions” about the world which can be determined without reference to any other worldview. Although of course different worldviews can have overlapping beliefs, in general their opinions can be incompatible with those of other worldviews - for example:
I think of “intelligence” as the core ability to develop and merge worldviews; and “rationality” as the ability to point intelligence in the most useful directions (i.e. taking into account where intelligence should be applied). Ideally we’d like to always be able to combine seemingly-incompatible worldviews into a single coherent perspective. But we usually face severe limitations on our ability to merge worldviews together (due to time constraints, cognitive limitations, or lack of information). I’ll call the skill of being able to deal with multiple incompatible worldviews, when your ability to combine them is extremely limited, meta-rationality. (Analogously, the ideal of emotional intelligence is to have integrated many different parts of yourself into a cohesive whole. But until you’ve done so, it’s important to have the skill of facilitating interactions between them. I won’t talk much about internal parts as an example of clashing worldviews throughout this post, but I think it’s a useful one to keep in mind.)
I don’t think there’s any sharp distinction between meta-rationality and rationality. But I do think meta-rationality is an interesting limiting case to investigate. The core idea I’ll defend in this post is that, when our ability to synthesize worldviews into a coherent whole is very limited, we should use each worldview to separately determine an overall policy for how to behave, and then combine those policies at a high level (for example by allocating a share of resources to each). I’ll call this the policy approach to meta-rationality; and I’ll argue that it prevents a number of problems (such as over-deference) which arise when using other approaches, particularly the epistemic approach of combining the credences of different worldviews directly.
Comparing the two approaches
Let’s consider one central example of meta-rationality: taking into account other people’s disagreements with us. In some simple cases, this is straightforward - if I vaguely remember a given statistic, but my friend has just looked it up and says I’m wrong, I should just defer to them on that point, and slot their correction into my existing worldview. But in some cases, other people have worldviews that clash with our own on large-scale questions, and we don’t know how to (or don’t have time to) merge them together without producing a frankenstein worldview with many internal inconsistencies.
How should we deal with this case, or other cases involving multiple inconsistent worldviews? The epistemic approach suggests:
This seems sensible, but leads to a few important problems:
The key problem which underlies these different issues is that the epistemic approach evaluates and merges the beliefs of different worldviews too early in the decision-making process, before the worldviews have used their beliefs to evaluate different possible strategies. By contrast, the policy approach involves:
One intuitive description of how this might occur is the parliamentary approach. Under this approach, each worldview is treated as a delegate in a parliament, with a number of votes proportional to how much weight is placed on that worldview; delegates can then spread their votes over possible policies, with the probability of a policy being chosen proportional to how many votes are cast for it.
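To make the mechanics concrete, here is a minimal sketch of such a parliamentary vote; the worldviews, candidate policies, weights, and vote allocations are all invented for illustration:

```python
import random

# Hypothetical worldviews and their weights (how much trust each gets), plus
# how each worldview spreads its votes over candidate policies. All names
# and numbers are made up for illustration.
worldview_weights = {"worldview_A": 0.6, "worldview_B": 0.4}
vote_allocations = {
    "worldview_A": {"pause_project": 0.8, "continue_carefully": 0.2},
    "worldview_B": {"continue_carefully": 0.7, "scale_up": 0.3},
}

# Total votes for each policy: worldview weight times its vote allocation.
policy_votes = {}
for wv, weight in worldview_weights.items():
    for policy, share in vote_allocations[wv].items():
        policy_votes[policy] = policy_votes.get(policy, 0.0) + weight * share

# The chosen policy is sampled with probability proportional to its votes.
policies = list(policy_votes)
chosen = random.choices(policies, weights=[policy_votes[p] for p in policies])[0]
print(policy_votes, "->", chosen)
```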
The policy approach largely solves the problems I identified previously:
I also think that the policy approach is much more compatible with good community dynamics than the epistemic approach. I’m worried about cycles where everyone defers to everyone else’s opinion, which is formed by deferring to everyone else’s opinion, and so on. Groupthink is already a common human tendency even in the absence of explicit epistemic-modesty-style arguments in favor of it. By contrast, the policy approach eschews calculating or talking about all-things-considered credences, which pushes people towards talking about (and further developing) their own worldviews, which has positive externalities for others who can now draw on more distinct worldviews to make their own decisions.
Problems with the policy approach
Having said all this, there are several salient problems with the policy approach; I’ll cover four, but argue that none of them are strong objections.
Firstly, although we have straightforward ways to combine credences on different claims, in general it can be much harder to combine different policies. For example, if two worldviews disagree on whether to go left or right (and both think it’s a very important decision) then whatever action is actually taken will seem very bad to at least one of them. However, I think this is mainly a problem in toy examples, and becomes much less important in the real world. In the real world, there are almost always many different strategies available to us, rather than just two binary options. This means that there’s likely a compromise policy which doesn’t differ too much from any given worldview’s policy on the issues it cares about most. Admittedly, it’s difficult to specify a formal algorithm for finding that compromise policy, but the idea of fairly compromising between different recommendations is one that most humans find intuitive to reason about. A simple example: if two policies disagree on many spending decisions, we can give each a share of our overall budget and let it use that money how it likes. Then each policy will be able to buy the things it cares about most: getting control over half the money is usually much more than half as valuable as getting control over all the money.
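As a toy sketch of that budget-split example (the worldviews, items, costs, and valuations are all invented), each worldview simply spends its own share on whatever it values most per dollar:

```python
# Toy sketch of the budget-split compromise: each worldview gets a share of
# the overall budget and spends it greedily on the items it values most per
# dollar. Items, costs, and valuations are hypothetical.
budget = 100.0
shares = {"worldview_A": 0.5, "worldview_B": 0.5}
preferences = {  # lists of (item, cost, value to that worldview)
    "worldview_A": [("project_1", 30, 10), ("project_2", 40, 8)],
    "worldview_B": [("project_3", 20, 9), ("project_4", 35, 7)],
}

for worldview, share in shares.items():
    remaining = budget * share
    purchases = []
    # Buy in order of value per dollar, skipping anything unaffordable.
    for item, cost, value in sorted(preferences[worldview],
                                    key=lambda x: x[2] / x[1], reverse=True):
        if cost <= remaining:
            purchases.append(item)
            remaining -= cost
    print(worldview, "buys", purchases, "with", remaining, "left over")
```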
Secondly, it may be significantly harder to produce a good estimate of the value of each worldview’s advice than the accuracy of each worldview’s predictions, because we tend to have much less data about how well personalized advice works out. For example, if a worldview tells us what to do in a dozen different domains, but we only end up entering one domain, it’s hard to evaluate the others. Whereas if a worldview makes predictions about a dozen different domains, it’s easier to evaluate all of them in hindsight. (This is analogous to how credit assignment is much harder in reinforcement learning than in supervised learning.)
However, even if in practice we end up mostly evaluating worldviews based on their epistemic track record, I claim that it’s still valuable to consider the epistemic track record as a proxy for the quality of their advice, rather than using it directly to evaluate how much we trust each worldview. For example: suppose that a worldview is systematically overconfident. Using a direct epistemic approach, this would be a big hit to its trustworthiness. However, the difference between being overconfident and being well-calibrated plausibly changes the worldview’s advice very little, e.g. because it doesn’t change that worldview’s relative ranking of options. Another example: predictions which many people disagree with can allow you to find valuable neglected opportunities, even if conventional wisdom is more often correct. So when we think of predictions as a proxy for advice quality, we should place much more weight on whether predictions were novel and directionally correct than whether they were precisely calibrated.
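A small numerical illustration of the overconfidence point, with made-up credences: recalibrating the worldview shrinks its probabilities but leaves its ranking of options - and hence its advice - unchanged.

```python
# Made-up credences: an overconfident worldview vs. a better-calibrated
# version of it. The rankings over options are identical, so the advice
# each gives (take the most promising option first) is the same.
overconfident = {"option_a": 0.95, "option_b": 0.60, "option_c": 0.10}
calibrated    = {"option_a": 0.70, "option_b": 0.55, "option_c": 0.30}

def ranking(credences):
    return sorted(credences, key=credences.get, reverse=True)

print(ranking(overconfident))                         # ['option_a', 'option_b', 'option_c']
print(ranking(overconfident) == ranking(calibrated))  # True: same ordering, same advice
```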
Thirdly, the policy approach as described thus far doesn’t allow worldviews to have more influence over some individuals than others - perhaps individuals who have skills that one worldview cares about far more than any other; or perhaps individuals in worlds where one worldview’s values can be fulfilled much more easily than others’. Intuitively speaking, we’d like worldviews to be able to get more influence in those cases, in exchange for having less influence in other cases. In the epistemic approach, this is addressed via variance normalization across many possible worlds - but as discussed above, this could be significantly affected by how you differentiate the possibilities (and also what your prior is over those worlds). I think the policy approach can deal with this in a more principled way: for any set of possible worlds (containing people who follow some set of worldviews) you can imagine the worldviews deciding on how much they care about different decisions by different people in different possible worlds before they know which world they’ll actually end up in. In this setup, worldviews will trade away influence over worlds they think are unlikely and people they think are unimportant, in exchange for influencing the people who will have a lot of influence over more likely worlds (a dynamic closely related to negotiable reinforcement learning).
This also allows us a natural interpretation of what we’re doing when we assign weights to worldviews: we’re trying to rederive the relative importance weights which worldviews would have put on the branch of reality we actually ended up in. However, the details of how one might construct this “updateless” original position are an open problem.
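Here is a very rough sketch of what rederiving those weights might look like under strong simplifying assumptions; the worlds, probabilities, and importance numbers are all invented, and a real negotiation would involve much richer trades than this:

```python
# Rough sketch (invented numbers): before knowing which world is real, each
# worldview spreads a fixed influence budget across possible worlds in
# proportion to (its probability for that world) * (how much it cares about
# influence there). Its weight in the realized world is whatever it staked
# on that world, renormalized across worldviews.
worlds = ["alignment_hard", "alignment_easy"]
beliefs = {
    "worldview_A": {"alignment_hard": 0.9, "alignment_easy": 0.1},
    "worldview_B": {"alignment_hard": 0.4, "alignment_easy": 0.6},
}
importance = {
    "worldview_A": {"alignment_hard": 1.0, "alignment_easy": 0.2},
    "worldview_B": {"alignment_hard": 0.5, "alignment_easy": 1.0},
}

def stakes(worldview):
    raw = {w: beliefs[worldview][w] * importance[worldview][w] for w in worlds}
    total = sum(raw.values())
    return {w: v / total for w, v in raw.items()}  # influence budget of 1.0

realized = "alignment_hard"  # the branch of reality we actually ended up in
raw_weights = {wv: stakes(wv)[realized] for wv in beliefs}
total = sum(raw_weights.values())
print({wv: round(w / total, 3) for wv, w in raw_weights.items()})
```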
One last objection: hasn’t this become far too complicated? “Reducing” the problem of epistemic deference to the problem of updateless multi-agent negotiation seems very much like a wrong-way reduction - in particular because in order to negotiate optimally, delegates will need to understand each other very well, which is precisely the work that the whole meta-rationality framing was attempting to avoid. (And given that they understand each other, they might try adversarial strategies like threatening other worldviews, or choosing which decisions to prioritize based on what they expect other worldviews to do.)
However, even if finding the optimal multi-agent bargaining solution is very complicated, the question that this post focuses on is how to act given severe constraints on our ability to compare and merge worldviews. So it’s consistent to believe that, if worldviews are unable to understand each other, they’ll do better by merging their policies than merging their beliefs. One reason to favor this idea is that multi-agent negotiation makes sense to humans on an intuitive level - which hasn’t proved to be true for other framings of epistemic modesty. So I expect this “reduction” to be pragmatically useful, especially when we’re focusing on simple negotiations over a handful of decisions (and given some intuitive notion of worldviews acting “in good faith”).
I also find this framing useful for thinking about the overall problem of understanding intelligence. Idealized models of cognition like Solomonoff induction and AIXI treat hypotheses (aka worldviews) as intrinsically distinct. By contrast, thinking of these as models of the limiting case where we have no ability to combine worldviews naturally points us towards the question of what models of intelligence which involve worldviews being merged might look like. This motivates me to keep a hopeful eye on various work on formal models of ideal cognition using partial hypotheses which could be merged together, like finite factored sets (see also the paper) and infra-bayesianism. I also note a high-level similarity between the approach I've advocated here and Stuart Armstrong's anthropic decision theory, which dissolves a number of anthropic puzzles via converting them to decision problems. The core insight in both cases is that confusion about how to form beliefs can arise from losing track of how those beliefs should relate to our decisions - a principle which may well help address other important problems.