Second-order rationality
Definition
By “second-order rationality” (and intelligence) I mean the study of rationally reasoning about other people’s rationality (and intelligence) in order to inform us about the world.
This is opposed to evaluating a proposition at face value, using the first-order evidence supporting it.
Second-order rationality is about updating your beliefs just from understanding the distribution of different beliefs or belief histories, possibly by grouping them across populations with different characteristics, without referring to any first-order evidence related to the nature of the belief.
I think it’s an area worth exploring more.
What’s your probability that this is a useful area to study? You can use your own operationalization. For this exercise to work, you should record your prediction before continuing to read this article. This will get used as an example of system rationality at the end of this post.
Edit to add: Use this link.
Note: I think I'm following the instructions from Embedded Interactive Predictions on LessWrong, but I don't know why the above forecasting widget doesn't seem to work; do you know? (Edit to add: See comment)
How to use this second-order knowledge?
I’m guessing many of you are already intuitively anxious about the Dangers of deference and trying to psychoanalyze people into knowing reality. Me too! That’s okay–we can use our first-order rationality to review the value of second-order rationality, theoretically and experimentally.
Ultimately, I think second-order rationality can help with finding the hypotheses which are most valuable to verify at the object-level but cannot allow us to defer all the way.
And we already defer to a very large extent, at least in terms of deciding what's worth learning (anyone here avoided all human interactions since birth in order to rederive everything by themselves?[1])–this is just meant to study that practice.
Factors that correlate with beliefs
Other beliefs
Obviously, beliefs will correlate with each other, often rationally so. For example, if you believe the pavement is wet, you’re more likely to believe it recently rained.
Values
For example, studying how values and beliefs correlate might help us correct for the just-world fallacy. Presumably, for rational agents, beliefs and values shouldn’t be correlated[2]. If they are, that’s an indication of irrationality.
As an example of such an indication, it seems like very few people[3] believe that (biological) immortality is feasible / likely / unavoidable, whether through a religious afterlife or a techno-optimist future, and yet still think this would be undesirable (and, conversely, there should also be a bias for thinking it’s desirable and impossible)[4]. Note this isn’t a perfect example, as the desirability of immortality also entangles epistemic questions about the world, which could rationally be correlated with one’s judgement of the feasibility of immortality.
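To make the footnoted independence claim concrete, here's a minimal sketch (my illustration, with made-up survey counts, not data from the post) of how one could test whether feasibility beliefs are distributed the same way among people who find immortality desirable and those who find it undesirable:

```python
# A minimal sketch of testing the independence claim from footnote [2]:
# if beliefs and values are uncorrelated, the feasible/infeasible split should be
# the same among people who find immortality desirable and those who find it undesirable.
from scipy.stats import chi2_contingency

# Hypothetical counts: rows = value judgement, columns = feasibility belief.
#                 feasible  infeasible
survey_counts = [[40,       10],   # desirable
                 [5,        45]]   # undesirable

chi2, p_value, dof, expected = chi2_contingency(survey_counts)
print(f"chi2 = {chi2:.1f}, p = {p_value:.4f}")
# A small p-value means belief and value are correlated in this (made-up) sample,
# which, on the post's assumption, would hint at wishful thinking or sour grapes.
```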
Pleasure
For example, it seems like if we start with 2 rational humans who have the same values and beliefs, except that 1 finds meat less tasty, then after reading an argument on the moral value of meat-composed beings, they should both update in the same way. If they don’t, that’s indicative of a bias.
Emotional state
Belief updates obviously affect emotional states, but it seems likely that the opposite also happens.
Intelligence / track-record
That one is kind of obvious–answering some questions correctly is predictive of answering other questions correctly. But there are still interesting questions, such as how one’s answering capabilities in one domain correlate with one’s answering capabilities in another domain. This can also be applied to forecasting specifically: how can we predict a forecaster’s performance based on their past performance?
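As a toy illustration of the forecasting case (my own sketch, with simulated data rather than anything from a cited source), one can score each forecaster's earlier questions with the Brier score and check how well that track record predicts their scores on later questions:

```python
# Toy sketch of "track record as a predictor": score each forecaster's first 20
# resolved questions with the Brier score and see how well that predicts their
# Brier score on the next 20. All data below are simulated.
import numpy as np

rng = np.random.default_rng(0)
n_forecasters, n_questions = 50, 40
noise_level = rng.uniform(0.05, 0.3, size=n_forecasters)      # hidden per-forecaster skill
outcomes = rng.integers(0, 2, size=n_questions)                # resolved 0/1 outcomes
# Each forecast is the true outcome plus forecaster-specific noise, clipped to [0, 1].
forecasts = np.clip(
    outcomes + rng.normal(0, noise_level[:, None], (n_forecasters, n_questions)), 0, 1
)

brier = (forecasts - outcomes) ** 2                            # per-question Brier scores
past, future = brier[:, :20].mean(axis=1), brier[:, 20:].mean(axis=1)
print("correlation between past and future Brier scores:", np.corrcoef(past, future)[0, 1])
```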
Meta-beliefs
Here are 2 examples.
In Against a General Factor of Doom from AI Impacts, Jeffrey Heninger starts with (emphasis added):
In On the belief that beliefs should change according to evidence: Implications for conspiratorial, moral, paranormal, political, religious, and science beliefs, it was found that (emphasis added):
Side note: Of course, that doesn’t say anything about the causality. One hypothesis would be that if you have lower raw cognitive capabilities, then it’s rational to rely more on traditions and instincts as you’re less likely to outperform those.
Personality
For example, although related to beliefs only indirectly (through intelligence), the abstract of Low Correlations between Intelligence and Big Five Personality Traits: Need to Broaden the Domain of Personality says:
Environment
All the above factors were internal: looking at how some part of one’s mind influences one’s beliefs.
Another lens is to look at how environmental factors influence beliefs. Of course, this will just be updating one’s beliefs through one of the internal paths above, but it still seems helpful to study this indirect influence.
For example, Robin Hanson says in a series of tweets [5] (emphasis added):
Inside the body, but external to the mind
We can also consider our blood composition as part of the environment (from the perspective of the mind). For example, blood sugar, hormones, and drugs seem to have an impact on our cognition, and possibly also on our fundamental values.
Genes
Genes are also an indirect cause rather than another internal process of one’s mind, but they can similarly correlate with rationality.
System rationality
Definition
Incentives for being rational will naturally correlate with being rational, by assumption.
This is qualitatively different from the factors mentioned above, as it is something done intentionally to elicit and evaluate beliefs, and to aggregate them into a “superorganism”. System rationality is a form of "intentional second-order rationality".
Examples
For example, in general, we should presumably update toward someone saying something true the more costly it would be for them to say something false. Here are some articles about eliciting true beliefs:
At the aggregated level, the prototypical example of this category is a prediction market. The idea is that you should update your beliefs simply based on people’s willingness to bet on their beliefs–everything else staying a black box.
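The post treats everything but the market price as a black box; one toy way to operationalize the resulting update (my sketch, with the market's weight as an arbitrary assumption) is to pool your prior with the market-implied probability in log-odds space:

```python
# Toy sketch: treat a prediction-market price as second-order evidence and pool it
# with your prior in log-odds space. The weight given to the market is an assumption.
import math

def logit(p: float) -> float:
    return math.log(p / (1 - p))

def pool_with_market(prior: float, market_price: float, market_weight: float = 2.0) -> float:
    """Weighted log-odds pooling of a personal prior with a market price in (0, 1)."""
    pooled = (logit(prior) + market_weight * logit(market_price)) / (1 + market_weight)
    return 1 / (1 + math.exp(-pooled))

print(pool_with_market(prior=0.30, market_price=0.70))  # ≈ 0.57: pulled toward the market
```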
Information consumption
Affecting beliefs
Epistemic status: This is an embryonic idea I have, which I find very appealing and which got me thinking about this post in the first place.
I would like to know, for people who were/are “like” me (operationalization TBD), how they end up updating their minds. Imagine we had some systematic way to keep track of our beliefs and the information we were consuming, and from that we could determine what information is most likely to update one’s beliefs. Something more systemic and personalized than “check the Facebook posts of people you remember having historically done that”. This might make our information consumption much more efficient. This might give us a deliberation elevator to rapidly climb up a deliberation ladder. I would particularly like having that when consuming information about AI alignment.
Affecting values
In a Facebook post, Kat Woods suggests a hypothesis for how the order in which we consume information might influence our values (initial emphasis added):
Simple experiment idea
Here’s an example of an experiment that could be run to study “Update flow”.
We have a group of people attend a day-long study of AI x-risks.
To track beliefs, we ask the participants to report their beliefs on N short-term forecasting questions related to AI x-risks, which will be scored using a proper scoring rule. The scoring rule also incentivizes participants to update their predictions as soon as they update their beliefs (one possible rule is sketched below).
Participants are free to read any article from a large library of articles. Once they pick an article, they must read it, report the link, and update their predictions (if applicable).
We have 2 groups:
Then we check how helpful that mechanism was.
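For concreteness, here's a minimal sketch (my illustration, not a design from the post) of one scoring rule with the property mentioned above: a time-averaged Brier score, where a stale prediction keeps costing you until you update it, so participants are rewarded for updating as soon as their beliefs change:

```python
# Minimal sketch of a time-averaged Brier score: each reported probability counts
# for as long as it was held, so updating late (or not at all) hurts your score.
def time_averaged_brier(predictions, horizon, outcome):
    """predictions: list of (time, probability) pairs sorted by time;
    horizon: time at which the question resolves; outcome: 0 or 1."""
    total = 0.0
    for (t, p), (t_next, _) in zip(predictions, predictions[1:] + [(horizon, None)]):
        total += (p - outcome) ** 2 * (t_next - t)    # Brier score held for (t_next - t)
    return total / (horizon - predictions[0][0])       # lower is better

# Updating from 0.3 to 0.8 halfway through a 24-hour question that resolves "yes":
print(time_averaged_brier([(0, 0.3), (12, 0.8)], horizon=24, outcome=1))  # 0.265
```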
That’s just a simple idea for an experiment with a lot of potential for variants.
(It's also the personal pain point I'm trying to solve: there's so much to read on that topic, and I wish I had a better idea of what to prioritize.)
Feature suggestion for LessWrong
This should also work for predictions that might never get a ground truth, or that aren’t well operationalized. You’re still incentivized to report your true predictions, because that’s how the system will know which articles to recommend to you and which other users to partner-match you with.
Hence, we can revisit our initial question:
This could become a common practice on LessWrong, and the resulting data could be used to predict how much a given article will make a given user update, based on how much it made other users update (e.g., weighted by how similar their historical updating behavior has been to the given user’s). This could inform users on what to read, as an additional metric alongside karma.
If some questions are frequently used across different articles, we could also create models to predict what you would believe "if you read all of LessWrong" or "if you read all articles with that question". I'm aware this idea needs fleshing out, but I figured I'd share a first version and, if reception seems good, possibly work on it more or let someone else do that.
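To make the similarity-weighting idea above a bit more concrete, here's a rough sketch (all data made up, and not an actual LessWrong feature) of predicting one user's update on a new article from other readers' updates, weighted by how similarly they updated on past articles:

```python
# Rough sketch: predict how much an article will move a user by averaging how much
# it moved other readers, weighted by cosine similarity of past updating behavior.
import numpy as np

def predict_update(past_updates: np.ndarray, article_updates: np.ndarray, user: int) -> float:
    """past_updates[u, a] = how much user u updated on past article a;
    article_updates[u] = update on the new article (NaN if u hasn't read it yet)."""
    readers = ~np.isnan(article_updates)
    readers[user] = False                                 # don't use the target user's own entry
    sims = past_updates[readers] @ past_updates[user]     # similarity to each reader
    sims /= (np.linalg.norm(past_updates[readers], axis=1)
             * np.linalg.norm(past_updates[user]) + 1e-9)
    weights = np.clip(sims, 0, None)                      # ignore anti-similar readers
    return float(np.average(article_updates[readers], weights=weights + 1e-9))

past = np.array([[0.2, -0.1, 0.4],    # target user's past updates
                 [0.3, -0.2, 0.5],    # a very similar reader
                 [-0.4, 0.1, -0.3]])  # a dissimilar reader
new = np.array([np.nan, 0.3, -0.2])   # target user 0 hasn't read the new article
print(predict_update(past, new, user=0))  # ≈ 0.3, dominated by the similar reader
```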
Further ideas
Maybe there’s the possibility for something more ambitious, like a browser extension that tracks what you read, LLMs used to aggregate the core ideas that make people update their beliefs, allowing for freeform reporting of belief updates, etc.
Annexes
Background
The goal of this post was to share the broad idea along with some examples I’ve collected since I first thought of this concept on 2020-09-20.
Another example
From Superforecasting: Summary and Review:
Related article
I had a section listing hypotheses on the advantages of not being more intelligent and rational from an evolutionary perspective, but that didn't seem central to this post, so I'll keep that for a potential future post.
I sometimes imagine that’s what I’d want in Utopia–but not fully what I want here 😅
The ratio of those who believe it’s feasible to those who believe it’s infeasible should be the same among those who believe it’s desirable and those who believe it’s undesirable, and vice versa.
As an exception, I know someone who was relieved when they stopped believing in religion because of zir fear of Hell.
That’s partly why in the book “Ending Aging”, Aubrey de Grey focuses on the feasibility of anti-aging and doesn’t discuss its desirability.
I think there are other plausible explanations. For example, maybe religion did play the role of trust, and modern society has now replaced this with something better. But I’m sharing that for the general line of thinking about how the environment correlates with beliefs.