You're pointing to good problems, but fuzzy truth values seem to approximately-totally fail to make any useful progress on them; fuzzy truth values are a step in the wrong direction.
Walking through various problems/examples from the post:
I would like to defend fuzzy logic at greater length, but I might not find the time. So, here is my sketch.
Like Richard, I am not defending fuzzy logic as exactly correct, but I am defending it as a step in the right direction.
As Richard noted, meaning is context-dependent. When I say "is there water in the fridge?" I am not merely referring to h2o; I am referring to something like a container of relatively pure water in easily drinkable form.
However, I claim: if we think of statements as being meaningful, we think these context-dependent meanings can in principle be rewritten into a language which lacks the context-independence.
In the language of information theory, the context-dependent language is what we send across the communication channel. The context-independent language is the internal sigma algebra used by the agents attempting to communicate.
You seem to have a similar picture:
...It is totally allowed for semantics of a proposition to be very dependent on context within that model - more precisely, there would be a context-free interpretation of the proposition in terms of latent variables, but the way those latents relate to the world would involve a lot o
Ty for the comment. I mostly disagree with it. Here's my attempt to restate the thrust of your argument:
The issues with binary truth-values raised in the post are all basically getting at the idea that the meaning of a proposition is context-dependent. But we can model context-dependence in a Bayesian way by referring to latent variables in the speaker's model of the world. Therefore we don't need fuzzy truth-values.
But this assumes that, given the speaker's probabilistic model, truth-values are binary. I don't see why this needs to be the case. Here's an example: suppose my non-transhumanist friend says "humanity will be extinct in 100 years". And I say "by 'extinct' do you include genetically engineered until future humans are a different species? How about being uploaded? How about all being cryonically frozen, to be revived later? How about...."
In this case, there is simply no fact of the matter about which of these possibilities should be included or excluded in the context of my friend's original claim, because (I'll assume) they hadn't considered any of those possibilities.
More prosaically, even if I have considered some possibilities in the past, at the time when I make a s...
But this assumes that, given the speaker's probabilistic model, truth-values are binary.
In some sense yes, but there is totally allowed to be irreducible uncertainty in the latents - i.e. given both the model and complete knowledge of everything in the physical world, there can still be uncertainty in the latents. And those latents can still be meaningful and predictively powerful. I think that sort of uncertainty does the sort of thing you're trying to achieve by introducing fuzzy truth values, without having to leave a Bayesian framework.
Let's look at this example:
suppose my non-transhumanist friend says "humanity will be extinct in 100 years". And I say "by 'extinct' do you include genetically engineered until future humans are a different species? How about being uploaded? How about all being cryonically frozen, to be revived later? How about...."
In this case, there is simply no fact of the matter about which of these possibilities should be included or excluded in the context of my friend's original claim...
Here's how that would be handled by a Bayesian mind:
That's still not a problem of fuzzy truth values, it's a problem of a fuzzy category boundaries. These are not the same thing.
The standard way to handle fuzzy category boundaries in a Bayesian framework is to treat semantic categories as clusters, and use standard Bayesian cluster models.
That's an important move to make, but it is also important to notice how radically context-dependent and vague our language is, to the point where you can't really eliminate the context-dependence and vagueness via taboo (because the new words you use will still be somewhat context-dependent and vague). Working against these problems is pragmatically useful, but recognizing their prevalence can be a part of that. Richard is arguing against foundational pictures which assume these problems away, and in favor of foundational pictures which recognize them.
Richard is arguing against foundational pictures which assume these problems away, and in favor of foundational pictures which recognize them.
I think you should handle the problems separately. In which case, when reasoning about truth, you should indeed assume away communication difficulties. If our communication technology was so bad that 30% of our words got dropped from every message, the solution would not be to change our concept of meanings; the solution would be to get better at error correction, ideally at a lower level, but if necessary by repeating ourselves and asking for clarification a lot.
You seem to be assuming that these issues arise only due to communication difficulties, but I'm not completely on board with that assumption. My argument is that these issues are fundamental to map-territory semantics (or, indeed, any concept of truth).
One argument for this is to note that the communicators don't necessarily have the information needed to resolve the ambiguity, even in principle, because we don't think in completely unambiguous concepts. We employ vague concepts like baldness, table, chair, etc. So it is not as if we have completely unambiguous pictures i...
It's a decent exploration of stuff, and ultimately says that it does work:
Language is not the problem, but it is the solution. How much trouble does the imprecision of language cause, in practice? Rarely enough to notice—so how come? We have many true beliefs about eggplant-sized phenomena, and we successfully express them in language—how?
These are aspects of reasonableness that we’ll explore in Part Two. The function of language is not to express absolute truths. Usually, it is to get practical work done in a particular context. Statements are interpreted in specific situations, relative to specific purposes. Rather than trying to specify the exact boundaries of all the variants of a category for all time, we deal with particular cases as they come up.
If the statement you're dealing with has no problematic ambiguities, then proceed. If it does have problematic ambiguities, then demand further specification (and highlighting and tabooing the ambiguous words is the classic way to do this) until you have what you need, and then proceed.
I'm not claiming that it's practical to pick terms that you can guarantee in advance will be unambiguous for all possible readers and all possib...
When I read this post I feel like I'm seeing four different strands bundled together:
1. Truth-of-beliefs as fuzzy or not
2. Models versus propositions
3. Bayesianism as not providing an account of how you generate new hypotheses/models
4. How people can (fail to) communicate with each other
I think you hit the nail on the head with (2) and am mostly sold on (4), but am sceptical of (1) - similar to what several others have said, it seems to me like these problems don't appear when your beliefs are about expected observations, and only appear when you start to invoke categories that you can't ground as clusters in a hierarchical model.
That leaves me with mixed feelings about (3):
- It definitely seems true and significant that you can get into a mess by communicating specific predictions relative to your own categories/definitions/contexts without making those sufficiently precise
- I am inclined to agree that this is a particularly important feature of why talking about AI/x-risk is hard
- It's not obvious to me that what you've said above actually justifies knightian uncertainty (as opposed to infrabayesianism or something), or the claim that you can't be confident about superintelligence (although it might be true for other reasons)
I find it surprising/confusing/confused/jarring that you speak of models-in-the-sense-of-mathematical-logic=:L-models as the same thing as (or as a precise version of) models-as-conceptions-of-situations=:C-models. To explain why these look to me like two pretty much entirely distinct meanings of the word 'model', let me start by giving some first brushes of a picture of C-models. When one employs a C-model, one likens a situation/object/etc of interest to a situation/object/etc that is already understood (perhaps a mathematical/abstract one), that one expects to be better able to work/play with. For example, when one has data about sun angles at a location throughout the day and one is tasked with figuring out the distance from that location to the north pole, one translates the question to a question about 3d space with a stationary point sun and a rotating sphere and an unknown point on the sphere and so on. (I'm not claiming a thinker is aware of making such a translation when they make it.) Employing a C-model making an analogy. From inside a thinker, the objects/situations on each side of the analogy look like... well, things/situations; from outside a thinker, bo...
One thing I don't understand / don't agree with here is the move from propositions to models. It seems to me that models can be (and usually are) understood in terms of propositions.
For example, Solomonoff understands models as computer programs which generate predictions. However, computer programs are constructed out of bits, which can be understood as propositions. The bits are not very meaningful in isolation; the claim "program-bit number 37 is a 1" has almost no meaning in the absence of further information about the other program bits. However, this isn't much of an issue for the formalism.
Similarly, I expect that any attempt to formally model "models" can be broken down into propositions. EG, if someone claimed that humans understand the world in terms of systems of differential equations, this would still be well-facilitated by a concept of propositions (ie, the equations).
It seems to me like a convincing abandonment of propositions would have to be quite radical, abandoning the idea of formalism entirely. This is because you'd have to explain why your way of thinking about models is not amenable to a mathematical treatment (since math is commonly understood in terms of propositions).
So (a) I'm not convinced that thinking in terms of propositions makes it difficult to think in terms of models; (b) it seems to me that refusing to think in terms of propositions would make it difficult to think in terms of models.
I am not well-read on this topic (or at-all read, really), but it struck me as bizarre that a post about epistemology would begin by discussing natural language. This seems to me like trying to grasp the most fundamental laws of physics by first observing the immune systems of birds and the turbulence around their wings.
The relationship between natural language and epistemology is more anthropological* that it is information-theoretical. It is possible to construct models that accurately represent features of the cosmos without making use of any language at all, and as you encounter in the "fuzzy logic" concept, human dependence on natural language is often an impediment to gaining accurate information.
Of course, natural language grants us many efficiencies that make it extremely useful in ancestral human contexts (as well as most modern ones). And given that we are humans, to perform error correction on our models, we have to model our own minds and the process of examination and modelling itself as part of the overall system we are examining and modelling. But the goal of that recursive modelling is to reduce the noise and error caused by the fuzziness of natural language and oth...
I think I agree with this post directionally.
You cannot apply Bayes' Theorem until you have a probability space; many real-world situations, especially the ones people argue about, do not have well-defined probability spaces, including a complete set of mutually exclusive and exhaustive possible events, which are agreed upon by all participants in the argument.
You will notice that, even on LessWrong, people almost never have Bayesian discussions where they literally apply Bayes' Rule. It would probably be healthy to try to literally do that more often! But making a serious attempt to debate a contentious issue "Bayesianly" typically looks more like Rootclaim's lab leak debate, which took a lot of setup labor and time, and where the result of quantifying the likelihoods was to reveal just how heavily your "posterior" conclusion depends on your "prior" assumptions, which were outside the scope of debate.
I think prediction markets are good, and I think Rootclaim-style quantified debates are worth doing occasionally, but what we do in most discussion isn't Bayesian and can't easily be made Bayesian.
I am not so sure about preferring models to propositions. I think what you'r...
tentative claim: there are models of the world, which make predictions, and there is "how true they are", which is the amount of noise you fudge the model with to get lowest loss (maybe KL?) in expectation.
E.g. "the grocery store is 500m away" corresponds to "my dist over the grocery store is centered at 500m, but has some amount of noise"
related to the claim that "all models are meta-models", in that they are objects capable of e.g evaluating how applicable they are for making a given prediction. E.g. "newtonian mechanics" also carries along with it information about how if things are moving too fast, you need to add more noise to its predictions, i.e. it's less true/applicable/etc.
Statements do often have ambiguities: there are a few different more-precise statements they could be interpreted to mean, and sometimes those more-precise statements have different truth values. But the solution is not to say that the ambiguous statement has an ambiguous truth value and therefore discard the idea of truth. The solution is to do your reasoning about the more-precise statements, and, if someone ever hands you ambiguous statements whose truth value is important, to say "Hey, please explain more precisely what you meant." Why would one do otherwise?
By the way:
colorless green ideas sleep furiously
There is a straightforward truth value here: there are no colorless green ideas, and therefore it is vacuously true that all of them sleep furiously.
I think things (minds, physical objects, social phenomena) should be characterized by computations that they could simulate/incarnate. The most straightforward example is a computer that holds a program, it could start running it. The program is not in any way fundamentally there, it's an abstraction of what the computer physically happens to be. And it still characterizes the computer even if it's not inevitable that it will start running, merely the possibility that it could start running is significant to the interactive behavior of the computer, the wa...
Curated. I really like that even though LessWrong is 1.5 decades old now and has Bayesianism assumed as background paradigm while people discuss everything else, nonetheless we can have good exploration of our fundamental epistemological beliefs.
The descriptions of unsolved problems, or at least incompleteness of Bayesianism strikes me as technically correct. Like others, I'm not convinced of Richard's favored approach, but it's interesting. In practice, I don't think these problems undermine the use of Bayesianism in typical LessWrong thought. For example...
Verbal statements often have context dependent or poorly defined truth value, but observations are pretty (not completely) solid. Since useful models eventually shake out into observations, the binary truth values tagging observations "propagate back" through probability theory to make useful statements about models. I am not convinced that we need a fuzzier framework - though I am interested in the philosophical justification for probability theory in the "unrealizable" case where no element of the hypothesis class is true. For instance, it seems that universal distributions mixture is over probabilistic models none of which should necessarily be assumed true, but rather only the widest class we can compute.
Yes, propositions are abstractions which don't exactly correspond to anything in our mind. But they do seem to have advantages: When communicating, we use sentences, which can be taken to express propositions. And we do seem to intuitively have propositional attitudes like "beliefs" (believing a proposition to be true) and "desires" (wanting a proposition to be true) in our mind. Which are expressible in sentences again. So propositions seem to be a quite natural abstraction. Treating them as being either true or false is a further simplification which wor...
A: Is there any water in the refrigerator?
B: Yes.
A: Where? I don’t see it.
B: In the cells of the eggplant.
The issue here is ambiguity between root-cause analysis (nobody has channeled a container for water to be among the objects currently in the refrigerator) vs reductionism (eggplants diminish into (among lots of things) water).
The problem with Bayesianism here is not that it uses binary rather than fuzzy truth-values (fuzzy truth-values don't really solve this, as you admit, though they're also not really incrementally closer to solving it), but rather ...
Can you help me tease out the difference between language being fuzzy and truth itself being fuzzy?
It's completely impractical to eliminate ambiguity in language, but for most scientific purposes, it seems possible to operationalize important statements into something precise enough to apply Bayesian reasoning to. This is indeed the hard part though. Bayes' theorem is just arithmetic layered on top of carefully crafted hypotheses.
The claim that the Earth is spherical is neither true nor false in general but usually does fall into a binary if we specify wha...
Is this a fair summary (from a sort of reverse direction)?
We start with questions like "Can GR and QM be unified?" where we sort of think we know what we mean based on a half-baked, human understanding both of the world and of logic. If we were logically omniscient we could expound a variety of models that would cash out this human-concept-space question more precisely, and within those models we could do precise reasoning - but it's ambiguous how our real world half-baked understanding actually corresponds to any given precise model.
I shy away from fuzzy logic because I used it as a formalism to justify my religious beliefs. (In particular, "Possibilistic Logic" allowed me to appear honest to myself—and I'm not sure how much of it was self-deception and how much was just being wrong.)
The critical moment in my deconversion came when I realized that if I was looking for truth, I should reason according to the probabilities of the statements I was evaluating. Thirty minutes later, I had gone from a convinced Christian speaking to others, leading in my local church, and basing my life and...
First of all Popper and Deutsch don't discard induction entirely. They just argue against induction as a source/foundation of knowledge.
Now one comment on the Bayesian endevour: As a laymen mathematician I have little authority in saying this, but isn't it obvious that a probability calculation fails if the absolute value is infinite?
I was there at the beginning of the lesswrong movement and this emphasis on probabilistic thinking is new to me. Bayesianism also is blacklisted under my personal philosophical firewall for being gameable by social engin...
You are over-simplifying Bayesian reasoning. Giving partial credence to propositions doesn't work; numerical values representing partial credence must be attached to the basic conjunctions.
For example, if the propositions are A, B, and C, the idea for coping with incomplete information that every-one has, is to come up with something like P(A)=0.2, P(B)=0.3, P(C)=0.4 This doesn't work.
One has to work with the conjunctions and come up with something like
P(A and B and C) = 0.1
P(A and B and not C) = 0.1
P(A and not B and C) = 0.1
P(A and not B and n...
Natural languages, by contrast, can refer to vague concepts which don’t have clear, fixed boundaries
I disagree. I think it's merely the space is so large that it's hard to pin down where the boundary is. However, language does define natural boundaries (that are slightly different for each person and language, and shift over time). E.g., see "Efficient compression in color naming and its evolution" by Zaslavsky et al.
without a principled distinction between credences that are derived from deep, rigorous models of the world, and credences that come from vague speculation
Double counting issues here as well, in communities.
Your article is a great read!
In my view, we can categorize scientists into two broad types: technician scientists, who focus on refining and perfecting existing theories, and creative scientists, who make generational leaps forward with groundbreaking ideas. No theory is ever 100% correct—each is simply an attempt to better explain a phenomenon in a way that’s useful to us.
Take Newton, for example. His theory of gravity was revolutionary, introducing concepts no one had thought of before—it was a generational achievement. But then Einstein came along...
My attempt at a TLDR for this: Bayesian assign a probability to each belief in order to represent uncertainty but this is insufficient because there are multiple kinds of uncertainty: vagueness, approximation, context-dependence, and sense vs nonsense, knightian. And sometimes we humans make logical errors when we try to do bayesian inference.
This post focuses on philosophical objections to Bayesianism as an epistemology. I first explain Bayesianism and some standard objections to it, then lay out my two main objections (inspired by ideas in philosophy of science). A follow-up post will speculate about how to formalize an alternative.
Degrees of belief
The core idea of Bayesian epistemology: we should ideally reason by assigning credences to propositions which represent our degrees of belief that those propositions are true. (Note that this is different from Bayesianism as a set of statistical techniques, or Bayesianism as an approach to machine learning, which I don’t discuss here.)
If that seems like a sufficient characterization to you, you can go ahead and skip to the next section, where I explain my objections to it. But for those who want a more precise description of Bayesianism, and some existing objections to it, I’ll more specifically characterize it in terms of five subclaims. Bayesianism says that we should ideally reason in terms of:
I won’t go into the case for Bayesianism here except to say that it does elegantly formalize many common-sense intuitions. Bayes’ rule follows directly from a straightforward Venn diagram. The axioms of probability are powerful and mathematically satisfying. Subjective credences seem like the obvious way to represent our uncertainty about the world. Nevertheless, there are a wide range of alternatives to Bayesianism, each branching off from the claims listed above at different points:
It’s not crucial whether we classify Garrabrant induction and radical probabilism as variants of Bayesianism or alternatives to it, because my main objection to Bayesianism doesn’t fall into any of the above categories. Instead, I think we need to go back to basics and reject #1. Specifically, I have two objections to the idea that idealized reasoning should be understood in terms of propositions that are true or false:
I’ll defend each claim in turn.
Degrees of truth
Formal languages (like code) are only able to express ideas that can be pinned down precisely. Natural languages, by contrast, can refer to vague concepts which don’t have clear, fixed boundaries. For example, the truth-values of propositions which contain gradable adjectives like “large” or “quiet” or “happy” depend on how we interpret those adjectives. Intuitively speaking, a description of something as “large” can be more or less true depending on how large it actually is. The most common way to formulate this spectrum is as “fuzzy” truth-values which range from 0 to 1. A value close to 1 would be assigned to claims that are clearly true, and a value close to 0 would be assigned to claims that are clearly false, with claims that are “kinda true” in the middle.
Another type of “kinda true” statements are approximations. For example, if I claim that there’s a grocery store 500 meters away from my house, that’s probably true in an approximate sense, but false in a precise sense. But once we start distinguishing the different senses that a concept can have, it becomes clear that basically any concept can have widely divergent category boundaries depending on the context. A striking example from Chapman:
The claim that there’s water in the refrigerator is technically true, but pragmatically false. And the concept of “water” is far better-defined than almost all abstract concepts (like the ones I’m using in this post). So we should treat natural-language propositions as context-dependent by default. But that’s still consistent with some statements being more context-dependent than others (e.g. the claim that there’s air in my refrigerator would be true under almost any interpretation). So another way we can think about fuzzy truth-values is as a range from “this statement is false in almost any sense” through “this statement is true in some senses and false in some senses” to “this statement is true in almost any sense”.
Note, however, that there’s an asymmetry between “this statement is true in almost any sense” and “this statement is false in almost any sense”, because the latter can apply to two different types of claims. Firstly, claims that are meaningful but false (“there’s a tiger in my house”). Secondly, claims that are nonsense—there are just no meaningful interpretations of them at all (“colorless green ideas sleep furiously”). We can often distinguish these two types of claims by negating them: “there isn’t a tiger in my house” is true, whereas “colorless green ideas don’t sleep furiously” is still nonsense. Of course, nonsense is also a matter of degree—e.g. metaphors are by default less meaningful than concrete claims, but still not entirely nonsense.
So I've motivated fuzzy truth-values from four different angles: vagueness, approximation, context-dependence, and sense vs nonsense. The key idea behind each of them is that concepts have fluid and amorphous category boundaries (a property called nebulosity). However, putting all of these different aspects of nebulosity on the same zero-to-one scale might be an oversimplification. More generally, fuzzy logic has few of the appealing properties of classical logic, and (to my knowledge) isn’t very directly useful. So I’m not claiming that we should adopt fuzzy logic wholesale, or that we know what it means for a given proposition to be X% true instead of Y% true (a question which I’ll come back to in a follow-up post). For now, I’m just claiming that there’s an important sense in which thinking in terms of fuzzy truth-values is less wrong (another non-binary truth-value) than only thinking in terms of binary truth-values.
Model-based reasoning
The intuitions in favor of fuzzy truth-values become clearer when we apply them, not just to individual propositions, but to models of the world. By a model I mean a (mathematical) structure that attempts to describe some aspect of reality. For example, a model of the weather might have variables representing temperature, pressure, and humidity at different locations, and a procedure for updating them over time. A model of a chemical reaction might have variables representing the starting concentrations of different reactants, and a method for determining the equilibrium concentrations. Or, more simply, a model of the Earth might just be a sphere.
In order to pin down the difference between reasoning about propositions and reasoning about models, philosophers of science have drawn on concepts from mathematical logic. They distinguish between the syntactic content of a theory (the axioms of the theory) and its semantic content (the models for which those axioms hold). As an example, consider the three axioms of projective planes:
There are infinitely many models for which these axioms hold; here’s one of the simplest:
If propositions and models are two sides of the same coin, does it matter which one we primarily reason in terms of? I think so, for two reasons. Firstly, most models are very difficult to put into propositional form. We each have implicit mental models of our friends’ personalities, of how liquids flow, of what a given object feels like, etc, which are far richer than we can express propositionally. The same is true even for many formal models—specifically those whose internal structure doesn’t directly correspond to the structure of the world. For example, a neural network might encode a great deal of real-world knowledge, but even full access to the weights doesn’t allow us to extract that knowledge directly—the fact that a given weight is 0.3 doesn’t allow us to claim that any real-world entity has the value 0.3.
What about scientific models where each element of the model is intended to correspond to an aspect of reality? For example, what’s the difference between modeling the Earth as a sphere, and just believing the proposition “the Earth is a sphere”? My answer: thinking in terms of propositions (known in philosophy of science as the syntactic view) biases us towards assigning truth values in a reductionist way. This works when you’re using binary truth-values, because they relate to each other according to classical logic. But when you’re using fuzzy truth-values, the relationships between the truth-values of different propositions become much more complicated. And so thinking in terms of models (known as the semantic view) is better because models can be assigned truth-values in a holistic way.
As an example: “the Earth is a sphere” is mostly true, and “every point on the surface of a sphere is equally far away from its center” is precisely true. But “every point on the surface of the Earth is equally far away from the Earth’s center” seems ridiculous—e.g. it implies that mountains don’t exist. The problem here is that rephrasing a proposition in logically equivalent terms can dramatically affect its implicit context, and therefore the degree of truth we assign to it in isolation.
The semantic view solves this by separating claims about the structure of the model itself from claims about how the model relates to the world. The former are typically much less nebulous—claims like “in the spherical model of the Earth, every point on the Earth’s surface is equally far away from the center” are straightforwardly true. But we can then bring in nebulosity when talking about the model as a whole—e.g. “my spherical model of the Earth is closer to the truth than your flat model of the Earth”, or “my spherical model of the Earth is useful for doing astronomical calculations and terrible for figuring out where to go skiing”. (Note that we can make similar claims about the mental models, neural networks, etc, discussed above.)
We might then wonder: should we be talking about the truth of entire models at all? Or can we just talk about their usefulness in different contexts, without the concept of truth? This is the major debate in philosophy of science. I personally think that in order to explain why scientific theories can often predict a wide range of different phenomena, we need to make claims about how well they describe the structure of reality—i.e. how true they are. But we should still use degrees of truth when doing so, because even our most powerful scientific models aren’t fully true. We know that general relativity isn’t fully true, for example, because it conflicts with quantum mechanics. Even so, it would be absurd to call general relativity false, because it clearly describes a major part of the structure of physical reality. Meanwhile Newtonian mechanics is further away from the truth than general relativity, but still much closer to the truth than Aristotelian mechanics, which in turn is much closer to the truth than animism. The general point I’m trying to illustrate here was expressed pithily by Asimov: “Thinking that the Earth is flat is wrong. Thinking that the Earth is a sphere is wrong. But if you think that they’re equally wrong, you’re wronger than both of them put together.”
The correct role of Bayesianism
The position I’ve described above overlaps significantly with the structural realist position in philosophy of science. However, structural realism is usually viewed as a stance on how to interpret scientific theories, rather than how to reason more generally. So the philosophical position which best captures the ideas I’ve laid out is probably Karl Popper’s critical rationalism. Popper was actually the first to try to formally define a scientific theory's degree of truth (though he was working before the semantic view became widespread, and therefore formalized theories in terms of propositions rather than in terms of models). But his attempt failed on a technical level; and no attempt since then has gained widespread acceptance. Meanwhile, the field of machine learning evaluates models by their loss, which can be formally defined—but the loss of a model is heavily dependent on the data distribution on which it’s evaluated. Perhaps the most promising approach to assigning fuzzy truth-values comes from Garrabrant induction, where the “money” earned by individual traders could be interpreted as a metric of fuzzy truth. However, these traders can strategically interact with each other, making them more like agents than typical models.
Where does this leave us? We’ve traded the crisp, mathematically elegant Bayesian formalism for fuzzy truth-values that, while intuitively compelling, we can’t define even in principle. But I’d rather be vaguely right than precisely wrong. Because it focuses on propositions which are each (almost entirely) true or false, Bayesianism is actively misleading in domains where reasoning well requires constructing and evaluating sophisticated models (i.e. most of them).
For example, Bayesians measure evidence in “bits”, where one bit of evidence rules out half of the space of possibilities. When asking a question like “is this stranger named Mark?”, bits of evidence are a useful abstraction: I can get one bit of evidence simply by learning whether they’re male or female, and a couple more by learning that their name has only one syllable. Conversely, talking in Bayesian terms about discovering scientific theories is nonsense. If every PhD in fundamental physics had contributed even one bit of usable evidence about how to unify quantum physics and general relativity, we’d have solved quantum gravity many times over by now. But we haven’t, because almost all of the work of science is in constructing sophisticated models, which Bayesianism says almost nothing about. (Formalisms like Solomonoff induction attempt to sidestep this omission by enumerating and simulating all computable models, but that’s so different from what any realistic agent can do that we should think of it less as idealized cognition and more as a different thing altogether, which just happens to converge to the same outcome in the infinite limit.)
Mistakes like these have many downstream consequences. Nobody should be very confident about complex domains that nobody has sophisticated models of (like superintelligence); but the idea that “strong evidence is common” helps justify confident claims about them. And without a principled distinction between credences that are derived from deep, rigorous models of the world, and credences that come from vague speculation (and are therefore subject to huge Knightian uncertainty), it’s hard for public discussions to actually make progress.
Should I therefore be a critical rationalist? I do think Popper got a lot of things right. But I also get the sense that he (along with Deutsch, his most prominent advocate) throws the baby out with the bathwater. There is a great deal of insight encoded in Bayesianism which critical rationalists discard (e.g. by rejecting induction). A better approach is to view Bayesianism as describing a special case of epistemology, which applies in contexts simple enough that we’ve already constructed all relevant models or hypotheses, exactly one of which is exactly true, and we just need to decide between them. Interpreted in that limited way, Bayesianism is both useful (e.g. in providing a framework for bets and prediction markets) and inspiring: if we can formalize this special case so well, couldn’t we also formalize the general case? What would it look like to concretely define degrees of truth? I don’t have a solution, but I’ll outline some existing attempts, and play around with some ideas of my own, in a follow-up post.