i.e. if each forecaster w∈W has a first-order belief f(w)∈B(S), and 𝐰∈B(S) is your second-order belief about which forecaster is correct, then (𝐰⊳f)∈B(S) should be your first-order belief about the election.
I think there might be a typo here. Did you instead mean to write "𝐰∈B(W)" for the second-order belief about the forecasters?
Kosoy's infrabayesian monad is given by
There are a few different varieties of infrabayesian belief-state, but I currently favour the one which is called "homogeneous ultracontributions", which is "non-empty topologically-closed ⊥–closed convex sets of subdistributions", thus almost exactly the same as Mio-Sarkis-Vignudelli's "non-empty finitely-generated ⊥–closed convex sets of subdistributions monad" (Definition 36 of this paper), with the difference being essentially that it's presentable, but it's much more like than .
I am not at all convinced by the interpretation of here as terminating a game with a reward for the adversary or the agent. My interpretation of the distinguished element in is not that it represents a special state in which the game is over, but rather a special state in which there is a contradiction between some of one's assumptions/observations. This is very useful for modelling Bayesian updates (Evidential Decision Theory via Partial Markov Categories, sections 3.5-3.6), in which some variable is observed to satisfy a certain predicate : this can be modelled by applying the predicate in the form where means the predicate is false, and means it is true. But I don't think there is a dual to logical inconsistency, other than the full set of all possible subdistributions on the state space. It is certainly not the same type of "failure" as losing a game.
For the sake of potential readers, a (full) distribution over S is some p:S→[0,1] with finite support and ∑s∈Sp(s)=1, whereas a subdistribution over S is some p:S→[0,1] with finite support and ∑s∈Sp(s)≤1. Note that a subdistribution over S is equivalent to a full distribution over S+1, where S+1 is the disjoint union of S with some additional element, so the subdistribution monad can be written Δ((−)+1).
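This equivalence can be sanity-checked with a short sketch (the dict encoding, and the use of None for the extra element of S+1, are my own choices for illustration):

```python
# Sketch: a subdistribution over S, encoded as a dict with total mass <= 1,
# is equivalent to a full distribution over S + 1, where the extra element
# (encoded here as None, an arbitrary choice) absorbs the missing mass.

def to_full(sub):
    """Extend a subdistribution over S to a full distribution over S + 1."""
    missing = 1 - sum(sub.values())
    return {**sub, None: missing}

def to_sub(full):
    """Restrict a full distribution over S + 1 back to a subdistribution over S."""
    return {s: p for s, p in full.items() if s is not None}

sub = {"heads": 0.5, "tails": 0.3}            # total mass 0.8 <= 1
full = to_full(sub)
assert abs(sum(full.values()) - 1) < 1e-12    # now a full distribution
assert to_sub(full) == sub                    # round-trip recovers the subdistribution
```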
I am not at all convinced by the interpretation of here as terminating a game with a reward for the adversary or the agent. My interpretation of the distinguished element in is not that it represents a special state in which the game is over, but rather a special state in which there is a contradiction between some of one's assumptions/observations.
Doesn't the Nirvana Trick basically say that these two interpretations are equivalent?
Let be and let be . We can interpret as possibility, as a hypothesis consistent with no observations, and as a hypothesis consistent with all observations.
Alternatively, we can interpret as the free choice made by an adversary, as "the game terminates and our agent receives minimal disutility", and as "the game terminates and our agent receives maximal disutility". These two interpretations are algebraically equivalent, i.e. is a topped and bottomed semilattice.
Unless I'm mistaken, both and demand that the agent may have the hypothesis "I am certain that I will receive minimal disutility", which is necessary for the Nirvana Trick. But also demands that the agent may have the hypothesis "I am certain that I will receive maximal disutility". The first gives bounded infrabayesian monad and the second gives unbounded infrabayesian monad. Note that Diffractor uses in Infra-Miscellanea Section 2.
I agree that each of and has two algebraically equivalent interpretations, as you say, where one is about inconsistency and the other is about inferiority for the adversary. (I hadn’t noticed that).
The variant still seems somewhat irregular to me; even though Diffractor does use it in Infra-Miscellanea Section 2, I wouldn’t select it as “the” infrabayesian monad. I’m also confused about which one you’re calling unbounded. It seems to me like the variant is bounded (on both sides) whereas the variant is bounded on one side, and neither is really unbounded. (Being bounded on at least one side is of course necessary for being consistent with infinite ethics.)
Does this article have any practical significance, or is it all just abstract nonsense? How does this help us solve the Big Problem? To be perfectly frank, I have no idea. Timelines are probably too short for agent foundations, and this article is maybe agent foundations foundations...
I do think this is highly practically relevant, not least because using an infrabayesian monad instead of the distribution monad can provide the kind of epistemic conservatism needed for practical safety verification in complex cyber-physical systems, like the biosphere being protected and the cybersphere being monitored. It also helps remove instrumentally convergent perverse incentives to control everything.
Acknowledgements:
This research began during the SERI MATS program, under the joint mentorship of John Wentworth, Nicholas Kees, and Janus. Thanks also to Davidad, Jack Sagar, and David Jaz Myers for discussion.
Abstract:
I think that there is a uniform correspondence between flavours of uncertainty and monads taking state-spaces to belief-state-spaces, for different characterisations of belief. In this essay, I describe this correspondence explicitly and list 15 diverse and well-motivated examples. I explore some applications to model-building and agent foundations. Along the way, I characterise infrabayesian uncertainty as the minimal way to encompass possibilistic uncertainty, probabilistic uncertainty, and reward.
No prerequisites are required beyond a high-school familiarity with sets, functions, real numbers, etc. Feedback welcome.
Introduction
Suppose I'm facing the following problem. There's an upcoming election between n candidates, and you're uncertain who will win. How can I model both your belief about the election and the election itself in a coherent way? By "belief" here, I mean your epistemic attitude, your internal model, your opinion, judgement, prediction, etc, etc. Think map-territory distinction: the election is the territory, your belief is the map, and I need to model both the map and the territory coherently despite the fact that the map and the territory are (typically speaking) two completely different types of thing.
Well, to model the election itself, I'll use a set S={s1,s2,s3,…sn} with an element for each electoral candidate. To represent your belief about the election, I must find another set B(S) with an element for each belief that you might have about the election. I'll call S the state space and B(S) the belief-state space. A solution to our problem is given by a mathematical operator B sending each state-space S to the matching belief-state space B(S).
One may feel prompted to ask: does any operator B suffice here? Can the belief-state space be anything whatsoever, or must it carry some extra structure, possibly satisfying some additional constraints? Or, stated more philosophically, can any territory serve as a map for any other? I say no. Roughly speaking, the operator B must be a so-called monad, which will be the central object of this essay. But more on that later.
The first thing to note is that the appropriate operator B will depend on how exactly I wish to characterise a "belief" about the election, and there are multiple options here. For example, I might choose to characterise your belief by the set of candidates that you think have a possibility of winning. In this case, B(S):=P+(S), denoting the set of non-empty subsets of S. Alternatively, I might choose to characterise your belief by the likelihood that you give each candidate. In this case, B(S):=Δ(S), denoting the set of finite-support probability distributions over S, i.e. functions p:S→[0,1] such that {s∈S:p(s)≠0} is finite and ∑s∈Sp(s)=1.
In the first option, I'm characterising your belief-state by your possibilistic uncertainty, often encountered in doxastic or epistemic logic. In the second option, I'm characterising your belief-state by your probabilistic uncertainty, which is a finer-grained characterisation of belief because it differentiates between e.g. thinking a coin is fair and thinking a coin is slightly biased.
The second option has its merits. Indeed, many readers will instinctively reach for Δ as soon as they hear the word "uncertainty", and this instinct would serve them well. There's been a fruitful enterprise (in philosophy, mathematics, computer science, linguistics, etc) of replacing possibilistic uncertainty with probabilistic uncertainty in any model or concept where one finds it. But I want to note that both P+ and Δ would count as a solution to the problem. I'll return to these two examples throughout this essay because they are the flavours of uncertainty which will be most familiar to the reader.
As we will see, these two operators, P+ and Δ, are both monads. The central claim of this essay is that there is a uniform correspondence between flavours of uncertainty and monads. By "flavour of uncertainty" I mean a particular way of characterising someone's potentially uncertain belief about something. Possibilistic and probabilistic are paradigm cases, but in this essay we'll meet fifteen examples.
The forward-implication of this claim, that every flavour of uncertainty is a monad, is perhaps uncontroversial in some circles.[1] The backwards-implication, that every monad is a flavour of uncertainty, is worthy of more scepticism.
In this essay —
Don't worry if you don't yet know what monads are. By the end of this essay you'll understand them as well as I do, which is enough to nod along when you hear "monad this" and "monad that".
The correspondence explicitly.
What's a flavour of uncertainty?
Recall from the introduction that I'm tasked with representing or modelling both the election itself and your belief about the election. The first step of this task is to settle on a particular flavour of uncertainty to characterise the belief-states — possibilistic, probabilistic, infrabayesian, etc. One might ask, of this flavour of uncertainty, the following four questions —
What counts as a distinct belief about the election? Concretely, if there are n electoral candidates then how many distinct belief-states are there?
If you're certain that a particular candidate will win the election (and I know which candidate) then how should I determine your belief-state?
Suppose a number of forecasters are speculating on the election. If I'm given the belief of each forecaster about the election, and I'm given your belief about the forecasters' beliefs, then how should I determine your belief about the election itself?
Suppose there are two completely unrelated elections happening somewhere. If I'm given your belief about the first election, and your belief about the second election, then how should I determine your belief about the pair of elections?
These four questions — Count? Certainty? Collapse? Combine? — are essentially epistemological questions, and they collectively pin down what I mean by a flavour of uncertainty.[2] As we will see, a monad corresponds to answers to the first three questions and a commutative monad corresponds to answers to all four questions.
Exercise 1: How would you answer these questions for possibilistic uncertainty? Or for probabilistic uncertainty?
Exercise 2: As I mentioned before, an answer to Count? is a set B(S) for each set S. What about for Certainty? Collapse? and Combine?
What's a (commutative) monad?
Monads were born of category theory — a field of mathematics which many regard as arcane, mystical, or downright kabbalistic — but monads can (I think) be understood by someone lacking any acquaintance with category theory whatsoever. Indeed, my claim in this essay is that monads correspond exactly to Map-Territory-like relations, and such relations will be familiar to anyone who's both got a brain and pondered this predicament.
I'll first write down the mathematical definition of a monad, and then I'll explain how this definition mirrors the four epistemological questions.
How do they correspond to each other?
In short, there is an exact correspondence between the operators of a (commutative) monad and the four epistemological questions. Let's go one-by-one.
An answer to Count? is the constructor operator, assigning a set B(S) to each set S. If S is the set of potential outcomes of an event then B(S) is the set of beliefs about the event.
As we discussed before, for possibilistic uncertainty B(S):=P+(S), and for probabilistic uncertainty B(S):=Δ(S).
An answer to Certainty? will be the return operator, assigning a function ηS:S→B(S) to each set S. If you're certain that a state s∈S will occur, then ηS(s)∈B(S) is your belief-state.
For possibilistic uncertainty, ηS(s):={s}∈P+(S), the singleton set containing s. And for probabilistic uncertainty, ηS(s):=δs∈Δ(S), the Dirac distribution at s given by δs : s′↦1 if s′=s, and 0 otherwise.
The function ηS:S→B(S) describes how the state-space embeds in the belief-state-space. This is related, I think, to the idea that each territory can serve as its own map. (See Borges' On Exactitude in Science for an exploration of this theme.) Or in the words of Norbert Wiener, “The best model of a cat is another, or preferably the same, cat.”
An answer to Collapse? will be the bind operator, assigning a function ⊳WS:B(W)×(W→B(S))→B(S) to each pair of sets W and S. You should think of the bind operator as collapsing your second-order beliefs to your first-order beliefs — i.e. if each forecaster w∈W has a first-order belief f(w)∈B(S), and 𝐰∈B(W) is your second-order belief about which forecaster is correct, then (𝐰⊳WSf)∈B(S) should be your first-order belief about the election.
For possibilistic uncertainty, 𝐰⊳f∈P+(S) is the union ⋃w∈𝐰f(w). And for probabilistic uncertainty, 𝐰⊳f∈Δ(S) is the summation/integral s′↦∑w∈W𝐰(w)⋅f(w)(s′).
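The two bind operators can be sketched concretely, with frozensets standing in for P+ and dicts for Δ (encodings of my choosing, not from the original):

```python
# Bind for the two paradigm monads. A possibilistic belief is a frozenset,
# a probabilistic belief is a dict mapping outcomes to probabilities.

def bind_possibilistic(w, f):
    """Union of f(x) over every forecaster x deemed possible by w."""
    return frozenset().union(*(f(x) for x in w))

def bind_probabilistic(w, f):
    """Marginalise out the forecasters: sum_w w(forecaster) * f(forecaster)."""
    out = {}
    for forecaster, p in w.items():
        for s, q in f(forecaster).items():
            out[s] = out.get(s, 0.0) + p * q
    return out

# Two forecasters with first-order beliefs about an election:
beliefs = {"pundit": {"alice": 0.9, "bob": 0.1},
           "pollster": {"alice": 0.5, "bob": 0.5}}
second_order = {"pundit": 0.2, "pollster": 0.8}

first_order = bind_probabilistic(second_order, beliefs.get)
assert abs(first_order["alice"] - (0.2 * 0.9 + 0.8 * 0.5)) < 1e-12

poss = bind_possibilistic(
    frozenset({"pundit", "pollster"}),
    {"pundit": frozenset({"alice"}),
     "pollster": frozenset({"alice", "bob"})}.get)
assert poss == frozenset({"alice", "bob"})
```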
This is related to the idea that a map of a map of a territory is a map of that same territory; a depiction of a depiction of a person is a depiction of that same person; a representation of a representation of an idea is a representation of that same idea; etc.
One might think of f:W→B(S) as some parameterisation of the belief-states B(S) using some parameters W. Then the bind operator gives us the function for finding your S-belief from your W-belief. Explicitly, this function is (−⊳WSf):B(W)→B(S), 𝐰↦𝐰⊳WSf.
Moreover, the bind operator doesn't just flatten one level of "meta". Often we have an entire hierarchy of state-spaces S0,S1,S2,…,Sn where beliefs about Si are parameterised by some "higher" state-space Si+1 via a function fi:Si+1→B(Si). Here, the state-space S0 is the object-level system, the state-space S1 parametrises your first-order beliefs about S0, the state-space S2 parameterises your second-order beliefs about S1, and so on. Then the bind operator says that I can collapse your nth-order beliefs all the way to your first-order beliefs via the function (−⊳fn−1⊳⋯⊳f0):B(Sn)→B(S0).[4]
An answer to Combine? will be the product operator ⊗, assigning a function ⊗AB:B(A)×B(B)→B(A×B) to each pair of sets A and B. If 𝐚∈B(A) is your belief about the first election and 𝐛∈B(B) is your belief about an unrelated second election, then 𝐚⊗AB𝐛∈B(A×B) is your belief about the pair of elections.
For possibilistic uncertainty, 𝐚⊗𝐛∈P+(A×B) is the cartesian product {(a,b)∈A×B : a∈𝐚, b∈𝐛}. And for probabilistic uncertainty, 𝐚⊗𝐛∈Δ(A×B) is the joint distribution (a,b)↦𝐚(a)⋅𝐛(b).
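A minimal sketch of the two product operators, in the same frozenset/dict encodings as before (my own choices):

```python
# The product operator combines independent beliefs about two elections
# into a belief about the pair.

def product_possibilistic(a, b):
    """Cartesian product of the two sets of possibilities."""
    return frozenset((x, y) for x in a for y in b)

def product_probabilistic(a, b):
    """Joint distribution of two independent distributions."""
    return {(x, y): p * q for x, p in a.items() for y, q in b.items()}

joint = product_probabilistic({"alice": 0.7, "bob": 0.3}, {"yes": 0.5, "no": 0.5})
assert abs(joint[("alice", "yes")] - 0.35) < 1e-12
assert abs(sum(joint.values()) - 1.0) < 1e-12     # still a full distribution

pairs = product_possibilistic(frozenset({"alice"}), frozenset({"yes", "no"}))
assert pairs == frozenset({("alice", "yes"), ("alice", "no")})
```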
Thinking of S1×⋯×Sn as a factorisation of the state-space S, the product operator implies that your beliefs about each Si combine to yield your overall belief about S. That is, a commutative monad B corresponds to a flavour of uncertainty that you can have towards parts of the world, whereas a non-commutative monad B corresponds to a flavour of uncertainty that you can only have towards the world in its entirety.
Historical note: The central thesis of this essay is that there is a uniform correspondence between flavours of uncertainty and monads. I call this Myers' correspondence after David Jaz Myers, because I first encountered the idea in his book Categorical Systems Theory, where he devotes a chapter to using commutative monads to model various kinds of nondeterminism in automata. Nonetheless, the idea did not originate with him, he's never claimed it is true, and I don't know if he agrees with it.
Examples of Myers' correspondence
The correspondence between the operators of the (commutative) monad and the epistemological questions also serves as a practical recipe for formalising different flavours of uncertainty using monads. I've personally found it useful. First, think about the particular flavour of uncertainty, then answer the Four C's (Count? Certainty? Collapse? Combine?), convert those answers into mathematical operators, and voilà you've got yourself a monad.
I'll now zoom through fifteen examples, beginning (without commentary) with the paradigm examples of P+ and Δ.
1 - nonempty powerset monad
2 - distribution monad
3 — reader monad from H
Okay, now let's deal with a flavour of uncertainty which is sometimes called "indeterminacy". An indeterminate belief is something like "Well, if h1 is true then x1, but if h2 is true then x2, but–", i.e. it's a belief which is uncertain because your best guess depends on some unknown variable. More formally, your belief-state is given by a particular function from H (the possible values of the unknown variable) to S (the state-space).
This is an ordinary usage of the word "uncertain" so, by Myers' correspondence, it must correspond to a monad, and we can discover which monad by answering the four Cs. If S is the state-space then the belief-state-space is given by SH, the set of functions s:H→S. So our construct operator is (−)H. If you're certain that the outcome is s∈S then your belief-state is the constant function cs:h↦s. The intuitive answers to Collapse? and Combine? give us our bind and product operators.
Overall, we get what's called the reader monad from H.
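A minimal sketch of the reader monad's return and bind, with Python functions standing in for elements of SH (the example hypothesis set is my own):

```python
# Reader monad from H: a belief-state is a function from hypotheses H to states S.

def reader_return(s):
    """Certainty in s: the constant function ignoring the hypothesis."""
    return lambda h: s

def reader_bind(w, f):
    """Collapse: feed the same hypothesis h to both levels of belief."""
    return lambda h: f(w(h))(h)

# "If it rains the incumbent wins, otherwise the challenger does."
belief = lambda h: "incumbent" if h == "rain" else "challenger"
assert belief("rain") == "incumbent"
assert reader_return("incumbent")("shine") == "incumbent"

# Monad law check: binding a certain belief through f just applies f.
assert reader_bind(reader_return("x"), lambda s: reader_return(s + "!"))("rain") == "x!"
```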
4 — writer monad to [0,1]
Often, people will report their uncertain beliefs like "The coin will land heads (98%)" or "AI will disempower humanity (60%)". That is, their belief is a best guess paired with their confidence, which they offer as a lower-bound on the likelihood that their guess is correct. A certain belief-state would be something like "The coin will land heads (100%)".
What monad corresponds to this flavour of uncertainty?
If S is the state-space then S×[0,1] is the belief-state-space, i.e. there's a distinct belief-state for each pair 𝐬=(s,q)∈S×[0,1]. If you're certain that the outcome is s∈S then your belief-state is (s,100%)∈S×[0,1]. Uncertainty is collapsed by multiplying the confidences. Uncertainty is combined also by multiplying the confidences.
Ta-da! The writer-to-[0,1] monad.
Using the writer to [0,1] monad, we've characterised a belief-state as an outcome marked with some additional metadata, namely a confidence p∈[0,1]. What properties of the interval [0,1] did we appeal to in this definition? Well, firstly that we can multiply different elements (see bind and product operators). And secondly, that there's a fixed element 1∈[0,1] such that multiplying with this element does nothing (see return operator).
Hence we can generalise: given any monoid (M,e,⊙) we have a monad B(S)=S×M called the writer-to-M monad.[5] By using different monoids, we can model different flavours of uncertainty, but note that this is only a commutative monad when (M,e,⊙) is a commutative monoid.
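A sketch of the writer-to-M construction for an arbitrary monoid (M, e, ⊙), passed in here as a unit value and a binary operation (the factory-function packaging is my own):

```python
# Writer monad to a monoid (M, e, op): a belief-state is a pair (guess, m),
# return tags a guess with the unit e, and bind combines tags with op.

def make_writer(unit, op):
    def ret(s):
        return (s, unit)
    def bind(w, f):
        s2, m2 = f(w[0])
        return (s2, op(w[1], m2))
    return ret, bind

# Instantiating with the monoid ([0,1], 1, *) recovers the confidence example:
ret, bind = make_writer(1.0, lambda p, q: p * q)
assert ret("heads") == ("heads", 1.0)                       # certainty: 100%
guess = bind(("heads", 0.98), lambda s: (s.upper(), 0.5))   # confidences multiply
assert guess == ("HEADS", 0.49)
```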
There's another ordinary usage of the word "uncertainty" where an uncertain belief would be something like "AGI arrives before 2040 unless there's a nuclear war" and a certain belief would be something like "AGI arrives before 2040." At least, with regards to the binary question of whether AGI arrives before 2040. That is, an uncertain belief is one with an "unless..." clause.
Formalising this, we have a fixed set of events F, and a belief-state is a pair (s,E)∈S×F. Your belief-state is (s,E) when you commit to the state s∈S occurring unless the event E∈F occurs. This flavour of uncertainty corresponds to the writer monad B(S)=S×F, where F is a monoid when equipped with union ∪:F×F→F and the empty set ∅∈F.
One might use this flavour of uncertainty to model various kinds of defeasible reasoning, where a belief-state (s,E) is characterised by the precondition E under which the belief would be defeated or disavowed.
Or maybe an uncertain belief is one full of amendments, clarifications, conditions, disclaimers, excuses, hedges, limitations, qualifications, refinements, reservations, restrictions, stipulations, temperings, etc. By contrast, a certain belief is made "with no ifs or buts", bare and direct.
Formalising this, we have a fixed set of clarifications C, and a belief-state is a pair (s,l)∈S×List(C). Here, List(C) is the free monoid over the set of clarifications C equipped with concatenation +:List(C)×List(C)→List(C) and the empty list []∈List(C).
Now, the writer to List(C) monad isn't a commutative monad. Or interpreted philosophically, a clarified guess isn't the kind of uncertainty you can have towards parts of the world. Suppose "I think Alice is happy but I don't know her very well" is my belief-state about Alice, and "I think Bob is happy but he's difficult to read" is my belief-state about Bob. What's my belief-state about both Alice and Bob? Is it (1) "Alice and Bob are both happy, but I don't know Alice very well and Bob is difficult to read" or (2) "Alice and Bob are both happy, but Bob is difficult to read and I don't know Alice very well"? That is, in which order should we combine the clarifications?
The instinctive trick is to declare that two belief-states are equal if the lists of clarifications are equal up-to-permutation — this implies that (1) and (2) are the same belief-state, which does seem intuitive to me. If we play this trick, then the resulting flavour of uncertainty is captured by the writer-to-N[C] monad, where N[C] is the free commutative monoid. This does indeed give a commutative monad!
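This up-to-permutation trick is exactly what multisets provide; in Python, collections.Counter models the free commutative monoid ℕ[C] (the example clarifications are my own):

```python
from collections import Counter

# Lists of clarifications combined in different orders differ as lists...
alice_first = ["don't know Alice well"] + ["Bob is hard to read"]
bob_first = ["Bob is hard to read"] + ["don't know Alice well"]
assert alice_first != bob_first

# ...but as elements of the free commutative monoid N[C] they are equal:
assert Counter(alice_first) == Counter(bob_first)
```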
5 — identity monad
If we're anticipating an election between n candidates, then the simplest way to characterise your belief about the election is by your best guess, with no additional information about how unsure you are. If S is the state-space then S is also the belief-state-space, i.e. there's a distinct belief-state for each s∈S. The set of belief-states is therefore equal (up to bijection) to the set of outcomes itself.
I'll admit that this flavour of uncertainty is somewhat degenerate — e.g. every belief-state is a certainty in some particular state — but it's worth including nonetheless. On some readings of Wittgenstein's Tractatus, this is his model of how language represents the world, our utterances stand in direct isomorphism with the state-of-affairs.
Anyway, answering the four Cs would give the identity monad!
6 — maybe monad
The last example was a bit silly, so how about this instead..?
If we're anticipating an election between n candidates, then I'll characterise your belief about the election either by your best guess (with no additional information) or by an "I don't know" response. This is a very coarse-grained flavour of uncertainty — the only belief-state about the election (other than certainty in a particular candidate) is the belief-state of utter cluelessness, or shrugging one's shoulders!
Despite the coarse-grained-ness, it's pretty commonly encountered in the wild. For example, it's the typical flavour of uncertainty encountered in surveys/questionnaires, where ⊥ is read as "no opinion/don't know". It's also encountered in voting, where ⊥ is read as "abstention".
Formally speaking, if S is the state-space then there's a distinct belief-state for each state s∈S plus an additional option denoted ⊥. The belief-state-space is therefore S+1, denoting the disjoint union of S with the singleton set {⊥}. If you're certain that the outcome is s∈S then your belief-state is s∈S. This flavour of uncertainty corresponds to the famous maybe monad.
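A minimal sketch of the maybe monad's operators (encoding ⊥ as the string "BOT" is an arbitrary choice for illustration):

```python
# Maybe monad: a belief-state is either an outcome or the "don't know" state,
# encoded here (an arbitrary choice) as the sentinel string "BOT".

BOT = "BOT"

def maybe_return(s):
    """Certainty in s is just s itself."""
    return s

def maybe_bind(w, f):
    """Cluelessness propagates through collapse; certainty does not."""
    return BOT if w == BOT else f(w)

assert maybe_return("alice") == "alice"
assert maybe_bind(BOT, lambda s: s) == BOT          # shrug in, shrug out
assert maybe_bind("alice", lambda s: BOT) == BOT    # forecaster shrugs
assert maybe_bind("alice", lambda s: s.upper()) == "ALICE"
```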
7 — K-distribution monad
You might, at this point, feel short-changed. I've discussed so far a range of flavours of uncertainty which are all coarser-grained than probabilistic knowledge, so why not stick to Δ? Let's consider then a more fine-grained characterisation of belief-state, one that tracks infinitesimal differences between probability assignments.
The Levi-Civita field is an extension of the real numbers which contains infinitesimal values like ϵ, ϵ2, 2ϵ+ϵ2, π2√ϵ and infinite values like ϵ−1, ϵ−2, ϵ1/3+ϵ−1/3+2. We can replace [0,1] in the definition of Δ with LCF to obtain a monad ΔLCF corresponding to this flavour of uncertainty. On this account, a belief-state 𝐱∈ΔLCF(X) is something which tracks the potentially infinitesimal likelihood 𝐱(x)∈LCF of each outcome x∈X. This flavour of uncertainty has applications in infinite ethics and cooperation in large worlds.
For example, in a universe with infinite radius ϵ−1, what's your prior likelihood that you occupy the most central galaxy? Presumably, the likelihood should be ϵ3/ρ∈LCF, where ρ∈R+ is the density of galaxies.
Now suppose you were offered a lottery which promises to benefit everyone by δ if you indeed occupy the most central galaxy but otherwise benefits no one. What's this lottery worth? Presumably, it's worth δ, because the infinitary stakes δ⋅ρ⋅ϵ−3 are cancelled out by the infinitesimal chance of winning ϵ3/ρ.
Note that because LCF is totally-ordered, once we assign LCF values to different lotteries, we can perform expected utility maximisation as usual, and get sensible results. I think that infinitesimal probabilities resolve some (but not all) problems in infinite ethics. I'm particularly lured by the hope that, in an infinite cosmos, the infinitary stakes might somehow cancel out with infinitesimal probabilities to yield finite values. See Joe Carlsmith's essay On Infinite Ethics for further discussion.
How far can one generalise the kind of entity that a "probability" must be, before our definition breaks? Well, so long as we have some rig K, we can define a monad ΔK by replacing [0,1] with K. A rig is a set K equipped with a zero element 0∈K, a unit element 1∈K, an addition function ⊕:K×K→K, and a multiplication function ⊗:K×K→K, satisfying certain algebraic laws. By choosing different rigs K then we obtain different monads ΔK corresponding to different flavours of uncertainty.
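A sketch of ΔK's bind for an arbitrary rig, with the rig passed in as its operations (my own packaging). Instantiating with the boolean rig ({false, true}, or, and) turns K-distributions into indicators of subsets, loosely recovering possibilistic uncertainty:

```python
# Delta_K: a K-distribution is a dict mapping outcomes to rig values.
# A rig supplies (plus, times, zero); different rigs give different
# flavours of uncertainty.

def bind_K(plus, times, zero, w, f):
    out = {}
    for x, k in w.items():
        for y, k2 in f(x).items():
            out[y] = plus(out.get(y, zero), times(k, k2))
    return out

# Ordinary probability: the rig ([0,1], +, *, 0).
prob = bind_K(lambda a, b: a + b, lambda a, b: a * b, 0.0,
              {"rain": 0.4, "shine": 0.6},
              lambda s: {"win": 0.5, "lose": 0.5})
assert abs(prob["win"] - 0.5) < 1e-12

# Boolean rig ({False, True}, or, and, False): distributions become subsets.
poss = bind_K(lambda a, b: a or b, lambda a, b: a and b, False,
              {"rain": True, "shine": True},
              lambda s: {"win": s == "rain", "lose": True})
assert poss == {"win": True, "lose": True}
```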
When K:=[0,1] we obtain the ordinary probability distributions, and when K:=Q∩[0,1] we obtain the rational probability distributions, etc. Tobias Fritz suggests that by using similar tricks we might obtain quantum uncertainty, fuzzy uncertainty, and Dempster–Shafer uncertainty, but I haven't checked whether this is true.
8 — quantum monad
For sure, quantum mechanics is endowed with its own flavour of uncertainty, hence the term Heisenberg's Uncertainty Principle. It's not impossible to catch a physicist saying "it's uncertain whether the qubit is 0 or 1" or "it's uncertain whether the cat is alive or dead", regardless of whether they consider quantum uncertainty as strictly speaking epistemic. By Myers' correspondence, this flavour of uncertainty must correspond to a monad.
Exercise 3: Which?[6]
9 — smooth state monad
The position of the North Star in the night sky is constant, static, immutable, certain; the position of Mercury, by contrast, is variable, dynamic, mutable, uncertain. Is this not a common sense of the word? Might one not say that my belief-state about Mercury's position will forever be uncertain, no matter how accurate my telescope or exhaustive my calculations, because my belief is always revised? If so, then by Myers' correspondence this flavour of uncertainty corresponds to a monad.
To formalise this, let's fix a differentiable manifold Θ parameterising your internal mental state as you think about a question. Note that because Θ is a differentiable manifold, it's equipped with a tangent space Tθ at every θ∈Θ.
If S is the state-space, then ∏θ∈Θ(S×Tθ) is your belief-state-space. In other words, we have a distinct belief-state for each smooth transition function 𝐬∈∏θ∈Θ(S×Tθ). A belief-state 𝐬 is characterised by a pair 𝐬(θ)=(s,v) for each θ∈Θ, where s∈S is your current guess and v∈Tθ is the tangent vector describing how your mental state is evolving. If you're certain that the winner is s∈S then your belief-state is the static transition function η(s):θ↦(s,0) where 0∈Tθ is the zero vector.
This is the smooth state monad — it's a differentiable version of the discrete-time state monad, with the additional benefit that it's a commutative monad.
10 — continuation monad
What are belief-states actually for, anyway? What role do they play in rational decision-making? According to one school of thought, belief-states are simply gadgets for taking expected values, and chiefly for taking expected utility values.
Let's say S is the set of candidates running in the election, and v:S→R is your utility function, i.e. v(s)∈R measures how happy you'd be to hear that the candidate s∈S has won. Then your ex-ante utility is some r∈R measuring how happy you are now in anticipation of the outcome. Given your belief-state, I should be able to determine r∈R from v:S→R, which implies that I can just characterise your belief-state about the election by how r∈R is determined from v:S→R. Neat.
This is formalised by the so-called continuation to R monad. If S is the state-space then K(S,R) is the belief-state-space, where K(S,R) is the set of functionals 𝐬:(S→R)→R. And a belief-state 𝐬:(S→R)→R is certain in the outcome s∈S if 𝐬 determines your ex-ante utility simply by evaluating your utility function at s, i.e. 𝐬=λv:S→R. v(s).
The continuation monad encompasses both possibilistic uncertainty and probabilistic uncertainty. If the nonempty subset A∈P+(X) models your possibilistic uncertainty then the associated functional 𝐱∈K(X,R) is given by λv:X→R. min{v(x):x∈A}. If the distribution μ∈Δ(X) models your probabilistic uncertainty then the associated functional 𝐱∈K(X,R) is given by λv:X→R. Ex∼μ[v(x)].
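These two injections can be sketched directly (the set/dict encodings and example utilities are my own choices):

```python
# Continuation monad K(S, R): a belief-state is a functional taking a utility
# function v: S -> R to an ex-ante utility in R.

def from_possibilistic(A):
    """Worst-case functional of a non-empty set A of possible outcomes."""
    return lambda v: min(v(x) for x in A)

def from_probabilistic(mu):
    """Expected-utility functional of a distribution mu."""
    return lambda v: sum(p * v(x) for x, p in mu.items())

utility = {"alice": 10.0, "bob": 2.0}.get
assert from_possibilistic({"alice", "bob"})(utility) == 2.0            # min
assert abs(from_probabilistic({"alice": 0.9, "bob": 0.1})(utility) - 9.2) < 1e-12

# Certainty in s evaluates v at s, matching the return operator.
certain = lambda s: (lambda v: v(s))
assert certain("alice")(utility) == 10.0
```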
Exercise 4: (Beginner) Prove that the two maps P+(X)→K(X,R) and Δ(X)→K(X,R) are injections. (Advanced) Prove these injections are monad transformers.[8]
11 — signature monad
Maybe I should characterise your belief-state about something by the sentence that you'd utter about the outcome. This will result in a more syntactic or linguistic account of belief. You might imagine here a shared language, like English or Python, with which a speaker may report their beliefs to a friend. Or you might imagine a private mental language in which a brain/AI will store their knowledge about the world.
To make this rigorous, I must introduce a language containing all the sentences that you might utter about the outcome. Our language will include an atomic sentence ┌s┐ for every outcome s∈S, along with certain connectives for combining sentences. For example, suppose we have a language with two symbols, a binary connective ∨ called disjunction and a unary connective ¬ called negation. If S:={s1,…,sn} are the candidates in an election, then a belief-state about the electoral outcome is a sentence like ┌s5┐ or ┌(s2∨¬s3)∨¬(s4∨s6)┐.
The logical connectives can be specified by a signature (Σ, arity:Σ→N). A signature is a set Σ equipped with a map arity:Σ→N sending each connective to its arity. So the aforementioned language has the signature Σ={∨,¬} with arity(∨)=2 and arity(¬)=1.
We denote the resulting set of sentences by L(Σ,S). This is the set containing all the sentences freely generated from S using the connectives in Σ. Explicitly, L(Σ,S) is the smallest set such that ┌s┐∈L(Σ,S) for every s∈S, and ┌σ(ϕ1,…,ϕk)┐∈L(Σ,S) for every σ∈Σ with arity(σ)=k and ϕ1,…,ϕk∈L(Σ,S).
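A minimal sketch of checking membership in L(Σ,S), encoding sentences as nested tuples (the encoding and the helper name is_sentence are mine, not from the original):

```python
# Sentences of L(Sigma, S) encoded as nested tuples: an atom is ("atom", s),
# a compound is (connective, child_1, ..., child_k).

SIGNATURE = {"or": 2, "not": 1}   # arity of each connective

def is_sentence(t):
    """Check that t is freely generated from atoms by SIGNATURE's connectives."""
    if t[0] == "atom":
        return True
    connective, *children = t
    return (SIGNATURE.get(connective) == len(children)
            and all(is_sentence(c) for c in children))

s2, s3 = ("atom", "s2"), ("atom", "s3")
assert is_sentence(("or", s2, ("not", s3)))   # (s2 or not s3)
assert not is_sentence(("or", s2))            # wrong arity for "or"
```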
With this machinery in place, we can answer the Four C's, and thereby find the corresponding monad.
Many monads are equivalent to L(Σ,−) for some signature Σ, including many monads we've already encountered.
Isn't the archetypal symbol of uncertainty... a fork in the road? Imagine a traveller facing two paths, left and right, each forking further ahead, and so on unboundedly, forming a fractal canopy of binary choices.
12 — algebraic theory
There's something a bit perverse about characterising your belief-state with a single utterance about the outcome. Namely, some utterances will be logically equivalent to each other, such as ┌ϕ┐ and ┌(ϕ∨ϕ)┐, and therefore the belief-state in which you're willing to utter ┌ϕ┐ is the exact same as the belief-state in which you're willing to utter ┌(ϕ∨ϕ)┐, assuming that you're both rational and honest. Therefore, our previous characterisation was overcounting the belief-states by distinguishing logically-equivalent sentences. Bizarrely, there would be infinitely-many belief-states about a single coin flip — i.e. ┌H┐, ┌(H∨H)┐, ┌(H∨(H∨H))┐, and so on.
To fix this, what we need isn't just a signature Σ, but rather a signature Σ paired with a set E of equational axioms, which is called an algebraic theory. An equational axiom is a pair of sentences built using the connectives in Σ and some placeholder sentence variables {a,b,c,…}. We use E to define an equivalence relation ∼E on L(Σ,X) by taking the deductive closure of the axioms, and then the equivalence classes of the sentences will be our belief-states.
For example, if our signature is {∨} and we intend to interpret the ∨ connective as disjunction, then E should consist of three axioms:
Furnished with the concept of an algebraic theory, we can now improve our answers:
If a monad B is equivalent to L(Σ,E,−) for some algebraic theory (Σ,E) then we call (Σ,E) a presentation of the monad.[11] A presentation of a monad is a rather nice description of a flavour of uncertainty via some operators for defining belief-states in terms of other belief-states and some rules governing those operators.
Exercise 5: Find a presentation for ΔK for an arbitrary rig K.
13 — convex powerset of distributions monad
As we saw before, the continuation monad K(−,R) encompasses both possibilistic and probabilistic uncertainty. Unfortunately K(−,R) lacks any presentation, even if we allow connectives with infinite arity![12] Fortunately, there exists a monad encompassing both possibilistic and probabilistic uncertainty which is presentable.
Recall that the nonempty finite powerset monad P+f, which corresponds to possibilistic uncertainty, is presented by the theory of semilattices (Σ1,E1). And the distribution monad Δ, which corresponds to probabilistic uncertainty, is presented by the theory of convex algebras (Σ2,E2). Consider the theory (Σ1∪Σ2,E1∪E2∪D) where D={a+p(b∨c)=(a+pb)∨(a+pc)} is an additional axiom describing how the +p connectives distribute over the ∨ connective.
This new theory is a presentation of the convex powerset of distributions monad. This monad, denoted by C, corresponds to a flavour of uncertainty wherein a belief-state is a convex set of distributions, e.g. "The coin lands either heads (20-30%) or tails (70-80%)." (See credal sets.)
Now, we could have defined C in an entirely non-syntactic way, i.e. "C(X) is the set of nonempty finitely-generated convex-closed sets of finite-support distributions over X." But I think the syntactic definition, in terms of the algebraic theories for P+ and Δ, elucidates why C is a well-motivated unification of probabilistic and possibilistic uncertainty. We will employ a similar strategy for motivating infrabayesianism — roughly speaking, infrabayesianism is exactly what you get when you combine probabilistic and possibilistic uncertainty with reward.
∨ is a semilattice,
i.e. a∨a=a
a∨b=b∨a
a∨(b∨c)=(a∨b)∨c
{+p:p∈(0,1)} is a convex algebra,
i.e. a+pa=a
a+pb=b+1−pa
((a+pb)+qc)=(a+p⋅q(b+((1−p)⋅q)/(1−p⋅q)c))
+p distributes over ∨,
i.e. a+p(b∨c)=(a+pb)∨(a+pc)
┌x┐ is certainty in an outcome x∈X.
┌ϕ1∨ϕ2┐ is possibilistic uncertainty between ┌ϕ1┐ and ┌ϕ2┐.
┌ϕ1+pϕ2┐ is probabilistic uncertainty between ┌ϕ1┐ (with chance p) and ┌ϕ2┐ (with chance 1−p).
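As a sanity check on the presentation above, here's a hedged Python sketch of C(X) via generator sets of finite-support distributions, verifying the distributivity axiom a+p(b∨c)=(a+pb)∨(a+pc) on a small example. The `point`/`join`/`mix` names are mine, and convex closure is left implicit by working only with extreme points:

```python
from fractions import Fraction

# A belief-state in C(X): a finite set of extreme-point distributions,
# each encoded as a sorted tuple of (outcome, probability) pairs.
# 'join' is the v connective (possibilistic choice): union of generators.
# 'mix' is the +p connective (probabilistic mixture), applied generator-wise.

def point(x):
    """Certainty in outcome x: a single Dirac distribution."""
    return frozenset({((x, Fraction(1)),)})

def join(b1, b2):
    return b1 | b2

def mix(b1, p, b2):
    return frozenset(
        tuple(sorted({x: p * dict(d1).get(x, 0) + (1 - p) * dict(d2).get(x, 0)
                      for x in set(dict(d1)) | set(dict(d2))}.items()))
        for d1 in b1 for d2 in b2
    )

# Check the distributivity axiom a +p (b v c) = (a +p b) v (a +p c)
# with a, b, c Dirac beliefs and p = 1/5:
a, b, c = point("heads"), point("tails"), point("edge")
p = Fraction(1, 5)
assert mix(a, p, join(b, c)) == join(mix(a, p, b), mix(a, p, c))
```

On generator sets the axiom holds on the nose, because mixing a against a union of generators produces exactly the union of the pairwise mixtures.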
14 — free convex lattice monad
There's a common usage of the word "uncertainty", where the uncertainty is modulo strategic choice. For example, you might hear "Black is certain to win" from a chess commentator if Black can force a checkmate, or hear "the winner is still uncertain" from a poker commentator during the flop. By Myers' correspondence, this flavour of uncertainty — call it "ludic uncertainty" — must correspond to some monad, but which?
Consider the theory of convex lattices — with signature ΣG={∨,∧,0,1}∪{+p:p∈(0,1)} and the following axioms:
Then G:=L(ΣG,EG,−) is a monad corresponding, I think, to the aforementioned flavour of uncertainty. It sends a set X to the set G(X), the free convex lattice over X. An element of G(X) should be read as a game-tree whose non-leaf nodes are either a free binary choice by White, a free binary choice by Black, or a biased coin flip. The leaf nodes may be either wins for White, wins for Black, or an element of the set X.
We treat game-trees g,g′∈G(X) as equivalent if the same outcome would result from g and g′ regardless of the players' preferences over the elements of X. For example, the lattice axioms ϕ∨0=ϕ and ϕ∧1=ϕ will hold because no player would willingly choose to lose, and the axioms ϕ∨(ϕ∧ψ)=ϕ and ϕ∧(ϕ∨ψ)=ϕ establish that the players are adversarial, i.e. they would never willingly empower one another.
Exercise 7: Consider the game ┌((1∨x2)∧((x2+0.80)∨(x2∧1)))∧(x2∨(x5+0.5(x3∧x4)))┐ shown below. Which outcome is (ludically) certain?
Note that the elements of G(X) aren't really games in the usual sense, because leaf nodes might be elements of X, and we treat these elements as pairwise incomparable to both players. So you should think of G(X) as a set of partially-specified game trees. A fully-specified game tree would be an element of G([0,1]), which is a game tree where each leaf-node returns some [0,1]-valued utility to Black and disutility to White. You may notice that [0,1] can itself be equipped with the structure of a convex lattice, which just means there exists a G-algebra V:G([0,1])→[0,1].[14] This G-algebra is exactly the well-known position evaluation function used in combinatorial game theory.
{∧,∨,0,1} is a lattice.
{+p:p∈(0,1)} is a convex algebra.
+p distributes over both ∧ and ∨,
i.e. a+p(b∨c)=(a+pb)∨(a+pc)
┌x┐ is a game which will certainly result in outcome x∈X.
┌0┐ is a game where White wins and ┌1┐ is a game where Black wins.
┌ϕ∧ψ┐ is a game where White can choose to play ϕ or to play ψ.
┌ϕ∨ψ┐ is a game where Black can choose to play ϕ or to play ψ.
┌ϕ+pψ┐ is a game where ϕ is played with chance p and ψ with chance 1−p.
15 — infrabayesianism
When agents have beliefs about the same environment that they're embedded in, weird things can happen. Over the past few years, Vanessa Kosoy and Alex Appel have been exploring a novel flavour of uncertainty — infrabayesian uncertainty — which they claim more fruitfully characterises the belief-states of embedded agents. In particular, it characterises belief-states concerning Newcomb-like environments, where the state of the environment is correlated with the agent's choice under consideration. Their flavour of uncertainty corresponds to the infrabayesian monad □.
Roughly speaking, □ is the same as G above except without the ∧ connective. Consider the theory (Σ,E) of convex semilattices with top and bottom, which is a presentation of the composite monad P+f∘Δ∘(−+2).[15] From what I understand, this monad P+f∘Δ∘(−+2) is Kosoy's infrabayesian monad □.[16] This justifies the claim that infrabayesianism is the flavour of uncertainty that minimally encompasses possibilistic uncertainty (via the P+f monad), probabilistic uncertainty (via the Δ monad), and reward (via the (−+{0,1}) monad). I think that this motivates infrabayesianism as a characterisation of an agent's belief-state about their environment.
∨ is a semilattice with a∨0=a and a∨1=1.
{+p:p∈(0,1)} is a convex algebra.
+p distributes over ∨,
i.e. a+p(b∨c)=(a+pb)∨(a+pc)
┌x┐ is an environment which certainly results in outcome x∈X.
0 is an impossible/contradictory environment where the agent achieves no disutility, called Nirvana.
1 is an environment where the agent suffers maximal disutility.
(ϕ∨ψ) is an environment which is either like ϕ or like ψ, and our agent should be pessimistic here.
(ϕ+pψ) is an environment which is like ϕ with chance p and ψ with chance 1−p.
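One way to operationalise this reading — a sketch of mine, not Kosoy's formalism — is to score a finite set of distributions pessimistically, with Nirvana contributing zero disutility and the ∨ connective resolved as the worst case. The names and the max-over-the-set convention below are my own glosses:

```python
NIRVANA, HELL = "0", "1"   # the two adjoined outcomes (labels are mine)

def value(belief, disutil):
    """Pessimistic disutility of an infrabayesian belief-state.

    belief: a finite collection of distributions over X + {NIRVANA, HELL},
    each a dict outcome -> probability. The v connective is resolved
    pessimistically: the worst (largest) expected disutility wins.
    """
    def expected(d):
        return sum(p * (0.0 if x == NIRVANA else
                        1.0 if x == HELL else disutil(x))
                   for x, p in d.items())
    return max(expected(d) for d in belief)

# An agent unsure whether a coin is fair or rigged against it:
belief = [{"heads": 0.5, "tails": 0.5}, {"tails": 1.0}]
disutil = {"heads": 0.0, "tails": 1.0}.get
assert value(belief, disutil) == 1.0
```

Under pessimism the rigged hypothesis dominates: the belief-state is scored as if the coin always lands tails.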
Unfortunately, □ isn't a commutative monad, which means it's not a flavour of uncertainty that you can have about parts of the world, but only about the world in its entirety. Put starkly, there's no way to combine my infrabayesian belief-states about two coin tosses to yield a single infrabayesian belief-state about the pair of coin tosses, even when the coin tosses are completely unrelated.[17] This, I think, limits both the theoretical appeal of infrabayesianism and its tractability.
Theoretically speaking, the fact that □ isn't a commutative monad weakens the analogy between infrabayesian uncertainty and possibilistic or probabilistic uncertainty. Many concepts built upon possibilistic or probabilistic uncertainty appeal, in an essential way, to the product operators ⊗P+ or ⊗Δ; infrabayesianism, lacking such an operator, is not guaranteed to support the analogous concepts.
Practically speaking, the lack of an infrabayesian product operator is an obstacle to parallelising algorithms which assume infrabayesian belief-states. There is no way to decompose the environment into separate components, discover an infrabayesian belief-state for each component, and then combine those belief-states into a single belief-state about the environment as a whole.
Implications for AI safety
Does this essay have any practical significance, or is it all just abstract nonsense? How does this help us solve the Big Problem? To be perfectly frank, I have no idea. Timelines are probably too short for agent foundations, and this essay is maybe agent foundations foundations, or something like that. But I feel compelled to offer some practical implications for AI safety to validate my decision to write this essay and your decision to read it.
Further questions
In so far as "flavours of uncertainty" is an informal term, there's little we can do to test the correspondence other than enumerating well-known flavours of uncertainty and checking that they do in fact correspond to monads, and vice-versa, enumerating the well-known monads and giving them natural doxastic interpretations. I think my own attempt has been positive, but this result is open to revision.
Secondly, the biggest asterisk of my essay: my treatment of belief-states has been silent on their most important property, namely that they are learned. For example, a probability distribution can be conditioned on new evidence, and possibilistic uncertainty also carries an analogous notion of conditioning. Perhaps any characterisation of belief should answer additional questions about how those belief-states are revised in light of new evidence/observations/considerations, etc. Perhaps we should append to Count? Certainty? Collapse? Combine? a fifth question, Condition? I'm sympathetic to this worry.
And if indeed learning is a phenomenon which must be modelled by any characterisation of belief, then monads do not themselves carry enough structure to characterise beliefs. Rather, we would need to equip the monad B with some additional structure, perhaps a family of maps learnS:O(S)×B(S)→B(S) for some space of observations O(S), possibly satisfying some additional constraints such as learnS(o,−)∘ηS=ηS and learnS(o1⋅o2,−)=learnS(o2,−)∘learnS(o1,−). I'm just improvising here.
This is best left to future work, if the need arises.
In particular, I'm thinking of the applied category theory community.
Traditionally, the field of analytic epistemology has been concerned with defining epistemological concepts — i.e. constructing definitions for the concepts of knowledge, belief, evidence, learning, testimony, justification, etc. However, in recent years analytic epistemology has reorientated itself, chiefly under the influence of Timothy Williamson, towards modelling epistemological phenomena — i.e. constructing mathematical models for phenomena relating knowledge, belief, evidence, learning, testimony, justification, etc. This reorientation in epistemology, from concept-defining to model-building, was inspired by the natural sciences.
An operator F assigns, to every set X, another set F(X).
For example, P is the powerset operator, which assigns to every set X another set P(X). You can informally think of an operator as a function — but strictly speaking, an operator can't be a function because its domain would be the "set of all sets" (which doesn't exist).
Formally, the domain of an operator is something called a category. Categories can be larger than sets — in particular there is a category containing all the sets and the functions between them. For pedagogical purposes, I've framed everything in this article in terms of sets and functions, but most of the content of this article can be applied to any category with enough structure.
And I suppose, by "generalising backwards", that my zeroth-order belief about the coin toss is the actual result of the coin toss..?
(M,e,⊙) is a monoid if (a⊙b)⊙c=a⊙(b⊙c) and e⊙a=a=a⊙e.
A monoid is like a group except the elements might not have inverses, e.g. (Z,0,+) is a group but (N,0,+) is only a monoid.
(M,e,⊙) is a commutative monoid if also a⊙b=b⊙a.
The writer monad for (M,e,⊙) is given by the data B(X)=X×M, ηBX(x)=(x,e), and (x,a)⊳BXYf=(y,a⊙b) where f(x)=(y,b).
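As a quick illustration, here's the writer monad for the string monoid (str, "", +) in Python; the step names are mine:

```python
# The writer monad for the monoid (str, "", +): a belief-state about X
# is a pair (x, log) of an outcome and an accumulated monoid element.
def unit(x):
    return (x, "")        # eta: pair the outcome with the monoid identity

def bind(m, f):
    x, a = m
    y, b = f(x)
    return (y, a + b)     # combine the accumulated monoid elements

# Chaining two steps accumulates their logs in order:
step1 = lambda n: (n + 1, "inc;")
step2 = lambda n: (n * 2, "dbl;")
m = bind(bind(unit(3), step1), step2)
assert m == (8, "inc;dbl;")
```

The monad laws here reduce to the monoid laws: unit contributes the identity element, and bind's concatenation is associative because ⊙ is.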
Solution: I think B(X) is the X-dimensional Hilbert space, but this isn't my expertise.
Suppose R has two distinct elements r1 and r2. Let a=λv:A→R.r1∈K(A,R) and b=λv:B→R.r2∈K(B,R). Then there are two ways to combine a and b into a single belief in K(A×B,R), i.e. λv:A×B→R.r1∈K(A×B,R) and λv:A×B→R.r2∈K(A×B,R). But these differ so K(−,R) is not a commutative monad for |R|≥2.
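The argument can be checked directly in Python: encoding K(X,R) as higher-order functions (X→R)→R, the two candidate products of constant functionals really do disagree.

```python
# K(X, R) represented as functions (X -> R) -> R.
# Two constant functionals, as in the argument above:
r1, r2 = "r1", "r2"
a = lambda v: r1          # a in K(A, R), ignoring its continuation
b = lambda v: r2          # b in K(B, R)

# The two candidate products in K(A x B, R):
left  = lambda v: a(lambda x: b(lambda y: v((x, y))))   # run a first
right = lambda v: b(lambda y: a(lambda x: v((x, y))))   # run b first

v = lambda pair: "ignored"
assert left(v) == r1 and right(v) == r2   # the two orders disagree
```

Because a discards its continuation, running a first yields r1 no matter what; symmetrically for b, so the two products differ whenever r1 ≠ r2.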
In fact, K(−,R) encompasses every other monad T such that α:T(R)→R is a T-algebra. This explains why K(−,R) encompasses both possibilistic and probabilistic uncertainty — specifically, it's because min:P+(R)→R is a P+-algebra and E:Δ(R)→R is a Δ-algebra.
Moreover, K(−,R) is the smallest monad with this property, because there's a bijection between T-algebras α:T(R)→R and monad morphisms T⇒K(−,R). See here for details.
That being said, K(−,R) isn't the smallest monad encompassing both P+ and Δ in particular. If you only need to encompass P+ and Δ then Vanessa Kosoy's infrabayesian monad □ will suffice, but □ is strictly contained within K(−,R).
For example, suppose w=┌((w1∨w2)∧¬w2)┐∈L(Σ,W) and f:W→L(Σ,S) satisfies f(w1)=┌(s3→¬s1)┐ and f(w2)=┌¬s5┐. Then we find s via uniform substitution.
s=w⊳WSf=┌((f(w1)∨f(w2))∧¬f(w2))┐=┌(((s3→¬s1)∨¬s5)∧¬¬s5)┐∈L(Σ,S)
In pythonese, S_string = ''.join(t if t in Sigma else f(t) for t in W_string)
Equivalently, we can define the bind operator recursively on the depth of w. For atomic sentences, ┌w┐⊳WSf=f(w), and for compound sentences, ┌σ(ϕ1,…,ϕk)┐⊳WSf=┌σ(ϕ1⊳WSf,…,ϕk⊳WSf)┐.
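Here is that recursive bind in Python over a nested-tuple representation of sentences, checked against the worked example above. The tuple encoding and connective names like `imp` are my own:

```python
# Sentences as nested tuples: ("atom", s) or (sigma, phi_1, ..., phi_k).
def bind(phi, f):
    """Substitute f(s) for every atom s in phi (the bind of L(Sigma, -))."""
    if phi[0] == "atom":
        return f(phi[1])
    sigma, args = phi[0], phi[1:]
    return (sigma,) + tuple(bind(arg, f) for arg in args)

A = lambda s: ("atom", s)

# The worked example: w = ((w1 v w2) ^ ~w2), with
# f(w1) = (s3 -> ~s1) and f(w2) = ~s5.
w = ("and", ("or", A("w1"), A("w2")), ("not", A("w2")))
f = {"w1": ("imp", A("s3"), ("not", A("s1"))), "w2": ("not", A("s5"))}.get

s = bind(w, f)
assert s == ("and",
             ("or", ("imp", A("s3"), ("not", A("s1"))), ("not", A("s5"))),
             ("not", ("not", A("s5"))))
```

Uniform substitution is exactly this recursion: connectives are copied unchanged, atoms are replaced by whole sentences.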
In particular, suppose Σ={¬,!} contains two unary connectives. Suppose a=┌¬a┐∈L(Σ,A) is my belief-state about A and b=┌!b┐∈L(Σ,B) is my belief-state about B. Then there are two ways to combine these two beliefs into a single belief in L(Σ,A×B), i.e. ┌¬!⟨a,b⟩┐ and ┌!¬⟨a,b⟩┐. But these differ so L(Σ,−) is not a commutative monad.
Note that a monad might have many distinct presentations, and this non-uniqueness is rather distasteful. The more elegant treatment of monads is with Lawvere theories, where both atomic connectives and compound connectives are treated on par.
For any cardinality κ, we say that a monad has rank κ if it has a presentation with operations of arity at most κ. The continuation monad has no rank (not even an infinitary one) which is a somewhat perverse property for a monad. A rankless monad isn't generated by any algebraic theory, even if we allow infinitary operators.
We can see that K(−,R) is a rankless monad because it contains P as a submonad whenever |R|≥2, but P is a monad without rank.
The lattice axioms for the signature (∨,∧,0,1) consists of the semilattice axioms for ∨, the semilattice axioms for ∧, the boundary axioms a∨0=a and a∧1=a, and the absorption laws a∨(a∧b)=a and a∧(a∨b)=a.
The position evaluation function V:G([0,1])→[0,1] is defined inductively:
V(┌r┐)=r
V(┌0┐)=0
V(┌1┐)=1
V(┌ϕ1∨ϕ2┐)=max{V(┌ϕ1┐),V(┌ϕ2┐)}
V(┌ϕ1∧ϕ2┐)=min{V(┌ϕ1┐),V(┌ϕ2┐)}
V(┌ϕ1+pϕ2┐)=p⋅V(┌ϕ1┐)+(1−p)⋅V(┌ϕ2┐)
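This inductive definition translates directly into a short Python evaluator; the tuple encoding of game trees below is my own:

```python
# The position evaluation V : G([0,1]) -> [0,1], following the inductive
# definition above. Game trees as nested tuples; leaves are floats in [0,1]
# (with the constants 0 and 1 appearing as the leaves 0.0 and 1.0).
def V(g):
    if isinstance(g, (int, float)):      # a leaf returning utility r
        return float(g)
    tag = g[0]
    if tag == "black":                   # Black chooses (v): maximise
        return max(V(g[1]), V(g[2]))
    if tag == "white":                   # White chooses (^): minimise
        return min(V(g[1]), V(g[2]))
    if tag == "chance":                  # coin flip with bias p
        _, p, g1, g2 = g
        return p * V(g1) + (1 - p) * V(g2)
    raise ValueError(tag)

# Black picks the better branch of a 50/50 gamble vs a safe 0.4:
g = ("black", ("chance", 0.5, 0.0, 1.0), 0.4)
assert V(g) == 0.5
```

The convex-lattice axioms are exactly the identities this evaluator satisfies: max and min absorb each other, and the mixture node distributes over both.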
That is, the signature Σ consists of the connectives {∨,0,1}∪{+p:p∈(0,1)}, and E contains the axioms: a∨a=a, a∨b=b∨a, a∨(b∨c)=(a∨b)∨c, a∨0=a, a∨1=1, a+p(b∨c)=(a+pb)∨(a+pc).
Strictly speaking, it's improper to speak of composing monads S and T unless you provide a distributive law of S over T, i.e. λ:S∘T→T∘S. But P∘Δ yields a monad C given by the convex powerset of distributions monad, and the exception monad (−+A) distributes over any monad, so no worries here.
A technical caveat:
Kosoy's infrabayesian monad □ is actually given by P+∘Δ∘(−+2) rather than P+f∘Δ∘(−+2) — that is, □ contains sets of distributions with arbitrary cardinality. At least, this is my reading of Diffractor's Infra-Miscellanea, Section 2.
Unfortunately, □ is a rankless monad, i.e. it isn't generated by any algebraic theory even if we allow infinitary operators.
Fortunately, we may approximate □ with a monad of rank κ for any cardinality κ. Let's define □κ:=P+κ∘Δ∘(−+2), where P+κ(X) is the set of non-empty subsets of X of cardinality no greater than κ. Algebraically, we obtain □κ by adding the κ-ary disjunction connective ⋁κ to the signature for Δ∘(−+2).
This leaves the open question, for which cardinality κ is □κ an adequate and tractable approximation, if indeed any? I suspect κ=2ℵ0 suffices for all theoretical purposes, and that κ=2 suffices for all practical purposes.
This also applies to imprecise probability C and to strategic uncertainty G.
For example, given a series of two-player games g1,…,gn∈G([0,1]), there's no natural way to combine them into a single two-player game g1⊗⋯⊗gn∈G([0,1]n) because G isn't a commutative monad.
More generally, there's no commutative monad which contains both a ∨ operator and a +0.5 operator without conflating them. See here for details.