There's definitely some literature about "probability of probability" (I remember one bit from Jaynes' book). Usually when people try to go turbo-meta with this, they do something a little different than you, and just ask for "probability of probability of probability" - i.e. they ask only for the meta-meta-distribution of the value of the meta-distribution (or density function) at its object-level value.
Unsure if that's in Jaynes too.
Connection to logic seems questionable because it's hard to make logic and probability play nice together formally (maybe the intro to the Logical Inductors paper has good references for complaints about this).
Philosophically I think that there's something fishy going on here, and that calling something a "distribution over probabilities" is misleading. You have probability distributions when you're ignorant of something. But you're not actually ignorant about what probability you'd assign to the next flip being heads (or at least, not under Bayesian assumptions of infinite computational power).
Instead, the thing you're putting a meta-probability distribution over has to be something else that looks like your Bayesian probability but can be made distinct, like "long-run frequency if I flip the coin 10,000 times" or "correct value of some parameter in my physical model of the coin." It's very common for us to want to put probability distributions over these kinds of things, and so "meta-probabilities" are common.
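As a minimal sketch of what such a meta-probability can look like (the Beta(2, 2) shape and the 10,000-flip horizon are purely illustrative assumptions of mine):

```python
import numpy as np

rng = np.random.default_rng(0)

# Meta-probability here is a distribution over "long-run frequency of
# heads across 10,000 flips", NOT over my own credence; Beta(2, 2) is
# an arbitrary illustrative choice.
freqs = rng.beta(2, 2, size=50_000)            # candidate long-run frequencies
runs = rng.binomial(10_000, freqs) / 10_000    # one simulated 10,000-flip run each

# The object-level probability of heads on the next flip is then just
# the mean of this meta-distribution:
print(freqs.mean(), runs.mean())               # both ~0.5
```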
And then your meta-meta-probability has to be about something distinct from the meta-probability! But now I'm sort of scratching my head about what that something is. Maybe "correct value of some parameter in a model of my reasoning about a physical model of the coin?"
In the setup of the question you caused my type checker to crash and so I'm not giving an answer to the math itself so much as talking about the choices I think you might need to make to get the question to type check for me...
Here is the main offending bit:
>So I... attach beliefs to statements of the form "my initial degree of belief is represented with probability density function $f$."
>Well this is not quite possible since the set of all such $f$ is uncountable. However, something similar to the probability density trick we use for continuous variables should do the job here as well.
When you get down into the foundations of math and epistemology it is useful to notice when you're leaping across the entire conceptual universe in question in single giant bounds.
(You can of course, do this, but then to ask "where would I be heading if I kept going like this?" means you leave the topic, or bounce off the walls of your field, or become necessarily interdisciplinary, or something like that.)
When you "attach beliefs to statements" you might be attaching them to string literals (where you might have logical uncertainty about whether they are even syntactically valid), or maybe you're attaching to the semantic sense (Frege's Sinn) that you currently impute to those string literals? Or maybe to the semantic sense that you WILL impute to those string literals eventually? Or to the sense that other people who are better at thinking will impute?
...or maybe are you really attaching beliefs to possible worlds (that is, various logically possible versions of the totality of what Frege's Bedeutung are embedded within) that one or another of those "senses" points at (refers to) and either "rules in or rules out as true" under a correspondence theory of truth...
...or maybe something else? There's lots of options here!
When I search for [possible worlds foundations bayes] the best of the first couple of hits is from a team trying to deploy modal logics: The Modal Logic of Bayesian Belief Revision (2017).
When I search for [bayesian foundations in event spaces] there's a weird new paper struggling with fuzzy logic (which is known to make Bayesian reasoning explode, because fuzzy logic violates the law of the excluded middle): Pedro Teran's 2023 "Towards objective Bayesian foundations with fuzzy events" seems to have found some sort of (monstrous?) alternative to Bayes that doesn't work totally the same way?
Basically, there's a lot of flexibility in how you ground axioms in things that seem like they could be realized in physics (or maybe merely "realized" in lower-level, intuitively accessible axioms).
Using my default assumptions, my type checker crashed on what you said because all of the ways I could think to ground some of what you said in a coherent way... lead to incoherence based on other things you said.
I was able to auto-correct your example S(f) to something like you having a subjective probability that could be formalized P("As a skilled subjective Bayesian, fryolysis should represent fryolysis's uncertainty about a single stable fair coin's possible mechanical/structural biases that could affect fair tosses with the pdf $f$ after observing $h$ heads out of $n$ tosses of the coin.")
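Read that way, the auto-corrected statement is about an ordinary Beta posterior over the coin's bias; a minimal sketch, with made-up counts for h and n (and assuming scipy is available):

```python
from scipy.stats import beta

h, n = 7, 10   # made-up example counts of heads and tosses

# Under a uniform prior over the bias, the pdf after h heads in n
# tosses is Beta(h + 1, n - h + 1); this plays the role of f above.
f = beta(h + 1, n - h + 1)
print(f.mean())     # (h + 1) / (n + 2), Laplace's rule of succession
print(f.pdf(0.5))   # density this belief puts near "fair"
```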
But then, for your example S(f), you claimed the set of such statements was uncountable!?
But... you said statements, right?
And so each S(f) (at least if you actually say what the f is using symbols) can be turned into a Gödel number, and Gödel numbers are COUNTABLY infinite, similarly to (and for very similar reasons as) the algebraic numbers.
One of the main ideas with algebraic numbers is that they don't care that they point to specific things hiding in an uncountable infinity. Just because the real neighborhood of π (or "pi" for the search engines) is uncountable doesn't make π itself any less finitely describable. We can point to π in a closed and finite way, and since each pointing method is a finite string of symbols, the pointing methods (tautologically)... are countable!
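If it helps, here is a minimal sketch of why finite describability implies countability (the alphabet and the length-then-lexicographic ordering are arbitrary choices of mine):

```python
from itertools import count, product

ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789()^*/+-=. "

def all_strings():
    """List every finite string over ALPHABET: all length-1 strings,
    then all length-2 strings, and so on.  Being listable like this is
    exactly what 'countable' means, so any statement S(f) that can be
    written down in finitely many symbols sits at some finite index."""
    for length in count(1):
        for chars in product(ALPHABET, repeat=length):
            yield "".join(chars)

for i, s in zip(range(5), all_strings()):
    print(i, repr(s))
```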
You said (1) it was statements you were "attaching" probabilities to but then you said (2) there were uncountably many statements to handle.
I suspect you can only be in reflective equilibrium about at most one of these claims (and maybe neither claim will survive you thinking about this for an adequately long time).
This is being filed as an "Answer" instead of a "Comment" because I am pointing to some of the nearby literature, and maybe that's all you wanted? <3
>Suppose that I have a coin with probability of heads p. I certainly know that p is fixed and does not change as I toss the coin. I would like to express my degree of belief in p and then update it as I toss the coin.
It doesn't change, because as you said, you "certainly know" that p is fixed and you know the value of p.
So if you would like to express your degree of belief in p, it's just p.
>But let's say I'm a super-skeptic guy that avoids accepting any statement with certainty, and I am aware of the issue of parametrization dependence too.
In that case use Bayes' Theorem to update your beliefs about p. Presumably there will be no change, but there's always going to be at least a tiny chance that you were wrong and your prior needs to be updated.
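For concreteness, a minimal sketch of such an update (the two hypotheses and every number here are invented for illustration):

```python
# A minimal Bayes update over two coin hypotheses.
prior = {"fair": 0.99, "biased": 0.01}          # P(hypothesis)
p_heads = {"fair": 0.5, "biased": 0.9}          # P(heads | hypothesis)

def update(belief, heads):
    """One application of Bayes' theorem after a single flip."""
    post = {h: belief[h] * (p_heads[h] if heads else 1 - p_heads[h])
            for h in belief}
    z = sum(post.values())
    return {h: p / z for h, p in post.items()}

belief = prior
for _ in range(10):            # ten heads in a row
    belief = update(belief, True)
print(belief)                  # "biased" has gained a lot of ground
```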
>Knowing (or assuming) that the value of p does not change between experiments is a different kind of knowledge than knowing the value of p.
OK. But if you yourself state that you "certainly know" -- certainly -- that p is fixed, then you have already accounted for that particular item of knowledge.
If you do not, in fact, "certainly know" the value of p -- as could easily be the case if you picked up a coin in a mafia-run casino or whatever -- then your prior should be 0.5, but you should also be prepared to update that value according to Bayes' Theorem.
I see that you are gesturing towards also assigning a probability that the coin is a fair coin (or, generally, a coin whose p has a certain value). That is also amenable to Bayes' Theorem in the normal way. Your prior might be based on how common biased coins are amongst the general population of coins, or on a rough guess about how many you think you might find in a mafia-run casino. In any case, your prior will become increasingly irrelevant the more times you flip the coin. So I don't think you need to be too concerned about how nebulous that prior and its origins are!
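A quick sketch of why the prior washes out (both priors and the observed counts are invented for illustration; this assumes scipy is available):

```python
from scipy.stats import beta

# Two very different Beta priors over p: a trusting one and a
# suspicious mafia-casino one (parameters are illustrative).
priors = {"trusting": (50, 50), "suspicious": (1, 5)}

h, n = 620, 1000   # hypothetical data: 620 heads in 1000 flips

for name, (a, b) in priors.items():
    post = beta(a + h, b + n - h)
    print(name, round(post.mean(), 3))
# Both posterior means land close to the empirical frequency 0.62,
# despite the very different priors.
```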
Suppose that I have a coin with probability of heads p. I certainly know that p is fixed and does not change as I toss the coin. I would like to express my degree of belief in p and then update it as I toss the coin.
Using a constant pdf to model my initial belief, the problem becomes a classic one, and it turns out that my belief in $p$ should be expressed with the pdf $f(x) = (n+1)\binom{n}{h}x^h(1-x)^{n-h}$ after observing $h$ heads out of $n$ tosses. That's fine.
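(To spell out the classic step: with a constant prior, Bayes' theorem with the binomial likelihood gives

$$f(x) = \frac{\binom{n}{h}x^h(1-x)^{n-h}\cdot 1}{\int_0^1 \binom{n}{h}t^h(1-t)^{n-h}\,dt} = (n+1)\binom{n}{h}x^h(1-x)^{n-h},$$

using the Beta-function identity $\int_0^1 t^h(1-t)^{n-h}\,dt = \frac{h!\,(n-h)!}{(n+1)!}$; this is exactly the Beta$(h+1,\,n-h+1)$ density.)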
But let's say I'm a super-skeptic guy that avoids accepting any statement with certainty, and I am aware of the issue of parametrization dependence too. So I dislike this solution and instead choose to attach beliefs to statements of the form S(f)= "my initial degree of belief is represented with probability density function f."
Well, this is not quite possible since the set of all such f is uncountable. However, something similar to the probability density trick we use for continuous variables should do the job here as well. After observing some heads and tails, each initial belief function will be updated just as we did before, which will create a new, uneven "density" distribution over S(f). When I want to express my belief that p is between the numbers a and b, instead of a single definite number I now have a whole probability density function: one definite number from each (updated) prior. Now I can use the mean of this function to express my guess, and I can even be skeptical about my own belief!
This first meta level is still somewhat manageable: I computed Var(μ) = 1/12 for the initial uniform density over S(f), where μ is the mean of a particular f. I am not sure whether my approach is correct, though. Since the domain of each f is bounded, I discretize this domain and represent the uniform density over S(f) as a finite collection of continuous random variables whose joint density is constant, and then take the limit as the discretization becomes infinitely fine.
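To make that discretization executable, here is a minimal sketch; I take "constant joint density over the discretized f values" to mean the uniform (Dirichlet(1, ..., 1)) distribution over normalized weight vectors, which is only one of several inequivalent readings:

```python
import numpy as np

rng = np.random.default_rng(0)

def var_of_means(k, n_samples=200_000):
    """Discretize [0, 1] into k bins and draw random pdfs with constant
    joint density over the normalized weight vectors, i.e. from
    Dirichlet(1, ..., 1); return the sample variance of their means."""
    centers = (np.arange(k) + 0.5) / k               # bin midpoints
    w = rng.dirichlet(np.ones(k), size=n_samples)    # one random pdf per row
    mu = w @ centers                                 # mean of each random pdf
    return mu.var()

for k in (2, 4, 16, 64):
    print(k, var_of_means(k))
```

On this particular reading the variance of μ shrinks as k grows instead of settling at 1/12, so the answer really does depend on which "uniform" is chosen; that is the parametrization-dependence worry resurfacing.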
The whole thing may not make sense at all. I'm just curious what would happen if we used ever deeper meta levels, with the outermost level being the uniform "thing". Is there any math literature that has already explored something similar to this idea, maybe the use of probability theory in higher-order logics?
Edit 1:
Let me rephrase my question in a more formal way so that everything becomes more clear.
Let $S_1 = (\Omega_1, E_1, P_1)$ be our first probability space, where $\Omega_1$ is the sample space coming from our original problem, $E_1$ is the set of events considered (which must satisfy the rules for being a $\sigma$-algebra) and $P_1$ is the probability measure.
First of all, for full generality let us choose $E_i = 2^{\Omega_i}$ for all $i$, that is, the set of all subsets of the sample space is our event set. Such an $E_i$ is always a $\sigma$-algebra for any $\Omega_i$.
Now let me define $\Omega_{i+1}$ to be the set of all possible probability measures $P_i : 2^{\Omega_i} \to [0,1]$, for all $i$. Note that $\Omega_{i+1}$ depends only on $\Omega_i$.
Let $S_n = (\Omega_n, 2^{\Omega_n}, P_n)$ be the $n$th probability space, where $\Omega_n$ is constructed, eventually, from $\Omega_1$. The final missing ingredient is $P_n$; we would like it to be a "uniform" probability measure in some sense.
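To make the sought-after "uniform" $P_n$ concrete at the lowest meta-level, here is a minimal finite sketch (the particular grid of measures is a toy discretization of mine, not part of the construction):

```python
from fractions import Fraction

# Level 1: the original sample space, a single coin toss.
omega1 = ("H", "T")

# Level 2, discretized: a measure on omega1 is fixed by p = P(H), so
# stand in for the uncountable Omega_2 with a small grid of measures.
k = 5
omega2 = [{"H": Fraction(i, k - 1), "T": 1 - Fraction(i, k - 1)}
          for i in range(k)]

# A "uniform" P2: equal weight on each grid measure.
p2 = [Fraction(1, k)] * k

# Collapsing one meta-level: the probability of the level-1 event {H}
# is the P2-weighted average over the grid.
prob_H = sum(w * m["H"] for w, m in zip(p2, omega2))
print(prob_H)   # 1/2, by the symmetry of the grid
```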
After we invent some nice "uniform" $P_n$, I plan to use this construct $\{S_i\}_{i=1}^{n}$ as follows: an event $e_n \in 2^{\Omega_n}$ occurs with probability $P_n(e_n)$; such an event is just a set of probability measures, all belonging to the $(n-1)$st level. Now we use each of these measures to create a set of probability spaces: $\{(\Omega_{n-1}, 2^{\Omega_{n-1}}, P) \mid P \in e_n\}$.
Then for each of these spaces an event $e_{n-1}$ occurs with probability determined by the probability measure of that space, and so on. A tree will be created whose leaves are elements of $2^{\Omega_1}$, the events of our original problem.
Now, the same element of $2^{\Omega_1}$ can appear more than once among the leaves of this tree, so to compute the total probability that an event $e_1 \in 2^{\Omega_1}$ occurs, we should add up the probabilities of all paths, as in the finite sketch below. The depth of the tree is finite, but the number of branches spawned at each level may not even be countable, which seems to be a dead end for our journey.
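Here is a minimal sketch of that path sum in a finitely discretized tower (all sizes and measures are arbitrary stand-ins): once every level is finite, adding up the probabilities of all root-to-leaf paths is just a chain of matrix products, so the tree collapses to a single ordinary measure on $\Omega_1$:

```python
import numpy as np

rng = np.random.default_rng(1)

def random_stochastic(rows, cols):
    """Each row is a probability measure over the objects one level
    down (here: arbitrary random stand-ins for the real measures)."""
    m = rng.random((rows, cols))
    return m / m.sum(axis=1, keepdims=True)

d = 2                # |Omega_1|, e.g. {H, T}
sizes = [4, 3, 5]    # how many measures are kept at levels 2, 3, 4

# Level 2: each of sizes[0] measures is a distribution over Omega_1;
# each higher level mixes the measures of the level below it.
levels = [random_stochastic(sizes[0], d)]
for lo, hi in zip(sizes, sizes[1:]):
    levels.append(random_stochastic(hi, lo))

# Root: one "uniform" measure over the topmost level.
root = np.full(sizes[-1], 1.0 / sizes[-1])

# Summing probabilities over all root-to-leaf paths is exactly this
# chain of matrix products (iterated mixing):
collapsed = root
for level in reversed(levels):
    collapsed = collapsed @ level

print(collapsed, collapsed.sum())   # an ordinary measure on Omega_1
```

The collapse also suggests where the interesting content has to live: once each level's "uniform" measure is fixed, the whole tower is equivalent to a single ordinary mixture over $\Omega_1$.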
Additional constraints may mitigate this problem which I plan to explore in a later edit.