Here are some brief reasons why I dislike things like imprecise probabilities and maximality rules (somewhat strongly stated, medium-strongly held because I've thought a significant amount about this kind of thing, but unfortunately quite sloppily justified in this comment; also, sorry if some things below approach being insufficiently on-topic):
Thanks for the detailed answer! I won't have time to respond to everything here, but:
I like the canonical arguments for Bayesian expected utility maximization ( https://www.alignmentforum.org/posts/sZuw6SGfmZHvcAAEP/complete-class-consequentialist-foundations ; also https://web.stanford.edu/~hammond/conseqFounds.pdf seems cool, though I haven't read it properly). I've never seen anything remotely close for any of this other stuff.
But the complete class theorem (CCT) only says that if you satisfy [blah], your policy is consistent with precise EV maximization. This do...
My initial impulse is to treat imprecise probabilities like I treat probability distributions over probabilities: namely, I am not permanently opposed, but have promised myself that before I resort to one, I would first try a probability and a set of "indications" about how "sensitive" my probability is to changes: e.g., I would try something like
My probability is .8, but with p = .5, it would change by at least a factor of 2 (more precisely, my posterior odds would end up outside the interval [.5,2] * my prior odds) if I were to spend 8 hours pondering the question in front of a computer with an internet connection; also with p = .25, my probability a year in the future will differ from my current probability by at least a factor of 2 even if I never set aside any time to ponder the question.
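To make the odds arithmetic concrete, here's a small sketch of what "change by at least a factor of 2" (in odds) means for a probability of .8; the helper names and the printed band are my own illustration, not anything from the comment above:

```python
# Odds arithmetic for the resilience statement above. The factor-of-2
# criterion is on odds, not on the probability itself.
def odds(p):
    return p / (1 - p)

def prob(o):
    return o / (1 + o)

prior_p = 0.8
prior_odds = odds(prior_p)            # 4.0

# "Would change by at least a factor of 2" = posterior odds land outside
# [0.5, 2] * prior odds, i.e. outside [2.0, 8.0] here.
lo, hi = 0.5 * prior_odds, 2.0 * prior_odds
print(prob(lo), prob(hi))             # ~0.667 to ~0.889 is the "no big change" band
```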
I agree that higher-order probabilities can be useful for representing (non-)resilience of your beliefs. But imprecise probabilities go further than that — the idea is that you just don't know what higher-order probabilities over the first-order ones you ought to endorse, or the higher-higher-order probabilities over those, etc. So the first-order probabilities remain imprecise.
Sets of distributions are the natural elements of Bayesian reasoning: each distribution corresponds to a hypothesis. Some people pretend that you can collapse these down to a single distribution by some prior (and then argue about "correct" priors), but the actual machinery of Bayesian reasoning produces changes in relative hypothesis weightings. Those can be applied to any prior if you have reason to prefer a single one, or simply composed with future relative changes if you don't.
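A tiny sketch of what I mean by composing relative changes, with made-up numbers (the likelihood ratios here are purely illustrative):

```python
# Bayesian updating produces relative changes in hypothesis weights
# (likelihood ratios). These compose across evidence without ever
# committing to a prior, and can be applied to any prior later.
import numpy as np

ratios_e1 = np.array([1.0, 3.0, 0.5])   # relative update from evidence E1
ratios_e2 = np.array([2.0, 1.0, 1.0])   # relative update from later evidence E2
combined = ratios_e1 * ratios_e2         # composed, still prior-free

for prior in (np.ones(3) / 3, np.array([0.5, 0.25, 0.25])):
    posterior = prior * combined
    print(posterior / posterior.sum())   # apply to whichever prior you prefer
```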
Partially ordering options by EV over all hypotheses is likely to be a very weak order with nearly all options being incomparable (and thus permissible). However, it's quite reasonable to have bounds on hypothesis weightings even if you don't have good reason to choose a specific prior.
You can use prior bounds to form much stronger partial orders in many cases.
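Here's a minimal sketch of that with made-up numbers (three hypotheses, two options, and some assumed bounds on the hypothesis weights); the point is just that the bounded set of weightings can make A strictly preferred even though A and B are incomparable over the full simplex:

```python
# Bounds on hypothesis weightings can turn "incomparable under every prior"
# into a strict preference. All numbers here are illustrative assumptions.
import numpy as np
from scipy.optimize import linprog

# Expected value of each option under each of three hypotheses.
ev = {
    "A": np.array([10.0, 2.0, 5.0]),
    "B": np.array([4.0, 3.0, 4.0]),
}

# Without constraints, A vs B is incomparable: A wins under hypothesis 1,
# B wins under hypothesis 2. Now suppose we at least have bounds on the
# hypothesis weights (which must still sum to 1).
bounds = [(0.3, 0.6), (0.0, 0.2), (0.2, 0.5)]

def strictly_preferred(x, y):
    """True iff EV(x) > EV(y) for every weighting consistent with the bounds.

    EV is linear in the weights, so minimize EV(x) - EV(y) over the
    constraint polytope with a small linear program.
    """
    c = ev[x] - ev[y]                      # minimize sum_i w_i * c_i
    res = linprog(c, A_eq=[np.ones_like(c)], b_eq=[1.0],
                  bounds=bounds, method="highs")
    return res.fun > 0

print(strictly_preferred("A", "B"))  # True under these (assumed) bounds
print(strictly_preferred("B", "A"))  # False
```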
For humans (and probably generally for embedded agents), I endorse acknowledging that probabilities are a wrong but useful model. For any given prediction, the possibility set is incomplete, and the weights are only estimations with lots of variance. I don't think that a set of distributions fixes this, though in some cases it can capture the model variance better than a single summary can.
EV maximization can only ever be an estimate. No matter HOW you come up with your probabilities and beliefs about value-of-outcome, you'll be wrong fairly often. But that doesn't make it useless - there's no better legible framework I know of. Illegible frameworks (heuristics embedded in the giant neural network in your head) are ALSO useful, and IMO best results come from blending intuition and calculation, and from being humble and suspicious when they diverge greatly.
A couple years ago, my answer would have been that both imprecise probabilities and maximality seem like ad-hoc, unmotivated methods which add complexity to Bayesian reasoning for no particularly compelling reason.
I was eventually convinced that they are useful and natural, specifically in the case where the environment contains an adversary (or the agent in question models the environment as containing an adversary, e.g. to obtain worst-case bounds). I now think of that use-case as the main motivation for the infra-Bayes framework, which uses imprecise probabilities and maximization as central tools. More generally, the infra-Bayes approach is probably useful for environments containing other agents.
Thanks! Can you say a bit about why you find the kinds of motivations discussed in (edit: changed reference) Sec. 2 here ad hoc and unmotivated, if you're already familiar with them (no worries if not)? (I would at least agree that rationalizing people's intuitive ambiguity aversion is ad hoc and unmotivated.)
I think this quote nicely summarizes the argument you're asking about:
Not only do we not have evidence of a kind that allows us to know the total consequences of our actions, we seem often to lack evidence of a kind that warrants assigning precise probabilities to relevant states.
This, I would say, sounds like a reasonable critique if one does not really get the idea of Bayesianism. Like, if I put myself in a mindset where I'm only allowed to use probabilities when I have positive evidence which "warrants" those precise probabilities, then sure, it's a reasonable criticism. But a core idea of Bayesianism is that we use probabilities to represent our uncertainties even in the absence of evidence; that's exactly what a prior is. And the point of all the various arguments for Bayesian reasoning is that this is a sensible and consistent way to handle uncertainty, even when the available evidence is weak and we're mostly working off of priors.
As a concrete example, I think of Jaynes' discussion of the widget problem (pg 440 here): one is given some data on averages of a few variables, but not enough to back out the whole joint distribution of the variables from the data, and then various decision/inference problems are posed. This seems like exactly the sort of problem the quote is talking about. Jaynes' response to that problem is not "we lack evidence which warrants assigning precise probabilities", but rather, "we need to rely on priors, so what priors accurately represent our actual state of knowledge/ignorance?".
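For a toy version of that move (my own illustration, not Jaynes' actual widget numbers): given only a constraint on an average, the Jaynesian answer is the maximum-entropy distribution consistent with it, rather than a refusal to assign probabilities.

```python
# Maximum-entropy prior for a variable taking values 1..5, given only that
# its mean is 2.0. The values and target mean are assumed for illustration.
import numpy as np
from scipy.optimize import brentq

values = np.arange(1, 6)
target_mean = 2.0

def mean_given_beta(beta):
    # The maxent distribution under a mean constraint is exponential
    # in the value: p_i proportional to exp(-beta * x_i).
    w = np.exp(-beta * values)
    p = w / w.sum()
    return p @ values

# Solve for the Lagrange multiplier that hits the target mean.
beta = brentq(lambda b: mean_given_beta(b) - target_mean, -10, 10)
w = np.exp(-beta * values)
p = w / w.sum()
print(np.round(p, 3), p @ values)   # maxent probabilities, mean ~ 2.0
```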
Point is: for a Bayesian, the point of probabilities is to accurately represent an agent's epistemic state. Whether the probabilities are "warranted by evidence" is a non sequitur.
we need to rely on priors, so what priors accurately represent our actual state of knowledge/ignorance?
Exactly — and I don't see how this is in tension with imprecision. The motivation for imprecision is that no single prior seems to accurately represent our actual state of knowledge/ignorance.
"No single prior seems to accurately represent our actual state of knowledge/ignorance" is a really ridiculously strong claim, and one which should be provable/disprovable by starting from some qualitative observations about the state of knowledge/ignorance in question. But I've never seen someone advocate for imprecise probabilities by actually making that case.
Let me illustrate a bit how I imagine this would go, and how strong a case would need to be made.
Let's take the simple example of a biased coin with unknown bias. A strawman imprecise-probabilist might argue something like: "If the coin has probability p of landing heads, then after N flips (for some large-ish N) I expect to see roughly p*N (plus or minus roughly sqrt(N)) heads. But for any particular value of p, that's not actually what I expect a-priori, because I don't know which p is right - e.g. I don't actually confidently expect to see roughly p*N heads for any specific p a priori. Therefore no distribution can represent my state of knowledge.".
... and then the obvious Bayesian response would be: "Sure, if you're artificially restricting your space of distributions/probabilistic models to IID distributions of coin flips. But our actual prior is not in that space; our actual prior involves a latent variable (the bias), and the coin flips are not independent if we don't know the bias (since seeing one outcome tells us something about the bias, which in turn tells us something about the other coin flips). We can represent our prior state of knowledge in this problem just fine with a distribution over the bias.".
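A minimal sketch of that latent-variable point (the Beta(1,1) prior is just an example choice): once the bias is uncertain, the flips are exchangeable but not independent, since observing one head shifts the predictive probability of the next.

```python
# Beta-Bernoulli model: a prior over the unknown bias induces dependence
# between flips, even though flips are independent *given* the bias.
a, b = 1.0, 1.0                            # Beta(1, 1) = uniform prior over the bias

p_first_heads = a / (a + b)                # predictive P(heads) before any data: 0.5

a_post, b_post = a + 1, b                  # conjugate update after observing one head
p_next_heads = a_post / (a_post + b_post)  # predictive P(heads) now: 2/3

print(p_first_heads, p_next_heads)         # 0.5 -> 0.666..., so flips are not independent
```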
Now, the imprecise probabilist could perhaps argue against that by pointing out some other properties of our state of knowledge, and then arguing that no distribution can represent our prior state of knowledge over all the coin flips, no matter how much we introduce latent variables. But that's a much stronger claim, a much harder case to make, and I have no idea what properties of our state of knowledge one would even start from in order to argue for it. On the other hand, I do know of various sets of properties of our state-of-knowledge which are sufficient to conclude that it can be accurately represented by a single prior distribution - e.g. the preconditions of Cox' Theorem, or the preconditions for the Dutch Book theorems (if our hypothetical agent is willing to make bets on its priors).
really ridiculously strong claim
What's your prior that in 1000 years, an Earth-originating superintelligence will be aligned to object-level values close to those of humans alive today [for whatever operationalization of "object-level" or "close" you like]? And why do you think that prior uniquely accurately represents your state of knowledge? Seems to me like the view that a single prior does accurately represent your state of knowledge is the strong claim. I don’t see how the rest of your comment answers this.
(Maybe you have in mind a very different conception of “represent” or “state of knowledge” than I do.)
Right, so there's room here for a burden-of-proof disagreement - i.e. you find it unlikely on priors that a single distribution can accurately capture realistic states-of-knowledge, while I don't find it unlikely on priors.
If we've arrived at a burden-of-proof disagreement, then I'd say that's sufficient to back up my answer at top-of-thread:
both imprecise probabilities and maximality seem like ad-hoc, unmotivated methods which add complexity to Bayesian reasoning for no particularly compelling reason.
I said I don't know of any compelling reason - i.e. positive argument, beyond just "this seems unlikely to Anthony and some other people on priors" - to add this extra piece to Bayesian reasoning. And indeed, I still don't. Which does not mean that I necessarily expect you to be convinced that we don't need that extra piece; I haven't spelled out a positive argument here either.
It's not that I "find it unlikely on priors" — I'm literally asking what your prior on the proposition I mentioned is, and why you endorse that prior. If you answered that, I could answer why I'm skeptical that that prior really is the unique representation of your state of knowledge. (It might well be the unique representation of the most-salient-to-you intuitions about the proposition, but that's not your state of knowledge.) I don't know what further positive argument you're looking for.
Someone could fail to report a unique precise prior (and one that's consistent with their other beliefs and priors across contexts) for any of the following reasons, which seem worth distinguishing:
I'd be inclined to treat all three cases like imprecise probabilities, e.g. I wouldn't permanently commit to a prior I wrote down to the exclusion of all other priors over the same events/possibilities.
What use case are you intending these for? Any given use of probabilities I think depends on what you're trying to do with them, and how long it makes sense to spend fleshing them out.
Predicting the long-term future, mostly. (I think imprecise probabilities might be relevant more broadly, though, as an epistemic foundation.)
An alternative to always having a precise distribution over outcomes is imprecise probabilities: You represent your beliefs with a set of distributions you find plausible.
And if you have imprecise probabilities, expected value maximization isn't well-defined. One natural generalization of EV maximization to the imprecise case is maximality:[1] You prefer A to B iff EV_p(A) > EV_p(B) with respect to every distribution p in your set. (You're permitted to choose any option that you don't disprefer to something else.)
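For concreteness, here's a minimal sketch of Maximality with made-up numbers (three options and a two-distribution credal set); it just checks the definition above directly:

```python
# Maximality: an option is permissible iff no other option beats it under
# *every* distribution in the set. Numbers are illustrative assumptions.
import numpy as np

# Expected value of each option under each distribution in the set.
ev = {
    "A": np.array([5.0, 1.0]),
    "B": np.array([4.0, 0.5]),
    "C": np.array([2.0, 3.0]),
}

def preferred(x, y):
    """x is preferred to y iff EV_p(x) > EV_p(y) for every distribution p."""
    return bool(np.all(ev[x] > ev[y]))

permissible = [y for y in ev if not any(preferred(x, y) for x in ev if x != y)]
print(permissible)  # ['A', 'C']: B is dispreferred to A under both distributions
```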
If you don’t endorse either (1) imprecise probabilities or (2) maximality given imprecise probabilities, I’m interested to hear why.
I think originally due to Sen (1970); just linking Mogensen (2020) instead because it's non-paywalled and easier to find discussion of Maximality there.