Infrabayesianism seems to me (Abram) like a very promising framework for addressing at least some of the problems of AI alignment.
- Like logical induction, it solves the realizability problem, creating an epistemic theory suitable for embedded agents.
- Unlike logical induction, and unlike standard Bayesian decision theory, it presents a theory of epistemics directly relevant to proving decision-theoretic results (in particular, useful learning-theoretic guarantees). Logical induction and standard Bayesian decision theories can both produce meaningful loss-bounding guarantees with respect to predictive error, but bounding decision error appears challenging for these approaches. InfraBayes provides a systematic way around this problem. Since decision error is much more meaningful for bounding risk, this seems highly relevant to AI safety.
- Being a new perspective on very basic issues, Infrabayesianism (or perhaps successors to the theory) may turn out to shed light on a number of other important questions.
(For more information on InfraBayes, see the infrabayesianism sequence.)
However, infrabayesianism appears to have a communication problem. I've chatted with several people who have strongly "bounced off" the existing write-ups. (I'm tempted to conclude this is a near-universal experience.)
There was even a post asking whether a big progress write-up -- applying InfraBayes to naturalized induction -- had simply fallen through the cracks.
Personally, even though I've carefully worked through the first three posts and revisited my notes to study them more than once, I am still not fluent enough to confidently apply the concepts in my own work when they seem relevant.
I would like to change this situation if possible. It's not obvious what the best solution is, but it seems like it could be possible to find someone who can help.
Properties which would make an applicant interesting:
- Must be capable of fully understanding the mathematics.
  - See the sequence to get an idea of what kind of mathematics is involved; mainly topology, functional analysis, measure theory, and convex analysis. Background in reinforcement learning theory is a bonus.
- Must have good judgement when it comes to math exposition.
- Must be a good editor.
Details of the job description are to be worked out, but probable activities include: producing independent write-ups that re-explain InfraBayes from the ground up in a more accessible way; assisting with the creation of a textbook and exercise sheet; and editing and writing up additional posts.
(Even if not applying, discussion in the comments about possible ways to approach this bottleneck may be fruitful!)
The reason you can't just update all the distributions in the set is that it wouldn't be dynamically consistent. That is, planning ahead what to do in every contingency, versus updating and then acting accordingly, would produce different policies.
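To make the failure concrete, here is a minimal numeric sketch (an illustrative example of my own construction, not taken from the original discussion). An agent with a crisp set of two priors maximizes worst-case expected payoff. Planning ahead, nature gets to pick only one worst-case prior for the whole policy; if the agent instead conditions each prior pointwise and re-optimizes in each branch, the worst case is picked separately per branch, and the resulting behavior differs:

```python
import itertools

# Four states (w1, w2, w3, w4); the agent observes whether the state
# lies in the event E = {w1, w2}. Crisp belief set: two priors.
P = (0.4, 0.1, 0.1, 0.4)
Q = (0.1, 0.4, 0.4, 0.1)
priors = [P, Q]

# In each branch the agent picks "risky" (pays 1 on the branch's first
# state, 0 on the second) or "safe" (pays 0.3 on both).
payoffs = {
    "risky": {"E": (1.0, 0.0), "notE": (1.0, 0.0)},
    "safe":  {"E": (0.3, 0.3), "notE": (0.3, 0.3)},
}

def policy_payoff_vector(act_E, act_notE):
    """Payoff in each of the four states under a full contingency plan."""
    pE = payoffs[act_E]["E"]
    pN = payoffs[act_notE]["notE"]
    return (pE[0], pE[1], pN[0], pN[1])

def worst_case_eu(vec):
    """Maximin expected utility: worst case over the belief set."""
    return min(sum(p * v for p, v in zip(prior, vec)) for prior in priors)

# 1) Dynamically consistent planning: optimize the whole policy ex ante.
plans = list(itertools.product(["risky", "safe"], repeat=2))
best_plan = max(plans, key=lambda pl: worst_case_eu(policy_payoff_vector(*pl)))
print("ex-ante optimal plan:", best_plan,
      "value:", worst_case_eu(policy_payoff_vector(*best_plan)))

# 2) Naive updating: condition each prior on the observed branch,
#    then re-optimize with the worst case taken per branch.
def condition(prior, branch):
    if branch == "E":
        mass = prior[0] + prior[1]
        return (prior[0] / mass, prior[1] / mass)
    mass = prior[2] + prior[3]
    return (prior[2] / mass, prior[3] / mass)

def branch_choice(branch):
    def wc(act):
        vec = payoffs[act][branch]
        return min(sum(p * v for p, v in zip(condition(pr, branch), vec))
                   for pr in priors)
    return max(["risky", "safe"], key=wc)

print("after naive update, choice in E:", branch_choice("E"))
print("after naive update, choice in not-E:", branch_choice("notE"))
```

The planner accepts both risky bets (worst-case value 0.5), while the branch-by-branch updater retreats to the safe option in every branch (value 0.3): the same belief set, updated naively, yields a different policy.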
The correct update rule actually does appear in the literature (Gilboa and Schmeidler 1993). They don't introduce any of our dual formalisms of a-measures and nonlinear functionals, instead just viewing beliefs as orders on actions, but the result is equivalent. So, our main novelty is really combining imprecise probability with reinforcement learning theory (plus consequences such as FDT-like behavior, and extensions such as physicalism), rather than the update rule itself (even though our formulation of the update rule has some advantages).
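Since the discussion above and below leans on the "crisp" case and on the "nonlinear functionals" view, a compact statement of that case may help (this is a paraphrase of definitions from the sequence, not anything new in this comment):

```latex
% A crisp infradistribution on a space X is a closed convex set
% \Xi \subseteq \Delta X of probability distributions. Its expectation
% functional takes the worst case over the set:
\[
  \mathbb{E}_{\Xi}[f] \;=\; \inf_{p \in \Xi} \mathbb{E}_{p}[f],
\]
% which is concave rather than linear in f -- one instance of the
% "nonlinear functionals" mentioned above. Fully general
% infradistributions allow a wider class of such concave functionals.
```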
I'm not sure the part about "update rule was necessary" is true. Having a nice update rule is nice, but in practice it seems more important to have nice learning algorithms, which is something I have only begun to work on.[1] As to what kind of infradistributions we actually need (on the range between crisp and fully general), it's not clear. Physicalism seems to work better with cohomogeneous than with crisp, but the inroads into learning suggest affine infradistributions, which are even narrower than crisp. In infra-Bayesian logic, the two have different advantages (cohomogeneous admits continuous conjunction; affine might admit efficient algorithms). Maybe some synthesis is possible, but at present I don't know.
See this for some initial observations. Since then, I arrived at regret bounds for stochastic linear affine bandits (both Õ(√n) for the general case and Õ(log n) for the gap case, given an appropriate definition of "gap") with a UCB-type algorithm. In addition, there is Tian et al. 2020, which is stated as studying zero-sum games but can be viewed as a regret bound for infra-MDPs. ↩︎
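For orientation, the "UCB-type algorithm" family referenced in the footnote follows the same template as the classical UCB1 algorithm for ordinary stochastic bandits, sketched below. This is the standard textbook algorithm, not the infra-Bayesian variant itself (whose details are not given here); the Õ(√n) general-case and Õ(log n) gap-case regimes mirror the usual gap-independent and gap-dependent regret regimes for such algorithms:

```python
import math
import random

def ucb1(arm_means, horizon, seed=0):
    """Standard UCB1 on Bernoulli arms: pull the arm maximizing
    empirical mean + confidence radius sqrt(2 ln t / n_pulls)."""
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k
    sums = [0.0] * k

    def pull(i):
        r = 1.0 if rng.random() < arm_means[i] else 0.0
        counts[i] += 1
        sums[i] += r
        return r

    total = 0.0
    for i in range(k):          # play each arm once to initialize
        total += pull(i)
    for t in range(k, horizon):
        ucb = [sums[i] / counts[i] + math.sqrt(2 * math.log(t + 1) / counts[i])
               for i in range(k)]
        total += pull(max(range(k), key=lambda i: ucb[i]))

    # Realized regret against always playing the best arm.
    return horizon * max(arm_means) - total

# A gap of 0.2 between the best and second-best arm puts us in the
# gap-dependent (logarithmic-regret) regime.
print("empirical regret:", ucb1([0.5, 0.7], horizon=10_000))
```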