Infrabayesianism seems to me (Abram) like a very promising framework for addressing at least some of the problems of AI alignment.
- Like logical induction, it solves the realizability problem, creating an epistemic theory suitable for embedded agents.
- Unlike logical induction, and unlike standard Bayesian decision theory, it presents a theory of epistemics directly relevant to proving decision-theoretic results (in particular, useful learning-theoretic guarantees). Logical induction and standard Bayesian decision theory can both produce meaningful loss-bounding guarantees with respect to predictive error, but bounding decision error appears challenging for these approaches. InfraBayes provides a systematic way to get around this problem. Since decision error is much more meaningful for bounding risk, this seems highly relevant to AI safety.
- Being a new perspective on very basic issues, Infrabayesianism (or perhaps successors to the theory) may turn out to shed light on a number of other important questions.
(For more information on InfraBayes, see the infrabayesianism sequence.)
However, infrabayesianism appears to have a communication problem. I've chatted with several people who have strongly "bounced off" the existing write-ups. (I'm tempted to conclude that this is a near-universal experience.)
There was even a post asking whether a big progress write-up -- applying InfraBayes to naturalized induction -- had simply fallen through the cracks.
Personally, even though I've carefully worked through the first three posts and revisited my notes to study them more than once, I'm still not fluent enough to confidently apply the concepts in my own work when they seem relevant.
I would like to change this situation if possible. It's not obvious to me what the best solution is, but it seems to me like it could be possible to find someone who can help.
Properties which would make an applicant interesting:
- Must be capable of fully understanding the mathematics.
- See the sequence to get an idea of what kind of mathematics is involved; mainly topology, functional analysis, measure theory and convex analysis. Background in reinforcement learning theory is a bonus.
- Must have good judgement when it comes to math exposition.
- Must be a good editor.
Details of the job description are to be worked out, but probable activities include producing independent write-ups that re-explain InfraBayes from the ground up in a more accessible way, assisting with the creation of a textbook and exercise sheet, and editing or writing up additional posts.
(Even if not applying, discussion in the comments about possible ways to approach this bottleneck may be fruitful!)
Infradistributions are a generalization of sets of probability distributions. Sets of probability distributions are used in "imprecise bayesianism" to represent the idea that we haven't quite pinned down the probability distribution. The most common idea about what to do when you haven't quite pinned down the probability distribution is to reason in a worst-case way about what that probability distribution is. Infrabayesianism agrees with this idea.
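As a toy illustration of that worst-case move (my own example, not from the original posts; all numbers made up): suppose all we know about a coin is that its bias lies somewhere in a small set of candidates, and we score a bet by its worst-case expected payoff over that set.

```python
# Toy illustration: worst-case (infimum) expected payoff over a set of
# candidate distributions -- the basic move behind imprecise bayesianism.
# The candidate biases and payoffs are made up for illustration.

candidate_biases = [0.3, 0.5, 0.7]  # "we haven't quite pinned down the distribution"

def expected_payoff(p_heads, payoff_heads, payoff_tails):
    """Ordinary expected payoff under a single distribution."""
    return p_heads * payoff_heads + (1 - p_heads) * payoff_tails

def worst_case_payoff(biases, payoff_heads, payoff_tails):
    """Score a bet by the worst case over the whole set of distributions."""
    return min(expected_payoff(p, payoff_heads, payoff_tails) for p in biases)

# A bet paying 1 on heads and -1 on tails is scored by its least favorable member:
print(worst_case_payoff(candidate_biases, 1.0, -1.0))  # -0.4, from the 0.3 hypothesis
```

Choosing the action that maximizes this worst-case score is the "reason in a worst-case way" step described above.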
One of the problems with imprecise bayesianism is that no one has come up with a good update rule -- it turns out to be much trickier than it looks. You can't just update all the distributions in the set, because [reasons I'm forgetting]. Part of the reason infrabayes generalizes imprecise bayes is to fix this problem.
So you can think of an infradistribution mostly as a generalization of "sets of probability distributions" which, unlike plain sets of probability distributions, has a good update rule.
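To make "just update all the distributions in the set" concrete, here is a sketch of that naive rule: condition each member of the set separately and collect the posteriors. This is the rule the above says is not good enough; the actual infrabayesian update is different and is not shown here. All numbers are made up.

```python
# Naive credal-set update: condition each member of the set separately.
# This is the rule that turns out NOT to work well; it is sketched only
# to make the idea concrete. The infrabayesian update rule differs from this.

# Two candidate priors over worlds {"A", "B"} (made-up numbers).
credal_set = [
    {"A": 0.9, "B": 0.1},
    {"A": 0.5, "B": 0.5},
]

# Likelihood of the observed evidence in each world (also made up).
likelihood = {"A": 0.2, "B": 0.8}

def bayes_update(prior, likelihood):
    """Ordinary Bayesian conditioning of a single distribution."""
    unnormalized = {w: prior[w] * likelihood[w] for w in prior}
    z = sum(unnormalized.values())
    return {w: v / z for w, v in unnormalized.items()}

naive_posterior_set = [bayes_update(p, likelihood) for p in credal_set]
print(naive_posterior_set)
# Note: each member is normalized separately, so the resulting set no longer
# records how strongly each member predicted the evidence (the constants z are dropped).
```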
Why is this great?
Mainly because "sets of probability distributions" are actually a pretty great idea for decision theory. Regular Bayes has the "realizability" problem: in order to prove good loss bounds, you need to assume the prior is "realizable", which means that one of the hypotheses in the prior is true. For example, with Solomonoff, this amounts to assuming the universe is computable.
Using sets instead, you don't need to have the correct hypothesis in your prior; you only need to have an imprecise hypothesis which includes the correct hypothesis, and "few enough" other hypotheses that you get a reasonably tight bound on loss.
Unpacking that a little more: if the learnability condition is met and the true environment is within one of the imprecise hypotheses in the prior, then we can eventually do as well as an agent who just assumed that particular imprecise hypothesis from the beginning (because we eventually learn that the true world is within that imprecise hypothesis).
This allows us to get good guarantees against non-computable worlds, if they have some computable regularities. Generalizing imprecise probabilities to the point where there's a nice update rule was necessary to make this work.
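As a concrete toy version of "a non-computable world with a computable regularity" (my own illustration, not from the sequence): suppose the environment is an infinite bit sequence whose even-indexed bits are always 0 while its odd-indexed bits are arbitrary, possibly uncomputable. The imprecise hypothesis "even bits are 0, odd bits are anything" contains the true environment, and an agent that only commits to predictions on even bits is guaranteed to be right there no matter what the odd bits do. In the sketch below, the "uncomputable" odd bits are faked with a random source, since runnable code can't produce genuinely uncomputable bits.

```python
import random

# Toy environment: even-indexed bits are always 0 (a computable regularity);
# odd-indexed bits stand in for an arbitrary, possibly uncomputable sequence,
# faked here with randomness purely so the sketch runs.
def environment_bit(i):
    return 0 if i % 2 == 0 else random.randint(0, 1)

# Agent using the imprecise hypothesis "even bits are 0, odd bits are anything":
# it predicts confidently on even bits and declines to predict odd bits.
def agent_prediction(i):
    return 0 if i % 2 == 0 else None  # None means "no prediction"

mistakes = sum(
    1
    for i in range(1000)
    if agent_prediction(i) is not None and agent_prediction(i) != environment_bit(i)
)
print(mistakes)  # always 0: the guarantee holds no matter what the odd bits are
```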
There is currently no corresponding result for logical induction. (I think something might be possible, but there are some onerous obstacles in the way.)