Following the examples of Rob Bensinger and Rohin Shah, this post will try to clarify the aims of part of my research interests, and to preempt some possible misunderstandings about them. (I'm obviously only speaking for myself and not for anyone else doing decision theory research.)
I think decision theory research is useful for:
1. Gaining information about the nature of rationality (e.g., is “realism about rationality” true?) and the nature of philosophy (e.g., is it possible to make real progress in decision theory, and if so, what cognitive processes are we using to do that?), and helping to solve the problems of normativity, meta-ethics, and metaphilosophy.
2. Better understanding potential AI safety failure modes that are due to flawed decision procedures implemented in or by AI.
3. Making progress on various intellectual puzzles that seem important and directly related to decision theory, such as free will, anthropic reasoning, logical uncertainty, and Rob's examples of counterfactuals, updatelessness, and coordination.
4. Firming up the foundations of human rationality.
To me, decision theory research is not meant to:
5. Provide a correct or normative decision theory that will be used as a specification or approximation target for programming or training a potentially superintelligent AI.
6. Help create "safety arguments" that aim to show that a proposed or already existing AI is free of decision-theoretic flaws.
To help explain 5 and 6, here's what I wrote in a previous comment (slightly edited):
One meta level above what even UDT tries to be is decision theory (as a philosophical subject), and one level above that is metaphilosophy, and my current thinking is that it seems bad (potentially dangerous or regrettable) to put any significant (i.e., superhuman) amount of computation into anything except doing philosophy.
To put it another way, any decision theory that we come up with might have some kind of flaw that other agents can exploit, or just a flaw in general, such as in how well it cooperates with, negotiates with, or exploits other agents (which might include how quickly/cleverly it can make the necessary commitments). Wouldn’t it be better to put computation into trying to find and fix such flaws (in other words, into coming up with better decision theories) than into running any particular object-level decision theory, at least until the superhuman philosophical computation itself decides to start doing the latter?
Comparing my current post to Rob's post on the same general topic, my mentions of 1, 2, and 4 above seem to be new. He also didn't seem to share (or didn't choose to emphasize) my concern that decision theory research (as done by humans in the foreseeable future) can't solve decision theory definitively enough to obviate the need to make sure that any potentially superintelligent AI can find and fix decision-theoretic flaws in itself.
It looks like you're interpreting this post as arguing for doing more decision theory research relative to other kinds of research, which is not really my intention, since, as you note, that would require comparing decision theory research to other kinds of research, which I didn't do. (I would be interested to know how I might have given this impression, so I can recalibrate my writing to avoid such misunderstandings in the future.) My aim in writing this post was more to explain why, given that I'm not optimistic that we can solve decision theory in a definitive way, I'm still interested in decision theory research.
No, but I have considered it since, and have added to my research interests as a result (such as directly attacking 1). (If you're curious about how I got interested in decision theory originally, the linked post List of Problems That Motivated UDT should give a pretty good idea.)
If we do compare decision theory to other philosophical problems relevant to AI safety (say, "how can we tell whether a physical system is having a positive or negative experience?", which I'm also interested in, BTW), decision theory feels relatively more tractable to me, and less prone to the back-and-forth arguments between camps preferring different solutions that are common elsewhere in philosophy. Decision theory seems constrained by having to simultaneously solve so many problems that it's easier to detect when clear progress has been made. (However, the lack of clear evidence of progress in decision theory in recent years could be considered an argument against this.)
If other people have different intuitions (and there's no reason to think that they have especially bad intuitions), I definitely think they should pursue whatever problems/approaches seem most promising to them.
I'm not sure I understand this part. Are you saying there are problems that don't have direct relevance to AI safety, but have indirect relevance via 1/3/4? If so, sure, you should write them up, depending on the amount of indirect relevance...
As explained above, it's not as simple as this, and I wasn't prepared to give a full discussion of "should you choose to work on decision theory or something else" in this post.