Fundamentally, finding a good mathematical definition of decision theory that encompasses all the phenomena people care about is a big open problem.
I think the most fundamental thing might be taking in a sequence of bits (or a distribution over sequences, if you think it's important to be analog) and outputting bits (or, again, distributions) that happen to control actions.
All this talk about taking causal models as an input is merely a useful abstraction of what happens when we do sequence prediction in our causal universe, and it might always be possible to find some plausible excuse to violate this abstraction.
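To make the bits-in, bits-out picture concrete, here is a rough sketch of what that type might look like. All the names (`Bits`, `Policy`, `parity_policy`, `coin_policy`) are made up for illustration, and the two policies are deliberately trivial; nothing here is a decision theory yet, just the shape of the object.

```python
from typing import Callable, Dict, Tuple

# Hypothetical types: an observation history and an output are both bit tuples.
Bits = Tuple[int, ...]
# A distribution over output bit strings, stored as {output: probability}.
BitDistribution = Dict[Bits, float]
# On this view, an agent is just a map from observed bits to a distribution over output bits.
Policy = Callable[[Bits], BitDistribution]


def point_mass(bits: Bits) -> BitDistribution:
    """Wrap a deterministic output as a distribution with all mass on it."""
    return {bits: 1.0}


def parity_policy(observed: Bits) -> BitDistribution:
    """Toy deterministic policy: output the parity of the observed bits."""
    return point_mass((sum(observed) % 2,))


def coin_policy(observed: Bits) -> BitDistribution:
    """Toy stochastic policy: ignore the input and output a fair coin flip."""
    return {(0,): 0.5, (1,): 0.5}
```

Note that nothing in this signature mentions causal models, utilities or counterfactuals; any such structure would have to live inside the function body, which is one way of reading the claim that it is only an abstraction.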
If we want a space of all decision theories, what mathematical objects does it contain? For example, if a decision theory is a function, what are its domain and codomain?
The only approach I'm familiar with is to view expected utility maximizing decision theories as ways of building counterfactuals (section 5 in the FDT paper). A decision theory could then be described as a function that takes in a state s and an action a and spits out a distribution over world states that result from counterfactually taking action a in state s.
But EDT, CDT and FDT require different amounts and kinds of structure in the description of the state s they take as input (pure probability distributions, causal models and logical models respectively), so this approach only works if there is some kind of structure that is sufficient for all decision theories we might come up with at some point.
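As a minimal sketch of the counterfactual-function view and of the structure problem, here is a toy Newcomb-style setup. Everything in it is an assumption for illustration: the state is a plain dict, the payoffs and the 0.99 accuracy are arbitrary, and `conditional_cf` and `interventional_cf` are only crude stand-ins for EDT-like and CDT-like counterfactuals, not faithful implementations of either theory.

```python
from typing import Any, Callable, Dict, Iterable

State = Dict[str, Any]
Action = str
Outcome = int  # payoff in dollars, standing in for a full world state
Distribution = Dict[Outcome, float]
# A "decision theory" in the sense above: (state, action) -> distribution over outcomes.
Counterfactual = Callable[[State, Action], Distribution]

ACTIONS = ("one-box", "two-box")


def conditional_cf(state: State, action: Action) -> Distribution:
    """EDT-like stand-in: the action is evidence about the prediction.
    Reads only the predictor's accuracy from the state."""
    acc = state["accuracy"]
    if action == "one-box":
        return {1_000_000: acc, 0: 1 - acc}
    return {1_000: acc, 1_001_000: 1 - acc}


def interventional_cf(state: State, action: Action) -> Distribution:
    """CDT-like stand-in: the box contents are held fixed under the action.
    Reads a prior over the contents, which conditional_cf never looks at."""
    p_full = state["prior_full"]
    if action == "one-box":
        return {1_000_000: p_full, 0: 1 - p_full}
    return {1_001_000: p_full, 1_000: 1 - p_full}


def choose(cf: Counterfactual, state: State, actions: Iterable[Action]) -> Action:
    """Pick the action with the highest expected payoff under the given counterfactual."""
    def expected(action: Action) -> float:
        return sum(outcome * p for outcome, p in cf(state, action).items())
    return max(actions, key=expected)


# The state has to carry every field that any of the theories might want to read.
state = {"accuracy": 0.99, "prior_full": 0.5}
edt_like_choice = choose(conditional_cf, state, ACTIONS)
cdt_like_choice = choose(interventional_cf, state, ACTIONS)
```

Both functions share one signature only because the dict happens to contain the fields each of them needs. A theory that wants a genuinely different kind of structure (say, a logical model) would force the `State` type to grow, which is the worry raised above.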