I'm going to write a review of functional decision theory (FDT), based on the two papers.
It will be about as long as the papers themselves, and with school work on top of that, I'm not sure when I'll finish writing.
Before I start, I want to be sure my criticisms are legitimate. Is anyone willing to go over them with me?
 
My main points of criticism are:
Functional decision theory is actually algorithmic decision theory. It takes an algorithmic view of decision theories: it relies on algorithmic equivalence, not functional equivalence.
Quick sort, merge sort, heap sort, insertion sort, selection sort, bubble sort, etc. are mutually algorithmically dissimilar, but are all functionally equivalent: each computes the same input-output map.
 
If two decision algorithms are functionally equivalent, but algorithmically dissimilar, you'd want a decision theory that recognises this.
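
To make the distinction concrete, here is a minimal sketch of two sorting procedures that take different steps but return the same output on every input tried. The function names and the sample inputs are my own illustrative choices, and agreement on a handful of inputs only suggests functional equivalence; it doesn't prove it.

```python
def insertion_sort(items):
    """Sort by inserting each element into an already-sorted prefix."""
    result = []
    for x in items:
        i = len(result)
        # Walk left until we find the slot where x belongs.
        while i > 0 and result[i - 1] > x:
            i -= 1
        result.insert(i, x)
    return result


def merge_sort(items):
    """Sort by recursively splitting the list, then merging sorted halves."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left, right = merge_sort(items[:mid]), merge_sort(items[mid:])
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


# Hypothetical sample inputs: different algorithms, same input-output behaviour.
sample_inputs = [[], [1], [3, 1, 2], [5, 5, 1, 0, -2], [9, 8, 7, 6]]
agree = all(insertion_sort(xs) == merge_sort(xs) for xs in sample_inputs)
```

Here `agree` only compares outputs, which is the sense of "functionally equivalent" I have in mind; nothing in the check looks at how each procedure arrives at its answer.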
 
Causal dependence is a subset of algorithmic dependence, which is a subset of functional dependence.
 
So, I specify what an actual functional decision theory would look like.
 
I then go on to show that even functional dependence is "impoverished".
 
Imagine a greedy algorithm that gets 95% of problems correct.
 
Let's call this greedy algorithm f'.
Let's call a correct algorithm f.
 
f and f' are functionally correlated, but not functionally equivalent.
 
FDT does not recognise this.
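
As a rough illustration of "correlated but not equivalent", here is a sketch using coin change as a stand-in problem (my choice, not something from the papers). The greedy procedure plays the role of f' and the dynamic-programming procedure plays the role of f. The coin set and the range of amounts are assumptions picked so that the two sometimes disagree; the resulting agreement rate is whatever it turns out to be, not the 95% above.

```python
COINS = (1, 3, 4)  # assumed coin set; must contain 1 so every amount is reachable


def greedy_coin_count(amount, coins=COINS):
    """f': always take the largest coin that fits. Fast, sometimes suboptimal."""
    count = 0
    for coin in sorted(coins, reverse=True):
        count += amount // coin
        amount %= coin
    return count


def optimal_coin_count(amount, coins=COINS):
    """f: minimum number of coins, via dynamic programming."""
    best = [0] * (amount + 1)
    for a in range(1, amount + 1):
        best[a] = min(best[a - c] + 1 for c in coins if c <= a)
    return best[amount]


def agreement_rate(max_amount, coins=COINS):
    """Fraction of amounts 1..max_amount on which f' and f give the same answer."""
    matches = sum(
        greedy_coin_count(a, coins) == optimal_coin_count(a, coins)
        for a in range(1, max_amount + 1)
    )
    return matches / max_amount


rate = agreement_rate(20)
```

For amount 6 the greedy procedure uses 4 + 1 + 1 while the optimal answer is 3 + 3, so the two are not functionally equivalent, yet they agree on many other amounts.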
 
If f is your decision algorithm, and f' is your predictor's decision algorithm, then FDT doesn't recommend one-boxing on Newcomb's problem.
 
EDT can deal with functional correlations.
 
EDT doesn't distinguish functional correlations from spurious correlations, while FDT doesn't recognise functional correlations.
 
I use this to specify EFDT (evidential functional decision theory), which considers P(f(π) = f'(π)) instead of P(f = f').
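
To show why the agreement probability matters, here is a minimal sketch of an evidential-style expected payoff calculation for Newcomb's problem. It is not the EFDT specification itself. It assumes the usual payoffs (1,000,000 in the opaque box, 1,000 in the transparent box) and treats `p_agree` as a stand-in for P(f(π) = f'(π)), the probability that the predictor's output matches my choice on this problem.

```python
BIG_PRIZE = 1_000_000  # assumed: opaque box contents if one-boxing was predicted
SMALL_PRIZE = 1_000    # assumed: transparent box contents


def expected_payoffs(p_agree):
    """Return (one_box, two_box) expected payoffs for a given agreement probability."""
    # One-boxing pays off only if the predictor agreed and filled the opaque box.
    one_box = p_agree * BIG_PRIZE
    # Two-boxing gets the big prize only if the predictor disagreed.
    two_box = (1 - p_agree) * BIG_PRIZE + SMALL_PRIZE
    return one_box, two_box


def preferred_action(p_agree):
    """Pick the action with the higher expected payoff."""
    one_box, two_box = expected_payoffs(p_agree)
    return "one-box" if one_box > two_box else "two-box"


choice_high = preferred_action(0.95)  # illustrative high agreement
choice_coin = preferred_action(0.5)   # predictor no better than a coin flip
```

Under these assumptions, one-boxing comes out ahead whenever the agreement probability is comfortably above one half, even though f and f' are never assumed to be the same function.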
 
I specify the requirements for a full implementation of FDT and EFDT.
 
I'll publish the first draft of the paper here after I'm done.
 
The paper will be long, because it also specifies a framework for evaluating decision theories.

Using this framework, I show that EFDT > FDT > ADT > CDT, where ADT is algorithmic decision theory.
I also show that EFDT > EDT.
The framework is basically a hierarchy of decision theories.
 
A > B means that the set of problems that B correctly decides is a subset of the set of problems that A correctly decides.
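
The ordering can be sketched with plain sets. The theory names and problem labels below are placeholders, not claims about which real decision theory solves which real problem.

```python
def at_least_as_good(solved_by_a, solved_by_b):
    """A > B in the sense above: everything B decides correctly, A does too."""
    return solved_by_b <= solved_by_a  # subset test on Python sets


# Hypothetical theories and the (hypothetical) problems each decides correctly.
solved = {
    "theory_x": {"p1"},
    "theory_y": {"p1", "p2"},
    "theory_z": {"p1", "p3"},
}

y_over_x = at_least_as_good(solved["theory_y"], solved["theory_x"])
y_over_z = at_least_as_good(solved["theory_y"], solved["theory_z"])
z_over_y = at_least_as_good(solved["theory_z"], solved["theory_y"])
```

One consequence of defining the order by subsets is that it is only a partial order: `theory_y` and `theory_z` each solve a problem the other misses, so neither is above the other.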
 
The dependence hierarchy is why CDT < ADT < FDT.
 
EFDT > FDT because EFDT can recognise functional correlations.
 
EFDT > EDT because EFDT can distinguish functional correlations from spurious correlations.
 
I plan to write the paper as well as I can, and if I think it's good enough, I'll try submitting it.


You could also simply continue working on the review: you are clearly motivated to explore these issues deeper so why not start fleshing out the paper?

Note that I said "continue" rather than start. The barrier is often not the ideas themselves but getting them written up as something approaching a complete paper. This is still the issue for me, and I have 50+ peer-reviewed papers in the past 20 years (although not in this field).

I will then.

I suggest you check with Nate what exactly he thinks, but my opinion is:

If two decision algorithms are functionally equivalent, but algorithmically dissimilar, you'd want a decision theory that recognises this.

I think Nate agrees with this, and any lack of functional equivalence is due to not being able to fully specify that yet.

f and f' are functionally correlated, but not functionally equivalent. FDT does not recognise this.

Can't this be modelled as uncertainty over functional equivalence? (or over input-output maps)?

Can't this be modelled as uncertainty over functional equivalence? (or over input-output maps)?

Hm, that's an interesting point. Is what we care about just the brute input-output map? If we're faced with a black-box predictor, then yes, all that matters is the correlation even if we don't know the method. But I don't think any sort of representation of computations as input-output maps actually helps account for how we should learn about or predict this correlation: we learn and predict the predictor in a way that seems like updating a distribution over computations. Nor does it seem to help in the case of trying to understand to what extent two agents are logically dependent on one another. So I think the computational representation is going to be more fruitful.