I think you're collapsing some levels here, but it's making my head hurt to think about it, having the definition-deriver and the subject be the same person.
Making this concrete: let 'foobar' refer to the set {1, 2, 3} in a shared language used by us and our subject, Alice. Alice would agree that it is true that "foobar = what X would say about 'foobar' after being exposed to every possible argument concerning 'foobar'" where X is some algorithmic description of Alice. She would say something like "foobar = {1, 2, 3}, X would say {1, 2, 3}, {1, 2, 3} = {1, 2, 3} so this all checks out."
Clearly then, any procedure that correctly determines what X would say about 'foobar' should result in the correct definition of foobar, namely {1, 2, 3}. This is what theoretically lets our "simple" solution work.
However, Alice would not agree that "what X would say about 'foobar' after being exposed to every possible argument concerning 'foobar'" is a correct definition of 'foobar'. The issue is that this definition has the wrong properties when we consider counterfactuals concerning X. It is in fact the case that foobar is {1, 2, 3}, and further that 'foobar' means {1, 2, 3} in our current language, as stipulated at the beginning of this thought experiment. If-counterfactually X would say '{4, 5, 6}', foobar is still {1, 2, 3}, because what we mean by 'foobar' is {1, 2, 3} and {1, 2, 3} is {1, 2, 3} regardless of what X says.
Having written that, I now think I can return to your question. The answer is that firstly, by replacing the true definition "foobar = {1, 2, 3}" with "foobar is what X would say about 'foobar' after being exposed to every possible argument concerning 'foobar'" in the subject's mind, you have just deleted the only reference to foobar that actually exists in the thought experiment. The subject has to reason about 'foobar' using their built-in definition, since that is the only thing that actually points directly to the target object.
Secondly, as described above, "foobar is what X would say about 'foobar' after being exposed to every possible argument concerning 'foobar'" is an inaccurate definition of foobar when considering counterfactuals concerning what X would say about foobar. That is exactly what you are doing when reasoning that "if-counterfactually I say {4, 5, 6} about foobar, then what X would say about 'foobar' is {4, 5, 6}, so {4, 5, 6} is correct."
Which is to say that, analogising, the contents of our subject's head are a pointer (in the programming sense) to the object itself, while "what X would say about 'foobar' after being exposed to every possible argument concerning 'foobar'" is a pointer to the first pointer. You can dereference it, and get the right answer, but you can't just substitute it in for the first pointer. That gives you nothing but a pointer referring to itself.
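To make the pointer analogy concrete, here is a minimal Python sketch. Everything in it is a hypothetical illustration rather than a model of real cognition: a "mind" is a dict from a word to a definition, a definition is either a concrete set (a direct reference) or a zero-argument function (an indirect reference that has to be evaluated), and the names resolve, alice, and alice_after are made up for the example.

```python
def resolve(definition, depth=0, max_depth=50):
    """Follow indirect definitions until a concrete object is reached."""
    if depth > max_depth:
        raise RecursionError("definition never bottoms out in a concrete object")
    if callable(definition):
        # Indirect reference: evaluate it and keep following.
        return resolve(definition(), depth + 1, max_depth)
    return definition


# Alice's head holds a direct reference to the object itself.
alice = {"foobar": frozenset({1, 2, 3})}


def what_alice_would_say():
    """Pointer to the first pointer: look up whatever is in Alice's head."""
    return alice["foobar"]


# Dereferencing the indirect definition gives the right answer.
dereferenced = resolve(what_alice_would_say)

# Now substitute the indirect definition into the subject's own head.
alice_after = {}


def what_alice_after_would_say():
    return alice_after["foobar"]


alice_after["foobar"] = what_alice_after_would_say


def substituted_definition_bottoms_out():
    """Return True if the substituted definition ever reaches a concrete object."""
    try:
        resolve(alice_after["foobar"])
        return True
    except RecursionError:
        return False
```

In this toy setup, dereferenced is the set {1, 2, 3}, while substituted_definition_bottoms_out() returns False: once the only direct reference has been overwritten, the lookup just chases itself.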
ETA: Dear god, this turned into a long post. Sorry! I don't think I can shorten it without making it worse though.
Right, so my point is that if your theory (that moral reasoning is probabilistic reasoning about some mathematical object) is to be correct, we need a definition of morality as a mathematical object which isn't "what X says after considering all possible moral arguments". So what could it be then? What definition Y can we give, such that it makes sense to say "when we reason about morality, we are really doing probabilistic reasoning about the mathematical object Y"?
Secondly, until we have a candidate definition Y at hand, we can't show...
What do I mean by "morality isn't logical"? I mean in the same sense that mathematics is logical but literary criticism isn't: the "reasoning" we use to think about morality doesn't resemble logical reasoning. All systems of logic that I'm aware of have a concept of proof and a method of verifying with a high degree of certainty whether an argument constitutes a proof. As long as the logic is consistent (and we have good reason to think that many of them are), once we verify a proof we can accept its conclusion without worrying that there may be another proof that reaches the opposite conclusion. With morality though, we have no such method, and people constantly make moral arguments that can be reversed or called into question by other moral arguments. (Edit: For an example of this, see these posts.)
Without being a system of logic, moral philosophical reasoning likely (or at least plausibly) doesn't have any of the nice properties that a well-constructed system of logic would have, for example, consistency, validity, soundness, or even the more basic property that considering arguments in a different order, or in a different mood, won't cause a person to accept an entirely different set of conclusions. For all we know, somebody trying to reason about a moral concept like "fairness" may just be taking a random walk as they move from one conclusion to another based on moral arguments they encounter or think up.
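As a toy illustration of the order-dependence worry (and only that: the reasoner, the arguments, and the rules below are all invented for the example, not a claim about how people actually reason), compare a reasoner who accepts a conclusion unless it contradicts something already held with a simple forward-chaining deduction over fixed rules.

```python
from itertools import permutations

# Hypothetical arguments: (conclusion, conclusion it contradicts or None).
ARGUMENTS = [("A", "B"), ("B", "A"), ("C", None)]


def path_dependent_reasoner(arguments):
    """Accept each conclusion unless it contradicts one already held."""
    held = set()
    for conclusion, contradicts in arguments:
        if contradicts not in held:
            held.add(conclusion)
    return frozenset(held)


def forward_chain(facts, rules):
    """Apply (premise, conclusion) rules until nothing new can be derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in rules:
            if premise in derived and conclusion not in derived:
                derived.add(conclusion)
                changed = True
    return frozenset(derived)


# Hypothetical rules: p implies q, q implies r.
RULES = [("p", "q"), ("q", "r")]

# Collect the distinct final belief sets over every possible ordering.
informal_outcomes = {path_dependent_reasoner(order) for order in permutations(ARGUMENTS)}
logical_outcomes = {forward_chain({"p"}, order) for order in permutations(RULES)}
```

Here informal_outcomes contains two different belief sets, {A, C} and {B, C}, depending on which of the conflicting arguments was encountered first, while logical_outcomes contains only {p, q, r} no matter how the rules are ordered.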
In a recent post, Eliezer said "morality is logic", by which he seems to mean... well, I'm still not exactly sure what, but one interpretation is that a person's cognition about morality can be described as an algorithm, and that algorithm can be studied using logical reasoning. (Which of course is true, but in that sense both math and literary criticism as well as every other subject of human study would be logic.) In any case, I don't think Eliezer is explicitly claiming that an algorithm-for-thinking-about-morality constitutes an algorithm-for-doing-logic, but I worry that the characterization of "morality is logic" may cause some connotations of "logic" to be inappropriately sneaked into "morality". For example, Eliezer seems to (at least at one point) assume that considering moral arguments in a different order won't cause a human to accept an entirely different set of conclusions, and maybe this is why. To fight this potential sneaking of connotations, I suggest that when you see the phrase "morality is logic", you remind yourself that morality isn't logical.