A child will eventually learn this independence. It's a very different thing to hardcode the assumption of independence from the start. Values are also complex, and there may be no natural line dividing what is natural to care about. If I take the idea that the morality of the agent is independent of its decisions really far, it means the AI believes it's already a good boy or a bad boy no matter what it does (it just doesn't know which). This seems to be the opposite of what moral behaviour is about. There is also no guarantee that the valued actions correspond to simple concepts. After all, cake is good unless you have diabetes, and death might not be all bad if it's euthanasia.
Also, if I am inconsistent or outright make an error in assigning the labels, I would rather have the AI open a dialogue with me about it than silently construct a really convoluted utility function.
We could of course think of a value loader as a two-step process, where the values loaded are not taken as-is as the actual utility function, but are modified to make sense in a "non-cheating" way. That is, the actual utility function always contains terms about moral fairness even if they are not explicitly input. This reduces the amount of values that are softcoded, as the meta-morality is hardcoded. But this layering also solves the original problem: the agent might evaluate the new situation as having lower expected utility and would choose not to go there - if it had a choice. By layering, we take that choice away, so the wrong valuation doesn't matter. By enforcing the hand-inputted conservation of moral evidence, we tie the AI's hands when it comes to forming a moral strategy. "We know better" beforehand.
An interesting point, hinting that my approach at moral updating ( http://lesswrong.com/lw/jxa/proper_value_learning_through_indifference/ ) may be better than I supposed.
You know that when you title a post with "clarified", you're just asking for the gods to smite you down, but let's try...
There has been some confusion about the concept of "conservation of expected moral evidence" that I touched upon in my posts here and here. The fault for the confusion is mine, so this is a brief note to try and explain it better.
The canonical example is that of a child who wants to steal a cookie. The child gets its morality mainly from its parents, and strongly suspects that if it asks, its parents will indeed confirm that stealing cookies is wrong. So it decides not to ask, and happily steals the cookie.
I argued that this behaviour showed a lack of "conservation of expected moral evidence": if the child knows what the answer would be, then that should be equivalent to actually asking. Some people got this immediately, while others were confused: the agents I defined seemed Bayesian, and so should already have conservation of expected evidence - so how could they violate that principle?
The answer is... both groups are right. The child can be modelled as a Bayesian agent reaching sensible conclusions. If it values "I don't steal the cookie" at 0, "I steal the cookie without being told not to" at 1, and "I steal the cookie after being told not to" at -1, then its behaviour is rational - and those values are acceptable utility values over possible universes. So the child (and many value loading agents) are Bayesian agents with the usual properties.
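To make this concrete, here is a minimal sketch (not from the post) of the child's choice as straightforward expected-utility maximisation, using the utility values just given. The probability that asking would yield "stealing is wrong" is an assumed number for illustration.

```python
# The child's three options, scored with the utilities from the text:
# not stealing = 0, stealing unasked = +1, stealing after being told not to = -1.
# Assumption: the child is nearly certain (p = 0.99) that asking would
# yield the answer "stealing is wrong".

p_told_no = 0.99  # child's credence that asking yields "don't steal"

utilities = {
    "don't steal": 0.0,
    "steal without asking": 1.0,
    # If it asks first, it is almost surely told not to, so stealing
    # afterwards is worth -1; otherwise stealing unasked is worth +1.
    "ask, then steal": p_told_no * (-1) + (1 - p_told_no) * 1,
}

best = max(utilities, key=utilities.get)
print(best)  # → steal without asking
```

As the post says, nothing here is irrational by Bayesian lights: the "don't ask" policy simply has the highest expected utility under these values.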
But we are adding extra structure to the universe. Based on our understanding of what value loading should be, we are decreeing that the child's behaviour is incorrect. Though it doesn't violate expected utility, it violates any sensible meaning of value loading. Our idea of value loading is that, in a sense, values should be independent of many contingent things. There is nothing intrinsically wrong with "stealing cookies is wrong iff the Milky Way contains an even number of pulsars", but it violates what values should be. Similarly for "stealing cookies is wrong iff I ask about it".
But let's dig a bit deeper... Classical conservation of expected evidence fails in many cases. For instance, I can certainly influence the variable X="what Stuart will do in the next ten seconds" (or at least, my decision theory is constructed on assumptions that I can influence that). My decisions change X's expected value quite dramatically. What I can't influence are facts that are not contingent on my actions. For instance, I can't change my expected estimation of the number of pulsars in the galaxy last year. Were I super-powerful, I could change my expected estimation of the number of pulsars in the galaxy next year - by building or destroying pulsars, for instance.
So conservation of expected evidence only applies to things that are independent of the agent's decisions. When I say we need to have "conservation of expected moral evidence" I'm saying that the agent should treat their (expected) morality as independent of their decisions. The kid failed to do this in the example above, and that's the problem.
So conservation of expected moral evidence is something that would be automatically true if morality were something real and objective, and it is also a desideratum when constructing general moral systems in practice.