This post is from a long time ago; I think it is important to reconsider everything written here in light of developments in machine learning.
How are humans exploitable, given that they don't have utility functions?
If humans are not EU maximizers and are therefore exploitable, can someone give a concrete example of how they can be exploited?
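For concreteness, the textbook illustration (my own sketch, not from the post) is a money pump against intransitive preferences: an agent who prefers A to B, B to C, and C to A, and who will pay a small fee for any swap it strictly prefers, can be walked around the cycle indefinitely, ending with its original holdings and less money. No utility function can represent such cyclic preferences, which is exactly what makes the agent pumpable:

```python
# A minimal money-pump sketch (illustrative, not from the post), assuming
# an agent with cyclic preferences A > B > C > A who will pay a small fee
# to swap its current item for any item it strictly prefers.

PREFERS = {("A", "B"), ("B", "C"), ("C", "A")}  # (preferred, dispreferred); cyclic
FEE = 1.0

def willing_to_trade(current, offered):
    """The agent trades (and pays FEE) whenever it strictly prefers the offer."""
    return (offered, current) in PREFERS

item, money = "A", 10.0
for offered in ["C", "B", "A"] * 2:  # walk the preference cycle twice
    if willing_to_trade(item, offered):
        item, money = offered, money - FEE

print(item, money)  # back to 'A' with 4.0 left: 6 units of fees paid, holdings unchanged
```

Humans plausibly exhibit milder versions of this (framing-dependent or context-dependent preferences), though real exploitation is noisier than this toy loop.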
Is exploitability necessarily unstable? Could there be a tolerable level of exploitability, especially if it trades off against desirable characteristics that are only available to non-EU maximizers?
Why is this not true for most humans? Many religious people would not want to modify the lightcone, since they consider it God's territory.
The initial distribution of values need not be closely related to the values that result after moral philosophy and philosophical self-reflection. Optimizing for hedonistic utilitarianism, for example, looks very little like any of the values from the outer optimization loop of natural selection.
Although there would be pressure for an AI not to be exploitable, wouldn't there also be pressure for adaptability and dynamism, i.e., the ability to alter preferences and goals in new environments?
Why can't the true values live at the level of anatomy and chemistry?
Would this be solved if creating a copy meant creating someone functionally identical to you, but with their own identity, someone who is not you?
Is this trivializing the concept of a utility function?
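To spell out the triviality worry (my own hedged sketch, not from the post): any finite record of choices can be rationalized after the fact by a utility function built to agree with it, so "has a utility function" is vacuous unless coherence constraints tie choices across menus together:

```python
# Sketch of the triviality worry (illustrative only): any observed choice
# behavior can be "rationalized" post hoc by a utility function constructed
# to assign 1 to whatever was chosen from each menu.

def trivial_utility(history):
    """history: dict mapping frozenset(menu) -> chosen option.
    Returns u(menu, option) that the observed choices maximize by construction."""
    return lambda menu, option: 1.0 if history.get(frozenset(menu)) == option else 0.0

# Even an agent with cyclic choices "maximizes" this u.
history = {frozenset({"A", "B"}): "A",
           frozenset({"B", "C"}): "B",
           frozenset({"A", "C"}): "C"}
u = trivial_utility(history)
assert all(u(menu, choice) >= u(menu, other)
           for menu, choice in history.items()
           for other in menu)
```

The trick is that this u depends on the menu as well as the option; a menu-independent utility over outcomes is what coherence arguments are actually about, and it is what cyclic choosers cannot have.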