General response: I think you should revise the chances of this working way downwards until you have some sort of toy model where you can actually prove, completely, with no "obvious" assumptions necessary, that this will preserve values or at least the existence of an agent in a world. But I think enough has been said about this already.
Specific response:
Is the point you're making about the unpredictability of the outcome of optimizing for f? Because the abstract patterns favored by f will look like noise relative to physics?
"Looks like noise" here means uncompressability, and thus logical shallowness. I'll try again to explain why I think that relative logical depth turns out to not look like human values at all, and you can tell me what you think.
Consider an example.
Imagine, if you will, logical depth relative to a long string of nearly-random digits, called the Ongoing Tricky Procession. This is the computational work needed to output a string from its simplest description, if our agent already knows the Ongoing Tricky Procession.
On the other hand, boring old logical depth is the computational work needed to output a string from its simplest description period. The logical depth of the Ongoing Tricky Procession is not very big, even though it has a long description length.
Now imagine a contest between two agents, Alice and Bob. Alice knows the Ongoing Tricky Procession, and wants to output a string of high logical depth (to other agents who know the Ongoing Tricky Procession). The caveat is Bob has to think that the string has low logical depth. Is this possible?
The answer is yes. Alice and Bob are spies on opposite sides, and Alice is encrypting her deep message with a One Time Pad. Bob can't decrypt the message because, as every good spy knows, One Time Pads are super-duper secure, and thus Bob can't tell that Alice's message is actually logically deep.
Even if the Ongoing Tricky Procession is not actually that K-complex, Alice can still hide a message in it - she just isn't allowed to give Bob a simple description that actually decomposes into the OTP and the message.
This is almost the opposite of the Slow Growth Law. Slow Growth is where you have shallow inputs and you want to make a deep output. Alice has this deep message she wants to send to her homeland, but she wants her output to be shallow according to Bob. Fast Decay :P
Re: Generality.
Yes, I agree a toy setup and a proof are needed here. In case it wasn't clear, my intentions with this post was to suss out if there was other related work out there already done (looks like there isn't) and then do some intuition pumping in preparation for a deeper formal effort, in which you are instrumental and for which I am grateful. If you would be interested in working with me on this in a more formal way, I'm very open to collaboration.
Regarding your specific case, I think we may both be confused about the math. I think you are right...
I attended Nick Bostrom's talk at UC Berkeley last Friday and got intrigued by these problems again. I wanted to pitch an idea here, with the question: Have any of you seen work along these lines before? Can you recommend any papers or posts? Are you interested in collaborating on this angle in further depth?
The problem I'm thinking about (surely naively, relative to y'all) is: What would you want to program an omnipotent machine to optimize?
For the sake of avoiding some baggage, I'm not going to assume this machine is "superintelligent" or an AGI. Rather, I'm going to call it a supercontroller, just something omnipotently effective at optimizing some function of what it perceives in its environment.
As has been noted in other arguments, a supercontroller that optimizes the number of paperclips in the universe would be a disaster. Maybe any supercontroller that was insensitive to human values would be a disaster. What constitutes a disaster? An end of human history. If we're all killed and our memories wiped out to make more efficient paperclip-making machines, then it's as if we never existed. That is existential risk.
The challenge is: how can one formulate an abstract objective function that would preserve human history and its evolving continuity?
I'd like to propose an answer that depends on the notion of logical depth as proposed by C.H. Bennett and outlined in section 7.7 of Li and Vitanyi's An Introduction to Kolmogorov Complexity and Its Applications which I'm sure many of you have handy. Logical depth is a super fascinating complexity measure that Li and Vitanyi summarize thusly:
The mathematics is fascinating and better read in the original Bennett paper than here. Suffice it presently to summarize some of its interesting properties, for the sake of intuition.