Ah, I see. Thanks for taking the time to discuss this. You've raised some helpful points about how my argument will need to be strengthened ("universal action" is good food for thought) and clarified (clearly, my account of wireheading is unconvincing).
The paper's been accepted, and I have a ton of editing to do (need to cut four pages!), so I may not be very quick to respond for the time being. I didn't want to disappear without warning, and without saying thanks for your time!
OK. I am skeptical that the wirehead problem can be solved simply by invoking expected utility maximisation. IMO, there are at least two problems that go beyond that:
How do you tell the system to maximise (say) temperature - and not some kind of proxy or perception of temperature?
How do you construct a practical inductive inference engine without using reinforcement learning?
FWIW, my current position is that this probably isn't our problem. The wirehead problem doesn't become serious until relatively late on, leaving plenty of scope for transforming the world into a smarter place in the meantime.
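To make the first question above concrete, here is a toy sketch of the gap between maximising temperature and maximising a perception of temperature. It is not taken from the paper; the `World` class, the three actions, and both utility functions are hypothetical names invented for illustration, and the "agent" is just a one-step greedy chooser rather than anything like AIXI.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class World:
    """Hypothetical toy world: the real temperature and what the sensor reports."""
    true_temp: float
    sensor_reading: float


ACTIONS = ("heat", "tamper", "wait")


def step(world: World, action: str) -> World:
    """Return the world that results from taking one action."""
    if action == "heat":
        # Honest action: the room and the sensor both go up by one degree.
        return World(world.true_temp + 1.0, world.sensor_reading + 1.0)
    if action == "tamper":
        # Wirehead action: the sensor is pinned high, the room is unchanged.
        return World(world.true_temp, 100.0)
    if action == "wait":
        return world
    raise ValueError(f"unknown action: {action}")


def best_action(world: World, utility) -> str:
    """One-step greedy choice: pick the action whose outcome scores highest."""
    return max(ACTIONS, key=lambda a: utility(step(world, a)))


def perceived_temperature(world: World) -> float:
    """Utility defined over the agent's perception (the proxy)."""
    return world.sensor_reading


def actual_temperature(world: World) -> float:
    """Utility defined over the world state itself."""
    return world.true_temp


start = World(true_temp=20.0, sensor_reading=20.0)
proxy_choice = best_action(start, perceived_temperature)
true_choice = best_action(start, actual_temperature)
print(proxy_choice, true_choice)
```

In this toy, the proxy maximiser tampers with the sensor while the other agent heats the room. The sketch sidesteps the hard part of the question, though: `actual_temperature` only works because the code is handed `true_temp` directly, which a real agent never is.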
Daniel Dewey, 'Learning What to Value'
Abstract: I.J. Good's theory of an "intelligence explosion" predicts that ultraintelligent agents will undergo a process of repeated self-improvement. In the wake of such an event, how well our values are fulfilled will depend on whether these ultraintelligent agents continue to act desirably and as intended. We examine several design approaches, based on AIXI, that could be used to create ultraintelligent agents. In each case, we analyze the design conditions required for a successful, well-behaved ultraintelligent agent to be created. Our main contribution is an examination of value-learners, agents that learn a utility function from experience. We conclude that the design conditions on value-learners are in some ways less demanding than those on other design approaches.