That sounds pretty scary too. I don't think I am close enough to being an agent to have a well-defined utility function. If I do (paradoxical as it sounds), it's probably not something I would reflectively like. For example, I think I have more empathy for things I am sexually attracted to. But the idea of a world where everyone else (excluding me and a few people I really like) is a second-class citizen to hot babes horrifies me. But with the wrong kind of extrapolation, I bet that could be said to be what I want.
I can't easily describe any procedure I know I would like for getting a utility function out of me. If I or some simulated copy of me remained around to actually make the decisions, I think I could get things I would not only like, but like and like liking. Especially if I can change myself from an insane ape who wishes it were a rationalist into an actual rationalist, through explicitly specified modifications guided by wished-for knowledge.
The best way I can think of to ensure that the extrapolated utility function is something like whatever is used in making my decisions is to just use the brain circuits I already have that do that the way I like.
I also think a good idea might be to have a crowd of backup copies of me. One of us would try making some self-modifications in a sandboxed universe where their wishes could not get outside, and then the others would vote on whether to keep them.
Well, you don't prefer a world "where everyone else (excluding me and a few people I really like) is a second-class citizen to hot babes" to the current world. If you can express such a judgement preferring one universe over another, and those judgements are transitive, you have a utility function.
And if you want to like liking them, that is also part of your utility function.
One confounding factor that you do bring up: the domain of one's utility function really doesn't need to include things outside the realm of possibility.
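To make the "transitive judgements give you a utility function" point concrete, here is a rough sketch for the simplest case I can think of: a finite list of possible worlds and strict preferences with no ties. The function name, the `prefers` callback, and the example worlds are all made up for illustration, and the numbers it produces are only ordinal (a ranking), not the cardinal utilities you would need for reasoning about gambles.

```python
from itertools import permutations


def utility_from_preferences(outcomes, prefers):
    """Build an ordinal utility from pairwise judgements.

    `prefers(a, b)` should return True when world `a` is strictly
    preferred to world `b`. Returns a dict mapping each outcome to a
    number, or None if the judgements are incomplete or intransitive.
    Assumes a finite set of outcomes and no ties (a simplification).
    """
    # Completeness: for every distinct pair, exactly one direction holds.
    for a in outcomes:
        for b in outcomes:
            if a != b and prefers(a, b) == prefers(b, a):
                return None

    # Transitivity: a > b and b > c must imply a > c.
    for a, b, c in permutations(outcomes, 3):
        if prefers(a, b) and prefers(b, c) and not prefers(a, c):
            return None

    # Utility of a world = number of worlds it is preferred to.
    return {a: sum(prefers(a, b) for b in outcomes if b != a) for a in outcomes}


# Hypothetical example: three worlds ranked best to worst.
ranking = ["reflectively endorsed world", "current world", "hot-babes world"]
consistent = utility_from_preferences(
    ranking, lambda a, b: ranking.index(a) < ranking.index(b)
)

# A cyclic set of judgements (x > y, y > z, z > x) yields no utility function.
cycle = {("x", "y"), ("y", "z"), ("z", "x")}
cyclic = utility_from_preferences(["x", "y", "z"], lambda a, b: (a, b) in cycle)
```

With the consistent ranking, the current world scores above the hot-babes world, which matches the judgement quoted above. The cyclic case returns `None`, which is the situation where no utility function can represent the preferences.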
Here's the new thread for posting quotes, with the usual rules: