However, the utility function still isn't up for grabs. If our actual true utility function violates this rule, I don't want to say that an AGI is unfriendly for maximizing it.
Of course. The proposal here is that "our actual true utility function" does not violate this rule, since we are not in fact inclined to give in to a Pascalian mugger.
"our actual true utility function"
Sometimes, when I go back and read my own comments, I wonder just what goes on in that part of my brain that translates concepts into typed out words when I am not paying it conscious attention.
Anyways, let "our actual true utility function" refer to the utility function that best describes our collective values that we only manage to effectively achieve in certain environments that match the assumptions inherent in our heuristics. Thinking of it this way, one might wonder if Pascalian muggers fit into these environments, and if not, how much does our instinctual reaction to them indicate about our values?
For background, see here.
In a comment on the original Pascal's mugging post, Nick Tarleton writes:
Coming across this again recently, it occurred to me that there might be a way to generalize Vassar's suggestion in such a way as to deal with Tarleton's more abstract formulation of the problem. I'm curious about the extent to which folks have thought about this. (Looking further through the comments on the original post, I found essentially the same idea in a comment by g, but it wasn't discussed further.)
The idea is that the Kolmogorov complexity of "3^^^^3 units of disutility" should be much higher than the Kolmogorov complexity of the number 3^^^^3. That is, the utility function should grow only according to the complexity of the scenario being evaluated, and not (say) linearly in the number of people involved. Furthermore, the domain of the utility function should consist of low-level descriptions of the state of the world, which won't refer directly to words uttered by muggers, in such a way that a mere discussion of "3^^^^3 units of disutility" by a mugger will not typically be (anywhere near) enough evidence to promote an actual "3^^^^3-disutilon" hypothesis to attention.
This seems to imply that the intuition responsible for the problem is a kind of fake simplicity, ignoring the complexity of value (negative value in this case). A confusion of levels also appears implicated (talking about utility does not itself significantly affect utility; you don't suddenly make 3^^^^3-disutilon scenarios probable by talking about "3^^^^3 disutilons").
What do folks think of this? Any obvious problems?