Vladimir_Nesov comments on The I-Less Eye - Less Wrong
Comments (83)
If preference is expressed in terms of what you should do, not what's true about the world, new observations never influence preference, so we can fix it at the start and never revise it (which is an important feature for constructing FAI, since you only ever have a hand in its initial construction).
(To whoever downvoted this without comment -- it's not as stupid an idea as it might sound; what's true about the world doesn't matter for preference, but it does matter for decision-making, as decisions are made depending on what's observed. By isolating preference from the influence of observations, we fix it at the start, but since it determines what should be done depending on all possible observations, we are not ignoring reality.)
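To make the distinction concrete, here is a minimal toy sketch, not anything from the original discussion. It assumes a preference can be written as a fixed ranking over whole policies (maps from every possible observation to an action). The names `OBSERVATIONS`, `ACTIONS`, `score`, `best_policy`, and `decide` are all hypothetical, and the numbers are made up for illustration.

```python
from itertools import product

# Hypothetical toy world: everything the agent could ever observe or do.
OBSERVATIONS = ["rain", "sun"]
ACTIONS = ["umbrella", "no_umbrella"]

# The preference is fixed once, up front. It says how good each action is
# under each possible observation; nothing here is ever updated later.
SCORES = {
    ("rain", "umbrella"): 1.0,
    ("rain", "no_umbrella"): 0.0,
    ("sun", "umbrella"): 0.3,
    ("sun", "no_umbrella"): 1.0,
}

def score(observation, action):
    """Fixed (assumed) valuation of taking `action` after seeing `observation`."""
    return SCORES[(observation, action)]

def best_policy():
    """Rank every complete policy and return the most preferred one."""
    policies = [
        dict(zip(OBSERVATIONS, choice))
        for choice in product(ACTIONS, repeat=len(OBSERVATIONS))
    ]
    return max(policies, key=lambda p: sum(score(o, p[o]) for o in OBSERVATIONS))

# Chosen before any observation is made.
POLICY = best_policy()

def decide(observation):
    """Observations select the action; they never modify SCORES or POLICY."""
    return POLICY[observation]
```

In this sketch the observation changes which action is taken, but the ranking that produced `POLICY` is the same before and after: reality is consulted at decision time without feeding back into the preference.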
In the situation described by Roko, the agent has doubt about its understanding of the very ontology that its values are expressed in. If it were an AI, that would effectively mean that we designed it using mathematics that we thought was consistent but that turns out to have a flaw. The FAI has self-improved to a level where it suspects that the ontology used to represent its value system is internally inconsistent, and it must decide whether to examine the problem further. (So we should have been able to fix it at the start but couldn't, because we just weren't smart enough.)
If its values are not represented in terms of an "ontology", this won't happen.