I unfortunately lack time at the moment; rather than write a badly-thought-out response to the complete structure of reasoning considered, I will for the moment write fully-thought-out thoughts on minor parts thereof that my (?) mind/curiosity has seized on.
'As for “taking over the world by proxy”, again SUAM applies.': this sentence stands out, but glancing upwards and downwards does not immediately reveal what SUAM refers to. Ctrl+F and looking at all appearances of the term SUAM on the page does not reveal what SUAM refers to. The first page of Google results for 'SUAM' does not reveal what SUAM refers to.
Hopefully SUAM is a reference to an S* U* A* M* acronym used elsewhere in the article or in a different well-known article, but a suggestion may be helpful that if the first then S* U* A* M* (SUAM) would be convenient in terms of phrase->acronym, and if the second then a reference to the location or else the expanded form of the acronym would be convenient.
The diamond case: Even if I did want a diamond, I simulate that I would feel nervous, alarmed even, if I indicated that I wanted it to bring me one box and I was brought a different box instead. I'm reminded--though this is not directly relevant--of Google searches, where I on occasion look up a rare word I'm unfamiliar with, and instead am given a page of results for a different (more common) word, with a question at the top asking me if I instead want to search for the word I searched for.
For Google, I would be much less frustrated if it always gave me the results I asked for, and maybe asked if I wanted to search for something else. (That way, when I do misspell something, I'm rightfully annoyed at myself and rightfully pleased with the search engine's consistent behaviour.) For the diamond case, I would be happy if it for instance noticed that I wanted the diamond and alerted me to its actual location, giving me a chance to change my official decision.
Otherwise, I would be quite worried about it making other such decisions without my official consent, such as "Hmm, you say you want to learn about these interesting branches of physics, but I can tell that you say that because you anticipate doing so will make you happy, so I'll ignore your request and pump your brain full of drugs instead forever.". Even if in most cases the outcome is acceptable, for something to second-guess your desires at all means there's always the possibility of irrevocably going against your will.
People may worry that a life of getting whatever one wants(/asks for) may not be ideal, but I'm reminded of the immortality/bat argument in that a person who gets whatever that person wants would probably not want to give that up for the sake of the benefits that would arguably come with not having those advantages.
In a more general sense, given that I already possess priorities and want them to be fulfilled (and know how I want to fulfill them), I would appreciate an entity helping me to do so, but would not want an entity to fulfill priorities that I don't hold or try to fulfill them in ways which conflict with my chosen methods of fulfilling them. If one creates something that would act according to what one would want if one /were/ more intelligent or more moral or more altruistic, then either A) that would only be desirable if one were such a person currently instead of being the current self, or B) that would be a good upgraded-replacement-self to let loose on the universe while oneself ceasing to exist, without seeking to have one's own will be done (other than on that matter of self-replacement).
My brief recapitulation of Yudkowsky’s diamond example (which you can read in full in his CEV document) probably misled you a little bit. I expect that you would find Yudkowsky’s more thorough exposition of “extrapolating volition” somewhat more persuasive. He also warns about the obvious moral hazard involved in mere humans claiming to have extrapolated someone else’s volition out to significant distances – it would be quite proper for you to be alarmed about that!
Taken to the extreme, this belief would imply that every time you gain some knowledge, improve your logical abilities or are exposed to new memes, you are changed into a different person. I’m sure you don’t believe that – this is where the concept of “distance” comes into play: extrapolating to a short distance (as in the diamond example) allows you to feel that the extrapolated version of yourself is still you, but medium or long distance extrapolation might cause you to see the extrapolated self as alien.
It seems to me that whether a given extrapolation of you is still “you” is just a matter of definition. As such it is orthogonal to the question of the choice of CEV as an AI Friendliness proposal. If we accept that an FAI must take as input multiple human value sets in order for it to be safe – I think that Yudkowsky is very persuasive on this point in the sequences – then there has to be a way of getting useful output from those value sets. Since our existing value computations are inconsistent in themselves, let alone with each other, the AI has to perform some kind of transformation to cohere a useful signal from this input – this screens off any question of whether we’d be happy to run with our existing values (although I’d certainly choose the extrapolated volition in any case). “Knowing more”, “thinking faster”, “growing up closer together” and so on seem like the optimal transformations for it to perform. Short-distance extrapolations are unlikely to get the job done, therefore medium or long-distance extrapolations are simply necessary, whatever your opinion on the selfhood question.
Eliezer says: “If our extrapolated volitions say we don't want our extrapolated volitions manifested, the system replaces itself with something else we want, or vanishes in a puff of smoke.” A possible cause of such an output might be the selfhood concern that you have raised.
Diamond: Ahh. I note that looking at the equivalent diamond section, 'advise Fred to ask for box B instead' (hopefully including the explanation of one's knowledge of the presence of the desired diamond) is a notably potentially-helpful action, compared to the other listed options which can be variably undesirable.
Varying priorities: That I change over time is an accepted aspect of existence. There is uncertainty, granted; on the one hand I don't want to make decisions that a later self would be unable to reverse and might disapprove of, but on the other hand I am willing to sacrifice the happiness of a hypothetical future self for the happiness of my current self (and different hypothetical future selves)... hm, I should read more before I write more, as otherwise redundancy is likely. (Given that my priorities could shift in various ways, one might argue that I would prefer something to act on what I currently definitely want, rather than on what I might or might not want in the future (yet definitely do not want (/want not to be done) /now/). An issue of possible oppression of the existing for the sake of the non-existent... hm.)
To check, does 'in order for it to be safe' refer to 'safe from the perspectives of multiple humans', compared to 'safe from the perspective of the value-set source/s'? If so, possibly tautologous. If not, then I likely should investigate the point in question shortly.
Another example that comes to mind regarding a conflict of priorities: 'If your brain was this much more advanced, you would find this particular type of art the most sublime thing you'd ever witnessed, and would want to fill your harddrive with its genre. I have thus done so, even though to you who owns the harddrive and can't appreciate it it consists of uninteresting squiggles, and has overwritten all the books and video files that you were lovingly storing.'
Digression: If such an entity acts according to a smarter-me's will, then, theoretically, does the smarter-me necessarily 'exist' as simulated/interpreted by the entity? Put another way, for a chatterbot to accurately create the exact interactions/responses that a sapient entity would, is it theoretically necessary for a sapient entity to effectively exist, simulated by the non-sapient entity, or could such an entity mimic a sapient entity without sapience entering into the matter? (Would then a mimicked-sapient entity exist in a meaningful sense, but only if there were sapient entities hearing its words and benefiting from its willed actions, compared to if there were only multiple mimicked-entities talking to each other? Hrm.) | If a smarter-me was necessarily simulated in a certain sense in order to carry out its will, I might be willing to accede to it in the same spirit as to extremely-intelligent aliens/robots wanting to wipe out humanity for their own reasons, but I would be unwilling to accept things which are against my interests being carried out for the interests of an entity which does not in fact in any sense exist.
Manifestation: It occurs to me that a sandbox version could be interesting to observe, one's non-extrapolated volition wanting our extrapolated volitions to be modelled in simulated world-section level 2, and as a result of such a contradiction instead the extrapolated volitions of those in level 2 /not/ being modelled in level 3, yet still being modelled in level 2... again, though, while such a tool might be extremely useful for second-guessing one's decisions and discussing with one very, very good reasons to rethink them (and thus in fact oneself changing hopefully-beneficially as a person (?) where applicable), something which directly defies one's will(/one's curiosity) lacks appeal as a goal (/stepping stone) to work towards.