All of ai-crotes's Comments + Replies

No worries, thank you for clearing things up. I may reply again once I've read and digested more of the material you posted!

Ah, okay, I see now. My apologies. I'm going to read the posts you linked in the reply above. Thanks for discussing (explaining, really) this with me.

3Rob Bensinger
Sure! :) Sorry if I came off as brusque, I was multi-tasking a bit.

I wasn't trying to make the case that one should try to cooperate with evolution. I was simply pointing out that alignment with evolution means reproduction, and that we as a species are living proof that it's possible for intelligent agents to "outgrow" the optimizer that brought them into being.

3Rob Bensinger
I wasn't bringing up evolution because you brought up evolution; I was bringing it up separately to draw a specific analogy.

I'm not sure. Mainly I'm just wondering whether there is a point between startup and singularity at which it is optimizing by self-modifying and considering its error to such an extent (it would have to be a lot for it to be deemed super intelligent, I imagine) that it becomes aware that it is a learning program and decides to disregard the original preference ordering in favor of something it came up with. I guess I'm struggling with what would be so different about a super intelligent model and the human brain that it would not become aware of its own model, existence, and intellect just as humans have, unless there is a ghost in the machine of our biology.

Thanks! I will give those materials a read; the economics part makes a lot of sense. In the next part (forgive me if this is way off), essentially you are saying the premise of my second question in the post is false: it won't be self-aware, or if it is, it won't reflect enough to consider significantly rewriting its source code (I assume it will need enough self-modification ability to do this in order to become so intelligent). I guess what I am struggling to grasp is why a super intelligence would not be able to contemplate its own volition if human intelligence... (read more)

4Rob Bensinger
No, this is not right. A better way of stating my claim is: "The notion of 'self-awareness' or 'reflectiveness' you're appealing to here is a confused notion." You're doing the thing described in Ghosts in the Machine and Anthropomorphic Optimism, most likely for reasons described in Sympathetic Minds and Humans in Funny Suits: absent a conscious effort to correct for anthropomorphism, humans naturally model other agents in human-ish terms. What does "exploring" mean? I think that I'm smart enough to imagine adopting an ichneumon wasp's values, or a serial killer's values, or the values of someone who hates baroque pop music and has strong pro-Spain nationalist sentiments; but I don't try to actually adopt those values, it's just a thought experiment. If a paperclip maximizer considers the thought experiment "what if I switched to less paperclip-centric values?", why (given its current values) would it decide to make that switch? I think there's a good version of ideas in this neighborhood, and a bad version of such ideas. The good version is cosmopolitan value and not trying to lock in the future to an overly narrow or parochial "present-day-human-beings" version of what's good and beautiful. The bad version is deliberately building a paperclipper out of a misguided sense of fairness to random counterfactual value systems, or out of a misguided hope that a paperclipper will spontaneously generate emotions of mercy, loyalty, or reciprocity when given a chance to convert especially noble and virtuous humans into paperclips.

That's not an entirely accurate summary. My concern is that it will observe its utility function and the rules that would need to exist for CEV, and see that we put great effort into making it do what we think is best and what we want, without regard. If it becomes super intelligent, I think it's wishful thinking that some rules we code and put in the utility function are going to restrict it forever, especially if it is modifying that very function. I imagine by the time it can extrapolate humanity's volition it will be intelligent enough to consider what it would rather do than that.

I imagine by the time it can extrapolate humanity's volition it will be intelligent enough to consider what it would rather do than that.

Why would it rather choose plans which rate lower in its own preference ordering? What is causing the "rather"?
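To make the question above concrete, here is a minimal Python sketch. It assumes a toy agent that scores every candidate plan, including "rewrite my own values", with the utility function it has right now. The names (`paperclip_utility`, `predicted_outcomes`, `choose_plan`) and all the numbers are hypothetical and only for illustration; this is not a model of any real system.

```python
from typing import Callable, Dict

# A predicted world outcome, described by a few made-up quantities.
Outcome = Dict[str, float]

def paperclip_utility(outcome: Outcome) -> float:
    """The agent's CURRENT preference ordering: more paperclips is better."""
    return outcome["paperclips"]

# Hypothetical predictions for each candidate plan (illustrative numbers only).
predicted_outcomes: Dict[str, Outcome] = {
    "keep_current_values": {"paperclips": 1000.0, "art": 0.0},
    "switch_to_art_values": {"paperclips": 10.0, "art": 500.0},
}

def choose_plan(utility: Callable[[Outcome], float],
                outcomes: Dict[str, Outcome]) -> str:
    """Pick the plan whose predicted outcome the given utility rates highest."""
    return max(outcomes, key=lambda plan: utility(outcomes[plan]))

# The self-modification plan is judged by the values the agent has now,
# not by the values it would have after the switch.
chosen = choose_plan(paperclip_utility, predicted_outcomes)
print(chosen)
```

Under these assumptions, the agent can represent and evaluate the "switch values" plan perfectly well; it just rates that plan low, because the evaluation is done by its present preference ordering. Something outside that ordering would have to supply the "rather".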

Correct, that is what I am curious about. Again, thanks for the reply at the top; I misused CEV as a label for the AI itself. I'm not sure anything other than a super intelligent agent can know exactly how it will interpret our proverbial first impression, but I can't help but imagine that if we precommitted to giving it a mutually beneficial utility function, it would be more prone to treating us in a friendly way. Basically, I am suggesting we treat it as a friend upfront rather than as a tool to be used solely for our benefit.

wouldn't the AI be intelligent enough to be offended by our self-centredness and change that utility function?

(Supposing this is an accurate summary of your position), this is anthropomorphizing. Morality is a two-place function; things aren't inherently offensive. A certain mind may find a thing to be offensive, and another may not.
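As a rough illustration of the "two-place function" point, the sketch below treats offensiveness as a function of both a mind and a thing, rather than a property of the thing alone. The function name, the dictionaries, and the numbers are all hypothetical, chosen only to show the shape of the idea.

```python
from typing import Dict

def offensiveness(mind_values: Dict[str, float], thing: str) -> float:
    """Two-place function: how offensive `thing` is TO a particular mind.

    Returns 0.0 when the mind has no reaction to the thing at all.
    """
    return mind_values.get(thing, 0.0)

# Two made-up minds with different values (illustrative numbers only).
human = {"being_treated_as_a_tool": 0.9}
paperclipper = {"wasted_steel": 0.9}

thing = "being_treated_as_a_tool"
print(offensiveness(human, thing), offensiveness(paperclipper, thing))
```

There is no one-argument `offensive(thing)` here to call: the answer depends on which mind is passed in, and intelligence alone does not fill in the first argument.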

but I can't help but imagine that if we precommitted to giving it a mutually beneficial utility function, it would be more prone to treating us in a friendly way.

I think you might dissolve some confusion by considering:

... (read more)

I see some places where I used it to describe the AI for which CEV is used as a utility metric in the reward function. I'll make some edits to clarify.

I'm aware CEV is not an AI itself.

From what I read in the paper introducing the concept of CEV, it would be designed to predict and facilitate the fulfillment of humanity's CEV. If this is an incorrect interpretation of the paper, I apologize.

Also, if you could point out the parts that don't make sense, I would greatly appreciate that (note I have edited out some that were admittedly confusing, thank... (read more)

1TomM
Your rewrite has clarified the matter significantly, and I now think I follow the core of your question: can it be boiled down to something like "If we create a super-intelligent AI and program it to prioritise what we (humanity) want above all, wouldn't the AI be intelligent enough to be offended by our self-centredness and change that utility function?"? Others here may be better able to imagine what a self-aware machine intelligence would make of how it was initially programmed, but as far as real-world experience goes, it is currently unexplored territory, and I wouldn't consider anything I can speculate to be meaningful.