How do you think the "Greenpeace by default" AI might define either "harm" or "value", and "life"?
It simply won't. Harm, value, life: we never defined those. They are commonly agreed-upon labels that we apply to things for communication purposes, and this works on the limited set of things that already exist, but it does not define anything outside the context of that limited set.
It would maximize some sort of complexity metric (perhaps while acting conservatively and penalizing actions it can't undo to avoid...
Here's my draft document, Concepts are Difficult, and Unfriendliness is the Default (Google Docs, commenting enabled). Despite the name, it's still informal and would need a lot more references, but it could be written up into a proper paper if people felt that the reasoning was solid.
Here's my introduction:
And here's my conclusion:
For the actual argumentation defending the various premises, see the linked document. I have a feeling that there are still several conceptual distinctions I should be making but am not, and I figured the easiest way to find the problems would be to have people tell me which points they find unclear or disagreeable.