Separation from hyperexistential risk

Discuss the wikitag on this page. Here is the place to ask questions and propose changes.

I find it hard to picture a method of learning what humans value that does not produce information about what they disvalue in equal measure; value is, for the most part, a relative quantity rather than an absolute one (e.g., to determine whether I value eating a cheeseburger, you have to compare the state of eating-a-cheeseburger to the state of not-eating-a-cheeseburger; to assess whether I value not-being-in-pain, you have to compare it to being-in-pain). Taking this principle into account, is the suggested path that the learner somehow avoids producing this information? Some other method, such as forbidding it from storing that information? Or is this still an open problem?
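(A toy sketch of the "equal measure" point, built on my own assumptions rather than anything on this page, with made-up states and numbers: if you fit a scalar score to pairwise "A is preferred to B" judgments, the fitted model pins down the dispreferred states just as precisely as the preferred ones.)

```python
# Toy sketch: learning a scalar "utility" from pairwise preference data.
# The point: the fitted scores rank the *dispreferred* states just as
# precisely as the preferred ones -- the disvalue information comes for free.
import numpy as np

states = ["eating_cheeseburger", "not_eating_cheeseburger",
          "not_in_pain", "in_pain"]
# Each pair (a, b) means "state a is preferred to state b".
preferences = [("eating_cheeseburger", "not_eating_cheeseburger"),
               ("not_in_pain", "in_pain")]

idx = {s: i for i, s in enumerate(states)}
scores = np.zeros(len(states))

# Simple Bradley-Terry-style updates: push preferred states up and
# dispreferred states down by the same amount.
for _ in range(1000):
    for a, b in preferences:
        p_a_over_b = 1 / (1 + np.exp(scores[idx[b]] - scores[idx[a]]))
        grad = 1 - p_a_over_b
        scores[idx[a]] += 0.1 * grad
        scores[idx[b]] -= 0.1 * grad

# The learner now represents "in_pain" as the worst state it has seen,
# which is exactly the kind of information the hyperexistential-risk
# argument worries about.
for s in sorted(states, key=lambda s: -scores[idx[s]]):
    print(f"{s}: {scores[idx[s]]:+.2f}")
```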

The lines of reasoning that could lead us to remove our minimal-utility situations from the AI's utility function are the same ones that could lead the AI to change its own utility function: resistance to blackmail, and robustness to cosmic-ray errors. And it suffers from the same problem: if the universe hands our AI a choice between an existential catastrophe and a hyperexistential catastrophe, it won't care which one happens. The same applies at the individual level. If someone is severely ill and begging for death, this AI won't grant it to them (there is a non-zero chance the mind will start to enjoy itself again). Of course, how much any of this is a problem depends on how likely reality is to hand you such a bad position.
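A minimal sketch of the indifference problem, assuming (my assumption, not the page's) that the "separation" is implemented as a hard floor on the utility function: once every outcome below the floor maps to the same value, an expected-utility maximizer has no reason to pay any cost to steer toward the merely existential outcome rather than the hyperexistential one.

```python
# Minimal sketch of the indifference problem with a floored utility function.
# Assumption (not from the page): "separation" is modeled as clipping all
# utilities below some floor U_MIN to exactly U_MIN.

U_MIN = -100.0  # floor: the lowest utility the AI is allowed to register

def clipped_utility(raw_utility: float) -> float:
    """Utility function with the low end clipped off."""
    return max(raw_utility, U_MIN)

# Raw (pre-clipping) utilities for three hypothetical outcomes.
outcomes = {
    "business_as_usual": 0.0,
    "existential_catastrophe": -1_000.0,
    "hyperexistential_catastrophe": -1_000_000.0,
}

for name, raw in outcomes.items():
    print(f"{name}: {clipped_utility(raw)}")
# business_as_usual: 0.0
# existential_catastrophe: -100.0
# hyperexistential_catastrophe: -100.0

# Both catastrophes now have identical utility, so an expected-utility
# maximizer forced to choose between them is exactly indifferent and will
# not spend any resources steering toward the "merely" existential one.
```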