A putative new idea for AI control; index here.
This is a problem that developed from the "high impact from low impact" idea, but is a legitimate thought experiment in its own right (it also has connections with the "spirit of the law" idea).
Suppose that, next 1st of April, the US president may or may not die of natural causes. I chose this example because it's an event of potentially large magnitude, but not overwhelmingly so (neither a butterfly wing nor an asteroid impact).
Also assume that, for some reason, we are able to program an AI that will be nice, given that the president does die on that day. Its behaviour if the president doesn't die is undefined and potentially dangerous.
Is there a way (either at the initial programming stage or later on) to extend the "niceness" from the "presidential death world" into the "presidential survival world"?
To focus on how tricky the problem is, assume for the sake of argument that the vice-president is a warmonger who will start a nuclear war if they become president. Then "launch a coup on the 2nd of April" is a "nice" thing for the AI to do, conditional on the president dying. However, if you naively import that requirement into the "presidential survival world", the AI will launch a pointless and counterproductive coup. This illustrates the kind of problems that could come up.
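As a toy sketch (all names, worlds, and behaviours here are invented for illustration), the failure mode looks like this: a policy that is only specified, and only sensible, conditional on the president dying, gets copied verbatim into the other world.

```python
# Hypothetical toy model: a policy defined only for the "president dies"
# world, then naively imported into the "president survives" world.

def nice_given_death(world):
    # "Nice" behaviour, conditional on the president dying: the VP is a
    # warmonger, so a coup on the 2nd of April averts nuclear war.
    if world["vp_is_warmonger"]:
        return "launch coup on 2nd of April"
    return "do nothing"

def naive_extension(world):
    # Import the conditional requirement verbatim, ignoring the condition.
    return nice_given_death(world)

dead_world = {"president_dead": True, "vp_is_warmonger": True}
alive_world = {"president_dead": False, "vp_is_warmonger": True}

print(naive_extension(dead_world))   # sensible: averts the warmonger VP
print(naive_extension(alive_world))  # a pointless, counterproductive coup
```

The point of the sketch is that `naive_extension` never consults `president_dead` at all, so behaviour that was only justified by the conditioning event leaks into worlds where that event never happened.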
So the question is, can we transfer niceness in this way, without needing a solution to the full problem of niceness in general?
EDIT: Actually, this seems ideally set up for a Bayes network (or for the requirement that a Bayes network be used).
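A minimal hand-rolled sketch of why a Bayes-net-style factorisation might help (the probabilities are made up for illustration): if the "niceness" judgement attaches to a causal factor like "who ends up in charge → war", rather than to the raw prescription "coup, given death", the same shared factor gives sensible answers in both worlds.

```python
# Invented numbers: probability of nuclear war given who ends up in charge.
# This factor P(war | leader) is shared across both worlds, which is what
# lets the "niceness" judgement transfer.
P_WAR_GIVEN_LEADER = {"president": 0.01, "vp": 0.99, "junta": 0.10}

def leader(president_dead, coup):
    # Deterministic node: who is in charge after the 2nd of April.
    if coup:
        return "junta"
    return "vp" if president_dead else "president"

def p_war(president_dead, coup):
    return P_WAR_GIVEN_LEADER[leader(president_dead, coup)]

# President dead: the coup lowers P(war), 0.10 vs 0.99 -- coup is "nice".
# President alive: the coup raises P(war), 0.10 vs 0.01 -- no coup is "nice".
```

The coup is then an instrumental consequence of the shared factor, not a hard-coded requirement, so it correctly disappears in the survival world.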
EDIT2: Now the problem of predicates like "Grue" and "Bleen" seems to be the relevant bit. If you can avoid concepts such as X = {nuclear war if president died, peace if president lived}, you can make the extension work.
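A toy illustration of the grue-style worry (both concepts are invented for the example): two goal-concepts that agree on every world in which the president dies, so nothing in the "presidential death" specification can distinguish them, yet they diverge on the survival worlds.

```python
# A "natural" concept and a gerrymandered, grue-like one.

def natural_goal(world):
    return "avert nuclear war"

def gerrymandered_goal(world):
    # Agrees with natural_goal whenever the president dies,
    # but prescribes something different if he survives.
    if world["president_dead"]:
        return "avert nuclear war"
    return "launch coup"

# Extensionally identical on all the worlds the niceness spec covers:
death_worlds = [{"president_dead": True, "weather": w} for w in ("rain", "sun")]
assert all(natural_goal(w) == gerrymandered_goal(w) for w in death_worlds)

survival_world = {"president_dead": False, "weather": "sun"}
print(natural_goal(survival_world))        # avert nuclear war
print(gerrymandered_goal(survival_world))  # launch coup
```

Ruling out the gerrymandered concept cannot be done by looking at the death worlds alone; some prior over "natural" versus "grue-like" concepts is doing the work.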
The coup was just an example, to show that ("nice" | president dead) does not imply ("nice" | president alive). The coup thing can be patched if we know about it, but it's just an example of the general problem.
So the question is how to solve a problem that we don't know exists. We only know that it might exist, that it will be solved under some conditions but not others, and we don't know in advance which conditions are which. Yes, that is a tricky problem.