Notes for future OT posters:
1. Please add the 'open_thread' tag.
2. Check if there is an active Open Thread before posting a new one. (Immediately before; refresh the list-of-threads page before posting.)
3. Open Threads should start on Monday and end on Sunday.
4. Unflag the two options "Notify me of new top level comments on this article" and "
That was pretty much my take. I get the feeling that "okay" outcomes are a vanishingly small portion of probability space. This suggests to me that the marginal effort saved by stipulating "okay" outcomes instead of perfect CEV is extremely small, if not negative. (By negative, I mean that it could actually take additional effort to program an AI to maximize for "okay" outcomes rather than CEV.)
However, I didn't want to ask a leading question, so I left it in its present form. It's perhaps academically interesting that the desirability of outcomes, as a function of "similarity to CEV", is a continuous curve rather than a binary good/bad step function, but I couldn't see any way of taking advantage of this. I posted mainly to see whether others might spot low-hanging fruit.
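To make the step-function/continuous-curve distinction concrete (this notation is mine, purely illustrative): let $s \in [0,1]$ measure an outcome's similarity to CEV. The binary picture is $D_{\mathrm{step}}(s) = \mathbf{1}[s = 1]$, under which anything short of a perfect match is worthless. The continuous picture is any increasing $D$ with $D(0) = 0$ and $D(1) = 1$, for instance $D(s) = s^k$ for some $k > 0$; under such a curve an outcome at $s = 0.9$ still captures most of the attainable value, which is what would make "okay" outcomes a meaningful target at all.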
I guess the interesting follow-up questions are these: Is there any chance that humans are adaptable enough that human values make up more than an infinitesimally small sliver of the set of all possible values? If so, is there any chance this enables an easier, alternative version of the control problem? It would be nice to have a plan B.